Well, let's start with the name: NoSQL stands for "Not Only SQL". Explaining it in one phrase is simple: "An attempt to handle big amounts of data using non-relational solutions". A better name would be "Not Only Relational".
So let's be a bit more specific and formal: what are we trying to solve? Which problems can't today's RDBMSs help with?
- Relational DBs cannot handle web scale: the amounts of data are just too big for them. Try to imagine how many posts Twitter handles every day. Facebook? How many sites does Google crawl and index every day?
- They are not distributed (they were never designed to be), so they are not fault tolerant. You usually have only a few DB servers, and if one goes down, you're in BIG trouble.
- RDBMSs give us ACID transactions:
  o Atomic – All of the work in a transaction completes (commit) or none of it completes.
  o Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
  o Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
  o Durable – The results of a committed transaction survive failures.
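Atomicity and consistency are easiest to see in a failing transaction. The following is a small sketch using Python's built-in sqlite3 module, which is a relational engine and not a NoSQL one. The "accounts" table and the transfer() and balances() helpers are hypothetical. A CHECK constraint stands in for the "consistency is defined in terms of constraints" rule. When the constraint rejects the second update, the first update is rolled back too.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical "accounts" table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance (consistency),
        # and the earlier credit to dst was rolled back as well (atomicity).
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
transfer(conn, "bob", "alice", 500)  # fails: balances stay at alice 70, bob 80
```

Note that the failed transfer had already credited alice before the debit failed. That credit does not survive, which is exactly the "all or nothing" guarantee.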
That is great, but what if we don't need those guarantees? What if most of our CRUD operations are reads? Then we don't really benefit from ACID (unless you're a developer who lives in Goa :) ).
BASE & CAP
The CAP theorem states that a distributed computer system cannot guarantee all of the following three properties at the same time:
- Consistency - all nodes see the same data at the same time
- Availability - a guarantee that every request receives a response about whether it was successful or failed
- Partition tolerance - the system continues to operate despite arbitrary message loss or failure of part of the system
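During a network partition, a system has to pick between the first two properties. Here is a toy model of that choice, assuming just two replicas and a simple "partitioned" flag. The TwoReplicaStore class and its "CP"/"AP" modes are made up for illustration and do not come from any real database. In CP mode, a write that cannot reach both replicas is refused. In AP mode, it is accepted on the reachable replica, and the replicas disagree.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)


@dataclass
class TwoReplicaStore:
    """Toy model: two replicas of one key space plus a network-partition flag.
    mode="CP" refuses writes it cannot replicate; mode="AP" accepts them locally."""
    mode: str
    a: Replica = field(default_factory=Replica)
    b: Replica = field(default_factory=Replica)
    partitioned: bool = False

    def write(self, key, value) -> bool:
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Replica b is unreachable: give up availability to stay consistent.
            return False
        # AP: stay available by writing only to the reachable replica a.
        self.a.data[key] = value
        return True

    def read_from_b(self, key):
        return self.b.data.get(key)


cp = TwoReplicaStore("CP")
cp.write("x", 1)
cp.partitioned = True
cp.write("x", 2)      # refused, both replicas still hold 1

ap = TwoReplicaStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)      # accepted, but replica b still answers 1
```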
BASE (Basically Available, Soft state, Eventual consistency) is the model many NoSQL systems adopt instead of ACID:
- Basically available indicates that the system does guarantee availability, in terms of the CAP theorem.
- Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
- Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
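To make "soft state" and "eventual consistency" concrete, here is a minimal sketch of asynchronous replication. It assumes one primary and one replica, with updates queued and applied later. The class name and its sync() method are hypothetical. Until the queue drains, reads from the replica can be stale. Once no new writes arrive and the queue is applied, the replica converges.

```python
from collections import deque


class EventuallyConsistentPair:
    """Toy primary/replica pair: writes hit the primary immediately and are
    shipped to the replica asynchronously (here: only when sync() runs)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # replication queue of (key, value) updates

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # May return stale data (soft state) until pending updates are applied.
        return self.replica.get(key)

    def sync(self, max_updates=None):
        """Apply up to max_updates queued updates to the replica, in order."""
        applied = 0
        while self.pending and (max_updates is None or applied < max_updates):
            key, value = self.pending.popleft()
            self.replica[key] = value
            applied += 1

    def converged(self):
        return self.primary == self.replica


store = EventuallyConsistentPair()
store.write("likes", 1)
store.write("likes", 2)
store.sync(max_updates=1)   # replica has seen only the first update
```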
NoSQL Database System Characteristics
- Scalable replication and distribution - potentially thousands of machines distributed around the world
- Queries need to return answers quickly
- Mostly query, few updates
- Asynchronous Inserts & Updates
- NoSQL systems usually do not use SQL as their query language.
- Do not necessarily follow a fixed schema.
- NoSQL does not necessarily give full ACID guarantees; instead it gives us BASE.
- NoSQL has a distributed, fault-tolerant architecture.
- Often developed as open source.
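The first characteristic, distribution and replication across many machines, depends on deciding which machines hold which keys. Below is a minimal sketch of one common approach, hash partitioning. It assumes a fixed list of nodes and stores each key on the node its hash selects plus the next node as a replica. The node names and the replication_factor parameter are hypothetical. Plain modulo hashing moves most keys when the node count changes, which is why many real systems use consistent hashing instead.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> int:
    # Use a stable hash: Python's built-in hash() of str is randomized per process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(nodes)


def replicas_for(key: str, nodes: list[str], replication_factor: int = 2) -> list[str]:
    """Primary node chosen by hash, followed by the next nodes in the list as replicas."""
    start = node_for(key, nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c"]
placement = replicas_for("user3371_color", nodes)  # two distinct nodes
```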
Wow… now that we've gotten past the boring definitions, let's dive into the goodies: types and implementations.
There are many types of NoSQL DBs; let's talk about four of the most common ones:
- Column Store (Tabular) – Each storage block contains data from only one column
- Document Store – stores documents made up of tagged elements
- Key-Value Store – a hash table of keys and their values
- Graph DB - designed for data whose relations are well represented as a graph.
Column Store
A column-oriented DBMS stores data tables as sections of columns of data rather than as rows of data, as most relational DBMSs do. This has advantages for data warehouses, CRM systems, library card catalogs and other ad-hoc inquiry systems where aggregates are computed over large numbers of similar data items.
A relational database management system shows its data as two-dimensional tables of columns and rows, but has to store it as one-dimensional strings. For example, a database might have this table:
EmpId | Lastname | Firstname | Salary
1     | Smith    | Joe       | 40000
2     | Jones    | Mary      | 50000
3     | Johnson  | Cathy     | 44000
This table exists in the computer's memory (RAM) and storage (hard drive). A row-oriented database serializes all of the values in a row together, then the values in the next row, and so on:
1,Smith,Joe,40000;
2,Jones,Mary,50000;
3,Johnson,Cathy,44000;
A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on.
1,2,3;
Smith,Jones,Johnson;
Joe,Mary,Cathy;
40000,50000,44000;
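Both layouts can be produced from the same table with a few lines of Python. This sketch just builds the two strings shown above (without the line breaks). It is only an illustration of the ordering, not how any particular engine encodes its storage blocks.

```python
# The example table from above, one tuple per row.
ROWS = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_rows(rows):
    # Row-oriented: all values of one row, then the next row.
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def serialize_columns(rows):
    # Column-oriented: zip(*rows) transposes rows into columns.
    return "".join(",".join(str(v) for v in col) + ";" for col in zip(*rows))


row_layout = serialize_rows(ROWS)
column_layout = serialize_columns(ROWS)
```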
1. Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
2. Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
3. Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row size is relatively small, as the entire row can be retrieved with a single disk seek.
4. Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.
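Points 1 and 3 can be illustrated by counting how many stored values each layout touches. In this sketch, in-memory lists stand in for disk blocks (an assumption; real engines read pages, not individual values). Summing salaries in the column layout reads only the Salary column. The row layout has to walk every field of every row. Rebuilding one employee in the column layout, on the other hand, needs one lookup per column.

```python
# The same table stored both ways.
row_store = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]
column_store = {
    "EmpId": [1, 2, 3],
    "Lastname": ["Smith", "Jones", "Johnson"],
    "Firstname": ["Joe", "Mary", "Cathy"],
    "Salary": [40000, 50000, 44000],
}


def total_salary_row_store():
    # Every row is read in full, even though only Salary is needed.
    values_read = sum(len(row) for row in row_store)
    return sum(row[3] for row in row_store), values_read


def total_salary_column_store():
    # Only the Salary column is read.
    salaries = column_store["Salary"]
    return sum(salaries), len(salaries)


def employee_column_store(emp_id):
    # Rebuilding a single row touches every column (one lookup per column).
    i = column_store["EmpId"].index(emp_id)
    return tuple(column_store[name][i] for name in column_store)
```

Both totals come out to 134000. The row layout reads 12 values to get there and the column layout reads 3.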
Column store examples: Apache HBase, Google BigTable
Document Store
This kind of DB stores the "document" itself. Each document-oriented database implementation differs in the details of this definition, but in general they all assume that documents are encoded in some standard format such as XML, YAML, JSON or BSON, or in binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
There are different ways to organize those documents in the DB:
- Collections
- Tags
- Non-visible Metadata
- Directory hierarchies
Document Store characteristics:
- Documents in a collection may have fields that are completely different.
- Documents are addressed in the database via a unique key that represents that document.
- Beyond the simple key-document (or key–value) lookup that you can use to retrieve a document, the database will offer an API or query language that will allow retrieval of documents based on their contents.
{
  "_id": "guid goes here",
  "_rev": "314159",
  "type": "abstract",
  "author": "Keith W. Hare",
  "title": "SQL Standard and NoSQL Databases",
  "body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.",
  "creation_timestamp": "2011/05/10 13:30:00 +0004"
}
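The three characteristics listed above can be sketched with an in-memory collection: documents with different fields, addressed by a unique key, and queried by content. The DocumentCollection class and its insert/get/find methods are hypothetical and are not the API of MongoDB, CouchDB or any other product. The sample document is a trimmed version of the one above, and the second document is made up.

```python
import json


class DocumentCollection:
    """Hypothetical in-memory collection: documents addressed by "_id",
    plus a simple query-by-content method."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc: dict) -> None:
        self._docs[doc["_id"]] = doc

    def get(self, doc_id):
        # Key-document lookup by the unique key.
        return self._docs.get(doc_id)

    def find(self, **criteria):
        # Return documents whose fields match all criteria; a document that
        # lacks a field simply doesn't match (no fixed schema is required).
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]


ABSTRACT_JSON = """{"_id": "guid goes here", "type": "abstract",
 "author": "Keith W. Hare", "title": "SQL Standard and NoSQL Databases"}"""

collection = DocumentCollection()
collection.insert(json.loads(ABSTRACT_JSON))
# A second document with completely different fields, in the same collection.
collection.insert({"_id": "note-1", "type": "note", "tags": ["nosql"]})
```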
Document Store examples: MongoDB, Apache CouchDB, Oracle NoSQL Database
Key-Value Store
Key-value stores allow the application to store its data in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model.
Conceptually, it is a single table with two columns: one being the (primary) key, and the second being the value. And that's it, that's all the NoSQL magic.
Key                   | Value
user3371_color        | Blue
user4344_color        | Brackish
user1923_height       | 6' 0"
user3371_age          | 34
error_msg_457         | There is no file %1 here
error_message_1       | There is no user with %1 name
1923_name             | Jim
user1923_name         | Jim Smith
user1923_lname        | Smith
Application_Installed | true
log_errors            | 1
install_path          | C:\Windows\System32\Restricted
ServerName            | localhost
test                  | test
test1                 | test
test123               | Brackish
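The "single table with two columns" idea maps directly onto a hash table. This is a minimal sketch with a hypothetical KeyValueStore class wrapping a Python dict, filled with a few pairs from the table above. The store never inspects the values, so a string, an integer or any other object can sit under a key. The keys_with_prefix helper is an illustrative linear scan and not a feature every key-value product offers.

```python
class KeyValueStore:
    """Hypothetical minimal key-value store: one table mapping keys to values.
    Values are opaque to the store; it never inspects or validates them."""

    def __init__(self):
        self._table = {}

    def put(self, key: str, value) -> None:
        self._table[key] = value

    def get(self, key: str, default=None):
        return self._table.get(key, default)

    def delete(self, key: str) -> bool:
        if key in self._table:
            del self._table[key]
            return True
        return False

    def keys_with_prefix(self, prefix: str) -> list:
        # Linear scan over all keys; this toy version keeps no index.
        return sorted(k for k in self._table if k.startswith(prefix))


store = KeyValueStore()
store.put("user3371_color", "Blue")
store.put("user3371_age", 34)          # any datatype works as a value
store.put("user1923_name", "Jim Smith")
store.put("log_errors", 1)
```

Naming conventions such as the "user3371_" prefix are the application's business. The store only sees opaque keys.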
Key-Value Store examples: Apache Cassandra, Oracle Coherence, Freebase
Graph DB
This kind of database is designed for data whose relations are well represented as a graph (elements interconnected with an undetermined number of relations between them). The kind of data could be social relations, public transport links, road maps or network topologies, for example.
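One of the common languages for querying this kind of DB (specifically, RDF graph data) is SPARQL. For example, this query finds the capitals of African countries: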
PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
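To see what the graph pattern in that query means, here is a hand-written evaluation over a handful of (subject, predicate, object) triples. The sample data and identifiers like "city:nairobi" are made up for illustration and use the query's "abc:" vocabulary. A real SPARQL engine plans and indexes these joins; this sketch just follows the edges with nested loops: city ?x, then its country ?y, then keep ?y if it is linked to abc:Africa.

```python
# Hypothetical sample graph as (subject, predicate, object) triples.
TRIPLES = {
    ("city:nairobi", "abc:cityname", "Nairobi"),
    ("city:nairobi", "abc:isCapitalOf", "country:kenya"),
    ("country:kenya", "abc:countryname", "Kenya"),
    ("country:kenya", "abc:isInContinent", "abc:Africa"),
    ("city:cairo", "abc:cityname", "Cairo"),
    ("city:cairo", "abc:isCapitalOf", "country:egypt"),
    ("country:egypt", "abc:countryname", "Egypt"),
    ("country:egypt", "abc:isInContinent", "abc:Africa"),
    ("city:paris", "abc:cityname", "Paris"),
    ("city:paris", "abc:isCapitalOf", "country:france"),
    ("country:france", "abc:countryname", "France"),
    ("country:france", "abc:isInContinent", "abc:Europe"),
}


def objects(subject, predicate):
    """All objects reachable from subject via an edge labelled predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def african_capitals():
    """Evaluate the query's pattern by following edges from each city ?x."""
    results = []
    for x, p, capital in TRIPLES:
        if p != "abc:cityname":
            continue
        for y in objects(x, "abc:isCapitalOf"):
            if "abc:Africa" in objects(y, "abc:isInContinent"):
                for country in objects(y, "abc:countryname"):
                    results.append((capital, country))
    return sorted(results)
```

Calling african_capitals() on this data yields the (capital, country) pairs for Cairo/Egypt and Nairobi/Kenya, while Paris is filtered out by the continent edge.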
Graph DB examples: Neo4j, IBM DB2, AllegroGraph
The Big Picture
This post was just a glimpse into the big world of big data.