Monday, 15 May 2017

NoSQL databases

RDBMS vs NoSQL


RDBMS :

  • Structured and organized data 
  • Structured query language (SQL) 
  • Data and its relationships are stored in separate tables. 
  • Data Manipulation Language, Data Definition Language 
  • Tight Consistency


NoSQL 

  • Stands for Not Only SQL
  • No declarative query language
  • No predefined schema 
  • Key-Value pair storage, Column Store, Document Store, Graph databases
  • Eventual consistency rather than ACID properties
  • Unstructured and unpredictable data
  • CAP Theorem 
  • Prioritizes high performance, high availability and scalability
  • BASE Transaction
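
To make the "predefined schema" versus "no predefined schema" contrast concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for an RDBMS and a plain dictionary as a stand-in for a schema-less store; the table and field names are invented for illustration.

```python
import sqlite3

# Relational side: the schema is declared up front with DDL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Robin"))

# Writing a field the schema does not define is rejected.
try:
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Sam", "S"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Schema-less side (hypothetical in-memory stand-in): every record is a
# free-form mapping, so records may carry different fields.
records = {}
records[1] = {"name": "Robin"}
records[2] = {"name": "Sam", "nickname": "S"}

rows = conn.execute("SELECT id, name FROM users").fetchall()
print(schema_rejected, rows, records[2])
```

The relational table refuses the unknown column, while the dictionary accepts records of any shape and leaves validation to the application.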


Brief history of NoSQL


  • The term NoSQL was coined by Carlo Strozzi in 1998. He used it to name his lightweight, open-source database, which did not have an SQL interface.


CAP Theorem (Brewer’s Theorem):

  • Consistency - The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
  • Availability - The system is always on (service availability is guaranteed), with no downtime.
  • Partition Tolerance - The system continues to function even when communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
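
The trade-off between these three properties can be illustrated with a toy model. This is only a sketch under simplifying assumptions: two in-memory replicas, a manually toggled partition flag, and hypothetical "CP" and "AP" modes. It is not how any particular database implements replication.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas that normally copy every write to each other."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels invented for this sketch)
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True means a and b cannot communicate

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write: availability is lost.
            return False
        # AP: stay available by accepting the write locally: replicas diverge.
        replica.data[key] = value
        return True


cp = Cluster("CP")
cp.partitioned = True
cp_ok = cp.write(cp.a, "x", 1)

ap = Cluster("AP")
ap.partitioned = True
ap_ok = ap.write(ap.a, "x", 1)

print(cp_ok, cp.a.data == cp.b.data)  # write refused, replicas still agree
print(ap_ok, ap.a.data == ap.b.data)  # write accepted, replicas disagree
```

During a partition the "CP" cluster keeps both replicas identical but rejects the write, while the "AP" cluster accepts it and temporarily serves different data from each replica.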



NoSQL pros (advantages) / cons (disadvantages):

Advantages :


  • High scalability
  • Distributed Computing
  • Lower cost
  • Schema flexibility, semi-structured data
  • No complicated relationships


Disadvantages:


  • No standardization
  • Limited query capabilities (so far)
  • Eventual consistency is not intuitive to program against


The BASE

The CAP theorem states that a distributed computer system cannot guarantee all of the following three properties at the same time:


  • Consistency
  • Availability
  • Partition tolerance

A BASE system gives up on consistency. The acronym stands for:

  • Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
  • Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
  • Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
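
Soft state and eventual consistency can be sketched as follows. The example assumes a last-write-wins rule based on an integer version number and a manual sync step; both are simplifications chosen for illustration, and real systems use a variety of conflict-resolution strategies.

```python
class Replica:
    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Keep the incoming entry only if it is newer than what we hold.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(r1, r2):
    """Exchange entries in both directions; the higher version wins."""
    for src, dst in ((r1, r2), (r2, r1)):
        for key, (version, value) in src.store.items():
            dst.write(key, value, version)


a, b = Replica(), Replica()
a.write("color", "blue", version=1)   # client 1 talks to replica a
b.write("color", "green", version=2)  # client 2 talks to replica b

before = (a.read("color"), b.read("color"))
sync(a, b)  # no new writes arrive; replicas reconcile in the background
after = (a.read("color"), b.read("color"))

print(before)  # the replicas disagree
print(after)   # the replicas have converged
```

Replica a changes its answer for "color" after the sync even though no client wrote to it, which is the soft-state behaviour described above.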


NoSQL Categories:


  • Key-value stores
  • Column-oriented
  • Graph
  • Document oriented


Key-value stores


  • Key-value stores are the most basic type of NoSQL database.
  • Designed to handle huge amounts of data.
  • Many are based on the design described in Amazon’s Dynamo paper.
  • Key-value stores allow developers to store schema-less data.
  • A key-value database stores data as a hash table in which each key is unique and the value can be a string, JSON, a BLOB (Binary Large OBject), etc.
  • In some stores (Redis, for example), values may also be structured types such as hashes, lists, sets, or sorted sets, stored against a key.
  • For example, a key-value pair might consist of a key like "Name" that is associated with a value like "Robin".
  • Key-value stores can be used as collections, dictionaries, associative arrays, etc.
  • Key-value stores typically favor the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
  • Key-value stores would work well for shopping cart contents, or individual values like color schemes, a landing page URI, or a default account number.



  • Examples of key-value stores: Redis, Dynamo, Riak, etc.
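
The hash-table model described above can be sketched in a few lines. The KeyValueStore class and its put/get/delete methods are hypothetical names for this illustration, not the API of Redis, Dynamo, or Riak.

```python
import json


class KeyValueStore:
    """In-memory sketch: unique keys mapped to opaque values."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value  # an existing key is overwritten

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.put("Name", "Robin")
# The store does not interpret values, so structured data is serialized first.
store.put("cart:42", json.dumps({"items": ["book", "pen"]}))

name = store.get("Name")
cart = json.loads(store.get("cart:42"))
missing = store.get("unknown-key")

print(name, cart["items"], missing)
```

Because the store treats every value as opaque, lookups are by key only. Querying on something inside the value (for instance, all carts containing "pen") would require the application to scan and decode the values itself.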





Column-oriented databases:

  • Column-oriented databases primarily work on columns and every column is treated individually.
  • Values of a single column are stored contiguously.
  • Column stores keep data in column-specific files.
  • In column stores, query processors work on columns too.
  • All data within each column data file has the same type, which makes it ideal for compression.
  • Column stores can improve query performance because a query reads only the columns it needs.
  • High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
  • Works well for data warehouses and business intelligence, customer relationship management (CRM), library card catalogs, etc.

  • Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
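
The difference between a row layout and a column layout can be illustrated with plain Python lists. The sample data and the run-length encoding step are assumptions made for this sketch; real column stores use on-disk column files and more sophisticated compression.

```python
from itertools import groupby

# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "EU", "amount": 200},
    {"id": 3, "region": "US", "amount": 80},
    {"id": 4, "region": "US", "amount": 50},
    {"id": 5, "region": "US", "amount": 50},
]

# Column layout: the values of each column are stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation touches only the one column it needs.
amounts = columns["amount"]
total = sum(amounts)
average = total / len(amounts)
smallest, largest = min(amounts), max(amounts)

# Same-typed, repetitive column data compresses well, e.g. run-length encoding.
region_rle = [(value, len(list(run))) for value, run in groupby(columns["region"])]

print(total, average, smallest, largest)
print(region_rle)
```

Computing SUM or AVG over "amount" never reads the "id" or "region" values, which is the reason aggregation queries tend to be fast on this layout.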


Graph databases

  • A graph data structure consists of a finite (and possibly mutable) set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices.
  • The following picture presents a labeled graph of 6 vertices and 7 edges.



What is a graph database?

  • A graph database stores data in a graph.
  • It is capable of elegantly representing any kind of data in a highly accessible way.
  • A graph database is a collection of nodes and edges.
  • Each node represents an entity (such as a student or business) and each edge represents a connection or relationship between two nodes.
  • Every node and edge is defined by a unique identifier.
  • Each node knows its adjacent nodes.
  • As the number of nodes increases, the cost of a local step (or hop) remains the same.
  • Indexes are used for lookups.

  • Examples of graph databases: OrientDB, Neo4j, Titan, etc.
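
The node-and-edge model can be sketched with an adjacency list. The Graph class, the node identifiers, and the relationship names below are all hypothetical and chosen only for illustration; they do not reflect the API of any of the databases listed.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.nodes = {}                    # node id -> properties
        self.adjacent = defaultdict(list)  # node id -> [(relationship, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, relationship, target):
        self.adjacent[source].append((relationship, target))

    def neighbors(self, node_id, relationship=None):
        # One local step (hop): only this node's own edge list is read,
        # regardless of how many nodes the graph contains.
        return [
            target
            for rel, target in self.adjacent.get(node_id, [])
            if relationship is None or rel == relationship
        ]


g = Graph()
g.add_node("alice", kind="student")
g.add_node("bob", kind="student")
g.add_node("acme", kind="business")
g.add_edge("alice", "KNOWS", "bob")
g.add_edge("alice", "WORKS_AT", "acme")

all_neighbors = g.neighbors("alice")
employers = g.neighbors("alice", "WORKS_AT")

print(all_neighbors, employers)
```

The dictionary keyed by node id plays the role of the lookup index: it finds the starting node, after which traversal follows each node's own list of adjacent nodes.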


Document Oriented databases

  • A collection of documents
  • Data in this model is stored inside documents.
  • A document is a key-value collection in which each key gives access to its value.
  • Documents are not typically forced to have a schema and therefore are flexible and easy to change.
  • Documents are stored in collections in order to group different kinds of data.
  • Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.

  • Examples of document-oriented databases: MongoDB, CouchDB, etc.
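
The document model can be sketched with nested Python dictionaries. The find helper and its dotted-path syntax are assumptions made for this example; they are not the query API of MongoDB or CouchDB.

```python
# A collection groups documents; documents need not share the same fields.
collection = [
    {"id": 1, "name": "Robin", "tags": ["admin", "dev"], "address": {"city": "Paris"}},
    {"id": 2, "name": "Sam", "email": "sam@example.com"},
]


def find(docs, path, expected):
    """Return documents whose value at a dotted path equals `expected`."""
    matches = []
    for doc in docs:
        value = doc
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                break  # this document has no such field: skip it
            value = value[part]
        else:
            # The loop finished without break, so the full path exists.
            if value == expected:
                matches.append(doc)
    return matches


in_paris = find(collection, "address.city", "Paris")
by_email = find(collection, "email", "sam@example.com")

print([d["name"] for d in in_paris], [d["name"] for d in by_email])
```

Unlike a plain key-value store, the store can look inside a document, so queries can match on nested fields, and documents that lack a field are skipped rather than rejected.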


There is a large number of companies using NoSQL. To name a few :

  • Google
  • Facebook
  • Mozilla
  • Adobe
  • Foursquare
  • LinkedIn
  • Digg
  • McGraw-Hill Education
  • Vermont Public Radio
