Basics of NoSQL

NoSQL eliminates the need for a fixed schema and relaxes ACID guarantees, which lets it handle far larger volumes of data. With well-designed supporting applications in place, NoSQL can combine high data handling capacity with good accuracy. In this article we will discuss popular NoSQL databases.
What is CAP Theorem?
The CAP theorem states that a distributed system can guarantee at most two of the following three properties. CAP stands for:

  • Consistency – Every read sees the most recent successful write, so the data in the database remains consistent after the execution of an operation.
  • Availability – Every request receives a response, i.e. the database system is always up and able to serve requests.
  • Partition Tolerance – The system continues to function even if the transfer of information amongst the servers is unreliable.
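
The trade-off can be illustrated with a toy model. The sketch below is not how any real database is implemented; the names ToyCluster, Replica and the "CP"/"AP" mode labels are assumptions made up for this illustration. During a partition, the "CP" cluster rejects writes to stay consistent, while the "AP" cluster accepts them and lets its replicas diverge.

```python
class Replica:
    """A toy in-memory replica (hypothetical, for illustration only)."""

    def __init__(self):
        self.data = {}


class ToyCluster:
    """Two replicas that behave as 'CP' or 'AP' while partitioned."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True means a and b cannot talk to each other

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability: refuse rather than let replicas diverge.
                raise RuntimeError("unavailable: cannot reach all replicas")
            # Give up consistency: only the reachable replica gets the write.
            self.a.data[key] = value
            return
        # No partition: both replicas receive the write.
        self.a.data[key] = value
        self.b.data[key] = value

    def read(self, key, replica="a"):
        node = self.a if replica == "a" else self.b
        return node.data.get(key)


cp, ap = ToyCluster("CP"), ToyCluster("AP")
for cluster in (cp, ap):
    cluster.write("stock", 10)   # replicated to both nodes
    cluster.partitioned = True   # the network link now fails

try:
    cp.write("stock", 9)
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"

ap.write("stock", 9)             # accepted, but replica b still holds 10
```

In this sketch the CP cluster still returns 10 from both replicas, while the AP cluster returns 9 from one replica and 10 from the other.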

[Figure: various databases and their relation to the CAP theorem]

The following are some common features of NoSQL databases:

1. They are not relational

2. They are mostly open-source

3. They are all cluster-friendly

4. They are schemaless

5. They emerged out of the 21st century Web world

Although we often refer to NoSQL as schemaless, this does not mean that these databases do not adhere to any kind of schema. For example, consider the following NoSQL expression:

Tab1["Revenue"] * Tab1["Total count"]

The expression assumes that records in Tab1 carry the fields "Revenue" and "Total count". In other words, NoSQL has an implicit schema, which might not be constant throughout the database. This is both a boon and a curse. The bad thing is that whenever we want to modify a field, we need to understand this implicit schema. The good thing is that with a changing schema, it takes much less effort to add new data to the database than it does in an RDBMS. Also, an RDBMS does not work well across a distributed network, which is not the case with NoSQL.
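
To make the idea of an implicit schema concrete, here is a minimal sketch using plain Python dictionaries in place of a real NoSQL store. The records, the extra "Region" field and the helper revenue_times_count are all made up for this illustration. Nothing stops the records from having different fields, so the code that reads them has to know, and defend against, the shape it expects.

```python
# Three records in the same "table"; no schema forces them to match.
tab1 = [
    {"id": 1, "Revenue": 120.0, "Total count": 3},
    {"id": 2, "Revenue": 75.5, "Total count": 2, "Region": "EU"},  # extra field
    {"id": 3, "Revenue": 40.0},                                    # field missing
]


def revenue_times_count(doc, default_count=0):
    """Compute Revenue * Total count, tolerating a missing count field.

    The schema lives here, in the reading code: it assumes "Revenue"
    exists and treats "Total count" as optional.
    """
    return doc["Revenue"] * doc.get("Total count", default_count)


results = [revenue_times_count(doc) for doc in tab1]
```

Adding the "Region" field to the second record needed no migration (the boon), but the third record silently contributes 0.0 unless the reader knows the field can be absent (the curse).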

Types of NoSQL databases

Although the literature usually breaks NoSQL down into four major types, I found a very interesting way to categorize NoSQL, suggested by Martin Fowler. Based on the way it stores data, NoSQL is primarily of two types:

1. Aggregate based Database

2. Graph based Database

The primary difference between the two is that an aggregate database tries to store all the information for a particular ID (an individual, a transaction, a product and so on) as a single object. A graph database follows the exact opposite philosophy: it cuts the data into highly granular pieces and stores them together with the relations, or edges, they share. In this article we will discuss the aggregate based databases, which are more common today.
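
The contrast can be sketched with plain Python data structures. This is only an illustration of the two modelling styles, not the storage format of any particular product; the order, customer and product values, the edge labels and both helper functions are invented for the example.

```python
# Aggregate style: everything about order 1001 lives in one nested object.
order_aggregate = {
    "order_id": 1001,
    "customer": {"id": "c1", "name": "Asha"},
    "items": [
        {"product": "keyboard", "price": 45.0, "qty": 1},
        {"product": "mouse", "price": 20.0, "qty": 2},
    ],
}

# Graph style: small nodes, with relations stored as explicit edges.
nodes = {
    "c1": {"type": "customer", "name": "Asha"},
    "o1001": {"type": "order"},
    "p1": {"type": "product", "name": "keyboard", "price": 45.0},
    "p2": {"type": "product", "name": "mouse", "price": 20.0},
}
edges = [
    ("c1", "PLACED", "o1001"),
    ("o1001", "CONTAINS", "p1"),
    ("o1001", "CONTAINS", "p2"),
]


def aggregate_total(order):
    """One lookup by ID returns the whole order, so the total is local work."""
    return sum(item["price"] * item["qty"] for item in order["items"])


def products_bought_by(customer_id):
    """Follow PLACED and then CONTAINS edges from a customer to products."""
    orders = {dst for src, rel, dst in edges
              if src == customer_id and rel == "PLACED"}
    return sorted(nodes[dst]["name"] for src, rel, dst in edges
                  if src in orders and rel == "CONTAINS")


total = aggregate_total(order_aggregate)
bought = products_bought_by("c1")
```

The aggregate answers "give me order 1001" in a single read, while the graph form makes relationship questions, such as which products a customer has bought, a matter of walking edges.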

Column-oriented Databases

Imagine that you have an RDBMS orders table with 1 million rows and 100 columns. Now you want to pull the names of all customers with orders of more than $500. The query needs only two columns, but a row-oriented store essentially has to read through all 100 columns of every row to answer it. Column-oriented databases solve this problem by storing the values of each column together, so a query reads only the columns it touches. A detailed discussion of this type of database is out of scope for this article; what you need to understand is just the underlying concept.
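
The underlying concept can be sketched in a few lines of Python. This is a toy in-memory model, not the on-disk format of any real column store; the sample rows and both function names are assumptions for illustration, and only four columns stand in for the 100 in the example above.

```python
# Row-oriented layout: every field of a record is stored together.
rows = [
    {"customer": "Asha", "amount": 620.0, "city": "Pune", "status": "shipped"},
    {"customer": "Ben", "amount": 150.0, "city": "Leeds", "status": "pending"},
    {"customer": "Chen", "amount": 980.0, "city": "Taipei", "status": "shipped"},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def big_spenders_rowwise(rows, threshold):
    """Touches whole records, even though only two fields are needed."""
    return [row["customer"] for row in rows if row["amount"] > threshold]


def big_spenders_columnwise(columns, threshold):
    """Reads only the 'customer' and 'amount' columns."""
    return [name
            for name, amount in zip(columns["customer"], columns["amount"])
            if amount > threshold]


row_answer = big_spenders_rowwise(rows, 500)
col_answer = big_spenders_columnwise(columns, 500)
```

Both functions return the same customers; the difference is how much data has to be read to get there, which is what matters once the table has a million rows.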

Please share your views, opinions and comments with us in the comments section below.

Saurav has extensive experience developing Big Data, IoT, NLP and machine learning APIs and solutions, including large scale data lake systems. He is also experienced in Data Science and Business Intelligence roles across industries. Feel free to reach out to him.