Big Data with NoSQL Databases: a useful introduction!

April 23, 2015 by Daniele Turcato

What is Big Data?

The term “Big Data” has become very popular, but its real meaning is often confused. Basically, “Big Data” refers to ways of working with large, constantly growing volumes of data using new tools for analysis, storage and collection.

For example: all the data in Facebook, Twitter and other social networks, or the analysis data produced by an aircraft management system.

Which solutions can analyze and process all these data?

In this article we’re going to explore some simple tools that can be installed on a virtual server or on a hosting server: NoSQL databases.

There are many companies all around the world that offer storage solutions; Rackone is an Italian company that offers high-quality NoSQL services to manage Big Data.

What is NoSQL?

For a long time the word “database” has been used as a synonym for “SQL” and “RDBMS”, and for the past 40 years it seemed that there were no other options.
Recently the data storage world has turned to an interesting new option called NoSQL.

NoSQL is commonly read as “Not Only SQL”, to emphasize that these systems complement the “Structured Query Language” (SQL) rather than oppose it. Some NoSQL databases offer SQL-like query languages (Cassandra’s CQL, for example), but with some essential limitations, such as the lack of JOIN.
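When a store has no JOIN, the application usually combines related records itself. The following is only a minimal sketch of that idea in plain Python; the `customers` and `orders` data are hypothetical and stand in for the results of two separate lookups.

```python
# Hypothetical records, as two separate lookups might return them.
customers = {1: {"name": "Anna"}, 2: {"name": "Marco"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.5},
]

# Application-side "join": attach the customer name to each order.
orders_with_names = [
    {**order, "customer_name": customers[order["customer_id"]]["name"]}
    for order in orders
]
```

Many NoSQL designs avoid this step altogether by storing the related data together in one record.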

NoSQL databases were initially in-house solutions created to solve problems experienced by large companies such as Google, Amazon and Facebook. These companies found that SQL technologies could not address three key points:

• Massive volumes of business data
• The need for low-latency data access
• Near-continuous service availability in a highly unpredictable market.

Why use NoSQL?

NoSQL is not meant as a wholesale replacement for an RDBMS: it can support one for specific functions or replace it in part, although in some cases NoSQL technology becomes essential.

What are the benefits of NoSQL technology?

Schema-less data representation
Most NoSQL implementations do not require a complex schema to define the data structure, which makes them well suited to data that changes over time, such as JSON representations.
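As a rough illustration of schema-less data, the sketch below keeps records with different fields side by side and serializes each one to JSON. It uses only the Python standard library; the `users` records and their field names are hypothetical.

```python
import json

# Hypothetical user records: each one carries only the fields it needs.
users = [
    {"name": "Anna", "email": "anna@example.com"},
    {"name": "Marco", "email": "marco@example.com", "phone": "+39 000 0000000"},
]

# A new attribute appears later; the older records are left untouched.
users.append({"name": "Luca", "tags": ["admin", "beta"]})

# Each record serializes to JSON independently of the others.
serialized = [json.dumps(user, sort_keys=True) for user in users]

# Readers must tolerate missing fields, since no schema guarantees them.
phones = [user.get("phone", "n/a") for user in users]
```

The price of this flexibility is that code reading the data has to handle absent fields itself, as the last line does.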

Development time
NoSQL technology can shorten development time because structured data can be extracted with simple queries. In some cases the data is already in a form ready to be used as JSON.

Speed
NoSQL databases can return data very quickly compared with a traditional database. Because of this speed, NoSQL databases are often used for web and mobile applications.

How can I start now?

Big Data is often associated with Apache Hadoop; in this article we’re going to explore only NoSQL databases, which are easier to set up on a dedicated server or on a VPS.

NoSQL databases fall into four categories:

• Key-Value Stores
• Column Family Stores
• Document Databases
• Graph Databases

Key-Value Stores

This category allows key-value pairs to be stored and retrieved quickly; in fact, this is the only structure allowed.

These databases offer the highest performance and are the easiest to implement because of their simple structure, and they are the most suitable choice when data access is based on keys.

The most popular key-value databases are Redis, Voldemort, Tokyo Cabinet and Amazon Dynamo. Amazon uses Dynamo for shopping cart management.
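To make the key-value model concrete, here is a toy in-memory store written in plain Python. It is only a sketch of the access pattern, not the API of Redis, Dynamo or any other product; the `KeyValueStore` class and the `cart:42` key are hypothetical.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory store: the only structure is key -> value."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing to an existing key overwrites the previous value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True only if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("cart:42", b'["book", "pen"]')
cart = store.get("cart:42")
missing = store.get("cart:99")
```

The store never inspects the value: it is an opaque blob, so any structure inside it (here a JSON list) is the application's concern.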

Column Family Stores

This family of databases stores data by column and is designed to manage large quantities of data distributed across multiple servers.

The best-known column family store is Cassandra, which was initially developed at Facebook and later became an Apache Software Foundation project.

This is a typical column structure:
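As a rough stand-in for a diagram, a column family layout can be modelled as nested maps: row key, then column family, then column, then value. The sketch below is plain Python rather than any real database API, and the table contents and helper names are hypothetical.

```python
# Hypothetical "users" table: row key -> column family -> column -> value.
users = {
    "user:1": {
        "profile": {"name": "Anna", "city": "Padova"},
        "activity": {"last_login": "2015-04-20"},
    },
    "user:2": {
        # Rows are sparse: this one has no "activity" family and no "city".
        "profile": {"name": "Marco"},
    },
}


def read_column(table, row_key, family, column, default=None):
    """Return one cell, or `default` when the row does not store it."""
    return table.get(row_key, {}).get(family, {}).get(column, default)


def scan_column(table, family, column):
    """Collect one column across all rows that actually contain it."""
    return {
        row_key: families[family][column]
        for row_key, families in table.items()
        if column in families.get(family, {})
    }


names = scan_column(users, "profile", "name")
city = read_column(users, "user:2", "profile", "city", default="unknown")
```

Note that rows in the same table do not need to hold the same columns, which is what lets these stores handle sparse data efficiently.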

The most popular column family databases are Cassandra and HBase (Riak, sometimes grouped with them, is more accurately a key-value store). Google’s Bigtable also belongs to the column family category, but it was built for Google’s own use.

Column family stores are used by:

Google Earth, Maps
The New York Times
eBay
Twitter
Facebook (inbox mail only).

Document Databases

Document databases manage data as semi-structured documents.
In these databases each record can have a different structure. The application (and not the database) must interpret the structure of the data.

The data can be retrieved in JSON format, and a document can contain nested sub-documents.
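The sketch below shows what this means for application code: two documents from the same collection have different shapes, one with a nested sub-document, and a small helper reads a nested field safely. It uses only the standard `json` module; the documents and the `get_path` helper are hypothetical and do not mirror any specific database driver.

```python
import json

# Two hypothetical documents from the same collection with different shapes.
raw_documents = [
    '{"_id": 1, "title": "Intro", "author": {"name": "Anna", "country": "IT"}}',
    '{"_id": 2, "title": "Notes", "tags": ["nosql", "json"]}',
]
documents = [json.loads(raw) for raw in raw_documents]


def get_path(document, dotted_path, default=None):
    """Follow a dotted path such as "author.name" through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


author_names = [get_path(doc, "author.name", default="anonymous") for doc in documents]
```

Here the application, not the database, decides what to do when a document lacks the `author` sub-document.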

This kind of database is well suited to storing heterogeneous data types or application objects; moreover, many of them provide advanced features such as sharding, indexing and geospatial data storage.

The most widely used document databases are MongoDB and CouchDB.

Document databases are used by:

LinkedIn
Dropbox Mailbox
Craigslist
The New York Times (photo management)
eBay
Leroy Merlin Italia

Graph Databases

A graph database uses nodes and edges (arcs) to store information.

Graph databases are used to manage relationships between different objects.
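A typical relationship query is “friends of friends”. The following is a minimal sketch using a plain Python adjacency map rather than a real graph database such as Neo4j; the `friends` graph and the function name are hypothetical.

```python
# Hypothetical social graph: each node maps to the set of nodes it is linked to.
friends = {
    "anna": {"marco", "luca"},
    "marco": {"anna", "sara"},
    "luca": {"anna"},
    "sara": {"marco"},
}


def friends_of_friends(graph, person):
    """People two edges away from `person`, excluding direct friends and self."""
    direct = graph.get(person, set())
    second_degree = set()
    for friend in direct:
        second_degree |= graph.get(friend, set())
    return second_degree - direct - {person}


suggestions = friends_of_friends(friends, "anna")
```

A graph database runs this kind of traversal natively, whereas a relational database would typically need the relationship table joined to itself.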

Examples of graph databases are Neo4j, InfiniteGraph and InfoGrid.

Graph databases are used by social networks.

Conclusions

We have seen that anyone can work with Big Data: there are open source tools that run on a dedicated server or on a virtual server. Stay tuned, because next we’re going to explain how to install MongoDB and test it.