© VH Education Services Pvt. Ltd.
http://venturehire.co
Tools for Hadoop and NoSQL
In our previous blogs, we defined Big Data and Big Data Analytics. In this post we discuss
the tools used to tackle big data from a technology standpoint: Hadoop (HDFS, MapReduce),
which is an open source computing framework, and NoSQL, which is a family of
non-relational databases.
Big Data Technologies and Tools
HADOOP
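Hadoop is sometimes expanded as "High-availability distributed object-oriented platform",
but the name is not an acronym: it comes from a toy elephant that belonged to the son of
co-creator Doug Cutting. Hadoop is a software framework that stores and analyzes structured
and unstructured data by distributing both the data and the applications across many
servers. Below is an overall Hadoop architecture -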
Source: Cisco
Basic Application of Hadoop
Hadoop is used for maintaining, scaling, error handling, self-healing and securing large
volumes of data. The data can be structured or unstructured. In other words, when the data
grows too large for traditional systems to handle, Hadoop comes into the picture. Below are
some basic features of Hadoop -
• Hadoop maintains and protects data by storing multiple replicas of it.
• It scales out according to data usage.
• It can detect failed tasks, as well as failed data transfers, discard them and retry the work.
• It not only recovers lost data but also automatically restores it to its place.
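The replica and self-healing features above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hadoop's actual placement logic: the replication factor of 3, the node names and the random placement are all chosen for illustration.

```python
import random

REPLICATION = 3  # assumed replication factor for this sketch


def place_replicas(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign every block to `replication` distinct nodes (random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}


def heal(placement, live_nodes, replication=REPLICATION, seed=1):
    """Forget replicas on dead nodes and copy each block to other live nodes
    until it is back at the target replica count (or no nodes are left)."""
    rng = random.Random(seed)
    healed = {}
    for block, holders in placement.items():
        alive = holders & set(live_nodes)
        candidates = [n for n in live_nodes if n not in alive]
        missing = max(0, min(replication - len(alive), len(candidates)))
        healed[block] = alive | set(rng.sample(candidates, missing))
    return healed


nodes = ["n1", "n2", "n3", "n4", "n5"]          # hypothetical cluster
placement = place_replicas(["b0", "b1", "b2"], nodes)
live = [n for n in nodes if n != "n1"]          # simulate the loss of node n1
restored = heal(placement, live)
```

After `heal` runs, every block again has three replicas and none of them sits on the failed node.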
Typical Hadoop Platform Stack – HDFS + Hive + HBase + Pig
HDFS stands for Hadoop Distributed File System. It is the part of Hadoop that handles the
distribution and storage of large data sets. HDFS stores each file as a sequence of blocks
of the same size, except the last block, which may be smaller. It also tolerates hardware
failure and smooths data handling.
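The block layout described above is simple arithmetic, sketched below. The function name is hypothetical, and the 128 MB block size is an assumption for the example (the block size is configurable per cluster).

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of `file_size` bytes occupies.

    Every block has `block_size` bytes except possibly the last one.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)  # assumed 128 MB block size
print([size // MB for size in blocks])          # prints [128, 128, 44]
```

A 300 MB file therefore occupies two full blocks and one smaller final block.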
Hive – Hive was initiated by Facebook. Hive is a data warehouse tool built on Hadoop that
converts queries into MapReduce jobs. It deals with the storage, analysis and querying of
large data sets. Queries are written as HQL (Hive Query Language) statements, which are
similar to standard SQL statements.
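To give a feel for what "converting a query into a MapReduce job" means, here is a minimal in-memory sketch of the map, shuffle and reduce steps for a query such as SELECT word, COUNT(*) FROM words GROUP BY word. The table name `words` and the sample rows are hypothetical, and real Hive execution plans are considerably more elaborate.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit a (key, 1) pair for every input row."""
    for row in rows:
        yield row["word"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical contents of the `words` table.
rows = [{"word": w} for w in ["pig", "hive", "pig", "hbase", "pig"]]
result = reduce_phase(shuffle(map_phase(rows)))
```

Here `result` holds one count per distinct word, which is what the GROUP BY query would return.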
HBase – HBase is a Hadoop application that runs on top of HDFS. HBase organizes data as a
set of tables, but it is a column-oriented database management system, i.e. different from
a row-oriented database management system. When we talk about databases we generally think
of relational database systems, but HBase is not a relational database at all and does not
support a structured query language like SQL. Java is the preferred language for HBase
applications. One of the most important features of HBase is real-time read and write
access to large data sets.
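The difference between row-oriented and column-oriented storage can be sketched with nested dictionaries. The class below is a toy model, not the HBase client API: `TinyColumnStore`, its methods and the sample data are all hypothetical. It keeps each column family in its own structure, so reading one column does not touch the others.

```python
from collections import defaultdict


class TinyColumnStore:
    """Toy column-oriented table: family -> row key -> qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)
        self._data = {family: defaultdict(dict) for family in families}

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._data[family][row][qualifier] = value

    def get(self, row, family, qualifier):
        # .get() avoids creating an empty row as a side effect of reading.
        return self._data[family].get(row, {}).get(qualifier)

    def scan_column(self, family, qualifier):
        """Read a single column across all rows without touching other families."""
        return {
            row: columns[qualifier]
            for row, columns in self._data[family].items()
            if qualifier in columns
        }


table = TinyColumnStore(["info", "stats"])
table.put("user1", "info", "name", "Asha")
table.put("user1", "stats", "visits", 12)
table.put("user2", "info", "name", "Ravi")   # rows need not share columns
names = table.scan_column("info", "name")
```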
Pig – initiated by Yahoo, Pig became open source in 2007. Do you know why it is named Pig?
It is called Pig because, like a pig that eats anything, it can handle any type of data!
Strange but true. Pig is a high-level procedural programming platform developed to simplify
querying large data sets in Hadoop and MapReduce. Pig has two components: Pig Latin, which
is the programming language, and the runtime environment in which Pig Latin programs are
executed.
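The procedural, step-by-step style of a Pig Latin program can be mirrored in plain Python. The sketch below is only an analogy: the relation names and sample records are hypothetical, and the Pig Latin statements in the comments are illustrative rather than taken from a real script.

```python
from collections import defaultdict

# Pig Latin, for comparison:  logs = LOAD 'logs' AS (user, bytes);
logs = [("ann", 120), ("bob", 30), ("ann", 80), ("cy", 500)]

# big = FILTER logs BY bytes > 50;
big = [record for record in logs if record[1] > 50]

# grouped = GROUP big BY user;
grouped = defaultdict(list)
for user, nbytes in big:
    grouped[user].append(nbytes)

# totals = FOREACH grouped GENERATE group, SUM(big.bytes);
totals = {user: sum(values) for user, values in grouped.items()}
```

Each statement names an intermediate result that the next step consumes, which is what makes the style procedural rather than declarative like SQL.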
Advantages and Disadvantages of Hadoop:
NoSQL
As the term suggests, NoSQL means a non-relational or non-SQL database; examples include
HBase, Cassandra, MongoDB, Riak and CouchDB. These databases are not based on the
relational table format, which is why SQL is generally not used for data access. A
traditional relational database deals with structured data arranged in rows and columns
under a fixed schema. NoSQL deals with unstructured, unpredictable kinds of data according
to the system requirement.
NoSQL Technologies: HBase (part of the Hadoop ecosystem), Cassandra, MongoDB, Riak, CouchDB.
Cassandra is used to handle large data sets when the database needs to scale while keeping
high performance. Cassandra provides fault tolerance and replication of the data. Its data
model goes deeper than plain columns, into column families and super columns. It is only
partially relational: it offers rich query capability but has no join feature. It follows
the column family model, which maps data in two or three dimensions. The 2D model is a
column family with some columns in it, while the 3D model is created by placing super
columns inside a column family.
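The 2D and 3D column family models can be pictured as nested maps. The following sketch uses plain dictionaries with hypothetical row keys and column names; it shows the shape of the data only and does not use any Cassandra API.

```python
# 2D model: row key -> column name -> value
users_2d = {
    "user42": {"name": "Asha", "city": "Bangalore"},
}

# 3D model: row key -> super column -> column name -> value
users_3d = {
    "user42": {
        "home": {"city": "Bangalore", "phone": "555-0100"},
        "work": {"city": "Mysore", "phone": "555-0199"},
    },
}


def dimensions(column_family):
    """Count the key levels of a non-empty column family:
    2 for plain columns, 3 when super columns are present."""
    level, node = 1, next(iter(column_family.values()))
    while isinstance(node, dict):
        level += 1
        node = next(iter(node.values()))
    return level


print(dimensions(users_2d), dimensions(users_3d))  # prints 2 3
```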
MongoDB is an agile NoSQL document database. Unlike a traditional database, which stores
data in rows and columns, MongoDB stores document data in a binary form of JSON known as
BSON. It is used for high scalability, availability and performance. In MongoDB the
document, which has a dynamic schema, is the basic unit of data: a set of documents forms
a collection, and a set of collections makes up the database.
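The database, collection and document hierarchy can be modelled with ordinary Python containers. This is a sketch, not the MongoDB driver: the `find` and `encode` helpers and all sample data are hypothetical, and JSON text stands in for BSON because the standard library has no BSON encoder.

```python
import json

# database -> collections -> documents; documents in one collection
# do not have to share the same fields (dynamic schema).
database = {
    "customers": [
        {"_id": 1, "name": "Asha", "email": "asha@example.com"},
        {"_id": 2, "name": "Ravi", "tags": ["premium"],
         "address": {"city": "Bangalore"}},
    ],
    "orders": [
        {"_id": 100, "customer_id": 2, "total": 49.5},
    ],
}


def find(db, collection, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in db[collection]
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def encode(doc):
    """Serialize a document to bytes (JSON used here as a stand-in for BSON)."""
    return json.dumps(doc, sort_keys=True).encode("utf-8")


ravi = find(database, "customers", name="Ravi")[0]
payload = encode(ravi)
```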
Riak is an open source NoSQL database system designed for availability, fault tolerance,
scalability and high performance. It provides three kinds of storage: a key/value store, a
document-oriented store and a web-shaped store. It also stores documents in the JSON
format. In terms of data modeling, there is no 'master'; there are only nodes. All nodes
are the same and none has a different responsibility.
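One common way for a masterless store to work is consistent hashing: every node runs the same code and can compute locally which node owns a key, so no coordinator is needed. The sketch below illustrates that idea under simplifying assumptions; the `Ring` class and node names are hypothetical, and it is not Riak's actual implementation, which is more involved.

```python
import hashlib
from bisect import bisect_right


class Ring:
    """Minimal hash ring: each node owns the arc of hash values before its point."""

    def __init__(self, nodes):
        # Sorting makes the ring identical no matter who builds it.
        self._points = sorted((self._hash(node), node) for node in nodes)

    @staticmethod
    def _hash(text):
        return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)

    def owner(self, key):
        """Return the first node clockwise from the key's hash."""
        hashes = [point for point, _ in self._points]
        index = bisect_right(hashes, self._hash(key)) % len(self._points)
        return self._points[index][1]


# Two nodes build their view of the cluster independently, in different orders.
view_a = Ring(["node1", "node2", "node3"])
view_b = Ring(["node3", "node1", "node2"])
same_answer = view_a.owner("user:42") == view_b.owner("user:42")
```

Because both views agree on the owner of every key, any node can accept a request and route it without asking a master.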
CouchDB is an open source, distributed and schemaless NoSQL database. It stores document
data in the JSON format. It also provides web-related features, such as access to
documents from a web browser through HTTP. JavaScript can also be used to modify
documents. In CouchDB a document is a combination of string "keys" and their "values".
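Since a document is JSON and the interface is HTTP, storing a document amounts to an HTTP PUT with a JSON body. The sketch below only builds such a request with the Python standard library and does not send it. The server address, database name, document id and the `build_put` helper are assumptions made for illustration.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumed address of a local CouchDB server


def build_put(db, doc_id, doc):
    """Build (but do not send) an HTTP PUT that stores `doc` under `doc_id`."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# A document is just string keys mapped to JSON values.
post = {"title": "Tools for Hadoop and NoSQL", "tags": ["hadoop", "nosql"]}
request = build_put("blog", "post-1", post)  # hypothetical database and id
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running server; the same URL opened in a browser would return the stored document.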
Advantages and Disadvantages of NoSQL:
Tools from Companies – Cassandra, Riak, Redis, HBase, Oracle, Membase, MongoDB.
Related Articles:
1. What is Big Data and How Fast is it Growing?
2. Why Big Data is Next Big Thing?
Useful links on this topic:
Big Data Training Course in Bangalore
Big Data Analytics Training in Bangalore
