3. Brief Explanation
Big Data (Hortonworks HDP-2.0)
Big Data refers to large collections of data and data sets, much
of it unstructured.
Big Data techniques become useful when data grows so large that
it can no longer be managed with traditional tools.
Hadoop V2.0 runs MapReduce jobs for a given task.
Hadoop V2.0 consists of MapReduce, YARN and HDFS.
The three components mentioned above are the important parts of
Hadoop.
4. Continued…
HDFS:
Hadoop Distributed File System (HDFS).
Handles large data sets with streaming data access.
Runs on top of the underlying native file system.
Stores files as fixed-size blocks.
MapReduce:
Framework for performing computations on data stored in HDFS.
Map and Reduce functions.
YARN:
Distributed data processing.
Resource management and job scheduling.
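The Map and Reduce functions above can be sketched in plain Python (a minimal stdlib simulation of the classic word-count pattern; in this stack the mrjob module listed on a later slide would wrap the same logic as an MRJob class and submit it to Hadoop):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce: sum all the counts emitted for the same word.
    return word, sum(counts)

def run_job(lines):
    # Simulate the shuffle/sort phase Hadoop performs between map and reduce.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(k, (c for _, c in grp))
                for k, grp in groupby(pairs, key=itemgetter(0)))

print(run_job(["big data big hadoop", "hadoop big"]))
# → {'big': 3, 'data': 1, 'hadoop': 2}
```

On a real cluster the mapper and reducer run in parallel on the HDFS blocks of the input; the local driver here only mimics that flow on one machine.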
5. Machine Learning (Python)
Python V2.7.6 modules on the Dev, Staging and
Production environments:
pip-1.5.4
NLTK-2.0.4
setuptools-3.3
easy_install-2.7
NumPy-1.8.1
PyYAML-3.11
mrjob-0.4.2
6. Cloud
Cloud components for Staging & Production:
CentOS V6.4 instance.
Bucket for storage of files.
WordPress blog, version 3.8.1.
jQuery on WordPress.
Visualization on the instance.
.pem file for connecting to the cloud from the local machine.
.ppk file for moving data from the local system to the cloud
through FileZilla.
Public IP (Elastic IP for the instance).
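To illustrate how the .pem key is used from the local machine, here is a small helper that builds the SSH and SCP command lines (the user name, key file and Elastic IP are placeholder values, not the project's actual credentials; FileZilla performs the equivalent transfer over SFTP with the converted .ppk key):

```python
def ssh_command(pem_path, user, host):
    # Connect to the cloud instance with the .pem key.
    return ["ssh", "-i", pem_path, "%s@%s" % (user, host)]

def scp_command(pem_path, user, host, local_file, remote_dir):
    # Copy a local file to the instance over the same key.
    return ["scp", "-i", pem_path, local_file,
            "%s@%s:%s" % (user, host, remote_dir)]

# Example with placeholder values (the host would be the Elastic IP):
print(" ".join(ssh_command("mykey.pem", "root", "54.0.0.1")))
# → ssh -i mykey.pem root@54.0.0.1
```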
8. MongoDB (Database)
Description of the database:
Handles structured, unstructured and polymorphic data.
NoSQL.
Scales with Big Data.
MongoHQ for the MongoDB server.
Backup & restore of data from the DB.
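The "polymorphic" point above can be shown with plain Python dicts standing in for MongoDB documents: two documents in the same collection need not share a schema (a minimal sketch; the pymongo call mentioned in the final comment is an assumption, since pymongo is not in the module list on the earlier slide):

```python
def make_document(name, **extra_fields):
    # MongoDB stores schema-less BSON documents: any two documents in the
    # same collection may carry different fields (polymorphic data).
    doc = {"name": name}
    doc.update(extra_fields)
    return doc

# Two documents destined for one collection, with different shapes:
user = make_document("alice", age=30)
post = make_document("first post", tags=["bigdata", "nosql"], draft=True)

print(sorted(user))   # → ['age', 'name']
print(sorted(post))   # → ['draft', 'name', 'tags']

# With a real server these would be inserted via pymongo (assumed, e.g.:
# db.collection.insert(user); db.collection.insert(post)).
```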