5. “Big Data” Definition
HADOOP
“Big data is high volume, high velocity, and/or
high variety information assets that require new
forms of processing to enable enhanced decision
making, insight discovery and process optimization”
(Gartner, 2012)
6. Problem & Solutions
Storage
• HDFS, NoSQL databases, Google File System (GFS),
Bigtable
Data Processing
• MapReduce; stream processing (Storm);
Dremel / Drill
7. Big Data Examples
Twitter
- Data in 2010: “1 trillion tweets. Today, we are seeing
50 million tweets per day”.
- Platform: Hadoop, Pig, Protocol Buffers
- Type of applications: Analysis, People search …
See full: http://goo.gl/y7rEw7
8. Big Data Examples
Facebook data warehouse
- Data: 200 GB per day in March 2008;
12+ TB (compressed) of raw data per day in 2010.
- Platform: Hadoop, Hive
- Type of applications: Reporting, Analysis, Machine
learning…
See full: http://goo.gl/XUHD9k
12. HDFS High Level
HDFS is a distributed file system designed to store
very large files with streaming data-access patterns,
running on clusters of commodity hardware.
14. HDFS Block
In a file system, a block is the minimum amount of data
that can be read or written.
Local File system | HDFS
normally 512 bytes | 64 MB by default
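The difference in block size can be made concrete with a back-of-the-envelope calculation. This is only a sketch: the 64 MB figure is Hadoop's classic default and is configurable per cluster, and the helper function below is illustrative, not part of any Hadoop API.

```python
import math

LOCAL_BLOCK = 512          # bytes, typical local file system block
HDFS_BLOCK = 64 * 1024**2  # 64 MB, Hadoop's historical default

def blocks_needed(file_size, block_size):
    """Number of blocks required to store file_size bytes."""
    return math.ceil(file_size / block_size)

one_gb = 1024**3
print(blocks_needed(one_gb, LOCAL_BLOCK))  # 2097152 local blocks
print(blocks_needed(one_gb, HDFS_BLOCK))   # 16 HDFS blocks
```

The large block size keeps per-file metadata small and favors long sequential reads, which matches the streaming access pattern HDFS is designed for.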
16. HDFS Archives
Hadoop Archives (HAR) are a file archiving facility that
packs files into HDFS blocks more efficiently, thereby
reducing namenode memory usage.
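Why archiving helps can be estimated with a rough calculation. This sketch assumes the common rule of thumb of roughly 150 bytes of namenode heap per file system object (file or block); the exact figure varies, and the function is hypothetical, purely for illustration.

```python
# Back-of-the-envelope: why many small files strain the namenode.
# ~150 bytes per namenode object is a rule of thumb, not an exact figure.
OBJ_BYTES = 150

def namenode_bytes(num_files, blocks_per_file=1):
    # each file costs one file object plus one object per block
    return num_files * OBJ_BYTES * (1 + blocks_per_file)

million_small_files = namenode_bytes(1_000_000)       # one block each
one_archive = namenode_bytes(1, blocks_per_file=16)   # one HAR, 16 blocks
print(million_small_files, one_archive)
```

A million tiny files cost on the order of hundreds of megabytes of namenode memory, while the same data packed into a single archive costs a few kilobytes, which is the motivation behind HAR.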
19. MapReduce
A programming model and an associated implementation for
processing and generating large data sets that hides the
messy details of parallelization, fault tolerance, data
distribution, and load balancing.
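The model boils down to a user-supplied map function, a framework-managed shuffle, and a user-supplied reduce function. A minimal in-memory sketch of word count (real Hadoop jobs are written against the Java MapReduce API and run distributed; this only mirrors the programming model):

```python
from collections import defaultdict

def map_phase(doc):
    # emit (word, 1) for every word in the document
    for word in doc.split():
        yield word.lower(), 1

def shuffle(pairs):
    # group values by key -- the framework does this between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # combine all values emitted for one key
    return key, sum(values)

docs = ["Hadoop stores big data", "MapReduce processes big data"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["big"], counts["data"])  # 2 2
```

Because map and reduce are side-effect-free functions over key-value pairs, the framework is free to run them on many machines in parallel and to re-execute them on failure.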