The Apache Hadoop project and the Hadoop ecosystem have been designed to be extremely flexible and extensible. HDFS, YARN, and MapReduce combined have more than 1,000 configuration parameters that allow users to tune the performance of Hadoop applications and, more importantly, to extend Hadoop with application-specific functionality without having to modify any of the core Hadoop code.
In this talk, I will start with simple extensions, such as writing a new InputFormat to efficiently process video files. I will then describe some extensions that boost application performance, such as optimized compression codecs and pluggable shuffle implementations. With the refactoring of the MapReduce framework and the emergence of YARN as a generic resource manager for Hadoop, one can extend Hadoop further by implementing new computation paradigms.
I will discuss one such computation framework, which allows Message Passing applications to run in a Hadoop cluster alongside MapReduce. I will conclude by outlining some of our ongoing work that extends HDFS by removing the namespace limitations of the current Namenode implementation.
Extending Hadoop for Fun & Profit
1. Extending Hadoop for
Fun & Profit
Milind Bhandarkar
Chief Scientist, Pivotal Software
(Twitter: @techmilind)
2. About Me
• http://www.linkedin.com/in/milindb
• Founding member of Hadoop team at Yahoo! [2005-2010]
• Contributor to Apache Hadoop since v0.1
• Built and led Grid Solutions Team at Yahoo! [2007-2010]
• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
• Center for Development of Advanced Computing (C-DAC),
National Center for Supercomputing Applications (NCSA), Center
for Simulation of Advanced Rockets, Siebel Systems (acquired by
Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and
Pivotal (formerly Greenplum)
12. Extending Input Phase
• Convert ByteStream to List(Key,Value)
• Several Formats pre-packaged
• TextInputFormat<LongWritable, Text>
• SequenceFileInputFormat<K,V>
• KeyValueTextInputFormat<Text,Text>
• Specify InputFormat for each job
• JobConf.setInputFormat()
13. InputFormat
• getSplits(): From input descriptors, get Input Splits, such that each Split can be processed independently
• <FileName, startOffset, length>
• getRecordReader(): From an InputSplit, get list of Records
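The split arithmetic behind getSplits() can be modeled without the Hadoop API. The sketch below is illustrative only (SplitModel and its FileSplit are stand-ins, not the Hadoop classes): one split per HDFS block, described as <FileName, startOffset, length>.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitModel {

    // Stand-in for Hadoop's FileSplit: <FileName, startOffset, length>
    public static class FileSplit {
        public final String fileName;
        public final long startOffset;
        public final long length;

        public FileSplit(String fileName, long startOffset, long length) {
            this.fileName = fileName;
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    // One split per HDFS block; the last split may be shorter than blockSize.
    public static List<FileSplit> getSplits(String fileName, long fileLength, long blockSize) {
        List<FileSplit> splits = new ArrayList<FileSplit>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            splits.add(new FileSplit(fileName, offset, Math.min(blockSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 200 MB file with 64 MB blocks yields 4 splits; the last is 8 MB.
        List<FileSplit> splits = getSplits("video.mpg", 200 * mb, 64 * mb);
        System.out.println(splits.size());              // 4
        System.out.println(splits.get(3).length / mb);  // 8
    }
}
```

Because only <FileName, startOffset, length> is needed, split determination stays metadata-based and cheap, which matters since it runs in a single process (see slide 26).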
16. Anomaly Detection in Surveillance Video
• Detect anomalous objects in a restricted perimeter
• A typical large enterprise collects TBs of video per day
• Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events
• Post-incident monitoring enabled by Interactive Query
17. Video Data Flow
• Timestamped Video Files as input
• Distributed Video Transcoding: ETL in Hadoop
• Distributed Video Analytics in Hadoop/HAWQ
• Insights in relational DB
18. Real World Video Data
• Benchmark surveillance videos from UK Home Office (i-LIDS)
• CCTV video footage depicting scenarios central to Govt requirements
19. Common Video Standards
• MPEG & ITU responsible for most video standards
• MPEG-2 (1995): widely adopted in DVDs, TV, set-top boxes
20. MPEG Standard Format
• Sequence of encoded video frames
• Compression by eliminating:
• Redundancy in Time: Inter-Frame Encoding
• Redundancy in Space: Intra-Frame Encoding
21. Motion Compensation
• I-Frame: Intra-Frame encoding
• P-Frame: Predicted frame from previous frame
• B-Frame: Predicted frame from both previous & next frames
22. Distributed MPEG Decoding
• HDFS splits large files into 64 MB/128 MB blocks
• Each HDFS block can be processed independently by a Map task
• Can we decode individual video frames from an arbitrary HDFS block in an MPEG file?
23. Splitting MPEG-2
• Header Information available only once per file
• Group of Pictures (GOP) header repeats
• Each GOP starts with an I-Frame and ends with
an I-Frame
• Each GOP can be decoded independently
• First and last GOP may straddle HDFS blocks
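Finding GOP boundaries reduces to scanning for the MPEG-2 group_of_pictures start code, 0x000001B8. A minimal, self-contained scanner sketch (GopScanner is illustrative, not part of Hadoop or any codec library):

```java
public class GopScanner {

    // MPEG-2 group_of_pictures header start code: 0x00 0x00 0x01 0xB8
    private static final byte[] GOP_START = {0x00, 0x00, 0x01, (byte) 0xB8};

    // Returns the offset of the first GOP header at or after 'from',
    // or -1 if none is found in the buffer.
    public static int findGopHeader(byte[] buf, int from) {
        outer:
        for (int i = from; i <= buf.length - GOP_START.length; i++) {
            for (int j = 0; j < GOP_START.length; j++) {
                if (buf[i + j] != GOP_START[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // A fake block: a few non-matching bytes, then a GOP start code at offset 5.
        byte[] block = {0x47, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x01, (byte) 0xB8, 0x7F};
        System.out.println(findGopHeader(block, 0)); // prints 5
    }
}
```

A real record reader would additionally read past its block boundary to finish the straddling GOP, since the first and last GOP of a block may begin in a neighboring block.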
25. MPEG2RecordReader
• Start from beginning of block
• Search for the first GOP Header
• Locate an I-Frame, decode, keep in memory
• If P-Frame, decode using last frame
• If B-Frame, keep current frame in memory, read next frame, decode current frame
26. Considerations for Input Format
• Use as little metadata as possible
• Number of Splits = Number of Map Tasks
• Combine small files
• Split determination happens in a single process, so it should be metadata-based
• Affects scalability of MapReduce
27. Scalability
• If one node processes k MB/s, then N nodes should process (k*N) MB/s
• If a fixed amount of data is processed in T minutes on one node, then N nodes should process the same data in (T/N) minutes
• Linear Scalability
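The linear-scalability claim is simple proportional arithmetic; a quick sanity-check sketch (class name and numbers are illustrative):

```java
public class LinearScaling {

    // Aggregate throughput for N nodes, each processing kPerNode MB/s.
    public static double throughputMBps(double kPerNode, int nodes) {
        return kPerNode * nodes;
    }

    // Time for a fixed workload that takes tOneNode minutes on one node.
    public static double minutes(double tOneNode, int nodes) {
        return tOneNode / nodes;
    }

    public static void main(String[] args) {
        System.out.println(throughputMBps(50, 8)); // 400.0 MB/s
        System.out.println(minutes(120, 8));       // 15.0 minutes
    }
}
```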
36. Why Shuffle?
• Often the most expensive phase in MapReduce; involves slow disks and network
• Map tasks partition, sort, and serialize outputs, and write to local disk
• Reduce tasks pull individual Map outputs over network, merge, and may spill to disk
38. Message Granularity
• For Gigabit Ethernet:
• α = 300 μs
• β = 100 MB/s
• 100 messages of 10 KB each = 40 ms
• 10 messages of 100 KB each = 13 ms
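Both timings follow from the alpha-beta cost model, time = n × (α + m/β). Taking β = 100 MB/s as 100 bytes per microsecond, the arithmetic can be checked in exact integer microseconds (AlphaBeta is an illustrative sketch, not a library class):

```java
public class AlphaBeta {

    static final long ALPHA_MICROS = 300;     // per-message latency, in microseconds
    static final long BYTES_PER_MICRO = 100;  // beta = 100 MB/s = 100 bytes/us

    // Total transfer time in microseconds for n messages of mBytes each.
    public static long totalMicros(long n, long mBytes) {
        return n * (ALPHA_MICROS + mBytes / BYTES_PER_MICRO);
    }

    public static void main(String[] args) {
        System.out.println(totalMicros(100, 10_000) / 1000); // 40 (ms)
        System.out.println(totalMicros(10, 100_000) / 1000); // 13 (ms)
    }
}
```

Fewer, larger messages amortize the per-message latency α, which is why 10 × 100 KB beats 100 × 10 KB for the same total payload.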
39. Alpha-Beta
• Common Mistake: Assuming that α is constant
• Scheduling latency for responder
• MR daemons' time slices are inversely proportional to the number of concurrent tasks
• Common Mistake: Assuming that β is constant
• Network congestion
• TCP incast
50. YARN
• Yet Another Resource Negotiator
• Resource Manager
• Node Managers
• Application Masters
• Specific to each paradigm, e.g. the MR Application Master (aka JobTracker)
51. Beyond MapReduce
• Apache Giraph - BSP & Graph Processing
• Storm on YARN - Streaming Computation
• HOYA - HBase on YARN
• Hamster - MPI on Hadoop
• More to come ...
52. Hamster
• Hadoop and MPI on the same cluster
• Open MPI runtime on Hadoop YARN
• Hadoop provides: resource scheduling, process monitoring, distributed file system
• Open MPI provides: process launching, communication, I/O forwarding
57. About GraphLab
• Graph-based, high-performance distributed computation framework
• Started by Prof. Carlos Guestrin at CMU in 2009
• Recently founded GraphLab Inc. to commercialize GraphLab.org
59. Graphs Alone Are Not Enough
• A full data processing workflow requires ETL/Post-processing, Visualization, Data Wrangling, Serving
• MapReduce excels at data wrangling
• OLTP/NoSQL row-based stores excel at serving
• GraphLab should co-exist with other Hadoop frameworks
62. HCFS
• Hadoop Compatible File Systems
• FileSystem, FileContext
• S3, Local FS, webhdfs
• Azure Blob Storage, CassandraFS, Ceph, CleverSafe, Google Cloud Storage, Gluster, Lustre, QFS, EMC ViPR (more to come)
63. New Dataset
• Reuse Namenode and Datanode implementations
• Substitute a different DataSet implementation: FsDatasetSpi, FsVolumeSpi
• Jira: HDFS-5194