"Hadoop Analytics on your data in place"
Steve Watt leads engineering for the Hadoop and Big Data program at Red Hat. Most recently, Steve has been focusing on Hadoop interoperability and on better enabling Hadoop support for alternative filesystems. Prior to Red Hat, Steve spent two years at Hewlett-Packard, first co-founding the Hadoop business and then leading engineering as the Hadoop CTO. Prior to HP, Steve was at IBM for 10 years, where he created IBM's first Hadoop distribution and was part of the team that built BigSheets, the first spreadsheet interface for Hadoop.
3. But tonight I have my community hat on
CC flickr wcdumonts
@wattsteve
4. Hadoop in 2007
Platform Layers: Technologies
Computational Runtimes: MapReduce, HBase
FileSystems: HDFS or Amazon S3
Infrastructures: x86 or Amazon EC2
CC flickr wwarby
5. Hadoop in 2013
Platform Layers: Technologies
Computational Runtimes: YARN, Giraph, MapReduce, HBase, Phoenix, Spark/BDAS, Drill, Impala, Stinger
FileSystems: HDFS + 13 other Hadoop FileSystems
Infrastructures: System on a Chip, x86, Virtualization and Cloud
CC flickr lowfatbrains
6. Observation #1: The Hadoop FileSystem Interface is the keystone of the entire Ecosystem
CC flickr grufnik
7. Observation #2: Moving data around just to analyze it is slow and expensive, especially if it requires a redundant repository.
CC flickr traftery
8. So how does this work?
By leveraging Hadoop’s pluggable FileSystem architecture
Hadoop FS Clients: MapReduce, HBase, YARN, Any Application
Hadoop FileSystem Interface
Hadoop FileSystem Plugin
Hadoop FileSystem
FileSystem Implementation
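The layering on this slide can be illustrated with a minimal, language-neutral sketch. It is not Hadoop's actual API: the names `FileSystem`, `InMemoryFileSystem`, `register_filesystem`, `get_filesystem`, and `count_words` are all hypothetical and exist only to show the idea that clients code against one interface while a plugin is selected by the URI scheme.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for the FileSystem interface that clients use."""

    @abstractmethod
    def write(self, path, data):
        """Store bytes at the given path."""

    @abstractmethod
    def read(self, path):
        """Return the bytes stored at the given path."""


class InMemoryFileSystem(FileSystem):
    """Toy plugin; a real plugin would talk to an actual storage system."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = data

    def read(self, path):
        return self._files[path]


# Registry mapping a URI scheme to the plugin class that serves it (assumption:
# this mirrors the idea of scheme-based plugin lookup, not Hadoop's own code).
_PLUGINS = {}
_INSTANCES = {}


def register_filesystem(scheme, plugin_class):
    """Make a plugin available under a URI scheme."""
    _PLUGINS[scheme] = plugin_class


def get_filesystem(uri):
    """Return the (cached) plugin instance responsible for the URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in _PLUGINS:
        raise ValueError(f"no filesystem plugin registered for scheme {scheme!r}")
    if scheme not in _INSTANCES:
        _INSTANCES[scheme] = _PLUGINS[scheme]()
    return _INSTANCES[scheme]


def count_words(uri):
    """Client code: depends only on the interface, never on a specific plugin."""
    fs = get_filesystem(uri)
    return len(fs.read(urlparse(uri).path).split())
```

In this sketch, registering the same client code against a second scheme requires no change to `count_words`, which is the property the pluggable architecture is meant to provide.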
10. What are some examples of where big data is stored?
- Object Stores
- NoSQL Stores
- Distributed FileSystems
- Network Filers
- Databases
CC flickr birdwatcher63
11. Network Filer Example
Hadoop FileSystem Configuration for GlusterFS
Hadoop FS Clients: MapReduce, HBase, YARN, Any Application
Hadoop FileSystem Interface
GlusterFS Plugin
Hadoop FileSystem
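As a rough illustration of what such a configuration involves, the sketch below generates a `core-site.xml` fragment that maps a URI scheme to a plugin class through Hadoop's `fs.<scheme>.impl` property convention and sets the default filesystem. The helper `core_site_xml` is hypothetical, and the class name and URI passed to it in the usage lines are assumptions for illustration; check the plugin's own documentation for the actual values.

```python
import xml.etree.ElementTree as ET


def core_site_xml(scheme, impl_class, default_uri):
    """Build a core-site.xml fragment as a string (hypothetical helper)."""
    root = ET.Element("configuration")
    properties = (
        # Maps the URI scheme to the class implementing the FileSystem plugin.
        (f"fs.{scheme}.impl", impl_class),
        # Makes that filesystem the default for paths without a scheme.
        ("fs.defaultFS", default_uri),
    )
    for name, value in properties:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed values for illustration only; not taken from the slides.
xml_text = core_site_xml(
    "glusterfs",
    "org.apache.hadoop.fs.glusterfs.GlusterFileSystem",
    "glusterfs:///",
)
```

The same helper would apply to the other plugins in this deck by swapping the scheme and class name, which is why switching filesystems is mostly a configuration change rather than a code change.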
12. Network Filer - Apache Hadoop on GlusterFS
[Architecture diagram]
Hadoop Master Services: Resource Manager, Management Server (with plugin)
Hadoop Workers: Node Managers (each with plugin)
Access to GlusterFS: FUSE, NFS, SWIFT
GlusterFS Trusted Peers: Server 1, Server 2, ... Server 50, each with a DAS Brick
13. Object Store Example
Hadoop FileSystem Configuration for SWIFT
Hadoop FS Clients: MapReduce, HBase, YARN, Any Application
Hadoop FileSystem Interface
SWIFT Plugin
Hadoop FileSystem
SWIFT
14. NoSQL Example
Hadoop FileSystem Configuration for CassandraFS
Hadoop FS Clients: MapReduce, HBase, YARN, Any Application
Hadoop FileSystem Interface
CassandraFS Plugin
Hadoop FileSystem
19. Closing Remarks
1. The number of Hadoop FileSystems available to you continues to increase
2. This is good! A vibrant ecosystem gives you choice
3. Evaluate the option of analyzing your data in place before deploying new environments
CC flickr zoomboy1