1. Apache Hadoop
Now, Next, and Beyond
Shaun Connolly
VP Corporate Strategy, Hortonworks
April 19, 2012
© Hortonworks Inc. 2012
2. Big Data: Transactions + Interactions + Observations
[Figure: data sources plotted by data volume (Megabytes to Petabytes) against increasing data variety and complexity, as nested tiers:
• ERP (Megabytes): Purchase detail, Purchase record, Payment record
• CRM (Gigabytes): Segmentation, Offer details, Customer Touches, Support Contacts
• Web (Terabytes): Web logs, User Click Stream, A/B testing, Offer history, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels
• Big Data (Petabytes): User Generated Content, Mobile Web, Sentiment, Social Interactions & Feeds, Spatial & GPS Coordinates, Sensors / RFID / Devices, External Demographics, Business Data Feeds, HD Video, Audio, Images, Speech to Text, Product/Service Logs, SMS/MMS]
3. What is Apache Hadoop?
• Collection of Open Source Projects
– Apache Software Foundation (ASF)
– Loosely coupled, ship early/often
• Solution for big data
– Stores petabytes of data reliably
– Runs highly distributed applications
– Enables a rational economics model
– Powers data-driven business
One of the best examples of open source driving innovation and creating a market
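HDFS stores data reliably by splitting each file into large blocks and replicating every block across several DataNodes. The arithmetic below is a minimal sketch of what that means for storage, assuming the Hadoop 1.x defaults of a 64 MB block size (`dfs.block.size`) and a replication factor of 3 (`dfs.replication`); the function `hdfs_footprint` is hypothetical, written only for illustration.

```python
import math

# Assumed defaults: Hadoop 1.x dfs.block.size (64 MB) and dfs.replication (3).
BLOCK_SIZE_MB = 64
REPLICATION = 3


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Hypothetical helper: return (blocks, block replicas, raw MB on disk) for one file."""
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied to `replication` different DataNodes.
    replicas = blocks * replication
    # A partial last block only occupies its actual size on disk,
    # so raw usage is simply the file size times the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


# A 1 TB file (1,048,576 MB) under these assumptions:
print(hdfs_footprint(1_048_576))
```

Under these assumptions, a 1 TB file becomes 16,384 blocks and 49,152 block replicas, consuming about 3 TB of raw disk. The replication overhead is what lets the cluster lose a node, or a rack when rack-aware placement is configured, without losing data.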
4. Key Hadoop Stack Components
Core Components
• HDFS (Hadoop Distributed File System)
• MapReduce (Distributed Programming Framework)
• HCatalog (Table & Schema Management)
• Pig (Data Flow)
• Hive (SQL-like Access)
• HBase (Columnar NoSQL Store)
• ZooKeeper (Cluster Coordination)
Extended Components
• Ambari & other monitoring and management tools
• Oozie & other workflow scheduling tools
• Sqoop & other ingest and ETL tools
• Mahout & other libraries
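MapReduce structures a job as a map step that emits key/value pairs, a shuffle that groups all values by key, and a reduce step that aggregates each group. The sketch below simulates those three phases for a word count in a single Python process. It is not Hadoop's API: real jobs are written against the Java MapReduce API (or run scripts via Hadoop Streaming), and the framework distributes the map and reduce tasks across the cluster. All function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, with keys sorted as Hadoop delivers them."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values observed for a single key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["the quick fox", "the lazy dog"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can run in parallel on many machines. Pig and Hive compile their higher-level scripts and queries down to jobs with this same shape.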
5. Hadoop Now, Next, and Beyond
The Apache community, including Hortonworks, is investing to improve Hadoop:
• Make Hadoop an open, extensible, and enterprise-viable platform
• Enable more applications to run on Apache Hadoop
Roadmap:
• “Hadoop.Now” (Hadoop 1.0), HDP 1: Most stable Hadoop ever
• “Hadoop.Next” (Hadoop 0.23), HDP 2: Next-gen HDFS & MapReduce
• “Hadoop.Beyond”: Integrate with the ecosystem
6. Unifying Classic & Big Data Methods
• Classic Method: Structured & Repeatable Analysis
– Business determines what questions to ask
– IT structures the data to answer those questions
– “Capture only what’s needed”
– SQL: Performance and Structure
• Big Data Method: Multi-structured & Iterative Analysis
– IT delivers a platform for storing, refining, and analyzing all data sources
– Business explores data for questions worth answering
– “Capture in case it’s needed”
– MapReduce: Processing Flexibility
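The contrast between “capture only what’s needed” and “capture in case it’s needed” is often described as schema-on-write versus schema-on-read. The minimal sketch below illustrates the difference on a few click-stream records. The records, field names, and functions are hypothetical, and plain Python stands in for what would be a data warehouse load on one side and raw files in HDFS queried via Pig, Hive, or MapReduce on the other.

```python
import json

# Hypothetical raw click-stream records; field names are illustrative only.
raw_lines = [
    '{"user": "u1", "page": "/home", "ms": 120, "referrer": "search"}',
    '{"user": "u2", "page": "/cart", "ms": 340}',
]

# Classic method ("capture only what's needed"): the schema is fixed at load
# time, so any field not in it is discarded before the data is stored.
LOAD_SCHEMA = ("user", "page")


def load_structured(lines):
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append(tuple(record[col] for col in LOAD_SCHEMA))
    return rows


# Big Data method ("capture in case it's needed"): keep the raw lines and
# apply a schema only when a question is asked; missing fields become None.
def query_raw(lines, fields):
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)


structured = load_structured(raw_lines)

# A later question about referrers cannot be answered from `structured`,
# because that field was dropped at load time, but the raw lines still hold it.
referrers = list(query_raw(raw_lines, ("user", "referrer")))
print(structured)
print(referrers)
```

The trade-off matches the slide. The structured rows are compact and fast to query for the questions planned in advance. The raw lines cost more to store and parse, but they can answer questions nobody anticipated at capture time.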
7. Unified Big Data Architecture
Enable Developers, Data Scientists, & Information Workers
Java, C/C++, Pig, JavaScript, Python, R, SAS, SQL, Excel, BI Tools, Reporting, etc.
Capture, Store, Refine, Discover, Analyze, Report, Retain
• Batch
– Fast data loading
– ELT/ETL and refinement
– Image/video analysis
– Online retention
• Interactive
– Path & pattern analysis
– Graph analysis
– Text analysis
– Iterative discovery
• Active
– Operational analysis
– Transactional analysis
– High volume ad-hoc
– Elastic data marts
Data sources: Audio, Video & Images; Docs & Text; Machine Logs; Coords & Sensors; Social Content; Web & Mobile; CRM; SCM; ERP
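Path & pattern analysis asks which sequences of steps users actually take, for example through a web site. The minimal sketch below groups hypothetical click events by user, orders each user’s clicks by timestamp, and counts how many users followed each distinct page path. On a Hadoop cluster the same logic would typically be written as a MapReduce, Pig, or Hive job keyed on the user. The event data and function names are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical click events: (user, timestamp, page), deliberately unordered per user.
clicks = [
    ("u1", 2, "/search"), ("u1", 1, "/home"), ("u1", 3, "/cart"),
    ("u2", 1, "/home"), ("u2", 5, "/cart"),
    ("u3", 2, "/home"), ("u3", 4, "/cart"), ("u3", 3, "/search"),
]


def page_paths(events):
    """Group clicks by user and order them by timestamp into a page path."""
    by_user = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))
    return {user: tuple(page for _, page in sorted(visits))
            for user, visits in by_user.items()}


def path_counts(events):
    """Count how many users followed each distinct page path."""
    return Counter(page_paths(events).values())


for path, users in path_counts(clicks).most_common():
    print(" -> ".join(path), users)
```

Grouping by user first mirrors the map/shuffle step: every click for one user must reach the same place before that user’s path can be reconstructed.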
8. Hortonworks Vision
We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
Q: How do we achieve that vision?
A: Enable the ecosystem around an enterprise-viable, open source data platform
9. Hadoop Summit 2012
• 2-day event (June 13-14, 2012) in San Jose, CA
• 84 breakout sessions
• Real-world examples, developments, and best practices for Apache Hadoop
• Plus, a keynote by Geoffrey Moore, with more to be announced
• Register now at: http://www.hadoopsummit.org