Suche senden
Hochladen
Hadoop Trends
•
19 gefällt mir
•
7,778 views
Hortonworks
Folgen
Eer
Weniger lesen
Mehr lesen
Technologie
Melden
Teilen
Melden
Teilen
1 von 28
Jetzt herunterladen
Downloaden Sie, um offline zu lesen
Empfohlen
Apache Ranger
Apache Ranger
Rommel Garcia
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
Apache Zookeeper
Apache Zookeeper
Nguyen Quang
Introduction to Apache Hive
Introduction to Apache Hive
Avkash Chauhan
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
Databricks
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
DataWorks Summit
Apache Hadoop and HBase
Apache Hadoop and HBase
Cloudera, Inc.
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks
Empfohlen
Apache Ranger
Apache Ranger
Rommel Garcia
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
Apache Zookeeper
Apache Zookeeper
Nguyen Quang
Introduction to Apache Hive
Introduction to Apache Hive
Avkash Chauhan
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
Databricks
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
DataWorks Summit
Apache Hadoop and HBase
Apache Hadoop and HBase
Cloudera, Inc.
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks
Apache HBase™
Apache HBase™
Prashant Gupta
Big data analytics with Apache Hadoop
Big data analytics with Apache Hadoop
Suman Saurabh
Sqoop
Sqoop
Prashant Gupta
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
sudhakara st
Changing the game with cloud dw
Changing the game with cloud dw
elephantscale
Introduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
Intro to HBase
Intro to HBase
alexbaranau
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
Hive 3 - a new horizon
Hive 3 - a new horizon
Thejas Nair
Hive(ppt)
Hive(ppt)
Abhinav Tyagi
Apache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
DataWorks Summit
03 hive query language (hql)
03 hive query language (hql)
Subhas Kumar Ghosh
Introduction to sqoop
Introduction to sqoop
Uday Vakalapudi
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
Big Data and Hadoop
Big Data and Hadoop
Flavio Vit
Hive
Hive
Bala Krishna
Less06 networking
Less06 networking
Amit Bhalla
Hive partitioning best practices
Hive partitioning best practices
Nabeel Moidu
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
m_hepburn
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
Hortonworks
Weitere ähnliche Inhalte
Was ist angesagt?
Apache HBase™
Apache HBase™
Prashant Gupta
Big data analytics with Apache Hadoop
Big data analytics with Apache Hadoop
Suman Saurabh
Sqoop
Sqoop
Prashant Gupta
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
sudhakara st
Changing the game with cloud dw
Changing the game with cloud dw
elephantscale
Introduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
Intro to HBase
Intro to HBase
alexbaranau
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
Hive 3 - a new horizon
Hive 3 - a new horizon
Thejas Nair
Hive(ppt)
Hive(ppt)
Abhinav Tyagi
Apache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
DataWorks Summit
03 hive query language (hql)
03 hive query language (hql)
Subhas Kumar Ghosh
Introduction to sqoop
Introduction to sqoop
Uday Vakalapudi
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
Big Data and Hadoop
Big Data and Hadoop
Flavio Vit
Hive
Hive
Bala Krishna
Less06 networking
Less06 networking
Amit Bhalla
Hive partitioning best practices
Hive partitioning best practices
Nabeel Moidu
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
Was ist angesagt?
(20)
Apache HBase™
Apache HBase™
Big data analytics with Apache Hadoop
Big data analytics with Apache Hadoop
Sqoop
Sqoop
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
Changing the game with cloud dw
Changing the game with cloud dw
Introduction to Hadoop
Introduction to Hadoop
Intro to HBase
Intro to HBase
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Hive 3 - a new horizon
Hive 3 - a new horizon
Hive(ppt)
Hive(ppt)
Apache hive introduction
Apache hive introduction
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
03 hive query language (hql)
03 hive query language (hql)
Introduction to sqoop
Introduction to sqoop
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
Big Data and Hadoop
Big Data and Hadoop
Hive
Hive
Less06 networking
Less06 networking
Hive partitioning best practices
Hive partitioning best practices
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Ähnlich wie Hadoop Trends
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
m_hepburn
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
Hortonworks
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
Teradata Aster
Hadoop as data refinery
Hadoop as data refinery
Steve Loughran
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
JAX London
Introduction to Hadoop
Introduction to Hadoop
POSSCON
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
Hortonworks
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Hortonworks
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Hortonworks
Why hadoop for data science?
Why hadoop for data science?
Hortonworks
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the Union
Hortonworks
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
Roby Chen
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
Slim Baltagi
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
201305 hadoop jpl-v3
201305 hadoop jpl-v3
Eric Baldeschwieler
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Hortonworks
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks
Ähnlich wie Hadoop Trends
(20)
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
Hadoop as data refinery
Hadoop as data refinery
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
Introduction to Hadoop
Introduction to Hadoop
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Why hadoop for data science?
Why hadoop for data science?
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the Union
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
201305 hadoop jpl-v3
201305 hadoop jpl-v3
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Mehr von Hortonworks
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Hortonworks
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Hortonworks
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
Hortonworks
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Hortonworks
HDF 3.2 - What's New
HDF 3.2 - What's New
Hortonworks
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Hortonworks
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Hortonworks
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Hortonworks
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
Hortonworks
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Hortonworks
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Hortonworks
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Hortonworks
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Hortonworks
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Hortonworks
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Hortonworks
Mehr von Hortonworks
(20)
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
HDF 3.2 - What's New
HDF 3.2 - What's New
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Kürzlich hochgeladen
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Safe Software
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
Antenna Manufacturer Coco
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Igalia
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
Maria Levchenko
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Delhi Call girls
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
ThousandEyes
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Neo4j
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
debabhi2
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Igalia
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
Michael W. Hawkins
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
The Digital Insurer
Slack Application Development 101 Slides
Slack Application Development 101 Slides
praypatel2
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
naman860154
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
HampshireHUG
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Drew Madelung
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
The Digital Insurer
🐬 The future of MySQL is Postgres 🐘
🐬 The future of MySQL is Postgres 🐘
RTylerCroy
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
Delhi Call girls
Kürzlich hochgeladen
(20)
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Slack Application Development 101 Slides
Slack Application Development 101 Slides
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
🐬 The future of MySQL is Postgres 🐘
🐬 The future of MySQL is Postgres 🐘
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
Hadoop Trends
1.
Trends and usage
of Apache Hadoop Eric Baldeschwieler CEO Hortonworks Twitter: @jeric14, @hortonworks January 2012 © Hortonworks Inc. 2011 Page 1
2.
Agenda • Define terms
– What is Hadoop? Why does Hadoop matter? • What drives Hadoop adoption? • Observed Trends Architecting the Future of Big Data Page 2 © Hortonworks Inc. 2011
3.
Hortonworks Vision We
believe that by 2015, more than half the world's data will be processed by Apache Hadoop How to achieve that vision??? Enable ecosystem around enterprise-viable platform. Page 3 © Hortonworks Inc. 2011
4.
What is Apache
Hadoop? • Solution for big data – Deals with complexities of high volume, velocity & variety of data • Set of open source projects • Transforms commodity hardware into a service that: – Stores petabytes of data reliably – Allows huge distributed computations • Key attributes: – Redundant and reliable (no data loss) One of the best examples of – Extremely powerful open source driving innovation – Batch processing centric and creating a market – Easy to program distributed apps – Runs on commodity hardware Page 4 © Hortonworks Inc. 2011
5.
Hortonworks Data Platform
(HDP) Key Components of “Standard Hadoop” Open Source Stack Core Apache Hadoop Related Hadoop Projects Open APIs for: • Data Integration • Data Movement • App Job Management • System Management Pig Hive (Data Flow) (SQL) (Columnar NoSQL Store) HBase MapReduce Zookeeper (Coordination) (Distributed Programing Framework) HCatalog (Table & Schema Management) HDFS (Hadoop Distributed File System) Page 5 © Hortonworks Inc. 2011
6.
Big Data Trailblazers
and Use Cases data analyzing web logs analytics advertising optimization machine learning mail anti-spam text mining web search content optimization customer trend analysis ad selection video & audio processing data mining user interest prediction social media Page 6 © Hortonworks Inc. 2011
7.
Yahoo!, Apache Hadoop
& Hortonworks http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop Yahoo! embraced Apache Hadoop, an open source platform, to crunch epic amounts of data using an army of dirt-cheap servers 2006 Hadoop at Yahoo! 40K+ Servers 170PB Storage 5M+ Monthly Jobs 1000+ Active Users 2011 Yahoo! spun off 22+ engineers into Hortonworks, a company focused on advancing open source Apache Hadoop for the broader market Page 7 © Hortonworks Inc. 2011
8.
What drives Hadoop
adoption? Architecting the Future of Big Data Page 8 © Hortonworks Inc. 2011
9.
Market Drivers for
Apache Hadoop • Business drivers – High-value projects that require use of more data Gartner predicts 800% data growth – Belief that there is great ROI in mastering big data over next 5 years • Financial drivers – Growing cost of data systems as percentage of IT spend – Cost advantage of commodity hardware + open source – Enables departmental-level big data strategies 80-90% of data produced today is unstructured • Technical drivers – Existing solutions failing under growing requirements – 3Vs - Volume, velocity, variety – Proliferation of unstructured data © Hortonworks Inc. 2011 9 © Hortonworks Inc. 2011
10.
Every Market has
Big Data Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011. Page 10 © Hortonworks Inc. 2011
11.
Broader Use Case
Opportunities Financial Services Healthcare • Detect/prevent fraud • Patient monitoring • Model and manage risk • Predictive modeling • Personalize banking/insurance products • Compliance, Archival, text search • Compliance, Archival, … • Data driven research Retail Web / Social / Mobile • Behavior analysis • Sentiment analysis • Cross selling, recommendation engines • Web log, image, and video analysis • Optimize pricing, placement, design • Personalization • Optimize inventory and distribution • Billing, Reporting, Network Analysis Manufacturing Government • Simulation, Analysis, Design • Detect/prevent fraud • Improve service via product sensor data • Security & Intelligence • “Digital factory” for lean manufacturing • Support open data initiatives Page 11 © Hortonworks Inc. 2011
12.
Observed Trends
Architecting the Future of Big Data Page 12 © Hortonworks Inc. 2011
13.
Trend: Agile Data • The
old way – Operational systems keep only current records, short history – Analytics systems keep only conformed / cleaned / digested data – Unstructured data locked away in operational silos – Archives offline – Inflexible, new questions require system redesigns • The new trend – Keep raw data in Hadoop for a long time – Able to produce a new analytics view on-demand – Keep a new copy of data that was previously on in silos – Can directly do new reports, experiments at low incremental cost – New products / services can be added very quickly – Agile outcome justifies new infrastructure Architecting the Future of Big Data Page 13 © Hortonworks Inc. 2011
14.
Traditional Enterprise Data
Architecture Data Silos Traditional Data Warehouses, Serving Applications BI & Analytics Web NoSQL Traditional ETL & Data BI / Serving RDMS … Message buses EDW Marts Analytics Serving Social Sensor Text Logs Media Data Systems … Unstructured Systems Page 14 © Hortonworks Inc. 2011
15.
Agile Data Architecture
w/Hadoop Connecting All of Your Big Data Traditional Data Warehouses, Serving Applications BI & Analytics Web NoSQL Traditional ETL & Data BI / Serving RDMS … Message buses EDW Marts Analytics EsTsL (s = Store) Custom Analytics Serving Social Sensor Text Logs Media Data Systems … Unstructured Systems Page 15 © Hortonworks Inc. 2011
16.
Trend: Data driven
development • Limited runtime logic driven by huge lookup tables • Data computed offline on Hadoop – Machine learning, other expensive computation offline – Personalization, classification, fraud, value analysis… • Application development requires data science – Huge amounts of actually observed data key to modern services – Hadoop used as the science platform Architecting the Future of Big Data Page 16 © Hortonworks Inc. 2011
17.
CASE STUDY
YAHOO! HOMEPAGE • Serving Maps SCIENCE » Machine learning to build ever • Users -‐ Interests HADOOP better categorization models CLUSTER • Five Minute USER CATEGORIZATION Produc7on BEHAVIOR MODELS (weekly) • Weekly PRODUCTION Categoriza7on HADOOP » Identify user interests using models SERVING CLUSTER Categorization models MAPS (every 5 minutes) USER BEHAVIOR SERVING SYSTEMS ENGAGED USERS Build customized home pages with latest data (thousands / second) Copyright Yahoo 2011 17
18.
CASE STUDY
YAHOO! HOMEPAGE Personalized for each visitor Result: twice the engagement Recommended links News Interests Top Searches +79% clicks +160% clicks +43% clicks vs. randomly selected vs. one size fits all vs. editor selected Copyright Yahoo 2011 Hortonworks Inc. 2011 © 18
19.
Trend: Specialization of
Data Systems • Hadoop does not replace existing systems – It adds new capabilities to the enterprise – It can offload things that are not done efficiently in current systems – Especially in scale out situations • Specialization of traditional data components – Use OLTP systems just for transactions – Use OLAP systems for interactive analysis • Hadoop has LOTS of bandwidth to storage and CPU – Pull reporting out OLTP systems – Pull ELT out of OLAP systems Architecting the Future of Big Data Page 19 © Hortonworks Inc. 2011
20.
Hadoop and OLTP
Systems MPP Processing of Online Transactions Hadoop used to Process Reports • Mission critical • Free up 50+% processing power for • Manages transactions & serves reports transaction processing system • Significant cost savings due to commodity nature of Hadoop Web Site Transaction Reports Processing Web Systems Site $$$ Transaction Logs Web Site Page 20 © Hortonworks Inc. 2011
21.
Hadoop and OLAP
Systems Fast loading, raw data staging, ELT & long-term archival Allow analysts to use tools they know (The Agile Data Zone) (Take advantage of huge ecosystem of BI and Analytics tooling) Web Hadoop EDW Mobile Social Online Archival Other logs Page 21 © Hortonworks Inc. 2011
22.
TRENDS: Instrument Clouds
of Things Clouds of things logging to Hadoop HDFS + Map-Reduce Websites Or HBase Mobile phones, Enterprise devices… + Analysis Things Things Things Things Things Things Page 22 © Hortonworks Inc. 2011
23.
Trend: Many POCs,
Few Production Systems • The problem – Hadoop is still a young technology – Hard to find knowledgeable staff – Integration with existing systems • Hadoop market is maturing at speed – Emerging ecosystem of Hadoop platform solutions providers – Apache Hadoop continues to get better – Hadoop training and support available form several vendors Architecting the Future of Big Data Page 23 © Hortonworks Inc. 2011
24.
Growth in Hadoop
Ecosystem • Hardware vendors, Public Cloud (IAAS, PAAS) – Storage, Appliances, Preloaded commodity boxes, cloud • Data Systems – All the major vendors announced Hadoop plans / products in 2011 • BI, Analytics and ETL – Hadoop integrations emerging • Dedicated Hadoop Applications – Datamere, Karmashere, Platfora, … • Systems Integrators – Regional and Global providers available Architecting the Future of Big Data Page 24 © Hortonworks Inc. 2011
25.
Hadoop Continues to
Improve Apache community, including Hortonworks investing to improve Hadoop: • Make Hadoop an Open, Extensible, and Enterprise Viable Platform • Enable More Applications to Run on Apache Hadoop “Hadoop.Beyond” Platform actively evolving “Hadoop.Next” (Hadoop 0.23) HA, Next-gen HDFS & MapReduce “Hadoop.Now” Extension & Integration APIs (Hadoop 1.0) Most stable version ever HBase, security, WebHDFS Page 25 © Hortonworks Inc. 2011
26.
Hortonworks – Approachable
Hadoop • Apache Hadoop Leadership – Delivered every major release since 0.1 – Driving innovation across entire stack – Experience managing world’s largest deployment – Access to Yahoo’s 1,000+ Hadoop users and 40k+ nodes for testing, QA, etc. • Business Focus – Provide 100% open source product – Hortonworks Data Platform Expert Role-based Training – Help customers and partners overcome Hadoop knowledge gaps Full Lifecycle Support and Services – Help organizations successfully develop and deploy solutions based on Hadoop Evaluate Pilot Production Architecting the Future of Big Data Page 26 © Hortonworks Inc. 2011
27.
Trend: Finding More
Value Over Time • Hadoop is usually brought in to solve a specific problem – Build seach indexes for Yahoo – Manage web site logs for Facebook – Users using EC2 to do data processing at Amazon – Simple reporting when existing tools don’t scale • Once your data is in Hadoop more users find value • Once you have Hadoop, folks add more data Architecting the Future of Big Data Page 27 © Hortonworks Inc. 2011
28.
Thank You! Questions? Eric
Baldeschwieler @jeric14 @hortonworks Page 28 © Hortonworks Inc. 2011
Jetzt herunterladen