1
Machine Learning Loves Hadoop
Enabling Machine Learning to Accelerate Data Returns
2
Agenda
©2014 Cloudera, Inc. All rights reserved.
Hadoop and Cloudera Overview
Machine Learning + The Enterprise Data Hub
Machine Learning in Practice
Q&A
Speakers
TJ Laher
Product Marketing
Sean Owen
Director of Data Science
Get Social
#ClouderaWebinars
3
Where Hadoop Began
2006: Web Indexing, Google Earth, Google Finance
2008: Web Indexing
2010: Storing User Generated Data
4
How Cloudera Accelerated Adoption
Timeline: 2008 to 2014
Products: CDH, Cloudera Manager, Cloudera Enterprise 4, Enterprise Data Hub
Tagline: Ask Bigger Questions
Milestones, in order:
Cloudera Launched
Hadoop Creator, Doug Cutting, Joins
Launch CDH: 1st Commercial Hadoop Distro
Launch Cloudera Manager: 1st Hadoop Management Application
Cloudera U Expands to 140 Countries
100 Customers in Production
Release Cloudera Enterprise 4
300 Partners in Cloudera Connect
Introduce Cloudera Navigator, Impala, Search
Realized the Enterprise Data Hub
Tom Reilly Joins as CEO
5
The Enterprise Data Hub
EDH powered by Apache Hadoop™
Unified: Out-of-the-box capabilities for infinite scalability for storage, ingest, access, metadata, security, governance, and management
Compliance-Ready: End-to-end security and governance: authentication, authorization, encryption, audit, and lineage
Accessible: Utilize familiar tools and skills to get value from your data faster. Multiple frameworks, including batch and stream processing, in-memory analytic SQL, enterprise search, machine learning
Open: 100% open source, all components are Apache licensed. Deploy in the cloud, on-premises, or with an appliance
[Graphic labels: Social, Financial Transactions, Sensor]
6
What does an EDH look like?
[Diagram layers]
Applications: Model Building, BI/Visualizations, Point Solutions, Custom Solutions
Processing: Online NoSQL DBMS, Analytic MPP DBMS, Search Engine, Batch Processing, Stream Processing, Machine Learning
Unified Management & Distributed Storage: Management, Security & Governance, Metadata, Data
Data Sources
7
Machine Learning + The Enterprise Data Hub
8
Why do we use machine learning?
Operational Analytics: Transaction Classification, Recommendation Engine, Dynamic Pricing, …
Investigative Analytics: Drug Discovery, Energy Exploration, Executive Reports, …
9
Machine Learning Breakdown
Category                 Algorithm                                     Goal
Classification           Logistic Regression & Random Decision Forest  Pattern Recognition
Regression               Generalized Linear Models                     Predict Future Values
Clustering               K-means++                                     Segment Historic Data
Collaborative Filtering  Alternating Least Squares                     Recommend Items
Supervised: Classification, Regression
Unsupervised: Clustering, Collaborative Filtering
10
Common Challenges with Machine Learning
Challenges (Traditional Systems): Feature Generation & Selection; Overfitting; Historic Testing; Dirty Data; Debugging Models
The Cost: Time; False Positives and Negatives; Uncertainty of Model Quality; Unable to Explain and Improve Models; Bad Results
11
How an Enterprise Data Hub Helps
Challenges: Feature Generation & Selection; Overfitting; Historic Testing; Dirty Data; Debugging Models
The Benefit (Enterprise Data Hub): Reduce Iteration Time; Eliminate Sampling; Test on Archived Data; Audit Data Trail; Immediate Data Access
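To make "Historic Testing" and "Test on Archived Data" concrete, here is a minimal Python sketch that is not taken from the webinar. It fits a deliberately naive amount-threshold rule on older records, then counts false positives and false negatives on later, held-out records. The record fields (day, amount, fraud), the toy data, and every helper name are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical archived transactions; fields and values are illustrative only.
RECORDS = [
    {"day": 1, "amount": 20.0, "fraud": False},
    {"day": 2, "amount": 35.0, "fraud": False},
    {"day": 3, "amount": 900.0, "fraud": True},
    {"day": 4, "amount": 50.0, "fraud": False},
    {"day": 5, "amount": 1100.0, "fraud": True},
    {"day": 6, "amount": 40.0, "fraud": False},
    {"day": 7, "amount": 600.0, "fraud": False},  # large but legitimate
    {"day": 8, "amount": 950.0, "fraud": True},
    {"day": 9, "amount": 300.0, "fraud": True},   # small but fraudulent
]


def split_by_time(records, cutoff_day):
    """Train on records before cutoff_day, test on the rest (no shuffling)."""
    train = [r for r in records if r["day"] < cutoff_day]
    test = [r for r in records if r["day"] >= cutoff_day]
    return train, test


def learn_threshold(train):
    """Naive stand-in for a model: midpoint of mean legitimate and mean fraud amounts."""
    legit = [r["amount"] for r in train if not r["fraud"]]
    fraud = [r["amount"] for r in train if r["fraud"]]
    return (sum(legit) / len(legit) + sum(fraud) / len(fraud)) / 2


def confusion(predicted, actual):
    """Count true/false positives and negatives for boolean predictions."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        if p and a:
            counts["tp"] += 1
        elif p and not a:
            counts["fp"] += 1
        elif a:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


train, test = split_by_time(RECORDS, cutoff_day=6)
threshold = learn_threshold(train)
counts = confusion(
    [r["amount"] > threshold for r in test],
    [r["fraud"] for r in test],
)
print(threshold, dict(counts))
```

Splitting by time rather than at random keeps the test set strictly in the future relative to training, which is the situation a deployed model faces.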
12
Machine Learning in Practice
13
Fraud Detection
Data: Credit Card Transactions
Algorithm: K-means++
Outcome: A machine learning model reduces false negatives, saving organizations millions of dollars in fraud loss.
[Diagram: Distributed Storage, Modeling, Rules Engine; Offline and Online stages; Cloudera Navigator; layers: Management, Security & Governance, Metadata, Data]
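The fraud detection slide pairs credit card transactions with K-means++. As a rough illustration only, and not the presenters' implementation, the Python sketch below clusters historical transactions using k-means++ seeding followed by Lloyd iterations, then scores a new transaction by its distance to the nearest cluster center. The two features (amount, hour of day), the toy data, and using distance as an anomaly score are assumptions; real features would also need scaling.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: draw each new center with probability proportional
    to its squared distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(dist2(p, c) for c in centers) for p in points]
        target = rng.uniform(0, sum(weights))
        running = 0.0
        for p, w in zip(points, weights):
            running += w
            if running >= target:
                centers.append(p)
                break
        else:  # guard against floating-point round-off
            centers.append(points[-1])
    return centers


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


def anomaly_score(point, centers):
    """Distance to the nearest center; larger means less like past behaviour."""
    return math.sqrt(min(dist2(point, c) for c in centers))


# Hypothetical history of (amount, hour of day) for normal transactions.
HISTORY = [
    (20.0, 12.0), (25.0, 13.0), (18.0, 11.0), (22.0, 14.0),
    (200.0, 20.0), (210.0, 21.0), (190.0, 19.0), (205.0, 22.0),
]

centers = kmeans(HISTORY, k=2, seed=1)
typical = anomaly_score((22.0, 13.0), centers)
suspicious = anomaly_score((5000.0, 3.0), centers)
print(typical, suspicious)
```

In the slide's terms, the clustering would run in the offline modeling stage and a score threshold could feed the online rules engine; that mapping is an interpretation of the diagram.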
14
Product Recommendations
Data: Customer Purchases, Social Data
Algorithm: Alternating Least Squares
Outcome: A product recommendation engine, powered by a machine learning model, increases purchase conversion rates.
[Diagram: Distributed Storage, Modeling, Serve Value; Offline and Online stages; Product #1, Product #2, Product #3; layers: Management, Security & Governance, Metadata, Data]
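The recommendation slide names Alternating Least Squares. The following is a small pure-Python sketch of explicit-feedback ALS with ridge regularization, offered as an illustration under assumptions rather than the approach used in the webinar. User factors and item factors are solved in turn while the other side is held fixed. The customer names, product ids, ratings, rank, and regularization value are all made up.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def update(observed_by_row, fixed, rank, lam):
    """Ridge-regress each row's factor vector against the fixed side's factors."""
    out = {}
    for row, observed in observed_by_row.items():
        A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, rating in observed.items():
            f = fixed[col]
            for i in range(rank):
                b[i] += rating * f[i]
                for j in range(rank):
                    A[i][j] += f[i] * f[j]
        out[row] = solve(A, b)
    return out


def als(ratings, rank=2, lam=0.1, iters=15, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (user, item), rating in ratings.items():
        by_user.setdefault(user, {})[item] = rating
        by_item.setdefault(item, {})[user] = rating
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(by_user, items, rank, lam)
        items = update(by_item, users, rank, lam)
    return users, items


def predict(users, items, user, item):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(users[user], items[item]))


# Hypothetical ratings; bob and dan have not rated p2.
RATINGS = {
    ("ann", "p1"): 5.0, ("ann", "p2"): 5.0, ("ann", "p3"): 1.0,
    ("bob", "p1"): 5.0, ("bob", "p3"): 1.0,
    ("cat", "p1"): 1.0, ("cat", "p2"): 1.0, ("cat", "p3"): 5.0,
    ("dan", "p1"): 1.0, ("dan", "p3"): 5.0,
}

users, items = als(RATINGS)
bob_p2 = predict(users, items, "bob", "p2")
dan_p2 = predict(users, items, "dan", "p2")
print(bob_p2, dan_p2)
```

Because bob's known ratings resemble ann's and dan's resemble cat's, the model should score p2 higher for bob than for dan. Purchase data is often implicit feedback, which would call for a different loss than the explicit ratings assumed here.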
15
Predictive Maintenance
Data: Machine Sensors
Algorithm: Logistic Regression
Outcome: A machine learning model alerts employees to early signs of machine failure, reducing onsite visits.
[Diagram: Sensor Data, Modeling, Custom Application; Offline and Online stages]
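The predictive maintenance slide pairs machine sensor data with logistic regression. Below is a minimal Python sketch, assuming two pre-scaled sensor features (temperature and vibration) and a binary failed/healthy label. It fits the model by batch gradient descent on the log loss and returns a failure probability. The readings, learning rate, epoch count, and the 0.5 alert threshold are illustrative assumptions, not values from the webinar.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Estimated probability that a machine with readings x will fail."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(rows, labels, lr=0.5, epochs=3000):
    """Batch gradient descent on the mean log loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(w, b, x) - y  # derivative of log loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical scaled readings: (temperature, vibration); label 1 = failed soon after.
READINGS = [
    (0.60, 0.10), (0.65, 0.12), (0.58, 0.08), (0.62, 0.15),
    (0.90, 0.60), (0.95, 0.70), (0.88, 0.55), (0.92, 0.65),
]
FAILED = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train(READINGS, FAILED)
healthy_p = predict(w, b, (0.61, 0.11))
failing_p = predict(w, b, (0.93, 0.66))
alert = failing_p > 0.5  # hypothetical alert rule for the custom application
print(healthy_p, failing_p, alert)
```

The learned weights are directly inspectable, which is one reason a linear model can suit the "explain and improve" concern raised earlier in the deck.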
16
Q&A
17
What’s Next?
Contact Us
TJ Laher: tlaher@cloudera.com
Sean Owen: sowen@cloudera.com
@Cloudera
1-866-843-7207
Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until Sept ’14*
Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until Sept ’14*
Register now for Data Analyst, Spark, or Data Science training at http://university.cloudera.com
