© 2010, Pentaho. All Rights Reserved. www.pentaho.com.
Putting Analytics in Big Data Analytics
Jake Cornelius, Dir. of Product Management
Pentaho Corporation
October 12, 2010
© 2010, Pentaho. All Rights Reserved. www.pentaho.com. US and Worldwide: +1 (866) 660-7555
Traditional BI
[Diagram labels: Data Source; Data Mart(s); Tape/Trash; question marks]
Big Data Architecture
[Diagram labels: Data Source; Data Lake(s); Data Mart(s); Data Warehouse; Ad-Hoc]
Pentaho Data Integration
[Diagram labels: Hadoop; Pentaho Data Integration; Data Marts, Data Warehouse, Analytical Applications; Design; Deploy; Orchestrate]
[Diagram labels: Load; Optimize; Visualize; Applications & Systems; Files / HDFS; Hive; DM & DW; Web Tier; RDBMS; Hadoop; Reporting / Dashboards / Analysis]
[Diagram labels: Web Tier; RDBMS; Hadoop; HDFS; Hive; DM; Reporting / Dashboards / Analysis]
Demo
Pentaho for Hadoop Announcements
• Pentaho for Hadoop Download Capability
  • Includes support for development; production support will follow with GA
  • Collaborative effort between Pentaho and the Pentaho Community
  • 60+ beta sites over a three-month beta cycle
  • Pentaho contributed code for API integration with Hive to the open source Apache Software Foundation
• Pentaho and Cloudera Partnership
  • Combines Pentaho's business intelligence and data integration capabilities with Cloudera's Distribution for Hadoop (CDH)
  • Enables business users to take advantage of Hadoop, with the ability to easily and cost-effectively mine, visualize and analyze their Hadoop data
Pentaho for Hadoop Announcements (cont.)
• Pentaho and Impetus Technologies Partnership
  • Incorporates Pentaho Agile BI and Pentaho BI Suite for Hadoop into Impetus Large Data Analytics practice
  • First major SI to adopt Pentaho for Hadoop
  • Facilitates large data analytics projects including expert consulting services, best practices support in Hadoop implementations and nCluster including deployment on private and public clouds
Pentaho for Hadoop Resources & Events
Resources
Download: www.pentaho.com/download/hadoop
Pentaho for Hadoop webpage (resources, press, events, partnerships and more): www.pentaho.com/hadoop
Big Data Analytics: 5-part video series with James Dixon, Pentaho CTO
Events
Hadoop World: NYC - Oct 12, Gold Sponsor, Exhibitor, Richard Daley presenting ‘Putting Analytics in Big Data Analysis’
London Hadoop User Group - Oct 12, London
Agile BI Meets Big Data - Oct 13, New York City
Thank You.
Join the conversation. You can find us on:
Pentaho Facebook Group
@Pentaho
http://blog.pentaho.com
Pentaho - Open Source Business Intelligence Group

Kürzlich hochgeladen (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Pentaho - Jake Cornelius - Hadoop World 2010

  • 1. Putting Analytics in Big Data Analytics. Jake Cornelius, Dir. of Product Management, Pentaho Corporation. October 12, 2010. © 2010, Pentaho. All Rights Reserved. www.pentaho.com.
  • 2. Traditional BI (diagram labels: Data Source, Data Mart(s), Tape/Trash). © 2010, Pentaho. All Rights Reserved. www.pentaho.com. US and Worldwide: +1 (866) 660-7555
  • 3. Big Data Architecture (diagram labels: Data Source, Data Lake(s), Data Mart(s), Data Warehouse, Ad-Hoc)
  • 4. Pentaho Data Integration (diagram labels: Hadoop; Pentaho Data Integration; Data Marts, Data Warehouse, Analytical Applications; Design, Deploy, Orchestrate)
  • 5. Diagram labels: Load, Optimize, Visualize; Files / HDFS; Hive; DM & DW; Applications & Systems; Web Tier; RDBMS; Hadoop; Reporting / Dashboards / Analysis
  • 6. Diagram labels: Web Tier; RDBMS; Hadoop; Reporting / Dashboards / Analysis; HDFS; Hive; DM
  • 7. Demo
  • 8. Pentaho for Hadoop Announcements
      • Pentaho for Hadoop Download Capability
          • Includes support for development; production support will follow with GA
          • Collaborative effort between Pentaho and the Pentaho Community
          • 60+ beta sites over a three-month beta cycle
          • Pentaho contributed code for API integration with Hive to the open source Apache Foundation
      • Pentaho and Cloudera Partnership
          • Combines Pentaho's business intelligence and data integration capabilities with Cloudera's Distribution for Hadoop (CDH)
          • Enables business users to take advantage of Hadoop, with the ability to easily and cost-effectively mine, visualize and analyze their Hadoop data
  • 9. Pentaho for Hadoop Announcements (cont)
      • Pentaho and Impetus Technologies Partnership
          • Incorporates Pentaho Agile BI and Pentaho BI Suite for Hadoop into the Impetus Large Data Analytics practice
          • First major SI to adopt Pentaho for Hadoop
          • Facilitates large data analytics projects, including expert consulting services, best practices support in Hadoop implementations and nCluster, including deployment on private and public clouds
  • 10. Pentaho for Hadoop Resources & Events
      Resources
      • Download: www.pentaho.com/download/hadoop
      • Pentaho for Hadoop webpage (resources, press, events, partnerships and more): www.pentaho.com/hadoop
      • Big Data Analytics: 5 part video series with James Dixon, Pentaho CTO
      Events
      • Hadoop World: NYC - Oct 12, Gold Sponsor, Exhibitor, Richard Daley presenting, ‘Putting Analytics in Big Data Analysis’
      • London Hadoop User Group - Oct 12, London
      • Agile BI Meets Big Data - Oct 13, New York City
  • 11. Thank You. Join the conversation. You can find us on:
      • Pentaho Facebook Group
      • @Pentaho
      • http://blog.pentaho.com
      • Pentaho - Open Source Business Intelligence Group

Editor's notes

  1. In a traditional BI system, where we have not been able to store all of the raw data, we have solved the problem by being selective. First we selected the attributes of the data that we knew we had questions about. Then we cleansed it, aggregated it to transaction level or higher, and packaged it in a form that is easy to consume. Then we put it into an expensive system that we could not scale, whether technically or financially. The rest of the data was thrown away or archived on tape, which for the purposes of analysis is the same as throwing it away. [TRANSITION] The problem is that we don't know what is in the data we are throwing away or archiving. We can only answer the questions that we could predict ahead of time.
  2. When we look at the Big Data architecture we described before, we recall that:
     * We want to store all of the data, so we can answer both known and unknown questions
     * We want to satisfy our standard reporting and analysis requirements
     * We want to satisfy ad-hoc needs by providing the ability to dip into the lake at any time to extract data
     * We want to balance performance and cost as we scale
     We need the ability to take the data in the Data Lake and easily convert it into data suitable for a data mart, data warehouse or ad-hoc data set, without requiring custom Java code.
  3. Fortunately we have an embeddable data integration engine, written in Java. We have taken our data integration engine, PDI, and integrated it with Hadoop in a number of different areas:
     * We have the ability to move files between Hadoop and external locations
     * We have the ability to read and write HDFS files during data transformations
     * We have the ability to execute data transformations within the MapReduce engine
     * We have the ability to extract information from Hadoop and load it into external databases and applications
     * And we have the ability to orchestrate all of this, so you can integrate Hadoop into the rest of your data architecture with scheduling, monitoring, logging, etc.
  4. Put into diagram form, so that we can indicate the different layers in the architecture and also show the scale of the data, we get this Big Data pyramid.
     * At the bottom of the pyramid we have Hadoop, containing our complete set of data.
     * Higher up we have our data mart layer. This layer has less data in it, but has better performance.
     * At the top we have application-level data caches.
     * Looking down from the top, from the perspective of our users, they can see the whole pyramid: they have access to the whole structure. The only thing that varies is the query time, depending on what data they want.
     * Here we see that the RDBMS layer lets us optimize access to the data. We can decide how much data we want to stage in this layer. If we add more storage in this layer, we can increase performance for a larger subset of the data lake, but it costs more money.
  5. In this demo we will show how easy it is to execute a series of Hadoop and non-Hadoop tasks. We are going to:
     TRANSITION 1: Get a weblog file from an FTP server
     TRANSITION 2: Make sure the source file does not exist within the Hadoop file system
     TRANSITION 3: Copy the weblog file into Hadoop
     TRANSITION 4: Read the weblog and process it: add metadata about the URLs, add geocoding, and enrich the operating system and browser attributes
     TRANSITION 5: Write the results of the data transformation to a new, improved data file
     TRANSITION 6: Load the data into Hive
     TRANSITION 7: Read an aggregated data set from Hadoop
     TRANSITION 8: And write it into a database
     TRANSITION 9: Slice and dice the data with the database
     TRANSITION 10: And execute an ad-hoc query into Hadoop
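
The pyramid in note 4 can be read as a tiered lookup: try the smallest, fastest layer first and fall through to Hadoop only when the data is not staged higher up. The following is a minimal Python sketch, assuming three in-memory dictionaries stand in for the application cache, the data mart and the data lake. The `Tier` class, the `query` function, the keys and the latency figures are all illustrative assumptions, not measurements or code from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One layer of the pyramid; names and latencies are hypothetical."""
    name: str
    latency_s: float  # assumed typical query time for this layer
    data: dict = field(default_factory=dict)


def query(tiers, key):
    """Walk the tiers top-down and return (value, tier name, total latency).

    Every tier that is consulted adds its latency, so data staged higher
    in the pyramid is cheaper to reach than data that lives only in the lake.
    """
    elapsed = 0.0
    for tier in tiers:
        elapsed += tier.latency_s
        if key in tier.data:
            return tier.data[key], tier.name, elapsed
    raise KeyError(key)


# Top of the pyramid first: least data, fastest access.
pyramid = [
    Tier("application cache", 0.01, {"hits_today": 1200}),
    Tier("data mart (RDBMS)", 1.0, {"hits_today": 1200,
                                    "hits_by_country": {"US": 700}}),
    Tier("data lake (Hadoop)", 60.0, {"hits_today": 1200,
                                      "hits_by_country": {"US": 700},
                                      "raw_weblog_lines": 5_000_000}),
]

for key in ("hits_today", "hits_by_country", "raw_weblog_lines"):
    value, tier_name, seconds = query(pyramid, key)
    print(f"{key}: answered by {tier_name} in about {seconds:.2f}s")
```

In this toy model, staging more keys in the RDBMS tier moves them from the slow path to the fast one, which is the cost versus performance dial the note describes.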
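
The enrichment step of the demo in note 5 (read the weblog and enrich the operating system and browser attributes) can be illustrated outside of any particular tool. The sketch below is plain Python, not Pentaho Data Integration. It assumes the weblog uses the Apache combined log format, which the talk does not state, and the `enrich` helper with its crude browser and OS rules is a hypothetical simplification. Geocoding is left out because it needs an external lookup.

```python
import re
from collections import Counter

# Apache "combined" log format; assumed here, the talk does not name the format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def enrich(line):
    """Parse one weblog line and add coarse browser and OS attributes.

    Returns None for lines that do not match the assumed format.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    agent = record["agent"]
    # Deliberately crude user-agent rules, for illustration only.
    if "Firefox" in agent:
        record["browser"] = "Firefox"
    elif "MSIE" in agent:
        record["browser"] = "Internet Explorer"
    else:
        record["browser"] = "Other"
    if "Windows" in agent:
        record["os"] = "Windows"
    elif "Mac OS X" in agent:
        record["os"] = "Mac OS X"
    elif "Linux" in agent:
        record["os"] = "Linux"
    else:
        record["os"] = "Other"
    return record


# Made-up sample lines using documentation-range IP addresses.
sample_lines = [
    '192.0.2.1 - - [12/Oct/2010:09:15:02 -0400] "GET /download/hadoop HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) '
    'Gecko/20100914 Firefox/3.6.10"',
    '192.0.2.7 - - [12/Oct/2010:09:15:09 -0400] "GET /hadoop HTTP/1.1" '
    '200 2048 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)"',
    'not a log line',
]

# Enrich every parseable line, then aggregate hits per browser.
records = [r for r in map(enrich, sample_lines) if r is not None]
hits_by_browser = Counter(r["browser"] for r in records)
print(hits_by_browser)
```

The final `Counter` plays the role of the aggregated data set that the later demo steps read from Hadoop and write into a database.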
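
The orchestration mentioned in note 3, and exercised by the ten demo steps in note 5, amounts to running named steps in order, logging each one and stopping at the first failure. Below is a rough Python sketch using the standard `logging` module. The `run_job` function, the `fail` helper and the step names are hypothetical stand-ins and do not reflect Pentaho Data Integration's job format or API.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("job")


def run_job(steps):
    """Run (name, callable) steps in order; stop at the first failure.

    Returns the list of step names that completed successfully.
    """
    completed = []
    for name, action in steps:
        log.info("starting step: %s", name)
        try:
            action()
        except Exception:
            # Log the traceback and abandon the remaining steps.
            log.exception("step failed: %s", name)
            break
        completed.append(name)
    return completed


def fail():
    """Simulate a step that breaks, to show the fail-fast behaviour."""
    raise RuntimeError("simulated failure")


# Placeholder actions; a real job would call out to FTP, HDFS, Hive, etc.
demo_steps = [
    ("get weblog from FTP", lambda: None),
    ("copy weblog into Hadoop", lambda: None),
    ("load results into Hive", fail),
    ("write aggregate to database", lambda: None),
]

print(run_job(demo_steps))
```

Stopping at the first failed step keeps later steps, such as the database load, from running against incomplete data.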