SlideShare ist ein Scribd-Unternehmen logo
1 von 22
F**** around with Big Data and Predictive Analytics
Featuring Kettle, Weka & Hadoop.
© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Pentahuh?
2© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
What’s Pentaho exactly?
CENTRAL ADMINISTRATION, AUDITING & MONITORING
DELIVER
When & Where
Users Need It
STREAMLINE
Information Delivery
VISUALIZE
& Report Information
In Any Style
ACCESS
All Enterprise
Data Sources
ISV & Packaged
Applications
SaaS / Cloud
Applications
EMBEDDED
Web
Mobile
Print
E-Mail
STANDALONE
‣ Advanced &
Predictive Analytics
DATA MINING
‣ Interactive
‣ Operational
‣ Enterprise
REPORTING
‣ Ad hoc Exploration
‣ Multi-Dimensional
ANALYSIS
‣ Interactive Metrics
‣ Rich Visualizations
DASHBOARDS
ERP / CRM /
Enterprise Apps
(e.g. SAP, Oracle)
Hadoop &
NoSQL Data
Unstructured &
semi-structured
(XML, Excel, Files, etc.)
Relational
Data Sources
Cloud
(e.g. Salesforce,
Amazon, Dell)
‣ Data Integration
‣ Graphical
ETL Designer
INTEGRATE,
CLEANSE,
& ENRICH DATA
‣ In Memory
Caching
‣ High
Performance
ANALYTICS
ACCELERATOR
‣ Direct Access
‣ Hadoop
Clustering/
Scheduling
‣ Instant OLAP
Cubes
‣ Enterprise
Scalability
We do open source analytics.
4© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Why does Pentaho claim to have
anything to do with Big Data??
5© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Project Kettle
powerful Extraction, Transformation and Loading (ETL) capabilities
using an innovative, metadata-driven approach
6© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Bring the code to the data
7© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
JDBC
Bring the code to the data
8© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
JDBCKettle
KettleKettle
Bring the code to the data
9© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Kettle
Project Weka
a comprehensive set of tools for machine learning and data mining
10© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
11© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
12© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Bring Weka to the data
13© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Kettle
Kettle
JDBCKettle
Kettle
Bring Weka to the data
14© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
JDBC Services for Kettle
runtime optimization and SQL pushdown
15© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
A smart(er) JDBC Layer
16© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Kettle
Kettle
Kettle
Kettle JDBC
SELECT
CUSTOMER_ID,
SUM(UNIT_SALES)
FROM
SALES_FACT
WHERE
AGE_GROUP_ID > 3
GROUP BY
CUSTOMER_ID;
SELECT
CUSTOMER_ID
FROM
SALES_FACT;
SELECT
CUSTOMER_ID,
SUM(UNIT_SALES)
FROM
SALES_FACT
WHERE
AGE_GROUP_ID > 3
GROUP BY
CUSTOMER_ID;
A smart(er) JDBC Layer
17© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Kettle
Kettle
Kettle
Kettle Kettle JDBC
Kettle
Kettle
The gains
18© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
• Job design and
administration becomes
trivial.
• Runs the rich Kettle plugin
environment directly on the
nodes.
• Performs much better than
Hive.
• The JDBC layer is pretty
neat.
The caveats
19© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
• True parallel machine
learning algorithms are rare
and hard to design.
• Not an actual
production-ready design.
• Clients might have
caches, which must be
notified by the BD store for
updates.
© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
20
Demo!
© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
21
Thank you!
Join the conversation. You can find us on:
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics
© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
22
Want to learn more?
Learning Linear Models in Hadoop with Weka
http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html
Introduction to MapReduce with Pentaho Data Integration
http://www.youtube.com/watch?v=KZe1UugxXcs
`

Weitere ähnliche Inhalte

Was ist angesagt?

Pentaho Analytics at Tampa Analytics September Meetup
Pentaho Analytics at Tampa Analytics September MeetupPentaho Analytics at Tampa Analytics September Meetup
Pentaho Analytics at Tampa Analytics September MeetupMark Kromer
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoMark Kromer
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho EvaluationPentaho
 
Pentaho roadmap 061314
Pentaho roadmap 061314Pentaho roadmap 061314
Pentaho roadmap 061314Stratebi
 
Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy Pentaho
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Pentaho
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Denodo
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?Xpand IT
 
Oracle Enterprise Metadata Management
Oracle Enterprise Metadata ManagementOracle Enterprise Metadata Management
Oracle Enterprise Metadata ManagementAndrey Akulov
 
Big Data for Product Managers
Big Data for Product ManagersBig Data for Product Managers
Big Data for Product ManagersPentaho
 
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive ModelMoving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive ModelDataWorks Summit
 
Breakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with HadoopBreakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with HadoopCloudera, Inc.
 
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
All data accessible to all my organization - Presentation at OW2con'19, June...
 All data accessible to all my organization - Presentation at OW2con'19, June... All data accessible to all my organization - Presentation at OW2con'19, June...
All data accessible to all my organization - Presentation at OW2con'19, June...OW2
 
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...DataStax Academy
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionDataWorks Summit
 

Was ist angesagt? (20)

Pentaho Analytics at Tampa Analytics September Meetup
Pentaho Analytics at Tampa Analytics September MeetupPentaho Analytics at Tampa Analytics September Meetup
Pentaho Analytics at Tampa Analytics September Meetup
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation
 
Pentaho roadmap 061314
Pentaho roadmap 061314Pentaho roadmap 061314
Pentaho roadmap 061314
 
Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
The Ecosystem is too damn big
The Ecosystem is too damn big The Ecosystem is too damn big
The Ecosystem is too damn big
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 
Oracle Enterprise Metadata Management
Oracle Enterprise Metadata ManagementOracle Enterprise Metadata Management
Oracle Enterprise Metadata Management
 
Big Data for Product Managers
Big Data for Product ManagersBig Data for Product Managers
Big Data for Product Managers
 
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive ModelMoving Health Care Analytics to Hadoop to Build a Better Predictive Model
Moving Health Care Analytics to Hadoop to Build a Better Predictive Model
 
Breakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with HadoopBreakout: Operational Analytics with Hadoop
Breakout: Operational Analytics with Hadoop
 
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
All data accessible to all my organization - Presentation at OW2con'19, June...
 All data accessible to all my organization - Presentation at OW2con'19, June... All data accessible to all my organization - Presentation at OW2con'19, June...
All data accessible to all my organization - Presentation at OW2con'19, June...
 
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
 
Big Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - PentahoBig Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - Pentaho
 

Andere mochten auch

Machine Learning with Hadoop
Machine Learning with HadoopMachine Learning with Hadoop
Machine Learning with HadoopSangchul Song
 
Optimizing multilingual search in SOLR
Optimizing multilingual search in SOLROptimizing multilingual search in SOLR
Optimizing multilingual search in SOLRBasis Technology
 
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierOSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierBasis Technology
 
Rosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchRosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchBasis Technology
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
 
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Basis Technology
 
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham MoreheadSimple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham MoreheadBasis Technology
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?Jos van Dongen
 
How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010Robin.Russell
 
The Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text AnalyticsThe Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text AnalyticsAlyona Medelyan
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoAshnikbiz
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Roland Bouman
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Uday Kothari
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BIEdureka!
 
Final project report`````
Final project report`````Final project report`````
Final project report`````Arslan Ahmad
 
Smart SMBs: fine-tuning the engines of growth
Smart SMBs: fine-tuning the engines of growth Smart SMBs: fine-tuning the engines of growth
Smart SMBs: fine-tuning the engines of growth Steve Bray
 

Andere mochten auch (20)

Machine Learning with Hadoop
Machine Learning with HadoopMachine Learning with Hadoop
Machine Learning with Hadoop
 
Optimizing multilingual search in SOLR
Optimizing multilingual search in SOLROptimizing multilingual search in SOLR
Optimizing multilingual search in SOLR
 
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierOSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
 
Rosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchRosette Search Essentials for Elasticsearch
Rosette Search Essentials for Elasticsearch
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
 
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
 
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham MoreheadSimple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
 
Final Doc_1.1
Final Doc_1.1Final Doc_1.1
Final Doc_1.1
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010
 
The Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text AnalyticsThe Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text Analytics
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BI
 
Final project report`````
Final project report`````Final project report`````
Final project report`````
 
Dubai Travel Guide
Dubai Travel GuideDubai Travel Guide
Dubai Travel Guide
 
Smart SMBs: fine-tuning the engines of growth
Smart SMBs: fine-tuning the engines of growth Smart SMBs: fine-tuning the engines of growth
Smart SMBs: fine-tuning the engines of growth
 
World com
World comWorld com
World com
 

Ähnlich wie Slides pentaho-hadoop-weka

MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB
 
SQL Server 2016 Everything built-in FULL deck
SQL Server 2016 Everything built-in FULL deckSQL Server 2016 Everything built-in FULL deck
SQL Server 2016 Everything built-in FULL deckHamid J. Fard
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessDataWorks Summit
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetupWei Ting Chen
 
엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...
엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...
엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...Amazon Web Services Korea
 
Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.
Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.
Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.Milomir Vojvodic
 
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Jeffrey T. Pollock
 
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBICC Thomas More
 
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...jdijcks
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaMarketingArrowECS_CZ
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Sergio Zenatti Filho
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Kent Graziano
 
Sap bw4 hana architecture archetypes
Sap bw4 hana architecture archetypesSap bw4 hana architecture archetypes
Sap bw4 hana architecture archetypesLuc Vanrobays
 
DataStax: Making a Difference with Smart Analytics
DataStax: Making a Difference with Smart AnalyticsDataStax: Making a Difference with Smart Analytics
DataStax: Making a Difference with Smart AnalyticsDataStax Academy
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudKent Graziano
 
Building a marketing data lake
Building a marketing data lakeBuilding a marketing data lake
Building a marketing data lakeSumit Sarkar
 

Ähnlich wie Slides pentaho-hadoop-weka (20)

Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
 
SQL Server 2016 Everything built-in FULL deck
SQL Server 2016 Everything built-in FULL deckSQL Server 2016 Everything built-in FULL deck
SQL Server 2016 Everything built-in FULL deck
 
Big data for product managers
Big data for product managersBig data for product managers
Big data for product managers
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine Business
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
 
엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...
엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...
엔터프라이즈의 AI/ML 활용을 돕는 Paxata 지능형 데이터 전처리 플랫폼 (최문규 이사, PAXATA) :: AWS Techforum...
 
Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.
Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.
Milomir Vojvodic - Business Analytics And Big Data Partner Forum Dubai 15.11.
 
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
 
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
 
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory
 
Extending Hortonworks with Oracle's Big Data Platform
Extending Hortonworks with Oracle's Big Data PlatformExtending Hortonworks with Oracle's Big Data Platform
Extending Hortonworks with Oracle's Big Data Platform
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
 
Sap bw4 hana architecture archetypes
Sap bw4 hana architecture archetypesSap bw4 hana architecture archetypes
Sap bw4 hana architecture archetypes
 
DataStax: Making a Difference with Smart Analytics
DataStax: Making a Difference with Smart AnalyticsDataStax: Making a Difference with Smart Analytics
DataStax: Making a Difference with Smart Analytics
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Building a marketing data lake
Building a marketing data lakeBuilding a marketing data lake
Building a marketing data lake
 

Kürzlich hochgeladen

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Kürzlich hochgeladen (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Slides pentaho-hadoop-weka

  • 1. F**** around with Big Data and Predictive Analytics Featuring Kettle, Weka & Hadoop. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. Pentahuh? 2© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 3. What’s Pentaho exactly? CENTRAL ADMINISTRATION, AUDITING & MONITORING DELIVER When & Where Users Need It STREAMLINE Information Delivery VISUALIZE & Report Information In Any Style ACCESS All Enterprise Data Sources ISV & Packaged Applications SaaS / Cloud Applications EMBEDDED Web Mobile Print E-Mail STANDALONE ‣ Advanced & Predictive Analytics DATA MINING ‣ Interactive ‣ Operational ‣ Enterprise REPORTING ‣ Ad hoc Exploration ‣ Multi-Dimensional ANALYSIS ‣ Interactive Metrics ‣ Rich Visualizations DASHBOARDS ERP / CRM / Enterprise Apps (e.g. SAP, Oracle) Hadoop & NoSQL Data Unstructured & semi-structured (XML, Excel, Files, etc.) Relational Data Sources Cloud (e.g. Salesforce, Amazon, Dell) ‣ Data Integration ‣ Graphical ETL Designer INTEGRATE, CLEANSE, & ENRICH DATA ‣ In Memory Caching ‣ High Performance ANALYTICS ACCELERATOR ‣ Direct Access ‣ Hadoop Clustering/ Scheduling ‣ Instant OLAP Cubes ‣ Enterprise Scalability
  • 4. We do open source analytics. 4© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 5. Why does Pentaho claim to have anything to do with Big Data?? 5© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 6. Project Kettle powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach 6© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 7. Bring the code to the data 7© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 JDBC
  • 8. Bring the code to the data 8© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 JDBCKettle
  • 9. KettleKettle Bring the code to the data 9© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Kettle
  • 10. Project Weka a comprehensive set of tools for machine learning and data mining 10© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 11. 11© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 12. 12© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 13. Bring Weka to the data 13© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Kettle Kettle JDBCKettle Kettle
  • 14. Bring Weka to the data 14© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 15. JDBC Services for Kettle runtime optimization and SQL pushdown 15© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 16. A smart(er) JDBC Layer 16© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Kettle Kettle Kettle Kettle JDBC SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID;
  • 17. SELECT CUSTOMER_ID FROM SALES_FACT; SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID; A smart(er) JDBC Layer 17© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Kettle Kettle Kettle Kettle Kettle JDBC Kettle Kettle
  • 18. The gains 18© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 • Job design and administration becomes trivial. • Runs the rich Kettle plugin environment directly on the nodes. • Performs much better than Hive. • The JDBC layer is pretty neat.
  • 19. The caveats 19© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 • True parallel machine learning algorithms are rare and hard to design. • Not an actual production-ready design. • Clients might have caches, which must be notified by the BD store for updates.
  • 20. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 20 Demo!
  • 21. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 21 Thank you! Join the conversation. You can find us on: blog.pentaho.com @Pentaho Facebook.com/Pentaho Pentaho Business Analytics
  • 22. © 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 22 Want to learn more? Learning Linear Models in Hadoop with Weka http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html Introduction to MapReduce with Pentaho Data Integration http://www.youtube.com/watch?v=KZe1UugxXcs `