SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Downloaden Sie, um offline zu lesen
Reducing the Implementation Efforts of
Hadoop, NoSQL & Analytical Databases
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Goals for Today
2
To understand:
•  Landscape of Big Data
Databases Today
•  Current Approaches to
Implementation
•  How Pentaho
Reduces Time, Effort,
and Complexity
The Big Data Landscape
3
- Most businesses have
1 or more of these
databases
- Often used together
to form a complete
solution
- Typically created /
sold / supported by
different vendor
- Different programming
interfaces for
development and
administration
NoSQL
Hadoop Analytical
Transactional /DW
The Big Data Landscape
4
- Most businesses
have 1 or more of
these databases
- Often used together
to form a complete
solution
- Typically created /
sold / supported by
different vendor
- Different
programming
interfaces for
development and
administration
NoSQL
Hadoop
Analytical
Transactional /DW
Transactional Databases
•  Your everyday, garden-variety database
•  Used to store data for line of business applications (ERP, CRM, HR)
•  Serves as the “system of record” for business facts, e.g.
–  Inventory levels
–  Customer account status
–  Payroll data
•  Offers rich sources of analytical information (potential facts and
dimensions)
•  Fast and reliable for INSERT|UPDATE|DELETE operations
5
Hadoop, HDFS and Hive
•  Hadoop is a set of related open source Apache projects
•  MapReduce is a programming framework
•  HDFS is the Hadoop Distributed File System
–  Used to store structured (tabular) and unstructured files
•  Structured files in HDFS can be queried via Hive
–  Hive provides a basic SQL interface on top of HDFS files
–  Hive generates MapReduce programs to fulfill the SQL requests
•  Ideal for:
–  Unstructured data that doesn’t fit in a Transactional or NoSQL database
–  Email, twitter posts, images, voice logs, etc.
6
NoSQL Databases
•  Non-relational databases
–  No (or very limited) support for the SQL language
•  Developers use programs, not queries to access data
–  Flexible definition of structure (“extend as you go”)
•  Limited metadata
–  No referential integrity / limited support for transactions
•  But:
–  They can ingest (capture and persist) very large amounts of data with
very low overhead, across multiple servers
•  1000’s (to 100’s of thousands) of inserts per second
–  Excellent for large numbers of reads based on key values
•  Millions of reads per second
7
Analytical Databases
•  Databases designed for Business Intelligence
–  Offer full support for SQL and relational metadata
–  Use specialized techniques to improve query performance
•  Compression
•  Column-based storage (e.g. “M” or “F”)
•  Clusters of independent servers
•  Advanced mathematics for index & query optimization
–  Support high-speed “bulk” inserts of structured data
–  May be offered as standalone software or via an appliance
•  Ideal for:
–  Answering summary queries requiring complex filters, joins & groupings
–  Online Analytical Processing (OLAP) clients
8
Current Implementation Approaches
Complex and Time Consuming
9
Load/
Ingest
Manipulate/
Process Access Model
Visualize/
Analyze Predict Decisions
Ingest data
into big
data store
Make data
ready for
analysis
• Transform
• Aggregate
• Strucure-ize
Direct to
big data
store OR
Move slice
to DM/DW
Build
metadata
model:
•  Dimensions
•  Measures
•  Hierarchies
Report
Dashboard
Analyze
Pivot
Drill
Act, then
iterate
based on
insights
STRUCTURED DATA
CRM, POS, ERP,
etc.
UNSTRUCTURED DATA
Forecasting
Clustering
Classification
Association
Regression
Coding Coding Coding Coding CodingCoding
Current
Developer
Approach:
ETL Tool BI Tool Model
Builder
BI Tool Data
Mining Tool
Current
IT
Approach:
Coding Coding
A Smarter Approach
10
Load/
Ingest
Manipulate/
Process Access Model
Visualize/
Analyze Predict Decisions
Ingest data
into big
data store
Make data
ready for
analysis
• Transform
• Aggregate
• Strucure-ize
Direct to
big data
store OR
Move slice
to DM/DW
Build
metadata
model:
•  Dimensions
•  Measures
•  Hierarchies
Report
Dashboard
Analyze
Pivot
Drill
Act, then
iterate
based on
insights
STRUCTURED DATA
CRM, POS, ERP,
etc.
UNSTRUCTURED DATA
Forecasting
Clustering
Classification
Association
Regression
Coding Coding Coding Coding CodingCoding
Current
Developer
Approach:
ETL Tool BI Tool Model
Builder
BI Tool Data
Mining Tool
Current
IT
Approach:
Coding Coding
Business Analytics for Big Data
Complete platform from data to analytics
–  Continuity of one platform
–  Eliminate disjointed steps
–  Days instead of weeks
High productivity visual development
–  Use existing skills for Hadoop
–  Visual development for MapReduce jobs
–  Visual development for Sqoop, Oozie
Fast performance within Hadoop
–  Parallel execution of MapReduce
jobs across the Hadoop cluster
Visual Data
Preparation
Coding VS. Pentaho MapReduce
11
Pentaho Visual Development for Big Data
15x Faster to Design & Deploy Big Data Analytics
Sample Scenario
•  A telecommunication company needs to:
–  Ingest, store and retrieve call detail records (CDRs)
from multiple digital switches
–  Maintain a detailed profile of its commercial
customers
–  Analyze the volume of calls over time by SIC code,
region, state, county and city
•  Which database technology would it use for each of
these applications?
12
Pentaho Data Integration for Big Data
13
•  Transform the data for reporting without coding
•  Execute in the Hadoop Cluster without java coding
•  Visually orchestrate job workflows
•  Visualize and analyze the data
Leverage Native Capabilities of Big Data Vendors
Complete Visual Data Management
and Big Data Analytics
Hadoop NoSQL Analytic Databases
Data Discovery,
Visualization
Enterprise & Ad
Hoc Reporting
Data Ingestion,
Manipulation,
Integration
Predictive
Analytics
14
Big Data Fabric

Weitere ähnliche Inhalte

Was ist angesagt?

Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho
 
Embedded Analytics in Customer Success
Embedded Analytics in Customer SuccessEmbedded Analytics in Customer Success
Embedded Analytics in Customer SuccessPentaho
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015 Pentaho
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Pentaho
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Pentaho
 
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...MongoDB
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaCloudera, Inc.
 
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpen Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpenAnalytics Spain
 
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...BICC Thomas More
 
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, PentahoMongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, PentahoMongoDB
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseDataWorks Summit
 
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...BICC Thomas More
 
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBICC Thomas More
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data DiscoveryHarald Erb
 

Was ist angesagt? (20)

Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
 
Embedded Analytics in Customer Success
Embedded Analytics in Customer SuccessEmbedded Analytics in Customer Success
Embedded Analytics in Customer Success
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
 
Big Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - PentahoBig Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - Pentaho
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
 
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpen Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
 
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
 
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, PentahoMongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
 
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
 
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
 

Andere mochten auch

Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Pentaho
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis Omid Vahdaty
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Roland Bouman
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolAlex Rayón Jerez
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Chris Fregly
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introductionmattcasters
 
Data Mashups for Analytics
Data Mashups for AnalyticsData Mashups for Analytics
Data Mashups for AnalyticsPentaho
 

Andere mochten auch (8)

Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration tool
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
Data Mashups for Analytics
Data Mashups for AnalyticsData Mashups for Analytics
Data Mashups for Analytics
 
Introduction To Pentaho
Introduction To PentahoIntroduction To Pentaho
Introduction To Pentaho
 

Ähnlich wie Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Enterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEnterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEdenH6
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesQubole
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseRizaldy Ignacio
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overviewRohit Jain
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence ArchitecturePhilippe Julio
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudJames Serra
 

Ähnlich wie Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases (20)

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Enterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEnterprise Data Warehousing Positioning
Enterprise Data Warehousing Positioning
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
4AA6-4492ENW
4AA6-4492ENW4AA6-4492ENW
4AA6-4492ENW
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence Architecture
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 

Mehr von Pentaho

Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview PresentationFilling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview PresentationPentaho
 
The Next Big Thing in Big Data
The Next Big Thing in Big DataThe Next Big Thing in Big Data
The Next Big Thing in Big DataPentaho
 
Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Pentaho
 
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Pentaho
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User GroupPentaho
 

Mehr von Pentaho (7)

Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview PresentationFilling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
 
The Next Big Thing in Big Data
The Next Big Thing in Big DataThe Next Big Thing in Big Data
The Next Big Thing in Big Data
 
Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics
 
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare Solutions
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User Group
 

Kürzlich hochgeladen

PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 

Kürzlich hochgeladen (20)

PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 

Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

  • 1. Reducing the Implementation Efforts of Hadoop, NoSQL & Analytical Databases © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. Goals for Today 2 To understand: •  Landscape of Big Data Databases Today •  Current Approaches to Implementation •  How Pentaho Reduces Time, Effort, and Complexity
  • 3. The Big Data Landscape 3 - Most businesses have 1 or more of these databases - Often used together to form a complete solution - Typically created / sold / supported by different vendor - Different programming interfaces for development and administration NoSQL Hadoop Analytical Transactional /DW
  • 4. The Big Data Landscape 4 - Most businesses have 1 or more of these databases - Often used together to form a complete solution - Typically created / sold / supported by different vendor - Different programming interfaces for development and administration NoSQL Hadoop Analytical Transactional /DW
  • 5. Transactional Databases •  Your everyday, garden-variety database •  Used to store data for line of business applications (ERP, CRM, HR) •  Serves as the “system of record” for business facts, e.g. –  Inventory levels –  Customer account status –  Payroll data •  Offers rich sources of analytical information (potential facts and dimensions) •  Fast and reliable for INSERT|UPDATE|DELETE operations 5
  • 6. Hadoop, HDFS and Hive •  Hadoop is a set of related open source Apache projects •  MapReduce is a programming framework •  HDFS is the Hadoop Distributed File System –  Used to store structured (tabular) and unstructured files •  Structured files in HDFS can be queried via Hive –  Hive provides a basic SQL interface on top of HDFS files –  Hive generates MapReduce programs to fulfill the SQL requests •  Ideal for: –  Unstructured data that doesn’t fit in a Transactional or NoSQL database –  Email, twitter posts, images, voice logs, etc. 6
  • 7. NoSQL Databases •  Non-relational databases –  No (or very limited) support for the SQL language •  Developers use programs, not queries to access data –  Flexible definition of structure (“extend as you go”) •  Limited metadata –  No referential integrity / limited support for transactions •  But: –  They can ingest (capture and persist) very large amounts of data with very low overhead, across multiple servers •  1000’s (to 100’s of thousands) of inserts per second –  Excellent for large numbers of reads based on key values •  Millions of reads per second 7
  • 8. Analytical Databases •  Databases designed for Business Intelligence –  Offer full support for SQL and relational metadata –  Use specialized techniques to improve query performance •  Compression •  Column-based storage (e.g. “M” or “F”) •  Clusters of independent servers •  Advanced mathematics for index & query optimization –  Support high-speed “bulk” inserts of structured data –  May be offered as standalone software or via an appliance •  Ideal for: –  Answering summary queries requiring complex filters, joins & groupings –  Online Analytical Processing (OLAP) clients 8
  • 9. Current Implementation Approaches Complex and Time Consuming 9 Load/ Ingest Manipulate/ Process Access Model Visualize/ Analyze Predict Decisions Ingest data into big data store Make data ready for analysis • Transform • Aggregate • Strucure-ize Direct to big data store OR Move slice to DM/DW Build metadata model: •  Dimensions •  Measures •  Hierarchies Report Dashboard Analyze Pivot Drill Act, then iterate based on insights STRUCTURED DATA CRM, POS, ERP, etc. UNSTRUCTURED DATA Forecasting Clustering Classification Association Regression Coding Coding Coding Coding CodingCoding Current Developer Approach: ETL Tool BI Tool Model Builder BI Tool Data Mining Tool Current IT Approach: Coding Coding
  • 10. A Smarter Approach 10 Load/ Ingest Manipulate/ Process Access Model Visualize/ Analyze Predict Decisions Ingest data into big data store Make data ready for analysis • Transform • Aggregate • Strucure-ize Direct to big data store OR Move slice to DM/DW Build metadata model: •  Dimensions •  Measures •  Hierarchies Report Dashboard Analyze Pivot Drill Act, then iterate based on insights STRUCTURED DATA CRM, POS, ERP, etc. UNSTRUCTURED DATA Forecasting Clustering Classification Association Regression Coding Coding Coding Coding CodingCoding Current Developer Approach: ETL Tool BI Tool Model Builder BI Tool Data Mining Tool Current IT Approach: Coding Coding Business Analytics for Big Data
  • 11. Complete platform from data to analytics –  Continuity of one platform –  Eliminate disjointed steps –  Days instead of weeks High productivity visual development –  Use existing skills for Hadoop –  Visual development for MapReduce jobs –  Visual development for Sqoop, Oozie Fast performance within Hadoop –  Parallel execution of MapReduce jobs across the Hadoop cluster Visual Data Preparation Coding VS. Pentaho MapReduce 11 Pentaho Visual Development for Big Data 15x Faster to Design & Deploy Big Data Analytics
  • 12. Sample Scenario •  A telecommunication company needs to: –  Ingest, store and retrieve call detail records (CDRs) from multiple digital switches –  Maintain a detailed profile of its commercial customers –  Analyze the volume of calls over time by SIC code, region, state, county and city •  Which database technology would it use for each of these applications? 12
  • 13. Pentaho Data Integration for Big Data 13 •  Transform the data for reporting without coding •  Execute in the Hadoop Cluster without java coding •  Visually orchestrate job workflows •  Visualize and analyze the data Leverage Native Capabilities of Big Data Vendors
  • 14. Complete Visual Data Management and Big Data Analytics Hadoop NoSQL Analytic Databases Data Discovery, Visualization Enterprise & Ad Hoc Reporting Data Ingestion, Manipulation, Integration Predictive Analytics 14 Big Data Fabric