SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Downloaden Sie, um offline zu lesen
Reducing the Implementation Efforts of
Hadoop, NoSQL & Analytical Databases
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Goals for Today
2
To understand:
•  Landscape of Big Data
Databases Today
•  Current Approaches to
Implementation
•  How Pentaho
Reduces Time, Effort,
and Complexity
The Big Data Landscape
3
- Most businesses have
1 or more of these
databases
- Often used together
to form a complete
solution
- Typically created /
sold / supported by
different vendor
- Different programming
interfaces for
development and
administration
NoSQL
Hadoop Analytical
Transactional /DW
The Big Data Landscape
4
- Most businesses
have 1 or more of
these databases
- Often used together
to form a complete
solution
- Typically created /
sold / supported by
different vendor
- Different
programming
interfaces for
development and
administration
NoSQL
Hadoop
Analytical
Transactional /DW
Transactional Databases
•  Your everyday, garden-variety database
•  Used to store data for line of business applications (ERP, CRM, HR)
•  Serves as the “system of record” for business facts, e.g.
–  Inventory levels
–  Customer account status
–  Payroll data
•  Offers rich sources of analytical information (potential facts and
dimensions)
•  Fast and reliable for INSERT|UPDATE|DELETE operations
5
Hadoop, HDFS and Hive
•  Hadoop is a set of related open source Apache projects
•  MapReduce is a programming framework
•  HDFS is the Hadoop Distributed File System
–  Used to store structured (tabular) and unstructured files
•  Structured files in HDFS can be queried via Hive
–  Hive provides a basic SQL interface on top of HDFS files
–  Hive generates MapReduce programs to fulfill the SQL requests
•  Ideal for:
–  Unstructured data that doesn’t fit in a Transactional or NoSQL database
–  Email, twitter posts, images, voice logs, etc.
6
NoSQL Databases
•  Non-relational databases
–  No (or very limited) support for the SQL language
•  Developers use programs, not queries to access data
–  Flexible definition of structure (“extend as you go”)
•  Limited metadata
–  No referential integrity / limited support for transactions
•  But:
–  They can ingest (capture and persist) very large amounts of data with
very low overhead, across multiple servers
•  1000’s (to 100’s of thousands) of inserts per second
–  Excellent for large numbers of reads based on key values
•  Millions of reads per second
7
Analytical Databases
•  Databases designed for Business Intelligence
–  Offer full support for SQL and relational metadata
–  Use specialized techniques to improve query performance
•  Compression
•  Column-based storage (e.g. “M” or “F”)
•  Clusters of independent servers
•  Advanced mathematics for index & query optimization
–  Support high-speed “bulk” inserts of structured data
–  May be offered as standalone software or via an appliance
•  Ideal for:
–  Answering summary queries requiring complex filters, joins & groupings
–  Online Analytical Processing (OLAP) clients
8
Current Implementation Approaches
Complex and Time Consuming
9
Load/
Ingest
Manipulate/
Process Access Model
Visualize/
Analyze Predict Decisions
Ingest data
into big
data store
Make data
ready for
analysis
• Transform
• Aggregate
• Strucure-ize
Direct to
big data
store OR
Move slice
to DM/DW
Build
metadata
model:
•  Dimensions
•  Measures
•  Hierarchies
Report
Dashboard
Analyze
Pivot
Drill
Act, then
iterate
based on
insights
STRUCTURED DATA
CRM, POS, ERP,
etc.
UNSTRUCTURED DATA
Forecasting
Clustering
Classification
Association
Regression
Coding Coding Coding Coding CodingCoding
Current
Developer
Approach:
ETL Tool BI Tool Model
Builder
BI Tool Data
Mining Tool
Current
IT
Approach:
Coding Coding
A Smarter Approach
10
Load/
Ingest
Manipulate/
Process Access Model
Visualize/
Analyze Predict Decisions
Ingest data
into big
data store
Make data
ready for
analysis
• Transform
• Aggregate
• Strucure-ize
Direct to
big data
store OR
Move slice
to DM/DW
Build
metadata
model:
•  Dimensions
•  Measures
•  Hierarchies
Report
Dashboard
Analyze
Pivot
Drill
Act, then
iterate
based on
insights
STRUCTURED DATA
CRM, POS, ERP,
etc.
UNSTRUCTURED DATA
Forecasting
Clustering
Classification
Association
Regression
Coding Coding Coding Coding CodingCoding
Current
Developer
Approach:
ETL Tool BI Tool Model
Builder
BI Tool Data
Mining Tool
Current
IT
Approach:
Coding Coding
Business Analytics for Big Data
Complete platform from data to analytics
–  Continuity of one platform
–  Eliminate disjointed steps
–  Days instead of weeks
High productivity visual development
–  Use existing skills for Hadoop
–  Visual development for MapReduce jobs
–  Visual development for Sqoop, Oozie
Fast performance within Hadoop
–  Parallel execution of MapReduce
jobs across the Hadoop cluster
Visual Data
Preparation
Coding VS. Pentaho MapReduce
11
Pentaho Visual Development for Big Data
15x Faster to Design & Deploy Big Data Analytics
Sample Scenario
•  A telecommunication company needs to:
–  Ingest, store and retrieve call detail records (CDRs)
from multiple digital switches
–  Maintain a detailed profile of its commercial
customers
–  Analyze the volume of calls over time by SIC code,
region, state, county and city
•  Which database technology would it use for each of
these applications?
12
Pentaho Data Integration for Big Data
13
•  Transform the data for reporting without coding
•  Execute in the Hadoop Cluster without java coding
•  Visually orchestrate job workflows
•  Visualize and analyze the data
Leverage Native Capabilities of Big Data Vendors
Complete Visual Data Management
and Big Data Analytics
Hadoop NoSQL Analytic Databases
Data Discovery,
Visualization
Enterprise & Ad
Hoc Reporting
Data Ingestion,
Manipulation,
Integration
Predictive
Analytics
14
Big Data Fabric

Weitere ähnliche Inhalte

Was ist angesagt?

Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
MongoDB
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
DataWorks Summit
 

Was ist angesagt? (20)

Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
 
Embedded Analytics in Customer Success
Embedded Analytics in Customer SuccessEmbedded Analytics in Customer Success
Embedded Analytics in Customer Success
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
 
Big Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - PentahoBig Data for BI - Beyond the Hype - Pentaho
Big Data for BI - Beyond the Hype - Pentaho
 
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]Competitive edgewithmongod bandpentaho_2014sep_v3[1]
Competitive edgewithmongod bandpentaho_2014sep_v3[1]
 
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
Data Integration and Advanced Analytics for MongoDB: Blend, Enrich and Analyz...
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
 
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen...
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open SourceOpen Analytics 2014 - Pedro Alves - Innovation though Open Source
Open Analytics 2014 - Pedro Alves - Innovation though Open Source
 
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
BI congres 2016-4: Hoe groei je als organisatie in analytische maturiteit? - ...
 
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, PentahoMongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By, Pentaho
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
 
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...BI congres 2016-2: Diving into weblog data with SAS on Hadoop -  Lisa Truyers...
BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers...
 
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - PentahoBI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
BI congres 2014-5: from BI to big data - Jan Aertsen - Pentaho
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
 

Andere mochten auch

Andere mochten auch (8)

Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity
 
Introduction to streaming and messaging flume,kafka,SQS,kinesis
Introduction to streaming and messaging  flume,kafka,SQS,kinesis Introduction to streaming and messaging  flume,kafka,SQS,kinesis
Introduction to streaming and messaging flume,kafka,SQS,kinesis
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Kettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration toolKettle: Pentaho Data Integration tool
Kettle: Pentaho Data Integration tool
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
Data Mashups for Analytics
Data Mashups for AnalyticsData Mashups for Analytics
Data Mashups for Analytics
 
Introduction To Pentaho
Introduction To PentahoIntroduction To Pentaho
Introduction To Pentaho
 

Ähnlich wie Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 

Ähnlich wie Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases (20)

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Enterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEnterprise Data Warehousing Positioning
Enterprise Data Warehousing Positioning
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
4AA6-4492ENW
4AA6-4492ENW4AA6-4492ENW
4AA6-4492ENW
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Business Intelligence Architecture
Business Intelligence ArchitectureBusiness Intelligence Architecture
Business Intelligence Architecture
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 

Mehr von Pentaho

Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics
Pentaho
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare Solutions
Pentaho
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho
 

Mehr von Pentaho (7)

Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview PresentationFilling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
 
The Next Big Thing in Big Data
The Next Big Thing in Big DataThe Next Big Thing in Big Data
The Next Big Thing in Big Data
 
Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics
 
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare Solutions
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User Group
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQL and Analytical Databases

  • 1. Reducing the Implementation Efforts of Hadoop, NoSQL & Analytical Databases © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. Goals for Today 2 To understand: •  Landscape of Big Data Databases Today •  Current Approaches to Implementation •  How Pentaho Reduces Time, Effort, and Complexity
  • 3. The Big Data Landscape 3 - Most businesses have 1 or more of these databases - Often used together to form a complete solution - Typically created / sold / supported by different vendor - Different programming interfaces for development and administration NoSQL Hadoop Analytical Transactional /DW
  • 4. The Big Data Landscape 4 - Most businesses have 1 or more of these databases - Often used together to form a complete solution - Typically created / sold / supported by different vendor - Different programming interfaces for development and administration NoSQL Hadoop Analytical Transactional /DW
  • 5. Transactional Databases •  Your everyday, garden-variety database •  Used to store data for line of business applications (ERP, CRM, HR) •  Serves as the “system of record” for business facts, e.g. –  Inventory levels –  Customer account status –  Payroll data •  Offers rich sources of analytical information (potential facts and dimensions) •  Fast and reliable for INSERT|UPDATE|DELETE operations 5
  • 6. Hadoop, HDFS and Hive •  Hadoop is a set of related open source Apache projects •  MapReduce is a programming framework •  HDFS is the Hadoop Distributed File System –  Used to store structured (tabular) and unstructured files •  Structured files in HDFS can be queried via Hive –  Hive provides a basic SQL interface on top of HDFS files –  Hive generates MapReduce programs to fulfill the SQL requests •  Ideal for: –  Unstructured data that doesn’t fit in a Transactional or NoSQL database –  Email, twitter posts, images, voice logs, etc. 6
  • 7. NoSQL Databases •  Non-relational databases –  No (or very limited) support for the SQL language •  Developers use programs, not queries to access data –  Flexible definition of structure (“extend as you go”) •  Limited metadata –  No referential integrity / limited support for transactions •  But: –  They can ingest (capture and persist) very large amounts of data with very low overhead, across multiple servers •  1000’s (to 100’s of thousands) of inserts per second –  Excellent for large numbers of reads based on key values •  Millions of reads per second 7
  • 8. Analytical Databases •  Databases designed for Business Intelligence –  Offer full support for SQL and relational metadata –  Use specialized techniques to improve query performance •  Compression •  Column-based storage (e.g. “M” or “F”) •  Clusters of independent servers •  Advanced mathematics for index & query optimization –  Support high-speed “bulk” inserts of structured data –  May be offered as standalone software or via an appliance •  Ideal for: –  Answering summary queries requiring complex filters, joins & groupings –  Online Analytical Processing (OLAP) clients 8
  • 9. Current Implementation Approaches Complex and Time Consuming 9 Load/ Ingest Manipulate/ Process Access Model Visualize/ Analyze Predict Decisions Ingest data into big data store Make data ready for analysis • Transform • Aggregate • Strucure-ize Direct to big data store OR Move slice to DM/DW Build metadata model: •  Dimensions •  Measures •  Hierarchies Report Dashboard Analyze Pivot Drill Act, then iterate based on insights STRUCTURED DATA CRM, POS, ERP, etc. UNSTRUCTURED DATA Forecasting Clustering Classification Association Regression Coding Coding Coding Coding CodingCoding Current Developer Approach: ETL Tool BI Tool Model Builder BI Tool Data Mining Tool Current IT Approach: Coding Coding
  • 10. A Smarter Approach 10 Load/ Ingest Manipulate/ Process Access Model Visualize/ Analyze Predict Decisions Ingest data into big data store Make data ready for analysis • Transform • Aggregate • Strucure-ize Direct to big data store OR Move slice to DM/DW Build metadata model: •  Dimensions •  Measures •  Hierarchies Report Dashboard Analyze Pivot Drill Act, then iterate based on insights STRUCTURED DATA CRM, POS, ERP, etc. UNSTRUCTURED DATA Forecasting Clustering Classification Association Regression Coding Coding Coding Coding CodingCoding Current Developer Approach: ETL Tool BI Tool Model Builder BI Tool Data Mining Tool Current IT Approach: Coding Coding Business Analytics for Big Data
  • 11. Complete platform from data to analytics –  Continuity of one platform –  Eliminate disjointed steps –  Days instead of weeks High productivity visual development –  Use existing skills for Hadoop –  Visual development for MapReduce jobs –  Visual development for Sqoop, Oozie Fast performance within Hadoop –  Parallel execution of MapReduce jobs across the Hadoop cluster Visual Data Preparation Coding VS. Pentaho MapReduce 11 Pentaho Visual Development for Big Data 15x Faster to Design & Deploy Big Data Analytics
  • 12. Sample Scenario •  A telecommunication company needs to: –  Ingest, store and retrieve call detail records (CDRs) from multiple digital switches –  Maintain a detailed profile of its commercial customers –  Analyze the volume of calls over time by SIC code, region, state, county and city •  Which database technology would it use for each of these applications? 12
  • 13. Pentaho Data Integration for Big Data 13 •  Transform the data for reporting without coding •  Execute in the Hadoop Cluster without java coding •  Visually orchestrate job workflows •  Visualize and analyze the data Leverage Native Capabilities of Big Data Vendors
  • 14. Complete Visual Data Management and Big Data Analytics Hadoop NoSQL Analytic Databases Data Discovery, Visualization Enterprise & Ad Hoc Reporting Data Ingestion, Manipulation, Integration Predictive Analytics 14 Big Data Fabric