SlideShare ist ein Scribd-Unternehmen logo
1 von 55
Downloaden Sie, um offline zu lesen
Mark Rittman, Oracle ACE Director
THE FUTURE OF ANALYTICS, DATA INTEGRATION
AND BI ON BIG DATA PLATFORMS
HADOOP USER GROUP IRELAND (HUG IRL)
Dublin, September 2016
•Mark Rittman, Co-Founder of Rittman Mead
•Oracle ACE Director, specialising in Oracle BI&DW
•14 Years Experience with Oracle Technology
•Regular columnist for Oracle Magazine
•Author of two Oracle Press Oracle BI books
•Oracle Business Intelligence Developers Guide
•Oracle Exalytics Revealed
•Writer for Rittman Mead Blog :

http://www.rittmanmead.com/blog
•Email : mark.rittman@rittmanmead.com
•Twitter : @markrittman
About the Speaker
2
OR AS I SAY AT PARTIES…
3
4
BUT SERIOUSLY…
5
•Started back in 1996 on a bank Oracle DW project
•Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL 

and shell scripts
•Went on to use Oracle Developer/2000 and Designer/2000
•Our initial users queried the DW using SQL*Plus
•And later on, we rolled-out Discoverer/2000 to everyone else
•And life was fun…
20 Years in Old-school BI & Data Warehousing
6
•Data warehouses provided a unified view of the business
•Single place to store key data and metrics
•Joined-up view of the business
•Aggregates and conformed dimensions
•ETL routines to load, cleanse and conform data
•BI tools for simple, guided access to information
•Tabular data access using SQL-generating tools
•Drill paths, hierarchies, facts, attributes
•Fast access to pre-computed aggregates
•Packaged BI for fast-start ERP analytics
Data Warehouses and Enterprise BI Tools
7
Oracle
MongoDB
Oracle
Sybase
IBM	DB/2
MS	SQL	
MS	SQL	Server
Core	ERP	Platform
Retail	
Banking	
Call	Center	
E-Commerce	
CRM	


Business	
Intelligence	
Tools


Data	Warehouse
Access	&

Performance

Layer
ODS	/

Foundation

Layer
7
•Examples were Crystal Reports, Oracle Reports, Cognos Impromptu, Business Objects
•Report written against carefully-curated BI dataset, or directly connecting to ERP/CRM
•Adding data from external sources, or other RDBMSs,

was difficult and involved IT resources
•Report-writing was a skilled job
•High ongoing cost for maintenance and changes
•Little scope for analysis, predictive modeling
•Often user frustration and pace of delivery
Reporting Back Then…
8 8
•For example Oracle OBIEE, SAP Business Objects, IBM Cognos
•Full-featured, IT-orientated enterprise BI platforms
•Metadata layers, integrated security, web delivery
•Pre-build ERP metadata layers, dashboards + reports
•Federated queries across multiple sources
•Single version of the truth across the enterprise
•Mobile, web dashboards, alerts, published reports
•Integration with SOA and web services
Then Came Enterprise BI Tools
10 10
THEN CAME … BIG DATA
11
AND HADOOP
13
BIG, FAST AND FAULT-TOLERANT
14
•Data from new-world applications is not like historic data
•Typically comes in non-tabular form
•JSON, log files, key/value pairs
•Users often want it speculatively
•Haven’t thought it through
•Schema can evolve
•Or maybe there isn’t one
•But the end-users want it now
•Not when you’re ready
But Why Hadoop? Reason #1 - Flexible Storage
16
Big	Data	Management	Platform
Discovery	&	Development	Labs
Safe	&	secure	Discovery	and	Development	environment
Data	sets	and	
samples
Models	and	
programs
Single	Customer	View
Enriched	
Customer	Profile
Correlating
Modeling
Machine
Learning
Scoring
Schema-on
Read	Analysis
•Enterprise High-End RDBMSs such as Oracle can scale
•Clustering for single-instance DBs can scale to >PB
•Exadata scales further by offloading queries to storage
•Sharded databases (e.g. Netezza) can scale further
•But cost (and complexity) become limiting factors
•Typically $1m/node is not uncommon
But Why Hadoop? Reason #2 - Massive Scalability
17
•Hadoop started by being synonymous with MapReduce, and Java coding
•But YARN (Yet another Resource Negotiator) broke this dependency
•Modern Hadoop platforms provide overall cluster resource management,

but support multiple processing frameworks
•General-purpose (e.g. MapReduce)
•Graph processing
•Machine Learning
•Real-Time Processing (Spark Streaming, Storm)
•Even the Hadoop resource management framework

can be swapped out
•Apache Mesos
Reason #3 - Processing Frameworks
18
Big	Data	Platform	-	All	Running	Natively	Under	Hadoop
YARN	(Cluster	Resource	Management)
Batch

(MapReduce)
HDFS	(Cluster	Filesystem	holding	raw	data)
Interactive

(Impala,	Drill,

Tez,	Presto)
Streaming	+

In-Memory

(Spark,	Storm)
Graph	+	Search

(Solr,	Giraph)
Enriched	

Customer	Profile
Modeling
Scoring
•Data now landed in Hadoop clusters, NoSQL databases and Cloud Storage
•Flexible data storage platform with cheap storage, flexible schema support + compute
•Data lands in the data lake or reservoir in raw form, then minimally processed
•Data then accessed directly by “data scientists”, or processed further into DW
Meet the New Data Warehouse : The “Data Lake”
19
Data	Transfer Data	Access
Data	Factory
Data	Reservoir
Business	
Intelligence	Tools
Hadoop	Platform
File	Based	
Integration
Stream	
Based	
Integration
Data	streams
Discovery	&	Development	Labs
Safe	&	secure	Discovery	and	Development	
environment
Data	sets	and	
samples
Models	and	
programs
Marketing	/
Sales	Applications
Models
Machine
Learning
Segments
Operational	Data
Transactions
Customer
Master	ata
Unstructured	Data
Voice	+	Chat	
Transcripts
ETL	Based
Integration
Raw	
Customer	Data
Data	stored	in	
the	original	
format	(usually	
files)		such	as	
SS7,	ASN.1,	
JSON	etc.
Mapped	
Customer	Data
Data	sets	
produced	by	
mapping	and	
transforming	
raw	data
NEW STARTUPS ENABLING A HYBRID 

“OLD WORLD/NEW WORLD” APPROACH
20
AND PERFECT FOR ANALYTICS
22
•Enterprise High-End RDBMSs such as Oracle can scale into the petabytes, using clustering
•Sharded databases (e.g. Netezza) can scale further but with complexity / single workload
trade-offs
•Hadoop was designed from outside for massive horizontal scalability - using cheap hardware
•Anticipates hardware failure and makes multiple copies of data as protection
•More nodes you add, more stable it becomes
•And at a fraction of the cost of traditional

RDBMS platforms
Hadoop : The Default Platform Today for Analytics
23
BI INNOVATION IS HAPPENING

AROUND HADOOP
24
“WE’RE WINNING!”
27
BUT…
29
isn’t Hadoop Slow?
too slow

for ad-hoc querying?
WELCOME TO 2016
32
(HADOOP 2.0)
35
HADOOP IS NOW FAST
37
Hadoop 2.0 Processing Frameworks + Tools
38
•Cloudera’s answer to Hive query response time issues
•MPP SQL query engine running on Hadoop, bypasses MapReduce for direct data access
•Mostly in-memory, but spills to disk if required
•Uses Hive metastore to access Hive table metadata
•Similar SQL dialect to Hive - not as rich though and no support for Hive SerDes, storage
handlers etc
Cloudera Impala - Fast, MPP-style Access to Hadoop Data
39
•Beginners usually store data in HDFS using text file formats (CSV) but these have limitations
•Apache AVRO often used for general-purpose processing
•Splitability, schema evolution, in-built metadata, support for block compression
•Parquet now commonly used with Impala due to column-orientated storage
•Mirrors work in RDBMS world around column-store
•Only return (project) the columns you require across a wide table
Parquet - Column-Orientated Storage for Analytics
40
•But Parquet (and HDFS) have significant limitation for real-time analytics applications
•Append-only orientation, focus on column-store 

makes streaming ingestion harder
•Cloudera Kudu aims to combine 

best of HDFS + HBase
•Real-time analytics-optimised
•Supports updates to data
•Fast ingestion of data
•Accessed using SQL-style tables

and get/put/update/delete API
Cloudera Kudu - Best of HBase and Column-Store
41
•Kudu storage used with Impala - create tables using Kudu storage handler
•Can now UPDATE, DELETE and INSERT into Hadoop tables, not just SELECT and LOAD
DATA
Example Impala DDL + DML Commands with Kudu
42
CREATE TABLE `my_first_table` (
`id` BIGINT,
`name` STRING
)
TBLPROPERTIES(
'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
'kudu.table_name' = 'my_first_table',
'kudu.master_addresses' = 'kudu-master.example.com:7051',
'kudu.key_columns' = 'id'
);
INSERT INTO my_first_table VALUES (99, "sarah");
INSERT IGNORE INTO my_first_table VALUES (99, "sarah");
UPDATE my_first_table SET name="bob" where id = 3;
DELETE FROM my_first_table WHERE id < 3;
DELETE c FROM my_second_table c, stock_symbols s WHERE c.name = s.symbol;
AND IT’S NOW IN-MEMORY
43
Accompanied by Innovations in Underlying Platform
45
Cluster Resource Management to

support mulJ-tenant distributed services
In-Memory Distributed Storage,

to accompany In-Memory Distributed Processing
DATAFLOW PIPELINES 

ARE THE NEW ETL
46
New ways to do BI
New ways to do BI
HADOOP IS THE NEW ETL ENGINE
49
50Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Proprietary ETL
engines die circa
2015 – folded into
big data
Oracle Open World 2015 21
Proprietary ETL is Dead. Apache-based ETL is What’s Next
Scripted
SQL
Stored
Procs
ODI for
Columnar
ODI for
In-Mem
ODI for
Exadata
ODI for
Hive
ODI for
Pig & Oozie
1990’s
Eon of Scripts and PL-SQL Era of SQL E-LT/Pushdown Big Data ETL in Batch Streaming ETL
Period of Proprietary Batch ETL Engines
Informatica
Ascential/IBM
Ab Initio
Acta/SAP
SyncSort
1994
Oracle Data Integrator
ODI for
Spark
ODI for
Spark Streaming
Warehouse
Builder
MACHINE LEARNING & SEARCH FOR 

“AUTOMAGIC” SCHEMA DISCOVERY
51
New ways to do BI
•By definition there's lots of data in a big data system ... so how do you find the data you
want?
•Google's own internal solution - GOODS ("Google Dataset Search")
•Uses crawler to discover new datasets
•ML classification routines to infer domain
•Data provenance and lineage
•Indexes and catalogs 26bn datasets
•Other users, vendors also have solutions
•Oracle Big Data Discovery
•Datameer
•Platfora
•Cloudera Navigator
Google GOODS - Catalog + Search At Google-Scale
53
A NEW TAKE ON BI
54
•Came out if the data science movement, as a way to "show workings"
•A set of reproducible steps that tell a story about the data
•as well as being a better command-line environment for data analysis
•One example is Jupyter, evolution of iPython notebook
•supports pySpark, Pandas etc
•See also Apache Zepplin
Web-Based Data Analysis Notebooks
55
AND EMERGING OPEN-SOURCE

BI TOOLS AND PLATFORMS
57
And Emerging Open-Source

BI Tools and Platforms
wp-content/uploads/2016/05/paper.pdf
And Emerging Open-Source

BI Tools and Platforms
WELCOME TO THE FUTURE
62
Mark Rittman, Oracle ACE Director
THE FUTURE OF ANALYTICS, DATA INTEGRATION
AND BI ON BIG DATA PLATFORMS
HADOOP USER GROUP IRELAND (HUG IRL)
Dublin, September 2016

Weitere ähnliche Inhalte

Was ist angesagt?

Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
AWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS ExperienceAWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS Experience
Amazon Web Services
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 

Was ist angesagt? (20)

Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
 
SQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalSQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle Professional
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Cassandra Lunch #88: Cadence
Cassandra Lunch #88: CadenceCassandra Lunch #88: Cadence
Cassandra Lunch #88: Cadence
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
RDX Insights Presentation - Microsoft Business Intelligence
RDX Insights Presentation - Microsoft Business IntelligenceRDX Insights Presentation - Microsoft Business Intelligence
RDX Insights Presentation - Microsoft Business Intelligence
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
 
AWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS ExperienceAWS Customer Presentation - VMIX AWS Experience
AWS Customer Presentation - VMIX AWS Experience
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 

Andere mochten auch

SD Worx - De toekomst van werk - Lieve & Veerle Michiels
SD Worx - De toekomst van werk - Lieve & Veerle MichielsSD Worx - De toekomst van werk - Lieve & Veerle Michiels
SD Worx - De toekomst van werk - Lieve & Veerle Michiels
SD Worx Belgium
 

Andere mochten auch (8)

Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
 
Tame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data Integration
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
SD Worx - De toekomst van werk - Lieve & Veerle Michiels
SD Worx - De toekomst van werk - Lieve & Veerle MichielsSD Worx - De toekomst van werk - Lieve & Veerle Michiels
SD Worx - De toekomst van werk - Lieve & Veerle Michiels
 
From good to great - How to beef up your localization program
From good to great - How to beef up your localization programFrom good to great - How to beef up your localization program
From good to great - How to beef up your localization program
 
CloudPay - How Groupon & Wells Fargo Manage Global Payroll & Compliance
CloudPay - How Groupon & Wells Fargo Manage Global Payroll & ComplianceCloudPay - How Groupon & Wells Fargo Manage Global Payroll & Compliance
CloudPay - How Groupon & Wells Fargo Manage Global Payroll & Compliance
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
Ramco ERP on Cloud - The Best Cloud Computing Solution Worldwide
Ramco ERP on Cloud - The Best Cloud Computing Solution Worldwide Ramco ERP on Cloud - The Best Cloud Computing Solution Worldwide
Ramco ERP on Cloud - The Best Cloud Computing Solution Worldwide
 

Ähnlich wie The Future of Analytics, Data Integration and BI on Big Data Platforms

Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
Andrew Brust
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 

Ähnlich wie The Future of Analytics, Data Integration and BI on Big Data Platforms (20)

New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Hadoop Data Modeling
Hadoop Data ModelingHadoop Data Modeling
Hadoop Data Modeling
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoop
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem
Things Every Oracle DBA Needs to Know about the Hadoop EcosystemThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database World
 

Mehr von Mark Rittman

Mehr von Mark Rittman (19)

Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
 
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle Cloud
 
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...
 
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
 
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
 
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
 
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
 
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIBIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
 
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI ProjectsOGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
 
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cUKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
 
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
 
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gPart 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
 

Kürzlich hochgeladen

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 

Kürzlich hochgeladen (20)

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 

The Future of Analytics, Data Integration and BI on Big Data Platforms

  • 1. Mark Rittman, Oracle ACE Director THE FUTURE OF ANALYTICS, DATA INTEGRATION AND BI ON BIG DATA PLATFORMS HADOOP USER GROUP IRELAND (HUG IRL) Dublin, September 2016
  • 2. •Mark Rittman, Co-Founder of Rittman Mead •Oracle ACE Director, specialising in Oracle BI&DW •14 Years Experience with Oracle Technology •Regular columnist for Oracle Magazine •Author of two Oracle Press Oracle BI books •Oracle Business Intelligence Developers Guide •Oracle Exalytics Revealed •Writer for Rittman Mead Blog :
 http://www.rittmanmead.com/blog •Email : mark.rittman@rittmanmead.com •Twitter : @markrittman About the Speaker 2
  • 3. OR AS I SAY AT PARTIES… 3
  • 4. 4
  • 6. •Started back in 1996 on a bank Oracle DW project •Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL 
 and shell scripts •Went on to use Oracle Developer/2000 and Designer/2000 •Our initial users queried the DW using SQL*Plus •And later on, we rolled-out Discoverer/2000 to everyone else •And life was fun… 20 Years in Old-school BI & Data Warehousing 6
  • 7. •Data warehouses provided a unified view of the business •Single place to store key data and metrics •Joined-up view of the business •Aggregates and conformed dimensions •ETL routines to load, cleanse and conform data •BI tools for simple, guided access to information •Tabular data access using SQL-generating tools •Drill paths, hierarchies, facts, attributes •Fast access to pre-computed aggregates •Packaged BI for fast-start ERP analytics Data Warehouses and Enterprise BI Tools 7 Oracle MongoDB Oracle Sybase IBM DB/2 MS SQL MS SQL Server Core ERP Platform Retail Banking Call Center E-Commerce CRM 
 Business Intelligence Tools 
 Data Warehouse Access &
 Performance
 Layer ODS /
 Foundation
 Layer 7
  • 8. •Examples were Crystal Reports, Oracle Reports, Cognos Impromptu, Business Objects •Report written against carefully-curated BI dataset, or directly connecting to ERP/CRM •Adding data from external sources, or other RDBMSs,
 was difficult and involved IT resources •Report-writing was a skilled job •High ongoing cost for maintenance and changes •Little scope for analysis, predictive modeling •Often user frustration and pace of delivery Reporting Back Then… 8 8
  • 9. •For example Oracle OBIEE, SAP Business Objects, IBM Cognos •Full-featured, IT-orientated enterprise BI platforms •Metadata layers, integrated security, web delivery •Pre-build ERP metadata layers, dashboards + reports •Federated queries across multiple sources •Single version of the truth across the enterprise •Mobile, web dashboards, alerts, published reports •Integration with SOA and web services Then Came Enterprise BI Tools 10 10
  • 10. THEN CAME … BIG DATA 11
  • 12. BIG, FAST AND FAULT-TOLERANT 14
  • 13. •Data from new-world applications is not like historic data •Typically comes in non-tabular form •JSON, log files, key/value pairs •Users often want it speculatively •Haven’t thought it through •Schema can evolve •Or maybe there isn’t one •But the end-users want it now •Not when you’re ready But Why Hadoop? Reason #1 - Flexible Storage 16 Big Data Management Platform Discovery & Development Labs Safe & secure Discovery and Development environment Data sets and samples Models and programs Single Customer View Enriched Customer Profile Correlating Modeling Machine Learning Scoring Schema-on Read Analysis
  • 14. •Enterprise High-End RDBMSs such as Oracle can scale •Clustering for single-instance DBs can scale to >PB •Exadata scales further by offloading queries to storage •Sharded databases (e.g. Netezza) can scale further •But cost (and complexity) become limiting factors •Typically $1m/node is not uncommon But Why Hadoop? Reason #2 - Massive Scalability 17
  • 15. •Hadoop started by being synonymous with MapReduce, and Java coding •But YARN (Yet another Resource Negotiator) broke this dependency •Modern Hadoop platforms provide overall cluster resource management,
 but support multiple processing frameworks •General-purpose (e.g. MapReduce) •Graph processing •Machine Learning •Real-Time Processing (Spark Streaming, Storm) •Even the Hadoop resource management framework
 can be swapped out •Apache Mesos Reason #3 - Processing Frameworks 18 Big Data Platform - All Running Natively Under Hadoop YARN (Cluster Resource Management) Batch
 (MapReduce) HDFS (Cluster Filesystem holding raw data) Interactive
 (Impala, Drill,
 Tez, Presto) Streaming +
 In-Memory
 (Spark, Storm) Graph + Search
 (Solr, Giraph) Enriched 
 Customer Profile Modeling Scoring
  • 16. •Data now landed in Hadoop clusters, NoSQL databases and Cloud Storage •Flexible data storage platform with cheap storage, flexible schema support + compute •Data lands in the data lake or reservoir in raw form, then minimally processed •Data then accessed directly by “data scientists”, or processed further into DW Meet the New Data Warehouse : The “Data Lake” 19 Data Transfer Data Access Data Factory Data Reservoir Business Intelligence Tools Hadoop Platform File Based Integration Stream Based Integration Data streams Discovery & Development Labs Safe & secure Discovery and Development environment Data sets and samples Models and programs Marketing / Sales Applications Models Machine Learning Segments Operational Data Transactions Customer Master ata Unstructured Data Voice + Chat Transcripts ETL Based Integration Raw Customer Data Data stored in the original format (usually files) such as SS7, ASN.1, JSON etc. Mapped Customer Data Data sets produced by mapping and transforming raw data
  • 17. NEW STARTUPS ENABLING A HYBRID 
 “OLD WORLD/NEW WORLD” APPROACH 20
  • 18. AND PERFECT FOR ANALYTICS 22
  • 19. •Enterprise High-End RDBMSs such as Oracle can scale into the petabytes, using clustering •Sharded databases (e.g. Netezza) can scale further but with complexity / single workload trade-offs •Hadoop was designed from outside for massive horizontal scalability - using cheap hardware •Anticipates hardware failure and makes multiple copies of data as protection •More nodes you add, more stable it becomes •And at a fraction of the cost of traditional
 RDBMS platforms Hadoop : The Default Platform Today for Analytics 23
  • 20. BI INNOVATION IS HAPPENING
 AROUND HADOOP 24
  • 22.
  • 27.
  • 28.
  • 30.
  • 31. HADOOP IS NOW FAST 37
  • 32. Hadoop 2.0 Processing Frameworks + Tools 38
  • 33. •Cloudera’s answer to Hive query response time issues •MPP SQL query engine running on Hadoop, bypasses MapReduce for direct data access •Mostly in-memory, but spills to disk if required •Uses Hive metastore to access Hive table metadata •Similar SQL dialect to Hive - not as rich though and no support for Hive SerDes, storage handlers etc Cloudera Impala - Fast, MPP-style Access to Hadoop Data 39
  • 34. •Beginners usually store data in HDFS using text file formats (CSV) but these have limitations •Apache AVRO often used for general-purpose processing •Splitability, schema evolution, in-built metadata, support for block compression •Parquet now commonly used with Impala due to column-orientated storage •Mirrors work in RDBMS world around column-store •Only return (project) the columns you require across a wide table Parquet - Column-Orientated Storage for Analytics 40
  • 35. •But Parquet (and HDFS) have significant limitation for real-time analytics applications •Append-only orientation, focus on column-store 
 makes streaming ingestion harder •Cloudera Kudu aims to combine 
 best of HDFS + HBase •Real-time analytics-optimised •Supports updates to data •Fast ingestion of data •Accessed using SQL-style tables
 and get/put/update/delete API Cloudera Kudu - Best of HBase and Column-Store 41
  • 36. •Kudu storage used with Impala - create tables using Kudu storage handler •Can now UPDATE, DELETE and INSERT into Hadoop tables, not just SELECT and LOAD DATA Example Impala DDL + DML Commands with Kudu 42 CREATE TABLE `my_first_table` ( `id` BIGINT, `name` STRING ) TBLPROPERTIES( 'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler', 'kudu.table_name' = 'my_first_table', 'kudu.master_addresses' = 'kudu-master.example.com:7051', 'kudu.key_columns' = 'id' ); INSERT INTO my_first_table VALUES (99, "sarah"); INSERT IGNORE INTO my_first_table VALUES (99, "sarah"); UPDATE my_first_table SET name="bob" where id = 3; DELETE FROM my_first_table WHERE id < 3; DELETE c FROM my_second_table c, stock_symbols s WHERE c.name = s.symbol;
  • 37. AND IT’S NOW IN-MEMORY 43
  • 38.
  • 39. Accompanied by Innovations in Underlying Platform 45 Cluster Resource Management to
 support mulJ-tenant distributed services In-Memory Distributed Storage,
 to accompany In-Memory Distributed Processing
  • 40. DATAFLOW PIPELINES 
 ARE THE NEW ETL 46
  • 41. New ways to do BI
  • 42. New ways to do BI
  • 43. HADOOP IS THE NEW ETL ENGINE 49
  • 44. 50Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Proprietary ETL engines die circa 2015 – folded into big data Oracle Open World 2015 21 Proprietary ETL is Dead. Apache-based ETL is What’s Next Scripted SQL Stored Procs ODI for Columnar ODI for In-Mem ODI for Exadata ODI for Hive ODI for Pig & Oozie 1990’s Eon of Scripts and PL-SQL Era of SQL E-LT/Pushdown Big Data ETL in Batch Streaming ETL Period of Proprietary Batch ETL Engines Informatica Ascential/IBM Ab Initio Acta/SAP SyncSort 1994 Oracle Data Integrator ODI for Spark ODI for Spark Streaming Warehouse Builder
  • 45. MACHINE LEARNING & SEARCH FOR 
 “AUTOMAGIC” SCHEMA DISCOVERY 51
  • 46. New ways to do BI
  • 47. •By definition there's lots of data in a big data system ... so how do you find the data you want? •Google's own internal solution - GOODS ("Google Dataset Search") •Uses crawler to discover new datasets •ML classification routines to infer domain •Data provenance and lineage •Indexes and catalogs 26bn datasets •Other users, vendors also have solutions •Oracle Big Data Discovery •Datameer •Platfora •Cloudera Navigator Google GOODS - Catalog + Search At Google-Scale 53
  • 48. A NEW TAKE ON BI 54
  • 49. •Came out if the data science movement, as a way to "show workings" •A set of reproducible steps that tell a story about the data •as well as being a better command-line environment for data analysis •One example is Jupyter, evolution of iPython notebook •supports pySpark, Pandas etc •See also Apache Zepplin Web-Based Data Analysis Notebooks 55
  • 50. AND EMERGING OPEN-SOURCE
 BI TOOLS AND PLATFORMS 57
  • 51. And Emerging Open-Source
 BI Tools and Platforms wp-content/uploads/2016/05/paper.pdf
  • 52.
  • 53. And Emerging Open-Source
 BI Tools and Platforms
  • 54. WELCOME TO THE FUTURE 62
  • 55. Mark Rittman, Oracle ACE Director THE FUTURE OF ANALYTICS, DATA INTEGRATION AND BI ON BIG DATA PLATFORMS HADOOP USER GROUP IRELAND (HUG IRL) Dublin, September 2016