Bridging Oracle Database and Hadoop
Alex Gorbachev
October 2015
Alex Gorbachev
• Chief Technology Officer at Pythian
• Blogger
• Cloudera Champion of Big Data
• OakTable Network member
• Oracle ACE Director
• Founder of BattleAgainstAnyGuess.com
• Founder of Sydney Oracle Meetup
• EVP, IOUG
What is Big Data?
and why Big Data today?
Why Big Data boom now?
• Advances in communication – anyone can now
transfer large amounts of data economically
from virtually anywhere
• Commodity hardware – high performance and high
capacity are available at a low price
• Commodity software – the open-source phenomenon
has made advanced software products affordable to
anyone
• New data sources – mobile devices, sensors, social
media
• What was possible only at very high cost in the
past can now be done by any business, small or
large
Big Data = Affordable at Scale
Not everyone is
Facebook, Google, Yahoo,
etc.
These guys had to
push the envelope
because traditional
technology didn’t
scale
Mere mortals’ challenge
is cost and agility
System capability per $
Big Data technology
may be expensive at low
scale due to high
engineering efforts.
Traditional technology
becomes too complex
and expensive to scale.
[Chart: capabilities versus investments ($), traditional versus Big Data technology]
What is Hadoop?
Hadoop Design Principle #1
Scalable Affordable Reliable Data Store
HDFS – Hadoop Distributed Filesystem
Hadoop Design Principle #2
Bring Code to Data
Code
Data
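A minimal sketch of the "bring code to data" principle, assuming three hypothetical nodes that each hold a local slice of a request log. The node names, sample records, and function names are illustrative only and are not Hadoop APIs. The point it tries to show is that the counting runs next to each slice, and only small per-node summaries need to travel.

```python
from collections import Counter

# Hypothetical data blocks, each stored on a different node.
BLOCKS = {
    "node1": ["GET /a", "GET /b", "POST /a"],
    "node2": ["GET /a", "GET /a"],
    "node3": ["POST /b", "GET /b", "GET /a"],
}

def local_count(records):
    """Runs where the data lives: count requests per HTTP method."""
    return Counter(r.split()[0] for r in records)

def merge(partials):
    """Runs centrally: combine the small per-node summaries."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

partials = [local_count(recs) for recs in BLOCKS.values()]
result = merge(partials)

# Compare how much data is stored with how much had to move.
records_stored = sum(len(recs) for recs in BLOCKS.values())
summary_entries_moved = sum(len(p) for p in partials)
print(dict(result), records_stored, summary_entries_moved)
```

With realistic block sizes the gap between records stored and summary entries moved grows by many orders of magnitude, which is the motivation for shipping code rather than data.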
Why is Hadoop so affordable?
• Cheap hardware
• Resiliency through software
• Horizontal scalability
• Open-source software
How much does it cost?
Oracle Big Data Appliance
X5-2 rack - $525K list price
• 18 data nodes
• 648 CPU cores
• 2.3 TB RAM
• 216 x 4TB disks
• 864TB of raw disk capacity
• 288TB usable (triple
mirror)
• 40G InfiniBand + 10GbE
networking
• Cloudera Enterprise
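The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; treating the list price as the full cost is a simplifying assumption (it ignores support, power, discounts, compression, and space reserved for the OS or temporary data).

```python
# Figures quoted on the slide (Oracle Big Data Appliance X5-2, list price).
LIST_PRICE_USD = 525_000
DATA_NODES = 18
CPU_CORES = 648
DISKS = 216
DISK_TB = 4
REPLICATION = 3  # three copies of every block ("triple mirror")

raw_tb = DISKS * DISK_TB                   # total raw capacity in TB
usable_tb = raw_tb // REPLICATION          # capacity left after replication
cores_per_node = CPU_CORES // DATA_NODES
disks_per_node = DISKS // DATA_NODES
usd_per_raw_tb = LIST_PRICE_USD / raw_tb
usd_per_usable_tb = LIST_PRICE_USD / usable_tb

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB")
print(f"per node: {cores_per_node} cores, {disks_per_node} disks")
print(f"list price per raw TB: ${usd_per_raw_tb:,.0f}")
print(f"list price per usable TB: ${usd_per_usable_tb:,.0f}")
```

The raw and usable capacities match the slide (864TB and 288TB), and the list price works out to roughly $1,800 per usable TB.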
Hadoop is very flexible
• Rich ecosystem of tools
• Can handle any data format
– Relational
– Text
– Audio, video
– Streaming data
– Logs
– Non-relational structured data (JSON, XML, binary
formats)
– Graph data
• Not limited to relational data processing
Challenges with Hadoop
for those of us used to Oracle
• New data access tools
– Relational and non-relational data
• Non-Oracle (and non-ANSI) Hive SQL
– Java-based UDFs and UDAFs
• Security features are not there out-of-the-box
• May be slow for “small data”
Tables in Hadoop
using Hadoop with relational data abstractions
Apache Hive
• Apache Hive provides a SQL layer over Hadoop
– data in HDFS (structured or unstructured via SerDe)
– using one of the distributed processing frameworks –
MapReduce, Spark, Tez
• Presents data from HDFS as tables and columns
– Hive metastore (aka data dictionary)
• SQL language access (HiveQL)
– Parses SQL and creates execution plans in MR, Spark or
Tez
• JDBC and ODBC drivers
– Access from ETL and BI tools
– Custom apps
– Development tools
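To make "presents data from HDFS as tables and columns" concrete, here is a small sketch that renders a HiveQL `CREATE EXTERNAL TABLE` statement for delimited text files. The helper name, the `web_logs` table, its columns, and the HDFS path are all hypothetical; the code only builds the statement text and does not connect to Hive.

```python
def hive_external_table_ddl(table, columns, location, delimiter=","):
    """Render HiveQL that maps delimited text files in HDFS to a table."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )

# Hypothetical table definition over files that already sit in HDFS.
ddl = hive_external_table_ddl(
    "web_logs",
    [("ts", "STRING"), ("user_id", "BIGINT"), ("url", "STRING")],
    "/data/raw/web_logs",
)
print(ddl)
```

Because the table is external, the statement only records metadata in the Hive metastore; the files under the location stay where they are.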
Native Hadoop tools
• Demo
• HUE
– HDFS files
– Hive
– Impala
Access Hive using SQL Developer
• Demo
• Use Cloudera JDBC drivers
• Query data & browse metadata
• Run DDL from SQL tab
• Create Hive table definitions inside Oracle DB
Hadoop and OBIEE 11g
• OBIEE 11.1.1.7 can query Hive/Hadoop as a
data source
– Hive ODBC drivers
– Apache Hive Physical Layer database type
• Limited features
– OBIEE 11.1.1.7 has HiveServer1 ODBC
drivers
– HiveQL is only a subset of ANSI SQL
• Hive query response time is too slow for
speed-of-thought analysis
ODI 12c
• ODI – data transformation tool
– ELT approach pushes transformations down to
Hadoop - leveraging power of cluster
– Hive, HBase, Sqoop and OLH/ODCH KMs provide
native Hadoop loading / transformation
• Upcoming support for Pig and Spark
• Workflow orchestration
• Metadata and model-driven
• GUI workflow design
• Transformation audit & data quality
Moving Data to Hadoop using ODI
• Interface with Apache Sqoop using IKM SQL to
Hive-HBase-File knowledge module
– Hadoop ecosystem tool
– Able to run in parallel
– Optimized Sqoop JDBC drivers integration for Oracle
– Bi-directional in-and-out of Hadoop to RDBMS
– Data is moved directly between Hadoop cluster and
database
• Export RDBMS data to file and load using IKM
File to Hive
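Sqoop's ability to run in parallel comes from dividing the range of a key column among several mappers, each of which reads its own slice. The sketch below is a simplified illustration of that range splitting for an integer key; the function name, the `orders` table, and the `id` column are hypothetical, and real Sqoop handles more data types and edge cases.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous chunks."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges, start = [], min_id
    for i in range(num_mappers):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more mappers than keys
        ranges.append((start, start + size - 1))
        start += size
    return ranges

ranges = split_ranges(1, 1000, 4)
for lo, hi in ranges:
    # Each mapper would issue its own bounded query against the database.
    print(f"SELECT * FROM orders WHERE id BETWEEN {lo} AND {hi}")
```

Even splits like this work well only when key values are spread evenly; a skewed key leaves some mappers with far more rows than others.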
Integrating Hadoop with Oracle
Database
Oracle Big Data Connectors
• Oracle Loader for Hadoop
– Offloads some pre-processing to Hadoop MR jobs (data
type conversion, partitioning, sorting).
– Direct load into the database (online method)
– Data Pump binary files in HDFS (offline method)
• These can then be accessed as external tables on
HDFS
• Oracle Direct Connector for Hadoop
– Create external table on files in HDFS
– Text files or Data Pump binary files
– WARNING: lots of data movement! Great for archiving
infrequently accessed data to HDFS
Oracle Big Data SQL
Source: http://www.slideshare.net/gwenshap/data-wrangling-and-oracle-connectors-for-hadoop
Oracle Big Data SQL
• Transparent access from Oracle DB
to Hadoop
– Oracle SQL dialect
– Oracle DB security model
– Join data from Hadoop and Oracle
• SmartScan - pushing code to data
– Same software base as on Exadata
Storage Cells
– Minimize data transfer from Hadoop to
Oracle
• Requires BDA and Exadata
• Licensed per Hadoop disk spindle
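The benefit of pushing code to data can be illustrated with a toy comparison. This is not the Big Data SQL API; it is a sketch with hypothetical rows and function names that contrasts shipping every row to the database with filtering and projecting next to the storage.

```python
# Hypothetical rows stored on the Hadoop side; one in ten is from "CA".
ROWS = [
    {"id": i, "country": "CA" if i % 10 == 0 else "US", "payload": "x" * 100}
    for i in range(1, 1001)
]

def scan_without_pushdown(rows):
    """Ship every row in full; the database filters afterwards."""
    shipped = list(rows)
    result = [{"id": r["id"]} for r in shipped if r["country"] == "CA"]
    return result, len(shipped)

def scan_with_pushdown(rows, predicate, columns):
    """Filter and project next to the data; ship only what the query needs."""
    shipped = [{c: r[c] for c in columns} for r in rows if predicate(r)]
    return shipped, len(shipped)

plain, moved_plain = scan_without_pushdown(ROWS)
pushed, moved_pushed = scan_with_pushdown(
    ROWS, lambda r: r["country"] == "CA", ["id"]
)
print(moved_plain, moved_pushed, plain == pushed)
```

Both scans return the same answer, but the second moves a tenth of the rows and none of the wide `payload` column.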
Big Data SQL Demo
Big Data SQL in Oracle tools
• Transparent to any app
• SQL Developer
• ODI
• OBIEE
Hadoop as Data Warehouse
Traditional Needs of Data Warehouses
• Speed of thought end user analytics experience
– BI tools coupled with DW databases
• Scalable data platform
– DW database
• Versatile and scalable data transformation
engine
– ETL tools sometimes coupled with DW databases
• Data quality control and audit
– ETL tools
What drives Hadoop adoption for
Data Warehousing?
1. Cost efficiency
2. Agility needs
Why is Hadoop Cost Efficient?
Hadoop leverages two main trends in the IT
industry
• Commodity hardware – high performance and
high capacity are available at a low price
• Commodity software – the open-source
phenomenon has made advanced software products
affordable to anyone
How Does Hadoop Enable Agility?
• Load first, structure later
– Don’t need to spend months changing DW to add
new types of data without knowing for sure it will be
valuable for end users
– Quick and easy to verify a hypothesis – perfect data
exploration platform
• All data in one place is very powerful
– Much easier to test new theories
• Natural fit for “unstructured” data
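A minimal sketch of "load first, structure later", assuming raw events arrive as JSON lines. The sample events and the `read_with_schema` helper are hypothetical; the idea is that the raw data is stored untouched and a schema is applied only when a question is asked.

```python
import json

# Raw events are stored exactly as they arrived; no schema was designed up front.
RAW_LINES = [
    '{"user": "ann", "action": "login", "ms": 120}',
    '{"user": "bob", "action": "search", "query": "hadoop"}',
    '{"user": "ann", "action": "search", "query": "oracle", "ms": 340}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time; missing fields become None."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)

# First hypothesis: who performs which action?
actions = list(read_with_schema(RAW_LINES, ["user", "action"]))

# A later question needs a field nobody planned for; no reload is required.
queries = [q for (q,) in read_with_schema(RAW_LINES, ["query"]) if q is not None]

print(actions)
print(queries)
```

In a traditional DW the second question would typically mean a schema change and a reload; here it is only a different projection over the same raw files.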
Traditional needs of DW & Hadoop
• Speed of thought end user analytics experience?
– Very recent features – Impala, Presto, Drill, Hadapt, etc.
– BI tools embracing Hadoop as DW
– Totally new products become available
• Scalable data platform?
– Yes
• Versatile and scalable data transformation engine?
– Yes but needs a lot of DIY
– ETL vendors embraced Hadoop
• Data quality control and audit?
– Hadoop makes it more difficult because of the flexibility it
brings
– A lot of DIY, but ETL vendors are getting better at supporting
Hadoop and new products are appearing
Unique Hadoop Challenges
• Still “young” technology
– requires a lot of high quality engineering talent
• Security doesn’t come out of the box
– Capabilities are there but very tedious to implement
and somewhat fragile
• Challenge of selecting the right tool for the job
– Hadoop ecosystem is huge
• Hadoop breaks IT silos
• Requires commoditization of IT operations
– Large footprint with agile deployments
Typical Hadoop adoption in modern
Enterprise IT
Data Warehouse
Hadoop
BI tools
Bring the world in your data center
Rare historical report
Find a needle in a haystack
Will Hadoop displace traditional DW
platforms?
Hadoop
BI tools
Example pure Hadoop DW stack
HDFS
Hive/Pig, Flume, Sqoop, DIY
Impala
Kerberos
Oozie + DIY
data sources
Do you have a Big Data
problem?
Your Data
is NOT
as BIG
as you think
Using 8-year-old hardware…
is NOT a Big Data problem
Misconfigured infrastructure…
is NOT a Big Data problem
Lack of a purging policy…
is NOT a Big Data problem
Bad data model design…
is NOT a Big Data problem
Bad SQL…
is NOT a Big Data problem
Your Data
is NOT
as BIG
as you think
Controversy…
Thanks and Q&A
Contact info
gorbachev@pythian.com
+1-877-PYTHIAN
To follow us
pythian.com/blog
@alexgorbachev
@pythian
linkedin.com/company/pythian

Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian, from Oracle OpenWorld IOUG Forum
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Eric Baldeschwieler
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersAdaryl "Bob" Wakefield, MBA
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoopinside-BigData.com
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summitOpen Analytics
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv larsgeorge
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architectureJoseph D'Antoni
 

Similar to Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle OpenWorld IOUG Forum (20)

Apache Hadoop Hive
Apache Hadoop HiveApache Hadoop Hive
Apache Hadoop Hive
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoop
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summit
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Apache drill
Apache drillApache drill
Apache drill
 
Big Data Telecom
Big Data TelecomBig Data Telecom
Big Data Telecom
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architecture
 

Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle OpenWorld IOUG Forum

  • 1. Bridging Oracle Database and Hadoop Alex Gorbachev October 2015
  • 2. Alex Gorbachev • Chief Technology Officer at Pythian • Blogger • Cloudera Champion of Big Data • OakTable Network member • Oracle ACE Director • Founder of BattleAgainstAnyGuess.com • Founder of Sydney Oracle Meetup • EVP, IOUG
  • 3. What is Big Data? and why Big Data today?
  • 4. Why Big Data boom now? • Advances in communication – it’s now feasible to transfer large amounts of data economically by anyone from virtually anywhere • Commodity hardware – high performance and high capacity at low price is available • Commodity software – open-source phenomena made advanced software products affordable to anyone • New data sources – mobile, sensors, social media data-sources • What’s been only possible at very high cost in the past, can now be done by any small or large business
  • 5. Big Data = Affordable at Scale
  • 6. Not everyone is Facebook, Google, Yahoo, etc. These guys had to push the envelope because traditional technology didn’t scale
  • 7. Not everyone is Facebook, Google, Yahoo, etc. These guys had to push the envelope because traditional technology didn’t scale Mere mortals’ challenge is cost and agility
  • 8. System capability per $ Big Data technology may be expensive at low scale due to high engineering efforts. Traditional technology becomes too complex and expensive to scale. (Chart: capabilities vs. investments, $; curves: traditional, Big Data)
  • 10. Hadoop Design Principle #1 Scalable Affordable Reliable Data Store HDFS – Hadoop Distributed Filesystem
  • 11. Hadoop Design Principle #2 Bring Code to Data Code Data
  • 12. Why is Hadoop so affordable? • Cheap hardware • Resiliency through software • Horizontal scalability • Open-source software
  • 13. How much does it cost? Oracle Big Data Appliance X5-2 rack - $525K list price • 18 data nodes • 648 CPU cores • 2.3 TB RAM • 216 x 4TB disks • 864TB of raw disk capacity • 288TB usable (triple mirror) • 40G InfiniBand + 10GbE networking • Cloudera Enterprise
  • 14. Hadoop is very flexible • Rich ecosystem of tools • Can handle any data format – Relational – Text – Audio, video – Streaming data – Logs – Non-relational structured data (JSON, XML, binary formats) – Graph data • Not limited to relational data processing
  • 15. Challenges with Hadoop for those of us used to Oracle • New data access tools – Relational and non-relational data • Non-Oracle (and non-ANSI) Hive SQL – Java-based UDFs and UDAFs • Security features are not there out-of-the-box • May be slow for “small data”
  • 16. Tables in Hadoop using Hadoop with relational data abstractions
  • 17. Apache Hive • Apache Hive provides a SQL layer over Hadoop – data in HDFS (structured or unstructured via SerDe) – using one of distributed processing frameworks – MapReduce, Spark, Tez • Presents data from HDFS as tables and columns – Hive metastore (aka data dictionary) • SQL language access (HiveQL) – Parses SQL and creates execution plans in MR, Spark or Tez • JDBC and ODBC drivers – Access from ETL and BI tools – Custom apps – Development tools
  • 18. Native Hadoop tools • Demo • HUE – HDFS files – Hive – Impala
  • 19. Access Hive using SQL Developer • Demo • Use Cloudera JDBC drivers • Query data & browse metadata • Run DDL from SQL tab • Create Hive table definitions inside Oracle DB
  • 20. Hadoop and OBIEE 11g • OBIEE 11.1.1.7 can query Hive/Hadoop as a data source – Hive ODBC drivers – Apache Hive Physical Layer database type • Limited features – OBIEE 11.1.1.7 has HiveServer1 ODBC drivers – HiveQL is only a subset of ANSI SQL • Hive query response time is too slow for speed-of-thought analysis
  • 21. ODI 12c • ODI – data transformation tool – ELT approach pushes transformations down to Hadoop - leveraging power of cluster – Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation • Upcoming support for Pig and Spark • Workflow orchestration • Metadata and model-driven • GUI workflow design • Transformation audit & data quality
  • 22. Moving Data to Hadoop using ODI • Interface with Apache Sqoop using IKM SQL to Hive-HBase-File knowledge module – Hadoop ecosystem tool – Able to run in parallel – Optimized Sqoop JDBC drivers integration for Oracle – Bi-directional in-and-out of Hadoop to RDBMS – Data is moved directly between Hadoop cluster and database • Export RDBMS data to file and load using IKM File to Hive
  • 23. Integrating Hadoop with Oracle Database
  • 24. Oracle Big Data Connectors • Oracle Loader for Hadoop – Offloads some pre-processing to Hadoop MR jobs (data type conversion, partitioning, sorting). – Direct load into the database (online method) – Data Pump binary files in HDFS (offline method) • These can then be accessed as external tables on HDFS • Oracle Direct Connector for Hadoop – Create external table on files in HDFS – Text files or Data Pump binary files – WARNING: lots of data movement! Great for archiving infrequently accessed data to HDFS
  • 25. Oracle Big Data SQL Source: http://www.slideshare.net/gwenshap/data-wrangling-and-oracle-connectors-for-hadoop
  • 26. Oracle Big Data SQL • Transparent access from Oracle DB to Hadoop – Oracle SQL dialect – Oracle DB security model – Join data from Hadoop and Oracle • SmartScan - pushing code to data – Same software base as on Exadata Storage Cells – Minimize data transfer from Hadoop to Oracle • Requires BDA and Exadata • Licensed per Hadoop disk spindle
  • 27. Big Data SQL Demo
  • 28. Big Data SQL in Oracle tools • Transparent to any app • SQL Developer • ODI • OBIEE
  • 29. Hadoop as Data Warehouse
  • 30. Traditional Needs of Data Warehouses • Speed of thought end user analytics experience – BI tools coupled with DW databases • Scalable data platform – DW database • Versatile and scalable data transformation engine – ETL tools sometimes coupled with DW databases • Data quality control and audit – ETL tools
  • 31. What drives Hadoop adoption for Data Warehousing?
  • 32. What drives Hadoop adoption for Data Warehousing? 1. Cost efficiency
  • 33. What drives Hadoop adoption for Data Warehousing? 1. Cost efficiency 2. Agility needs
  • 34. Why is Hadoop Cost Efficient? Hadoop leverages two main trends in IT industry • Commodity hardware – high performance and high capacity at low price is available • Commodity software – open-source phenomena made advanced software products affordable to anyone
  • 35. How Does Hadoop Enable Agility? • Load first, structure later – Don’t need to spend months changing DW to add new types of data without knowing for sure it will be valuable for end users – Quick and easy to verify hypothesis – perfect data exploration platform • All data in one place is very powerful – Much easier to test new theories • Natural fit for “unstructured” data
  • 36. Traditional needs of DW & Hadoop • Speed of thought end user analytics experience? – Very recent features – Impala, Presto, Drill, Hadapt, etc. – BI tools embracing Hadoop as DW – Totally new products become available • Scalable data platform? – Yes • Versatile and scalable data transformation engine? – Yes but needs a lot of DIY – ETL vendors embraced Hadoop • Data quality control and audit? – Hadoop makes it more difficult because of the flexibility it brings – A lot of DIY, but ETL vendors are getting better at supporting Hadoop and new products are appearing
  • 37. Unique Hadoop Challenges • Still “young” technology – requires a lot of high quality engineering talent • Security doesn’t come out of the box – Capabilities are there but very tedious to implement and somewhat fragile • Challenge of selecting the right tool for the job – Hadoop ecosystem is huge • Hadoop breaks IT silos • Requires commoditization of IT operations – Large footprint with agile deployments
  • 38. Typical Hadoop adoption in modern Enterprise IT Data Warehouse Hadoop BI tools
  • 39. Bring the world into your data center
  • 41. Find a needle in a haystack
  • 42. Will Hadoop displace traditional DW platforms? Hadoop BI tools
  • 43. Example pure Hadoop DW stack HDFS Hive/Pig Flume Sqoop DIY Impala Kerberos Oozie + DIY - data sources
  • 44. Do you have a Big Data problem?
  • 45. Your Data is NOT as BIG as you think
  • 46. is NOT a Big Data problem Using 8-year-old hardware…
  • 47. is NOT a Big Data problem Misconfigured infrastructure…
  • 48. is NOT a Big Data problem Lack of purging policy…
  • 49. is NOT a Big Data problem Bad data model design…
  • 50. is NOT a Big Data problem Bad SQL…
  • 51. Your Data is NOT as BIG as you think Controversy…
  • 52. Thanks and Q&A Contact info gorbachev@pythian.com +1-877-PYTHIAN To follow us pythian.com/blog @alexgorbachev @pythian linkedin.com/company/pythian
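
The capacity figures on slide 13 can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on that slide. The per-node and per-terabyte values are derived here and do not appear in the deck, and the cost figure is a rough list-price ratio that ignores compression, filesystem overhead and support costs.

```python
# Figures copied from slide 13 (Oracle Big Data Appliance X5-2 rack).
LIST_PRICE_USD = 525_000
DATA_NODES = 18
CPU_CORES = 648
DISKS = 216
DISK_TB = 4
COPIES = 3  # "triple mirror" on the slide: every block is stored three times

# Derived values (not stated on the slide, computed here for illustration).
raw_tb = DISKS * DISK_TB                 # total raw disk capacity
usable_tb = raw_tb // COPIES             # capacity left after keeping three copies
cores_per_node = CPU_CORES // DATA_NODES
disks_per_node = DISKS // DATA_NODES
usd_per_usable_tb = LIST_PRICE_USD / usable_tb

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB")
print(f"per node: {cores_per_node} cores, {disks_per_node} disks")
print(f"list price per usable TB: ${usd_per_usable_tb:,.0f}")
```

The raw and usable totals match the 864TB and 288TB quoted on the slide, which confirms that "usable" there simply means raw capacity divided by three.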

Editor's Notes

  1. WHERE clause evaluation, column projection, Bloom filters for better join performance, JSON parsing, data mining model evaluation.
  2. There is a lot of interesting data that is not generated by your company: listings of businesses in specific locations, connections in social media. The data may be unstructured, semi-structured or even structured, but it isn’t structured in the way your DWH expects and needs. We need a landing pad for cleanup, pre-processing, aggregating, filtering and structuring. Hadoop is perfect for this. Mappers can scrape data from websites efficiently. Map-reduce jobs clean up and process the data, and then load the results into your DWH.
  3. We want the top 3 items bought by left-handed women between the ages of 41 and 43 on November 15, 1998. How long will it take you to answer this question? For one of my customers, the answer is 25 minutes. As data grows older, it usually becomes less valuable to the business, and it gets aggregated and shelved off to tapes or other cheap storage. This means that for many organizations, answering detailed questions about events that happened more than a few months ago is impossible or at least very challenging. The business learned to never ask those questions, because the answer is “you can’t”. Hadoop combines cheap storage and massive processing power, which allows us to store the detailed history of our business and to generate reports about it. And once the answer to questions about history is “You will have your data in 25 minutes” instead of “impossible”, the questions turn out to be less rare than we assumed.
  4. 7 petabytes of log file data. 3 lines point to the security hole that allowed a break-in last week. Your DWH has aggregated information from the logs. Maybe. Hadoop is very cost-effective at storing data: lots of cheap disks, and it is easy to throw data in without pre-processing. Search the data when you need it.
  5. Bad schema design is not big data. Using 8-year-old hardware is not big data. Not having a purging policy is not big data. Not configuring your database and operating system correctly is not big data. Poor data filtering is not big data either. Keep the data you need and use, in a way that you can actually use it. If doing this requires cutting-edge technology, excellent! But don’t tell me you need NoSQL because you don’t purge data and have un-optimized PL/SQL running on 10-yo hardware.
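
The "top 3 items" question from the notes maps naturally onto the map and reduce steps that slide 11 hints at with "Bring Code to Data". What follows is a minimal single-process sketch in plain Python, not Hadoop API code. The record layout, field names and sample purchases are all invented for illustration. On a real cluster the map step would run on the nodes that hold the data blocks, and only the small per-item counts would travel over the network to the reducers.

```python
from collections import Counter
from datetime import date

TARGET_DAY = date(1998, 11, 15)


def rec(item, gender="F", handedness="left", age=42, day=TARGET_DAY):
    """Build one hypothetical purchase record (all field names are assumptions)."""
    return {"item": item, "gender": gender, "handedness": handedness,
            "age": age, "day": day}


# Made-up detail-level history; the last four records do not match the question.
PURCHASES = [
    rec("kettle"), rec("scissors"), rec("kettle"), rec("mug"),
    rec("kettle"), rec("scissors"),
    rec("mug", age=50),
    rec("pen", handedness="right"),
    rec("pen", gender="M"),
    rec("pen", day=date(1998, 11, 16)),
]


def map_purchase(record):
    """Map step: emit (item, 1) for a matching record, nothing otherwise."""
    if (record["gender"] == "F"
            and record["handedness"] == "left"
            and 41 <= record["age"] <= 43
            and record["day"] == TARGET_DAY):
        yield record["item"], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per item."""
    totals = Counter()
    for item, count in pairs:
        totals[item] += count
    return totals


def top_items(records, k=3):
    """Run map then reduce over all records and return the k best sellers."""
    pairs = (pair for record in records for pair in map_purchase(record))
    return reduce_counts(pairs).most_common(k)


print(top_items(PURCHASES))
```

The filter lives in the map step on purpose: discarding non-matching records where the data sits is what keeps the amount of data moved small, which is the same idea the deck attributes to SmartScan in Big Data SQL.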