Big Data Warehousing
January 20, 2014

Sponsored By:

Today’s Topic: Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Agenda
7:00 Networking (15 min)
Grab some food and a drink... Make some friends.

7:15 Welcome + Intro
Joe Caserta, President, Caserta Concepts (15 min)
About the Meetup, about Caserta Concepts

7:30 Elliott Cordo, Chief Architect, Caserta Concepts (20 min)
Hadoop 2.0: The Evolution of Hadoop, SQL, and NoSQL

7:50 Paul Dingman, Chief Technologist, Actian Innovation Lab (20 min)
Using Actian to process data in Hadoop
The latest features of Actian to enable maximum throughput

8:10 Tyler Mitchell, Senior Engineer, Actian Innovation Lab (35 min)
See how it works!

8:45 Q&A, More Networking (15 min)
Tell us what you’re up to…
About the BDW Meetup
• Big Data is a complex, rapidly changing landscape
• We want to share our stories and hear about yours
• Great networking opportunity for like-minded data nerds
• Opportunities to collaborate on exciting projects
• Founded by Caserta Concepts, DW, BI & Big Data Analytics Consulting
• Next BDW Meetup: February 10, 2014
  • Data Governance on Big Data with Cloudera
About Caserta Concepts
Focused Expertise
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Strategic Data Ecosystems

Industries Served
• Financial Services
• Healthcare / Insurance
• Retail / eCommerce
• Digital Media / Marketing
• K-12 / Higher Education

Founded in 2001
• President: Joe Caserta, industry thought leader,
consultant, educator and co-author, The Data
Warehouse ETL Toolkit (Wiley, 2004)
Implementation Expertise & Offerings
• Strategic Roadmap/Assessment/Consulting
• Big Data Analytics
• Storm
• Database
• BI/Visualization/Analytics
• Master Data Management
Client Portfolio
• Finance & Insurance
• Retail/eCommerce & Manufacturing
• Education & Services
Caserta Partners
• Hadoop Distributions
• Platforms/ETL
• Analytics & BI
Caserta Concepts
Listed as a Top 20 Most Promising
Data Analytics Consulting Company

CIOReview looked at hundreds of data analytics consulting companies and shortlisted the ones at the forefront of tackling real analytics challenges. A distinguished panel of CEOs, CIOs, VCs, industry analysts and the CIOReview editorial board selected the final 20.
Opportunities
Does this word cloud excite you?

Speak with us about our open positions: jobs@casertaconcepts.com
BIG DATA 2.0, EVOLUTION OF HADOOP, SQL, AND NOSQL
Elliott Cordo
Chief Architect, Caserta Concepts
Hadoop 1.0
WHAT DID WE ACHIEVE?
• Established Hadoop’s place in analytic architecture
• Realized cheap, reliable, scalable storage and processing
• Made us more data-driven
• Store and process anything: new data types, structured and unstructured
• New types of analysis, including machine learning (Mahout)
What did this mean for the Big Data Warehouse?
• Extending the Data Warehouse
  • Establishing new facts and “projections” in Hadoop on unstructured and high-volume data sources
  • Hive, Impala
  • Datameer
• BIG ETL: using MapReduce pipelines to process massive amounts of data
  • Using our favorite ELT tool, Pig
• Data storage for staging
• Reducing the costs and increasing the performance of our EDW
Where did it fall short?
• Pretty much only MapReduce
• Batch-oriented: not tuned for real-time or interactive processes. Look at what Impala achieved by side-stepping MapReduce for SQL queries on Hadoop
• Hive performance made users sad
• Legacy vendors were slow to adopt because of the massive paradigm shift it required in their product architecture
Hadoop 2.0: what is the big deal?
• YARN: “Yet Another Resource Negotiator”
• The JobTracker and TaskTracker have been split up
  • Increases scalability
  • Removes MapReduce from the core architecture
• Now there is a:
  • Global ResourceManager
  • Per-application ApplicationMaster (MapReduce has its own)
  • Per-node slave NodeManager (running per-application containers)
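To make the division of labor concrete, here is a toy, single-process Python model of the three roles above. It is not the YARN API (real YARN clients use Hadoop's Java libraries); the class names only mirror the YARN components, and the first-fit placement, node sizes and application names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NodeManager:
    """Per-node agent: tracks free resources and the containers it runs."""
    name: str
    memory_mb: int
    vcores: int
    containers: List[Tuple[str, int, int]] = field(default_factory=list)

    def can_fit(self, memory_mb: int, vcores: int) -> bool:
        return self.memory_mb >= memory_mb and self.vcores >= vcores

    def launch(self, app_id: str, memory_mb: int, vcores: int) -> None:
        # Reserve the resources and record the container for this application.
        self.memory_mb -= memory_mb
        self.vcores -= vcores
        self.containers.append((app_id, memory_mb, vcores))


class ResourceManager:
    """Global scheduler: grants container requests against node capacity."""

    def __init__(self, nodes: List[NodeManager]):
        self.nodes = nodes

    def allocate(self, app_id: str, count: int, memory_mb: int, vcores: int) -> List[str]:
        granted = []
        for _ in range(count):
            # First-fit placement (a simplification of YARN's pluggable schedulers).
            node = next((n for n in self.nodes if n.can_fit(memory_mb, vcores)), None)
            if node is None:
                break  # no capacity left; the rest of the request is not granted
            node.launch(app_id, memory_mb, vcores)
            granted.append(node.name)
        return granted


class ApplicationMaster:
    """Per-application coordinator (MapReduce, an ETL engine, a query engine...)."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm

    def request(self, tasks: int, memory_mb: int, vcores: int) -> List[str]:
        # Ask the global ResourceManager for one container per task.
        return self.rm.allocate(self.app_id, tasks, memory_mb, vcores)


rm = ResourceManager([NodeManager("node1", 8192, 4), NodeManager("node2", 8192, 4)])
mapreduce_am = ApplicationMaster("mapreduce-job", rm)
etl_am = ApplicationMaster("streaming-etl", rm)
print(mapreduce_am.request(tasks=3, memory_mb=2048, vcores=1))  # ['node1', 'node1', 'node1']
print(etl_am.request(tasks=3, memory_mb=4096, vcores=2))        # ['node2', 'node2']
```

Two different applications share one cluster through the same ResourceManager. In real YARN the ApplicationMaster itself runs inside a container, and schedulers such as the Capacity or Fair Scheduler add queues and priorities that this sketch omits.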
YARN – Why is it significant?
• Provides a management layer between applications and Hadoop
  • These applications could still be MapReduce
  • Or all sorts of other applications, such as streaming, ETL engines and new database engines, all running NATIVELY in Hadoop!
• These applications can access HDFS while being safely contained within cluster resources
  • 1st-generation Impala ran OUTSIDE of Hadoop and competed for cluster resources
• More intelligent use of cluster resources, not just slots… more productivity out of the same hardware
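The difference between slots and resource-based containers can be sketched with hypothetical numbers. The functions below are illustrative only: the slot model assumes a Hadoop 1.x TaskTracker with fixed, non-interchangeable map and reduce slots, and the container model assumes memory is the only constraint on a single node.

```python
from typing import List


def concurrent_tasks_slots(map_tasks: int, reduce_tasks: int,
                           map_slots: int, reduce_slots: int) -> int:
    """Hadoop 1.x-style: fixed map and reduce slots that cannot be swapped."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def concurrent_tasks_containers(task_memory_mb: List[int], node_memory_mb: int) -> int:
    """YARN-style sketch: any task runs wherever memory remains (vcores ignored)."""
    running, used = 0, 0
    for mb in sorted(task_memory_mb):  # smallest first packs the most tasks
        if used + mb > node_memory_mb:
            break
        used += mb
        running += 1
    return running


# A map-only job with 10 tasks of 1 GB each on one hypothetical node.
print(concurrent_tasks_slots(map_tasks=10, reduce_tasks=0, map_slots=4, reduce_slots=4))  # 4
print(concurrent_tasks_containers([1024] * 10, node_memory_mb=8192))                     # 8
```

With a map-only job, the four reduce slots sit idle under the slot model, whereas containers let the same memory run eight map tasks at once.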
Why is it important that we are moving beyond MapReduce?
• MapReduce is a generalized computing framework
• A query engine, for instance, can benefit from a “non-generalized” pattern; it does not need all of that flexibility and can specialize instead:
  • In-memory/disk data access
  • Index usage
  • Serialization
  • Shuffling/data movement
  → Again, look at the Impala approach
• MapReduce is not well suited to other tasks such as real-time stream processing, iterative machine learning and graph processing
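A minimal single-process sketch of that contrast: `mapreduce` pushes a computation through map, serialization, shuffle and reduce steps, while `hash_aggregate` is a purpose-built GROUP BY that makes one in-memory pass. Both names are hypothetical, `pickle` merely stands in for spilling intermediate data, and this is not how Impala or any specific engine is implemented.

```python
import pickle
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Iterable, List, Tuple


def mapreduce(records: Iterable, mapper: Callable, reducer: Callable) -> Dict:
    """Generalized shape: map, serialize intermediate pairs, shuffle by key, reduce."""
    mapped = chain.from_iterable(mapper(r) for r in records)
    # Round-trip through pickle to stand in for spilling/serializing map output.
    spilled = [pickle.loads(pickle.dumps(kv)) for kv in mapped]
    groups: Dict = defaultdict(list)
    for key, value in spilled:  # shuffle: bring all values for a key together
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def hash_aggregate(records: Iterable, key_fn: Callable, value_fn: Callable) -> Dict:
    """Purpose-built GROUP BY ... SUM: one in-memory pass, no intermediate pairs."""
    totals: Dict = defaultdict(int)
    for r in records:
        totals[key_fn(r)] += value_fn(r)
    return dict(totals)


sales: List[Tuple[str, int]] = [("east", 10), ("west", 5), ("east", 7)]
print(mapreduce(sales, lambda r: [(r[0], r[1])], lambda k, vs: sum(vs)))  # {'east': 17, 'west': 5}
print(hash_aggregate(sales, lambda r: r[0], lambda r: r[1]))              # {'east': 17, 'west': 5}
```

Both produce the same answer; the specialized path simply skips the phases a query engine does not need.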
ETL can benefit from this approach too!
• ETL has a broader scope than query engines, but gains can still be made from a purpose-built processing framework
• Batch is not the only way! Streaming apps can now interact with HDFS and be managed by cluster resources
  • Storm
  • Spark
• Existing assets: SIGNIFICANT existing IP from both open source and commercial software can be leveraged more easily
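As an illustration of processing that does not wait for a batch, the hypothetical function below computes tumbling-window counts over an event stream and emits each window as soon as it closes. It is a plain-Python sketch, not the Storm or Spark API, and it assumes events arrive in timestamp order (no late data).

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(events: Iterable[Tuple[int, str]],
                           window_s: int) -> Iterator[Tuple[int, Counter]]:
    """Emit per-window counts as each window closes; assumes time-ordered events."""
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_s  # align the timestamp to its window start
        if current_start is not None and start != current_start:
            yield current_start, counts  # window closed: emit without waiting for a batch
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window at end of input


clicks = [(0, "click"), (3, "view"), (7, "click"), (12, "click")]
for start, counts in tumbling_window_counts(clicks, window_s=5):
    print(start, dict(counts))
# 0 {'click': 1, 'view': 1}
# 5 {'click': 1}
# 10 {'click': 1}
```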
Back to Query Engines
MPP: Massively Parallel Processing, i.e. scalable, distributed processing engines
• The underlying storage is typically columnar (performance, compression, easier data distribution)
• They present themselves relationally and handle all the brutal work of aggregations and joins through ANSI-compliant SQL
• Impala, HAWQ: the industry is essentially taking the approach of building MPPs on Hadoop
  • Columnar storage: ORC, Parquet, proprietary formats
  • Advanced query optimizations
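Why columnar storage helps can be sketched in a few lines: pivoting rows into per-column lists lets an aggregate read only the column it needs, and a sorted, low-cardinality column compresses well with run-length encoding. The helpers and sample rows are hypothetical; real formats such as ORC and Parquet combine several encodings (dictionary, run-length and others) with far more machinery.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_columns(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one contiguous list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Store each run of repeated values once, as (value, run_length)."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def run_length_decode(pairs: List[Tuple[Any, int]]) -> List[Any]:
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in pairs for _ in range(n)]


rows = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 7},
    {"region": "east", "product": "a", "amount": 3},
    {"region": "west", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 8},
]
columns = to_columns(rows)
# SELECT SUM(amount) only needs to touch the "amount" column.
print(sum(columns["amount"]))                # 33
# A sorted, low-cardinality column compresses to a handful of runs.
print(run_length_encode(columns["region"]))  # [('east', 3), ('west', 2)]
```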
MPPs leveraging dedicated storage
• Modern MPPs like Actian’s Matrix are also taking advantage of Hadoop
  • Tight integration with Hadoop infrastructure
  • On-demand integration
  • Developing tools and frameworks that leverage YARN for heavy lifting: ETL
So, about NoSQL
• In “Big Data 1.0”, NoSQL found its place as a mainstream analytic store:
  • Cassandra
  • HBase
  • Redis
  • Riak
• They gave us raw, unbeatable performance for handling real-time analytic workloads
NoSQL use cases: BDW
• Highly scalable and flexible staging and ODS layers
• High-performance analytic store: real-time data analytic systems
• Recommendations, customer profile data: web-facing performance characteristics, flexible schema
• BIG ETL components: reference data lookup cache, stream joins
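The “reference data lookup cache, stream joins” pattern can be sketched as a read-through cache in front of a key-value store, used to enrich events one at a time. `LookupCache` and `enrich` are hypothetical names, the `fetch` callable stands in for a point lookup against a store such as HBase or Cassandra (no real client API is used), and the cache has no eviction or expiry.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, object]


class LookupCache:
    """Read-through cache in front of a reference-data store (e.g. a NoSQL table)."""

    def __init__(self, fetch: Callable[[str], Optional[Record]]):
        self._fetch = fetch  # hypothetical point lookup against the store
        self._cache: Dict[str, Optional[Record]] = {}
        self.misses = 0

    def get(self, key: str) -> Optional[Record]:
        if key not in self._cache:
            self.misses += 1  # only cache misses reach the backing store
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def enrich(events: Iterable[Record], cache: LookupCache, key: str) -> Iterator[Record]:
    """Stream join: attach reference attributes to each event as it arrives."""
    for event in events:
        reference = cache.get(str(event[key])) or {}
        yield {**event, **reference}


customers = {"c1": {"segment": "gold"}, "c2": {"segment": "silver"}}
cache = LookupCache(customers.get)  # a dict stands in for the NoSQL store
events = [
    {"customer_id": "c1", "amount": 30},
    {"customer_id": "c2", "amount": 5},
    {"customer_id": "c1", "amount": 12},
]
for row in enrich(events, cache, "customer_id"):
    print(row)
print(cache.misses)  # 2: the second "c1" event is served from the cache
```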
2.0 NoSQL Evolutions
SQL!!!
• Easier adoption
• Standardizing interfaces
  • Cassandra CQL3
  • Phoenix on HBase
Evolving
• Greater flexibility in in-memory/disk persistence
• In-memory will also likely usher in more flexibility for server-side processes: MapReduce, aggregation, joins
• Analytic support
So... in conclusion
HADOOP IS THE NEW DATA OS?
What we have:
• A distributed file system
• A robust multitenant resource manager
• A generalized framework for distributed computing and data processing
Even greater mainstream adoption of NoSQL
SQL rules!
THANK YOU
Elliott Cordo (elliott@casertaconcepts.com)
Chief Architect, Caserta Concepts

 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 

KĂźrzlich hochgeladen

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

KĂźrzlich hochgeladen (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop

  • 1. Big Data Warehousing January 20, 2014 Sponsored By: Today’s Topic: Big Data 2.0: YARN Distributed ETL & SQL with Hadoop
  • 2. Agenda: 7:00 Networking (15 min): Grab some food and a drink... Make some friends. 7:15 Welcome + Intro: Joe Caserta, President, Caserta Concepts (15 min): About the Meetup, about Caserta Concepts. 7:30 Elliott Cordo, Chief Architect, Caserta Concepts (20 min): Hadoop 2.0: The Evolution of Hadoop, SQL, and NoSQL. 7:50 Paul Dingman, Chief Technologist, Actian Innovation Lab (20 min): Using Actian to process data in Hadoop; the latest features of Actian to enable maximum throughput. 8:10 Tyler Mitchell, Senior Engineer, Actian Innovation Lab (35 min): See how it works! 8:45 Q&A, More Networking (15 min): Tell us what you’re up to…
  • 3. About the BDW Meetup • Big Data is a complex, rapidly changing landscape • We want to share our stories and hear about yours • Great networking opportunity for like-minded data nerds • Opportunities to collaborate on exciting projects • Founded by Caserta Concepts, DW, BI & Big Data Analytics Consulting • Next BDW Meetup: February 10, 2014 • Data Governance on Big Data with Cloudera
  • 4. About Caserta Concepts. Focused Expertise: • Big Data Analytics • Data Warehousing • Business Intelligence • Strategic Data Ecosystems. Industries Served: • Financial Services • Healthcare / Insurance • Retail / eCommerce • Digital Media / Marketing • K-12 / Higher Education. Founded in 2001 • President: Joe Caserta, industry thought leader, consultant, educator and co-author, The Data Warehouse ETL Toolkit (Wiley, 2004)
  • 5. Implementation Expertise & Offerings: • Strategic Roadmap/Assessment/Consulting • Big Data Analytics • Storm • Database • BI/Visualization/Analytics • Master Data Management
  • 6. Client Portfolio: • Finance & Insurance • Retail/eCommerce & Manufacturing • Education & Services
  • 8. Caserta Concepts Listed as a Top 20 Most Promising Data Analytics Consulting Company. CIOReview looked at hundreds of data analytics consulting companies and shortlisted the ones that are at the forefront of tackling the real analytics challenges. A distinguished panel comprising CEOs, CIOs, VCs, industry analysts and the editorial board of CIOReview selected the Final 20.
  • 9. Opportunities Does this word cloud excite you? Speak with us about our open positions: jobs@casertaconcepts.com
  • 10. BIG DATA 2.0, EVOLUTION OF HADOOP, SQL, AND NOSQL Elliott Cordo Chief Architect, Caserta Concepts
  • 11. Hadoop 1.0: What did we achieve? • Established Hadoop’s place in analytic architecture • Realized cheap, reliable, scalable storage and processing • Made us more data driven • Store and process anything → new data types, structured and unstructured • New types of analysis, including machine learning → Mahout
  • 12. What did this mean for the Big Data Warehouse? • Extending the Data Warehouse • Establish new facts and “projections” in Hadoop on unstructured and high-volume data sources • Hive, Impala • Datameer • BIG ETL: using MapReduce pipelines to process massive amounts of data → using our favorite ELT tool, Pig • Data storage for staging • Reducing the costs and increasing the performance of our EDW
  • 13. Where did it fall short? • Pretty much only MapReduce • Batch oriented, not tuned to real-time or interactive processes → look at what Impala achieved by side-stepping MapReduce for SQL queries on Hadoop • Hive performance made users sad • Legacy vendors were slow to adopt because of the massive paradigm shift it required in their product architecture
  • 14. Hadoop 2.0: what is the big deal? • YARN, “Yet Another Resource Negotiator” • The JobTracker and TaskTracker roles have been split up • Increased scalability • MapReduce removed from the core architecture • Now there is a: • Global ResourceManager • Per-application ApplicationMaster (MapReduce will have its own) • Per-node slave NodeManager (with per-application containers)
  • 15. YARN: why is it significant? • Provides a management layer between applications and Hadoop • These applications could still be MapReduce • Or all sorts of applications, such as streaming, ETL engines, and new database engines, all running NATIVELY in Hadoop! • These applications can access HDFS while being safely contained by cluster resource management • 1st-generation Impala ran OUTSIDE of Hadoop and competed with the cluster for resources • More intelligent use of cluster resources → not just slots… more productivity out of the same hardware
  • 16. Why is it important that we are moving beyond MapReduce? • MapReduce is a generalized computing framework • A query engine, for instance, can benefit from a “non-generalized” pattern; it doesn’t need the full flexibility • In-memory/disk data access • Index usage • Serialization • Shuffling/data movement → again, look at the Impala approach • MapReduce is not well suited to other tasks such as real-time stream processing, iterative machine learning, and graph processing
  • 17. ETL can benefit from this approach too! • ETL has a broader scope than query engines, but gains can be made from a purpose-built processing framework • Batch is not the only way! Streaming apps can now interact with HDFS and be managed by cluster resources • Storm • Spark • Existing assets: SIGNIFICANT existing IP from both open source and commercial software can be leveraged more easily
  • 18. Back to query engines. MPP (Massively Parallel Processing): scalable, distributed processing engines • Typically the underlying storage is columnar in nature (performance, compression, easier to distribute data) • They present themselves relationally and handle all the brutal work of aggregation and joins → ANSI-compliant SQL • Impala, HAWQ: the industry is really just taking the approach of building MPPs on Hadoop • Columnar storage: ORC, Parquet, proprietary • Advanced query optimizations
  • 19. MPPs leveraging dedicated storage • Modern MPPs like Actian’s Matrix are also taking advantage of Hadoop • Building tight integrations with Hadoop infrastructure • On-demand integration • Developing tools and frameworks that leverage YARN for heavy lifting → ETL
  • 20. So, about NoSQL • In “Big Data 1.0”, NoSQL found its place as a mainstream analytic store: • Cassandra • HBase • Redis • Riak • They gave us raw, unbeatable performance for handling real-time analytic workloads
  • 21. NoSQL use cases: BDW • Highly scalable and flexible staging and ODS layers • High-performance analytic store → real-time data analytic systems • Recommendation, customer profile data → web-facing performance characteristics, flexible schema • BIG ETL components → reference data lookup cache, stream joins
  • 22. 2.0 NoSQL evolutions. SQL!!! • Easier adoption • Standardizing interfaces: Cassandra CQL3, Phoenix on HBase. Evolving • Greater flexibility in in-memory/disk persistence • In-memory will also likely usher in more flexibility for server-side processes: MapReduce, aggregation, joins • Analytic support
  • 23. So... in conclusion: HADOOP IS THE NEW DATA OS? What we have: • A distributed file system • A robust multitenant resource manager • A generalized framework for distributed computing and data processing. Even greater mainstream adoption of NoSQL. SQL rules!
  • 24. THANK YOU Elliott Cordo (elliott@casertaconcepts.com) Chief Architect, Caserta Concepts
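Slides 14 and 15 describe the YARN roles: a global ResourceManager, a per-application ApplicationMaster, and per-node NodeManagers that hand out containers. These roles can be sketched as a toy single-process model. This is an illustration under stated assumptions, not the Hadoop YARN Java API. Every class and method name below is hypothetical, resources are reduced to memory in MB, and the placement policy (pick the node with the most free memory) is a simplification rather than YARN's actual schedulers.

```python
from dataclasses import dataclass

# Toy model only: these classes are hypothetical stand-ins for YARN roles,
# not the org.apache.hadoop.yarn API. Resources are simplified to memory (MB).


@dataclass
class NodeManager:
    """Per-node agent: tracks memory available for containers on one host."""
    name: str
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


@dataclass
class Container:
    """A slice of one node's resources granted to one application."""
    app_id: str
    node: str
    memory_mb: int


class ResourceManager:
    """Global scheduler shared by every application on the cluster."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def allocate(self, app_id, memory_mb):
        # Simplified placement: the node with the most free memory wins.
        node = max(self.nodes.values(), key=lambda n: n.free_mb())
        if node.free_mb() < memory_mb:
            return None  # no room anywhere; the request would have to wait
        node.used_mb += memory_mb
        return Container(app_id, node.name, memory_mb)

    def release(self, container):
        self.nodes[container.node].used_mb -= container.memory_mb


class ApplicationMaster:
    """Per-application coordinator that negotiates containers with the RM."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.containers = []

    def request(self, count, memory_mb):
        """Ask for containers; returns how many this application now holds."""
        for _ in range(count):
            container = self.rm.allocate(self.app_id, memory_mb)
            if container is None:
                break
            self.containers.append(container)
        return len(self.containers)

    def finish(self):
        # Hand every container back so other applications can use the memory.
        for container in self.containers:
            self.rm.release(container)
        self.containers.clear()


rm = ResourceManager([NodeManager("node1", 8192), NodeManager("node2", 8192)])
mapreduce_job = ApplicationMaster("mapreduce-job", rm)
sql_engine = ApplicationMaster("mpp-query", rm)

print(mapreduce_job.request(count=3, memory_mb=2048))  # 3
print(sql_engine.request(count=6, memory_mb=2048))     # 5: cluster is now full
mapreduce_job.finish()
print(sql_engine.request(count=1, memory_mb=2048))     # 6: freed memory is reused
```

The sketch illustrates the point slide 15 makes. MapReduce becomes just one tenant. A SQL engine asking for containers draws on the same pool, under the same manager, instead of running outside the cluster and competing with it.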
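Slide 16 argues that MapReduce is a generalized framework and that a query engine does not need all of that flexibility. A toy `SUM ... GROUP BY` makes this concrete. The sketch below assumes an in-memory list of hypothetical `(region, amount)` records and runs the same aggregation twice. The first run goes through a generic map → shuffle/sort → reduce pipeline; the second is a single-pass hash aggregation of the kind a purpose-built SQL engine can use. Neither function is Hadoop code; both only model the shape of the work.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records: (region, amount).
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]


def map_reduce(records, mapper, reducer):
    """Generic MapReduce: map every record, shuffle (sort + group), reduce."""
    mapped = [kv for record in records for kv in mapper(record)]
    mapped.sort(key=itemgetter(0))  # stands in for the cluster-wide shuffle/sort
    return {
        key: reducer(key, [value for _, value in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    }


def hash_aggregate(records):
    """Purpose-built SUM ... GROUP BY: one pass, no sort, no intermediate pairs."""
    totals = defaultdict(int)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


mr_totals = map_reduce(
    sales,
    mapper=lambda record: [(record[0], record[1])],
    reducer=lambda key, values: sum(values),
)
print(mr_totals)              # {'east': 17, 'north': 3, 'west': 6}
print(hash_aggregate(sales))  # {'east': 17, 'west': 6, 'north': 3}
```

Both paths produce the same totals; what differs is the work in between. The generic path materializes every key/value pair and sorts the pairs, and on a real cluster that sort is the shuffle across the network. The specialized path keeps one running total per key.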
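Slide 18 describes the MPP pattern: spread the data across workers, let each worker do the brutal aggregation work locally, then combine the results. This can be sketched as two-phase aggregation, with several assumptions. Threads stand in for MPP nodes, rows are hypothetical `(region, amount)` pairs distributed round-robin, and `mpp_avg` is an illustrative name rather than any engine's API. The sketch shows why such engines combine partial `SUM` and `COUNT` values instead of partial averages. Averages of slices cannot be merged correctly, but sums and counts can.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Distribute rows round-robin across n workers, one slice per worker."""
    return [rows[i::n] for i in range(n)]


def partial_aggregate(slice_rows):
    """Phase 1, run on every worker: local SUM and COUNT per group key."""
    sums, counts = Counter(), Counter()
    for key, value in slice_rows:
        sums[key] += value
        counts[key] += 1
    return sums, counts


def mpp_avg(rows, workers=3):
    """Phase 2, run on the coordinator: merge partials, then compute AVG."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, partition(rows, workers)))
    total_sums, total_counts = Counter(), Counter()
    for sums, counts in partials:
        total_sums.update(sums)  # Counter.update adds counts, it does not overwrite
        total_counts.update(counts)
    return {key: total_sums[key] / total_counts[key] for key in sorted(total_sums)}


# Hypothetical sample rows: (region, amount).
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
print(mpp_avg(sales))  # {'east': 8.5, 'north': 3.0, 'west': 3.0}
```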
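Slide 18 also notes that MPP storage is typically columnar, for performance and compression. A minimal sketch of both effects assumes a handful of hypothetical sales rows held in memory. Pivoting rows into columns lets a query such as `SUM(amount)` read one column instead of every field. Storing each column contiguously also makes runs of repeated values easy to compress. The run-length encoding here is a simplified stand-in for the layered encodings that formats such as ORC and Parquet apply, not their actual on-disk layout.

```python
from itertools import groupby

# Hypothetical sample rows, already sorted by date.
rows = [
    {"date": "2014-01-20", "region": "east", "amount": 10},
    {"date": "2014-01-20", "region": "east", "amount": 7},
    {"date": "2014-01-20", "region": "west", "amount": 5},
    {"date": "2014-01-21", "region": "east", "amount": 3},
    {"date": "2014-01-21", "region": "west", "amount": 1},
    {"date": "2014-01-21", "region": "west", "amount": 4},
]


def to_columns(records):
    """Pivot row-oriented records into one contiguous list per column."""
    return {name: [r[name] for r in records] for name in records[0]}


def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


columns = to_columns(rows)

# SUM(amount) touches only the "amount" column, not the date or region values.
print(sum(columns["amount"]))         # 30

# A sorted, low-cardinality column collapses to a couple of runs.
print(rle_encode(columns["date"]))    # [('2014-01-20', 3), ('2014-01-21', 3)]
print(rle_encode(columns["region"]))  # [('east', 2), ('west', 1), ('east', 1), ('west', 2)]
```

The region column compresses less well than the date column because its values are not sorted. This is one reason columnar engines pay attention to sort order when loading data.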
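Slide 21 lists reference data lookup caches and stream joins among the BIG ETL uses for NoSQL. The sketch below shows the idea under several assumptions. A Python dict (`CUSTOMER_STORE`, hypothetical) stands in for a NoSQL store such as HBase, Cassandra or Redis. `functools.lru_cache` plays the role of the lookup cache, so repeated keys in the stream do not hit the store again. Each incoming event is enriched as it arrives, which amounts to an inner join between the stream and the reference data.

```python
from functools import lru_cache

# Hypothetical reference data; in practice this would be a NoSQL table.
CUSTOMER_STORE = {
    "c1": ("Acme", "retail"),
    "c2": ("Globex", "finance"),
}
store_reads = 0  # counts how often the backing store is actually queried


@lru_cache(maxsize=1024)
def lookup_customer(customer_id):
    """Read-through cache: only the first lookup per key reaches the store."""
    global store_reads
    store_reads += 1
    return CUSTOMER_STORE.get(customer_id)  # None for unknown customers (also cached)


def enrich(events):
    """Stream join: attach reference attributes to each event as it arrives."""
    for event in events:
        ref = lookup_customer(event["customer_id"])
        if ref is None:
            continue  # inner-join semantics: drop events with no match
        name, segment = ref
        yield {**event, "name": name, "segment": segment}


# Hypothetical event stream.
events = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 5},
    {"customer_id": "c1", "amount": 7},
    {"customer_id": "c9", "amount": 1},
]
joined = list(enrich(events))
print(len(joined), store_reads)  # 3 3
```

A production pipeline would also need a policy for stale cache entries and for unmatched keys, for example routing them to an error stream instead of dropping them. Both are left out to keep the sketch short.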