The Economics of SQL on Hadoop

© 2013 Datameer, Inc. All rights reserved.
Watch the Recording of this Webinar


View the entire recorded webinar at:

http://info.datameer.com/SlideshareEconomics-SQL-Hadoop.html
About our Speakers
John Myers
John Myers joined Enterprise Management Associates
in 2011 as senior analyst of the business intelligence
(BI) practice area. John has 10+ years of experience
working in areas related to business analytics in
professional services consulting and product
development roles, as well as helping organizations
solve their business analytics problems, whether they
relate to operational platforms, such as customer care
or billing, or applied analytical applications, such as
revenue assurance or fraud management.

Stefan Groschupf
Stefan Groschupf is the co-founder and CEO of
Datameer. One of the original contributors to
Nutch, the open source predecessor of Hadoop,
Stefan has been at the forefront of the Hadoop and
Big Data market.
Prior to Datameer, Stefan was the co-founder and
CEO of Scale Unlimited, which implemented
custom Hadoop analytic solutions for HP, Sun,
Deutsche Telekom, Nokia and others. Earlier,
Stefan was CEO of 101Tec, a supplier of Hadoop
and Nutch-based search and text classification
software to industry-leading companies such as
Apple, DHL and EMI Music. Stefan has also served
as CTO at multiple companies, including Sproose,
a social search engine company.

Matt Schumpert
Matt has been working in enterprise software for
over 10 years in various capacities, including sales
engineering, strategic alliances and consulting.

Matt currently runs the pre-sales engineering team
at Datameer, supporting all technical aspects of
customer engagement through roll-out of customers
into production.

Matt holds a BS in Computer Science from the
University of Virginia.

Agenda
▪  EMA on Current State of the Big Data Industry
–  Online Archiving in Practice
–  SQL on NoSQL: Metadata
–  Exploratory Use Cases
–  Late Binding Schemas better for Discovery
–  Economics of Hadoop

▪  Datameer on how to solve these problems
–  Use Case #1: Semi-Structured Data
–  Use Case #2: Text Analytics
–  Use Case #3: Path Analysis

▪  Takeaways, and Question and Answer

State of Big Data Industry

Online Archiving is the majority use case for Big Data projects

© 2013 Enterprise Management Associates, Inc.
Moving Beyond select * from tablename
SQL requires a managed set of metadata

Big Data Platforms have Multiple Uses:
Discovery is a significant portion

Late Binding Schemas are good for Discovery

Free as in a free puppy…

Datameer Demos

Use Case #1: Semi-Structured Data

▪  Noisy, log-structured data → signal
▪  Extract, cast, & define fields on demand
▪  Painful/impossible without inspection
▪  “One-offs” are possible with SQL+UDFs
▪  But better to collaborate with shared “views”

▪  Examples:
–  “User-agent” string
–  URL Parameters
–  JSON
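
As a rough illustration of extracting, casting, and defining fields on demand, the sketch below parses one log line at read time using only the Python standard library. The log layout, the field names, and the parse_line helper are assumptions made up for this example; they are not taken from the webinar or from any Datameer API.

```python
import json
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical log line: timestamp, method, URL, quoted user-agent, JSON payload.
LINE = ('2013-10-01T12:00:00Z GET /search?q=hadoop&page=2 '
        '"Mozilla/5.0 (Windows NT 6.1) Chrome/30.0" {"user_id": "42", "ms": "187"}')

# Assumed layout; a real log needs its own pattern, found by inspecting samples.
PATTERN = re.compile(r'^(\S+) (\S+) (\S+) "([^"]*)" (\{.*\})$')


def parse_line(line):
    """Apply a schema at read time: extract, cast, and name fields."""
    match = PATTERN.match(line)
    if match is None:
        return None  # noisy line that does not fit the assumed layout
    ts, method, url, agent, payload = match.groups()
    parts = urlsplit(url)
    params = parse_qs(parts.query)   # URL parameters -> dict of lists
    body = json.loads(payload)       # JSON payload -> dict
    return {
        "timestamp": ts,
        "method": method,
        "path": parts.path,
        "query": params.get("q", [None])[0],
        "page": int(params.get("page", ["1"])[0]),      # cast string to int
        "browser": "Chrome" if "Chrome/" in agent else "other",
        "user_id": int(body["user_id"]),
        "latency_ms": int(body["ms"]),
    }


record = parse_line(LINE)
print(record)
```

The raw line stays untouched in storage; only the reading code decides which fields exist and what types they have, so a new field can be added later without reloading the data.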

Use Case #2: Text Analytics
▪  Few/no known fields
▪  Notion of a record is nebulous / fluid
▪  Wrangling and mining
▪  “Bag-of-Words” is a sensible start
▪  Again, frequent inspection is key
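
A bag-of-words pass can be sketched in a few lines of Python. The sample documents and the tokenizing rule (lower-case, letters and apostrophes only) are assumptions chosen for illustration; real text usually needs more careful wrangling, which is where frequent inspection comes in.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, pull out runs of letters, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical free-text records with no predefined fields.
docs = [
    "The cluster is slow. The job failed again.",
    "Job finished; cluster healthy.",
]

total = Counter()
for doc in docs:
    total.update(bag_of_words(doc))

# Inspect the most frequent tokens before deciding on further cleanup.
print(total.most_common(5))
```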

Use Case #3: Path Analysis 
▪  Key component of clickstream analysis
▪  Compares each record to the next/previous
▪  Defines/summarizes transitions, not events
▪  Supported by list/array types
▪  Requires multi-pass queries
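
The points above can be sketched as a two-pass computation in Python: the first pass builds an ordered list of pages per user, the second compares each record with the next one and counts transitions. The event tuples and the count_transitions helper are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream events: (user, timestamp, page), in arbitrary order.
events = [
    ("u1", 3, "checkout"), ("u1", 1, "home"), ("u1", 2, "cart"),
    ("u2", 1, "home"), ("u2", 2, "search"), ("u2", 3, "cart"),
    ("u3", 1, "home"), ("u3", 2, "cart"),
]


def count_transitions(events):
    # Pass 1: collect each user's pages into a list ordered by timestamp.
    paths = defaultdict(list)
    for user, ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        paths[user].append(page)
    # Pass 2: pair each page with the next one to turn events into transitions.
    transitions = Counter()
    for pages in paths.values():
        transitions.update(zip(pages, pages[1:]))
    return transitions


transitions = count_transitions(events)
for (src, dst), n in transitions.most_common():
    print(f"{src} -> {dst}: {n}")
```

The per-user list plays the role of the list/array type, and the result summarizes transitions between pages rather than individual events.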

Takeaways

When NOT to use SQL on Hadoop
▪  Structured schemas, or “Schema on Write”
▪  “Realtime” query SLAs for operational or reporting tasks
▪  Highly detailed SQL query requirements (SQL-2003)

When to use SQL on Hadoop
▪  Unstructured datasets and “Schema on Read”
▪  Discovery tasks designed to find new connections and new business value
▪  Lower level SQL queries (SQL-99)

Summary
▪  EMA on Current State of the Big Data Industry
–  Online Archiving in Practice
–  SQL on NoSQL: Metadata
–  Exploratory Use Cases
–  Late Binding Schemas better for Discovery

▪  Datameer on how to solve these problems
–  Use Case #1: Semi-Structured Data
–  Use Case #2: Text Analytics
–  Use Case #3: Path Analysis

Call To Action
■  Visit our website
–  www.datameer.com

■  Download our Trial
–  http://www.datameer.com/Datameer-trial.html

  • 2. Watch the Recording of this Webinar View the entire recorded webinar at: http://info.datameer.com/SlideshareEconomics-SQL-Hadoop.html
  • 3. About our Speakers: John Myers. John Myers joined Enterprise Management Associates in 2011 as senior analyst of the business intelligence (BI) practice area. John has 10+ years of experience working in areas related to business analytics in professional services consulting and product development roles, as well as helping organizations solve their business analytics problems, whether they relate to operational platforms, such as customer care or billing, or applied analytical applications, such as revenue assurance or fraud management.
  • 4. About our Speakers: Stefan Groschupf. Stefan Groschupf is the co-founder and CEO of Datameer. One of the original contributors to Nutch, the open source predecessor of Hadoop, Stefan has been at the forefront of the Hadoop and Big Data market. Prior to Datameer, Stefan was the co-founder and CEO of Scale Unlimited, which implemented custom Hadoop analytic solutions for HP, Sun, Deutsche Telekom, Nokia and others. Earlier, Stefan was CEO of 101Tec, a supplier of Hadoop and Nutch-based search and text classification software to industry-leading companies such as Apple, DHL and EMI Music. Stefan has also served as CTO at multiple companies, including Sproose, a social search engine company.
  • 5. About our Speakers: Matt Schumpert. Matt has been working in enterprise software for over 10 years in various capacities, including sales engineering, strategic alliances and consulting. Matt currently runs the pre-sales engineering team at Datameer, supporting all technical aspects of customer engagement through roll-out of customers into production. Matt holds a BS in Computer Science from the University of Virginia.
  • 6. Agenda ▪ EMA on Current State of the Big Data Industry – Online Archiving in Practice – SQL on NoSQL: Metadata – Exploratory Use Cases – Late Binding Schemas better for Discovery – Economics of Hadoop ▪ Datameer on how to solve these problems – Use Case #1: Semi-Structured Data – Use Case #2: Text Analytics data – Use Case #3: Path Analysis ▪ Takeaways, and Question and Answer
  • 7. State of Big Data Industry
  • 8. Online Archiving is the majority use case for Big Data projects © 2013 Enterprise Management Associates, Inc.
  • 9. Moving Beyond select * from tablename: SQL requires a managed set of metadata
  • 10. Big Data Platforms have Multiple Uses: Discovery is a significant portion
  • 11. Late Binding Schemas are good for Discovery
  • 12. Free as a Free puppy…
  • 13. Datameer Demos
  • 14. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal
  • 15. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand
  • 16. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand ▪ Painful/impossible without inspection
  • 17. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand ▪ Painful/impossible without inspection ▪ “One-offs” are possible with SQL+UDFs ▪ But better to collaborate with shared “views”
  • 18. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand ▪ Painful/impossible without inspection ▪ “One-offs” are possible with SQL+UDFs ▪ But better to collaborate with shared “views” ▪ Examples: ▪ “User-agent” string ▪ URL Parameters ▪ JSON
  • 19. Use Case #2: Text Analytics ▪ Few/no known fields
  • 20. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid
  • 21. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid ▪ Wrangling and mining
  • 22. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid ▪ Wrangling and mining ▪ “Bag-of-Words” is a sensible start
  • 23. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid ▪ Wrangling and mining ▪ “Bag-of-Words” is a sensible start ▪ Again, frequent inspection is key
  • 24. Use Case #3: Path Analysis ▪ Key component of clickstream analysis
  • 25. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous
  • 26. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous ▪ Defines/summarizes transitions, not events
  • 27. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous ▪ Defines/summarizes transitions, not events ▪ Supported by list/array types
  • 28. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous ▪ Defines/summarizes transitions, not events ▪ Supported by list/array types ▪ Requires multi-pass queries
  • 29. Takeaways
  • 30. When NOT to use SQL on Hadoop ▪ Structured Schemas or “Schema on Write”
  • 31. When NOT to use SQL on Hadoop ▪ Structured Schemas or “Schema on Write” ▪ “Realtime” Query SLAs for operational or reporting tasks
  • 32. When NOT to use SQL on Hadoop ▪ Structured Schemas or “Schema on Write” ▪ “Realtime” Query SLAs for operational or reporting tasks ▪ Highly detailed SQL query requirements (SQL-2003)
  • 33. When to use SQL on Hadoop ▪ Unstructured Datasets and “Schema on Read”
  • 34. When to use SQL on Hadoop ▪ Unstructured Datasets and “Schema on Read” ▪ Discovery tasks designed to find new connections and new business value
  • 35. When to use SQL on Hadoop ▪ Unstructured Datasets and “Schema on Read” ▪ Discovery tasks designed to find new connections and new business value ▪ Lower level SQL queries (SQL-99)
  • 36. Summary ▪ EMA on Current State of the Big Data Industry – Online Archiving in Practice – SQL on NoSQL: Metadata – Exploratory Use Cases – Late Binding Schemas better for Discovery ▪ Datameer on how to solve these problems – Use Case #1: Semi-Structured Data – Use Case #2: Text Analytics – Use Case #3: Path Analysis
  • 37. Call To Action ■ Visit our website – www.datameer.com ■ Download our Trial – http://www.datameer.com/Datameer-trial.html
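
The ideas in Use Case #1 (extract and cast fields on demand) and Use Case #3 (compare each record to the next and summarize transitions) can be sketched in a few lines of Python. This is only an illustration, not how Datameer implements either feature: the log layout, the sample lines, and the helper names `parse_line` and `transitions` are all assumptions made up for the example.

```python
from collections import Counter
from itertools import groupby
from urllib.parse import parse_qs, urlsplit

# Hypothetical raw clickstream lines: "<timestamp> <session id> <request URL>".
RAW_LOG = [
    "2013-10-01T10:00:00 s1 /home?ref=email",
    "2013-10-01T10:00:05 s2 /home",
    "2013-10-01T10:00:09 s1 /search?q=hadoop&page=2",
    "2013-10-01T10:00:20 s1 /product?id=42",
    "2013-10-01T10:00:30 s2 /search?q=sql",
    "malformed line",  # noise that has to be filtered out
]


def parse_line(line):
    """Extract fields on demand; return None for lines that do not fit."""
    parts = line.split(" ", 2)
    if len(parts) != 3:
        return None
    timestamp, session, url = parts
    split = urlsplit(url)
    # parse_qs maps each parameter to a list of values; keep the first one.
    params = {key: values[0] for key, values in parse_qs(split.query).items()}
    return {"ts": timestamp, "session": session, "page": split.path, "params": params}


def transitions(records):
    """Count page-to-page transitions within each session."""
    counts = Counter()
    ordered = sorted(records, key=lambda r: (r["session"], r["ts"]))
    for _, group in groupby(ordered, key=lambda r: r["session"]):
        pages = [r["page"] for r in group]
        # Pair each record with the next one: a transition, not an event.
        counts.update(zip(pages, pages[1:]))
    return counts


records = [r for r in map(parse_line, RAW_LOG) if r is not None]
paths = transitions(records)

for (source, target), count in paths.most_common():
    print(f"{source} -> {target}: {count}")
```

The sort followed by the per-session pairing is the "multi-pass" part: records must first be grouped and ordered before adjacent rows can be compared, which is why path analysis is awkward to express as a single flat SQL query.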

Editor's Notes

  1. According to 2012 EMA research, Online Archiving, or “Hadumping,” is Phase “zero” of most Big Data initiatives. It teaches internal teams about data delivery and structure, how to interact with the data, and how to apply data to business cases as opposed to simply a technology project. It is where you start when “you don’t know what you don’t know…” 2013 EMA research shows that over half of Big Data projects have online archiving in an “In Operation” status: in production, or as a pilot project with hands on keyboards and software installed. Over 4 in 10 respondents say “Economics” are a business reason for the Online Archiving use case. These organizations are attempting to lower their operational costs.
  2. Moving beyond select * requires a standard facility that manages and tracks metadata. select * from tablename is the rough equivalent of cat filename. SQL starts to become truly “special” when you use a query such as: select t.columnA, s.columnB, s.columnC from tablename t, tablename s where t.columnZ = s.columnX. NoSQL, and specifically Hadoop, has focused on the ability to be flexible in data storage, often at the expense of metadata management. SQL doesn’t do well with an “or” data structure (image on right). SQL works best with a defined data structure (image on right). When you ask Hive a question it doesn’t understand, you get an error message. In 2013 EMA research, Big Data initiatives used the following datasets: machine generated (JSON, XML, etc.), almost 40%; process mediated (structured), just under 30%; human sourced (emails, texts), over 30%. Over 30% of respondents indicate that a lack of self-service data access (SQL) is a challenge in operating a Hadoop platform. Nearly 40% of respondents say a lack of SQL data access is a challenge in operating a NoSQL platform. Each of these results indicates that while you “CAN” perform certain applications on Hadoop, SQL-based data access is a high concern.
  3. Big Data environments aren’t just for EDW replacement, as some would say. There are multiple use cases: operational, analytical, and exploratory. Nearly 3 of 10 respondents in 2013 research say that they are using exploratory or discovery use cases. Just under 50% of respondents say operational costs (staff head count included) are a challenge in operating a discovery platform. 3 of 10 respondents want to use the features and functions of products to speed their skills acquisition. Often these are the features they feel most comfortable with: interfaces and processes that they use every day. MS Excel is an example. Nearly 4 out of 10 respondents indicate that new skills development is a challenge in operating a discovery platform.
  4. When you are using exploratory or discovery use cases, you need flexibility. Applying a hard (structured) schema presupposes particular questions AND answers. Square wooden peg and round wooden hole: not a lot of give. Being able to apply a schema or structure at the time of query, a late binding schema, enables the best method of discovery. Flexible schema at the time of processing: the sausage grinder. 2013 EMA research says over 30% of respondents use late binding schemas when processing data, nearly a third use multiple approaches, and over 10% don’t apply a schema at all. “Only” about one third of respondents are using external technical resources to bridge their skills gaps. This comes from the costs associated with outside consultants vs. existing staff.
  5. “Free as in Speech” or “Free as in Beer”… Big Data is “Free as a Free Puppy.” Over 40% of respondents say Economics are a business reason for the Online Archiving use case. Back to metadata: over one third of respondents indicate that a shortage of technical metadata is a challenge in operating a discovery platform. Applying that technical metadata layer takes manual effort and thus additional headcount. When you link this to “only” a 1% increase in big data budget from 2013 to 2014 for Hadoop implementations, it is important to put Hadoop platforms to their best use. 36% say time to implement is a challenge in operating a Hadoop platform. 43% say operational costs are a challenge in operating a discovery platform (link to a 1% increase in big data operational budget from 2013 to 2014). Over one third of respondents say a lack of the skills to manage multi-structured data platforms is an obstacle to implementation (top answer).
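
Notes 2 and 4 can be tied together with a small sketch: raw records are stored without a schema, a schema is bound only when the data is read, and the resulting column metadata is what makes a join possible where select * (or cat filename) needs none. The example below uses Python's built-in sqlite3 module as a stand-in for a SQL engine; the sample records, the two schemas, and the helper `read_with_schema` are assumptions made up for illustration and do not represent Hive or Datameer behavior.

```python
import json
import sqlite3

# Hypothetical raw records, stored as-is with no schema applied at write time.
RAW_ORDERS = [
    '{"order_id": 1, "customer": "c1", "amount": "19.90"}',
    '{"order_id": 2, "customer": "c2", "amount": "5.00", "coupon": "FALL"}',
    '{"order_id": 3, "customer": "c1", "amount": "7.10"}',
]
RAW_CUSTOMERS = [
    '{"customer": "c1", "region": "EMEA"}',
    '{"customer": "c2", "region": "APAC"}',
]

# The schema is declared at read time: field name -> cast function.
ORDER_SCHEMA = {"order_id": int, "customer": str, "amount": float}
CUSTOMER_SCHEMA = {"customer": str, "region": str}


def read_with_schema(lines, schema):
    """Apply a late-bound schema: pick and cast only the declared fields."""
    for line in lines:
        doc = json.loads(line)
        yield tuple(cast(doc[name]) for name, cast in schema.items())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.execute("CREATE TABLE customers (customer TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", read_with_schema(RAW_ORDERS, ORDER_SCHEMA)
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", read_with_schema(RAW_CUSTOMERS, CUSTOMER_SCHEMA)
)

# A join depends on column-level metadata, unlike SELECT * (or cat filename).
totals = conn.execute(
    """
    SELECT c.region, ROUND(SUM(o.amount), 2)
    FROM orders o JOIN customers c ON o.customer = c.customer
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

for region, total in totals:
    print(region, total)
```

Changing ORDER_SCHEMA (for example, adding the optional coupon field with a default) changes what a later query can see without rewriting the stored records, which is the flexibility the notes attribute to late binding.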