SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
© 2014 Datameer, Inc. All rights reserved.
Analyzing Unstructured Data in
Hadoop!
View Recording !!

You can view the recording of this webinar
at:

http://info.datameer.com/Online-Slideshare-
Analyzing-Unstructured-Data-in-Hadoop-
On-Demand.html
© 2013 Datameer, Inc. All rights reserved.
Matt Schumpert @datameer
Senior Director, Solutions Engineering

Matt has been working in the enterprise
infrastructure software space for over 14 years in
various capacities, including sales engineering,
strategic alliances and consulting.

Matt currently runs the pre-sales engineering team at
Datameer, supporting all technical aspects of
customer engagement from initial contact through
roll-out of customers into production.

Matt holds a BS in Computer Science from the
University of Virginia. 
#datameer @datameer
About Our Speaker!
Agenda!
•  Market & Data trends
•  Tuning into new channels
•  The good news
•  The rise of wrangling
•  Analytics requirements
•  Bringing order to chaos
•  Use Cases
What we learned in 2010… (or before)!
Market & Data Trends!
•  Data volumes will grow 800% in 5 years
•  Unstructured data is growing 62% faster
•  80% of all data will be unstructured in 2019
•  “Big Unstructured Data” requires new tech.
•  85% of the Fortune 500 will be unable to exploit
Big Data for competitive advantage through 2015
Source: Gartner
Market & Data Trends!
•  ‘Multi-structured’ is the word of the day
•  Mainstream IT tools broadening the base
•  Competitive advantage lies outside your firewall!
S U
Tuning Into New Channels!
Tuning Into New Channels!
•  Public & social data is available by the firehose
•  The new discipline: connecting, filtering, switching
•  Find the right keywords, dictionaries, segments
•  Learn from, but don’t emulate search engines
•  Beware of point solutions
The Good News!
•  All data has structure
•  Storage is cheap (Hadoop ~= $300 / TB)
•  Processing is cheap (“free”)
•  Unstructured data compresses well
•  Data APIs abound
•  Public data blossoming (data.gov, etc.)
The Rise of Wrangling!
•  A ‘record’ is no longer a record
•  Event streams need different angles of attack
•  Explode, project, align, window, search
•  New companies/technologies specializing in it
Source: Gartner
Analytics Requirements (1)!
•  A scalable Big Data foundation (Hadoop)
•  Schema-on-read
•  Data profiling & cleansing
•  Fast, visual iteration over samples
Source: Gartner
Analytics Requirements (2)!
•  Text mining, without programing
•  Helper functions for semi/un-structured formats
•  Data connectors, new visualizations
•  Patience, and a an culture of data discovery
Datameer:!
End-to-End Big Data Analytics!
Enterprise Integration!
Bringing Order to Chaos!
•  ‘Big Data Visualization’ is an oxymoron
•  Rich, detailed summaries are the goal
•  ‘It’s the analytics, stupid’
Industry Use Cases!
•  Retail: Competitive pricing through web scraping
•  MFG: Product sentiment through Twitter
•  FSI: Brand preferences from Facebook “likes”
•  Gov: Nefarious behavior through email seizure!
For more information!

http://www.datameer.com



" @datameer
" mschumpert@datameer.com

Learn more
Contact
#datameer @datameer

Weitere ähnliche Inhalte

Was ist angesagt?

Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera, Inc.
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupCaserta
 
Back to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchBack to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchKlaas Bosteels
 
Objectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL DatabaseObjectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL DatabaseInfiniteGraph
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...BigDataEverywhere
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BIDeZyre
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science TeamsEMC
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessInside Analysis
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyCloudera, Inc.
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...GetInData
 
Platfora Girl Geek Dinner
Platfora Girl Geek DinnerPlatfora Girl Geek Dinner
Platfora Girl Geek DinnerPlatfora
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) Dataiku
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataHalo BI
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Technologies
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaCloudera, Inc.
 
Analytics Solutions from SAP
Analytics Solutions from SAPAnalytics Solutions from SAP
Analytics Solutions from SAPSAP Analytics
 

Was ist angesagt? (20)

Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing Meetup
 
Back to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchBack to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from Scratch
 
Objectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL DatabaseObjectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL Database
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BI
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data Strategy
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
 
Platfora Girl Geek Dinner
Platfora Girl Geek DinnerPlatfora Girl Geek Dinner
Platfora Girl Geek Dinner
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big Data
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Analytics Solutions from SAP
Analytics Solutions from SAPAnalytics Solutions from SAP
Analytics Solutions from SAP
 

Andere mochten auch

Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataSeth Grimes
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Datameer
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016George Roth
 
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Peter Wren-Hilton
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityGreat Wide Open
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataMarco Gralike
 
Lecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data WarehouseLecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data Warehousephanleson
 
The Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataThe Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataHealth Catalyst
 
Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015FSJU AUJF
 
Windows Azure Mobile Services
Windows Azure Mobile ServicesWindows Azure Mobile Services
Windows Azure Mobile ServicesJan Hentschel
 
Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012Salesforce Deutschland
 
Model-Driven Software Development 2.0
Model-Driven Software Development 2.0Model-Driven Software Development 2.0
Model-Driven Software Development 2.0Etienne Juliot
 
Model Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model EvolutionModel Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model EvolutionSander Vermolen
 
iPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhoneiPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhoneHeiko Behrens
 
Unstructured Data in BI
Unstructured Data in BIUnstructured Data in BI
Unstructured Data in BIMonaheng Diaho
 
Agile MDD
Agile MDDAgile MDD
Agile MDDfntnhd
 

Andere mochten auch (20)

Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016
 
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to Infinity
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured Data
 
Lecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data WarehouseLecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data Warehouse
 
The Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataThe Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the Data
 
Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015
 
Windows Azure Mobile Services
Windows Azure Mobile ServicesWindows Azure Mobile Services
Windows Azure Mobile Services
 
Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012
 
Model-Driven Software Development 2.0
Model-Driven Software Development 2.0Model-Driven Software Development 2.0
Model-Driven Software Development 2.0
 
Datameer
DatameerDatameer
Datameer
 
Model Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model EvolutionModel Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model Evolution
 
iPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhoneiPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhone
 
IN4308 1
IN4308 1IN4308 1
IN4308 1
 
Unstructured Data in BI
Unstructured Data in BIUnstructured Data in BI
Unstructured Data in BI
 
APEX 5.0, und sonst?
APEX 5.0, und sonst?APEX 5.0, und sonst?
APEX 5.0, und sonst?
 
Agile MDD
Agile MDDAgile MDD
Agile MDD
 

Ähnlich wie Analyzing Unstructured Data in Hadoop Webinar

Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It? Caserta
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AIGary Allemann
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US InformationJulian Tong
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterInside Analysis
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Data In Action: Business Value of Data
Data In Action: Business Value of DataData In Action: Business Value of Data
Data In Action: Business Value of DataMatt Turner
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Hortonworks
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigManish Chopra
 

Ähnlich wie Analyzing Unstructured Data in Hadoop Webinar (20)

Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Data In Action: Business Value of Data
Data In Action: Business Value of DataData In Action: Business Value of Data
Data In Action: Business Value of Data
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Intro big data analytics
Intro big data analyticsIntro big data analytics
Intro big data analytics
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 

Mehr von Datameer

Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Datameer
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndDatameer
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?Datameer
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarDatameer
 
Instant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisInstant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisDatameer
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? Datameer
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Datameer
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsDatameer
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseDatameer
 
The Economics of SQL on Hadoop
The Economics of SQL on HadoopThe Economics of SQL on Hadoop
The Economics of SQL on HadoopDatameer
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataDatameer
 
How to do Data Science Without the Scientist
How to do Data Science Without the ScientistHow to do Data Science Without the Scientist
How to do Data Science Without the ScientistDatameer
 
How to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataHow to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataDatameer
 

Mehr von Datameer (14)

Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-End
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics Webinar
 
Instant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisInstant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of Analysis
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics?
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use Case
 
The Economics of SQL on Hadoop
The Economics of SQL on HadoopThe Economics of SQL on Hadoop
The Economics of SQL on Hadoop
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big Data
 
How to do Data Science Without the Scientist
How to do Data Science Without the ScientistHow to do Data Science Without the Scientist
How to do Data Science Without the Scientist
 
How to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataHow to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited Data
 

Kürzlich hochgeladen

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Kürzlich hochgeladen (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Analyzing Unstructured Data in Hadoop Webinar

  • 1. © 2014 Datameer, Inc. All rights reserved. Analyzing Unstructured Data in Hadoop!
  • 2. View Recording !! You can view the recording of this webinar at: http://info.datameer.com/Online-Slideshare- Analyzing-Unstructured-Data-in-Hadoop- On-Demand.html
  • 3. © 2013 Datameer, Inc. All rights reserved. Matt Schumpert @datameer Senior Director, Solutions Engineering Matt has been working in the enterprise infrastructure software space for over 14 years in various capacities, including sales engineering, strategic alliances and consulting. Matt currently runs the pre-sales engineering team at Datameer, supporting all technical aspects of customer engagement from initial contact through roll-out of customers into production. Matt holds a BS in Computer Science from the University of Virginia.  #datameer @datameer About Our Speaker!
  • 4. Agenda! •  Market & Data trends •  Tuning into new channels •  The good news •  The rise of wrangling •  Analytics requirements •  Bringing order to chaos •  Use Cases
  • 5. What we learned in 2010… (or before)!
  • 6. Market & Data Trends! •  Data volumes will grow 800% in 5 years •  Unstructured data is growing 62% faster •  80% of all data will be unstructured in 2019 •  “Big Unstructured Data” requires new tech. •  85% of the Fortune 500 will be unable to exploit Big Data for competitive advantage through 2015 Source: Gartner
  • 7. Market & Data Trends! •  ‘Multi-structured’ is the word of the day •  Mainstream IT tools broadening the base •  Competitive advantage lies outside your firewall! S U
  • 8. Tuning Into New Channels!
  • 9. Tuning Into New Channels! •  Public & social data is available by the firehose •  The new discipline: connecting, filtering, switching •  Find the right keywords, dictionaries, segments •  Learn from, but don’t emulate search engines •  Beware of point solutions
  • 10. The Good News! •  All data has structure •  Storage is cheap (Hadoop ~= $300 / TB) •  Processing is cheap (“free”) •  Unstructured data compresses well •  Data APIs abound •  Public data blossoming (data.gov, etc.)
  • 11. The Rise of Wrangling! •  A ‘record’ is no longer a record •  Event streams need different angles of attack •  Explode, project, align, window, search •  New companies/technologies specializing in it Source: Gartner
  • 12. Analytics Requirements (1)! •  A scalable Big Data foundation (Hadoop) •  Schema-on-read •  Data profiling & cleansing •  Fast, visual iteration over samples Source: Gartner
  • 13. Analytics Requirements (2)! •  Text mining, without programing •  Helper functions for semi/un-structured formats •  Data connectors, new visualizations •  Patience, and a an culture of data discovery
  • 16. Bringing Order to Chaos! •  ‘Big Data Visualization’ is an oxymoron •  Rich, detailed summaries are the goal •  ‘It’s the analytics, stupid’
  • 17. Industry Use Cases! •  Retail: Competitive pricing through web scraping •  MFG: Product sentiment through Twitter •  FSI: Brand preferences from Facebook “likes” •  Gov: Nefarious behavior through email seizure!
  • 18.
  • 19. For more information! http://www.datameer.com " @datameer " mschumpert@datameer.com Learn more Contact #datameer @datameer