TRILLIUM SOFTWARE 2013 CUSTOMER CONFERENCE
(Who’s Afraid of…)
The Big Bad Data Wolf?
The Big Bad Data Challenge – Big Data & the Data Quality Imperative
Presented By: Nigel Turner, VP Information Management Strategy
The tale of the Three Little Pigs
© Copyright 2013, Trillium Software, Inc. All rights reserved.
Big Data – what is it?
• Set of new concepts, practices & technologies to manage & exploit digital data
• Can be defined as:
– “Data that exceeds the processing capability of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture” (Source: Edd Dumbill – O’Reilly Community)
• Its key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight
Where does Big Data come from?
• Social media & social networks
• Machine generated
• Widely known sources
What’s different about Big Data?
• New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g.
– Apache Hadoop
– MapReduce
– NoSQL databases
• Strong emphasis on analytical approaches
– Emergence of “data science”
– Predictive Analytics
– Data Mining
• The “democratisation” of data
– Data made available to all (cf. Cloud Computing)
– Business-led, not IT-led, BI
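The MapReduce model named on this slide can be illustrated in a few lines. What follows is a single-process sketch in Python, not Hadoop's API: the function names `map_phase`, `shuffle` and `reduce_phase` and the sample records are hypothetical, chosen only to show the map, group-by-key and reduce steps that a real framework would distribute across many machines.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)

def word_count(records):
    """Run map, shuffle and reduce in sequence over all records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample input, for illustration only
records = ["big data", "bad data", "Big Bad Data Wolf"]
counts = word_count(records)
```

Because each record is mapped independently and each key is reduced independently, both phases can in principle run in parallel, which is the property the slide refers to as highly scalable processing.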
Big Data & Data Quality – parallel worlds?
BIG DATA
DATA QUALITY
Parallel worlds… or are they (1)?
Shared with 100,000+
others and counting…
Parallel worlds… or are they (2)?
“I spend the vast majority of my time cleaning data systems … cleaning and preparing data sets makes everything I do better … it’s the highest value activity I do”
Josh Wills, Senior Director of Data Science, Cloudera
(From “Training a new generation of Data Scientists” – Cloudera video)
When Big Data & Data Quality worlds collide…
• Big Data will expose Data Quality shortcomings
• Poor Data Quality will undermine the value of Big Data investments
Big Data – building on solid foundations
BIG DATA / ANALYTICS
DATA QUALITY FOUNDATION
The 3Vs and the DQ challenge
Volume
• Exponential growth of data – predicted 40-60% per annum
• 2.5 quintillion bytes of data are created every day
• 90% of all digital data created in the last two years
Variety
• Data generated more varied and complex than before:
– Text, Audio, Images, Machine Generated etc.
• Much of this data is semi-structured or unstructured
• Traditional IT techniques ill equipped to process & analyse it
Velocity
• Data often generated in real time
• Analysis and response need to be rapid, often also real time
• Traditional BI / DW environments cannot cope – new approaches are needed
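To put the predicted 40-60% annual growth figure in perspective, the short Python sketch below converts a constant annual growth rate into a doubling time and a multi-year growth factor. It assumes steady compound growth, which the slide does not state; the function names are illustrative only.

```python
import math

def doubling_time_years(annual_growth):
    """Years for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def growth_factor(annual_growth, years):
    """Total multiplication of volume after the given number of years."""
    return (1 + annual_growth) ** years

# The two ends of the range quoted on the slide
for rate in (0.40, 0.60):
    print(
        f"{rate:.0%} per annum: doubles in "
        f"{doubling_time_years(rate):.2f} years, "
        f"x{growth_factor(rate, 5):.1f} after 5 years"
    )
```

Under that assumption, data volumes double roughly every one and a half to two years, so any data quality process sized for today's volume would need to scale accordingly.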
Big Data – Foundations of Success
• Identifying the right data to solve the business problem or opportunity
• The ability to integrate & match varied data from multiple data sources
– structured, semi-structured, unstructured
• Building the right IT infrastructure to support Big Data applications
• Having the right capabilities & skills to exploit the data
Big Data – some vertical applications
• Retail: using point of sale & social media data to supplement & enrich traditional CRM / Marketing data
• Insurance & Banking: fraud detection
• Health: holistic patient analysis
• Utilities: consumption peaks & troughs & capacity planning
• Telcos: call routing optimisation & customer churn
• Manufacturing: predictive fault identification & supply chain optimisation
• Research: particle analysis, genomics etc.
Example Big Data benefit: The Open Big Data Cloud
SOURCE: LINKED OPEN DATA (LOD) COMMUNITY
Big Data in practice – Volvo
• Every Volvo vehicle has hundreds of microprocessors / sensors
• Data generated is used within the car itself but also captured for analysis by Volvo and its dealers
• All data is loaded into a centralised analysis hub & integrated with CRM, dealership, product & social network data
• Used to optimise design & manufacturing, enhance customer interaction, improve safety & act on customer feedback
Big Data – Barriers & Pitfalls
• The sheer volume of data – what’s worth using?
• Data extraction challenges
• The ability to match data from disparate sources / formats / media
• The time taken to integrate new data sources
• The risks of mismatching and incorrect identification of individuals
• Legal & regulatory pitfalls
• Security concerns – corporate & individual
• Lack of skills & expertise
Big Data – the data integration challenge
[Diagram]
• External data sources: social media, sensors, open data, email, mobiles
• Internal data sources: CRM, billing, ops, sales, prods
• Analytics platforms 1, 2, 3 … n
• Actionable insight & knowledge
Big Data – the Data Quality Imperative (1)
• Need to profile external and internal data sources
• Need to classify data to define what data really matters
• Need to assure the quality of internal (and some external) data sources for accuracy, completeness, consistency
• Need to define & apply business rules & metadata management to how the data will be defined and used
• Need for a data governance framework to ensure consistency & control
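As a concrete illustration of profiling a source for completeness and consistency and then applying a business rule, here is a minimal Python sketch. The `profile` and `check_rule` helpers, the field names and the sample customer records are all hypothetical; a commercial data quality tool would do far more than this.

```python
def profile(records, fields):
    """Return per-field completeness and distinct-value counts."""
    total = len(records)
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        # Treat None and empty strings as missing values
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report

def check_rule(records, rule):
    """Return the records that violate a business rule (a predicate)."""
    return [r for r in records if not rule(r)]

# Hypothetical sample data, for illustration only
customers = [
    {"id": 1, "country": "UK", "email": "a@example.com"},
    {"id": 2, "country": "uk", "email": ""},
    {"id": 3, "country": "GB", "email": "c@example.com"},
    {"id": 4, "country": None, "email": "d@example.com"},
]

report = profile(customers, ["country", "email"])

# Assumed business rule: every customer must have an email containing "@"
violations = check_rule(
    customers, lambda r: bool(r["email"]) and "@" in r["email"]
)
```

In this sample the `country` field is 75% complete and holds three distinct spellings ("UK", "uk", "GB") for what is plausibly one country, which is the kind of consistency problem profiling is meant to surface before the data reaches an analytics platform.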
Big Data – the Data Quality Imperative (2)
• Need processes & tools to enable:
– Source data profiling
– Data integration
– Data parsing
– Data standardisation
– Business rule creation & management
– Metadata management & a shared business / IT glossary
– Data de-duplication
– Data normalisation
– Data matching
– Data enrichment
– Data audit
• Many of these functions must be capable of being carried out in real time with zero lag
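Three of the functions listed above (standardisation, matching and de-duplication) can be sketched with the Python standard library alone. This is a toy illustration under stated assumptions: the `standardise`, `similarity` and `deduplicate` helpers are hypothetical, the 0.9 threshold is arbitrary, and `difflib.SequenceMatcher` stands in for the much richer matching logic a dedicated data quality platform would use.

```python
import re
from difflib import SequenceMatcher

def standardise(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())

def similarity(a, b):
    """Similarity ratio in [0, 1] between two standardised strings."""
    return SequenceMatcher(None, standardise(a), standardise(b)).ratio()

def deduplicate(names, threshold=0.9):
    """Keep the first name of each group whose similarity meets the threshold."""
    kept = []
    for name in names:
        if not any(similarity(name, k) >= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical sample input, for illustration only
names = ["Trillium Software, Inc.", "trillium  software inc", "Cloudera"]
unique = deduplicate(names)
```

The threshold is the sensitive design choice: set it too low and distinct entities are merged, which is the mismatching risk raised earlier in the deck; set it too high and true duplicates survive.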
Big Data – DQ as the key enabler
[Diagram]
• External data sources: social media, sensors, open data, email, mobiles
• Internal data sources: CRM, billing, ops, sales, prods
• Data Quality platform: profile, parse, standardise, match, enrich
• Analytics platforms 1, 2, 3 … n
• Actionable insight & knowledge
Big Data – some algorithms
1. BIG DATA + POOR DATA QUALITY = BIG PROBLEMS
2. DATA DEMOCRATISATION – DATA GOVERNANCE = ANARCHY
3. DATA MASH UPS – DATA QUALITY = DATA MESS
4. BIG DATA ANALYTICS + POOR DQ = WRONG RESULTS
5. BIG DATA – DATA ASSURANCE = JAIL
6. 3V + DATA QUALITY = 4V (VALIDITY)
Big Data & Data Quality – summary
• Big Data will depend on data quality to reap its claimed benefits – the GIGO truism
• The democratisation of data will expose poor DQ
• The need for Data Governance increases as data becomes more accessible
• Data skills will become more valued for ‘data science’
• Big Data will increase the 3Vs of data
• Control of data becomes more difficult – scope and variety of use increases
• Data standards & business rules become more complex
• Potential legal & regulatory minefield
What action should we take as data management / DQ professionals?
• Identify and get involved in any current or planned Big Data initiatives within our organisations
• Ensure that the Data Quality and Data Governance implications & imperatives of these initiatives are understood
• Plan for the new Data Quality and Data Governance challenges that these trends will pose
So who’s afraid of the Big Bad Data Wolf?
Questions
(Who’s Afraid of…) The Big Bad Data Wolf?
The Big Bad Data Challenge – Big Data & the Data Quality Imperative

Weitere ähnliche Inhalte

Was ist angesagt?

Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...Cambridge Semantics
 
Protecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKProtecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKUlf Mattsson
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseVaticle
 
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Denodo
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Denodo
 
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
New Analytic Uses of Master Data Management in the Enterprise
New Analytic Uses of Master Data Management in the EnterpriseNew Analytic Uses of Master Data Management in the Enterprise
New Analytic Uses of Master Data Management in the EnterpriseDATAVERSITY
 
Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in KafkaSlides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in KafkaDATAVERSITY
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsDenodo
 
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)Denodo
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationCambridge Semantics
 
Four Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by ActuateFour Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by ActuateEdgar Alejandro Villegas
 
Evaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsEvaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsTeradata Aster
 
Big Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation SlidesBig Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation SlidesSlideTeam
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...DATAVERSITY
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
Big and fast data strategy 2017 jr
Big and fast data strategy 2017 jrBig and fast data strategy 2017 jr
Big and fast data strategy 2017 jrJonathan Raspaud
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...DATAVERSITY
 

Was ist angesagt? (20)

Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
 
Protecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKProtecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UK
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge Base
 
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)
 
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
New Analytic Uses of Master Data Management in the Enterprise
New Analytic Uses of Master Data Management in the EnterpriseNew Analytic Uses of Master Data Management in the Enterprise
New Analytic Uses of Master Data Management in the Enterprise
 
Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in KafkaSlides: Why You Need End-to-End Data Quality to Build Trust in Kafka
Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural Components
 
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
 
Four Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by ActuateFour Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by Actuate
 
Evaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsEvaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics Platforms
 
Big Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation SlidesBig Data Analytics Architecture PowerPoint Presentation Slides
Big Data Analytics Architecture PowerPoint Presentation Slides
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
Big and fast data strategy 2017 jr
Big and fast data strategy 2017 jrBig and fast data strategy 2017 jr
Big and fast data strategy 2017 jr
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
 

Andere mochten auch

Big data for quality education
Big data for quality educationBig data for quality education
Big data for quality educationMalintha Adikari
 
When Big Data and Predictive Analytics Collide: Visual Magic Happens
When Big Data and Predictive Analytics Collide: Visual Magic HappensWhen Big Data and Predictive Analytics Collide: Visual Magic Happens
When Big Data and Predictive Analytics Collide: Visual Magic HappensChase McMichael
 
Unit 1. quality, total quality, tqm
Unit 1. quality, total quality, tqmUnit 1. quality, total quality, tqm
Unit 1. quality, total quality, tqmShekhar Mallur
 
Total quality management in education
Total quality management  in educationTotal quality management  in education
Total quality management in educationSam Luke
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyCloudify Community
 
DAMA Webinar - Big and Little Data Quality
DAMA Webinar - Big and Little Data QualityDAMA Webinar - Big and Little Data Quality
DAMA Webinar - Big and Little Data QualityDATAVERSITY
 
Big Data et Sport - Gestion de données & Analytics
Big Data et Sport - Gestion de données & AnalyticsBig Data et Sport - Gestion de données & Analytics
Big Data et Sport - Gestion de données & AnalyticsGroupe D.FI
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

Andere mochten auch (9)

Big data for quality education
Big data for quality educationBig data for quality education
Big data for quality education
 
When Big Data and Predictive Analytics Collide: Visual Magic Happens
When Big Data and Predictive Analytics Collide: Visual Magic HappensWhen Big Data and Predictive Analytics Collide: Visual Magic Happens
When Big Data and Predictive Analytics Collide: Visual Magic Happens
 
Unit 1. quality, total quality, tqm
Unit 1. quality, total quality, tqmUnit 1. quality, total quality, tqm
Unit 1. quality, total quality, tqm
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Total quality management in education
Total quality management  in educationTotal quality management  in education
Total quality management in education
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made Easy
 
DAMA Webinar - Big and Little Data Quality
DAMA Webinar - Big and Little Data QualityDAMA Webinar - Big and Little Data Quality
DAMA Webinar - Big and Little Data Quality
 
Big Data et Sport - Gestion de données & Analytics
Big Data et Sport - Gestion de données & AnalyticsBig Data et Sport - Gestion de données & Analytics
Big Data et Sport - Gestion de données & Analytics
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Ähnlich wie Big data and the data quality imperative

The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallTrillium Software
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigDataValarmathi V
 
DataOps @ Scale: A Modern Framework for Data Management in the Public Sector
DataOps @ Scale: A Modern Framework for Data Management in the Public SectorDataOps @ Scale: A Modern Framework for Data Management in the Public Sector
DataOps @ Scale: A Modern Framework for Data Management in the Public SectorTamrMarketing
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonIBM Danmark
 
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"MDS ap
 
Tdwi austin simplifying big data delivery to drive new insights final
Tdwi austin   simplifying big data delivery to drive new insights finalTdwi austin   simplifying big data delivery to drive new insights final
Tdwi austin simplifying big data delivery to drive new insights finalSal Marcus
 
BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?Christopher Bradley
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentationPriyesh Patel
 
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxMrityunjay Emmi
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Precisely
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...IT Support Engineer
 
BIG DATA & DATA ANALYTICS
BIG  DATA & DATA  ANALYTICSBIG  DATA & DATA  ANALYTICS
BIG DATA & DATA ANALYTICSNAGARAJAGIDDE
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 IBM Sverige
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Denodo
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...Jochem van Grondelle
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big dataDigimark
 

Ähnlich wie Big data and the data quality imperative (20)

The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
DataOps @ Scale: A Modern Framework for Data Management in the Public Sector
DataOps @ Scale: A Modern Framework for Data Management in the Public SectorDataOps @ Scale: A Modern Framework for Data Management in the Public Sector
DataOps @ Scale: A Modern Framework for Data Management in the Public Sector
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter Jönsson
 
Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...
 
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
 
Tdwi austin simplifying big data delivery to drive new insights final
Tdwi austin   simplifying big data delivery to drive new insights finalTdwi austin   simplifying big data delivery to drive new insights final
Tdwi austin simplifying big data delivery to drive new insights final
 
BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentation
 
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
 
Big data
Big dataBig data
Big data
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
BIG DATA & DATA ANALYTICS
BIG  DATA & DATA  ANALYTICSBIG  DATA & DATA  ANALYTICS
BIG DATA & DATA ANALYTICS
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big data
 

Mehr von Trillium Software

Trillium software garp march 2014 presentation bfast briefing
Trillium software   garp march 2014 presentation bfast briefingTrillium software   garp march 2014 presentation bfast briefing
Trillium software garp march 2014 presentation bfast briefingTrillium Software
 
How Underwriters Can Access Claims Data Now
How Underwriters Can Access Claims Data NowHow Underwriters Can Access Claims Data Now
How Underwriters Can Access Claims Data NowTrillium Software
 
Trillium Software CRMUG Webinar August 6, 2013
Trillium Software CRMUG Webinar August 6, 2013Trillium Software CRMUG Webinar August 6, 2013
Trillium Software CRMUG Webinar August 6, 2013Trillium Software
 
How to Identify Claims High-Risk Insurance Claims Faster and More Accurately
How to Identify Claims High-Risk Insurance Claims Faster and More AccuratelyHow to Identify Claims High-Risk Insurance Claims Faster and More Accurately
How to Identify Claims High-Risk Insurance Claims Faster and More AccuratelyTrillium Software
 
Cloud Computing and Data Governance
Cloud Computing and Data GovernanceCloud Computing and Data Governance
Cloud Computing and Data GovernanceTrillium Software
 
Trillium Software Building the Business Case for Data Quality
Trillium Software Building the Business Case for Data QualityTrillium Software Building the Business Case for Data Quality
Trillium Software Building the Business Case for Data QualityTrillium Software
 
Lean Mean Data Governance Machine Webinar Part 1
Lean Mean Data Governance Machine  Webinar Part 1Lean Mean Data Governance Machine  Webinar Part 1
Lean Mean Data Governance Machine Webinar Part 1Trillium Software
 
Lean Mean Data Governance Machine Webinar Part 2
Lean Mean Data Governance Machine Webinar Part 2 Lean Mean Data Governance Machine Webinar Part 2
Lean Mean Data Governance Machine Webinar Part 2 Trillium Software
 
Creating Your Data Governance Dashboard
Creating Your Data Governance DashboardCreating Your Data Governance Dashboard
Creating Your Data Governance DashboardTrillium Software
 
The Changing Data Quality & Data Governance Landscape
The Changing Data Quality & Data Governance LandscapeThe Changing Data Quality & Data Governance Landscape
The Changing Data Quality & Data Governance LandscapeTrillium Software
 

Big data and the data quality imperative

  • 1. TRILLIUM SOFTWARE 2013 CUSTOMER CONFERENCE: (Who’s Afraid of…) The Big Bad Data Wolf? The Big Bad Data Challenge – Big Data & the Data Quality Imperative. Presented by: Nigel Turner, VP Information Management Strategy
  • 2. The tale of the Three Little Pigs
  • 3. Big Data – what is it?
      • Set of new concepts, practices & technologies to manage & exploit digital data
      • Can be defined as: “Data that exceeds the processing capability of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture” (Source: Ed Dumbill – O’Reilly Community)
      • Its key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight
  • 4. Where does Big Data come from? [Figure labels: SOCIAL MEDIA & SOCIAL NETWORKS; MACHINE GENERATED; WIDELY KNOWN SOURCES]
  • 5. What’s different about Big Data?
      • New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g.
          – Apache Hadoop
          – MapReduce
          – NoSQL databases
      • Strong emphasis on analytical approaches
          – Emergence of “data science”
          – Predictive Analytics
          – Data Mining
      • The “democratisation” of data
          – Data made available to all (cf Cloud Computing)
          – Business and not IT led BI
  • 6. Big Data & Data Quality – parallel worlds? [Figure labels: BIG DATA; DATA QUALITY]
  • 7. Parallel worlds… or are they (1)? Shared with 100,000+ others and counting…
  • 8. Parallel worlds… or are they (2)? “I spend the vast majority of my time cleaning data systems… cleaning and preparing data sets makes everything I do better… it’s the highest value activity I do” (Josh Wills, Senior Director of Data Science, Cloudera. From “Training a new generation of Data Scientists” – Cloudera video)
  • 9. When Big Data & Data Quality worlds collide…
      • Big Data will expose Data Quality shortcomings
      • Poor Data Quality will undermine the value of Big Data investments
  • 10. Big Data – building on solid foundations [Figure labels: BIG DATA / ANALYTICS; DATA QUALITY FOUNDATION]
  • 11. The 3Vs and the DQ challenge
      • Exponential growth of data – predicted 40-60% per annum
      • 2.5 quintillion bytes of data are created every day
      • 90% of all digital data created in the last two years
      • Data generated more varied and complex than before:
          – Text, Audio, Images, Machine Generated etc.
      • Much of this data is semi-structured or unstructured
      • Traditional IT techniques ill equipped to process & analyse it
      • Data often generated in real time
      • Analysis and response needs to be rapid, often also real time
      • Traditional BI / DW environments cannot cope – new approaches are needed
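
To put the predicted 40-60% annual growth on slide 11 into perspective, the arithmetic of compound growth can be sketched in a few lines of Python. This is an illustration only: it assumes the quoted rate compounds once a year, and the function names `growth_factor` and `doubling_time` are made up for this example rather than taken from the presentation.

```python
import math

def growth_factor(annual_rate, years):
    """Multiplier on data volume after `years` of compound annual growth."""
    return (1 + annual_rate) ** years

def doubling_time(annual_rate):
    """Years needed for data volume to double at the given annual rate."""
    return math.log(2) / math.log(1 + annual_rate)

# The two ends of the range quoted on the slide (assumed to compound yearly).
for rate in (0.40, 0.60):
    print(
        f"{rate:.0%} per annum: doubles in {doubling_time(rate):.2f} years, "
        f"x{growth_factor(rate, 5):.1f} after 5 years"
    )
```

Under these assumptions, volumes double roughly every one and a half to two years, which is why profiling and cleansing approaches that only just cope today are unlikely to keep up.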
  • 12. Big Data – Foundations of Success
      • Identifying the right data to solve the business problem or opportunity
      • The ability to integrate & match varied data from multiple data sources
          – structured, semi-structured, unstructured
      • Building the right IT infrastructure to support Big Data applications
      • Having the right capabilities & skills to exploit the data
  • 13. Big Data – some vertical applications
      • Retail: using point of sale & social media data to supplement & enrich traditional CRM / Marketing data
      • Insurance & Banking: fraud detection
      • Health: holistic patient analysis
      • Utilities: consumption peaks & troughs & capacity planning
      • Telcos: call routing optimisation & customer churn
      • Manufacturing: predictive fault identification & supply chain optimisation
      • Research: particle analysis, genomics etc.
  • 14. Example Big Data benefit: The Open Big Data Cloud (Source: Linked Open Data (LOD) community)
  • 15. Big Data in practice – Volvo
      • Every Volvo vehicle has hundreds of microprocessors / sensors
      • Data generated used within the car itself but also captured for analysis by Volvo and its dealers
      • All data is loaded into a centralised analysis hub & integrated with CRM, dealership, product & social network data
      • Used to optimise design & manufacturing, enhance customer interaction, improve safety & act on customer feedback
  • 16. Big Data – Barriers & Pitfalls
      • The sheer volume of data – what’s worth using?
      • Data extraction challenges
      • The ability to match data from disparate sources / formats / media
      • The time taken to integrate new data sources
      • The risks of mismatching and incorrect identification of individuals
      • Legal & regulatory pitfalls
      • Security concerns – corporate & individual
      • Lack of skills & expertise
  • 17. Big Data – the data integration challenge [Diagram: EXTERNAL DATA SOURCES (social media, sensors, open data, email, mobiles) and INTERNAL DATA SOURCES (CRM, billing, ops, sales, prods) feed ANALYTICS PLATFORMS 1, 2, 3 … n, which produce ACTIONABLE INSIGHT & KNOWLEDGE]
  • 18. Big Data – the Data Quality Imperative (1)
      • Need to profile external and internal data sources
      • Need to classify data to define what data really matters
      • Need to assure the quality of internal (and some external) data sources for accuracy, completeness, consistency
      • Need to define & apply business rules & metadata management to how the data will be defined and used
      • Need for a data governance framework to ensure consistency & control
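
As a rough illustration of what “profiling” a data source for completeness and consistency (slide 18) might involve, here is a minimal Python sketch. It is not Trillium's method or API: the `profile_column` helper, the metric names and the sample customer records are all hypothetical, chosen only to show the idea.

```python
from collections import Counter

def profile_column(records, field):
    """Return simple profile metrics for one field across a list of dicts."""
    values = [r.get(field) for r in records]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    total = len(values)
    return {
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1)[0][0] if present else None,
    }

# Hypothetical sample data: one missing value, one inconsistent spelling.
customers = [
    {"id": 1, "country": "UK"},
    {"id": 2, "country": "uk"},
    {"id": 3, "country": ""},
    {"id": 4, "country": "UK"},
]
profile = profile_column(customers, "country")
print(profile)
```

Even a profile this simple surfaces two issues a business rule could then address: a quarter of the records have no country, and the same country appears under two spellings.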
  • 19. Big Data – the Data Quality Imperative (2)
      • Need processes & tools to enable:
          – Source data profiling
          – Data integration
          – Data parsing
          – Data standardisation
          – Business rule creation & management
          – Metadata management & a shared business / IT glossary
          – Data de-duplication
          – Data normalisation
          – Data matching
          – Data enrichment
          – Data audit
      • Many of these functions must be capable of being carried out in real time with zero lag
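
Two of the functions listed on slide 19, standardisation and de-duplication, can be sketched together in Python. This is a deliberately simplified example under stated assumptions: it uses exact matching on a standardised key, whereas real matching tools typically use fuzzy or probabilistic techniques, and the `standardise` and `deduplicate` helpers and the sample records are hypothetical.

```python
import re

def standardise(record):
    """Normalise name and postcode so equivalent records compare equal."""
    # Collapse repeated whitespace, trim, and lower-case the name.
    name = re.sub(r"\s+", " ", record["name"]).strip().lower()
    # Remove all whitespace from the postcode and upper-case it.
    postcode = re.sub(r"\s+", "", record["postcode"]).upper()
    return {"name": name, "postcode": postcode}

def deduplicate(records):
    """Keep the first record for each standardised (name, postcode) key."""
    seen = set()
    unique = []
    for record in records:
        std = standardise(record)
        key = (std["name"], std["postcode"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical input: the first two rows describe the same person.
raw = [
    {"name": "Jane  Smith", "postcode": "ab1 2cd"},
    {"name": "jane smith ", "postcode": "AB12CD"},
    {"name": "John Jones", "postcode": "EF3 4GH"},
]
deduped = deduplicate(raw)
print(len(raw), "records in,", len(deduped), "records out")
```

Standardising before matching is the design choice that matters here: without it, the two “Jane Smith” rows would be treated as different people.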
  • 20. Big Data – DQ as the key enabler [Diagram: EXTERNAL DATA SOURCES (social media, sensors, open data, email, mobiles) and INTERNAL DATA SOURCES (CRM, billing, ops, sales, prods) pass through a DATA QUALITY PLATFORM (PROFILE, PARSE, STANDARDISE, MATCH, ENRICH) before reaching ANALYTICS PLATFORMS 1, 2, 3 … n, which produce ACTIONABLE INSIGHT & KNOWLEDGE]
  • 21. Big Data – some algorithms
      1. BIG DATA + POOR DATA QUALITY = BIG PROBLEMS
      2. DATA DEMOCRATISATION – DATA GOVERNANCE = ANARCHY
      3. DATA MASH UPS – DATA QUALITY = DATA MESS
      4. BIG DATA ANALYTICS + POOR DQ = WRONG RESULTS
      5. BIG DATA – DATA ASSURANCE = JAIL
      6. 3V + DATA QUALITY = 4V (VALIDITY)
  • 22. Big Data & Data Quality – summary
      • Big Data will depend on data quality to reap its claimed benefits – the GIGO truism
      • The democratisation of data will expose poor DQ
      • The need for Data Governance increases as data becomes more accessible
      • Data skills will become more valued for ‘data science’
      • Big Data will increase the 3Vs of data
      • Control of data becomes more difficult – scope and variety of use increases
      • Data standards & business rules become more complex
      • Potential legal & regulatory minefield
  • 23. What action should we take as data management / DQ professionals?
      • Identify and get involved in any current or planned Big Data initiatives within our organisations
      • Ensure that the Data Quality and Data Governance implications & imperatives of these initiatives are understood
      • Plan for the new Data Quality and Data Governance challenges that these trends will pose
  • 24. So who’s afraid of the Big Bad Data Wolf?
  • 25. Questions. (Who’s Afraid of…) The Big Bad Data Wolf? The Big Bad Data Challenge – Big Data & the Data Quality Imperative