1. Be Certain. Be Trillium Certain.
The Bigger They Are The Harder They Fall:
Big Data & the Data Quality Imperative
Nigel Turner, VP Strategic Information Management
Tuesday 19th June 2012
4. Big Data – what is it?
Set of new concepts, practices & technologies to manage &
exploit digital data
OVUM defines it as:
“A data computational problem that is large and varied enough to
demand new approaches to traditional SQL & related practices”
Key premise is that all data has potential value if it can be
collected, analysed and used to generate actionable insight
5. Big Data – its characteristics
The 3Vs
Volume
• Reflects exponential growth of data – predicted 40-60% per annum
• Today 2.5 quintillion bytes of data are created every day
• 90% of all digital data was created in the last two years
Variety
• Data generated is more varied and complex than before:
– Text, Audio, Images, Machine Generated etc.
• Much of this data is semi-structured or unstructured
• Traditional IT techniques are ill-equipped to process & analyse it
Velocity
• Data is often generated in real time
• Analysis and response need to be rapid, often also in real time
• Traditional BI / DW environments are becoming obsolete – new approaches are needed
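As a rough illustration of what the quoted 40-60% annual growth means, the sketch below converts a compound annual growth rate into a doubling time. It assumes growth is constant and compounds yearly, which the slide does not state; the function name `doubling_time_years` is made up for this example.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.40 for 40% per annum (assumed constant).
    """
    return math.log(2) / math.log(1 + annual_growth)

# The two ends of the range quoted on the slide.
for rate in (0.40, 0.60):
    years = doubling_time_years(rate)
    print(f"{rate:.0%} per annum -> doubles in about {years:.1f} years")
```

Under these assumptions the data volume doubles roughly every one and a half to two years.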
6. What’s different about Big Data?
New technologies which enable distributed & highly
scalable MPP (Massively Parallel Processing), e.g.
Apache Hadoop
MapReduce
NoSQL databases
Strong emphasis on analytical approaches
Emergence of “data science”
Predictive Analytics
Data Mining
The “democratisation” of data
Data made available to all (cf. Cloud Computing)
Business and not IT led BI
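The MapReduce idea named on this slide can be sketched in a few lines of plain Python. This is a single-process toy, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (key, value) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of strings."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

records = ["big data big problems", "data quality matters"]
print(word_count(records))
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread over many nodes, which is what makes the approach scalable.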
10. Big Data – some vertical applications
Retail: using point of sale & social media data to
supplement & enrich traditional CRM / Marketing data
Insurance & Banking: fraud detection
Health: holistic patient analysis
Utilities: consumption peaks & troughs & capacity
planning
Telcos: call routing optimisation & customer churn
Manufacturing: predictive fault identification & supply
chain optimisation
Research: particle analysis, genomics etc.
11. Big Data in practice - Volvo
Every Volvo vehicle has hundreds of
microprocessors / sensors
Data generated used within the car itself but
also captured for analysis by Volvo and its
dealers
All data is loaded into a centralised data
analysis hub & integrated with CRM,
dealership & product data
Used to optimise design & manufacturing,
enhance customer interaction & improve
safety
13. Big Data – why invest?
Better understanding of customer & market behaviour
Improved knowledge of product & service performance
Aids innovation in products & services
Fact based and more rapid decision making
Enhances revenue
Reduces costs
Stimulates economic growth
14. Big Data – the impact on individuals
Employees
Empower & devolve decision making
Create new job & upskilling opportunities
Consumers
Better targeted offers
Improved products & services that meet needs
16. Big Data – Foundations of Success
Identifying the right data to solve the business problem or
opportunity
The ability to integrate & match varied data from multiple data
sources (structured, semi-structured, unstructured)
Building the right IT infrastructure to support Big Data
applications
Having the right capabilities & skills to exploit the data
17. Big Data – the data integration challenge
[Diagram: EXTERNAL DATA SOURCES (SOCIAL MEDIA, SENSORS, CS DATA, EMAIL, MOBILES) and INTERNAL DATA SOURCES (CRM, BILLING, OPS, SALES, PRODS) feed ANALYTICS PLATFORMS 1 to n, which produce ACTIONABLE INSIGHT & KNOWLEDGE]
18. Big Data – Barriers & Pitfalls
The sheer volume of data – what’s worth using?
Data extraction challenges
The ability to match data from disparate sources / formats / media
The time taken to integrate new data sources
The risks of mismatching and incorrect identification of individuals
Legal & regulatory pitfalls
Security concerns – corporate & individual
Lack of skills & expertise
Making the case for investment
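The mismatching risk listed above can be shown with a small sketch using `difflib.SequenceMatcher` from the Python standard library. The records, the 0.85 threshold and the helper names (`similarity`, `is_match`) are all invented for illustration; a production matching engine would use far richer rules.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(rec_a, rec_b, threshold=0.85, require_postcode=False):
    """Decide whether two records refer to the same individual.

    threshold and require_postcode are illustrative knobs, not a standard.
    """
    name_ok = similarity(rec_a["name"], rec_b["name"]) >= threshold
    if not require_postcode:
        return name_ok
    return name_ok and rec_a["postcode"] == rec_b["postcode"]

# Made-up records: a CRM entry, the same person with a typo, and a different person.
crm = {"name": "John Smith", "postcode": "AB1 2CD"}
typo = {"name": "Jon Smith", "postcode": "AB1 2CD"}
other = {"name": "Joan Smith", "postcode": "XY9 8ZW"}

print(is_match(crm, typo))                           # name-only matching
print(is_match(crm, other))                          # name-only matching
print(is_match(crm, other, require_postcode=True))   # name plus postcode
```

Matching on name alone accepts both the typo and the different person, because the two names differ by a single character. Requiring a second attribute to agree rejects the different person while still accepting the typo.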
19. Big Data – the Data Quality Imperative (1)
Need to profile external and internal data sources
Need to classify data to define what data really matters
Need to assure the quality of internal (and some external)
data sources for accuracy, completeness, consistency
Need to define & apply business rules & metadata
management to how the data will be defined and used
Need for a data governance framework to ensure
consistency & control
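A minimal sketch of source data profiling, assuming records arrive as Python dictionaries. It reports completeness and a crude consistency signal (distinct values) per field; the `profile` function and the sample records are hypothetical and stand in for what a dedicated profiling tool would do at scale.

```python
from collections import Counter

def profile(records, fields):
    """Return per-field completeness, distinct-value count and top value."""
    total = len(records)
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        # Treat missing keys, None and empty strings as "not populated".
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return report

# Made-up sample data with inconsistent country codes and missing emails.
records = [
    {"id": 1, "country": "UK", "email": "a@example.com"},
    {"id": 2, "country": "United Kingdom", "email": ""},
    {"id": 3, "country": "UK", "email": None},
    {"id": 4, "country": "uk"},
]
for field, stats in profile(records, ["country", "email"]).items():
    print(field, stats)
```

Here the country field is fully populated but holds three spellings of one value, a consistency problem, while the email field is only a quarter complete.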
20. Big Data – the Data Quality Imperative (2)
Need processes & tools to enable:
Source data profiling
Data integration
Data parsing
Data standardisation
Business rule creation & management
Metadata management & a shared business / IT glossary
Data de-duplication
Data normalisation
Data matching
Data enrichment
Data audit
Many of these functions must be capable of being carried
out in real time with zero lag
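Three of the functions listed above (parsing, standardisation and de-duplication) can be sketched as a small pipeline. The name formats, the digits-only phone rule and the function names are assumptions made for this example only; real data needs locale-aware rules and a proper matching step rather than exact-key comparison.

```python
import re

def parse_name(raw):
    """Parse 'Surname, Forename' or 'Forename Surname' into (forename, surname)."""
    raw = raw.strip()
    if "," in raw:
        surname, forename = [part.strip() for part in raw.split(",", 1)]
    else:
        forename, _, surname = raw.rpartition(" ")
    return forename, surname

def standardise(record):
    """Put names in title case and reduce phone numbers to digits only."""
    forename, surname = parse_name(record["name"])
    return {
        "forename": forename.title(),
        "surname": surname.title(),
        "phone": re.sub(r"\D", "", record["phone"]),
    }

def deduplicate(records):
    """Keep the first standardised record for each (forename, surname, phone)."""
    seen = {}
    for rec in map(standardise, records):
        key = (rec["forename"], rec["surname"], rec["phone"])
        seen.setdefault(key, rec)
    return list(seen.values())

# Made-up input: the first two rows describe the same person in different formats.
raw = [
    {"name": "smith, jane", "phone": "01234 567 890"},
    {"name": "Jane SMITH", "phone": "(01234) 567890"},
    {"name": "Ann Example", "phone": "01234-000-000"},
]
for rec in deduplicate(raw):
    print(rec)
```

The duplicate is only detected because standardisation runs first, which is why these functions are usually applied in sequence rather than in isolation.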
21. Big Data – the key enabler
[Diagram: EXTERNAL DATA SOURCES and INTERNAL DATA SOURCES each pass through the DATA QUALITY PLATFORM (PROFILE, PARSE, STANDARDISE, MATCH, ENRICH) before reaching ANALYTICS PLATFORMS 1 to n, which produce ACTIONABLE INSIGHT & KNOWLEDGE]
22. Big Data – some algorithms
1. BIG DATA + POOR DATA QUALITY = BIG PROBLEMS
2. DATA DEMOCRATISATION – DATA GOVERNANCE = ANARCHY
3. DATA MASH UPS – DATA QUALITY = DATA MESS
4. BIG DATA ANALYTICS + POOR DQ = WRONG RESULTS
5. BIG DATA – DATA ASSURANCE = JAIL
6. 3V + DATA QUALITY = 4V (VALIDITY)
23. Big Data – the future
To date Big Data has been overhyped but now a
tipping point has come
It is here and will grow in volume, velocity &
variety
Immature concept & market so hard to plan – but
consolidation is happening
Big data in a business context reflects the emerging
generation’s expectations & needs
Data will increasingly be seen as an asset
Data skills will become increasingly valued
24. Big Data – how Trillium Software can help
Current Trillium Software products & services
can help you succeed in your Big Data
journey:
Real time & batch data capabilities in:
o Data profiling
o Parsing
o Standardisation
o De-duplication
o Matching
o Enrichment
o Audit
Strategic consulting services to prepare for and
realise Big Data opportunities