Successful Big Data initiatives rely on accurate, complete data, but the information they draw on is often not validated when it enters an organization. In this session we will look at the challenges big data brings to an organization, and how data quality principles are adapting to ensure that business goals and return on investment in big data are realised. We will cover:
- Challenges of big data
- Turning data lakes into reservoirs
- How data quality tools are adapting
- Why data governance disciplines remain crucial
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
1. BIG DATA AND THE DATA QUALITY IMPERATIVE
ED WRAZEN
VP PRODUCT MANAGEMENT, BIG DATA
2. EMERGENCE OF THE “NEW” ENTERPRISE DATA HUB
Data Sources
Applications
Data Warehouse
Data Marts
Databases
RDBMS
Files
Reference Data
Enterprise Applications
Business Intelligence
Custom Analytics
Enterprise Hub
New Sources
Monitor & Manage
The expanded Data Hub
Data Ingestion
+ Volume
+ Velocity
+ Variety
3. CHALLENGES WITH ENTERPRISE DATA
Multiple silos of information
Collating information is resource intensive
Analysis of data is difficult and intensive
Inconsistent, inaccurate, incomplete data
Difficult to reconcile
Manual overhead
No single version of the truth!
4. BIG DATA USE CASES
Profiled database (RDBMS such as MySQL)
Single Customer View
• Cleanse, validate and match disparate customer data points to improve customer experience, customer insights, and targeted marketing
Analytics
• Ensure accuracy for downstream analytics initiatives for marketing, fraud detection, risk mitigation, etc.
Data Lake
• Data often isn’t cleansed as it enters the organization or data lake, resulting in data quality issues on a larger scale
Lower-cost storage, processing
• Organizations seek low-cost, high-performance ways to store, process, analyze, and manage larger volumes of data at faster speeds
5. BIG DATA CHALLENGES
Common Big Data Roadblocks
Limited in-house expertise
Maturity of emerging technology
Alignment to business objectives
Complexity of unstructured data
Lack of trust and assurance in data
Inability to manage velocity of data expansion
Number of internal and external sources of data
6. DATA QUALITY AND SINGLE CUSTOMER VIEWS
Integrating data from multiple data sources presents differences in completeness, consistency and quality
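As a rough illustration of the completeness differences this slide refers to, the sketch below measures, per field, the share of records that carry a non-empty value in each of two sources. The source names (`crm`, `billing`), the field names and the sample records are all hypothetical and are not taken from the presentation; this is a minimal sketch, not a description of any Trillium product.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records with a non-empty value."""
    total = len(records)
    result = {}
    for field in fields:
        # Treat None, "" and whitespace-only strings as missing.
        filled = sum(1 for r in records if (r.get(field) or "").strip())
        result[field] = filled / total if total else 0.0
    return result

# Hypothetical sample data standing in for two customer data sources.
crm = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"name": "Bob Ray", "email": None, "phone": "555-0100"},
]
billing = [
    {"name": "LEE, ANN", "email": "ann@example.com", "phone": "555-0199"},
    {"name": "RAY, BOB", "email": "bob@example.com", "phone": "555-0100"},
]

fields = ["name", "email", "phone"]
crm_profile = completeness(crm, fields)
billing_profile = completeness(billing, fields)
```

Comparing `crm_profile` with `billing_profile` shows which source is the more reliable supplier of each field. The differing name layouts ("Ann Lee" versus "LEE, ANN") are a consistency problem that a completeness measure alone does not detect.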
7. IMPACT OF POOR DATA QUALITY ON ANALYTICS
Can I trust this data enough to make my critical decisions?
How accurate are these numbers?
Are these terms consistent with our business definitions?
How current is this data? When was it last updated?
8. COMPLEXITY OF UNSTRUCTURED DATA
Revd new transfer claim ondiary. inj party
still OOW and treating. Atty repped.called
atty for status. Been treating for over 4
months now, sft tissue neck and back sprain.
Clmnt complaining of numbness and tingling
in fingers. Clmnt is now being scheduled for
MRI and CT scan. RX has been written for
oxycotin for pain. Atty will send all updated
meds and records he has in his file.
Severity Indicator?
Medication?
Employment Status?
9. INSIGHT AND CONTEXT FROM UNSTRUCTURED DATA IS POSSIBLE, BUT DIFFICULT
Oxycotin = Oxycontin = Medication
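One way to picture the "Oxycotin = Oxycontin = Medication" mapping is approximate string matching against a reference vocabulary. The sketch below uses Python's standard `difflib` module. The `MEDICATIONS` list, the `find_medications` helper and the `0.85` cutoff are assumptions chosen for illustration; the slides do not say which technique the product actually uses.

```python
import difflib
import re

# Hypothetical reference vocabulary; a real one would be far larger.
MEDICATIONS = ["oxycontin", "ibuprofen", "naproxen"]

def find_medications(note, vocabulary=MEDICATIONS, cutoff=0.85):
    """Return vocabulary entries that closely match a word in the note."""
    found = []
    for token in re.findall(r"[a-z]+", note.lower()):
        # get_close_matches returns the best candidates whose similarity
        # ratio is at least `cutoff` (possibly an empty list).
        match = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        if match and match[0] not in found:
            found.append(match[0])
    return found

note = "RX has been written for oxycotin for pain."
meds = find_medications(note)
```

Here the misspelled "oxycotin" resolves to "oxycontin", which can then be tagged as a medication. The cutoff trades recall against false positives, so it would need tuning against real claim notes.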
10. BIG DATA QUALITY CHALLENGES PERSIST
“I spend the vast majority of my time cleaning data systems … cleaning and preparing data sets makes everything I do better … it’s the highest value activity I do”
Josh Wills
Senior Director of Data Science
Cloudera
(From “Training a new generation of
Data Scientists” – Cloudera video)
11. SHIFT IN FOCUS
Big Data adopters moving beyond the hype and focusing on traditional challenges and business goals
Top 3 Challenges
Finding value
Risk and governance (security, privacy, data quality)
Integrating multiple data sources
Top 3 Priorities
Enhanced customer experience
Process efficiency
More targeted marketing
Source: Gartner
12. ABOUT TRILLIUM
Trillium is a global provider and innovator of data quality solutions
• A business unit of Harte Hanks (HHS-NYSE)
• Over 2 decades in business with specific focus on data quality
• Data quality solutions for Big Data, CRM, MDM, ERP, Single Customer Views, Data Integration, Data Governance, Risk & Compliance, Fraud, Marketing
Analyst Ratings
Gartner: 2014 Magic Quadrant, Leader
Forrester: Forrester Wave 2013, Leader
Bloor Research: Market Leader
Client Examples
13. TRILLIUM BIG DATA
• Graphically build DQ workflows
• Reuse existing processes
• Deploy natively in Hadoop
• Leverage Hadoop processing architecture
Trillium Server
Interface
Hadoop
HDFS
17 New England Executive Park, Suite 300 | Burlington, MA 01803 | 1-978-436-8900 | www.trilliumsoftware.com
Parse
Parse
Standardize
Match
Commonize
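The four workflow steps named on this slide can be sketched as a chain of plain Python functions. This is an illustration only and not Trillium's API: the `name;email` input layout, the exact-match key on email, and the reading of "commonize" as building one surviving record per matched group are all assumptions made for the example.

```python
from collections import defaultdict

def parse(raw):
    """Split a 'name;email' string into fields (hypothetical input layout)."""
    name, email = (part.strip() for part in raw.split(";"))
    return {"name": name, "email": email}

def standardize(record):
    """Normalise case and whitespace so equivalent values compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].lower(),
    }

def match(records):
    """Group records sharing the same standardized email (exact-match key)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["email"]].append(record)
    return list(groups.values())

def commonize(group):
    """Build one record per group, keeping the longest value of each field."""
    return {field: max((r[field] for r in group), key=len) for field in group[0]}

# Hypothetical raw rows: the first two describe the same customer.
raw_rows = [
    "ann  lee; Ann@Example.com",
    "A. LEE; ann@example.com",
    "bob ray; bob@example.com",
]
customers = [commonize(g) for g in match(standardize(parse(r)) for r in raw_rows)]
```

Standardizing before matching is what lets "Ann@Example.com" and "ann@example.com" fall into the same group. Matching on a single exact key is the simplest possible rule; fuzzy or multi-field matching would replace the `match` step without changing the shape of the pipeline.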
14. BENEFITS OF BIG DATA QUALITY
Understand the impact of data quality and reduce downstream risk
• Profile, analyze and measure the quality of multi-domain data
• Create a data quality blueprint and plan for data cleansing
Build the best view of your global customer data
• Cleanse and enrich customer data and create single customer views
• Improve business processes, detect fraud, create personalized customer experiences, and deploy targeted marketing campaigns
Maximize the value of your Big Data investments
• Power downstream machine learning initiatives and analytics platforms with reliable, fit-for-purpose data that supports timely, accurate business decisions
15. CONTACT INFORMATION
email: ed.wrazen@trilliumsoftware.com
Tel: +44 118 940 7634
web: www.trilliumsoftware.com
email: info@intodq.com
Tel: 0297 254 390
web: www.intodq.com