“Yes, you can plug Data Quality as a Service (DQaaS) into Big Data”
October 4th, 2015
Master Data Management for Big Data
© Data by Design, LLC | www.dataXdesign.com
• Big data is here to stay and is expanding rapidly
• The 4th “V” of big data
• How your data architecture is growing
• Big data, and perhaps a big mess!
• Data Quality as a Service for your data lake
• Tools of the trade (Microsoft MDS + Profisee’s Maestro)
• Plugging DQaaS into your big data lakes
• USA (21 years) and France (14 years)
• Database/data architecture
  – RDBMSs: Oracle, PostgreSQL, MySQL, DB2, …; Microsoft SQL Server/Analysis Services
  – Master Data Management: MDS, Maestro, Oracle, IBM (Initiate)
  – Big data: Hadoop, ParAccel, NoSQL
• Database talent pool
  – Top database and data architects
  – Acclaimed authors
  – Speakers at many events and conferences
• Database tools: P&T Tool, highly graphical, for Sybase, Oracle, and MS SQL Server
• Database education and training
• Partnerships: Microsoft, Profisee, DesignMind
Big Data’s Rapid Expansion
Digital data (created and replicated):
• Reached 4 zettabytes (ZB) at the end of 2013
• That’s 50% more than in 2012
• And 4 times more than in 2010
• Will hit 50 ZB by 2020!
Source: IDC
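As a quick sanity check on the IDC figures quoted above, the growth they imply can be worked out directly. This sketch uses only the slide's numbers (4 ZB at the end of 2013, 50 ZB projected for 2020, "50% more than 2012", "4 times more than 2010"); the implied compound annual growth rate, (50 / 4)^(1/7) - 1, comes out at roughly 43% to 44% per year.

```python
# Figures as quoted on the slide (source: IDC).
start_zb, end_zb = 4.0, 50.0   # end of 2013, projection for 2020
years = 2020 - 2013

# Compound annual growth rate implied by the two endpoints.
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied growth: {cagr:.1%} per year")

# Volumes implied by the slide's ratios for earlier years.
zb_2012 = start_zb / 1.5   # "50% more than in 2012"
zb_2010 = start_zb / 4     # "4 times more than in 2010"
print(f"2012: ~{zb_2012:.2f} ZB, 2010: ~{zb_2010:.2f} ZB")
```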
Impact of bad data
• $3,100,000,000,000 ($3.1 trillion): IBM’s estimate of the annual cost of bad data to the US economy (IBM BDH)
• 15%: surveyed executives trusting overall data (IDC)
• 27%: surveyed executives sure of data accuracy (IBM)
You will be (or already are) dealing with...
Volume, Velocity, Variety, Veracity
• High volumes of data you need to access
• High velocity of streaming data pouring in
• High variety of information assets (structured, semi-structured, unstructured)
• And you need to get to this data to enable enhanced decision-making, insight discovery, and process optimization
Oh, and it had better be good data (have veracity). (Source: IBM/Diginome)
Are you doing the right thing?
• Hadoop (HDFS-based solutions) lends itself to problems that can be solved through distributed strategies coupled with advanced analytics.
• Other problems just need a horizontally scalable (MPP) solution with current mainstream analytics/database platforms (such as ParAccel, Teradata, PDW…).
• And attack the quality of the data (veracity)!
Understand the problem first, next apply the proper architecture, and finally choose the proper tools!
[Diagram: enterprise data architecture.
Frameworks: Security/Access; Operations (Monitoring/Alerts/Workflow); Data Integration and Acquisition.
Data sources: Front Office, Back Office, Other Internal, External, Social/Other.
Data stores and tools: Operational Data Store (ODS) (detail/transactional/taxonomies); Data Warehouse/Marts (aggregated/dimensional); “Big Data” (structured, semi-structured and unstructured data), which adds Variety; Islands of BI (non-IT); SAS/SPSS; QlikView dashboards/light analytics; application-bundled reporting; Business Objects; OLAP; data services/tools/data visualization.
Analytics, from reactive to proactive: Operational (what we did); Analytics (put into different perspectives and trends: forecasting); Data Mining (what other things might exist or are affecting what we did: why did it happen); Predictive Modeling, Simulation & Optimization (see what is possible next and the best way to do it: prescriptive).
Governance – Quality – Certification spans the whole architecture.]
Data Pipeline
Data Acquisition → Data Storage → Data Analysis → Results → Value (tools grouped by Volume, Velocity, or Both)
• Data acquisition: HDFS commands, Flume (logs), Scribe (real-time stream logs), Sqoop (as needed), many others
• Data storage: HDFS (Hadoop), HBase (Bigtable), Dryad, others
• Data analysis: MapReduce (Hadoop), Pig (data analysis/Pig Latin/data flow), Hive (data warehouse for Hadoop/HiveQL), Cascading (complex MapReduce workflows), Shark (HiveQL on Spark), Spark (in-memory cluster computing)
• Streaming acquisition and messaging: Flume, Kafka (producer/consumer model), Kestrel (distributed message queue), a few others
• Streaming computation: Storm (real-time computation), Trident (operations on top of Storm), S4 (distributed stream computing), Spark Streaming (real-time Spark)
• Emerging hybrid computational models: Summingbird, Lambdoop, others [eliminates MapReduce, all processing paradigms supported]
Governance – Quality – Certification spans the whole pipeline.
Would you drink this?
No, but it could likely have been prevented (or cleaned up during acquisition or earlier).
A disturbing pattern has emerged in big data
• Universe of external and internal data: 100s of sources, dozens of formats, no control of content
• All new data flows to the big data platform
• Unidentified records are just ignored
• Zero data governance: nothing is fixing bad data in the data lakes (perhaps on query?)
How do you identify what is good data versus bad data?
So, let’s add in Data Quality as a Service!
• Universe of external data: 100s of sources, dozens of formats, no control of content
• All new data flows to the big data platform
• Unidentified/unseen records flow to DQaaS
• DQaaS fuzzy-matches them, users map them, workflows occur, and knowledge is built
• Cleansed data flows back to the big data platform
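The "fuzzy matches ... knowledge is built" step can be sketched in a few lines. This is a minimal illustration, not how MDS or Maestro implement matching: it assumes Python's standard-library `difflib.SequenceMatcher` as the similarity measure and a hypothetical threshold of 0.6. The master IDs and names come from the golden-record example later in this deck; `fuzzy_match`, `knowledge`, and the `ERP:7001` record are hypothetical.

```python
from difflib import SequenceMatcher

# Master ID -> canonical name (taken from the deck's golden-record example).
MASTERS = {"Master-6001": "XYZ Corporation", "Master-6003": "Profisee"}


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(name, masters, threshold=0.6):
    """Return (master_id, score) for the best-scoring master, or
    (None, score) when nothing clears the (hypothetical) threshold."""
    best_id, best_score = None, 0.0
    for master_id, master_name in masters.items():
        score = similarity(name, master_name)
        if score > best_score:
            best_id, best_score = master_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# "Knowledge is built": each decision is remembered per source key, so a
# record seen again can reuse the result instead of being re-matched.
knowledge = {}
for source_key, name in [("CRM:6009", "Xyz Corp"),
                         ("CRM:6003", "Profisee"),
                         ("ERP:7001", "Acme Widgets")]:
    master_id, score = fuzzy_match(name, MASTERS)
    knowledge[source_key] = master_id  # None = route to a data steward
    print(source_key, name, "->", master_id, round(score, 2))
```

A character-level ratio is only one possible strategy; it is used here because it ships with Python, and a steward-facing workflow would still be needed for records that fall below the threshold.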
A big data effort we finished recently (with DQaaS)
[Diagram: sources (external transactional data (streams), external 3rd-party data, internal transactional data (streams), internal “other” data) land in the BIG DATA data staging/data acquisition platform (Cloudera, Hive, Sqoop). Data is organized in RAW, STG, eMSTR, eSTG, and Delivery areas. The Data Quality as a Service (bus) (Masters, Matching, Cleansing, Enriching) sits between staging and the BIG DATA data delivery platform (Cloudera, Hive, Sqoop), which feeds downstream data analytics/surfacing. Internal enterprise data: master data (e.g. customers, products, SKUs) and data warehouse (dimensions, facts).]
Ingest, master, and deliver (the new data pipeline)
[Diagram: RAW → STAGE → DIFF against eMaster. Records already mastered go straight to data delivery; records not mastered yet (or not seen before) go to the Data Quality as a Service (bus) (Masters, Matching, Cleansing, Enriching) with workflow and data stewardship. Data delivery carries both mastered (“conformed”) and not-mastered-yet data. Data moves via staged data tables and via subscription views.]
Things we can do easily on the big data platform
• Existing data (mastered already + mastering results)
• New data, split by a DIFF against the existing data:
  – We saw this before: use the master results
  – We haven’t seen this yet: needs to be mastered (master it)
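The DIFF described above amounts to partitioning each incoming batch by whether its key already has a mastering result. The sketch below assumes records are identified by a hypothetical `key` of source system plus source code (e.g. `CRM:6009`) and that earlier results live in a plain dict; on a real platform this would more likely be an anti-join in Hive or Spark.

```python
def diff_against_masters(incoming, mastered):
    """Split a batch into records already mastered and records that
    still need mastering.

    incoming: list of dicts, each with a hypothetical 'key' field.
    mastered: dict mapping key -> master ID from earlier mastering runs.
    """
    seen, unseen = [], []
    for record in incoming:
        master_id = mastered.get(record["key"])
        if master_id is not None:
            # We saw this before: reuse the existing master result.
            seen.append({**record, "master_id": master_id})
        else:
            # Not seen yet (or not resolved yet): send it to DQaaS.
            unseen.append(record)
    return seen, unseen


def merge_results(mastered, new_results):
    """Fold DQaaS results back in so the next DIFF treats them as seen."""
    updated = dict(mastered)
    updated.update(new_results)
    return updated


mastered = {"CRM:6009": "Master-6001", "CRM:6003": "Master-6003"}
batch = [{"key": "CRM:6009", "name": "Xyz Corp"},
         {"key": "ERP:6005", "name": "XYZ Corporation"}]

seen, unseen = diff_against_masters(batch, mastered)
# Pretend DQaaS (plus a data steward) resolved the unseen record:
mastered = merge_results(mastered, {"ERP:6005": "Master-6001"})
```

Running the same batch through `diff_against_masters` again would now put both records in `seen`, which is the "use the master results" path.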
Filtering the Data Lake
• Matching strategies
• Survivorship
• Dedupe
• Harmonization
• Golden records
• Taxonomies
• Cleansing
• Standardization
• Defaults
Data Quality as a Service (bus): Masters, Matching, Cleansing, Enriching
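Standardization, cleansing, defaults, and dedupe from the list above can be combined in a small sketch. The abbreviation table and the `country` default are hypothetical rules chosen for illustration (a real DQaaS would manage such rules as data); the sample names and addresses are variants from the deck's XYZ Corporation example.

```python
import re

# Hypothetical rules for illustration only.
ABBREVIATIONS = {"STREET": "ST", "SOUTH": "S", "SO": "S", "CORPORATION": "CORP"}
DEFAULTS = {"country": "US"}


def standardize(text):
    """Uppercase, turn punctuation into spaces, collapse whitespace,
    and apply token-level abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", text.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def cleanse(record):
    """Return a copy with standardized name/address and defaults filled in."""
    out = dict(record)
    for key in ("name", "address"):
        if out.get(key):
            out[key] = standardize(out[key])
    for key, default in DEFAULTS.items():
        if not out.get(key):
            out[key] = default
    return out


def dedupe(records):
    """Cleanse every record, then keep the first one per (name, address)."""
    seen, unique = set(), []
    for record in map(cleanse, records):
        signature = (record.get("name"), record.get("address"))
        if signature not in seen:
            seen.add(signature)
            unique.append(record)
    return unique


records = [
    {"name": "XYZ Corporation", "address": "329 Main St South"},
    {"name": "Xyz Corp", "address": "329 Main Street So"},
    {"name": "Profisee", "address": "2520 Northwinds", "country": "US"},
]
unique = dedupe(records)
print(len(unique), unique[0])
```

Exact comparison after standardization only catches variants the rules normalize; anything else (e.g. the "3229 Main St" typo) still needs fuzzy matching or a data steward.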
SQL Server Master Data Services
• Master data management platform on SQL Server
• Model and rules: managed schema
• Security and access
• Bulk data loads and consumption: table access
• Hierarchy management
• Deployment, management, versioning
• Application-level transaction management
[Diagram: MDS object model. A model serves a specific business need (e.g. “Customer Master”) and has versions and version locks. A model contains entities, whose members are the physical data entries. Entity attributes: Name (mandatory), Code (mandatory), free-form attributes (text, numeric, date, link), file attributes (files, documents, images), and domain-based attributes drawn from master list domains such as color, ISO customer segmentation, states, provinces, and countries. Related objects: attribute groups, business rules, derived hierarchies, explicit hierarchies, collections, sub-entities, extended attributes/user-defined metadata, transactions, annotations, and subscription views.]
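A few of the rules in this object model (mandatory Code and Name, domain-based attributes that must point at members of another entity, a version lock that blocks edits) can be made concrete with a toy data structure. This is a hypothetical sketch of the concepts only; the class and method names are invented here and are not the MDS API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Toy stand-in for an MDS entity: members keyed by Code."""
    name: str
    # attribute name -> Entity whose member codes are the allowed values
    domains: dict = field(default_factory=dict)
    members: dict = field(default_factory=dict)  # code -> attribute dict

    def add_member(self, code, name, **attributes):
        # Code and Name are mandatory on every member.
        if not code or not name:
            raise ValueError("Code and Name are mandatory")
        # Domain-based attributes must reference an existing member code.
        for attr, domain in self.domains.items():
            value = attributes.get(attr)
            if value is not None and value not in domain.members:
                raise ValueError(f"{attr}={value!r} is not a {domain.name} member")
        self.members[code] = {"Name": name, **attributes}


@dataclass
class Model:
    """Toy stand-in for a model such as 'Customer Master'."""
    name: str
    version: str = "VERSION_1"
    locked: bool = False
    entities: dict = field(default_factory=dict)

    def add_entity(self, entity):
        self.entities[entity.name] = entity
        return entity

    def add_member(self, entity_name, code, name, **attributes):
        # A locked version rejects member edits.
        if self.locked:
            raise PermissionError(f"{self.name} {self.version} is locked")
        self.entities[entity_name].add_member(code, name, **attributes)


model = Model("Customer Master")
country = model.add_entity(Entity("Country"))
customer = model.add_entity(Entity("Customer", domains={"Country": country}))
model.add_member("Country", "US", "United States")
model.add_member("Customer", "6001", "XYZ Corporation", Country="US")
```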
SQL Server Master Data Services
[Diagram: MDS architecture components: MDS Excel Add-in, MDS Web App, MDS Web Service, IIS, MDS staging tables, SQL Server.]
Profisee Maestro (Empowering MDM)
• User experience: stewardship, access, manage
• Workflow: initiate, approve, contribute, calculate
• Golden record management: matching, survivorship
• Data quality: verification, address, person, email
• Application integration: MDM, CRM, federation
• MDM programmability: web objects, web services
[Diagram: Maestro architecture. Client and application layer: custom apps, workflow, forms, web parts, Maestro Desktop, MDS Excel Add-in, MDS Web App, MS Dynamics, Salesforce. Service layer (IIS): Maestro Web App, MDS Web App, Maestro Web Service, MDS Web Service, Maestro SDK/API, connectors. Data layer: MDS staging tables, SQL Server. Integration: batch, real-time, MSMQ.]
Maestro: what we can show you today!
[Diagram: raw external transactional customer data (streams) lands in the BIG DATA data staging/data acquisition platform (Cloudera). Customer data that needs to be “mastered” goes to the Data Quality as a Service (bus) (Masters, Matching, Cleansing, Enriching), built on Maestro and MDS and supported by a data steward; enterprise customer data is ingested as enterprise master data. Mastered (“conformed”) customer data flows to the BIG DATA data delivery platform (Cloudera).]
Staging, matching, cleansing, publishing (MDS & Maestro)
• MDS demo (5 minutes)
• Maestro demo (10 minutes)
Great options, even better opportunities
• Understand your processing and data requirements!
• Strive for high-quality data that is relevant to your most important business drivers/needs!
• Work within a consistent framework that provides the performance, access, compliance, and quality your company demands!
• Plug in data quality (DQaaS) as early as you can in the big data food chain, starting at acquisition (ingest) time.
Data Consulting Services
• Top data experts in the industry
• USA and European offices
• Acclaimed authors
• Presenters at major conferences
info@dataXdesign.com
The Case for Master Data Management
• No user interface: We know how to fix the data, but we don’t have a place to fix it.
• Cosmic data volumes: More data than ever before, but is it consistent? Is the noise growing faster than the signal?
• Multiple stakeholders: Departments, subsidiaries, and regulatory bodies view the world in different ways.
• Data quality is poor: Duplicates exist even within systems, and data values are missing or just plain wrong.
• Externally sourced data: They’re talking about my things on the web, but they aren’t speaking my language.
• Multiple systems: Even my own systems don’t use the same names and don’t have the same attributes.
How do I trust that my analytics are driving correct decisions?
Query without a data map:
  SELECT customers who complained on Facebook more than twice
  FROM Giant Hot Mess of Data
  WHERE product is in this giant list
    and flag = current
    BUT NOT when it starts with XRB0
    AND ALSO include these other products from this acquisitions list
      when these four conditions match, but never when the country of manufacture is Sweden...
  (goes on for 16 pages... are you following this?)

Query with a data map:
  SELECT customers who complained on Facebook more than twice
  FROM Giant Hot Mess of Data JOIN Map on known keys
  WHERE product is a current reporting product
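The "query with a data map" above can be made runnable in miniature. In this hypothetical sketch, the map ties each source product code to a mastered product and a `current_reporting` flag, so the 16-page WHERE clause collapses into a join on known keys plus one flag check. All product codes, customers, and complaint rows are invented for illustration.

```python
from collections import Counter

# Hypothetical data map: source product code -> mastered product info.
product_map = {
    "P-100": {"master_product": "MP-1", "current_reporting": True},
    "XRB0-17": {"master_product": "MP-9", "current_reporting": False},
}

# Hypothetical complaints pulled from a social feed.
complaints = [
    {"customer": "C5321", "product": "P-100"},
    {"customer": "C5321", "product": "P-100"},
    {"customer": "C5321", "product": "P-100"},
    {"customer": "C5400", "product": "P-100"},
    {"customer": "C5400", "product": "XRB0-17"},
    {"customer": "C5400", "product": "XRB0-17"},
    {"customer": "C5400", "product": "UNKNOWN-1"},
]


def frequent_complainers(complaints, product_map, more_than=2):
    """Customers with more than `more_than` complaints about current
    reporting products, found by joining complaints to the map."""
    counts = Counter(
        c["customer"]
        for c in complaints
        # Unmapped codes drop out of the join; in a DQaaS flow they would
        # be routed for mastering rather than silently ignored.
        if product_map.get(c["product"], {}).get("current_reporting")
    )
    return sorted(customer for customer, n in counts.items() if n > more_than)


print(frequent_complainers(complaints, product_map))  # ['C5321']
```

The business logic now lives in the map (which products are current, which codes belong to which master), where a data steward can maintain it, instead of in every query.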
Name | Code | Source | Add1 … | Customer # | Master
XYZ Corporation | Master-6001 | | 329 Main St South | C5321 | Master-6001
XYZ c | 6001 | EXT2 | 329 Main Street S | | Master-6001
XYZ Corporation | 6005 | ERP 1 | 3229 Main St | | Master-6001
Xyz Corp | 6009 | CRM | 329 Main Street So | C5321 | Master-6001
Profisee | Master-6003 | | 2520 Northwinds | C5400 | Master-6003
Profisee | 6003 | CRM | 2520 Northwinds | C5400 | Master-6003

• Master customer golden records bind other candidate records.
• New candidate records are address-corrected for sure matching.
• Candidate records are added to their “master” or “golden” record group.
• Golden records may have attributes from the candidate records or new attributes altogether.
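Survivorship, i.e. choosing which candidate's value "survives" into the golden record, can be sketched with per-attribute source priorities applied to the candidate rows in the table above (source labels simplified to EXT, ERP, CRM). The priority lists are a hypothetical rule set chosen for illustration. Note that this sketch does not reproduce the slide's golden address "329 Main St South", which also reflects address correction.

```python
from collections import defaultdict

# Hypothetical survivorship rules: for each attribute, the first source
# in the list that supplies a non-empty value wins.
SURVIVORSHIP = {
    "name": ["ERP", "CRM", "EXT"],
    "address": ["CRM", "EXT", "ERP"],
    "customer_no": ["CRM", "ERP", "EXT"],
}


def build_golden_records(candidates):
    """Group candidates by master ID and apply the survivorship rules."""
    groups = defaultdict(list)
    for record in candidates:
        groups[record["master_id"]].append(record)

    golden = {}
    for master_id, members in groups.items():
        # If a source contributes several candidates, the last one wins here.
        by_source = {r["source"]: r for r in members}
        record = {"master_id": master_id}
        for attr, priority in SURVIVORSHIP.items():
            record[attr] = next(
                (by_source[s][attr] for s in priority
                 if s in by_source and by_source[s].get(attr)),
                None,
            )
        golden[master_id] = record
    return golden


candidates = [
    {"master_id": "Master-6001", "source": "EXT", "name": "XYZ c",
     "address": "329 Main Street S", "customer_no": ""},
    {"master_id": "Master-6001", "source": "ERP", "name": "XYZ Corporation",
     "address": "3229 Main St", "customer_no": ""},
    {"master_id": "Master-6001", "source": "CRM", "name": "Xyz Corp",
     "address": "329 Main Street So", "customer_no": "C5321"},
    {"master_id": "Master-6003", "source": "CRM", "name": "Profisee",
     "address": "2520 Northwinds", "customer_no": "C5400"},
]
golden = build_golden_records(candidates)
print(golden["Master-6001"])
```

Source-priority survivorship is only one common rule type; others (most recent, most frequent, longest value) fit the same loop by swapping the selection logic.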
Profisee Maestro: Reference Architecture
[Diagram: systems connected through Maestro: company feed, industry feed, ratings feed, reference data, flat files, XML, EMR1/EMR2, Credentialing1/Credentialing2, Labs1/Labs2, data warehouse, ERP1/ERP2, CRM1/CRM2, SCM1/SCM2, DW, BI, GL, HR.]
DxD big data and data quality as a service svcc oct 2015

  • 1. “Yes, you can plug Data Quality as a Service (DQaaS) into Big Data” October 4th, 2015 Master Data Management for big data October 4th, 2015
  • 2. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com  Big data is here to stay and expanding rapidly  The 4th “V” of big data  How your data architecture is growing  Big data, and perhaps a big mess!  Data quality as a Service for your data lake  Tools of the trade (Microsoft MDS + Profisee’s Maestro)  Plugging DQaaS into your Big Data lakes
  • 3. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com  USA (21 years) and France (14 years)  Database/Data Architecture – RDBMS’s:  Oracle, PostGres, MySQL, DB2, ……  Microsoft SQL Server/Analysis Services – Master Data Management  MDS  Maestro  Oracle  IBM (initiate) – Big Data:  Hadoop, ParAccel, NoSQL  Database talent pool: – Top database and data architects – Acclaimed Authors – Speakers at many events and conferences  Database Tools: P&T Tool - highly graphical for Sybase, Oracle and MS SQL Server  Database Education & Training  Partnerships  Microsoft  Profisee  DesignMind
  • 4. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com
  • 5. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com Big Data’s Rapid Expansion 5 Digital Data (created and replicated)  Reached 4 zettabytes at the end of 2013  That’s 50% more than in 2012  And, 4 times more than in 2010  Will hit 50 ZB’s by 2020! Source: IDC
  • 6. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com Impact of bad data $3,100,000,000,000 IBM’s Estimate of Annual Cost of Bad Data to US Economy (IBM BDH) 15% Surveyed Executives Trusting Overall Data (IDC) 27% Surveyed Executives Sure of Data Accuracy (IBM)
  • 7. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com You will be (or, are already) dealing with.. 7 Volume Velocity Variety Veracity  High-Volumes of data you need to access  High-Velocity of streaming data pouring in  High-Variety of information assets (structured, semi-structured, unstructured)  AND, you need to get to this data to enable enhanced decision making, insight discovery and process optimization Oh, and it better be good data (have Veracity) (source: IBM/Diginome)
  • 8. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com Are you doing the right thing?  Hadoop (HDFS solutions) lends itself to problems that can be solved through distributed strategies coupled with advanced analytics.  Other problems just need a horizontally scalable solution (via MPP) with current mainstream analytics/database (like ParAccel, Teradata, PDW…)  AND, attack the quality of the data !!!!! (veracity) Understand the problem first, Next, apply the proper architecture, and finally, choose the proper tools!
  • 9. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com Security/AccessFramework OperationsFramework-Monitoring/Alerts/Workflow DataIntegrationandAcquisitionFramework Analytics Put into different perspectives and trends (forecasting) Operational What we did Reactive Data Mining What other things might exist or are affecting what we did (why did it happen) Predictive Modeling, Simulation & Optimization See what is possible (next) and what is the best way to do it (prescriptive) Proactive Front Office Back Office Other Internal External Social/ Other Data Services/Tools/Data Visualization Islands of BI (non-IT) Data Warehouse/Marts (Aggregated/Dimensional) Operational Data Store (ODS) (Detail/Transactional/taxonomies) SAS/SPSS QlikView Dashboards/Light Analytics Application Bundled Reporting Business Objects OLAP “Big Data” (structured, semi & un-structured data) “Big Data” (structured, semi & un-structured data) Variety Governance – Quality - Certification
  • 10. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com Data Pipeline 10 Data Acquisition Data Storage Data Analysis HDFS Commands Flume (logs) Scribe (RT stream logs) Sqoop (as needed) Many others HDFS (Hadoop) Hbase (Big Table) Dryad Others MapReduce (Hadoop) Pig (data analysis/pig latin/data flow) Hive (DW for Hadoop/HiveQL) Cascading (complex MR workflows) Shark (HiveQL on Spark) Spark (In-mem/cluster computing) Flume Few others Kafka (producer/consumer model) Kestrel (distributed message queue) Storm (RT computation) Trident (Operations on top of Storm) S4 (distributed stream computing) Spark Streaming (RT Spark) Emerging: Hybrid Computational Models SummingBird, Lambdoop, others. [eliminates MapReduce, all processing paradigms supported] VolumeVelocityBoth Results Value Governance – Quality – Certification
  • 11. © Data by Design, LLC ⃝ www.dataXdesign.com© Data by Design, LLC ⃝ www.dataXdesign.com Would you drink this? 11 NO, but it likely could have been prevented (or cleaned up during acquisition or earlier)
  • 12. Big Data Platform: a disturbing pattern has emerged in big data
    – Universe of external and internal data: hundreds of sources, dozens of formats, no control over content
    – All new data flows to the big data platform
    – Unidentified records are simply ignored
    – Zero data governance
    – Nothing is fixing bad data in the data lakes (except perhaps at query time?)
    – How do you identify good data versus bad data?
  • 13. Data Quality as a Service: so, let’s add in data quality as a service!
    – Universe of external data: hundreds of sources, dozens of formats, no control over content
    – All new data flows to the big data platform
    – Unidentified/unseen records flow to DQaaS
    – In DQaaS, fuzzy matches are made, users map records, workflows run, and knowledge is built
    – Cleansed data flows back to the big data platform
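The DQaaS loop on this slide (unseen records are fuzzy-matched, users map what cannot be matched, and every decision is kept as knowledge) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how MDS or Maestro implement it: the `DataQualityService` class, its method names, and the 0.8 cutoff are hypothetical, and the standard-library `difflib.get_close_matches` stands in for a production matching engine.

```python
from difflib import get_close_matches


class DataQualityService:
    """Hypothetical DQaaS: resolves raw source values to master codes.

    Knowledge grows over time: confident fuzzy matches and steward
    decisions are remembered, so a value is only worked out once.
    """

    def __init__(self, masters, cutoff=0.8):
        self.masters = masters        # master code -> canonical name
        self.knowledge = {}           # normalized raw value -> master code
        self.cutoff = cutoff          # assumed similarity threshold (0..1)
        self.steward_queue = []       # values left for a human data steward

    @staticmethod
    def _normalize(value):
        # Case-fold and collapse whitespace before comparing.
        return " ".join(value.lower().split())

    def resolve(self, raw_value):
        key = self._normalize(raw_value)
        if key in self.knowledge:                 # seen before: reuse knowledge
            return self.knowledge[key]
        by_name = {self._normalize(n): code for code, n in self.masters.items()}
        match = get_close_matches(key, list(by_name), n=1, cutoff=self.cutoff)
        if match:                                 # confident fuzzy match
            code = by_name[match[0]]
            self.knowledge[key] = code            # knowledge is built
            return code
        self.steward_queue.append(raw_value)      # unresolved: route to workflow
        return None

    def steward_map(self, raw_value, code):
        """A data steward maps a value by hand; the mapping is remembered."""
        self.knowledge[self._normalize(raw_value)] = code
```

With this cutoff, `"Xyz Corp"` scores below the threshold against `"XYZ Corporation"`, so it lands in `steward_queue`; once a steward maps it, later variants that normalize to the same string (for example `"XYZ corp"`) resolve immediately without another match attempt.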
  • 14. A big data effort we finished recently (with DQaaS)
    – Sources: external transactional data (streams), external third-party data, internal transactional data (streams), internal “other” data
    – Internal enterprise data: master data (e.g., customers, products, SKUs) and the data warehouse (dimensional, facts)
    – Big data staging/acquisition (Cloudera, Hive, Sqoop): four RAW areas plus eMSTR; four STG areas plus eSTG
    – Data Quality as a Service (bus): masters, matching, cleansing, enriching
    – Big data delivery platform (Cloudera, Hive, Sqoop): five Delivery areas
    – Downstream data analytics/surfacing
  • 15. Ingest, master, and deliver (the new data pipeline)
    – RAW → STAGE → DIFF against eMaster
    – Already mastered: goes straight to Data Delivery as mastered (“conformed”) data
    – Not mastered yet, or not seen before: goes to Data Quality as a Service (bus: masters, matching, cleansing, enriching, workflow, data stewardship) via staged data tables, and comes back via subscription views
    – Data Delivery holds both mastered (“conformed”) data and data not mastered yet
  • 16. Things we can do easily on a big data platform: DIFF new data against existing data (already mastered + mastering results)
    – We saw this before: use the master results
    – We haven’t seen this yet: it needs to be mastered
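The DIFF on this slide amounts to splitting incoming records by whether their source key already appears in the master cross-reference. A minimal sketch, assuming each record carries hypothetical `source` and `source_id` fields and that existing mastering results are available as a dictionary keyed by `(source, source_id)`:

```python
def diff_against_master(new_records, master_xref, key=("source", "source_id")):
    """Split incoming records into already-mastered and needs-mastering.

    master_xref: hypothetical cross-reference, (source, source_id) -> master code,
    standing for "existing data = already mastered + mastering results".
    """
    seen, unseen = [], []
    for rec in new_records:
        k = tuple(rec[f] for f in key)
        if k in master_xref:
            # We saw this before: reuse the master result.
            seen.append({**rec, "master": master_xref[k]})
        else:
            # We haven't seen this yet: send it to DQaaS to be mastered.
            unseen.append(rec)
    return seen, unseen
```

On the cluster itself the same split would more likely be a left outer join of the staged data against the master cross-reference, with rows that found no master forming the "needs mastering" set; the in-memory version only makes the logic visible.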
  • 17. Filtering the Data Lake
    – Matching strategies, survivorship, dedupe, harmonization, golden records, taxonomies, cleansing, standardization, defaults
    – Data Quality as a Service (bus): masters, matching, cleansing, enriching
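Three of the filters listed here (cleansing/standardization, defaults, and dedupe) are easy to illustrate together. A sketch under loud assumptions: the abbreviation tables, the `"USA"` default, and the function names are hypothetical, and matching is plain exact-key comparison after standardization, the simplest of the possible matching strategies.

```python
import re

# Hypothetical standardization rules; real DQ tools ship far richer rule sets.
ADDRESS_TOKENS = {"st": "Street", "s": "South", "so": "South"}
NAME_TOKENS = {"corp": "Corporation", "inc": "Incorporated"}
DEFAULTS = {"country": "USA"}  # hypothetical default for a missing country


def standardize(text, tokens):
    """Strip punctuation and expand known abbreviations token by token."""
    words = re.sub(r"[.,]", " ", text).split()
    return " ".join(tokens.get(w.lower(), w) for w in words)


def cleanse(record):
    """Apply defaults for empty fields, then standardize name and address."""
    present = {k: v for k, v in record.items() if v not in (None, "")}
    out = {**DEFAULTS, **present}
    out["name"] = standardize(out["name"], NAME_TOKENS)
    out["address"] = standardize(out["address"], ADDRESS_TOKENS)
    return out


def dedupe(records):
    """Exact-key matching on standardized (name, address); keep the first record."""
    seen, unique = set(), []
    for rec in map(cleanse, records):
        key = (rec["name"].lower(), rec["address"].lower())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Running `dedupe` on the XYZ rows `XYZ Corporation / 329 Main St South`, `Xyz Corp / 329 Main Street So`, and `XYZ Corporation / 3229 Main St` keeps two records: the first two both standardize to `329 Main Street South`, while the different house number in the third is not something token rules can repair.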
  • 18. SQL Server Master Data Services
    – Master data management platform on SQL Server
    – Model and rules: managed schema
    – Security and access
    – Bulk data loads and consumption: table access
    – Hierarchy management
    – Deployment, management, versioning
    – Application-level transaction management
  • 19. MDS model (diagram)
    – Model: for a specific business need (e.g., “Customer Master”); Version, Version Lock
    – Entities; members of the model (physical data entries); sub-entities; collections; derived hierarchies; explicit hierarchies; hierarchies; transactions; annotations; subscription views
    – Attributes: Name (mandatory), Code (mandatory); free-form attributes (text, numeric, date, link types); file attributes (files, documents, images); domain-based attributes (master list domains/types such as color, ISO customer segmentation, states, provinces, countries, and so on)
    – Attribute groups; business rules; extended attributes (user-defined metadata)
    – Relationships labeled in the diagram: 1:N, “groups many”, 1:N, “may have”
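To make the Code/Name and domain-based attribute rules concrete, here is a rough in-memory analogue. It is not the MDS object model or API; the `Entity` class and `add_member` method are hypothetical, and only two rules from the slide are modeled: Code and Name are mandatory, and a domain-based attribute must point at an existing member of another entity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for an MDS entity (not the MDS API)."""
    name: str
    domains: dict = field(default_factory=dict)   # attribute name -> domain Entity
    members: dict = field(default_factory=dict)   # Code -> member attributes

    def add_member(self, code, member_name, **attributes):
        # Rule 1: Code and Name are mandatory for every member.
        if not code or not member_name:
            raise ValueError("Code and Name are mandatory")
        # Rule 2: domain-based attributes must reference an existing member Code.
        for attr, domain in self.domains.items():
            value = attributes.get(attr)
            if value is not None and value not in domain.members:
                raise ValueError(f"{attr}={value!r} is not a member of {domain.name}")
        self.members[code] = {"Name": member_name, **attributes}
```

For example, build a `Country` entity with member `US`, then declare `Customer` with `domains={"Country": country}`; adding a customer with `Country="SE"` raises `ValueError` because `SE` is not a member of the domain, while `Country="US"` is accepted.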
  • 20. SQL Server Master Data Services components: MDS Excel Add-in, MDS Web App, MDS Web Service, IIS, MDS Staging Tables, SQL Server
  • 21. Profisee Maestro (Empowering MDM)
    – User experience: stewardship, access, manage
    – Workflow: initiate, approve, contribute, calculate
    – Golden record management: matching, survivorship
    – Data quality: verification, address, person, email
    – Application integration: MDM, CRM, federation
    – MDM programmability: web objects, web services
  • 22. Maestro architecture (diagram)
    – Front ends: custom apps, workflow forms, web parts, Maestro Desktop, MDS Excel Add-in, MDS Web App, MS Dynamics, Salesforce
    – Web tier (IIS): Maestro Web App, MDS Web App, MDS Web Service, Maestro Web Service
    – SQL Server: MDS staging tables
    – Maestro SDK/API, connectors; batch integration; real-time integration (MSMQ)
  • 23. Maestro: what we can show you today!
    – External transactional data (streams) is ingested into big data staging/acquisition (Cloudera) as raw customer data
    – Customer data that needs to be “mastered” goes to Data Quality as a Service (bus: masters, matching, cleansing, enriching), with Maestro, MDS, and a data steward working against enterprise master data (enterprise customer data)
    – Mastered (“conformed”) customer data flows to the big data delivery platform (Cloudera), available to any downstream consumer
  • 24. Staging, matching, cleansing, publishing (MDS & Maestro): MDS demo (5 min), Maestro demo (10 min)
  • 25. Great options, even better opportunities
    – Understand your processing and data requirements!
    – Strive for high-quality data that is relevant to your most important business drivers and needs!
    – Work within a consistent framework that provides the performance, access, compliance, and quality your company demands!
    – Plug in data quality (DQaaS) as early as you can in the big data food chain, starting at acquisition (ingest) time
  • 26. Data Consulting Services
    – Top data experts in the industry
    – USA and European offices
    – Acclaimed authors
    – Presenters at major conferences
    – info@dataXdesign.com
  • 28. The Case for Master Data Management
    – No user interface: we know how to fix the data, but we don’t have a place to fix it.
    – Cosmic data volumes: more data than ever before, but is it consistent? Is the noise growing faster than the signal?
    – Multiple stakeholders: departments, subsidiaries, and regulatory bodies view the world in different ways.
    – Data quality is poor: duplicates exist even within systems, and data values are missing or just plain wrong.
    – Externally sourced data: they’re talking about my things on the web, but they aren’t speaking my language.
    – Multiple systems: even my own systems don’t use the same names or have the same attributes.
    – How do I trust that my analytics are driving correct decisions?
  • 29. Query without a data map:
        SELECT customers who complained on Facebook more than twice
        FROM Giant Hot Mess of Data
        WHERE product is in this giant list and flag = current
        BUT NOT when it starts with XRB0
        AND ALSO include these other products from this acquisitions list when these four conditions match, but never when the country of manufacture is Sweden ...
        (goes on for 16 pages... are you following this?)
    Query with a data map:
        SELECT customers who complained on Facebook more than twice
        FROM Giant Hot Mess of Data
        JOIN Map on known keys
        WHERE product is a current reporting product
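The same idea in executable form: once a data map resolves source product codes to conformed reporting products, the pile of inline exceptions collapses into a single lookup. The product codes, the `PRODUCT_MAP` contents, and the post fields below are hypothetical; the point is that exclusions and acquisition aliases live in the map, not in the query.

```python
from collections import Counter

# Hypothetical data map: source product code -> conformed reporting product.
# Exclusions and acquisition aliases are maintained here, not in the query.
PRODUCT_MAP = {
    "WP-2": "Widget Pro",
    "ACQ-55": "Widget Pro",   # alias from an acquisitions list
    "XRB0-117": None,         # excluded product: maps to no reporting product
}
REPORTING_PRODUCTS = {"Widget Pro"}


def frequent_complainers(posts, product_map=PRODUCT_MAP, reporting=REPORTING_PRODUCTS):
    """Customers who complained on Facebook more than twice about a current
    reporting product, resolved by joining on the map's known keys."""
    counts = Counter(
        post["customer"]
        for post in posts
        if post["channel"] == "facebook"
        and post["complaint"]
        and product_map.get(post["product"]) in reporting   # the JOIN on the map
    )
    return sorted(customer for customer, n in counts.items() if n > 2)
```

Adding a newly acquired product line then means adding rows to `PRODUCT_MAP`, while the query body stays the same four conditions.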
  • 30. Golden records and candidate records
        Name              Code          Source   Add1                 Customer #   Master
        XYZ Corporation   Master-6001   -        329 Main St South    C5321        Master-6001
        XYZ c             6001          EXT2     329 Main Street S    -            Master-6001
        XYZ Corporation   6005          ERP 1    3229 Main St         -            Master-6001
        Xyz Corp          6009          CRM      329 Main Street So   C5321        Master-6001
        Profisee          Master-6003   -        2520 Northwinds      C5400        Master-6003
        Profisee          6003          CRM      2520 Northwinds      C5400        Master-6003
    – Master customer golden records bind other candidate records.
    – New candidate records are address-corrected for sure.
    – Matching candidate records are added to their “master” or “golden” record group.
    – Golden records may have attributes from the candidate records or new attributes altogether.
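The survivorship idea behind these golden records can be sketched with per-attribute rules over the candidate records in the table. Everything rule-related here is an assumption for illustration: the source-trust ranking (CRM over ERP 1 over EXT2), the "longest value is most complete" rule for names, and the function names. The sketch also skips address correction, so its golden address comes from a candidate rather than from a corrected value.

```python
from collections import defaultdict

# Candidate records from the table; "master" is the group each was matched into.
CANDIDATES = [
    {"code": "6001", "source": "EXT2", "name": "XYZ c",
     "address": "329 Main Street S", "customer_no": None, "master": "Master-6001"},
    {"code": "6005", "source": "ERP 1", "name": "XYZ Corporation",
     "address": "3229 Main St", "customer_no": None, "master": "Master-6001"},
    {"code": "6009", "source": "CRM", "name": "Xyz Corp",
     "address": "329 Main Street So", "customer_no": "C5321", "master": "Master-6001"},
    {"code": "6003", "source": "CRM", "name": "Profisee",
     "address": "2520 Northwinds", "customer_no": "C5400", "master": "Master-6003"},
]

SOURCE_TRUST = {"CRM": 0, "ERP 1": 1, "EXT2": 2}  # hypothetical: lower = more trusted


def by_trust(members):
    return sorted(members, key=lambda c: SOURCE_TRUST.get(c["source"], len(SOURCE_TRUST)))


def first_trusted(attr):
    """Rule: first non-empty value from the most trusted source."""
    return lambda ms: next((c[attr] for c in by_trust(ms) if c[attr]), None)


def most_complete(attr):
    """Rule: longest non-empty value, as a crude proxy for completeness."""
    return lambda ms: max((c[attr] for c in ms if c[attr]), key=len, default=None)


RULES = {"name": most_complete("name"),
         "address": first_trusted("address"),
         "customer_no": first_trusted("customer_no")}


def golden_records(candidates):
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand["master"]].append(cand)   # matched candidates join their group
    return {master: {"code": master, **{attr: rule(ms) for attr, rule in RULES.items()}}
            for master, ms in groups.items()}
```

With these rules, `Master-6001` gets the name `XYZ Corporation` (the longest of the three candidate names), the CRM address `329 Main Street So`, and customer number `C5321`; changing the trust ranking or a rule changes which values survive, which is why survivorship is usually configured per attribute.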
  • 31. Profisee Maestro: Reference Architecture
    – Company Feed, Industry Feed, Ratings Feed, Reference Data, Flat Files, XML
    – EMR1, EMR2, Credentialing1, Credentialing2, Labs1, Labs2, Data Warehouse, Flat Files, XML
    – ERP1, ERP2, CRM1, CRM2, SCM1, SCM2, DW, BI, GL, HR