MDM AND THE DATA UNIFICATION IMPERATIVE
JAMES MARKARIAN | ADVISOR, TAMR
Data Heterogeneity is Inherent in Large Companies
Data sources are bound to applications with idiosyncratic bias
Sales
Marketing
Manufacturing
HR
Support
Finance
Apps Store
Sales
Marketing
Manufacturing
HR
Support
Finance
Aggregation of Data Creates Ambiguity/Complexity
Broad analytics create need to bring data together from many sources
Outside Forces = More Confusion + Complexity
Leadership
Changes
Mergers &
Acquisitions
Reorganizations
Result: Just 10% of Data is Consumable by Any One Person
And 80% of data scientist time is spent preparing it
90%
Dark Data
Expectations for Global Corporate IT as Data Broker
Increasing quickly -- along with the hype about Big Data/Analytics 3.0
HR
Sales
Finance
Divisions
Marketing MFG
ENG
Some Options
Option #1 - Deny Variety - use information that is easiest/closest
Option #2 - Manage Variety incrementally - using traditional approaches:
● Standardization
● Aggregation
● Master Data Management
● Rationalize Systems
● Throw Bodies at it
● Improve Individual Productivity
Option #3 - Embrace Variety using probabilistic/model-based approach - Tamr
Traditional Data Management Approaches: Necessary but not sufficient
● Standardization
● Aggregation
● Master Data Management
● Rationalize Systems
● Throw Bodies at it
● Improve Individual Productivity
Option #2: “Manage” Variety Using Traditional Approaches
Logical Evolution to Probabilistic/Model-Based Approach
Probabilistic
Deterministic
Probabilistic
Deterministic
Today Future
Probabilistic (Tamr) complements, NOT Replaces, Deterministic (MDM)
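The idea that a probabilistic layer complements a deterministic one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tamr's or any MDM product's method: the function names are hypothetical, the 0.7 threshold is arbitrary, and a plain string-similarity ratio from the standard library stands in for a trained model's match probability.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so trivial formatting differences vanish."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def deterministic_match(a: str, b: str) -> bool:
    """Rule-based match: two names agree only if they normalize identically."""
    return normalize(a) == normalize(b)


def match_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a model's match probability."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def combined_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Apply the deterministic rule first, then fall back to the score."""
    return deterministic_match(a, b) or match_score(a, b) >= threshold
```

With these definitions, "Acme Corp." and "ACME corp" match under the exact rule, while "Acme Corporation" and "Acme Corp" fail the exact rule and are recovered only by the similarity score.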
INTRODUCING TAMR
▪ Founded in 2013 by
enterprise database software
veterans
▪ World-class engineering team
▪ Top tier venture backing
(Google Ventures, NEA)
Jerry Held, PhD
Andy Palmer
Mike Stonebraker, PhD
Ihab Ilyas, PhD
Kevin Burke
Nidhi Aggarwal, PhD
Min Xiao
Nik Bates-Haus
Kevin Willis
Managing enterprise information as an asset requires a new,
bottom-up design pattern
Catalog - ALL your metadata and map it to logical entities
Connect - Entities and attributes to remove information silos
Consume - Unified data in the application of your choice via APIs
“Embrace” Variety -- Tamr’s NextGen Approach
Tamr’s Design Pattern: “Back to the Future”
1990’s Web:
Yahoo’s top-down
organization
2020’s Enterprise:
Probabilistic data source cataloging,
connection and consumption
ARCHITECTURE
Data & Metadata Sources - DB, ERP, CRM, CSV +
Data Connection Activities - Data Profiling, Schema Matching, Record Deduplication
Machine Learning
Expert Sourcing
Data Security
Data Governance
Data Uses - Analytics, visualization, Data Warehouse
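One way to picture schema matching combined with expert sourcing is the sketch below: source column names are scored against a target schema, confident suggestions are accepted automatically, and the rest are queued for a human expert. Everything here is an assumption made for illustration: the target attributes, the function names and the 0.8 cut-off are hypothetical, and a simple string-similarity ratio stands in for the machine learning shown in the diagram.

```python
from difflib import SequenceMatcher

# Hypothetical target attributes for a unified "supplier" entity.
TARGET_ATTRIBUTES = ["supplier_name", "payment_terms", "country"]


def _clean(column: str) -> str:
    """Normalize a source column name to lowercase snake_case."""
    return column.lower().replace("-", "_").replace(" ", "_")


def suggest_mapping(column: str):
    """Return (closest target attribute, similarity score) for one column."""
    scored = [
        (target, SequenceMatcher(None, _clean(column), target).ratio())
        for target in TARGET_ATTRIBUTES
    ]
    return max(scored, key=lambda pair: pair[1])


def match_schema(columns, auto_accept=0.8):
    """Split columns into auto-accepted mappings and an expert review queue."""
    accepted, review = {}, []
    for column in columns:
        target, score = suggest_mapping(column)
        if score >= auto_accept:
            accepted[column] = target
        else:
            # Low-confidence suggestion: route to a human expert.
            review.append((column, target, round(score, 2)))
    return accepted, review
```

Calling `match_schema(["Supplier Name", "payment-terms", "ctry"])` accepts the first two columns outright and sends `ctry` to the review queue with `country` as the suggested target.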
TAMR WORKS WITH MDM SYSTEMS TO HANDLE EXTREME DATA VARIETY
MDM
EDW
Published Keys
Schema map
Few well-understood sources
Long tail of disparate data sources
Matches & Rules
● Cleansing
● Consolidation
● Survivorship
● Governance
Rapid Analytics
Benefits
● Business agility
● Faster MDM implementations (months -> weeks)
● Significantly lower ongoing maintenance
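Survivorship, one of the MDM functions listed on this slide, decides which value wins when several matched records are consolidated into one. The sketch below shows one common rule, "most recently updated non-empty value wins". The record layout, field names and the `survive` helper are hypothetical; real MDM systems typically support many configurable rules.

```python
from datetime import date


def survive(records, fields):
    """Build one consolidated record: for each field, keep the most
    recently updated non-empty value among the matched records."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in fields:
        golden[field] = next(
            (r[field] for r in newest_first if r.get(field)), None
        )
    return golden


# Two source records already judged to describe the same supplier.
cluster = [
    {"name": "Acme Corp", "payment_terms": "", "updated": date(2015, 3, 1)},
    {"name": "ACME Corporation", "payment_terms": "Net 30", "updated": date(2014, 6, 1)},
]
golden = survive(cluster, ["name", "payment_terms"])
```

Here the newer record supplies the name, while the older record fills in the payment terms that the newer one leaves blank.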
Fortune 50 company -- Optimized Sourcing Analysis
Benefits
● Massive reductions in supplier list size & number of distinct suppliers
● Automated data maintenance; lower cost of ownership
● Powering strategic sourcing analytics and governance
● Empowering individual procurement team with global view of payment terms
Catalog - Tamr helps you catalog metadata across the entire enterprise, providing a logical map of all of your information
Connect - Tamr helps match entities and attributes across the full variety of your sources, leveraging entity relationships for high accuracy
Consume - Tamr provides a consolidated view of entities and records for downstream applications via a set of RESTful APIs
learn more at tamr.com
Find us at Booth #613


Editor's notes

  1. Key Messages: Introduce yourself as James Markarian. I am currently an EIR at Khosla Ventures. Prior to Khosla, I spent 15 years as the CTO of Informatica, a leader in the ETL space, where I focused on <x>. Recently, I joined Tamr, a company focused on unifying and enriching internal and external data for enterprise analytics, to advise them on product architecture and strategy. Today I’ll be speaking a bit about how data variety, the natural, siloed nature of data as it’s created, is creating a bottleneck to analytics, and how deterministic data unification approaches alone aren’t sufficient to scale to the variety of hundreds or thousands of data silos found within the enterprise.
  2. >>> Heterogeneity of information sources is natural in large companies. Much of the roughly $3-4 trillion invested in enterprise software over the last 20 years has gone toward building and deploying software systems and applications to automate and optimize key business processes in the context of specific functions (sales, marketing, manufacturing) and/or geographies (countries, regions, states, etc.). Essentially, these are systems that produce data, and they do so in a very idiosyncratic manner. As each of these idiosyncratic applications is deployed, an equally idiosyncratic data source is created. The result: the data tied to enterprise investments in software is extremely heterogeneous and siloed. The broad use of the data has been secondary to the primary activity of automating business processes, that is, producing the data. The data is almost like an idiosyncratic exhaust of all of these various applications. It’s not surprising (actually natural) that information across a large enterprise is disconnected and is managed more as the exhaust of 30+ years of business process automation. I think of this as a form of enterprise information entropy. The effort to standardize on single-vendor platforms and to create enterprise-wide data warehouses has largely been an attempt to compensate for natural enterprise data variety/entropy. Ironically, the top-down approaches used to rationalize to a single platform or implement most warehouses (Deterministic ETL, Master Data Management and Waterfall Data Management Methods) created not fewer silos but additional, larger silos that increased the overall variety of data sources within an organization.
  3. Same speaker notes as slide 2.
  4. On top of the historical pull toward application- and organization-specific data sources, these systems get even more complicated and disconnected when you add the confusion and complexity that result from M&A events every quarter, reorganizations every 6-12 months, and changes in leadership every few years.
  5. Objective estimates of the scale of this problem are surprising. Specifically, industry analysts estimate that 90% of big data is dark (not used or cataloged within the enterprise), 90% of collected data isn’t consumable (requires significant work to be useful), and 80% of data scientist time is spent preparing the data for consumption. In short, data is not being managed as an asset.
  6. This challenge is only going to become more critical, especially as expectations of Global Corporate IT as data broker are increasing quickly along with the hype around Big Data/Analytics 3.0. As we look forward to the next 20 years, most companies have begun investing heavily in Big Data Analytics: $44 billion in 2014 alone according to Gartner << insert reference to Data/Analytics being the top priority for CIOs >>. In this context, merely managing all of a company’s data as an asset presents a significant challenge for an IT organization with a global mission. But now enter the trend toward the proverbial Big Data and Analytics 3.0, and the already impossible problem of managing data variety becomes a strategic imperative for the IT organization, which is now expected to integrate analytics and data seamlessly and quickly across all of these idiosyncratic silos so that all these users with great new democratized viz tools can put them to work. We’d like to think that our data integration and preparation capabilities are advanced enough to service this great democratization, and that our “plumbing” is capable of treating the massive reserves of siloed, heterogeneous data. However, these aspirations and the cool new viz tools that are available to everyone in the enterprise require clean, unified data that spans all the various silos. Most companies are finding that this heterogeneity is a massive, fundamental roadblock to effectively using state-of-the-art analytics and visualization tools. Basically, Big Data variety and heterogeneity are the dirty little secret of most enterprises, and while it’s not sexy to spend time cleaning and preparing data, unified data is as important to enterprise analytics as reliable water treatment is to providing clean drinking water to the population. All of this leaves Corporate IT organizations several options to address the data variety problem as data brokers for their enterprise.
  7. Some orgs are simply ignoring the opportunity to convert variety into value – overwhelmed by the sheer volume of heterogeneous sources and data. So they go ahead and carve out their pile, go to their corner, and work with what they have.
  8. Traditional approaches to managing data are necessary but not sufficient to address the broad enterprise data variety problem. In order to realize the opportunity in variety, IT brokers need to recognize that their existing top-down tools and approaches are necessary but not sufficient to solve the variety problem. There is a long list of tools in the enterprise arsenal for tackling data variety, and I’ve tried all of them over the years. Specifically, Master Data Management: most efforts at top-down deterministic data modeling result in useful taxonomies, controlled vocabularies and ontologies. This requires you to “tell” the various divisions what they are going to map to, which inevitably degrades into a debate about who is the “Master” and who is the “Slave”. These are also necessary, but not sufficient, to manage the broad variety of tabular data in most enterprises. There are always deviations from whatever is defined by the three-star wizards in lab coats who are responsible for the “Master” reference data.
  9. Multiple approaches have emerged to deal with the Data Variety problem, with the current state dominated by extreme top-down management (95% deterministic to 5% probabilistic). I predict that the sheer number of data sources and the complexity of change are going to drive us toward a bottom-up approach (80% probabilistic to 20% deterministic). The only viable way to tame enterprise data variety is through bottom-up, collaborative data curation that complements traditional MDM, ETL, data profiling and data quality methods.
  10. A Next-Gen Approach. We believe that big companies should start by deploying a fundamentally new design pattern for data management, one that enables their organization to dynamically catalog, connect and curate ALL of their enterprise information sources from the bottom up using a scalable and agile approach. NOTE that Tamr operationalizes this approach at scale, across the enterprise, NOT as another idiosyncratic solution, AND works with existing data management and analytics tools. Connect: Our emphasis has been on connecting diverse data sources across the enterprise, at scale. We are now expanding the platform to bring this level of scalable data unification and use across the enterprise. Catalog: At the front end, Tamr now solves a very common problem: What data do I use to solve this problem? Consume/Curate: Unified data doesn’t live in Tamr. We make it available to any downstream application or analytic tool, including something as simple as a spreadsheet, via a set of RESTful APIs.
  11. This design pattern is not new. It mimics the design patterns of the modern world wide web, but is designed to connect the primary information asset of the enterprise: tabular data. In the mid-1990s, the early days of Yahoo!, the company used library science professionals and top-down information management practices and tools to organize websites and web content for search. Over time, it became clear that Google’s bottom-up probabilistic approach to matching web content with search terms was going to be much more scalable and effective, so much so that, as most of you know, Yahoo! decided to license Google’s tech. Inside the enterprise, tabular data sources, rather than websites, are the primary assets to be connected, and companies need a new set of tools to register/catalog, connect and curate tabular data that is matched to the data and attributes that analytic users want and need. We believe that our technology at Tamr will be incorporated into existing legacy MDM, ETL and Data Management tools, much in the way that Yahoo! licensed Google.
  12. Tamr automates schema mapping using a bottom-up approach. Tamr is the master for probabilistic keys. MDM: MDM provides capabilities for data cleansing, data consolidation, data survivorship, and active and passive data governance. Results: reduced MDM implementation time (months -> weeks); reduced ongoing maintenance; Tamr can be used without MDM for analytical use cases which prioritize velocity of analysis.
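To make "bottom-up schema mapping" concrete, here is a minimal sketch of the general idea, not Tamr's actual method or API. It assumes a source column can be scored against each unified attribute by blending name similarity with the overlap of sample values. The attribute names, sample values, the `suggest_mapping` helper and the 50/50 weights are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical unified attributes with a few known sample values (illustrative only).
unified = {
    "supplier_name": {"acme industrial", "globex"},
    "postal_code": {"02139", "60601"},
}

def suggest_mapping(column, values, unified_attrs):
    """Rank unified attributes for one source column.

    The score blends column-name similarity with Jaccard overlap of values.
    The equal weights are an assumption, not a tuned or learned model.
    """
    scored = []
    for attr, known in unified_attrs.items():
        name_sim = SequenceMatcher(None, column.lower(), attr).ratio()
        union = values | known
        overlap = len(values & known) / len(union) if union else 0.0
        scored.append((round(0.5 * name_sim + 0.5 * overlap, 3), attr))
    # Highest-scoring suggestion first; a person would confirm or reject it.
    return sorted(scored, reverse=True)

# A source column named "ZIP" shares one value with postal_code.
print(suggest_mapping("ZIP", {"02139", "10001"}, unified))
```

The column name "ZIP" looks nothing like "postal_code", so a name-based rule would miss it. The evidence in the data itself is what pushes the right attribute to the top of the list.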
  13. Challenge: With thousands of suppliers spanning many P&Ls and ERP systems, the company has been challenged to maintain an accurate supplier master file (SMF) to drive strategic sourcing analysis. Solution: Create a unified data model that leverages all relevant sources, including address, tax and government data; machine learning algorithms continuously evaluate and remove potential SMF duplicates; automated processing incrementally improves as validation is received from SMEs. Benefits: Massive reductions in supplier list size and number of distinct suppliers; automated data maintenance and lower cost of ownership in production; powering strategic sourcing analytics and governance at a corporate level; empowering individual procurement teams with a global view of payment terms. Here’s the link for the long-form write-up the team did, for background: https://docs.google.com/a/tamr.com/document/d/12JvLG4wr_PjpKOGlUyoDx6iVULCAkwm5bhHKMYP7vwU/edit?usp=sharing
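As a rough illustration of how potential SMF duplicates might be flagged for SME review, here is a small sketch. It swaps in plain string similarity with fixed weights where the solution described above uses machine learning, so it shows only the shape of the idea. The supplier records, the field names, the 0.8/0.2 weights and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical supplier records from different ERP systems (illustrative data only).
suppliers = [
    {"id": "erp1-001", "name": "Acme Industrial Supply Inc.", "city": "Boston"},
    {"id": "erp2-417", "name": "ACME Industrial Supply", "city": "Boston"},
    {"id": "erp3-032", "name": "Globex Corporation", "city": "Chicago"},
]

def normalize(text):
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return " ".join(w for w in cleaned.split() if w not in {"inc", "corp", "llc", "ltd"})

def match_score(a, b):
    """Blend name and city similarity into a score in [0, 1]; weights are assumed."""
    name_sim = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    city_sim = 1.0 if normalize(a["city"]) == normalize(b["city"]) else 0.0
    return 0.8 * name_sim + 0.2 * city_sim

def candidate_duplicates(records, threshold=0.85):
    """Return (id, id, score) for every pair scoring at or above the threshold."""
    pairs = []
    for a, b in combinations(records, 2):
        score = match_score(a, b)
        if score >= threshold:
            pairs.append((a["id"], b["id"], round(score, 3)))
    return pairs

print(candidate_duplicates(suppliers))
```

In this sketch the two Acme records are flagged and the Globex record is not. In a production setting, SME accept/reject decisions on flagged pairs would feed back into a trained model rather than a fixed threshold, and comparing every pair would give way to a blocking step once the supplier list reaches the thousands.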