How to Avoid the 10 Big Data Blunders - Best Practices for Success in 2021
Speakers
Dr. Michael Stonebraker, Co-Founder, Tamr
Anthony Deighton, Chief Product Officer, Tamr
Blunder #1
Not Planning to Move Most EVERYTHING to the Cloud
It may take a decade, but it is the right thing to do
● Dewitt vignette
● Hamilton vignette
● Elasticity!!!
● Data will move more easily than applications -- decision support first
YABUT...
Security
● Cloud security is likely better than yours
● Misconfiguration, rogue employees
Cost
● Likely that you are cheating
Geographic Restrictions
● Cloud guys respect this
Legal Restrictions
● Hopefully a short term problem
Other Restrictions
● Your CEO doesn’t approve (see item 11 to come)
YABUT...
Where does App run?
● Decision support: move the app
● Other stuff:
○ Start with local deployment; move to remote data (SLOWLY!!!)
○ Migrate to cloud-native as you have resources, starting with the most costly ones
○ This may be a lot of work and may take a decade or more
○ Issue is legacy code/hardware
Blunder #2
Not Planning for AI/ML to be Disruptive
ML (whether deep or conventional) is getting much better
● Will displace workers with easy-to-explain jobs
● Think autonomous vehicles, automatic checkout, drone delivery, actuarial calculations
Likely to be disruptive
● You can be a disruptor or get disrupted - Your choice
● Think Uber/Lyft or taxis
So what to do?
Pay up to get some AI/ML experts
● They are in short supply and very expensive
● Don’t contract this out (See Blunder #8)
Get going on the coming arms race
● You will be a winner or a loser in a winner-take-all sweepstakes
Blunder #3
Not Solving your REAL Data Science Problem
Typical data scientist spends 90+% of his/her time on data discovery, data integration and data cleaning
● iRobot vignette
● Merck vignette
Nobody quotes less than 80%!!!
● Without clean data ML is worthless!!!
○ More accurately without “clean enough” data, ML is worthless
Obvious directive: Get a strategy in place to do this
● Start by giving Chief Data Officer (CDO) read access to ALL enterprise data!
Blunder #4
Belief that Traditional Data Integration Techniques Will Solve Issue #3
Extract, Transform and Load (ETL)
(Available from a variety of vendors)
Master Data Management (MDM)
(Also available from the usual suspects)
ETL
What’s attempted:
● Decide what data sources to integrate (top down)
● Build a global data model (up front)
● For each data source
○ Send a programmer to interview the data set owner
○ The programmer then builds an extractor and data cleaning routines (in a proprietary scripting language)
○ And loads data into the global schema
Why it doesn’t work:
● I have never seen this technique work for more than 20 data sources
○ Too human intensive
● Building a global schema upfront is way too difficult at scale
○ Remember enterprise wide data models from 15-20 years ago...
● Most enterprises I know have way more than 20 data sources
○ Merck has 4000+/- Oracle databases
○ A data lake
○ Countless files
○ And data from the web is also important
MDM
● Once you have run ETL, you need “match/merge”
● MDM suggests building “golden records” by
○ Implementing match rules (e.g. two entities are the same if they have the same address)
○ Implementing merge rules (e.g. take the most recent value and ignore older ones)
Doesn’t Scale!
● GE classification problem: 20M spend transactions to be classified into a pre-built hierarchy
● 500 rules classified only 10% of the spend transactions
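The match and merge rules on this slide can be sketched in a few lines. This is only an illustration, assuming records are plain dictionaries with hypothetical `name`, `address`, `phone`, and `updated` fields (none of these names come from the talk). The match rule groups records by normalized address; the merge rule keeps the most recent non-empty value per field.

```python
from collections import defaultdict

def normalize_address(address):
    # Lowercase and collapse whitespace so trivial formatting differences still match.
    return " ".join(address.lower().split())

def match(records):
    """Match rule: two records are the same entity if they have the same address."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_address(record["address"])].append(record)
    return list(groups.values())

def merge(group):
    """Merge rule: for each field, take the most recent non-empty value."""
    golden = {}
    # ISO-formatted dates sort correctly as strings, oldest first.
    for record in sorted(group, key=lambda r: r["updated"]):
        for field, value in record.items():
            if value not in (None, ""):
                golden[field] = value  # newer records overwrite older ones
    return golden

# Hypothetical supplier records, invented for this example.
records = [
    {"name": "Acme Corp", "address": "1 Main St", "phone": "", "updated": "2019-05-01"},
    {"name": "Acme Corporation", "address": "1  main st", "phone": "555-0100", "updated": "2020-11-15"},
    {"name": "Globex", "address": "9 Elm Rd", "phone": "555-0199", "updated": "2020-01-10"},
]

golden_records = [merge(group) for group in match(records)]
for golden in golden_records:
    print(golden)
```

Every new source tends to need more rules of this kind (abbreviations, suite numbers, moved offices), which is one way to read the scaling problem described above.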
So what to do?
At scale, you need a solution that leverages ML and statistics
● OK to use rules to generate training data
● That’s what Tamr did on the GE problem
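The idea of using rules to generate training data can be sketched as follows. This is not Tamr's actual method: a tiny naive Bayes classifier is swapped in as a stand-in for the ML step, and the transaction descriptions, categories, and rules are all hypothetical. The rules label only the transactions they cover, and the model trained on those labels classifies the rest.

```python
import math
from collections import Counter, defaultdict

def rule_label(description):
    """Hand-written rules: return a category, or None when no rule fires."""
    if "laptop" in description:
        return "IT Hardware"
    if "chair" in description or "desk" in description:
        return "Furniture"
    return None

def train(labeled):
    """Fit a small multinomial naive Bayes model on (description, category) pairs."""
    token_counts = defaultdict(Counter)
    doc_counts = Counter()
    for description, category in labeled:
        doc_counts[category] += 1
        token_counts[category].update(description.split())
    vocabulary = {t for counts in token_counts.values() for t in counts}
    return token_counts, doc_counts, vocabulary

def classify(model, description):
    token_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category, counts in token_counts.items():
        total_tokens = sum(counts.values())
        score = math.log(doc_counts[category] / total_docs)
        for token in description.split():
            # Laplace smoothing so unseen tokens do not zero out the score.
            score += math.log((counts[token] + 1) / (total_tokens + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical spend transactions, invented for this example.
transactions = [
    "dell laptop 15 inch",
    "hp laptop docking station",
    "lenovo laptop charger",
    "office chair ergonomic",
    "standing desk oak",
    "desk chair mesh",
    "docking station usb-c",   # no rule fires
    "ergonomic mesh stool",    # no rule fires
]

labeled = [(d, rule_label(d)) for d in transactions if rule_label(d)]
unlabeled = [d for d in transactions if rule_label(d) is None]

model = train(labeled)
predictions = {d: classify(model, d) for d in unlabeled}
for description, category in predictions.items():
    print(description, "->", category)
```

The design point is that the rules do not need to cover everything; they only need to be right often enough on the part they do cover to give the model something to learn from.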
Blunder #5
Belief that Data Warehouses will Solve all your Problems
Data warehouses are good at customer facing structured data FROM A FEW DATA SOURCES
● But not text, images, video, …
● Use the technology for what it is good for
○ Do not perform unnatural acts!
○ And get rid of the “high price spread”, if you bought into it
○ And remember that your warehouse will move to the cloud (see Blunder #1)
Blunder #6
Belief that Hadoop/Spark will Solve all your Problems
● Hadoop/Spark is not very good at anything
○ E.g. Spark/SQL is not competitive (but getting better)
○ E.g. Spark/Streaming is not competitive (last time I looked)
● Use “best of breed” not “lowest common denominator” -- at least for your “secret sauce”
○ This is a universal blunder -- desire to use only one vendor
○ Hadoop/Spark is not very good at anything
● And…
○ Spark/Hadoop is useless on Blunders #3 and #4 (i.e. data integration)
So what to do with your Hadoop/Spark cluster?
● Repurpose it for a Data Lake
● Repurpose it for Data Integration
● Throw it Away
○ Hardware lifetime is 3 years (maybe)
○ Remember Blunder #1
Blunder #7
Belief that Data Lakes will Solve all your Problems
Conventional Wisdom
Just load all your data into a “data lake” and you will be able to correlate all data sets
Important Fact (Tattoo this on your Brain):
Independently constructed data sets are never “plug compatible”
Why?
● Schemas don’t match
○ You call it salary; I call it wages
● Units don’t match
○ You use Euros; I use $$$
● Semantics don’t match
○ My salaries are gross before taxes; yours are net after taxes with a lunch allowance
● Time granularity doesn’t match
○ You have annual data; I have monthly data
● Data is dirty
○ 99 means null (sometimes)
○ Null means “data missing” or “data not allowed” or...
● Duplicates must be removed
○ And there are no keys
○ I am Mike Stonebraker in one data set; M.R. Stonebreaker in a second one
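Some of the mismatches listed above can be made concrete with a small sketch. It assumes two hypothetical sources: one with an annual `salary` in USD, the other with monthly `wages` in EUR. The field names, the fixed exchange rate, and the choice to treat 99 as a null marker in an `age` field are all assumptions made for illustration.

```python
EUR_TO_USD = 1.2    # hypothetical fixed exchange rate, for illustration only
NULL_SENTINEL = 99  # assumption: 99 encodes "missing" in the age field

def clean_age(value):
    # Treat both the sentinel and a real null as missing.
    return None if value in (None, NULL_SENTINEL) else value

def from_source_a(row):
    """Source A (hypothetical): annual 'salary' in USD."""
    return {
        "name": row["name"],
        "annual_pay_usd": float(row["salary"]),
        "age": clean_age(row["age"]),
    }

def from_source_b(row):
    """Source B (hypothetical): monthly 'wages' in EUR."""
    annual_eur = row["wages"] * 12  # monthly -> annual
    return {
        "name": row["name"],
        "annual_pay_usd": round(annual_eur * EUR_TO_USD, 2),  # EUR -> USD
        "age": clean_age(row["age"]),
    }

# Invented rows; the names are placeholders.
source_a = [{"name": "Pat Example", "salary": 90000, "age": 99}]
source_b = [{"name": "P. Example", "wages": 5000, "age": 41}]

unified = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
for row in unified:
    print(row)
```

This handles only the schema, unit, granularity, and sentinel mismatches. The semantic mismatch (gross versus net pay) and the duplicate problem (the two rows may describe the same person) cannot be fixed by renaming and converting, which is why a curation step is still needed.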
The Net Result
● Your analytics will be garbage
○ “GIGO”
● Your ML models will fail
○ I.e. produce garbage
○ Again “GIGO”
So what to do?
● You don’t have a data lake; you have a data swamp
● Need a data curation system
○ Which will solve the aforementioned problems
○ And this will not be trivial!!
● Traditional technology likely to fail (See Blunder #4)
● This is an 800 pound gorilla
○ Make sure you put your best people on it!!!!
○ Chances are your in-house solution is crap
○ Use modern technology (from startups) not your “home brew”
● If you want the best technology, you have to deal with startups!!!!
Blunder #8
Outsourcing your new stuff to Palantir, IBM, Mu Sigma
● Typical enterprise spends 95% of its IT resources keeping current (legacy) code running
○ i.e. Maintenance
○ Most are dug in pretty deep
○ Often have the best people “keeping the lights on”
● “Shiny new stuff” gets outsourced
○ Often because there is no appropriate talent internally
This is a Catch-22
● Your maintenance is boring!
○ So creative people quit
○ So there is no good talent to work on the new stuff
○ And you can’t hire great talent (Takes great people to hire great people)
● Your new stuff is your “secret sauce” over the next decade or so…
○ Please don’t outsource it. This is long-term suicide
○ Instead outsource the diddly-crap (e-mail et al.)
○ Software is your secret sauce -- invest in your own people
So what to do?
1. Start by solving Blunder #2 (not planning for AI/ML to change most everything)
2. Outsource the boring maintenance
3. Cancel the Palantir contract
Blunder #9
Succumbing to the “Innovator’s Dilemma”
● Must-read book by Clayton Christensen
● Steam shovel example
○ Cable steam shovels - big payload
○ Hydraulics - much safer, but low payload
○ Hydraulics initially used for “small jobs”
○ Payloads increased and hydraulics won
○ Cable guys went out of business
Net-Net
● Have to be willing to give up your current business model
● And reinvent yourself
● Possibly losing some current customers in the process
○ Otherwise, you go out of business in the long run
○ Taxi licenses in Cambridge have gone from $700k to $10k
Blunder #10
Not Paying Up for a Few “Rocket Scientists”
● They will be your guiding light to avoiding these blunders
● They will be “off scale”
○ Your HR folks won’t like what you have to pay
● Chances are they will be weird
○ E.g. no shoes, no socks, no tie, feet on the table, ...
● Please don’t drive them away!
○ As Citibank did to one of my Berkeley students a while ago
Blunder #11 (Bonus)
Working for a Company That is not Trying to do Something about the “Sins of the Past”
If you work for a company that is succumbing to even one of these blunders, then:
1. You should be fixing it
a. Be part of the solution, not part of the problem
2. Or looking for a new employer
a. Tamr is hiring!
Questions?
Thank You!
To learn more about Tamr visit tamr.com
You’ll receive the 10 Big Data Analytics Blunders Infographic via email.

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for Success in 2021

  • 1. How to Avoid the 10 Big Data Blunders - Best Practices for Success in 2021
  • 2. Speakers: Dr. Michael Stonebraker, Co-Founder, Tamr; Anthony Deighton, Chief Product Officer, Tamr
  • 3. Blunder #1: Not Planning to Move Most EVERYTHING to the Cloud
  • 4. It may take a decade, but it is the right thing to do ● Dewitt vignette ● Hamilton vignette ● Elasticity!!! ● Data will move more easily than applications -- decision support first
  • 5. YABUT... Security: ● Cloud security is likely better than yours ● Misconfiguration, rogue employees Cost: ● Likely that you are cheating Geographic Restrictions: ● Cloud guys respect this Legal Restrictions: ● Hopefully a short-term problem Other Restrictions: ● Your CEO doesn’t approve (see item 11 to come)
  • 6. YABUT... Where does the app run? ● Decision support: move the app ● Other stuff: ○ Start with local deployment; move to remote data (SLOWLY!!!) ○ Migrate to cloud-native as you have resources, starting with the most costly ones ○ This may be a lot of work and may take a decade or more ○ Issue is legacy code/hardware
  • 7. Blunder #2: Not Planning for AI/ML to be Disruptive
  • 8. Blunder #2: Not Planning for AI/ML to be Disruptive. ML (whether deep or conventional) is getting much better ● Will displace workers with easy-to-explain jobs ● Think autonomous vehicles, automatic checkout, drone delivery, actuarial calculations Likely to be disruptive ● You can be a disruptor or get disrupted - Your choice ● Think Uber/Lyft or taxis
  • 9. So what to do? Pay up to get some AI/ML experts ● They are in short supply and very expensive ● Don’t contract this out (See Blunder #8) Get going on the coming arms race ● You will be a winner or a loser in a winner-take-all sweepstakes
  • 10. Blunder #3: Not Solving your REAL Data Science Problem
  • 11. Blunder #3: Not Solving your REAL Data Science Problem. Typical data scientist spends 90+% of his/her time on data discovery, data integration and data cleaning ● iRobot vignette ● Merck vignette Nobody quotes less than 80%!!! ● Without clean data ML is worthless!!! ○ More accurately, without “clean enough” data, ML is worthless Obvious directive: Get a strategy in place to do this ● Start by giving Chief Data Officer (CDO) read access to ALL enterprise data!
  • 12. Blunder #4: Belief that Traditional Data Integration Techniques Will Solve Issue #3
  • 13. Blunder #4: Belief that Traditional Data Integration Techniques Will Solve Issue #3. Extract, Transform and Load (ETL) (Available from a variety of vendors) Master Data Management (MDM) (Also available from the usual suspects)
  • 14. ETL What’s attempted: ● Decide what data sources to integrate (top down) ● Build a global data model (up front) ● For each data source ○ Send a programmer to interview the data set owner ○ He then builds an extractor and data cleaning routines (in a proprietary scripting language) ○ And loads data into the global schema Why it doesn’t work: ● I have never seen this technique work for more than 20 data sources ○ Too human intensive ● Building a global schema upfront is way too difficult at scale ○ Remember enterprise wide data models from 15-20 years ago... ● Most enterprises I know have way more than 20 data sources ○ Merck has 4000+/- Oracle databases ○ A data lake ○ Countless files ○ And data from the web is also important
  • 15. MDM ● Once you have run ETL, you need “match/merge” ● MDM suggests building “golden records” by ○ Implementing match rules (e.g. two entities are the same if they have the same address) ○ Implementing merge rules (e.g. take the most recent value and ignore older ones) Doesn’t Scale! ● GE classification problem: 20M spend transactions to be classified into a pre-built hierarchy ● 500 rules classified only 10% of the spend transactions
  • 16. So what to do? At scale, you need a solution that leverages ML and statistics ● OK to use rules to generate training data ● That’s what Tamr did on the GE problem
  • 17. Blunder #5: Belief that Data Warehouses will Solve all your Problems
  • 18. Blunder #5: Belief that Data Warehouses will Solve all your Problems. Data warehouses are good at customer facing structured data FROM A FEW DATA SOURCES ● But not text, images, video, … ● Use the technology for what it is good for ○ Do not perform unnatural acts! ○ And get rid of the “high price spread”, if you bought into it ○ And remember that your warehouse will move to the cloud (see Blunder #1)
  • 19. Blunder #6: Belief that Hadoop/Spark will Solve all your Problems
  • 20. Blunder #6: Belief that Hadoop/Spark will Solve all your Problems ● Hadoop/Spark is not very good at anything ○ E.g. Spark/SQL is not competitive (but getting better) ○ E.g. Spark/Streaming is not competitive (last time I looked) ● Use “best of breed” not “lowest common denominator” -- at least for your “secret sauce” ○ This is a universal blunder -- desire to use only one vendor ○ Hadoop/Spark is not very good at anything ● And… ○ Spark/Hadoop is useless on Blunders #3 and #4 (i.e. data integration)
  • 21. So what to do with your Hadoop/Spark cluster? ● Repurpose it for a Data Lake ● Repurpose it for Data Integration ● Throw it Away ○ Hardware lifetime is 3 years (maybe) ○ Remember Blunder #1
  • 22. Blunder #7: Belief that Data Lakes will Solve all your Problems
  • 23. Blunder #7: Belief that Data Lakes will Solve all your Problems. Conventional Wisdom: Just load all your data into a “data lake” and you will be able to correlate all data sets. Important Fact (Tattoo this on your Brain): Independently constructed data sets are never “plug compatible”
  • 24. Why? ● Schemas don’t match ○ You call it salary; I call it wages ● Units don’t match ○ You use Euros; I use $$$ ● Semantics don’t match ○ My salaries are gross before taxes; yours are net after taxes with a lunch allowance ● Time granularity doesn't match ○ You have annual data; I have monthly data ● Data is dirty ○ 99 means null (sometimes) ○ Null means “data missing” or “data not allowed” or... ● Duplicates must be removed ○ And there are no keys ○ I am Mike Stonebraker in one data set; M.R. Stonebreaker in a second one
  • 25. The Net Result ● Your analytics will be garbage ○ “GIGO” ● Your ML models will fail ○ I.e. produce garbage ○ Again “GIGO”
  • 26. So what to do? ● You don’t have a data lake; you have a data swamp ● Need a data curation system ○ Which will solve the aforementioned problems ○ And this will not be trivial!! ● Traditional technology likely to fail (See Blunder #4) ● This is an 800 pound gorilla ○ Make sure you put your best people on it!!!! ○ Chances are your in-house solution is crap ○ Use modern technology (from startups) not your “home brew” ● If you want the best technology, you have to deal with startups!!!!
  • 27. Blunder #8: Outsourcing your new stuff to Palantir, IBM, Mu Sigma
  • 28. Blunder #8: Outsourcing your new stuff to Palantir, IBM, Mu Sigma ● Typical enterprise spends 95% of its IT resources keeping current (legacy) code running ○ i.e. Maintenance ○ Most are dug in pretty deep ○ Often have the best people “keeping the lights on” ● “Shiny new stuff” gets outsourced ○ Often because there is no appropriate talent internally
  • 29. This is a Catch-22 ● Your maintenance is boring! ○ So creative people quit ○ So there is no good talent to work on the new stuff ○ And you can’t hire great talent (Takes great people to hire great people) ● Your new stuff is your “secret sauce” over the next decade or so… ○ Please don’t outsource it. This is long-term suicide ○ Instead outsource the diddly-crap (e-mail et al.) ○ Software is your secret sauce -- invest in your own people
  • 30. So what to do? 1. Start by solving Blunder #2 (Not planning for AI/ML to change most everything) 2. Outsource the boring maintenance 3. Cancel the Palantir contract
  • 31. Blunder #9: Succumbing to the “Innovator’s Dilemma”
  • 32. Blunder #9: Succumbing to the “Innovator’s Dilemma” ● Must-read book by Clayton Christensen ● Steam shovel example ○ Cable steam shovels - big payload ○ Hydraulics - much safer, but low payload ● Used for “small jobs” ○ Payloads increased and hydraulics won ○ Cable guys went out of business
  • 33. Net-Net ● Have to be willing to give up your current business model ● And reinvent yourself ● Possibly losing some current customers in the process ○ Otherwise, you go out of business in the long run ○ Taxi licenses in Cambridge have gone from $700k to $10k
  • 34. Blunder #10: Not Paying Up for a Few “Rocket Scientists”
  • 35. Blunder #10: Not Paying Up for a Few “Rocket Scientists” ● They will be your guiding light to avoiding these blunders ● They will be “off scale” ○ Your HR folks won’t like what you have to pay ● Chances are they will be weird ○ E.g. no shoes, no socks, no tie, feet on the table, ... ● Please don’t drive them away! ○ As Citibank did to one of my Berkeley students a while ago
  • 36. Blunder #11 (Bonus): Working for a Company That is not Trying to do Something about the “Sins of the Past”
  • 37. Blunder #11 (Bonus): Working for a Company That is not Trying to do Something about the “Sins of the Past”. If you work for a company that is succumbing to even one of these blunders, then: 1. You should be fixing it a. Be part of the solution, not part of the problem 2. Or looking for a new employer a. Tamr is hiring!
  • 38. Questions?
  • 39. Thank You! To learn more about Tamr, visit tamr.com. You’ll receive the 10 Big Data Analytics Blunders Infographic via email.
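
As a rough illustration of slides 15 and 24, the sketch below normalizes two independently built records (different field names, different currencies, 99 used as a null marker) and then applies the two example MDM rules from slide 15: match on identical address, merge by keeping the most recent record. Everything specific here is an assumption made for the example, not something from the talk: the function names, the record layout, the address, the salary figures, and the exchange rate are all hypothetical.

```python
from datetime import date

# Hypothetical values, chosen only for illustration.
EUR_TO_USD = 1.10      # assumed exchange rate
NULL_SENTINEL = 99     # "99 means null (sometimes)"


def normalize(record, salary_field, currency):
    """Map one source record onto a shared shape (salary in USD, sentinel -> None)."""
    raw = record.get(salary_field)
    if raw is None or raw == NULL_SENTINEL:
        salary = None
    elif currency == "EUR":
        salary = round(raw * EUR_TO_USD, 2)
    else:
        salary = raw
    return {
        "name": record["name"],
        "address": record["address"].strip().lower(),
        "salary_usd": salary,
        "updated": record["updated"],
    }


def match(a, b):
    """Example match rule: two entities are the same if they have the same address."""
    return a["address"] == b["address"]


def merge(a, b):
    """Example merge rule: take the most recent record and ignore the older one."""
    newer = a if a["updated"] >= b["updated"] else b
    return dict(newer)


def golden_records(records):
    """Fold records into 'golden records' using the match and merge rules above."""
    golden = []
    for rec in records:
        for i, existing in enumerate(golden):
            if match(existing, rec):
                golden[i] = merge(existing, rec)
                break
        else:
            golden.append(rec)
    return golden


# Two made-up source records for the same person; note the differing
# field names ("wages" vs "salary"), currencies, and name spellings.
source_a = {"name": "Mike Stonebraker", "address": "1 Example St ",
            "wages": 1000, "updated": date(2020, 1, 1)}
source_b = {"name": "M.R. Stonebreaker", "address": "1 example st",
            "salary": 99, "updated": date(2021, 1, 1)}

normalized_a = normalize(source_a, "wages", "EUR")
normalized_b = normalize(source_b, "salary", "USD")
result = golden_records([normalized_a, normalized_b])
print(result)
```

In this toy run the two records collapse into one, but the most-recent-wins rule keeps the newer record whose salary is missing and drops the older known value. That is one small way hand-written rules can go wrong, and it is offered only as an illustration of why the slides recommend ML and statistics at scale, with rules used to generate training data.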