SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Daniel Molnar @ Oberlo/Shopify / Data Natives @ Berlin @ 2018-11-22
Where I'm coming from
• senior data analy.cs engineer,
• head of data and analy.cs,
• senior applied and data scien.st,
• data analyst,
• or just data janitor.
Perspec've
• rounded, not complete,
• slow, old, stupid and lazy and
tl;dr (new)
• KISS is the philosophy,
• take the long view, invest in durable knowledge,
• strive for fast and good enough,
• just because you can doesn't mean you should,
• figure what to worry about,
• you are not Google.
it used to be a hype
now this is a war
nobody's your friend
they want your money and data (preferably both locked in)
Things you worry about:
• machine learning,
• deep learning,
• GDPR.
Things you should really worry about:
• machine learning adblockers,
• deep learning ELT,
• GDPR, CRM (yes, CRM).
AGGREGATE
& LABEL
Don't skip
leg day.
Do
make
programma'c KPI defini'ons.
Look at the *** data
Toolset
Python,
(P)SQL,
Metabase.
Usual suspect: NPS
• one, simple number you can squint at,
• sampling is skewed,
• answer is unsure,
• easy to hack step func:on1
,
MONKEYPATCH: look at the change of the distro.
1
Eve Rajca aka @EveTheAnalyst and Jacques Ma9heij aka @jma9heij
Google
Analy&cs?
Hero of the day
Mar$n Loetzsch
@mar$n_loetzsch
-=-
KPIs for e-commerce startups
Data Science in Early Stage
Startups: the Struggle to Create
Value
https://github.com/mara
LEARN &
OPTIMIZE
Half of the *me when companies
say they need "AI" what they really
need is a SELECT clause with
GROUP BY. You're welcome.
— Mat Velloso @matvelloso (Technical Advisor to CTO at Microso9)
Don't do A/B tests
99% it will not worth doing it
... conversion rate is 2% ... detec0ng
a rela0ve change of 1% requires an
experiment with 12 million users ...
— Simon Jackson (Booking.com)
R?Shiny.
Usual suspects
• Non-reproducable experiments and tests.
• R hodpepodge in produc9on.
• Beliefs hidden as implicits in models.
ML~AI~DEEP*
You don't have (enough) data.
Make your own data points!
Deploy good enough fast?
Deep learn my ***
Do you really need it?
Tensorflow! ...
... so distributed deep learning
can compress porn on the end
device.
Hero of the day
Szilard [Deeper than Deep
Learning] @DataScienceLA
-=-
Be#er than Deep Learning:
Gradient Boos4ng Machines
(GBMs)
https://github.com/
szilard/benchm-ml
Spark MLlibs GBM implementa3on is 10x
slower, uses 10x more memory and is buggy/
lower accuracy. Total fucking garbage!
— Szilard [Deeper than Deep Learning] @DataScienceLA
MOVE
STORE
EXPLORE
TRANSFORM
Q: Why are there so many
programmers from Eastern Europe?
A: Slavic pessimism. Everything that
can go wrong will go wrong. With
such a mindset programming comes
naturally.
— Mar&n Sustrik @sustrik (Creator of ZeroMQ, nanomsg, libdill.)
over
engineering
you get an other machine
if you can use
one
Do embrace
dirty reality.
Get cloud agnos.c!
• AWS s'll leads the pack by far
• Azure will sell anyway, and all will cry,
• Google competes with the cheap and uncooked
ETL is #solved OMG
• Airflow is an overengineered underperforming nightmare,
• metl for source mappings in magnitude,
• Mara for generic e-commerce,
• night-shift for explicit minimalism.
Showdown
Hero of the day
Mark Litwintschik @marklit82
Summary of the 1.1 Billion Taxi
Rides Benchmarks (500 GB
uncompressed CSV)
https://
tech.marksblogg.com
Spark
Setup Query Median QM per vCPU Cost/hour
11 x m3.xlarge + HDFS 14,91 0,34 27,5
1 x i3.8xlarge + HDFS 26,00 0,81 2,5
21 x m3.xlarge + HDFS 32,00 0,38 5,67
5 x m3.xlarge + S3 466,50 23,33 1,35
3 x Raspberry Pi 1738,00 144,83
HDFS. RPi = 1/6 VCPU ~100 EUR. Linear scaling.
Presto
Setup Query Median QM per vCPU Cost/hour
50 x n1-standard-4 7,00 0,04 9.50
21 x m3.xlarge 11,50 0,14 5.67
10 x n1-standard-4 16,00 0,36 2.09
1 x i3.8xlarge + HDFS 15,00 0,47 2.50
5 x m3.xlarge + HDFS 51,50 0,26 1.35
50 x m3.xlarge + S3 43,50 0,22 13.50
Workhorse in favour. HDFS. 1 machine. Non-linear scaling.
Lazy Evalua*on
Setup Query Median Cost/hour
Redshi', 6 x ds2.8xlarge 1,91 40.80
BigQuery 2,00
Amazon Athena 6,30
Presto, 50 x n1-standard-4 7,00 9.50
Spark, 11 x m3.xlarge + HDFS 14,91 27.50
The human cost -- in both terms.
One Machine
Setup Query Median QM per vCPU Cost/hour
ClickHouse 4,21 1,05
Elas3csearch tuned 13,14 3,29
Presto, 1 x i3.8xlarge + HDFS 15,00 0.47 2.50
Spark, 1 x i3.8xlarge + HDFS 26,00 0,81 2.50
Ver3ca 32,80 8,20
Elas3csearch 48,89 12,22
PSQL 9.5 + cstore_fdw 205,00 51,25
Intel Core i5 4670K VS i3.8xlarge (32 VCPUs). Desktop example costs <600 EUR.
Do you
use adblocking?
Do you use
Google Analy+cs?
9%of the events are lost to ~all third party trackers
due to adblocking.
Sink > Sieve > Sort
ELT aka SQL on flat files with the minimum amount of code wri:en.
BIRD
OF
PREY
Who are you?
• Lip service provider.
• Fake news producer.
• Kingmaker.
Are you the fool
or the grey eminent?
Don't believe the hype.
HR: good people leave.
Marke&ng
Will this ever get be-er?
• adblocking,
• CPA silver bullets are gone,
• conversion & a8ribu9on are hard nuts,
• FB and GO are not your friends (the 900% on videos),
• but CRM is.
GDPR
• road to hell is paved with good inten2ons,
• it's about the process, matey,
• mostly fair,
• yes, you have to clean up your mess,
• dunno, wouldn't buy programma2c shares1
.
1
Eve Rajca aka @EveTheAnalyst and Jacques Ma9heij aka @jma9heij
Thank you!
@soobrosa
We're hiring!
visuals: @mroga., @xkcd, @DorsaAmir, ˙Cаvin 〄, thelearningcurvedotca, JD Hancock, Thomas Hawk,
jonolist, Kalexanderson, Shopify Burst

Weitere ähnliche Inhalte

Ähnlich wie Data-driven insights from Oberlo/Shopify talk

"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...Edge AI and Vision Alliance
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable PythonTravis Oliphant
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin Databricks
 
Version Control in AI/Machine Learning by Datmo
Version Control in AI/Machine Learning by DatmoVersion Control in AI/Machine Learning by Datmo
Version Control in AI/Machine Learning by DatmoNicholas Walsh
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
00_pytorch_and_deep_learning_fundamentals.pdf
00_pytorch_and_deep_learning_fundamentals.pdf00_pytorch_and_deep_learning_fundamentals.pdf
00_pytorch_and_deep_learning_fundamentals.pdfeanyang7
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBMongoDB
 
The Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondThe Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondNUS-ISS
 
Taming Your Deep Learning Workflow by Determined AI
Taming Your Deep Learning Workflow by Determined AITaming Your Deep Learning Workflow by Determined AI
Taming Your Deep Learning Workflow by Determined AIdesmondchanatdet
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflowCharmi Chokshi
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
 
Practical DMD Scripting
Practical DMD Scripting Practical DMD Scripting
Practical DMD Scripting Zenoss
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsDataWorks Summit
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsRussell Jurney
 

Ähnlich wie Data-driven insights from Oberlo/Shopify talk (20)

"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
 
Version Control in AI/Machine Learning by Datmo
Version Control in AI/Machine Learning by DatmoVersion Control in AI/Machine Learning by Datmo
Version Control in AI/Machine Learning by Datmo
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
00_pytorch_and_deep_learning_fundamentals.pdf
00_pytorch_and_deep_learning_fundamentals.pdf00_pytorch_and_deep_learning_fundamentals.pdf
00_pytorch_and_deep_learning_fundamentals.pdf
 
Dato Keynote
Dato KeynoteDato Keynote
Dato Keynote
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
The Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondThe Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and Beyond
 
Django at Scale
Django at ScaleDjango at Scale
Django at Scale
 
Taming Your Deep Learning Workflow by Determined AI
Taming Your Deep Learning Workflow by Determined AITaming Your Deep Learning Workflow by Determined AI
Taming Your Deep Learning Workflow by Determined AI
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
Practical DMD Scripting
Practical DMD Scripting Practical DMD Scripting
Practical DMD Scripting
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 

Mehr von DataconomyGmbH

Technical debt in ML | Jaroslaw Szymczak | DN18
Technical debt in ML | Jaroslaw Szymczak | DN18Technical debt in ML | Jaroslaw Szymczak | DN18
Technical debt in ML | Jaroslaw Szymczak | DN18DataconomyGmbH
 
Accessing Online Text-based conversation | Jay Krall | DN2018
Accessing Online Text-based conversation | Jay Krall | DN2018Accessing Online Text-based conversation | Jay Krall | DN2018
Accessing Online Text-based conversation | Jay Krall | DN2018DataconomyGmbH
 
Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...
Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...
Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...DataconomyGmbH
 
Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18DataconomyGmbH
 
How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...
How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...
How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...DataconomyGmbH
 
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18DataconomyGmbH
 
Building a Data Science Consultancy | Bart Smeets | DN18
Building a Data Science Consultancy | Bart Smeets | DN18Building a Data Science Consultancy | Bart Smeets | DN18
Building a Data Science Consultancy | Bart Smeets | DN18DataconomyGmbH
 
Building Sustainable Machine Learning Products for Communities, by Communit...
Building Sustainable Machine Learning Products  for Communities,  by Communit...Building Sustainable Machine Learning Products  for Communities,  by Communit...
Building Sustainable Machine Learning Products for Communities, by Communit...DataconomyGmbH
 
BIG DATA is DEAD | Marc Weimer-Hablitzel, Etventure | DN18
BIG DATA  is DEAD | Marc Weimer-Hablitzel, Etventure | DN18BIG DATA  is DEAD | Marc Weimer-Hablitzel, Etventure | DN18
BIG DATA is DEAD | Marc Weimer-Hablitzel, Etventure | DN18DataconomyGmbH
 
Undermining democracy | Alisa Kolesnikova | DN18
Undermining  democracy | Alisa Kolesnikova | DN18Undermining  democracy | Alisa Kolesnikova | DN18
Undermining democracy | Alisa Kolesnikova | DN18DataconomyGmbH
 
Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18
Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18
Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18DataconomyGmbH
 
Rent, Rain, and Regulations | Du Phan, Dataiku | DN18
Rent, Rain, and Regulations | Du Phan, Dataiku | DN18Rent, Rain, and Regulations | Du Phan, Dataiku | DN18
Rent, Rain, and Regulations | Du Phan, Dataiku | DN18DataconomyGmbH
 
Linked data in an era of data surrealism Hans Constandt | CEO & FOUNDER
Linked data in an era of data surrealism Hans Constandt | CEO & FOUNDERLinked data in an era of data surrealism Hans Constandt | CEO & FOUNDER
Linked data in an era of data surrealism Hans Constandt | CEO & FOUNDERDataconomyGmbH
 
Living in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDER
Living in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDERLiving in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDER
Living in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDERDataconomyGmbH
 
Are You Ready for the Quickening!
Are You Ready for the Quickening!Are You Ready for the Quickening!
Are You Ready for the Quickening!DataconomyGmbH
 

Mehr von DataconomyGmbH (15)

Technical debt in ML | Jaroslaw Szymczak | DN18
Technical debt in ML | Jaroslaw Szymczak | DN18Technical debt in ML | Jaroslaw Szymczak | DN18
Technical debt in ML | Jaroslaw Szymczak | DN18
 
Accessing Online Text-based conversation | Jay Krall | DN2018
Accessing Online Text-based conversation | Jay Krall | DN2018Accessing Online Text-based conversation | Jay Krall | DN2018
Accessing Online Text-based conversation | Jay Krall | DN2018
 
Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...
Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...
Journey from Structured to Unstructured Data | Nischal HP | VP, Engineering a...
 
Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18
 
How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...
How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...
How to Lie with Data and Statistics? | Iveta Lohovska, Principal Data Scienti...
 
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
 
Building a Data Science Consultancy | Bart Smeets | DN18
Building a Data Science Consultancy | Bart Smeets | DN18Building a Data Science Consultancy | Bart Smeets | DN18
Building a Data Science Consultancy | Bart Smeets | DN18
 
Building Sustainable Machine Learning Products for Communities, by Communit...
Building Sustainable Machine Learning Products  for Communities,  by Communit...Building Sustainable Machine Learning Products  for Communities,  by Communit...
Building Sustainable Machine Learning Products for Communities, by Communit...
 
BIG DATA is DEAD | Marc Weimer-Hablitzel, Etventure | DN18
BIG DATA  is DEAD | Marc Weimer-Hablitzel, Etventure | DN18BIG DATA  is DEAD | Marc Weimer-Hablitzel, Etventure | DN18
BIG DATA is DEAD | Marc Weimer-Hablitzel, Etventure | DN18
 
Undermining democracy | Alisa Kolesnikova | DN18
Undermining  democracy | Alisa Kolesnikova | DN18Undermining  democracy | Alisa Kolesnikova | DN18
Undermining democracy | Alisa Kolesnikova | DN18
 
Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18
Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18
Support automation with chatbots | Erik Pfannmöller | Founder, Solvemate | DN18
 
Rent, Rain, and Regulations | Du Phan, Dataiku | DN18
Rent, Rain, and Regulations | Du Phan, Dataiku | DN18Rent, Rain, and Regulations | Du Phan, Dataiku | DN18
Rent, Rain, and Regulations | Du Phan, Dataiku | DN18
 
Linked data in an era of data surrealism Hans Constandt | CEO & FOUNDER
Linked data in an era of data surrealism Hans Constandt | CEO & FOUNDERLinked data in an era of data surrealism Hans Constandt | CEO & FOUNDER
Linked data in an era of data surrealism Hans Constandt | CEO & FOUNDER
 
Living in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDER
Living in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDERLiving in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDER
Living in an Era of Data Surrealism | Hans Constandt | CEO & FOUNDER
 
Are You Ready for the Quickening!
Are You Ready for the Quickening!Are You Ready for the Quickening!
Are You Ready for the Quickening!
 

Kürzlich hochgeladen

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 

Kürzlich hochgeladen (20)

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Data-driven insights from Oberlo/Shopify talk

  • 1. Daniel Molnar @ Oberlo/Shopify / Data Natives @ Berlin @ 2018-11-22
  • 2. Where I'm coming from • senior data analy.cs engineer, • head of data and analy.cs, • senior applied and data scien.st, • data analyst, • or just data janitor.
  • 3. Perspec've • rounded, not complete, • slow, old, stupid and lazy and
  • 4. tl;dr (new) • KISS is the philosophy, • take the long view, invest in durable knowledge, • strive for fast and good enough, • just because you can doesn't mean you should, • figure what to worry about, • you are not Google.
  • 5. it used to be a hype now this is a war nobody's your friend they want your money and data (preferably both locked in)
  • 6. Things you worry about: • machine learning, • deep learning, • GDPR.
  • 7. Things you should really worry about: • machine learning adblockers, • deep learning ELT, • GDPR, CRM (yes, CRM).
  • 8.
  • 12. Look at the *** data
  • 14. Usual suspect: NPS • one, simple number you can squint at, • sampling is skewed, • answer is unsure, • easy to hack step func:on1 , MONKEYPATCH: look at the change of the distro. 1 Eve Rajca aka @EveTheAnalyst and Jacques Ma9heij aka @jma9heij
  • 16. Hero of the day Mar$n Loetzsch @mar$n_loetzsch -=- KPIs for e-commerce startups Data Science in Early Stage Startups: the Struggle to Create Value https://github.com/mara
  • 17.
  • 19. Half of the *me when companies say they need "AI" what they really need is a SELECT clause with GROUP BY. You're welcome. — Mat Velloso @matvelloso (Technical Advisor to CTO at Microso9)
  • 20. Don't do A/B tests 99% it will not worth doing it
  • 21. ... conversion rate is 2% ... detec0ng a rela0ve change of 1% requires an experiment with 12 million users ... — Simon Jackson (Booking.com)
  • 23. Usual suspects • Non-reproducable experiments and tests. • R hodpepodge in produc9on. • Beliefs hidden as implicits in models.
  • 24.
  • 26. You don't have (enough) data.
  • 27. Make your own data points!
  • 29. Deep learn my *** Do you really need it? Tensorflow! ... ... so distributed deep learning can compress porn on the end device.
  • 30. Hero of the day Szilard [Deeper than Deep Learning] @DataScienceLA -=- Be#er than Deep Learning: Gradient Boos4ng Machines (GBMs) https://github.com/ szilard/benchm-ml
  • 31. Spark MLlibs GBM implementa3on is 10x slower, uses 10x more memory and is buggy/ lower accuracy. Total fucking garbage! — Szilard [Deeper than Deep Learning] @DataScienceLA
  • 32.
  • 33.
  • 35. Q: Why are there so many programmers from Eastern Europe? A: Slavic pessimism. Everything that can go wrong will go wrong. With such a mindset programming comes naturally. — Mar&n Sustrik @sustrik (Creator of ZeroMQ, nanomsg, libdill.)
  • 36.
  • 38. you get an other machine if you can use one
  • 40. Get cloud agnos.c! • AWS s'll leads the pack by far • Azure will sell anyway, and all will cry, • Google competes with the cheap and uncooked
  • 41. ETL is #solved OMG • Airflow is an overengineered underperforming nightmare, • metl for source mappings in magnitude, • Mara for generic e-commerce, • night-shift for explicit minimalism.
  • 43. Hero of the day Mark Litwintschik @marklit82 Summary of the 1.1 Billion Taxi Rides Benchmarks (500 GB uncompressed CSV) https:// tech.marksblogg.com
  • 44. Spark Setup Query Median QM per vCPU Cost/hour 11 x m3.xlarge + HDFS 14,91 0,34 27,5 1 x i3.8xlarge + HDFS 26,00 0,81 2,5 21 x m3.xlarge + HDFS 32,00 0,38 5,67 5 x m3.xlarge + S3 466,50 23,33 1,35 3 x Raspberry Pi 1738,00 144,83 HDFS. RPi = 1/6 VCPU ~100 EUR. Linear scaling.
  • 45. Presto Setup Query Median QM per vCPU Cost/hour 50 x n1-standard-4 7,00 0,04 9.50 21 x m3.xlarge 11,50 0,14 5.67 10 x n1-standard-4 16,00 0,36 2.09 1 x i3.8xlarge + HDFS 15,00 0,47 2.50 5 x m3.xlarge + HDFS 51,50 0,26 1.35 50 x m3.xlarge + S3 43,50 0,22 13.50 Workhorse in favour. HDFS. 1 machine. Non-linear scaling.
  • 46. Lazy Evalua*on Setup Query Median Cost/hour Redshi', 6 x ds2.8xlarge 1,91 40.80 BigQuery 2,00 Amazon Athena 6,30 Presto, 50 x n1-standard-4 7,00 9.50 Spark, 11 x m3.xlarge + HDFS 14,91 27.50 The human cost -- in both terms.
  • 47. One Machine Setup Query Median QM per vCPU Cost/hour ClickHouse 4,21 1,05 Elas3csearch tuned 13,14 3,29 Presto, 1 x i3.8xlarge + HDFS 15,00 0.47 2.50 Spark, 1 x i3.8xlarge + HDFS 26,00 0,81 2.50 Ver3ca 32,80 8,20 Elas3csearch 48,89 12,22 PSQL 9.5 + cstore_fdw 205,00 51,25 Intel Core i5 4670K VS i3.8xlarge (32 VCPUs). Desktop example costs <600 EUR.
  • 49. Do you use Google Analy+cs?
  • 50. 9%of the events are lost to ~all third party trackers due to adblocking.
  • 51. Sink > Sieve > Sort ELT aka SQL on flat files with the minimum amount of code wri:en.
  • 52.
  • 54. Who are you? • Lip service provider. • Fake news producer. • Kingmaker. Are you the fool or the grey eminent?
  • 55. Don't believe the hype. HR: good people leave.
  • 57. Will this ever get be-er? • adblocking, • CPA silver bullets are gone, • conversion & a8ribu9on are hard nuts, • FB and GO are not your friends (the 900% on videos), • but CRM is.
  • 58. GDPR • road to hell is paved with good inten2ons, • it's about the process, matey, • mostly fair, • yes, you have to clean up your mess, • dunno, wouldn't buy programma2c shares1 . 1 Eve Rajca aka @EveTheAnalyst and Jacques Ma9heij aka @jma9heij
  • 59. Thank you! @soobrosa We're hiring! visuals: @mroga., @xkcd, @DorsaAmir, ˙Cаvin 〄, thelearningcurvedotca, JD Hancock, Thomas Hawk, jonolist, Kalexanderson, Shopify Burst