SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
From Science to Data
Dr. Anestis Fachantidis
From Science to Data
Following a principled path
to Data Science
Dr. Anestis Fachantidis
CEO & Chief Data Scientist
Medoid AI
Anestis Fachantidis – From Science to Data 2
A Data Scientist's Ikigai
Anestis Fachantidis – From Science to Data 3
A Data Scientist's Ikigai
Anestis Fachantidis – From Science to Data 4
How to become
good at DS
What kind of
DS does the
world need?
What to love in
DS?
Which Industries
pay for DS now?
What you are good at
Become good at some of the tools used in real projects
Anestis Fachantidis – From Science to Data 5
Become good at
• Next slides:
• Reasons that the theoretical background is mandatory
• No one will ever master all these skills ! This is a maximal set, we
are just looking for your entry point as a beginner.
• This could be the pair of your weakest skill with the strongest one,
one to make you a better DS and one to give you early
gradification.
• The field is leaning more and more towards specialization.
Theoretical Background Web apps & services for DS
Accessing Data Sources Cluster Computation
Data Preparation Git & Documentation
Learning Models Business Understanding
Data Visualization & Reporting Team Work & Project Management
Anestis Fachantidis – From Science to Data 6
Theoretical Background
Minimum set of keywords for beginners
Anestis Fachantidis – From Science to Data 7
*External .png file available
Accessing Data
• Reading Data 101: read .csv and excel files
(read.xlsx)
• Connect to a database
• Set up MySQL or SQL Server Express locally
and download a sample DB (e.g., Employers
DB)
• Connect and read through R using
frameworks like DBI and dbplyr
• You will still need basic knowledge of SQL!
Anestis Fachantidis – From Science to Data 8
Accessing Data
dbplyr Example:
con <- DBI::dbConnect(RSQLite::SQLite())
flights <- tbl(con,”flights”)
flights %>%
select(distance, air_time) %>%
mutate(speed = distance / (air_time / 60))%>%
show_query()
#> <SQL>
#> SELECT `distance`, `air_time`, `distance` /
(`air_time` / 60.0) AS `speed`
#> FROM (SELECT `distance`, `air_time`
#> FROM `nycflights13::flights`)
Anestis Fachantidis – From Science to Data 9
Data Preparation
• Data preparation takes 60% to 80% of the whole
analytical pipeline in a real DS project
• In R, frameworks like dplyr and data.table make data
handling and processing easier
• Numerous packages and libraries for data cleaning
• Learning magrittr pipelines will make your code
readable and will clear up the data manipulation
process
• Understanding feature extraction, feature selection
and their interconnection with the business context at
hand.
Anestis Fachantidis – From Science to Data 10
Learning Models
• Frameworks like scikit-learn (Python) and
caret (R) will make your first ML
experimentation steps much easier.
• They provide a standardized interface to
training, testing and hyper-parameter tuning.
• Try them on a Kaggle dataset!
Anestis Fachantidis – From Science to Data 11
GIT & Documentation
GIT, the most popular version control system today. Among other
reasons you need that in DS too:
• Results are paired with {parameters, features selected, code}
which comprise an (almost) deterministic state. Capture that state
for all your results!
• Make a Bitbucket, GitLab or GitHub account to:
– Create a small DS “portfolio” of personal projects
– Collaborate on open source projects
• Comment your code and always consider packaging for reusability
• Comment your objects:
comment(object) <- “…”
Don’t name their files like:
Results23-5withoutsumofmoney2.rds
Anestis Fachantidis – From Science to Data 12
Data Visualization & Reporting
• Significant DS task on their own, the most powerful
communication tools of a DS
• Learn at least one plotting “language”
• In R, the most well known plotting grammar is that of
ggplot
• For interactive charts you can also use platforms like
plot.ly and rCharts
• The fully reproducible paradigm of a compiled report:
“Code and text in one document side by side”
• Learn tools like rMarkdown (R) and Jupiter (Python)
which use an easy markdown syntax
Anestis Fachantidis – From Science to Data 13
Data Visualization & Reporting
Anestis Fachantidis – From Science to Data 14
Web Apps & Services for DS
Analytics APIs and ML web services:
• For small teams a Platform-as-a-service solution
like Heroku makes it easy to deploy a data
product
• Basic understanding of the HTTP protocol and
data formats such as JSON
Data Products as Web Apps:
• Web app frameworks like Shiny (R) or Django
(Python) can help deliver analytics or ML results
having limited knowledge of web development
Anestis Fachantidis – From Science to Data 15
Web Apps & Services for DS
Anestis Fachantidis – From Science to Data 16
Cluster Computation
• SparkR and Sparklyr will let you easily setup a
distributed computation cluster in R
• Access to MLlib library
• Package H2O for access to H20 open source
engine for analytics and ML
• Set them up locally and try them on sample
data!
Anestis Fachantidis – From Science to Data 17
Business Understanding
• More general: Domain understanding
• Requirements Analysis = following a
structured analytical process +
communication skills
• Beyond understanding the meaning of each
business variable, understanding:
– If the variable is directly controlled or not
– How does this influences other variables
Anestis Fachantidis – From Science to Data 18
Team Work & Project Management
• Learn about some software development
processes like Scrum and Kanban
• Differences to Software Development
processes
• Data mining process model: CRISP-DM
• Dedicated/Sophisticated platforms like
Atlassian JIRA
• For a small team or a small DS project a Trello
board is more than enough!
Anestis Fachantidis – From Science to Data 19
What the world needs
It should not be only about “performance” and revenue streams
Anestis Fachantidis – From Science to Data 20
Insights
• Open Insights
– People need to make sense of data
– Think of any NGO organization you love its purpose
or even the shop around the corner you just love its
products:
• Wouldn't you want to give them relevant insights about
their business environment?
• Insights that will make them act accordingly and become
sustainable
– Data Scientists also have the responsibility to
educate people on the interpretation of results and
on how they could identify bad data journalism
Anestis Fachantidis – From Science to Data 21
…Openness…
Open Source
• The open source communities need support and people that care
about their projects too
– Don’t just use them, think of ways to contribute
• Write your own R package or Python library
– Share it on a Git web platform and it might be the next big thing in
open source!
Open Data, which are of critical value for the following reasons:
– Transparency and democratic control
– Improved or new private products and services
– Improved efficiency and effectiveness of government services
– New knowledge from combined data sources and patterns in large
data volumes
…which all need a Data Scientist!
Anestis Fachantidis – From Science to Data 22
Business
Technology
Computational
Methods &
Algorithms
Maths & Pure
Sciences
Innovation
Known
Anestis Fachantidis – From Science to Data 23
Unknown Unknown
Business
Technology
Computational
Methods &
Algorithms
Maths & Pure
Sciences
The reverse pyramid
of Technology Innovation
Known
“The vast amount
of products is
based on just a
handful of
theorems”
Anestis Fachantidis – From Science to Data 24
Unknown Unknown
Business
Technology
Computational
Methods &
Algorithms
Maths & Pure
Sciences
The reverse pyramid
of Technology Innovation
Unknown UnknownKnown
“The vast amount
of products is
based on just a
handful of
theorems”
Anestis Fachantidis – From Science to Data 25
An innovation
just happened
here !
Business
Technology
Computational
Methods &
Algorithms
Maths & Pure
Sciences
The reverse pyramid
of Technology Innovation
Known
Innovation
Opportunity
Area
Innovation
propagation
time
“The vast amount
of products is
based on just a
handful of
theorems”
Anestis Fachantidis – From Science to Data 26
Innovation
“angle” -
magnitude
Unknown Unknown
Business
Technology
Computational
Methods &
Algorithms
Maths & Pure
Sciences
The reverse pyramid
of Technology Innovation
Known
Innovation
Opportunity
Area
“The vast amount
of products is
based on just a
handful of
theorems”
Anestis Fachantidis – From Science to Data 27
Innovation
“angle” -
magnitude
Innovation
propagation
time
Unknown Unknown
Another
Innovation!
Business
Technology
Computational
Methods &
Algorithms
Maths & Pure
Sciences
The reverse pyramid
of Technology Innovation
Known
Innovation
Opportunity
Area
“The vast amount
of products is
based on just a
handful of
theorems”
Anestis Fachantidis – From Science to Data 28
Innovation
“angle” -
magnitude
Innovation
propagation
time
Unknown Unknown
Business
Technology
Computational
Methods &
Algorithms
Maths & Pure
Sciences
The reverse pyramid
of Technology Innovation
Known
Innovation
Opportunity
Area
“The vast amount
of products is
based on just a
handful of
theorems”
“A DS can act
as innovation
facilitator”
Anestis Fachantidis – From Science to Data 29
Innovation
“angle” -
magnitude
Innovation
propagation
time
Unknown Unknown
What you can be paid for
Get hired in industries that have already adopted DS and ML
Anestis Fachantidis – From Science to Data 30
Hiring, per Indusry
Anestis Fachantidis – From Science to Data 31
KDnuggets survey 2019: 1,001 data scientist’s LinkedIn
profiles
Skills/Industry Comparison
Anestis Fachantidis – From Science to Data 32
DS Importance, per Industry
Anestis Fachantidis – From Science to Data 33
Technology/software
Anestis Fachantidis – From Science to Data 34
source: O’Reilly Media - spring 2017 - 875 respondents
Financial Services
Anestis Fachantidis – From Science to Data 35
source: O’Reilly Media - spring 2017 - 875 respondents
What you love
A reason to wake up happily in the morning
Anestis Fachantidis – From Science to Data 36
“Love at first result”
You will probably love DS when:
• You produce your first insight on actual business
or real-life data
• You see a learning process reducing its error on
test data
• Your result dazzled a business stakeholder and…
…actually made him/her take a decision…
…and a successful one, as measured later
based on some KPI.
Anestis Fachantidis – From Science to Data 37
The Hype Tree
Trunk
=
Solid
understanding of
the underlying
technology
Anestis Fachantidis – From Science to Data 38
≔ 𝐬𝐢𝐧
𝟏 𝟎
𝟎 𝟏
' ( 𝒆 𝒙
Thank You!
• Follow us on LinkedIn!
• Contact Details and Talk Notes in:
j.mp/fromsciencetodata
Anestis Fachantidis – From Science to Data 39
From Science to Path of Data Scientist

Weitere ähnliche Inhalte

Was ist angesagt?

State of the State: What’s Happening in the Database Market?
State of the State: What’s Happening in the Database Market?State of the State: What’s Happening in the Database Market?
State of the State: What’s Happening in the Database Market?Neo4j
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products Dataiku
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4jNeo4j
 
The AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic PerspectivesThe AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic PerspectivesSnapLogic
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...
Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...
Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...Dataconomy Media
 
AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...
AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...
AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...Dr. Haxel Consult
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION Elvis Muyanja
 
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...Dr. Haxel Consult
 
Big Data: Big Issues for IP
Big Data: Big Issues for IPBig Data: Big Issues for IP
Big Data: Big Issues for IPDr. Haxel Consult
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data IntroductionTiago Knoch
 
Big Data Landscape 2018
Big Data Landscape 2018Big Data Landscape 2018
Big Data Landscape 2018Leanne Hwee
 
How I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataHow I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataDomino Data Lab
 
A Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain OptimizationA Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain OptimizationNeo4j
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseVaticle
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceSrishti44
 

Was ist angesagt? (20)

Data Activities in Austria
Data Activities in AustriaData Activities in Austria
Data Activities in Austria
 
State of the State: What’s Happening in the Database Market?
State of the State: What’s Happening in the Database Market?State of the State: What’s Happening in the Database Market?
State of the State: What’s Happening in the Database Market?
 
Big data analysis
Big data analysisBig data analysis
Big data analysis
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
 
The AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic PerspectivesThe AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic Perspectives
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...
Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...
Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ...
 
AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...
AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...
AI-SDV 2021 - Tony Trippe - The Current State of Machine Learning for Patent ...
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
 
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
 
Big Data
Big DataBig Data
Big Data
 
Big Data: Big Issues for IP
Big Data: Big Issues for IPBig Data: Big Issues for IP
Big Data: Big Issues for IP
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
Big Data Landscape 2018
Big Data Landscape 2018Big Data Landscape 2018
Big Data Landscape 2018
 
How I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataHow I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked Data
 
A Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain OptimizationA Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain Optimization
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge Base
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 

Ähnlich wie From Science to Path of Data Scientist

Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceJuuso Parkkinen
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Data science fin_tech_2016
Data science fin_tech_2016Data science fin_tech_2016
Data science fin_tech_2016iECARUS
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)BalĂĄzs KĂŠgl
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) SkillsOscar Corcho
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceBalĂĄzs KĂŠgl
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Tomasz Bednarz
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceAnnie Flippo
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Betacowork
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadVamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabadsaitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training HyderabadNithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabadVamsiNihal
 

Ähnlich wie From Science to Path of Data Scientist (20)

Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data Science
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Data science fin_tech_2016
Data science fin_tech_2016Data science fin_tech_2016
Data science fin_tech_2016
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data Science
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data Science
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 

Mehr von Institute of Contemporary Sciences

Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...Institute of Contemporary Sciences
 
Data Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen DraskovicData Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen DraskovicInstitute of Contemporary Sciences
 
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...Institute of Contemporary Sciences
 
Solving churn challenge in Big Data environment - Jelena Pekez
Solving churn challenge in Big Data environment  - Jelena PekezSolving churn challenge in Big Data environment  - Jelena Pekez
Solving churn challenge in Big Data environment - Jelena PekezInstitute of Contemporary Sciences
 
Application of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar DilovApplication of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar DilovInstitute of Contemporary Sciences
 
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Institute of Contemporary Sciences
 
Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...Institute of Contemporary Sciences
 
Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...Institute of Contemporary Sciences
 
Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...Institute of Contemporary Sciences
 
Data and data scientists are not equal to money david hoyle
Data and data scientists are not equal to money   david hoyleData and data scientists are not equal to money   david hoyle
Data and data scientists are not equal to money david hoyleInstitute of Contemporary Sciences
 
When it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela CulibrkWhen it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela CulibrkInstitute of Contemporary Sciences
 
Reality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos SolujicReality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos SolujicInstitute of Contemporary Sciences
 
Sensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir BrusicSensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir BrusicInstitute of Contemporary Sciences
 
Prediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognitionPrediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognitionInstitute of Contemporary Sciences
 
Using data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local governmentUsing data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local governmentInstitute of Contemporary Sciences
 

Mehr von Institute of Contemporary Sciences (20)

First 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip PanjevicFirst 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip Panjevic
 
Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...
 
Data Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen DraskovicData Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen Draskovic
 
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
 
Solving churn challenge in Big Data environment - Jelena Pekez
Solving churn challenge in Big Data environment  - Jelena PekezSolving churn challenge in Big Data environment  - Jelena Pekez
Solving churn challenge in Big Data environment - Jelena Pekez
 
Application of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar DilovApplication of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar Dilov
 
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
 
Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...
 
Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...
 
Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...
 
From Zero to ML Hero for Underdogs - Amir Tabakovic
From Zero to ML Hero for Underdogs  - Amir TabakovicFrom Zero to ML Hero for Underdogs  - Amir Tabakovic
From Zero to ML Hero for Underdogs - Amir Tabakovic
 
Data and data scientists are not equal to money david hoyle
Data and data scientists are not equal to money   david hoyleData and data scientists are not equal to money   david hoyle
Data and data scientists are not equal to money david hoyle
 
The price is right - Tomislav Krizan
The price is right - Tomislav KrizanThe price is right - Tomislav Krizan
The price is right - Tomislav Krizan
 
When it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela CulibrkWhen it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela Culibrk
 
Reality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos SolujicReality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos Solujic
 
Sensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir BrusicSensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir Brusic
 
Improving Data Quality with Product Similarity Search
Improving Data Quality with Product Similarity SearchImproving Data Quality with Product Similarity Search
Improving Data Quality with Product Similarity Search
 
Prediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognitionPrediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognition
 
Using data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local governmentUsing data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local government
 
Geospatial Analysis and Open Data - Forest and Climate
Geospatial Analysis and Open Data - Forest and ClimateGeospatial Analysis and Open Data - Forest and Climate
Geospatial Analysis and Open Data - Forest and Climate
 

KĂźrzlich hochgeladen

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptDr. Soumendra Kumar Patra
 

KĂźrzlich hochgeladen (20)

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 

From Science to Path of Data Scientist

  • 1. From Science to Data Dr. Anestis Fachantidis
  • 2. From Science to Data Following a principled path to Data Science Dr. Anestis Fachantidis CEO & Chief Data Scientist Medoid AI Anestis Fachantidis – From Science to Data 2
  • 3. A Data Scientist's Ikigai Anestis Fachantidis – From Science to Data 3
  • 4. A Data Scientist's Ikigai Anestis Fachantidis – From Science to Data 4 How to become good at DS What kind of DS does the world need? What to love in DS? Which Industries pay for DS now?
  • 5. What you are good at Become good at some of the tools used in real projects Anestis Fachantidis – From Science to Data 5
  • 6. Become good at • Next slides: • Reasons that the theoretical background is mandatory • No one will ever master all these skills ! This is a maximal set, we are just looking for your entry point as a beginner. • This could be the pair of your weakest skill with the strongest one, one to make you a better DS and one to give you early gradification. • The field is leaning more and more towards specialization. Theoretical Background Web apps & services for DS Accessing Data Sources Cluster Computation Data Preparation Git & Documentation Learning Models Business Understanding Data Visualization & Reporting Team Work & Project Management Anestis Fachantidis – From Science to Data 6
  • 7. Theoretical Background Minimum set of keywords for beginners Anestis Fachantidis – From Science to Data 7 *External .png file available
  • 8. Accessing Data • Reading Data 101: read .csv and excel files (read.xlsx) • Connect to a database • Set up MySQL or SQL Server Express locally and download a sample DB (e.g., Employers DB) • Connect and read through R using frameworks like DBI and dbplyr • You will still need basic knowledge of SQL! Anestis Fachantidis – From Science to Data 8
  • 9. Accessing Data dbplyr Example: con <- DBI::dbConnect(RSQLite::SQLite()) flights <- tbl(con,”flights”) flights %>% select(distance, air_time) %>% mutate(speed = distance / (air_time / 60))%>% show_query() #> <SQL> #> SELECT `distance`, `air_time`, `distance` / (`air_time` / 60.0) AS `speed` #> FROM (SELECT `distance`, `air_time` #> FROM `nycflights13::flights`) Anestis Fachantidis – From Science to Data 9
  • 10. Data Preparation • Data preparation takes 60% to 80% of the whole analytical pipeline in a real DS project • In R, frameworks like dplyr and data.table make data handling and processing easier • Numerous packages and libraries for data cleaning • Learning magrittr pipelines will make your code readable and will clear up the data manipulation process • Understanding feature extraction, feature selection and their interconnection with the business context at hand. Anestis Fachantidis – From Science to Data 10
  • 11. Learning Models • Frameworks like scikit-learn (Python) and caret (R) will make your first ML experimentation steps much easier. • They provide a standardized interface to training, testing and hyper-parameter tuning. • Try them on a Kaggle dataset! Anestis Fachantidis – From Science to Data 11
  • 12. GIT & Documentation GIT, the most popular version control system today. Among other reasons you need that in DS too: • Results are paired with {parameters, features selected, code} which comprise an (almost) deterministic state. Capture that state for all your results! • Make a Bitbucket, GitLab or GitHub account to: – Create a small DS “portfolio” of personal projects – Collaborate on open source projects • Comment your code and always consider packaging for reusability • Comment your objects: comment(object) <- “…” Don’t name their files like: Results23-5withoutsumofmoney2.rds Anestis Fachantidis – From Science to Data 12
  • 13. Data Visualization & Reporting • Significant DS task on their own, the most powerful communication tools of a DS • Learn at least one plotting “language” • In R, the most well known plotting grammar is that of ggplot • For interactive charts you can also use platforms like plot.ly and rCharts • The fully reproducible paradigm of a compiled report: “Code and text in one document side by side” • Learn tools like rMarkdown (R) and Jupiter (Python) which use an easy markdown syntax Anestis Fachantidis – From Science to Data 13
  • 14. Data Visualization & Reporting Anestis Fachantidis – From Science to Data 14
  • 15. Web Apps & Services for DS Analytics APIs and ML web services: • For small teams a Platform-as-a-service solution like Heroku makes it easy to deploy a data product • Basic understanding of the HTTP protocol and data formats such as JSON Data Products as Web Apps: • Web app frameworks like Shiny (R) or Django (Python) can help deliver analytics or ML results having limited knowledge of web development Anestis Fachantidis – From Science to Data 15
  • 16. Web Apps & Services for DS Anestis Fachantidis – From Science to Data 16
  • 17. Cluster Computation • SparkR and Sparklyr will let you easily setup a distributed computation cluster in R • Access to MLlib library • Package H2O for access to H20 open source engine for analytics and ML • Set them up locally and try them on sample data! Anestis Fachantidis – From Science to Data 17
  • 18. Business Understanding • More general: Domain understanding • Requirements Analysis = following a structured analytical process + communication skills • Beyond understanding the meaning of each business variable, understanding: – If the variable is directly controlled or not – How does this influences other variables Anestis Fachantidis – From Science to Data 18
  • 19. Team Work & Project Management • Learn about some software development processes like Scrum and Kanban • Differences to Software Development processes • Data mining process model: CRISP-DM • Dedicated/Sophisticated platforms like Atlassian JIRA • For a small team or a small DS project a Trello board is more than enough! Anestis Fachantidis – From Science to Data 19
  • 20. What the world needs It should not be only about “performance” and revenue streams Anestis Fachantidis – From Science to Data 20
  • 21. Insights • Open Insights – People need to make sense of data – Think of any NGO organization you love its purpose or even the shop around the corner you just love its products: • Wouldn't you want to give them relevant insights about their business environment? • Insights that will make them act accordingly and become sustainable – Data Scientists also have the responsibility to educate people on the interpretation of results and on how they could identify bad data journalism Anestis Fachantidis – From Science to Data 21
  • 22. …Openness… Open Source • The open source communities need support and people that care about their projects too – Don’t just use them, think of ways to contribute • Write your own R package or Python library – Share it on a Git web platform and it might be the next big thing in open source! Open Data, which are of critical value for the following reasons: – Transparency and democratic control – Improved or new private products and services – Improved efficiency and effectiveness of government services – New knowledge from combined data sources and patterns in large data volumes …which all need a Data Scientist! Anestis Fachantidis – From Science to Data 22
  • 23. Business Technology Computational Methods & Algorithms Maths & Pure Sciences Innovation Known Anestis Fachantidis – From Science to Data 23 Unknown Unknown
  • 24. Business Technology Computational Methods & Algorithms Maths & Pure Sciences The reverse pyramid of Technology Innovation Known “The vast amount of products is based on just a handful of theorems” Anestis Fachantidis – From Science to Data 24 Unknown Unknown
  • 25. Business Technology Computational Methods & Algorithms Maths & Pure Sciences The reverse pyramid of Technology Innovation Unknown UnknownKnown “The vast amount of products is based on just a handful of theorems” Anestis Fachantidis – From Science to Data 25 An innovation just happened here !
  • 26. Business Technology Computational Methods & Algorithms Maths & Pure Sciences The reverse pyramid of Technology Innovation Known Innovation Opportunity Area Innovation propagation time “The vast amount of products is based on just a handful of theorems” Anestis Fachantidis – From Science to Data 26 Innovation “angle” - magnitude Unknown Unknown
  • 27. Business Technology Computational Methods & Algorithms Maths & Pure Sciences The reverse pyramid of Technology Innovation Known Innovation Opportunity Area “The vast amount of products is based on just a handful of theorems” Anestis Fachantidis – From Science to Data 27 Innovation “angle” - magnitude Innovation propagation time Unknown Unknown Another Innovation!
  • 28. Business Technology Computational Methods & Algorithms Maths & Pure Sciences The reverse pyramid of Technology Innovation Known Innovation Opportunity Area “The vast amount of products is based on just a handful of theorems” Anestis Fachantidis – From Science to Data 28 Innovation “angle” - magnitude Innovation propagation time Unknown Unknown
  • 29. Business Technology Computational Methods & Algorithms Maths & Pure Sciences The reverse pyramid of Technology Innovation Known Innovation Opportunity Area “The vast amount of products is based on just a handful of theorems” “A DS can act as innovation facilitator” Anestis Fachantidis – From Science to Data 29 Innovation “angle” - magnitude Innovation propagation time Unknown Unknown
  • 30. What you can be paid for Get hired in industries that have already adopted DS and ML Anestis Fachantidis – From Science to Data 30
  • 31. Hiring, per Indusry Anestis Fachantidis – From Science to Data 31 KDnuggets survey 2019: 1,001 data scientist’s LinkedIn profiles
  • 32. Skills/Industry Comparison Anestis Fachantidis – From Science to Data 32
  • 33. DS Importance, per Industry Anestis Fachantidis – From Science to Data 33
  • 34. Technology/software Anestis Fachantidis – From Science to Data 34 source: O’Reilly Media - spring 2017 - 875 respondents
  • 35. Financial Services Anestis Fachantidis – From Science to Data 35 source: O’Reilly Media - spring 2017 - 875 respondents
  • 36. What you love A reason to wake up happily in the morning Anestis Fachantidis – From Science to Data 36
  • 37. “Love at first result” You will probably love DS when: • You produce your first insight on actual business or real-life data • You see a learning process reducing its error on test data • Your result dazzled a business stakeholder and… …actually made him/her take a decision… …and a successful one, as measured later based on some KPI. Anestis Fachantidis – From Science to Data 37
  • 38. The Hype Tree Trunk = Solid understanding of the underlying technology Anestis Fachantidis – From Science to Data 38 ≔ 𝐬𝐢𝐧 𝟏 𝟎 𝟎 𝟏 ' ( 𝒆 𝒙
  • 39. Thank You! • Follow us on LinkedIn! • Contact Details and Talk Notes in: j.mp/fromsciencetodata Anestis Fachantidis – From Science to Data 39