Data Science & Culture
(Or how to stop worrying and love data driven culture)
Ícaro Medeiros
Data Science Forum
São Paulo, Jun 2017
Inspired by
(not limited to)
refs
Big Data
http://www.kdnuggets.com/2017/02/origins-big-data.html
✦ Fundamental blocks: evolutions in CS, e.g.
distributed systems, databases, massive AI, etc.

✦ Fuzzy concept, ill-defined

✦ Popularized by Gartner

(hype-fueled consulting firm)
✦ Big Data is no longer considered an emerging
technology (pervasive in industry)

✦ Entered the Trough of Disillusionment in 2013
https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
http://www.mikelnino.com/2016/03/chronology-big-data.html
Chronology of antecedents
Data science
✦ Statistics (late 19th century)

✦ Computer Science (1950s)

✦ Machine Learning (1950s)

✦ Data Mining (1990s)

✦ Data Science (2010s)
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
yet another hyped term
Beware: controversy
✦ Data science is not all science
✴ It’s getting more and more engineering-like, a practice

✴ Data storytelling is a creative endeavor
✦ Hyper-inflated expectations, misunderstood
concepts and a hurry to get value: a dangerous
recipe
A new hope… or hype
[Chart: Google Trends interest in "machine learning" vs. "big data"]
https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data
Hype: not that bad
✦ Haters gonna hate, but don’t fully hate the hype

✴ More practitioners = faster evolution of tech and processes
✴ Highly skilled professionals and innovation

✦ Academics sometimes look for difficult, unwanted
problems

✴ Industry is more pragmatic, especially in tech
https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
What we need…
✦ Forget about Big Data pokémons

✴ OH so in Big Data we don’t need people to think about schemas?

✦ Forget about misunderstood business expectations

✴ OH in deep learning we don’t need people to train models?

✦ You need PEOPLE

✴ Collaborating with shared values

✴ Awesome in tech but more importantly: CREATIVE
Shared values
and practices
Culture
Data Science and Culture
Good people
✦ People are more important than ideas

✴ A mediocre team will screw up a good idea

✴ Give a mediocre idea to a great team: they will fix it or rethink it

✦ A good lab: different kinds of autonomous thinkers

✴ Why hire smart people if they can't fix what’s broken?

✦ Prefer a heterogeneous and complementary team
instead of looking for unicorns
The mythical 10x professional
https://twitter.com/icaromedeiros/status/838968884023668737
Good communication
✦ Honesty, excellence, originality and self-criticism (values)

✦ Communication structure ≠ organizational structure

✦ Be ready to hear the truth

✴ Sincerity is only valuable if people are open and willing to give
up on ideas that will not work

✦ Braintrust: Leave ego and Jobs outside the door
Power to the people!
✦ Product quality is everyone’s responsibility
✴ Don’t ask permission to take responsibility

✦ Passion and excellence versus autonomy

✦ Good things might overshadow the bad

✴ People struggle to explore bad things to avoid being called
“complainers”
Rebels
http://qaspire.com/2017/05/19/sketchnote-what-rebels-want-from-their-boss/
Destroy data silos!
✦ Without information about data there is no science

✦ Software and data should be a collective property
within the company

✦ Knowledge management matters

✦ Communication between areas must be enforced
Data portals
✦ Self-service platforms to publish datasets

✴ Descriptions, schemas, samples, relations between datasets,
etc

✦ Open Data initiatives, mostly by governments

✦ OSS platforms: CKAN, Airbnb’s Dataportal

✦ Examples: data.gov.uk, dados.gov.br, etc
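At its core, a self-service data portal is a catalog of dataset entries that carry a description, a schema, sample rows and relations to other datasets. The sketch below is a minimal in-memory illustration of that idea, assuming a hypothetical `DatasetEntry` record and `DataPortal` class; it is not the CKAN or Airbnb Dataportal API, and all dataset names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog entry in a hypothetical self-service data portal."""
    name: str
    description: str
    schema: dict[str, str]                             # column -> type
    sample: list[dict] = field(default_factory=list)   # a few example rows
    upstream: list[str] = field(default_factory=list)  # datasets this one is built from


class DataPortal:
    """Minimal in-memory catalog: publish, search and trace lineage."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def publish(self, entry: DatasetEntry) -> None:
        # Reject entries whose sample rows disagree with the declared schema.
        for row in entry.sample:
            if set(row) != set(entry.schema):
                raise ValueError(f"sample row does not match schema of {entry.name}")
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Names of datasets whose name or description mentions `keyword`."""
        kw = keyword.lower()
        return [e.name for e in self._entries.values()
                if kw in e.name.lower() or kw in e.description.lower()]

    def lineage(self, name: str) -> list[str]:
        """Every dataset `name` depends on, directly or transitively."""
        seen: list[str] = []
        stack = list(self._entries[name].upstream)
        while stack:
            dep = stack.pop()
            if dep not in seen:
                seen.append(dep)
                if dep in self._entries:
                    stack.extend(self._entries[dep].upstream)
        return seen


# Hypothetical datasets, for illustration only.
portal = DataPortal()
portal.publish(DatasetEntry("raw_pageviews", "Raw page view events from web logs",
                            {"user_id": "str", "url": "str", "ts": "timestamp"}))
portal.publish(DatasetEntry("sessions", "User sessions built from page views",
                            {"user_id": "str", "start": "timestamp", "pages": "int"},
                            sample=[{"user_id": "u1", "start": "2017-06-01T10:00", "pages": 3}],
                            upstream=["raw_pageviews"]))
portal.publish(DatasetEntry("daily_active_users", "Distinct users per day",
                            {"day": "date", "users": "int"},
                            upstream=["sessions"]))
print(portal.search("session"))              # ['sessions']
print(portal.lineage("daily_active_users"))  # ['sessions', 'raw_pageviews']
```

The lineage lookup is what turns a list of tables into something people can reason about: anyone can see where `daily_active_users` comes from without asking its owner.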
“When it comes to creative
inspiration, job titles and
hierarchy are meaningless”
Data storytelling
✦ Explain what the numbers say in clear, layman’s terms

✦ Make hidden premises clear

✴ Outside data insights

✦ Convince others about actions

✴ Shortens the insight-to-value interval
✦ From data to knowledge
https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
What is creativity?
✦ Unexpected connections of concepts and ideas

✦ It's a marathon, it needs rhythm

✦ Creativity must start somewhere, and there’s power
in healthy feedback within an iterative process
Visual communication
✦ Clean straightforward graphs > visually appealing

✴ Choose dataviz libs wisely

✦ “Don’t make me think”

✦ The right graph for the right audience

✴ Prefer a language everyone understands
Visual communication 101
Stats are not enough
https://www.autodeskresearch.com/publications/samestats
Strategy
Avoid ego-trip data science
✦ “OH my cluster has 10 Petabytes, I’m awesome”

✦ Fancy ML algorithms are not the goal

✦ The most important V in Big Data is value
https://twitter.com/amyhoy/status/847097034536554497
KPI versus HiPPO
✦ Tech adoption per se is meaningless

✴ Slide-driven Big Data

✴ KPIs should grow from Big Data and data insights initiatives

✦ Poorly defined goals -> bad decisions

✦ Define viable but ambitious goals

✦ Data beats opinion
Set goal, plan and GO!
✦ Business questions can't be like “OH we want to
detect things related to millennials”

✦ Clear goals must be set, with actionable metrics

✦ Balance perfect models versus time-to-market

✦ Brad Bird: “Sometimes, as a director, you’re
guiding. Sometimes you’re letting the car drive”
https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
The process
✦ The process is not the goal

✴ It has no agenda or taste, it’s just a tool

✦ Quality is the best business plan

✦ Agile is a mindset: not just Kanban boards or Scrum

✦ If the model will become operational, mix scientists
and engineers from the start
Build vs Buy
✦ If you buy and your core business is not techie, you can be
illiterate in tech
✴ Benchmark before buying

✴ Accelerate results and boost internal knowledge

✦ If you build and have a good-enough techie culture, you’re
more or less good to go

✴ Assess pros and cons consciously

✦ If you surf the tech hype AND build good systems you’re
awesome
https://twitter.com/Doug_Laney/status/847452219641356288
When data goes to vendors…
http://www.louisdorard.com/machine-learning-canvas/
DATA ENGINEERING
Big Data vs Great Data
✦ If your logical models do not make sense

✦ If your most frequent queries are slow

✦ If you have string-only databases

✦ If you have unused expensive data

✦ Maybe your data lake is a swamp
“The data is a mess”
✦ First step: accelerate human understanding of data

✴ Metadata, context, hidden assumptions

✦ Datasets might serve multiple purposes

✴ Define rationale and context

✴ Data portals and understandable datasets > Dashboards
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
Data lost in translation
✦ Heterogeneous and siloed databases (and people)

✦ Rethink ESB (microservices network)

✦ State-of-the-art: data workflow

✴ Luigi, Airflow (open source), almost every big tech vendor

✴ Transparency, reusability, reproducibility, traceability

✴ Automation and monitoring all the way!
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
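Luigi and Airflow both model a pipeline as a directed acyclic graph (DAG) of tasks. The sketch below does not use their APIs; it is a stdlib-only illustration of the idea, assuming Python 3.9+ for `graphlib.TopologicalSorter`: tasks run in dependency order, every run is logged for traceability, and a failure stops the run. Task names and callables are hypothetical.

```python
from datetime import datetime, timezone
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run tasks in dependency order and return an audit log.

    tasks: task name -> zero-argument callable
    deps:  task name -> set of upstream task names
    """
    log = []
    for name in TopologicalSorter(deps).static_order():
        started = datetime.now(timezone.utc).isoformat()
        try:
            tasks[name]()
            status = "ok"
        except Exception as exc:  # keep the failure in the log for later inspection
            status = f"failed: {exc!r}"
        log.append({"task": name, "started": started, "status": status})
        if status != "ok":
            # Simplification: stop the whole run so nothing consumes bad output.
            # Real schedulers keep running independent branches.
            break
    return log


# Hypothetical three-step pipeline: extract -> clean -> report.
results = {}
tasks = {
    "extract": lambda: results.update(raw=[3, None, 1, 2]),
    "clean": lambda: results.update(clean=sorted(v for v in results["raw"] if v is not None)),
    "report": lambda: results.update(report=sum(results["clean"])),
}
deps = {"clean": {"extract"}, "report": {"clean"}}

for entry in run_pipeline(tasks, deps):
    print(entry["task"], entry["status"])
print(results["report"])
```

Declaring dependencies instead of hard-coding the call order is what gives reusability and reproducibility: the scheduler, not a person, decides what runs next, and the log records what actually happened.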
Beyond relational models
✦ Not all data problems fit well in traditional SQL or
DW models

✴ Key-value, columnar, graph-based, inverted index, etc

✦ Models are a framework for problem-solving
✴ Not the ultimate answer

✴ There’s no one-size-fits-all model
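To make one of these alternatives concrete: an inverted index maps each term to the documents that contain it, which is what makes full-text search cheap where a SQL `LIKE '%term%'` scan is not. The sketch below is minimal and uses made-up documents; real search engines add ranking, stemming and compression on top.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, query):
    """Ids of documents containing every query token (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical documents.
docs = {
    1: "Big data needs good schemas",
    2: "Great data beats big data",
    3: "Schemas help people understand data",
}
index = build_inverted_index(docs)
print(search_all(index, "big data"))  # {1, 2}
print(search_all(index, "schemas"))   # {1, 3}
```

A query only touches the posting sets of its own terms, never the full text of every document, which is the property a relational table scan lacks.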
Do not forget fluency
✦ Check the company lingua franca

✦ Make it easy for critical decision-makers

✴ Ad hoc SQL queries?

✴ Dashboards?

✴ Reports?
EXPERIMENTATION
Experiments
✦ Missions to discover facts towards understanding

✴ They don’t fail: any result produces new information

✴ If the initial theory was wrong: good

✴ With new facts you can reformulate the question

✦ Get more modeling questions asked more often

✦ Iterative data science
Product experimentation (A/B)
✦ Product experimentation should be hypothesis-driven
(not feature-driven)

✦ Define the proper exposed population
✴ Not only new users, heavy users, or early adopters

✦ Understanding the effect is essential
https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
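A hypothesis-driven test fixes the hypothesis, the metric and the significance level before anyone looks at the results. One common way to check a conversion-rate hypothesis is a two-proportion z-test; the sketch below assumes that test (pooled variance, normal approximation, so it needs reasonably large samples) and uses hypothetical counts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for H0: conversion rate A == conversion rate B.

    Uses the pooled proportion and the normal approximation, so it assumes
    reasonably large samples and a pooled rate strictly between 0 and 1.
    Returns (observed lift, z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothesis (fixed before the test): the new checkout flow (B) changes conversion.
# Hypothetical counts: 100 of 1000 users converted in A, 130 of 1000 in B.
lift, z, p = two_proportion_ztest(100, 1000, 130, 1000)
alpha = 0.05  # significance level chosen up front, not after seeing the data
print(f"lift={lift:.3f} z={z:.2f} p={p:.4f} significant={p < alpha}")
```

With these counts z is about 2.1 and the p-value roughly 0.035. The number only means something because the hypothesis and alpha were written down first; testing many metrics until one "wins" defeats the purpose.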
5 stages of A/B tests
https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
Some other quick tips
✦ Focus on outcomes (not algorithms or methods)

✦ Design the right metric and evaluation
✦ Good experiments don't produce obvious insights

✦ Mix of data and intuition
https://twitter.com/mrdatascience/status/869957499662860288
Being data driven
✦ Be BAYESIAN - uncertainty is everywhere

✦ Be CURIOUS - keep learning
✦ Be AGILE - Fail fast, not too fast: evidence comes first
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
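Being Bayesian in practice means reporting a distribution instead of a single number. A minimal sketch, assuming a Beta-Binomial model with uniform Beta(1, 1) priors and hypothetical conversion counts (100/1000 for A, 130/1000 for B): the posteriors come in closed form, and P(B beats A) is estimated by sampling with a fixed seed.

```python
import random


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Conjugate Beta-Binomial update; returns posterior (alpha, beta)."""
    return prior_a + successes, prior_b + (trials - successes)


def prob_b_beats_a(post_a, post_b, draws=20_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A) under the two posteriors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws


# Hypothetical data: 100/1000 conversions for A, 130/1000 for B.
post_a = beta_posterior(100, 1000)  # (101.0, 901.0)
post_b = beta_posterior(130, 1000)  # (131.0, 871.0)
mean_a = post_a[0] / sum(post_a)
mean_b = post_b[0] / sum(post_b)
print(f"posterior means: A={mean_a:.4f} B={mean_b:.4f}")
print(f"P(B > A) ~ {prob_b_beats_a(post_a, post_b):.3f}")
```

Instead of a yes/no verdict, the answer is a probability (roughly 0.98 with these counts) that decision-makers can weigh against the cost of being wrong: uncertainty stays visible instead of being hidden behind a threshold.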
Being data driven
✦ Be TRUTHFUL - don’t torture data to please opinions

✦ Be HELPFUL - work across silos, support democracy
✦ Be WISE - know when to be analytical or intuitive
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
With the right people,
Democracy,
Creativity,
Strategy,
Big Great Data™
and Experiments
there's a good chance to do great
SCIENCE
Take-away message
Ícaro Medeiros
Data Scientist
icaromedeiros

 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 

Data Science and Culture

  • 1. Data Science & Culture (Or how to stop worrying and love data-driven culture) Ícaro Medeiros, Data Science Forum, São Paulo, Jun 2017
  • 3. Big Data http://www.kdnuggets.com/2017/02/origins-big-data.html ✦ Fundamental blocks: advances in CS, e.g. distributed systems, databases, massive AI, etc. ✦ Fuzzy, ill-defined concept ✦ Popularized by Gartner (hype-fueled consulting firm)
  • 4. ✦ Big Data is no longer considered an emerging technology (it is pervasive in industry) ✦ Entered the Trough of Disillusionment in 2013 https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
  • 6. Data science ✦ Statistics (late 19th century) ✦ Computer Science (1950s) ✦ Machine Learning (1950s) ✦ Data Mining (1990s) ✦ Data Science (2010s): yet another hyped term https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
  • 7. Beware: controversy ✦ Data science is not all science ✴ It’s becoming more and more engineering-like, a practice ✴ Data storytelling is a creative endeavor ✦ Hyper-inflated expectations, misunderstood concepts and a rush to get value: a dangerous recipe
  • 8. A new hope (or hype): Google Trends interest in “machine learning” vs “big data” https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data
  • 9. Hype: not that bad ✦ Haters gonna hate, i.e. don’t hate the hype entirely ✴ More practitioners = faster evolution of tech and processes ✴ Highly skilled professionals and innovation ✦ Academics sometimes chase difficult problems nobody needs solved ✴ Industry is more pragmatic, especially in tech https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
  • 10. What we need… ✦ Forget about Big Data Pokémon ✴ OH, so in Big Data we don’t need people to think about schemas? ✦ Forget about misunderstood business expectations ✴ OH, in deep learning we don’t need people to train models? ✦ You need PEOPLE ✴ Collaborating with shared values ✴ Awesome in tech but, more importantly, CREATIVE
  • 13. Good people ✦ People are more important than ideas ✴ Give a good idea to a mediocre team and they will screw it up ✴ Give a mediocre idea to a great team and they will fix it or rethink it ✦ A good lab: different kinds of autonomous thinkers ✴ Why hire smart people if they can’t fix what’s broken? ✦ Prefer a heterogeneous, complementary team to hunting for unicorns
  • 14. The mythical 10x professional https://twitter.com/icaromedeiros/status/838968884023668737
  • 15. Good communication ✦ Values: honesty, excellence, originality and self-criticism ✦ Communication structure ≠ organizational structure ✦ Be ready to hear the truth ✴ Sincerity is only valuable if people are open and willing to give up on ideas that will not work ✦ Braintrust: leave ego and Jobs outside the door
  • 16. Power to the people! ✦ Product quality is everyone’s responsibility ✴ Don’t ask permission to take responsibility ✦ Passion and excellence versus autonomy ✦ Good things might overshadow the bad ✴ People avoid exploring what is bad so they won’t be called “complainers”
  • 18. Destroy data silos! ✦ Without information about data there is no science ✦ Software and data should be collective property within the company ✦ Knowledge management matters ✦ Communication between areas must be enforced
  • 19. Data portals ✦ Self-service platforms to publish datasets ✴ Descriptions, schemas, samples, relations between datasets, etc. ✦ Open Data initiatives, mostly by governments ✦ OSS platforms: CKAN, Airbnb’s Dataportal ✦ Examples: data.gov.uk, dados.gov.br, etc.
  • 20. “When it comes to creative inspiration, job titles and hierarchy are meaningless”
  • 22. Data storytelling ✦ Explain what the numbers say in clear, layman’s terms ✦ Make hidden premises explicit ✴ Bring in insights from outside the data ✦ Convince others to act ✴ Shortens the insight-to-value interval ✦ From data to knowledge https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
  • 23. What is creativity? ✦ Unexpected connections between concepts and ideas ✦ It’s a marathon; it needs rhythm ✦ Creativity must start somewhere, and there’s power in healthy feedback within an iterative process
  • 24. Visual communication ✦ Clean, straightforward graphs > visually appealing ones ✴ Choose dataviz libraries wisely ✦ “Don’t make me think” ✦ The right graph for the right audience ✴ Prefer a language everyone understands
  • 26–27. Stats are not enough https://www.autodeskresearch.com/publications/samestats
  • 29. Avoid ego-trip data science ✦ “OH, my cluster has 10 petabytes, I’m awesome” ✦ Fancy ML algorithms are not the goal ✦ The most important V in Big Data is value https://twitter.com/amyhoy/status/847097034536554497
  • 30. KPI versus HiPPO (Highest Paid Person’s Opinion) ✦ Tech adoption per se is meaningless ✴ Slide-driven Big Data ✴ KPIs should grow out of Big Data and data insight initiatives ✦ Poorly defined goals → bad decisions ✦ Define viable but ambitious goals ✦ Data beats opinion
  • 31. Set goals, plan and GO! ✦ Business questions can’t be like “OH, we want to detect things related to millennials” ✦ Clear goals must be set, with actionable metrics ✦ Balance model perfection against time-to-market ✦ Brad Bird: “Sometimes, as a director, you’re guiding. Sometimes you’re letting the car drive” https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
  • 32. The process ✦ The process is not the goal ✴ It has no agenda or taste; it’s just a tool ✦ Quality is the best business plan ✦ Agile is a mindset, not just kanban boards or Scrum ✦ If the model will become operational, mix scientists and engineers from the start
  • 33. Build vs Buy ✦ If you buy and your core business is not tech, you can stay tech-illiterate ✴ Benchmark before buying ✴ Accelerate results and boost internal knowledge ✦ If you build and have a good-enough tech culture, you’re more or less good to go ✴ Assess pros and cons consciously ✦ If you surf the tech hype AND build good systems, you’re awesome
  • 37. Big Data vs Great Data ✦ If your logical models do not make sense ✦ If your most frequent queries are slow ✦ If you have string-only databases ✦ If you have expensive data nobody uses ✦ Maybe your data lake is a swamp
  • 38. “The data is a mess” ✦ First step: accelerate human understanding of the data ✴ Metadata, context, hidden assumptions ✦ Datasets might serve multiple purposes ✴ Define rationale and context ✴ Data portals and understandable datasets > dashboards https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
  • 39. Data lost in translation ✦ Heterogeneous and siloed databases (and people) ✦ Rethink the ESB (microservices network) ✦ State of the art: data workflow management ✴ Luigi, Airflow (open source), and offerings from almost every big tech vendor ✴ Transparency, reusability, reproducibility, traceability ✴ Automation and monitoring all the way! https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
  • 40. Beyond relational models ✦ Not all data problems fit well in traditional SQL or DW models ✴ Key-value, columnar, graph-based, inverted index, etc. ✦ Models are a framework for problem-solving ✴ Not the ultimate answer ✴ There’s no one-size-fits-all model
  • 41. Do not forget fluency ✦ Check the company’s lingua franca ✦ Make it easy for critical decision-makers ✴ Ad hoc SQL queries? ✴ Dashboards? ✴ Reports?
  • 43. Experiments ✦ Missions to discover facts that lead to understanding ✴ They don’t fail: any result produces new information ✴ If the initial theory was wrong: good ✴ With new facts you can reformulate the question ✦ Get more modeling questions asked, more often ✦ Iterative data science
  • 44. Product experimentation (A/B) ✦ Product experimentation should be hypothesis-driven (not feature-driven) ✦ Define the proper exposed population ✴ No new users, no heavy users only, no early adopters ✦ Understanding the effect is essential https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
  • 45. 5 stages of A/B tests https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
  • 46. Some other quick tips ✦ Focus on outcomes (not algorithms or methods) ✦ Design the right metric and evaluation ✦ Good experiments don't produce obvious insights ✦ Mix of data and intuition https://twitter.com/mrdatascience/status/869957499662860288
  • 47. Being data-driven ✦ Be BAYESIAN: uncertainty is everywhere ✦ Be CURIOUS: keep learning ✦ Be AGILE: fail fast, but not too fast; evidence comes first https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
  • 48. Being data-driven ✦ Be TRUTHFUL: don’t torture data to please opinions ✦ Be HELPFUL: work across silos, support democracy ✦ Be WISE: know when to be analytical and when to be intuitive https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
  • 49. Take-away message: with the right people, Democracy, Creativity, Strategy, Big Great Data™ and Experiments, there's a good chance to do great SCIENCE
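Slide 19 lists what a data portal publishes for each dataset: descriptions, schemas, samples and relations between datasets. The sketch below models one catalog record and a naive keyword search in plain Python. It is not the data model of CKAN or Airbnb's Dataportal. `DatasetEntry`, `DataCatalog` and the example datasets are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record (hypothetical schema, for illustration only)."""
    name: str
    description: str
    owner: str
    schema: dict[str, str]                            # column name -> type
    samples: list[dict] = field(default_factory=list)
    related: list[str] = field(default_factory=list)  # names of related datasets


class DataCatalog:
    """Minimal in-memory catalog; a real portal adds storage, access control and a UI."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def publish(self, entry: DatasetEntry) -> None:
        # Self-service: the owning team registers (or updates) its own dataset.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        # Naive case-insensitive match on name, description and column names.
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in " ".join([e.name, e.description, *e.schema]).lower()
        )

    def related_to(self, name: str) -> list[str]:
        # Follow declared relations, skipping datasets that were never published.
        return [r for r in self._entries[name].related if r in self._entries]


catalog = DataCatalog()
catalog.publish(DatasetEntry(
    name="orders",
    description="One row per customer order",
    owner="sales-data",
    schema={"order_id": "int", "customer_id": "int", "total": "decimal"},
    samples=[{"order_id": 1, "customer_id": 42, "total": "19.90"}],
    related=["customers"],
))
catalog.publish(DatasetEntry(
    name="customers",
    description="Customer master data",
    owner="crm",
    schema={"customer_id": "int", "signup_date": "date"},
))
print(catalog.search("customer_id"))  # both datasets expose this column
print(catalog.related_to("orders"))
```

Even a keyword match over column names can answer "where does `customer_id` live?" without asking anyone. That kind of self-service question is what a portal should make routine.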
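Slides 26–27 link to Autodesk Research's "Same Stats, Different Graphs" work. That work builds on Anscombe's quartet (1973): four small datasets that have nearly identical summary statistics but look completely different when plotted. The sketch below uses the published quartet values and only the standard library to show the statistics agreeing. Only a plot, such as one scatter plot per set, reveals how different the sets are.

```python
from statistics import mean, variance

# Anscombe's quartet: sets I-III share the same x values; set IV has its own.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Rounded statistics that come out (almost) identical for all four sets."""
    r = pearson_r(xs, ys)
    slope = r * (variance(ys) / variance(xs)) ** 0.5  # least-squares slope
    intercept = mean(ys) - slope * mean(xs)
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "var_y": round(variance(ys), 1),
        "r": round(r, 2),
        "fit": (round(intercept, 1), round(slope, 2)),  # y = intercept + slope * x
    }


for name, (xs, ys) in QUARTET.items():
    print(name, summary(xs, ys))
```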
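Slide 39 points to data workflow managers such as Luigi and Airflow. The sketch below does not use either tool's API. Instead, it uses Python's standard-library `graphlib` (Python 3.9+) to show the core idea those tools share:

* tasks are declared as a dependency graph;
* they run in topological order;
* every step is logged;
* every intermediate result is kept, for traceability.

`run_pipeline` and the example tasks are hypothetical.

```python
import logging
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def run_pipeline(tasks, dependencies):
    """Run each task after its dependencies, logging every step.

    tasks:        task name -> callable taking {upstream name: upstream output}
    dependencies: task name -> set of task names that must finish first
    Returns {task name: output} so every intermediate result stays inspectable.
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        log.info("running %s (inputs: %s)", name, ", ".join(sorted(upstream)) or "none")
        try:
            results[name] = tasks[name](upstream)
        except Exception:
            # Fail loudly with a traceback instead of silently producing partial data.
            log.exception("task %s failed; downstream tasks will not run", name)
            raise
        log.info("finished %s", name)
    return results


# Hypothetical extract -> clean -> aggregate -> report pipeline.
tasks = {
    "extract": lambda _: [" 3", "5 ", "", "7"],
    "clean": lambda up: [int(v) for v in up["extract"] if v.strip()],
    "aggregate": lambda up: sum(up["clean"]),
    "report": lambda up: f"total={up['aggregate']}",
}
dependencies = {
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}
results = run_pipeline(tasks, dependencies)
print(results["report"])
```

`TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, which catches a whole class of pipeline bugs early. Real orchestrators add what this sketch leaves out: scheduling, retries, persisted outputs, backfills and monitoring.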
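Slide 40 lists the inverted index among the models beyond relational ones. It is the structure behind full-text search engines such as Lucene: instead of scanning every document for a word, you look the word up and get back the documents that contain it. A toy version follows, with hypothetical function names and deliberately naive tokenization.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; real engines add stemming, stop words, etc.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Graph databases model relationships",
    2: "Columnar stores speed up analytical queries",
    3: "Key-value stores and graph queries",
}
index = build_inverted_index(docs)
print(search_all(index, "graph queries"))
```

Lookups cost one dictionary access per query term plus a set intersection, independent of the total text size. That is the trade-off that makes this model a better fit than `LIKE '%term%'` scans for text search.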
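Slide 44 stresses defining the exposed population and understanding the effect. The sketch below shows one common way to handle both, though not the only one:

* Users are bucketed deterministically by hashing, so a user always sees the same variant.
* The difference in conversion rates is checked with a two-sided two-proportion z-test.

The z-test relies on a normal approximation, independent users and reasonably large samples. The function names and the example numbers are hypothetical. Deciding *who* is eligible for the experiment remains a separate design decision.

```python
import hashlib
from math import sqrt
from statistics import NormalDist


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Stable bucketing: the same user in the same experiment always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for the difference in conversion rates (B minus A).

    Returns (absolute lift, z statistic, p-value). Assumes the pooled rate
    is strictly between 0 and 1, otherwise the standard error is zero.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under H0: no difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical results: 10% vs 13% conversion on 2,000 users per arm.
lift, z, p = two_proportion_ztest(200, 2000, 260, 2000)
print(f"lift={lift:.3f} z={z:.2f} p={p:.4f}")
```

Hashing on `experiment:user_id` rather than on `user_id` alone keeps assignments independent across experiments. Otherwise, the same users would always land in the treatment group.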
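Slide 47 asks data people to "be BAYESIAN" because uncertainty is everywhere. Below is a minimal illustration that assumes a conversion-rate metric and a uniform Beta(1, 1) prior; both are choices made here, not prescribed by the slide. The Beta-Binomial conjugate update turns raw counts into a full posterior distribution, Beta(a + successes, b + failures). Monte Carlo draws from two posteriors then estimate how likely variant B is to beat A. The function names are hypothetical.

```python
import random


def beta_posterior(successes: int, trials: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + s, b + n - s)."""
    return prior_a + successes, prior_b + trials - successes


def posterior_mean(a: float, b: float) -> float:
    return a / (a + b)


def prob_b_beats_a(post_a, post_b, draws: int = 20_000, seed: int = 42) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under the two Beta posteriors."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws


small_a, small_b = beta_posterior(2, 10), beta_posterior(3, 10)
large_a, large_b = beta_posterior(200, 2000), beta_posterior(260, 2000)
print(round(prob_b_beats_a(small_a, small_b), 2))  # 20% vs 30% on 10 users each: still uncertain
print(round(prob_b_beats_a(large_a, large_b), 3))  # 10% vs 13% on 2,000 users each
```

The same observed gap in rates carries very different confidence depending on sample size. Reporting that probability, rather than a bare "B won", is one concrete way to keep uncertainty visible to decision-makers.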