The presentation discusses trends in data science. It provides background on data science, describing its multidisciplinary nature and noting key definitions and frameworks from 1974 to 2013. Real-world applications of data science are discussed in marketing, banking, healthcare, social media, and more. The presentation then focuses on Opileak, a company that performs opinion mining and sentiment analysis on text. It describes Opileak's features and the presenter's PhD research on improving polarity analysis by accounting for conditional opinions. The presentation concludes that data science has many applications and that Opileak's technology is promising.
2. @FernanOrtega
About me
• Co-Founder of Opileak
• CRO of Opileak
• PhD candidate at the University of Seville (US)
• Advisor: R. Corchuelo
• Member of TDG-Group
• Lecturer in the D&T course
16. @FernanOrtega
Brief history
• 1974 – Peter Naur – Datalogy & Data science
• 2002 – Committee on Data for Science & Technology
• 2003 – Journal of Data Science
• 2010 – Drew Conway – The data science Venn diagram
• 2010 – Mike Loukides – What is data science?
• 2011 – Irizarry, Peng & Leek – The key word in “Data Science”
• 2013 – Vasant Dhar – “Data Science and Prediction”
19. @FernanOrtega
Irizarry, Peng & Leek, 2013
“The key word in data science is not ‘data’ but ‘science’. Data science is only useful when the data are used to answer a question”
28. @FernanOrtega
Everything is about hype
• Cloud computing
• IoT
• Big data
• Data mining
• Data science
• Business intelligence
• Opinion mining
• Social media analysis
30. @FernanOrtega
Data science general process
1. Ask a question
2. Get the data
3. Explore the data
4. Model the data
5. Communicate
6. Implement the product
32. @FernanOrtega
Ask a question
• Skills:
– Science
– Domain expertise
– Curiosity
• Tools:
– Your brain
– Talking to experts
– Experience
34. @FernanOrtega
Get the data
• Skills:
– Web scraping
– Data cleaning
– Querying databases
• Tools:
– Web parsers
– SQL
– Python (pandas)
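The “Get the data” skills above (scraping, cleaning, querying) can be illustrated with a small sketch. It assumes a hypothetical scraped HTML table: the `RAW_HTML` string, `TableParser`, `clean`, and the `mentions` table are all made up for illustration. To stay dependency-free it uses Python's standard `html.parser` and `sqlite3` modules rather than pandas: parse the page, clean the values, then load and query them with SQL.

```python
from html.parser import HTMLParser
import sqlite3

# Hypothetical scraped page; in practice this string would come from an HTTP request.
RAW_HTML = """
<table>
  <tr><td> Seville </td><td>1,200</td></tr>
  <tr><td>Madrid</td><td> 3,400 </td></tr>
  <tr><td>Valencia</td><td>n/a</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside a cell may arrive in several chunks, so append to it.
        if self._in_cell:
            self._row[-1] += data


def clean(rows):
    """Strip whitespace, remove thousands separators, drop non-numeric rows."""
    cleaned = []
    for city, value in rows:
        value = value.strip().replace(",", "")
        if value.isdigit():
            cleaned.append((city.strip(), int(value)))
    return cleaned


parser = TableParser()
parser.feed(RAW_HTML)
records = clean(parser.rows)

# Load the cleaned records into an in-memory database and query them with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (city TEXT, n_mentions INTEGER)")
conn.executemany("INSERT INTO mentions VALUES (?, ?)", records)
top = conn.execute(
    "SELECT city, n_mentions FROM mentions ORDER BY n_mentions DESC"
).fetchall()
print(top)  # [('Madrid', 3400), ('Seville', 1200)]
```

The "n/a" row is dropped during cleaning rather than stored as a null; whether to drop, impute, or keep such rows is a judgment call that depends on the question being asked.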
36. @FernanOrtega
Explore the data
• Skills:
– Get to know data
– Develop hypotheses
– Detect patterns or anomalies
• Tools:
– D3.js
– Matplotlib
– Excel!
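The tools listed for exploration are mostly visual, but the numerical side of “get to know the data” and “detect anomalies” can be sketched in a few lines. This minimal example, on hypothetical daily mention counts, computes descriptive statistics and flags outliers with Tukey's IQR fences. The IQR rule is one common choice assumed here for illustration, not necessarily the presenter's method, and the function names are hypothetical.

```python
import statistics

# Hypothetical daily counts of brand mentions; the 95 is an injected spike.
daily_mentions = [12, 15, 11, 14, 13, 16, 12, 95, 14, 13]


def summarize(values):
    """Basic descriptive statistics: a first look at the data."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


summary = summarize(daily_mentions)
print(summary)
print(iqr_outliers(daily_mentions))  # [(7, 95)]
```

Note how the spike pulls the mean well above the median; a gap like that is often the first hint of an anomaly worth plotting and developing a hypothesis about.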
38. @FernanOrtega
Model the data
• Skills:
– Regression
– Machine learning
– Big data
• Tools:
– Spark (MLlib)
– Hadoop
– mrjob
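Regression, the first modelling skill listed, can be sketched as ordinary least squares with a single feature, written in plain Python. The data and function names below are hypothetical; at scale, a library such as Spark MLlib would fit the model instead of hand-written code like this.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line (ys must vary)."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: advertising spend (x) vs. positive mentions (y).
spend = [1, 2, 3, 4, 5]
mentions = [3, 5, 8, 9, 10]
a, b = fit_simple_ols(spend, mentions)
print(f"mentions ~ {a:.2f} + {b:.2f} * spend, R^2 = {r_squared(spend, mentions, a, b):.3f}")
```

Design note: the centered sums above can be rewritten in terms of n, Σx, Σy, Σx² and Σxy. Those totals are additive, so they could be accumulated in a single MapReduce pass (for example, a job written with mrjob on Hadoop), which is one way a simple regression scales to big data.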