Me, A Data Scientist?
Fabricio Quintanilla, MSc, PhD
fabricio.quintanilla@gmail.com
@fabrixq
/fquintanilla
http://www.inteligenciadenegocios.net
MCP, MCPD, MCTS
Organized by
5/21/2016 Me, A Data Scientist?2 |
SQL Saturday Sponsors
Agenda
Not Rocket Science….
Just Data Science…
Man on the Moon – 1969
Man on the Moon – Small Data
Computer Program
Date: 1969
64Kb, 2Kb RAM,
Fortran
Must Work 1st time
Apollo XI
Speed: 3,500 Km/h
Weight: 13,500 Kg
Lots of complex data
Man on the Moon
Distance: 356,500 Km
Never been there before
Must return to Earth
Skydive Stratos, 2012
Tens of Gigabytes!!!
Think about it ... We live in crazy times…
What is Big Data? mumbo-jumbo
§ A fashionable term typically used by some IT
vendors to remarket old fashioned software
and hardware
Big Data is not about Data Volume
No way!!!! Water Cooler Chat
§ We need to parallelize data operations but it’s too costly & complex…
§ The business can’t get access to all the relevant data, we need external data
§ We can’t match customer master data to live customer interactions…
§ We can’t just force everything into a star-schema…
§ These BI reports and charts don’t tell us anything we didn’t know…
§ We are missing the ETL window, the data we needed didn’t arrive on time…
§ We can’t predict with confidence if we can’t explore data & develop our own
models
What is big data?
Big Data is
any thing
which is
crash Excel.
Small Data is
when is fit in RAM.
Big Data is when is
crash because is
not fit in RAM.
Or, in other words, Big Data is data
in volumes too great to process by
traditional methods.
https://twitter.com/devops_borat
What is Big Data? Force of Change
§ Big Data forces you to change the way you
collect, store, manage, analyze and visualize
data.
Big Data = “Crude Oil” [not useful oil]
§ Think of data as ‘Crude Oil’
§ Big data is about extracting the ‘Crude Oil’,
transporting it in ‘mega-tankers’, siphoning it
through ‘pipelines’ and storing it in massive
‘silos’…
§ All ‘this’ is about IT Big Data… fine and well…
§ BUT………..
You need to refine the ‘Crude Oil’
Enter Data Science
The Science [and Art] of…
§ Discovering what we don’t know from data
§ Obtaining predictive, actionable insight from data
§ Creating Data Products that have business impact now
§ Communicating relevant business stories from data
§ Building confidence in decisions that drive business value
What is a data scientist?
Class DataScientist {
Is skeptical, curious. Has inquisitive mind
Knows Machine Learning, Statistics, Probability
Applies Scientific Method. Runs Experiment
Is good at Coding & Hacking
Able to deal with IT Data Engineering
Knows how to build data products
Able to find answers to known unknowns
Tells relevant business stories from data
Has Domain Knowledge
}
What does a Data Scientist Do?
10 Things [most] Data Scientists Do
§ Ask Good Questions. What is it that…
§ …we don’t know?
§ …we’d like to know?
§ Define and Test a Hypothesis, Run Experiments
§ Scoop, Scrape, Sink & Sample Business Relevant Data
§ Purge and Wrestle Data, Tame Data
§ Explore Data, Discover Data Playfully. Discover
Unknowns.
§ Model Data. Model Algorithms
§ Understand Data Relationships
§ Tell the Machine How to Learn from Data
§ Create Data Products that Deliver Actionable insight
§ Tell Relevant Business Stories from Data
[Sort of a] Data Scientist Toolkit
§ Java, R, Python… (bonus: Clojure, Haskell, Scala)
§ Hadoop, HDFS & MapReduce… (bonus: Spark, Storm)
§ HBase, Pig & Hive… (bonus: Shark, Impala, Cascalog)
§ ETL, Webscrapers, Flume, Sqoop… (bonus: Hume)
§ SQL, RDBMS, DW, OLAP…
§ Knime, Weka, RapidMiner… (bonus: SciPy, NumPy, scikit-learn, pandas)
§ D3.js, Gephi, ggplot2, Tableau, Flare, Shiny…
§ SPSS, Matlab, SAS… (the Enterprise man)
§ NoSQL, MongoDB, Couchbase, Cassandra…
§ And Yes!!! … MS-Excel: the most used, most underrated
DS tool…
Types of algorithms
§ Clustering
§ Association learning
§ Parameter estimation
§ Recommendation engines
§ Classification
§ Similarity matching
§ Neural networks
§ Bayesian networks
§ Genetic algorithms
Basically, it’s all maths...
§ Linear algebra
§ Calculus
§ Probability theory
§ Graph theory
§ ...
https://twitter.com/devops_borat
Only 10% in
devops know
how to work
with Big Data.
Only 1% are
realize they need
2 Big Data for
fault tolerance
Big data skills gap
§ Hardly anyone knows this stuff
§ It’s a big field, with lots and lots of theory
§ And it’s all maths, so it’s tricky to learn
http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond#The_Big_Data_Skills_Gap
http://www.ibmbigdatahub.com/blog/addressing-big-data-skills-gap
Two orthogonal aspects
§ Analytics / machine learning
§ learning insights from data
§ Big data
§ handling massive data volumes
§ Can be combined, or used separately
Data science?
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
How to process Big Data?
§ If relational databases are not enough,
what is?
https://twitter.com/devops_borat
Mining of Big
Data is
problem solved
in 2013 with
zgrep
MapReduce
§ A framework for writing massively
parallel code
§ Simple, straightforward model
§ Based on “map” and “reduce” functions
from functional programming (LISP)
NoSQL and Big Data
§ Not really that relevant
§ Traditional databases handle big data
sets, too
§ NoSQL databases have poor analytics
§ MapReduce often works from text files
§ can obviously work from SQL and NoSQL,
too
§ NoSQL is more for high throughput
§ basically, AP from the CAP theorem, instead
of CP
§ In practice, really Big Data is likely to be a
mix
§ text files, NoSQL, and SQL
The 4th V: Veracity
“The greatest enemy of knowledge is not
ignorance, it is the illusion of knowledge.”
Daniel Boorstin, in The Discoverers
(1983)
https://twitter.com/devops_borat
95% of time,
when is clean Big
Data is get Little
Data
Data quality
§ A huge problem in practice
§ any manually entered data is suspect
§ most data sets are in practice deeply
problematic
§ Even automatically gathered data can be
a problem
§ systematic problems with sensors
§ errors causing data loss
§ incorrect metadata about the sensor
§ Never, never, never trust the data without
checking it!
§ garbage in, garbage out, etc
http://www.slideshare.net/Hadoop_Summit/scaling-big-data-mining-infrastructure-twitter-experience/12
Conclusion
§ Vast potential
§ to both big data and machine learning
§ Very difficult to realize that potential
§ requires mathematics, which nobody
knows
§ We need to wake up!
Theory
Two kinds of learning
§ Supervised
§ we have training data with correct
answers
§ use training data to prepare the algorithm
§ then apply it to data without a correct
answer
§ Unsupervised
§ no training data
§ throw data into the algorithm, hope it
makes some kind of sense out of the data
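A minimal sketch of the supervised case, using 1-nearest-neighbour classification as a stand-in for "the algorithm". The training points, their labels and the function name `nearest_neighbour` are made up for illustration; they are not from the slides.

```python
# Supervised learning in miniature: 1-nearest-neighbour classification.
# The training data and labels below are invented for illustration.

def nearest_neighbour(train, query):
    """Return the label of the training point closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda pair: sq_dist(pair[0], query))
    return label


# Training data: (features, correct answer) pairs.
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

# Apply the prepared "model" to a record without a known answer.
prediction = nearest_neighbour(training_data, (2.0, 1.5))
```

Here "training" is simply storing the labelled points; most algorithms do more work up front, but the train-then-apply shape is the same.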
Some types of algorithms
§ Prediction
§ predicting a variable from data
§ Classification
§ assigning records to predefined groups
§ Clustering
§ splitting records into groups based on similarity
§ Association learning
§ seeing what often appears together with what
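Clustering can be sketched with a one-dimensional k-means, one common clustering algorithm among several. The values, the starting centroids and the fixed iteration count are assumptions chosen for illustration.

```python
# One-dimensional k-means sketch: assign each value to its nearest centroid,
# then move each centroid to the mean of its cluster, and repeat.

def k_means_1d(values, centroids, iterations=10):
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
```

No labels are supplied; the two groups emerge from similarity alone, which is what makes this unsupervised.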
Issues
§ Data is usually noisy in some way
§ imprecise input values
§ hidden/latent input values
§ Inductive bias
§ basically, the shape of the algorithm we
choose
§ may not fit the data at all
§ may induce underfitting or overfitting
§ Machine learning without inductive bias
is not possible
Testing
§ When doing this for real, testing is crucial
§ Testing means splitting your data set
§ training data (used as input to algorithm)
§ test data (used for evaluation only)
§ Need to compute some measure of performance
§ precision/recall
§ root mean square error
§ A huge field of theory here
§ will not go into it in this course
§ very important in practice
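The split and the two measures named above can be sketched in plain Python. The helper names, the 25% test fraction and the fixed seed are illustrative assumptions, not part of the original slides.

```python
import math
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean square error, for numeric predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def precision_recall(actual, predicted):
    """Precision and recall for binary labels (truthy = positive class)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


train, test = train_test_split(list(range(20)))
error = rmse([1, 2, 3], [1, 2, 5])
precision, recall = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
```

The important discipline is that `test` is never shown to the algorithm during training; it is used for evaluation only.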
Missing values
§ Usually, there are missing values in the data set
§ that is, some records have some NULL values
§ These cause problems for many machine
learning algorithms
§ Need to solve somehow
§ remove all records with NULLs
§ use a default value
§ estimate a replacement value
§ ...
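The three options listed above might look like this on a small table of records. The field names and values are invented for illustration, and `None` stands in for NULL.

```python
# Three simple ways to handle missing values (None plays the role of NULL).

records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def drop_incomplete(rows):
    """Option 1: remove every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_default(rows, field, default):
    """Option 2: replace missing values of one field with a fixed default."""
    return [
        dict(r, **{field: default if r[field] is None else r[field]})
        for r in rows
    ]


def fill_mean(rows, field):
    """Option 3: estimate a replacement, here the mean of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    return fill_default(rows, field, sum(known) / len(known))


complete_only = drop_incomplete(records)
with_mean_age = fill_mean(records, "age")
```

Dropping records is the simplest choice but throws data away; filling in values keeps the records at the cost of introducing estimates.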
Terminology
§ Vector
§ one-dimensional array
§ Matrix
§ two-dimensional array
§ Linear algebra
§ algebra with vectors and matrices
§ addition, multiplication, transposition, ...
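These terms map directly onto nested lists. A minimal sketch in plain Python follows; in practice a library such as NumPy would be used, and the function names here are illustrative only.

```python
# Vectors as lists, matrices as lists of rows.

def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]


def transpose(m):
    """Swap rows and columns of a matrix."""
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    """Matrix product: each entry is a row of `a` dotted with a column of `b`."""
    columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


vector_sum = add([1, 2, 3], [10, 20, 30])
flipped = transpose([[1, 2, 3], [4, 5, 6]])
product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```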
Top 10 algorithms
Top 10 machine learning algs
1. C4.5 No
2. k-means clustering Yes
3. Support vector machines No
4. the Apriori algorithm No
5. the EM algorithm No
6. PageRank No
7. AdaBoost No
8. k-nearest neighbours class. Kind of
9. Naïve Bayes Yes
10. CART No
From a survey at the IEEE International Conference on Data Mining (ICDM) in December 2006. “Top
10 algorithms in data mining”, by X. Wu et al.
C4.5
§ Algorithm for building decision trees
§ basically trees of boolean expressions
§ each node splits the data set in two
§ leaves assign items to classes
§ Decision trees are useful not just for classification
§ they can also teach you something about the classes
§ C4.5 is a bit involved to learn
§ the ID3 algorithm is much simpler
§ CART (#10) is another algorithm for learning
decision trees
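How a tree-building algorithm decides which boolean test to put in a node can be sketched with entropy and information gain, the criterion used by the simpler ID3 (C4.5 refines it into a gain ratio). The toy records and attribute names below are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, test):
    """Entropy reduction obtained by splitting the rows in two with `test`."""
    yes = [label for row, label in zip(rows, labels) if test(row)]
    no = [label for row, label in zip(rows, labels) if not test(row)]
    remaining = sum(
        len(part) / len(labels) * entropy(part) for part in (yes, no) if part
    )
    return entropy(labels) - remaining


rows = [
    {"windy": True, "weekend": True},
    {"windy": True, "weekend": False},
    {"windy": False, "weekend": True},
    {"windy": False, "weekend": False},
]
labels = ["stay", "stay", "play", "play"]

gain_windy = information_gain(rows, labels, lambda r: r["windy"])
gain_weekend = information_gain(rows, labels, lambda r: r["weekend"])
```

The test with the highest gain becomes the node; the same procedure is then applied recursively to each half of the data.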
Support Vector Machines
§ A way to do binary classification on
matrices
§ Support vectors are the data points
nearest to the hyperplane that divides the
classes
§ SVMs maximize the distance between
SVs and the boundary
§ Particularly valuable because of “the
kernel trick”
§ using a transformation to a higher dimension
to handle more complex class boundaries
§ A bit of work to learn, but manageable
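The kernel trick can be illustrated without a full SVM. For the degree-2 polynomial kernel, the kernel value computed in the original space equals an ordinary dot product in a higher-dimensional space, so the transformation never has to be carried out explicitly. The choice of kernel and the sample points are assumptions made for this sketch.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phi(v):
    """Explicit map from 2 dimensions to 3 for the degree-2 polynomial kernel."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(a, b):
    """Same quantity as dot(phi(a), phi(b)), computed in the original space."""
    return dot(a, b) ** 2


a, b = (1.0, 2.0), (3.0, 0.5)
explicit = dot(phi(a), phi(b))   # transform first, then take the dot product
implicit = poly_kernel(a, b)     # kernel trick: no transformation needed
```

An SVM only ever needs dot products between data points, which is why swapping in a kernel function is enough to get the more complex class boundaries.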
Apriori
§ An algorithm for “frequent itemsets”
§ basically, working out which items frequently
appear together
§ for example, what goods are often bought
together in the supermarket?
§ used for Amazon’s “customers who bought
this...”
§ Can also be used to find association rules
§ that is, “people who buy X often buy Y” or
similar
§ Apriori is slow
§ a faster, further development is FP-growth
http://www.dssresources.com/newsletters/66.php
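The core idea of Apriori, sketched for itemsets of size one and two: a pair can only be frequent if both of its items are frequent, so infrequent items are pruned before pairs are counted. The baskets and the support threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return frequent single items and frequent pairs (first two Apriori passes)."""
    # Pass 1: count single items and keep the frequent ones.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, n in item_counts.items() if n >= min_support}

    # Pass 2: count only pairs made of frequent items.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_support}
    return frequent_items, frequent_pairs


baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
items, pairs = frequent_itemsets(baskets, min_support=3)
```

The full algorithm repeats the same prune-then-count step for triples and larger sets, which is where the repeated passes over the data make it slow.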
Expectation Maximization
§ A deeply interesting algorithm I’ve seen
used in a number of contexts
§ very hard to understand what it does
§ very heavy on the maths
§ Essentially an iterative algorithm
§ alternates between an “expectation” step and a
“maximization” step
§ tries to optimize the output of a function
§ Can be used for
§ clustering
§ a number of more specialized examples, too
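A heavily simplified sketch of EM used for clustering: two one-dimensional Gaussian clusters with unit variance and equal weight, where only the two means are estimated. Those restrictions, the data and the starting means are assumptions made to keep the example short.

```python
import math


def em_two_means(data, mu, iterations=20):
    """Estimate the means of two unit-variance Gaussians with EM."""
    mu_a, mu_b = mu
    for _ in range(iterations):
        # Expectation step: how strongly does each point belong to cluster A?
        resp_a = []
        for x in data:
            pa = math.exp(-0.5 * (x - mu_a) ** 2)
            pb = math.exp(-0.5 * (x - mu_b) ** 2)
            resp_a.append(pa / (pa + pb))

        # Maximization step: move each mean to the weighted average of the data.
        weight_a = sum(resp_a)
        weight_b = len(data) - weight_a
        mu_a = sum(r * x for r, x in zip(resp_a, data)) / weight_a
        mu_b = sum((1 - r) * x for r, x in zip(resp_a, data)) / weight_b
    return mu_a, mu_b


data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
mean_a, mean_b = em_two_means(data, mu=(0.0, 5.0))
```

Unlike k-means, each point belongs to both clusters to a degree; the soft memberships computed in the expectation step are what the maximization step averages over.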
PageRank
§ Basically a graph analysis algorithm
§ identifies the most prominent nodes
§ used for weighting search results on Google
§ Can be applied to any graph
§ for example an RDF data set
§ Basically works by simulating a random walk
§ estimating the likelihood that a walker would be
on a given node at a given time
§ actual implementation is linear algebra
§ The basic algorithm has some issues
§ “spider traps”
§ graph must be connected
§ straightforward solutions to these exist
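A small sketch of the random-walk idea, including the usual fix for spider traps and dead ends: with probability `1 - damping` the walker jumps to a random node. The graph, the damping value of 0.85 and the iteration count are assumptions for illustration; every link target must also appear as a key.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate the probability of a random walker being on each node."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Random jump: every node receives a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead end: spread this node's rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical four-page graph: each key links to the pages in its list.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
most_prominent = max(ranks, key=ranks.get)
```

Each pass of the loop is equivalent to one matrix-vector multiplication, which is why real implementations are written as linear algebra.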
AdaBoost
§ Algorithm for “ensemble learning”
§ That is, for combining several algorithms
§ and training them on the same data
§ Combining more algorithms can be very
effective
§ usually better than a single algorithm
§ AdaBoost basically weights training
samples
§ giving the most weight to those which are
classified the worst
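The reweighting step can be sketched for a single round. The function name and the toy weights are illustrative; the sketch assumes the weights sum to 1 and that the weighted error is strictly between 0 and 1.

```python
import math


def adaboost_reweight(weights, correct):
    """One AdaBoost round: compute the classifier's vote and the new sample weights."""
    # Weighted error of the current weak classifier.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Vote (alpha) of this classifier in the final ensemble.
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correctly classified samples, grow the misclassified ones.
    updated = [
        w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]   # the weak classifier got the last sample wrong
alpha, new_weights = adaboost_reweight(weights, correct)
```

After the update the misclassified samples carry half of the total weight, so the next weak classifier is pushed to get them right.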
Recommendations
Collaborative filtering
§ Basically, you’ve got some set of items
§ these can be movies, books, beers, whatever
§ You’ve also got ratings from users
§ on a scale of 1-5, 1-10, whatever
§ Can you use this to recommend items to a
user, based on their ratings?
§ if you use the connection between their
ratings and other people’s ratings, it’s called
collaborative filtering
§ other approaches are possible
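A minimal user-based collaborative filtering sketch: predict a rating as the similarity-weighted average of other users' ratings. The users, items, ratings and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
import math

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ana":   {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "ben":   {"Alien": 4, "Brazil": 5, "Dune": 5},
    "carla": {"Alien": 1, "Casablanca": 5, "Dune": 2},
}


def similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = similarity(ratings[user], their_ratings)
        numerator += weight * their_ratings[item]
        denominator += weight
    return numerator / denominator if denominator else None


predicted = predict("ana", "Dune")
```

Ana's tastes resemble Ben's far more than Carla's, so the prediction lands close to Ben's rating of Dune.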
Feature-based recommendation
§ Use user’s ratings of items
§ run an algorithm to learn what features of
items the user likes
§ Can be difficult to apply because
§ requires detailed information about items
§ key features may not be present in data
§ Recommending music may be difficult,
for example
Naïve Bayes
Bayes’s Theorem
§ Basically a theorem for combining
probabilities
§ I’ve observed A, which indicates H is true with
probability 70%
§ I’ve also observed B, which indicates H is
true with probability 85%
§ what should I conclude?
§ Naïve Bayes is basically using this
theorem
§ with the assumption that A and B are
independent
§ this assumption is nearly always false, hence
“naïve”
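The 70% and 85% example can be worked through under the naïve independence assumption. The sketch also assumes a prior of 50% for H before any observation, which the slide does not state; with a different prior the answer changes.

```python
def combine(probabilities, prior=0.5):
    """Combine independent pieces of evidence for H, each given as P(H | evidence)."""
    prior_odds = prior / (1 - prior)
    odds = prior_odds
    for p in probabilities:
        # Each observation multiplies the odds by its own likelihood ratio.
        odds *= (p / (1 - p)) / prior_odds
    return odds / (1 + odds)


combined = combine([0.70, 0.85])
```

Under these assumptions the combined probability is about 93%, higher than either observation on its own, because two independent observations pointing the same way reinforce each other.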
Simple example
§ Is the coin fair or not?
§ we throw it 10 times, get 9 heads and one tail
§ we try again, get 8 heads and two tails
§ What do we know now?
§ can combine data and recompute
§ or just use Bayes’s Theorem directly
http://www.bbc.co.uk/news/magazine-22310186
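The coin example can be sketched as a comparison of two hypotheses: the coin is fair, or it is biased towards heads. The bias of 0.8 for the alternative and the 50% starting belief are assumptions introduced for this sketch.

```python
def update(prior_fair, heads, tails, biased_p=0.8):
    """Posterior probability that the coin is fair after seeing the throws."""
    like_fair = 0.5 ** (heads + tails)
    like_biased = biased_p ** heads * (1 - biased_p) ** tails
    evidence = prior_fair * like_fair + (1 - prior_fair) * like_biased
    return prior_fair * like_fair / evidence


# Update after each experiment in turn...
after_first = update(0.5, heads=9, tails=1)
after_both = update(after_first, heads=8, tails=2)

# ...or combine the data and recompute in one go.
pooled = update(0.5, heads=17, tails=3)
```

Both routes give the same answer, which is the point of the slide: Bayes's Theorem lets new evidence be folded in incrementally without revisiting the old data.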
MapReduce
University pre-lecture, 1991
§ My first meeting with university was Open
University Day, in 1991
§ Professor Bjørn Kirkerud gave the computer
science talk
§ His subject
§ some day processors will stop becoming faster
§ we’re already building machines with many
processors
§ what we need is a way to parallelize software
§ preferably automatically, by feeding in normal
source code and getting it parallelized back
§ MapReduce is basically the state of the art
on that today
MapReduce
§ A framework for writing massively
parallel code
§ Simple, straightforward model
§ Based on “map” and “reduce” functions
from functional programming (LISP)
http://research.google.com/archive/mapreduce.html
Appeared in:
OSDI'04: Sixth Symposium on Operating System Design
and Implementation,
San Francisco, CA, December, 2004.
map and reduce
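>>> "1 2 3 4 5 6 7 8".split()
['1', '2', '3', '4', '5', '6', '7', '8']
>>> l = list(map(int, "1 2 3 4 5 6 7 8".split()))  # map() is lazy on Python 3
>>> l
[1, 2, 3, 4, 5, 6, 7, 8]
>>> import operator
>>> from functools import reduce  # reduce is not a builtin on Python 3
>>> reduce(operator.add, l)
36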
MapReduce
1. Split data into fragments
2. Create a Map task for each fragment
§ the task outputs a set of (key, value) pairs
3. Group the pairs by key
4. Call Reduce once for each key
§ all pairs with same key passed in together
§ reduce outputs new (key, value) pairs
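The four steps above can be sketched as a single-process word count. A real framework such as Hadoop would run the map and reduce tasks on many machines; the function names and input fragments here are illustrative only.

```python
from collections import defaultdict


def map_task(fragment):
    """Map: emit a (word, 1) pair for every word in one fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def reduce_task(key, values):
    """Reduce: called once per key with all of that key's values."""
    return (key, sum(values))


def map_reduce(fragments):
    # Steps 1-2: one map task per fragment.
    # Step 3: group the emitted pairs by key.
    grouped = defaultdict(list)
    for fragment in fragments:
        for key, value in map_task(fragment):
            grouped[key].append(value)
    # Step 4: one reduce call per key.
    return dict(reduce_task(key, values) for key, values in grouped.items())


fragments = ["big data is big", "data science is science"]
counts = map_reduce(fragments)
```

Because each map task sees only its own fragment and each reduce call sees only one key, both phases can be spread across machines without the tasks needing to talk to each other.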
Communications
§ HDFS
§ Hadoop Distributed File System
§ input data, temporary results, and results are stored
as files here
§ Hadoop takes care of making files available to nodes
§ Hadoop RPC
§ how Hadoop communicates between nodes
§ used for scheduling tasks, heartbeat etc
§ Most of this is in practice hidden from the
developer
The Hadoop ecosystem
§ Pig
§ dataflow language for setting up MR jobs
§ HBase
§ NoSQL database to store MR input in
§ Hive
§ SQL-like query language on top of Hadoop
§ Mahout
§ machine learning library on top of Hadoop
§ Hadoop Streaming
§ utility for writing mappers and reducers as
command-line tools in other languages
Applications of MapReduce
§ Linear algebra operations
§ easily mapreducible
§ SQL queries over heterogeneous data
§ basically requires only a mapping to tables
§ relational algebra easy to do in MapReduce
§ PageRank
§ basically one big set of matrix multiplications
§ the original application of MapReduce
§ Recommendation engines
§ the SON algorithm
§ ...
Apache Mahout
§ Has three main application areas
§ others are welcome, but this is mainly what’s there now
§ Recommendation engines
§ several different similarity measures
§ collaborative filtering
§ Slope-one algorithm
§ Clustering
§ k-means and fuzzy k-means
§ Latent Dirichlet Allocation
§ Classification
§ stochastic gradient descent
§ Support Vector Machines
§ Naïve Bayes
Lots of SQL-on-MapReduce tools
§ Tenzing Google
§ Hive Apache Hadoop
§ YSmart Ohio State
§ SQL-MR AsterData
§ HadoopDB Hadapt
§ Polybase Microsoft
§ RainStor RainStor Inc.
§ ParAccel ParAccel Inc.
§ Impala Cloudera
§ ...
Conclusion
Big data & machine learning
§ This is a huge field, growing very fast
§ Many algorithms and techniques
§ can be seen as a giant toolbox with wide-ranging
applications
§ Ranging from the very simple to the extremely
sophisticated
§ Difficult to see the big picture
§ Huge range of applications
§ Math skills are crucial
Take a look around Data Scientists’ Tools
Using SQL Server!!!
Fabricio Quintanilla
fabricio.quintanilla@gmail.com
inteligenciadenegocios.net
@fabrixq
QUESTIONS AND ANSWERS

 
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons LearnedBio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data Commons
 
Spark
SparkSpark
Spark
 
Big Data Rampage
Big Data RampageBig Data Rampage
Big Data Rampage
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
 
Who is a data scientist
Who is a data scientist  Who is a data scientist
Who is a data scientist
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data Science
 
2014 pycon-talk
2014 pycon-talk2014 pycon-talk
2014 pycon-talk
 
Research skills
Research skillsResearch skills
Research skills
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
 
Why Your Database Queries Stink -SeaGl.org November 11th, 2016
Why Your Database Queries Stink -SeaGl.org November 11th, 2016Why Your Database Queries Stink -SeaGl.org November 11th, 2016
Why Your Database Queries Stink -SeaGl.org November 11th, 2016
 

Kürzlich hochgeladen

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 

Kürzlich hochgeladen (20)

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 

Sql saturday el salvador 2016 - Me, A Data Scientist?

  • 1. Me, A Data Scientist? Fabricio Quintanilla, MSc, PhD fabricio.quintanilla@gmail.com @fabrixq /fquintanilla http://www.inteligenciadenegocios.net MCP, MCPD, MCTS
  • 2. Organized by
  • 3. SQL Saturday sponsors
  • 4. Agenda: Not Rocket Science… Just Data Science…
  • 5. Man on the Moon – 1969
  • 6. Man on the Moon – Small Data. Computer program: date 1969; 64Kb, 2Kb RAM; Fortran; must work the first time. Apollo XI: speed 3,500 Km/h; weight 13,500 Kg; lots of complex data. Man on the Moon: distance 356,500 Km; never been there before; must return to Earth.
  • 7. Skydive Stratos, 2012: tens of gigabytes!!! Think about it... We live in crazy times…
  • 8. What is Big Data? Mumbo-jumbo § A fashionable term typically used by some IT vendors to remarket old-fashioned software and hardware
  • 9. Big Data is not about Data Volume
  • 10. No way!!!! Water Cooler Chat § We need to parallelize data operations but it’s too costly & complex… § The business can’t get access to all the relevant data, we need external data § We can’t match customer master data to live customer interactions… § We can’t just force everything into a star-schema… § These BI reports and charts don’t tell us anything we didn’t know… § We are missing the ETL window, the data we needed didn’t arrive on time… § We can’t predict with confidence if we can’t explore data & develop our own models
  • 11. What is big data? “Big Data is any thing which is crash Excel.” “Small Data is when is fit in RAM. Big Data is when is crash because is not fit in RAM.” (https://twitter.com/devops_borat) Or, in other words, Big Data is data in volumes too great to process by traditional methods.
  • 12. What is Big Data? Force of Change § Big Data forces you to change the way you collect, store, manage, analyze and visualize data.
  • 13. Big Data = “Crude Oil” [not useful oil] § Think of data as ‘Crude Oil’ § Big data is about extracting the ‘Crude Oil’, transporting it in ‘mega-tankers’, siphoning it through ‘pipelines’ and storing it in massive ‘silos’… § All ‘this’ is about IT Big Data… fine and well… § BUT…
  • 14. You need to refine the ‘Crude Oil’: Enter Data Science
  • 15. The Science [and Art] of… § Discovering what we don’t know from data § Obtaining predictive, actionable insight from data § Creating Data Products that have business impact now § Communicating relevant business stories from data § Building confidence in decisions that drive business value
  • 16. What is a data scientist?
  • 17. Class DataScientist { Is skeptical, curious; has an inquisitive mind. Knows machine learning, statistics, probability. Applies the scientific method; runs experiments. Is good at coding & hacking. Able to deal with IT data engineering. Knows how to build data products. Able to find answers to known unknowns. Tells relevant business stories from data. Has domain knowledge. }
  • 18. What does a Data Scientist Do?
  • 19. 10 Things [most] Data Scientists Do § Ask Good Questions: What is it that § …we don’t know? § …we’d like to know? § Define and Test a Hypothesis, Run Experiments § Scoop, Scrape, Sink & Sample Business-Relevant Data § Purge and Wrestle Data, Tame Data § Explore Data, Discover Data Playfully. Discover Unknowns. § Model Data. Model Algorithms § Understand Data Relationships § Tell the Machine How to Learn from Data § Create Data Products that Deliver Actionable Insight § Tell Relevant Business Stories from Data
  • 20. [Sort of a] Data Scientist Toolkit § Java, R, Python… (bonus: Clojure, Haskell, Scala) § Hadoop, HDFS & MapReduce… (bonus: Spark, Storm) § HBase, Pig & Hive… (bonus: Shark, Impala, Cascalog) § ETL, Webscrapers, Flume, Sqoop… (bonus: Hume) § SQL, RDBMS, DW, OLAP… § Knime, Weka, RapidMiner… (bonus: SciPy, NumPy, scikit-learn, pandas) § D3.js, Gephi, ggplot2, Tableau, Flare, Shiny… § SPSS, Matlab, SAS… (the Enterprise man) § NoSQL, MongoDB, Couchbase, Cassandra… § And Yes!!! … MS-Excel: the most used, most underrated DS tool…
  • 21. Types of algorithms § Clustering § Association learning § Parameter estimation § Recommendation engines § Classification § Similarity matching § Neural networks § Bayesian networks § Genetic algorithms
  • 22. Basically, it’s all maths... § Linear algebra § Calculus § Probability theory § Graph theory § ... “Only 10% in devops know how to work with Big Data. Only 1% are realize they need 2 Big Data for fault tolerance” (https://twitter.com/devops_borat)
  • 23. Big data skills gap § Hardly anyone knows this stuff § It’s a big field, with lots and lots of theory § And it’s all maths, so it’s tricky to learn http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond#The_Big_Data_Skills_Gap http://www.ibmbigdatahub.com/blog/addressing-big-data-skills-gap
  • 24. Two orthogonal aspects § Analytics / machine learning § learning insights from data § Big data § handling massive data volumes § Can be combined, or used separately
  • 26. How to process Big Data? § If relational databases are not enough, what is? “Mining of Big Data is problem solved in 2013 with zgrep” (https://twitter.com/devops_borat)
  • 27. MapReduce § A framework for writing massively parallel code § Simple, straightforward model § Based on “map” and “reduce” functions from functional programming (LISP)
  • 28. NoSQL and Big Data § Not really that relevant § Traditional databases handle big data sets, too § NoSQL databases have poor analytics § MapReduce often works from text files § can obviously work from SQL and NoSQL, too § NoSQL is more for high throughput § basically, AP from the CAP theorem, instead of CP § In practice, really Big Data is likely to be a mix § text files, NoSQL, and SQL
  • 29. The 4th V: Veracity “The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.” Daniel Boorstin, in The Discoverers (1983) “95% of time, when is clean Big Data is get Little Data” (https://twitter.com/devops_borat)
  • 30. Data quality § A huge problem in practice § any manually entered data is suspect § most data sets are in practice deeply problematic § Even automatically gathered data can be a problem § systematic problems with sensors § errors causing data loss § incorrect metadata about the sensor § Never, never, never trust the data without checking it! § garbage in, garbage out, etc.
  • 32. Conclusion § Vast potential § to both big data and machine learning § Very difficult to realize that potential § requires mathematics, which nobody knows § We need to wake up!
  • 34. Two kinds of learning § Supervised § we have training data with correct answers § use training data to prepare the algorithm § then apply it to data without a correct answer § Unsupervised § no training data § throw data into the algorithm, hope it makes some kind of sense out of the data
  • 35. Some types of algorithms § Prediction § predicting a variable from data § Classification § assigning records to predefined groups § Clustering § splitting records into groups based on similarity § Association learning § seeing what often appears together with what
  • 36. Issues § Data is usually noisy in some way § imprecise input values § hidden/latent input values § Inductive bias § basically, the shape of the algorithm we choose § may not fit the data at all § may induce underfitting or overfitting § Machine learning without inductive bias is not possible
  • 37. Testing § When doing this for real, testing is crucial § Testing means splitting your data set § training data (used as input to algorithm) § test data (used for evaluation only) § Need to compute some measure of performance § precision/recall § root mean square error § A huge field of theory here § will not go into it in this course § very important in practice
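The testing slide can be illustrated with a minimal sketch in plain Python. The function names (`train_test_split`, `precision_recall`, `rmse`), the 25% test fraction, and the fixed seed are assumptions made for this example and are not taken from the slides.

```python
import math
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction for evaluation only."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall(actual, predicted):
    """Precision and recall for binary labels, where True marks the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def rmse(actual, predicted):
    """Root mean square error between two equally long numeric sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


train, test = train_test_split(range(20))
print(len(train), len(test))
print(precision_recall([True, True, False, False], [True, False, True, False]))
print(rmse([1, 2, 3], [1, 2, 5]))
```

Precision/recall suits classification output, while RMSE suits numeric prediction; in both cases the measure should be computed on the held-out test data, never on the training data.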
  • 38. Missing values § Usually, there are missing values in the data set § that is, some records have some NULL values § These cause problems for many machine learning algorithms § Need to solve somehow § remove all records with NULLs § use a default value § estimate a replacement value § ...
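The three strategies listed on the missing-values slide might look like this in plain Python. The records, field names, and helper names are hypothetical; `None` stands in for NULL, and the mean is only one possible way to estimate a replacement value.

```python
def drop_incomplete(records):
    """Strategy 1: remove every record that contains a missing (None) value."""
    return [r for r in records if None not in r.values()]


def fill_default(records, field, default):
    """Strategy 2: replace missing values of one field with a fixed default."""
    return [{**r, field: default if r[field] is None else r[field]} for r in records]


def fill_mean(records, field):
    """Strategy 3: estimate a replacement, here the mean of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    return fill_default(records, field, sum(known) / len(known))


people = [
    {"age": 30, "city": "A"},
    {"age": None, "city": "B"},
    {"age": 50, "city": "C"},
]
print(drop_incomplete(people))
print(fill_default(people, "age", 0))
print(fill_mean(people, "age"))
```

Dropping records is the simplest option but shrinks the data set; filling in values keeps every record at the cost of introducing numbers that were never observed.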
  • 39. Terminology § Vector § one-dimensional array § Matrix § two-dimensional array § Linear algebra § algebra with vectors and matrices § addition, multiplication, transposition, ...
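As a small illustration of the terminology, the sketch below represents a vector as a Python list and a matrix as a list of rows, then implements the three operations the slide names. The helper names are made up for this example; real work would normally use a library such as NumPy.

```python
def vector_add(u, v):
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    """Matrix product: entry (i, j) is the dot product of row i of a and column j of b."""
    columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


print(vector_add([1, 2, 3], [10, 20, 30]))
print(transpose([[1, 2, 3], [4, 5, 6]]))
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```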
  • 41. Top 10 machine learning algorithms: 1. C4.5 (No); 2. k-means clustering (Yes); 3. Support vector machines (No); 4. the Apriori algorithm (No); 5. the EM algorithm (No); 6. PageRank (No); 7. AdaBoost (No); 8. k-nearest neighbours classification (Kind of); 9. Naïve Bayes (Yes); 10. CART (No). From a survey at the IEEE International Conference on Data Mining (ICDM) in December 2006: “Top 10 algorithms in data mining”, by X. Wu et al.
  • 42. C4.5 § Algorithm for building decision trees § basically trees of boolean expressions § each node splits the data set in two § leaves assign items to classes § Decision trees are useful not just for classification § they can also teach you something about the classes § C4.5 is a bit involved to learn § the ID3 algorithm is much simpler § CART (#10) is another algorithm for learning decision trees
  • 43. Support Vector Machines § A way to do binary classification on matrices § Support vectors are the data points nearest to the hyperplane that divides the classes § SVMs maximize the distance between SVs and the boundary § Particularly valuable because of “the kernel trick” § using a transformation to a higher dimension to handle more complex class boundaries § A bit of work to learn, but manageable
  • 44. Apriori § An algorithm for “frequent itemsets” § basically, working out which items frequently appear together § for example, what goods are often bought together in the supermarket? § used for Amazon’s “customers who bought this...” § Can also be used to find association rules § that is, “people who buy X often buy Y” or similar § Apriori is slow § a faster, further development is FP-growth http://www.dssresources.com/newsletters/66.php
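The frequent-itemset idea from the Apriori slide can be sketched as follows. This is a compact, unoptimized version written for illustration: the function name, the absolute `min_support` count, and the supermarket baskets are assumptions, not material from the deck.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Build size-k candidates by joining frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
for itemset, count in apriori(baskets, min_support=3).items():
    print(sorted(itemset), count)
```

The pruning step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are frequent, so most candidates never need to be counted.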
  • 45. Expectation Maximization § A deeply interesting algorithm I’ve seen used in a number of contexts § very hard to understand what it does § very heavy on the maths § Essentially an iterative algorithm § skips between “expectation” step and “maximization” step § tries to optimize the output of a function § Can be used for § clustering § a number of more specialized examples, too
  • 46. PageRank § Basically a graph analysis algorithm § identifies the most prominent nodes § used for weighting search results on Google § Can be applied to any graph § for example an RDF data set § Basically works by simulating random walk § estimating the likelihood that a walker would be on a given node at a given time § actual implementation is linear algebra § The basic algorithm has some issues § “spider traps” § graph must be connected § straightforward solutions to these exist
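The random-walk description of PageRank can be sketched with a simple power iteration. The damping factor of 0.85, the iteration count, and the toy graph are assumptions for this example. The random jump (damping) and the even redistribution from nodes without outgoing links are the commonly used fixes for the issues the slide mentions, not something the slide itself spells out.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate the probability that a random walker is on each node.

    links maps a node to the list of nodes it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps to a random node.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for node, score in sorted(pagerank(graph).items()):
    print(node, round(score, 3))
```

In this toy graph node `c` collects links from every other node and ends up with the highest score, with `a` close behind because `c` passes all of its rank on to `a`.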
  • 47. AdaBoost § Algorithm for “ensemble learning” § That is, for combining several algorithms § and training them on the same data § Combining more algorithms can be very effective § usually better than a single algorithm § AdaBoost basically weights training samples § giving the most weight to those which are classified the worst
  • 49. Collaborative filtering § Basically, you’ve got some set of items § these can be movies, books, beers, whatever § You’ve also got ratings from users § on a scale of 1-5, 1-10, whatever § Can you use this to recommend items to a user, based on their ratings? § if you use the connection between their ratings and other people’s ratings, it’s called collaborative filtering § other approaches are possible
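A minimal sketch of user-based collaborative filtering follows. It assumes cosine similarity between users' rating dictionaries and a similarity-weighted average as the predicted rating; the slide names neither choice, and the users, films, and ratings are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users' {item: rating} dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
                weights[item] = weights.get(item, 0.0) + similarity
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


ratings = {
    "ana": {"Alien": 5, "Brazil": 4},
    "ben": {"Alien": 5, "Brazil": 4, "Casablanca": 5, "Dune": 1},
    "eva": {"Alien": 1, "Casablanca": 2, "Dune": 5},
}
print(recommend(ratings, "ana"))
```

Because ana's ratings agree closely with ben's and hardly at all with eva's, ben's opinion dominates the prediction and the film he rated highly is recommended first.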
  • 50. Feature-based recommendation § Use user’s ratings of items § run an algorithm to learn what features of items the user likes § Can be difficult to apply because § requires detailed information about items § key features may not be present in data § Recommending music may be difficult, for example
  • 52. Bayes’s Theorem § Basically a theorem for combining probabilities § I’ve observed A, which indicates H is true with probability 70% § I’ve also observed B, which indicates H is true with probability 85% § what should I conclude? § Naïve Bayes is basically using this theorem § with the assumption that A and B are independent § this assumption is nearly always false, hence “naïve”
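The 70% and 85% question on the Bayes slide can be worked through numerically. This sketch rests on two assumptions the slide does not state: the prior probability of H is 50%, and A and B are independent given H (and given not-H), which is the naïve assumption. The function name `combine` is invented for the example.

```python
def combine(probabilities, prior=0.5):
    """Combine P(H | evidence_i) values into one posterior, naive-Bayes style.

    Each probability is converted to odds and divided by the prior odds,
    which gives that observation's likelihood ratio; the ratios are then
    multiplied onto the prior odds.
    """
    prior_odds = prior / (1 - prior)
    odds = prior_odds
    for p in probabilities:
        odds *= (p / (1 - p)) / prior_odds
    return odds / (1 + odds)


print(combine([0.70, 0.85]))
```

With an even prior the two observations reinforce each other: the odds are (0.7 / 0.3) × (0.85 / 0.15) = 119 / 9, which corresponds to a probability of 119 / 128, or about 93%. A different prior would give a different answer.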
  • 53. Simple example § Is the coin fair or not? § we throw it 10 times, get 9 heads and one tail § we try again, get 8 heads and two tails § What do we know now? § can combine data and recompute § or just use Bayes’s Theorem directly http://www.bbc.co.uk/news/magazine-22310186
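One way to make the coin example concrete is a Beta-Binomial model, sketched below. The uniform Beta(1, 1) prior and the helper names are assumptions; the slide does not say which model it has in mind. The sketch shows that updating after each experiment gives the same result as pooling the data and recomputing.

```python
def update(belief, heads, tails):
    """Bayesian update of a Beta(a, b) belief about P(heads) after new throws."""
    a, b = belief
    return (a + heads, b + tails)


def mean(belief):
    """Expected probability of heads under a Beta(a, b) belief."""
    a, b = belief
    return a / (a + b)


prior = (1, 1)  # uniform prior: every bias is considered equally likely at first

# Apply Bayes's theorem directly, one experiment at a time.
step_by_step = update(update(prior, heads=9, tails=1), heads=8, tails=2)

# Or combine the data (17 heads, 3 tails) and recompute once.
pooled = update(prior, heads=17, tails=3)

print(step_by_step, pooled, round(mean(pooled), 3))
```

Both routes end at Beta(18, 4), whose mean is 18 / 22, roughly 0.82, so under these assumptions the evidence points to a coin biased towards heads.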
  • 55. University pre-lecture, 1991 § My first meeting with university was Open University Day, in 1991 § Professor Bjørn Kirkerud gave the computer science talk § His subject § some day processors will stop becoming faster § we’re already building machines with many processors § what we need is a way to parallelize software § preferably automatically, by feeding in normal source code and getting it parallelized back § MapReduce is basically the state of the art on that today
  • 56. MapReduce § A framework for writing massively parallel code § Simple, straightforward model § Based on “map” and “reduce” functions from functional programming (LISP)
  • 57. http://research.google.com/archive/mapreduce.html Appeared in: OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.
  • 58. map and reduce >>> "1 2 3 4 5 6 7 8".split() ['1', '2', '3', '4', '5', '6', '7', '8'] >>> l = map(int, "1 2 3 4 5 6 7 8".split()) >>> l [1, 2, 3, 4, 5, 6, 7, 8] >>> import operator >>> reduce(operator.add, l) 36
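The interpreter session on slide 58 is written for Python 2, where `map` returns a list and `reduce` is a built-in. A Python 3 equivalent, laid out as a script, could look like this; the variable names are chosen for this example.

```python
from functools import reduce  # reduce moved to functools in Python 3
import operator

tokens = "1 2 3 4 5 6 7 8".split()
print(tokens)

# In Python 3, map returns a lazy iterator, so wrap it in list() to see the values.
numbers = list(map(int, tokens))
print(numbers)

# reduce folds the list into a single value by repeatedly applying operator.add.
total = reduce(operator.add, numbers)
print(total)
```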
  • 59. MapReduce 1. Split data into fragments 2. Create a Map task for each fragment § the task outputs a set of (key, value) pairs 3. Group the pairs by key 4. Call Reduce once for each key § all pairs with same key passed in together § reduce outputs new (key, value) pairs
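The four steps on slide 59 can be imitated in a single process with the classic word-count example. This is only a sketch of the programming model: it runs sequentially, has no distribution or fault tolerance, and the names `map_task`, `reduce_task`, and `map_reduce` are invented here rather than taken from Hadoop.

```python
from collections import defaultdict


def map_task(fragment):
    """Map: emit a (word, 1) pair for every word in one fragment of the input."""
    return [(word.lower(), 1) for word in fragment.split()]


def reduce_task(key, values):
    """Reduce: receive all values for one key and emit new (key, value) pairs."""
    return [(key, sum(values))]


def map_reduce(fragments, mapper, reducer):
    """Run the mapper on each fragment, group pairs by key, then reduce per key."""
    groups = defaultdict(list)
    for fragment in fragments:              # steps 1 and 2: one map task per fragment
        for key, value in mapper(fragment):
            groups[key].append(value)       # step 3: group the pairs by key
    output = []
    for key in sorted(groups):              # step 4: call reduce once for each key
        output.extend(reducer(key, groups[key]))
    return output


fragments = ["big data big", "data science"]
print(map_reduce(fragments, map_task, reduce_task))
```

Because each map task only sees its own fragment and each reduce call only sees one key, a real framework can run these calls on different machines without changing the two functions.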
  • 60. Communications § HDFS § Hadoop Distributed File System § input data, temporary results, and results are stored as files here § Hadoop takes care of making files available to nodes § Hadoop RPC § how Hadoop communicates between nodes § used for scheduling tasks, heartbeat etc. § Most of this is in practice hidden from the developer
  • 61. The Hadoop ecosystem § Pig § dataflow language for setting up MR jobs § HBase § NoSQL database to store MR input in § Hive § SQL-like query language on top of Hadoop § Mahout § machine learning library on top of Hadoop § Hadoop Streaming § utility for writing mappers and reducers as command-line tools in other languages
  • 62. Applications of MapReduce § Linear algebra operations § easily mapreducible § SQL queries over heterogeneous data § basically requires only a mapping to tables § relational algebra easy to do in MapReduce § PageRank § basically one big set of matrix multiplications § the original application of MapReduce § Recommendation engines § the SON algorithm § ...
  • 63. Apache Mahout § Has three main application areas § others are welcome, but this is mainly what’s there now § Recommendation engines § several different similarity measures § collaborative filtering § Slope-one algorithm § Clustering § k-means and fuzzy k-means § Latent Dirichlet Allocation § Classification § stochastic gradient descent § Support Vector Machines § Naïve Bayes
  • 64. Lots of SQL-on-MapReduce tools § Tenzing (Google) § Hive (Apache Hadoop) § YSmart (Ohio State) § SQL-MR (AsterData) § HadoopDB (Hadapt) § Polybase (Microsoft) § RainStor (RainStor Inc.) § ParAccel (ParAccel Inc.) § Impala (Cloudera) § ...
  • 66. Big data & machine learning § This is a huge field, growing very fast § Many algorithms and techniques § can be seen as a giant toolbox with wide-ranging applications § Ranging from the very simple to the extremely sophisticated § Difficult to see the big picture § Huge range of applications § Math skills are crucial
  • 67. Take a look around: Data Scientists’ Tools Using SQL Server!!!