SlideShare ist ein Scribd-Unternehmen logo
1 von 95
using Apache Spark MLlib
#javaone
https://ua.linkedin.com/in/tarasmatyashovsky
2
I am not
a data science
engineer
3
4
lyrics
genre
5
“I'm a rolling thunder, a pouring rain
I'm comin' on like a hurricane
My lightning's flashing across the sky
You're only young but you're gonna die
I won't take no prisoners, won't spare no lives
Nobody's putting up a fight
I got my bell, I'm gonna take you to hell
I'm gonna get you, Satan get you”
https://github.com/tmatyashovsky/spark-ml-samples
6
“I'm a rolling thunder, a pouring rain
I'm comin' on like a hurricane
My lightning's flashing across the sky
You're only young but you're gonna die
I won't take no prisoners, won't spare no lives
Nobody's putting up a fight
I got my bell, I'm gonna take you to hell
I'm gonna get you, Satan get you”
https://github.com/tmatyashovsky/spark-ml-samples
7
8
 Look for particular words like “fear”, “fight”, “kill”,
“devil”, ”death”, etc.?
 Count length of a verse?
 Count unique words in a verse?
9
10
15
11
is the study of
computer
algorithms that
improve
automatically
through
experience
12
Supervise
d
learning
Unsupervise
d
learning
Reinforcemen
t
learning
13
14
 Date & time
 Conference name
 Speaker
 Talk name
 Track
 Duration
 Type
 Overall impression
 Overall rating
 Number of slides
 Time spent on live
coding
 Number of jokes
 Etc.
15
Learning algorithms
Hypotheses:
Сost function:
Features:
Target variable:
Training example:
Training set:
16
http://www.slideshare.net/liweiyang5/spark-mllib-training-material
17
Number of jokes during a talk
Speaker’s
rating
18
19
20
21
22
23
24
Positive
Negative
Impression
Number of jokes during a talk
25
26
27
28
29
30
31
 Collect data set of lyrics:
 Abba, Ace of base, Backstreet Boys, Britney Spears,
Christina Aguilera, Madonna, etc.
 Black Sabbath, In Flames, Iron Maiden, Metallica,
Moonspell, Nightwish, Sentenced, etc.
 Create training set, i.e. label (0|1) + features
 Train logistic regression (or other classification
algorithm)
https://github.com/tmatyashovsky/spark-ml-samples
32
https://github.com/tmatyashovsky/spark-ml-samples
33
34
GloV
e Bag
of
Words
Word2VecTF-
IDF
http://spark.apache.org/docs/latest/ml-features.html#feature-extractors
35
 Produces unique fixed-size dense vectors
 Captures semantic and morphologic similarity
https://code.google.com/archive/p/word2vec/
36
Similar
scores
(cos ~ 1)
Opposite
scores
(cos ~ -1)
Unrelated
scores
(cos ~ 0)
http://bionlp-www.utu.fi/wv_demo/ http://blog.christianperone.com/wp-content/uploads/2013/09/cosinesimilarityfq1.png
37
38
Verse Cosine Distance
baby one more time 0.482028
crazy for you 0.437875
show me the meaning
of being lonely
0.258147
highway to hell -0.1120049
kill them all -0.231876
https://github.com/tmatyashovsky/spark-ml-samples
https://github.com/tmatyashovsky/spark-ml-samples
39
Under-fitting
(high bias)
Over-fitting
(high variance)
Appropriate
fitting
http://mlwiki.org/index.php/Overfitting
42
Training set (66,6%)
Test set (33%)
K = 3
43
Training set (66,6%)
Test set (33%)
K = 3
44
Training set (33,3%)
Test set (33%)
Training set (33,3%)
K = 3
45
46
Java
47
Weka
Encog
AerosolveFlinkM
L
https://github.com/josephmisiti/awesome-machine-learning
48
Easy of
use
Cloud
computing
Spee
d
Generali
ty
Data
processing
49
https://databricks.com/blog/2015/02/09/learning-spark-book-available-from-oreilly.html
50
Is a library of ML algorithms and utilities
designed to run in parallel on Spark cluster
51
 Introduces a few new data types, e.g.
vector (dense and sparse), labeled point,
rating, etc.
 Allows to invoke various algorithms on
distributed datasets (RDD/Dataset)
http://spark.apache.org/docs/latest/mllib-guide.html
52
http://spark.apache.org/docs/latest/mllib-guide.html
Build on
top of
RDDs
Build on
top of
Datasets
spark.mll
ib
spark.ml
53
 Utilities: linear algebra, statistics, etc.
 Features extraction, features transforming, etc.
 Regression
 Classification
 Clustering
 Collaborative filtering, e.g. alternating least squares
 Dimensionality reduction
 And many more
http://spark.apache.org/docs/latest/mllib-guide.html
54
”All” spark.mllib features plus:
• Pipelines
• Persistence
• Model selection and tuning:
• Train validation split
• K-folds cross validation
http://spark.apache.org/docs/latest/ml-guide.html
55
Raw data Transformer
Estimator
[parameters]
Transformer
[parameters]
Estimator
[parameters]
Dataset Dataset
Dataset
Dataset
http://spark.apache.org/docs/latest/ml-pipeline.html
Cross
Validator
[pipeline,
evaluator,
parameters]
Dataset
56
Using Spark MLlib Pipeline
Lyrics
https://github.com/tmatyashovsky/spark-ml-samples
58
I'm a rolling thunder, a pouring rain
I'm comin' on like a hurricane
My lightning's flashing across the sky
You're only young but you're gonna die
I won't take no prisoners, won't spare no lives
Nobody's putting up a fight
I got my bell, I'm gonna take you to hell
I'm gonna get you, Satan get you
https://github.com/tmatyashovsky/spark-ml-samples
59
Lyrics Cleanser
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
60
I'm a rolling thunder, a pouring rain
I'm comin' on like a hurricane
My lightning's flashing across the sky
You're only young but you're gonna die
I won't take no prisoners, won't spare no lives
Nobody's putting up a fight
I got my bell, I'm gonna take you to hell
I'm gonna get you, Satan get you
https://github.com/tmatyashovsky/spark-ml-samples
61
Lyrics Cleanser
Dataset
Numerator
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
62
Im a rolling thunder a pouring rain
Im comin on like a hurricane
My lightnings flashing across the sky
Youre only young but youre gonna die
I wont take no prisoners wont spare no lives
Nobodys putting up a fight
I got my bell Im gonna take you to hell
Im gonna get you Satan get you
https://github.com/tmatyashovsky/spark-ml-samples
63
1
2
3
4
5
6
7
8
Lyrics Cleanser
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
64
im a rolling thunder a pouring rain
im comin on like a hurricane
My lightnings flashing across the sky
youre only young but youre gonna die
I wont take no prisoners wont spare no lives
nobodys putting up a fight
I got my bell im gonna take you to hell
im gonna get you satan get you
https://github.com/tmatyashovsky/spark-ml-samples
65
1
2
3
4
5
6
7
8
Lyrics Cleanser
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
66
im rolling thunder pouring rain
im comin like hurricane
lightnings flashing across sky
youre young youre gonna die
wont take prisoners wont spare lives
nobodiys putting fight
got bell im gonna take hell
im gonna get satan get
https://github.com/tmatyashovsky/spark-ml-samples
67
1
2
3
4
5
6
7
8
Lyrics Cleanser
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Verser
[Sentences
in verse]
Dataset
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
68
4
im roll thunder pour rain
im comin like hurrican
lightn flash across sky
your young your gonna die
wont take prison wont spare live
nobodi put fight
got bell im gonna take hell
im gonna get satan get
https://github.com/tmatyashovsky/spark-ml-samples
69
1
2
3
4
5
6
7
8
verse1
verse2
8
im roll thunder pour rain
im comin like hurrican
Light n flash across sky
your young your gonna die
wont take prison wont spare live
nobodi put fight
got bell im gonna take hell
im gonna get satan get
https://github.com/tmatyashovsky/spark-ml-samples
70
1
2
3
4
5
6
7
8
verse1
Lyrics Cleanser
Word2Vec
[Vector size]
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Verser
[Sentences
in verse]
Dataset
Dataset
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
71
4
[0.036463763926011056,
-0.013076733228398295,
...
0.03816963326281462]
https://github.com/tmatyashovsky/spark-ml-samples
72
feature1
feature2
[-0.013962931134021625,
0.049275818325650804,
...
-0.058982484615766086]
8
[0.036463763926011056,
-0.013076733228398295,
0.044362547532774695,
0.03816963326281462,
...
-0.013962931134021625,
0.049275818325650804,
-0.058982484615766086]
https://github.com/tmatyashovsky/spark-ml-samples
73
feature1
Lyrics Cleanser
Word2Vec
[Vector size]
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Verser
[Sentences
in verse]
Dataset
Logistic
Regression
[Max iterations,
Reg parameter]
Dataset
Dataset
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
74
Probability:
[0.9212126972383768,
0.07878730276162313]
Prediction:
0.0
https://github.com/tmatyashovsky/spark-ml-samples
75
Lyrics Cleanser
Word2Vec
[Vector size]
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Verser
[Sentences
in verse]
Dataset
Logistic
Regression
[Max iterations,
Reg parameter]
Dataset
Dataset
Cross
Validator
Model
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
76
[0.8454839775240359,
0.9061236588248319,
0.9527128936788524,
0.9522790271664413,
...
0.9526248129757111,
0.9522790271664411]
https://github.com/tmatyashovsky/spark-ml-samples
77
Lyrics Cleanser
Word2Vec
[Vector size]
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Verser
[Sentences
in verse]
Dataset
Logistic
Regression
[Max iterations,
Reg parameter]
Dataset
Dataset
Cross
Validator
Model
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
78
79
• Other feature extractors:
• Term Frequency – Inverse Document
Frequency (TD-IDF), Token counts (TF), etc.
• Other classification algorithms:
• Naive Bayes, Random Forest, Support Vector
Machines (SVM), etc.
http://spark.apache.org/docs/latest/ml-guide.html
81
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes
82
0.3 Lov
e
Lif
e
0.4
Deat
h
0.3
Lov
e
Lif
e
0.6
0.3
Deat
h
0.1
”Love Life Death”?
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes
83
Lov
e
Lif
e
0.6
0.3
Deat
h
0.1
0.3 Lov
e
Lif
e
0.4
Deat
h
0.3
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes
84
Lov
e
Lif
e
0.6
0.3
Deat
h
0.1
0.3 Lov
e
Lif
e
0.4
Deat
h
0.3
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes
85
”Love Life”?
Lov
e
Lif
e
0.6
0.3
Deat
h
0.1
0.3 Lov
e
Lif
e
0.4
Deat
h
0.3
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes
86
Lov
e
Lif
e
0.6
0.3
Deat
h
0.1
0.3 Lov
e
Lif
e
0.4
Deat
h
0.3
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes
87
Lov
e
Lif
e
0.6
0.3
Deat
h
0.1
0.3 Lov
e
Lif
e
0.4
Deat
h
0.3
https://spark.apache.org/docs/latest/ml-features.html#feature-extractors
88
Lyrics Cleanser
Word2Vec
[Vector size]
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Verser
[Sentences
in verse]
Dataset
Logistic
Regression
[Max iterations,
Reg parameter]
Dataset
Dataset
Cross
Validator
Model
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
89
Lyrics Cleanser
Dataset
Dataset
Numerator Tokenizer
Stop Words
Remover
Dataset Dataset
ExploderStemmer
Dataset
Uniter
Dataset
Verser
[Sentences
in verse]
Dataset
Naive
Bayes
Dataset Dataset
Dataset
https://github.com/tmatyashovsky/spark-ml-samples
90
Hashing
TF
[Num
Features]
IDF
[Min Doc
Freq]
Dataset
Cross
Validator
Model
91
92
93
 ML is not as complex as it seems from an applied
perspective
 Existing libraries and frameworks reduce a lot of
tedious work
 For instance, Spark MLlib can help to build nice ML
pipelines
Design by
94
 https://www.quora.com/What-is-the-difference-between-supervised-and-unsupervised-learning-algorithms
 Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia
 https://databricks.com/blog/2015/01/07/ml-pipelines-a-new-high-level-api-for-mllib.html
 https://databricks.com/blog/2016/05/31/apache-spark-2-0-preview-machine-learning-model-persistence.html
 https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research
 https://www.kaggle.com/c/dogs-vs-cats/
 http://yann.lecun.com/exdb/mnist/
 http://www.bcl.hamilton.ie/~barak/teach/F98/ECE547/hw1/index.html
 http://www.slideshare.net/jeykottalam/pipelines-ampcamp
 https://github.com/master/spark-stemming
 https://databricks.com/blog/2016/04/01/unreasonable-effectiveness-of-deep-learning-on-apache-spark.html
 http://www.degeneratestate.org/posts/2016/Apr/20/heavy-metal-and-natural-language-processing-part-1/
 https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/functions.html
 https://www.quora.com/What-is-the-difference-between-supervised-and-unsupervised-learning-algorithms
 http://www.slideshare.net/liweiyang5/spark-mllib-training-material
 https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.htm
 http://www.slideshare.net/databricks/combining-machine-learning-frameworks-with-apache-spark l
 https://databricks.com/blog/2015/10/20/audience-modeling-with-apache-spark-ml-pipelines.html
 https://github.com/deeplearning4j/deeplearning4j
 http://deeplearning4j.org/spark
 http://mlwiki.org/index.php/Overfitting
 http://bionlp-www.utu.fi/wv_demo/
 https://quomodocumque.wordpress.com/2016/01/15/messing-around-with-word2vec/
95

Weitere ähnliche Inhalte

Mehr von Taras Matyashovsky

Influence. The Psychology of Persuasion (in IT)
Influence. The Psychology of Persuasion (in IT)Influence. The Psychology of Persuasion (in IT)
Influence. The Psychology of Persuasion (in IT)Taras Matyashovsky
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkTaras Matyashovsky
 
Morning at Lohika 1st anniversary
Morning at Lohika 1st anniversaryMorning at Lohika 1st anniversary
Morning at Lohika 1st anniversaryTaras Matyashovsky
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkTaras Matyashovsky
 
New life inside monolithic application
New life inside monolithic applicationNew life inside monolithic application
New life inside monolithic applicationTaras Matyashovsky
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using HazelcastTaras Matyashovsky
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 

Mehr von Taras Matyashovsky (8)

Influence. The Psychology of Persuasion (in IT)
Influence. The Psychology of Persuasion (in IT)Influence. The Psychology of Persuasion (in IT)
Influence. The Psychology of Persuasion (in IT)
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache Spark
 
Morning at Lohika 1st anniversary
Morning at Lohika 1st anniversaryMorning at Lohika 1st anniversary
Morning at Lohika 1st anniversary
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
 
New life inside monolithic application
New life inside monolithic applicationNew life inside monolithic application
New life inside monolithic application
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using Hazelcast
 
Morning at Lohika
Morning at LohikaMorning at Lohika
Morning at Lohika
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 

Kürzlich hochgeladen

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 

Kürzlich hochgeladen (20)

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 

Distinguish Pop from Heavy Metal using Apache Spark MLlib

Hinweis der Redaktion

  1. Score of the speaker based on xxx.
  2. Quantity of jokes used. Liked or not liked the speaker.
  3. Bag of words – a single word is a one hot encoding vector with the size of the dictionary. As a result – a lot of sparse vectors.
  4. Behind the scenes - a two-layer neural net that processes text. Captures semantic and morphologic similarity so similar words are close in the vector space Similar words would be clustered together in the high dimensional sphere. 
  5. If two words are very close to synonymous, you’d expect them to show up in similar contexts, and indeed synonymous words tend to be close. For two completely random words, the similarity is pretty close to 0. On an opposite side there is not an antonym, but usually just a noise. Used Google News Negative 300.
  6. My corpus - 8316 words
  7. Let’s finally go to the implementation using a library or framework that is going to help us to avoid tedious transformations and provide algorithms as well as feature extractors out-of-the-box.