SlideShare ist ein Scribd-Unternehmen logo
1 von 38
Downloaden Sie, um offline zu lesen
Guoqiong Song, Intel
Leveraging NLP and Deep
Learning for Document
Recommendation in the Cloud
#UnifiedAnalytics #SparkAISummit
Agenda
• Job/Resume Search Challenges and Opportunity
• Analytics Zoo and BigDL Overview
• Resume Search Analytics Zoo Solution
• Takeaways
Agenda
• Job/Resume Search Challenges and Opportunity
• Analytics Zoo and BigDL Overview
• Resume Search Analytics Zoo Solution
• Takeaways
4#UnifiedAnalytics #SparkAISummit
Job search
5
Personalize Results Value
Job Seekers Employers
Find the right job faster
Resume
Find the right person
Job description
Traditional Information Retrieval Sufferings
Solution challenges:
stemming, synonyms,
ontologies, sensitivity
6
Warehouse
Warehouse
Stemming
Accountant Accounting=!=
7
Stemming Solution
Accountant Accounting=!=
Accountant Accounting=
8
Stemming Sufferings
Accountant Accounting=!=
Accountant Accounting=
Accountant Accounting=
Account
Representative
=
9
Synonyms
Registered
Nurse
RN=!=
10
Synonyms Solution
Registered
Nurse
RN=!=
Registered
Nurse
RN → registered nurse=
11
Synonym Sufferings
Registered
Nurse
RN=!=
Registered
Nurse
RN → registered nurse=
. . .
12
Ontologies
Dishwasher Back of House=!=
13
Ontologies Solution
Dishwasher Back of House=!=
Restaurant Restaurant
Dishwasher Back of House
=
14
Ontologies Sufferings
. . .
Dishwasher Back of House=!=
Restaurant Restaurant
Dishwasher Back of House
=
15
Specificity Suffering
16
17
Personalize Results Value
Job Seekers Employers
Find the right job faster
Resume
Find the right person
Job description
Agenda
• Job/Resume Search Challenges and Opportunity
• Analytics Zoo and BigDL Overview
• Resume Search Analytics Zoo Solution
• Takeaways
AI on
Unifying Analytics + AI on Apache Spark
Distributed, High-Performance
Deep Learning Framework
for Apache Spark
https://github.com/intel-analytics/bigdl
DistributedTensoRflow, Keras and BigDL on Spark
Reference Use Cases,AI Models,
High-level APIs, Feature Engineering, etc.
https://github.com/intel-analytics/analytics-zoo
Unified Big Data Analytics Platform
Data
Input
Flume Kafka Storage HBaseHDFS
Resource Mgmt
& Co-ordination
ZooKeeperYARN
Data
Processing
& Analysis
MR
Storm
Apache Hadoop & Spark Platform
Parquet Avro
Spark Core
SQL Streaming MLlib GraphX
DataFrame
ML Pipelines
SparkR
Flink
Giraph
Batch Streaming Interactive
Machine
Leaning
Graph
Analytics SQL
R PythonJava
Notebook Spreadsheet
Chasm b/w Deep Learning and Big Data
Communities
Average users (big data users, data scientists, analysts, etc.)Deep learning experts
The
Chasm
Bridging the Chasm
Make deep learning more accessible to big data and data
science communities
• Continue the use of familiar SW tools and HW infrastructure to build deep learning
applications
• Analyze “big data” using deep learning on the same Hadoop/Spark cluster where the
data are stored
• Add deep learning functionalities to large-scale big data programs and/or workflow
• Leverage existing Hadoop/Spark clusters to run deep learning applications
• Shared, monitored and managed with other workloads (e.g., ETL, data warehouse, feature
engineering, traditional ML, graph analytics, etc.) in a dynamic and elastic fashion
BigDL
Bringing Deep Learning To Big Data Platform
https://github.com/intel-analytics/BigDL
Spark Core
DataFrame
https://bigdl-project.github.io/
• Distributed deep learning framework for Apache Spark*
• Make deep learning more accessible to big data users
and data scientists
• Write deep learning applications as standard Spark programs
• Run on existing Spark/Hadoop clusters (no changes needed)
• Feature parity with popular deep learning frameworks
• E.g., Caffe, Torch, Tensorflow, etc.
• High performance (on CPU)
• Powered by Intel MKL and multi-threaded programming
• Efficient scale-out
• Leveraging Spark for distributed training & inference
Spark Core
SQL SparkR Streaming
MLlib GraphX
ML Pipeline
DataFrame
BigDL Run as Standard Spark Programs
Standard Spark jobs
• No changes to the Spark or Hadoop clusters needed
Iterative
• Each iteration of the training runs as a Spark job
Data parallel
• Each Spark task runs the same model on a subset of the data (batch)
Distributed Training in BigDL
Peer-2-Peer All-Reduce Synchronization
Analytics Zoo
Unified Analytics + AI Platform for Big Data
Reference Use Cases
• Anomaly detection, sentiment analysis, fraud detection, image
generation, chatbot, etc.
Built-In Deep Learning
Models
• Image classification, object detection, text classification, text matching,
recommendations, sequence-to-sequence, anomaly detection, etc.
Feature Engineering
Feature transformations for
• Image, text, 3D imaging, time series, speech, etc.
High-Level Pipeline APIs
• Distributed TensorFlow and Keras on Spark
• Native support for transfer learning, Spark DataFrame and ML Pipelines
• Model serving API for model serving/inference pipelines
Backbends Spark, TensorFlow, Keras, BigDL, OpenVINO, MKL-DNN, etc.
https://github.com/intel-analytics/analytics-zoo/ https://analytics-zoo.github.io/
Distributed TensorFlow, Keras and BigDL on Spark
Analytics Zoo
Build end-to-end deep learning applications for big data
• Distributed TensorFlow on Spark
• Keras-style APIs (with autograd & transfer learning support)
• nnframes: native DL support for Spark DataFrames and ML Pipelines
• Built-in feature engineering operations for data preprocessing
Productionize deep learning applications for big data at scale
• Model serving APIs (w/ OpenVINO support)
• Support Web Services, Spark, Storm, Flink, Kafka, etc.
Out-of-the-box solutions
• Built-in deep learning models and reference use cases
What Can you do with Analytic Zoo?
Anomaly Detection
• Using LSTM network to detect anomalies in time series data
Fraud Detection
• Using feed-forward neural network to detect frauds in credit card
transaction data
Recommendation
• Use Analytics Zoo Recommendation API (i.e., Neural Collaborative
Filtering, Wide and Deep Learning) for recommendations on data with
explicit feedback.
Sentiment Analysis
• Sentiment analysis using neural network models (e.g. CNN, LSTM, GRU,
Bi-LSTM)
Variational Autoencoder (VAE)
• Use VAE to generate faces and digital numbers
https://github.com/intel-analytics/analytics-zoo/tree/master/apps
Building and Deploying with BigDL/Analytics Zoo
http://software.intel.com/bigdl/build
Not a Full List
Agenda
• Job/Resume Search Challenges and Opportunity
• Analytics Zoo and BigDL Overview
• Resume Search Analytics Zoo Solution
• Takeaways
Word Embeddings and GloVe Vectors
https://nlp.stanford.edu/projects/glove/
• Words or phrases from the vocabulary are mapped
to vectors of real numbers.
• Global log-bilinear regression model for the
unsupervised learning algorithm.
• Training is performed on aggregated global word-word
co-occurrence statistics from a Wikipedia.
• Vector representations showcase meaningful linear
substructures of the word vector space.
31
Analytics Zoo Recommender Model
• Neural collaborative filtering,
Wide and Deep
• Answer the question using
classification methodologies
• Implicit feedback and explicit
feedback
• APIs
• recommendForUser
• recommendForItem
• predictUserItemPair
https://github.com/intel-analytics/analytics-zoo/ https://analytics-zoo.github.io/
He, 2015
32
Recommender model
33
val model = Sequential[Float]()
model.add(Linear(100, 40)).add(ReLU())
.add(Linear(40, 20)).add(ReLU())
.add(Linear(20, 10)).add(ReLU())
.add(Linear(10, 2)).add(ReLU())
.add(LogSoftMax())
Resume Glove vectors
Linear1(40 output)
Linear2(20 output)
Linear3(10 output)
LogSoftMax
Job Glove vectors
Linear4(2 output)
End to End Flow
34
https://software.intel.com/en-us/articles/talroo-uses-analytics-zoo-
and-aws-to-leverage-deep-learning-for-job-recommendations
Evaluation Results
Precision MRR
35
Takeaways
• Analytics Zoo/BigDL integrates well into existing AWS Databricks
Spark ETL and machine learning platform
• Analytics Zoo/BigDL scales with our data and business
• Jobs and resumes can be effectively modeled and processed
through embeddings
• Ensembling multiple models and glove embedding feature
embedding proved to be very effective for rich content
• More information available at https://analytics-zoo.github.io/
36
Unified Analytics + AI Platform
Distributed TensorFlow, Keras and BigDL on Apache Spark
https://github.com/intel-analytics/analytics-zoo
Legal Disclaimers
• Intel technologies’ features and benefits depend on system configuration and may require enabled
hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
• No computer system can be absolutely secure.
• Tests document performance of components on a particular test, in specific systems. Differences in
hardware, software, or configuration will affect actual performance. Consult other sources of
information to evaluate performance as you consider your purchase. For more complete information
about performance and benchmark results, visit http://www.intel.com/performance.
Intel, the Intel logo, Xeon, Xeon phi, Lake Crest, etc. are trademarks of Intel Corporation in the U.S.
and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2019 Intel Corporation

Weitere ähnliche Inhalte

Was ist angesagt?

Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
Domino Data Lab
 

Was ist angesagt? (20)

Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructure
 
Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 
Open problems big_data_19_feb_2015_ver_0.1
Open problems big_data_19_feb_2015_ver_0.1Open problems big_data_19_feb_2015_ver_0.1
Open problems big_data_19_feb_2015_ver_0.1
 
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDemocratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn Creator
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Distributed computing abstractions_data_science_6_june_2016_ver_0.4
Distributed computing abstractions_data_science_6_june_2016_ver_0.4Distributed computing abstractions_data_science_6_june_2016_ver_0.4
Distributed computing abstractions_data_science_6_june_2016_ver_0.4
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
 
Distributed deep learning_framework_spark_4_may_2015_ver_0.7
Distributed deep learning_framework_spark_4_may_2015_ver_0.7Distributed deep learning_framework_spark_4_may_2015_ver_0.7
Distributed deep learning_framework_spark_4_may_2015_ver_0.7
 
H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in Python
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
 
Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
Graph tour keynote 2019
Graph tour keynote 2019Graph tour keynote 2019
Graph tour keynote 2019
 
(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve...
(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve...(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve...
(BDT207) Use Streaming Analytics to Exploit Perishable Insights | AWS re:Inve...
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
 

Ähnlich wie Leveraging NLP and Deep Learning for Document Recommendations in the Cloud

Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
DataWorks Summit
 

Ähnlich wie Leveraging NLP and Deep Learning for Document Recommendations in the Cloud (20)

Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
 
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
 
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientistAi & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientist
 
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
 

Mehr von Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Kürzlich hochgeladen

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Kürzlich hochgeladen (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Leveraging NLP and Deep Learning for Document Recommendations in the Cloud

  • 1. Guoqiong Song, Intel Leveraging NLP and Deep Learning for Document Recommendation in the Cloud #UnifiedAnalytics #SparkAISummit
  • 2. Agenda • Job/Resume Search Challenges and Opportunity • Analytics Zoo and BigDL Overview • Resume Search Analytics Zoo Solution • Takeaways
  • 3. Agenda • Job/Resume Search Challenges and Opportunity • Analytics Zoo and BigDL Overview • Resume Search Analytics Zoo Solution • Takeaways
  • 5. 5 Personalize Results Value Job Seekers Employers Find the right job faster Resume Find the right person Job description
  • 6. Traditional Information Retrieval Sufferings Solution challenges: stemming, synonyms, ontologies, sensitivity 6 Warehouse Warehouse
  • 9. Stemming Sufferings Accountant Accounting=!= Accountant Accounting= Accountant Accounting= Account Representative = 9
  • 14. Ontologies Solution Dishwasher Back of House=!= Restaurant Restaurant Dishwasher Back of House = 14
  • 15. Ontologies Sufferings . . . Dishwasher Back of House=!= Restaurant Restaurant Dishwasher Back of House = 15
  • 17. 17 Personalize Results Value Job Seekers Employers Find the right job faster Resume Find the right person Job description
  • 18. Agenda • Job/Resume Search Challenges and Opportunity • Analytics Zoo and BigDL Overview • Resume Search Analytics Zoo Solution • Takeaways
  • 19. AI on Unifying Analytics + AI on Apache Spark Distributed, High-Performance Deep Learning Framework for Apache Spark https://github.com/intel-analytics/bigdl DistributedTensoRflow, Keras and BigDL on Spark Reference Use Cases,AI Models, High-level APIs, Feature Engineering, etc. https://github.com/intel-analytics/analytics-zoo
  • 20. Unified Big Data Analytics Platform Data Input Flume Kafka Storage HBaseHDFS Resource Mgmt & Co-ordination ZooKeeperYARN Data Processing & Analysis MR Storm Apache Hadoop & Spark Platform Parquet Avro Spark Core SQL Streaming MLlib GraphX DataFrame ML Pipelines SparkR Flink Giraph Batch Streaming Interactive Machine Leaning Graph Analytics SQL R PythonJava Notebook Spreadsheet
  • 21. Chasm b/w Deep Learning and Big Data Communities Average users (big data users, data scientists, analysts, etc.)Deep learning experts The Chasm
  • 22. Bridging the Chasm Make deep learning more accessible to big data and data science communities • Continue the use of familiar SW tools and HW infrastructure to build deep learning applications • Analyze “big data” using deep learning on the same Hadoop/Spark cluster where the data are stored • Add deep learning functionalities to large-scale big data programs and/or workflow • Leverage existing Hadoop/Spark clusters to run deep learning applications • Shared, monitored and managed with other workloads (e.g., ETL, data warehouse, feature engineering, traditional ML, graph analytics, etc.) in a dynamic and elastic fashion
  • 23. BigDL Bringing Deep Learning To Big Data Platform https://github.com/intel-analytics/BigDL Spark Core DataFrame https://bigdl-project.github.io/ • Distributed deep learning framework for Apache Spark* • Make deep learning more accessible to big data users and data scientists • Write deep learning applications as standard Spark programs • Run on existing Spark/Hadoop clusters (no changes needed) • Feature parity with popular deep learning frameworks • E.g., Caffe, Torch, Tensorflow, etc. • High performance (on CPU) • Powered by Intel MKL and multi-threaded programming • Efficient scale-out • Leveraging Spark for distributed training & inference Spark Core SQL SparkR Streaming MLlib GraphX ML Pipeline DataFrame
  • 24. BigDL Run as Standard Spark Programs Standard Spark jobs • No changes to the Spark or Hadoop clusters needed Iterative • Each iteration of the training runs as a Spark job Data parallel • Each Spark task runs the same model on a subset of the data (batch)
  • 25. Distributed Training in BigDL Peer-2-Peer All-Reduce Synchronization
  • 26. Analytics Zoo Unified Analytics + AI Platform for Big Data Reference Use Cases • Anomaly detection, sentiment analysis, fraud detection, image generation, chatbot, etc. Built-In Deep Learning Models • Image classification, object detection, text classification, text matching, recommendations, sequence-to-sequence, anomaly detection, etc. Feature Engineering Feature transformations for • Image, text, 3D imaging, time series, speech, etc. High-Level Pipeline APIs • Distributed TensorFlow and Keras on Spark • Native support for transfer learning, Spark DataFrame and ML Pipelines • Model serving API for model serving/inference pipelines Backbends Spark, TensorFlow, Keras, BigDL, OpenVINO, MKL-DNN, etc. https://github.com/intel-analytics/analytics-zoo/ https://analytics-zoo.github.io/ Distributed TensorFlow, Keras and BigDL on Spark
  • 27. Analytics Zoo Build end-to-end deep learning applications for big data • Distributed TensorFlow on Spark • Keras-style APIs (with autograd & transfer learning support) • nnframes: native DL support for Spark DataFrames and ML Pipelines • Built-in feature engineering operations for data preprocessing Productionize deep learning applications for big data at scale • Model serving APIs (w/ OpenVINO support) • Support Web Services, Spark, Storm, Flink, Kafka, etc. Out-of-the-box solutions • Built-in deep learning models and reference use cases
  • 28. What Can you do with Analytic Zoo? Anomaly Detection • Using LSTM network to detect anomalies in time series data Fraud Detection • Using feed-forward neural network to detect frauds in credit card transaction data Recommendation • Use Analytics Zoo Recommendation API (i.e., Neural Collaborative Filtering, Wide and Deep Learning) for recommendations on data with explicit feedback. Sentiment Analysis • Sentiment analysis using neural network models (e.g. CNN, LSTM, GRU, Bi-LSTM) Variational Autoencoder (VAE) • Use VAE to generate faces and digital numbers https://github.com/intel-analytics/analytics-zoo/tree/master/apps
  • 29. Building and Deploying with BigDL/Analytics Zoo http://software.intel.com/bigdl/build Not a Full List
  • 30. Agenda • Job/Resume Search Challenges and Opportunity • Analytics Zoo and BigDL Overview • Resume Search Analytics Zoo Solution • Takeaways
  • 31. Word Embeddings and GloVe Vectors https://nlp.stanford.edu/projects/glove/ • Words or phrases from the vocabulary are mapped to vectors of real numbers. • Global log-bilinear regression model for the unsupervised learning algorithm. • Training is performed on aggregated global word-word co-occurrence statistics from a Wikipedia. • Vector representations showcase meaningful linear substructures of the word vector space. 31
  • 32. Analytics Zoo Recommender Model • Neural collaborative filtering, Wide and Deep • Answer the question using classification methodologies • Implicit feedback and explicit feedback • APIs • recommendForUser • recommendForItem • predictUserItemPair https://github.com/intel-analytics/analytics-zoo/ https://analytics-zoo.github.io/ He, 2015 32
  • 33. Recommender model 33 val model = Sequential[Float]() model.add(Linear(100, 40)).add(ReLU()) .add(Linear(40, 20)).add(ReLU()) .add(Linear(20, 10)).add(ReLU()) .add(Linear(10, 2)).add(ReLU()) .add(LogSoftMax()) Resume Glove vectors Linear1(40 output) Linear2(20 output) Linear3(10 output) LogSoftMax Job Glove vectors Linear4(2 output)
  • 34. End to End Flow 34 https://software.intel.com/en-us/articles/talroo-uses-analytics-zoo- and-aws-to-leverage-deep-learning-for-job-recommendations
  • 36. Takeaways • Analytics Zoo/BigDL integrates well into existing AWS Databricks Spark ETL and machine learning platform • Analytics Zoo/BigDL scales with our data and business • Jobs and resumes can be effectively modeled and processed through embeddings • Ensembling multiple models and glove embedding feature embedding proved to be very effective for rich content • More information available at https://analytics-zoo.github.io/ 36
  • 37. Unified Analytics + AI Platform Distributed TensorFlow, Keras and BigDL on Apache Spark https://github.com/intel-analytics/analytics-zoo
  • 38. Legal Disclaimers • Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. • No computer system can be absolutely secure. • Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Intel, the Intel logo, Xeon, Xeon phi, Lake Crest, etc. are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2019 Intel Corporation