NLP IN PRACTICE
STREAMING ANALYSIS OF LIVE EVENTS
ANDREW LARIMER
@ANDREWLARIMER
If we produced these events, what would we want to know?
What did people think?
What did they like and what didn’t they like?
What were people most excited about?
What were most people saying (i.e. what were the trends in conversation)?
What characters/players stood out to the fans?
With SENTIMENT ANALYSIS we can tackle the first block of questions:
What did people think?
What did they like and what didn’t they like?
What were people most excited about?
SENTIMENT ANALYSIS
vaderSentiment is:
- rule and lexicon-based
- easy to use
- assigns polarity and intensity
- handles social media usage & emojis
- handles negation, e.g. “VADER is not smart, handsome, nor funny.”
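To make "rule and lexicon-based" concrete, here is a toy scorer in the same spirit: it looks each word up in a small lexicon, boosts an ALL-CAPS sentiment word in mixed-case text, and flips and dampens a word preceded by a negation. This is a minimal sketch, not VADER's implementation. The lexicon valences, the short negation list, and the helper name `raw_valence` are hypothetical; the two constants are meant to echo ones in VADER's source, but treat them as assumptions too.

```python
import re

# Hypothetical toy lexicon (word -> valence); not quoted from VADER's lexicon file.
TOY_LEXICON = {"want": 0.3, "love": 3.2, "fraud": -2.8,
               "smart": 1.7, "handsome": 2.2, "funny": 1.9}
# Hypothetical, deliberately short negation list.
NEGATIONS = {"not", "nor", "never", "no", "don't", "dont", "isn't", "can't"}
NEGATION_SCALAR = -0.74  # assumed: flips and dampens a negated word's valence
CAPS_BOOST = 0.733       # assumed: extra intensity for an ALL-CAPS word


def raw_valence(text: str) -> float:
    """Sum lexicon valences, applying simple ALL-CAPS and negation rules."""
    # Normalize curly apostrophes so "don’t" matches "don't".
    tokens = re.findall(r"[A-Za-z']+", text.replace("\u2019", "'"))
    mixed_case = (any(t.isupper() for t in tokens)
                  and not all(t.isupper() for t in tokens))
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token.lower())
        if valence is None:
            continue  # words outside the lexicon contribute nothing
        # Intensity: shouting a sentiment word in mixed-case text amplifies it.
        if mixed_case and token.isupper():
            valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
        # Polarity: a negation among the three preceding tokens flips the sign.
        if any(t.lower() in NEGATIONS for t in tokens[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR
        total += valence
    return total


print(raw_valence("VADER is not smart, handsome, nor funny."))
print(raw_valence("I love Jaime Lannister."),
      raw_valence("I love that Jaime Lannister got what he deserved."))
```

Both Lannister sentences get the same raw score because "love" is their only lexicon word. A purely lexical scorer cannot see what the love is directed at.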
SENTIMENT ANALYSIS
Compound score: -1 (MOST NEGATIVE) to 1 (MOST POSITIVE)
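The compound score stays inside [-1, 1] because an unbounded sum of word valences is squashed with x / sqrt(x² + α). VADER's source uses this form with α = 15. The sketch below applies that formula. The raw sums are assumptions: "love" = 3.2, a negated "want" = 0.3 × -0.74, and an ALL-CAPS "FRAUD" = -2.8 - 0.733. We chose them because they reproduce the compound scores on the following slides, not because we read them from the lexicon.

```python
import math


def normalize(raw_score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded raw valence sum into [-1, 1]."""
    # x / sqrt(x^2 + alpha) approaches +/-1 as |x| grows but never exceeds it.
    score = raw_score / math.sqrt(raw_score * raw_score + alpha)
    return max(-1.0, min(1.0, score))


# Assumed raw sums for three example sentences (see the lead-in).
for label, raw in [("I love Jaime Lannister.", 3.2),
                   ("I don't want it.", 0.3 * -0.74),
                   ("Steph Curry is a FRAUD.", -2.8 - 0.733)]:
    print(f"{label:28} {round(normalize(raw), 4)}")
```

The squashing flattens at the extremes. Adding one more strongly positive word to an already positive tweet moves the compound score less than the first positive word did.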
SENTIMENT ANALYSIS
pip install vaderSentiment
SENTIMENT ANALYSIS
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
# your_text_here: any string, e.g. a tweet; returns a dict with 'neg', 'neu', 'pos' and 'compound'
analyzer.polarity_scores(your_text_here)
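To turn the returned dict into a single label, the vaderSentiment README suggests thresholding the compound score at ±0.05. The helper below is a minimal sketch of that convention. The function name `label_sentiment` is hypothetical, and 0.05 is a default you may want to tune for tweets.

```python
def label_sentiment(scores: dict, threshold: float = 0.05) -> str:
    """Map a polarity_scores() result to a coarse label using its compound score."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dicts copied from the slides that follow.
print(label_sentiment({'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}))
print(label_sentiment({'neg': 0.379, 'neu': 0.621, 'pos': 0.0, 'compound': -0.0572}))
```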
SENTIMENT ANALYSIS
“You’re my queen.”
{'neg': 0.0,
'neu': 1.0,
'pos': 0.0,
'compound': 0.0}
SENTIMENT ANALYSIS
“I don’t want it.”
{'neg': 0.379,
'neu': 0.621,
'pos': 0.0,
'compound': -0.0572}
SENTIMENT ANALYSIS
“Steph Curry is a FRAUD.”
{'neg': 0.602,
'neu': 0.398,
'pos': 0.0,
'compound': -0.6739}
SENTIMENT ANALYSIS
BUT let’s also explore its limitations
SENTIMENT ANALYSIS
“Steph Curry got outscored by Fred VanVleet & Kyle Lowry at home in an elimination game in the NBA Finals.”
{'neg': 0.0,
'neu': 1.0,
'pos': 0.0,
'compound': 0.0}
SENTIMENT ANALYSIS
“I love Jaime Lannister.”
{'neg': 0.0,
'neu': 0.323,
'pos': 0.677,
'compound': 0.6369}
SENTIMENT ANALYSIS
“I love that Jaime Lannister got what he deserved.”
{'neg': 0.0,
'neu': 0.625,
'pos': 0.375,
'compound': 0.6369}
With TOPIC CLUSTERING we can tackle the second block of questions:
What were most people saying (i.e. what were the trends in conversation)?
TOPIC CLUSTERING
BERT:
- Bidirectional Encoder Representations from Transformers
- Generates Word Embeddings
Word Embeddings
- Multi-dimensional numerical representations of the context in which a word is found
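Because embeddings are vectors, similar contexts can be found by comparing directions. Cosine similarity is the usual measure. The sketch below uses hypothetical 3-dimensional vectors for illustration. Real BERT-base embeddings have 768 dimensions, but the computation is identical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings: two basketball words and one Game of Thrones word.
dunk = [0.9, 0.1, 0.3]
layup = [0.8, 0.2, 0.4]
dragon = [0.1, 0.9, 0.2]
print(cosine_similarity(dunk, layup) > cosine_similarity(dunk, dragon))
```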
TOPIC CLUSTERING
768-DIMENSIONAL REPRESENTATIONS
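BERT produces one vector per token, so a tweet of any length needs to be reduced to a single fixed-size vector before clustering. According to its README, bert-as-service averages token vectors by default (its REDUCE_MEAN pooling strategy). The sketch below shows mean pooling on hypothetical inputs. The helper name `mean_pool` and the token values are illustrative only.

```python
def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size sentence vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]


# Hypothetical: 5 tokens, each a 768-dimensional vector (BERT-base hidden size).
tokens = [[float(t)] * 768 for t in range(5)]
sentence_vector = mean_pool(tokens)
print(len(sentence_vector), sentence_vector[0])
```

Whatever the tweet length, the output has 768 values, which is what lets tweets be compared and clustered directly.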
TOPIC CLUSTERING
GitHub: hanxiao/bert-as-service
docker build -t bert-as-service -f ./docker/Dockerfile .
NUM_WORKER=1
PATH_MODEL=/PATH_TO/_YOUR_MODEL/
docker run --runtime nvidia -dit -p 5555:5555 -p 5556:5556 \
    -v $PATH_MODEL:/model -t bert-as-service $NUM_WORKER
TOPIC CLUSTERING
GitHub: hanxiao/bert-as-service
pip install bert-serving-client
from bert_serving.client import BertClient
# Connects to the server started above (localhost, ports 5555/5556 by default)
bc = BertClient()
bc.encode(['First do it', 'then do it right', 'then do it better'])
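The deck does not say which clustering algorithm grouped the encoded tweets. k-means is a common choice for embedding vectors, so the sketch below assumes it. This is a plain-Python illustration with hypothetical 2-dimensional points. With bert-as-service you would pass `bc.encode(tweets).tolist()` (one 768-dimensional row per tweet) and pick `k` yourself.

```python
import math
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: return a cluster index for each input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical points standing in for tweet embeddings: two obvious topics.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = kmeans(points, k=2)
print(labels)
```

Cluster indices are arbitrary. To read a cluster as a "trend in conversation", inspect the tweets assigned to it.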
TOPIC CLUSTERING
Now, streaming…
[Diagram: UNBOUNDED DATA → STREAMING, processed WINDOW by WINDOW; FINITE DATA → BATCH PROCESSING]
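A fixed window cuts an unbounded stream into non-overlapping intervals, and each event lands in exactly one of them based on its timestamp. The sketch below mimics how Beam's `FixedWindows(60)` assigns an element to `[start, start + 60)`. It ignores watermarks and late data, and the helper names and events are hypothetical.

```python
from collections import defaultdict


def fixed_window(timestamp: float, size: float = 60.0):
    """Return the [start, end) fixed window containing a timestamp in seconds."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def group_by_window(events, size=60.0):
    """Bucket (timestamp, value) pairs into non-overlapping windows of `size` seconds."""
    windows = defaultdict(list)
    for timestamp, value in events:
        windows[fixed_window(timestamp, size)].append(value)
    return dict(windows)


# Hypothetical events: (seconds since start of game, what happened).
events = [(0, "tip-off"), (59, "dunk"), (60, "timeout"), (125, "buzzer")]
print(group_by_window(events))
```

Aggregates such as an average sentiment are then computed per window, so each minute of the event gets its own row.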
Spark
Apache Beam + Dataflow
[Diagram: MESSAGES (WITH TIMESTAMP)]
[Architecture diagram: KUBERNETES CLUSTER (TWEETS VIA TWEEPY) → CLOUD PUB/SUB → DATAFLOW: WINDOW → SENTIMENT ANALYSIS → AGGREGATE STATISTICS → WRITE TO BIGQUERY]
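import base64

# Assumes `client` = googleapiclient.discovery.build('pubsub', 'v1'),
# `pubsub_topic` = 'projects/<project>/topics/<topic>', `data_lines` holds str,
# and NUM_RETRIES is an int defined elsewhere.
messages = []
for line in data_lines:
    # The Pub/Sub JSON API carries message data as base64 text: str -> bytes -> base64 -> str.
    pub = base64.urlsafe_b64encode(line.encode('utf-8')).decode('ascii')
    messages.append({'data': pub})
body = {'messages': messages}
resp = client.projects().topics().publish(
    topic=pubsub_topic, body=body).execute(
    num_retries=NUM_RETRIES)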
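The pipeline on the next slide reads event time from a `created_at` message attribute (`timestamp_attribute='created_at'`), but the publishing loop above only sets `data`. Apache Beam expects that attribute to be either epoch milliseconds or an RFC 3339 string. The sketch below is one way to attach it. It assumes tweets arrive as Twitter v1.1 JSON, whose `created_at` looks like `Wed Oct 10 20:19:24 +0000 2018`. The helper names are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone


def to_rfc3339(twitter_created_at: str) -> str:
    """Convert Twitter v1.1 'created_at' (e.g. 'Wed Oct 10 20:19:24 +0000 2018') to RFC 3339 UTC."""
    parsed = datetime.strptime(twitter_created_at, "%a %b %d %H:%M:%S %z %Y")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_message(tweet: dict) -> dict:
    """Build one Pub/Sub REST message whose 'created_at' attribute carries the event time."""
    data = base64.urlsafe_b64encode(json.dumps(tweet).encode("utf-8")).decode("ascii")
    return {"data": data, "attributes": {"created_at": to_rfc3339(tweet["created_at"])}}


# Hypothetical tweet payload.
tweet = {"text": "What a finish!", "created_at": "Wed Oct 10 20:19:24 +0000 2018"}
print(build_message(tweet)["attributes"])
```

Windowing on the tweet's own timestamp, rather than on publish time, keeps a burst of delayed messages from being counted in the wrong minute.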
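import apache_beam as beam
from apache_beam import CombinePerKey, FlatMap, Map, Pipeline, WindowInto
from apache_beam.io import ReadFromPubSub, WriteToBigQuery
from apache_beam.transforms import window

# `options` must enable streaming (options.view_as(StandardOptions).streaming = True).
# emit_values, entity_map, EntityScoreCombine, AddWindowTimestampFn, format_for_write,
# PUBSUB_TOPIC, BQ_DATASET and PROJECT_ID are not shown on the slide.
with Pipeline(options=options) as p:
    results = (p
               # created_at must be epoch milliseconds or an RFC 3339 string
               | 'read_from_topic' >> ReadFromPubSub(topic=PUBSUB_TOPIC,
                                                     with_attributes=False,
                                                     timestamp_attribute='created_at')
               # 60-second fixed windows keyed on each message's created_at time
               | 'Window' >> WindowInto(window.FixedWindows(60))
               | 'Emit_needed_values' >> FlatMap(emit_values, entity_map)
               | 'Combine' >> CombinePerKey(EntityScoreCombine())
               | 'Add Window Timestamp' >> beam.ParDo(AddWindowTimestampFn())
               | 'FormatForWrite' >> Map(format_for_write)
               | 'Write' >> WriteToBigQuery('streaming_scores',
                                            dataset=BQ_DATASET,
                                            project=PROJECT_ID,
                                            create_disposition='CREATE_IF_NEEDED',
                                            write_disposition='WRITE_APPEND',
                                            batch_size=10)
               )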
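The pipeline calls `FlatMap(emit_values, entity_map)` and `CombinePerKey(EntityScoreCombine())`, but neither helper appears in the deck. The sketch below is a hypothetical reconstruction of what such steps could do, not the author's code. `emit_values` yields an `(entity, score)` pair for every tracked entity a tweet mentions. `MeanScoreCombineFn` has the four methods of `apache_beam.CombineFn` and returns a mean score and count per entity. To use them in Beam, subclass `apache_beam.CombineFn` and pass the scorer as an extra argument, e.g. `FlatMap(emit_values, entity_map, score_fn)`. The entity map, messages, and `fake_score` are illustrative stand-ins.

```python
import json


def emit_values(message: bytes, entity_map: dict, score_fn):
    """Yield (entity, score) for each tracked entity that a tweet mentions."""
    text = json.loads(message.decode("utf-8"))["text"]
    score = score_fn(text)
    lowered = text.lower()
    for entity, aliases in entity_map.items():
        # Naive substring matching on aliases; real code would tokenize or use NER.
        if any(alias in lowered for alias in aliases):
            yield entity, score


class MeanScoreCombineFn:
    """Mirrors apache_beam.CombineFn's methods: mean score and count per key."""

    def create_accumulator(self):
        return 0.0, 0  # (running sum of scores, number of scores)

    def add_input(self, accumulator, score):
        total, count = accumulator
        return total + score, count + 1

    def merge_accumulators(self, accumulators):
        total, count = 0.0, 0
        for partial_total, partial_count in accumulators:
            total += partial_total
            count += partial_count
        return total, count

    def extract_output(self, accumulator):
        total, count = accumulator
        return {"mean_score": total / count if count else 0.0, "count": count}


def fake_score(text):
    # Stand-in scorer for this example; the real pipeline would use a sentiment model.
    return -0.5 if "fraud" in text.lower() else 0.5


# Hypothetical entity map and messages; the loop imitates CombinePerKey within one window.
entity_map = {"curry": ["curry", "steph"], "lowry": ["lowry"]}
messages = [json.dumps({"text": t}).encode("utf-8")
            for t in ["Steph Curry is a FRAUD.", "Lowry and Curry both showed up", "Lowry!"]]
combine = MeanScoreCombineFn()
accumulators = {}
for message in messages:
    for entity, score in emit_values(message, entity_map, fake_score):
        accumulators[entity] = combine.add_input(
            accumulators.get(entity, combine.create_accumulator()), score)
results = {entity: combine.extract_output(acc) for entity, acc in accumulators.items()}
print(results)
```

Keeping a (sum, count) accumulator rather than a running mean is what makes `merge_accumulators` correct when Beam combines partial results from different workers.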
THANKS!
Any questions?
You can find me at:
@andrewlarimer
andrew.larimer@springml.com
NLP: Streaming Sentiment Analysis of Live Events