NLP in 10 lines of code
Andraž Hribernik
AGENDA
1. NLP analysis of Pride & Prejudice
○ Introduction to the spaCy API
○ Extract characters and visualize them relative to their position in the book
○ Extract adjectives that describe a character in the book
2. How we use spaCy at Cytora
Pride & Prejudice by Jane Austen
What is the book about?
○ 5 unmarried Bennet daughters
○ 2 young, wealthy gentlemen (Mr Bingley & Mr
Darcy) move into their neighbourhood
○ The two eldest Bennet daughters (Jane & Elizabeth)
become involved with these gentlemen
Recreate the plot in 10 lines of code!
1. Parse text
2. Extract named entities
3. Keep only personal named entities
4. Get offset for every extracted entity
5. Plot the graph
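1. Parse text
import spacy
# The 'en' shortcut from the talk was removed in spaCy v3; load a named English
# pipeline instead (install it with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
with open('pride_and_prejudice.txt', encoding='utf-8') as f:
    text = f.read()
processed_text = nlp(text)  # a Doc: tokens, sentences, entities, dependency parse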
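The slide parses the whole novel in a single `nlp(text)` call. spaCy refuses texts longer than `nlp.max_length` (1,000,000 characters by default), so for longer books one option is to split the text first and parse the pieces separately, for example with `nlp.pipe(chunks)`. The helper below is a hypothetical sketch of that splitting step in plain Python; its name and the assumption that paragraphs are separated by blank lines are illustrative, not part of the talk.

```python
def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    Hypothetical helper: a single paragraph longer than max_chars is kept
    whole rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Try appending this paragraph (with its separator) to the current chunk
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Current chunk is full: emit it and start a new one
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("aaa\n\nbbb\n\nccc", max_chars=8))  # ['aaa\n\nbbb', 'ccc']
```

Note that every `Doc` produced from a chunk starts its token offsets at zero, so offsets like `ent.start` in the later steps would need shifting by the token count of the preceding chunks.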
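2. Extract named entities
import spacy
nlp = spacy.load('en_core_web_sm')  # 'en' in spaCy v1/v2
with open('pride_and_prejudice.txt', encoding='utf-8') as f:
    text = f.read()
processed_text = nlp(text)
# Print the text and label of the first seven entities found in the book
for ent in processed_text.ents[:7]:
    print(ent.text, ent.label_)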
Output:
The Project Gutenberg EBook of ORG
Jane Austen PERSON
the Project Gutenberg License ORG
www.gutenberg.org FAC
Pride ORG
Jane Austen PERSON
August 26, 2008 DATE
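Because `ent.label_` is a plain string, ordinary Python tools work on the results. As a small sketch, `collections.Counter` can tally how often each label occurs; here the `(ent.text, ent.label_)` pairs are copied from the output above so the snippet runs without a spaCy model. With a live `Doc` the equivalent would be `Counter(ent.label_ for ent in processed_text.ents)`.

```python
from collections import Counter

# (ent.text, ent.label_) pairs copied from the output above
entities = [
    ("The Project Gutenberg EBook of", "ORG"),
    ("Jane Austen", "PERSON"),
    ("the Project Gutenberg License", "ORG"),
    ("www.gutenberg.org", "FAC"),
    ("Pride", "ORG"),
    ("Jane Austen", "PERSON"),
    ("August 26, 2008", "DATE"),
]

# Count entities per label; ties keep first-seen order
label_counts = Counter(label for _, label in entities)
print(label_counts.most_common())
# [('ORG', 3), ('PERSON', 2), ('FAC', 1), ('DATE', 1)]
```

Several of these early hits (for example "Pride" tagged as ORG) come from the Project Gutenberg header rather than the novel itself.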
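3. Keep only personal named entities
import spacy
nlp = spacy.load('en_core_web_sm')  # 'en' in spaCy v1/v2
with open('pride_and_prejudice.txt', encoding='utf-8') as f:
    text = f.read()
processed_text = nlp(text)
# Of entities 300 to 309, print only those tagged as people
for ent in processed_text.ents[300:310]:
    if ent.label_ == 'PERSON':
        print(ent.text, ent.label_)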
Output:
Bingley PERSON
Elizabeth PERSON
Darcy PERSON
William Lucas PERSON
Darcy PERSON
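Once entities are filtered to `PERSON`, counting mentions per character is a natural next step. A minimal sketch, again using pairs copied from the output above in place of live spaCy spans; against a real `Doc` the same expression would read `Counter(ent.text for ent in processed_text.ents if ent.label_ == 'PERSON')`.

```python
from collections import Counter

# (ent.text, ent.label_) pairs copied from the output above
entities = [
    ("Bingley", "PERSON"),
    ("Elizabeth", "PERSON"),
    ("Darcy", "PERSON"),
    ("William Lucas", "PERSON"),
    ("Darcy", "PERSON"),
]

# Keep only people, then count mentions of each name
mentions = Counter(text for text, label in entities if label == "PERSON")
print(mentions.most_common(1))  # [('Darcy', 2)]
```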
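4. Get offset for every extracted entity
from collections import defaultdict
...
processed_text = nlp(text)
# Map each character's name to the token index where each mention starts
character_offsets = defaultdict(list)
for ent in processed_text.ents:
    if ent.label_ == 'PERSON':
        character_offsets[ent.text].append(ent.start)
print(character_offsets['Elizabeth'][:5])
print(character_offsets['Darcy'][:5])
# Indexing the Doc with an offset returns the token at that position
print(processed_text[1422])
print(processed_text[3229])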
Output:
[1422, 3670, 3759, 3867, 4532]
[3005, 3229, 3367, 3410, 3754]
Elizabeth
Darcy
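`ent.start` is a token index, so dividing it by the total number of tokens (`len(processed_text)`) gives how far through the book each mention falls, which is the "position in the book" the agenda refers to. A minimal sketch with a hypothetical `relative_positions` helper; the token count in the usage line is a made-up round number, not the real length of the novel.

```python
def relative_positions(offsets: list[int], n_tokens: int) -> list[float]:
    """Map token offsets (e.g. ent.start) to fractions of the book in [0, 1)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return [offset / n_tokens for offset in offsets]


# First Elizabeth offsets from the output above; 10_000 is a hypothetical
# stand-in for len(processed_text)
print(relative_positions([1422, 3670, 3759], n_tokens=10_000))
```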
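5. Plot the graph
from collections import defaultdict
import spacy
nlp = spacy.load('en_core_web_sm')  # 'en' in spaCy v1/v2
with open('pride_and_prejudice.txt', encoding='utf-8') as f:
    text = f.read()
processed_text = nlp(text)
character_offsets = defaultdict(list)
for ent in processed_text.ents:
    if ent.label_ == 'PERSON':
        # Lowercase explicitly: spaCy v3 keeps proper-noun lemmas capitalized
        character_offsets[ent.lemma_.lower()].append(ent.start)
# plot_character_timeseries is a plotting helper not shown on the slides
plot_character_timeseries(character_offsets, ['darcy', 'bingley'])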
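`plot_character_timeseries` is not defined on the slides. One plausible way to prepare its data, sketched here with a hypothetical `character_timeseries` helper (the name, signature and equal-width binning are assumptions, not the talk's code), is to count each character's mentions in equal slices of the book; the resulting lists could then be drawn as lines with any plotting library.

```python
from collections import defaultdict


def character_timeseries(
    character_offsets: dict[str, list[int]],
    names: list[str],
    n_tokens: int,
    n_bins: int = 10,
) -> dict[str, list[int]]:
    """Count mentions of each name in n_bins equal slices of the book.

    Hypothetical helper: offsets are token indices such as ent.start,
    and n_tokens would be len(processed_text).
    """
    series = {}
    for name in names:
        counts = [0] * n_bins
        for offset in character_offsets.get(name, []):
            # Which slice of the book this token index falls into
            bin_index = min(offset * n_bins // n_tokens, n_bins - 1)
            counts[bin_index] += 1
        series[name] = counts
    return series


# Toy offsets (illustrative only) for a 100-token "book" split into 4 slices
offsets = defaultdict(list)
offsets["darcy"] += [5, 12, 95]
offsets["bingley"] += [3, 40]
print(character_timeseries(offsets, ["darcy", "bingley"], n_tokens=100, n_bins=4))
# {'darcy': [2, 0, 0, 1], 'bingley': [1, 1, 0, 0]}
```

Binning trades positional detail for readability: with a few hundred mentions per character, a few dozen bins usually show where a character enters and leaves the story more clearly than raw offsets.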
Demo
Describe Mr Darcy
● Automatically describe Mr Darcy (e.g. silent, tall, young, etc.)
● We can solve this problem using the syntactic dependencies exposed by the
spaCy API
● Syntactic dependencies can be visualized nicely with displaCy
[displaCy figure: adjective modifier (amod)]
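Extract all ‘amod’ dependencies in each entity's subtree
darcy_adjectives = []
# Lowercase the lemma: spaCy v3 keeps proper-noun lemmas capitalized
darcy_ents = [ent for ent in processed_text.ents if ent.lemma_.lower() == 'darcy']
for ent in darcy_ents:
    # ent.subtree yields the entity's tokens and all their syntactic descendants
    for token in ent.subtree:
        if token.dep_ == 'amod':  # adjectival modifier, e.g. "silent" in "silent Darcy"
            darcy_adjectives.append(token.lemma_)
print(set(darcy_adjectives))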
Output:
{'handsome', 'last', 'grave', 'silent',
'particular', 'young', 'poor',
'abominable', 'disappointing',
'disagreeable', 'confidential', 'late',
'little', 'charming', 'present',
'intimate'}
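To show what `ent.subtree` and `token.dep_` are doing without loading a model, here is a toy dependency tree in plain Python. The `Token` class, the sentence and its hand-assigned heads and labels are illustrative assumptions (spaCy's parser builds this structure for you); `subtree` mirrors spaCy's notion of a token plus all of its syntactic descendants.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label, like spaCy's token.dep_
    head: int  # index of the syntactic head (the root points to itself)


def subtree(tokens: list[Token], i: int) -> list[int]:
    """Return index i plus the indices of all its descendants, in sentence order."""
    result = {i}
    changed = True
    while changed:
        changed = False
        for j, tok in enumerate(tokens):
            # Add any token whose head is already in the subtree
            if j not in result and tok.head in result:
                result.add(j)
                changed = True
    return sorted(result)


# Hand-labelled toy parse of "tall silent Darcy entered" (not real parser output)
sentence = [
    Token("tall", "amod", 2),
    Token("silent", "amod", 2),
    Token("Darcy", "nsubj", 3),
    Token("entered", "ROOT", 3),
]

darcy_index = 2  # the "entity" being described
adjectives = [
    sentence[j].text
    for j in subtree(sentence, darcy_index)
    if sentence[j].dep == "amod"
]
print(adjectives)  # ['tall', 'silent']
```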
[displaCy figure: adjective complement (acomp), noun subject (nsubj)]
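Extract all ‘acomp’ children of the verb whose subject is the entity
darcy_adjectives = []  # start a fresh list for this pattern
for ent in darcy_ents:
    # Only mentions that are the subject of their clause, e.g. "Darcy is proud"
    if ent.root.dep_ == 'nsubj':
        # The subject's head is the verb; its 'acomp' children are predicate adjectives
        for child in ent.root.head.children:
            if child.dep_ == 'acomp':
                darcy_adjectives.append(child.lemma_)
print(set(darcy_adjectives))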
Output:
{'kind', 'ashamed', 'impatient',
'answerable', 'sorry', 'unworthy',
'grow', 'fond', 'proud', 'engaged',
'little', 'clever', 'worth', 'tall',
'studious', 'punctual'}
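The same toy representation can illustrate the subject/complement pattern. In this sketch (the `Token` class, `predicate_adjectives` helper and both hand-labelled sentences are illustrative assumptions), an entity only contributes adjectives when it is the nominal subject of its clause, and the adjectives are the `acomp` children of the subject's head verb.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label, like spaCy's token.dep_
    head: int  # index of the syntactic head (the root points to itself)


def predicate_adjectives(tokens: list[Token], i: int) -> list[str]:
    """If token i is a nominal subject, return the 'acomp' children of its verb."""
    if tokens[i].dep != "nsubj":
        return []
    verb = tokens[i].head
    return [
        tok.text
        for j, tok in enumerate(tokens)
        if tok.head == verb and j != verb and tok.dep == "acomp"
    ]


# Hand-labelled toy parses (illustrative, not real parser output)
darcy_is_proud = [
    Token("Darcy", "nsubj", 1),
    Token("is", "ROOT", 1),
    Token("proud", "acomp", 1),
]
elizabeth_is_fond_of_darcy = [
    Token("Elizabeth", "nsubj", 1),
    Token("is", "ROOT", 1),
    Token("fond", "acomp", 1),
    Token("of", "prep", 2),
    Token("Darcy", "pobj", 3),
]

print(predicate_adjectives(darcy_is_proud, 0))              # ['proud']
print(predicate_adjectives(elizabeth_is_fond_of_darcy, 4))  # []
```

The second sentence shows the trade-off of the `nsubj` guard: it correctly ignores an adjective that describes Elizabeth, but it also skips every sentence where Darcy appears as an object, which is plausibly one reason for the poor recall listed among the cons.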
Pros & cons of the syntactic dependencies approach
Pros:
● No training dataset is needed
● Intuitive
● In our experience, it achieves decent extraction
precision
Cons:
● Our approach achieved very
poor recall
● spaCy dependency parsing only works
within a single sentence
What is our mission at Cytora?
spaCy at Cytora
● We process 2M documents every day with spaCy
● Named entity recognition (geolocations, actors)
● Dependency parsing (impact metric extraction)
● Integrated word embeddings (preprocessing for DL models)
Cytora is hiring!
● Data Engineer
● Data Science Analyst
● Risk Modeler
All open positions
Thank you!
https://github.com/cytora/pycon-nlp-in-10-lines
https://spacy.io/
https://demos.explosion.ai/displacy/
http://www.cytora.com/
andraz@cytora.com