SlideShare ist ein Scribd-Unternehmen logo
Adding value to NLP:
a little semantics goes a long way
Dr. Diana Maynard
University of Sheffield, UK
15 years ago in NLP, we were…
• Using ontologies for adding reasoning/additional information
to text
• Developing linguistic tools for conceptual modelling
• Semantic enrichment for ontology mapping
• Developing ways to get from text to ontologies and back
• Trying to persuade the Semantic Web community that NLP was
important
• Paul Buitelaar, Daniel Olejnik, Michael Sintek: A Protégé Plug-In for
Ontology Extraction from Text Based on Linguistic Analysis
• Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yorick Wilks:
Learning to Harvest Information for the Semantic Web
• Sanghee Kim, Paul H. Lewis, Kirk Martinez, Simon Goodall:
Question Answering Towards Automatic Augmentations of
Ontology Instances
• Naoki Sugiura, Yoshihiro Shigeta, Naoki Fukuta, Noriaki
Izumi, Takahira Yamaguchi: Towards On-the-Fly Ontology
Construction - Focusing on Ontology Quality Improvement
• John Domingue, Martin Dzbor, Enrico Motta:
Collaborative Semantic Web Browsing with Magpie
NLP at ESWS 2004
• Finding “hidden” semantic links between concepts using
gloss information from dictionaries
• (We want to know that Henry Ford makes cars)
• These days we could just check Wikipedia J
Mapping semantic relations
Merging
descriptions of
plants in Multiflora
Semantic annotation 15 years ago at Ontotext
And at the OU…
• The techniques may have changed a bit: we now have
• lots of lovely big data to play with
• open source knowledge bases
• powerful GPUs (crack cocaine for NLP students)
• deep learning (all the cool kidz J )
• But there are a number of fields where people still haven’t even
heard of the word “ontology” (or at least, they don’t know what it
means)
Many of the underlying technologies are still relevant
How to impress a “freedom of the media” researcher
“What is this
wizardry?”
The problem: too much unstructured information
Feral databases
On 26 June 1996, while driving her red Opel Calibra, Guerin stopped at a red
traffic light on the Naas Dual Carriageway near Newlands Cross, on the outskirts
of Dublin, unaware she was being followed. She was shot six times, by one of two
men sitting on a motorcycle.
The solution: categorise it and stick it in a spreadsheet
“Everybody loves spreadsheets!”
How many journalists died in 1996?
That’s great, we have lots of NLP tools to do that!
Name Date Location
City
Location
Country
Event type Type of
death
Guerin 26061996 Dublin Ireland Killing Shooting
Woe betide anyone who opens the sacred spreadsheet!
Mass panic
Let’s start a new spreadsheet!
Aaaaggghhhh!!!
Harry Potter and the Spreadsheet of Doom
Once the latest in calendar technology…
• Violations against journalists
• Disaster relief
• Scientometrics
• The B-word
Case studies: how can NLP and semantics help?
Global monitoring of violations against journalists
• The situation: cases of killing, kidnapping, enforced disappearance,
arbitrary detention and torture of journalists, associated media
personnel, often with impunity
• The task: UN SDG agenda (indicator 16.10.1): we need a
comprehensive monitoring system capturing the scope & nature of
violations (lethal & non-lethal)
• The problem: the data is a horrible mess!
• accessibility of reliable information on violations
• gaps in the data beyond killings; cannot compile data on a
wide range of violations; practical challenges of collection
• systematisation and compiling of information
• conceptual inconsistency; much information is
unstructured
Events are complex
The family of murdered Maltese anti-corruption journalist
Daphne Caruana Galizia is demanding an independent
public inquiry because she had suffered years of
intimidation.
She was killed by a car bomb near her home in October.
Her widely-read blog accused top politicians of corruption.
One of her sons, Paul, said three pet dogs were killed and
attempts were made to burn down the journalist's home.
• Several separate events, but
• are they related?
• was it the same perpetrator(s)?
• Traditional methods use a person-centric approach with no
relation between events
But we know how to understand complex events
SemEval 2018 Task on Counting Events and Participants in the Long Tail
Information categories in current monitoring efforts
ICCS Crime Classification Scheme
HURIDOCS classification scheme
Putting it all together
Reconciling information from different sources
Relation Example Issue Solution Same
Event?
Exact match Location: London
Location: London
Identical facts Two events can
be merged
Very likely
Equivalent
match
Type: murder
Type: assassination
Semantically
equivalent facts
Two events can
be merged
Very likely
No-conflict Location: London
Date: 2019
Different but
semantically
compatible facts
Facts from two
events can be
merged
Probably
Specificity
Conflict
Location: London
Location: UK
One fact is more
specific than the
other, but both are
semantically
compatible
Facts can be
merged at either
specific or
generic level
Probably
Direct
conflict
Location: London
Location: Paris
Facts conflict
semantically: both
cannot be correct
More info
needed to
resolve conflict
Unlikely
Reconciling our information
Name Date City Country Event type Type of
death
Guerin 26061996 Dublin Ireland Killing Shooting
Name Date Organisation Location Event type
Veronica Guerin 26 June
1996
Sunday Independent Ireland Murder
Name Date Organisation Location Event type Sources
Veronica
Guerin
26061996 Sunday
Independent
Dublin,
Ireland
Shooting:
deliberate, fatal
2
Automatic categorization of free text
Entities and events are extracted,
along with features, and linked to
other knowledge sources, e.g.
Wikipedia, other categorization
schemes
Social Media and Disaster Relief
• Hurricane Sandy in the US:
• 1.1 million tweets in the first day; over 20 million in total
• > 800K photos with #Sandy hashtag on Instagram
• Haze in Singapore: > 23 million
• Nepal earthquake in 2015: more than half a million posts
Aid workers in Nepal discussing strategy
Tools to help disaster
victims get aid quickly:
pinpointing geographic
locations
28
• Many NGOs are not local to the
disaster area and may not have
a good grasp of the geography
• Place names are ambiguous
• Find mentions of locations in
the text, match them to a
knowledge base, and plot them
on a map
Adding semantic information (from BabelNet) improves cross-lingual
crisis event classification
CREES Google Sheets Add-on
Semantic Technologies in Scientometrics
Opportunities:
• Ability to link different kinds of data sources to provide a
richer view of knowledge production in Europe
Challenges
• Need for a robust approach to identify and model
relevant topics
• Language (connect different kinds of data due to
terminology differences)
• Commensurability (cannot connect different kinds of
classifications)
• Flexibility (model changes over time and space)
What kind of questions do we want to answer?
• Which country published most
about waste management and
recycling in 2014?
• What happens when you look only
at the top 10% most cited?
• What kind of international
collaborations do we see?
• What about patents?
The problem:
• topics for different document types don’t match –
different classification systems
• and they don’t correlate with EU policies
Which countries published most about waste
management and recycling in 2014?
Top 10% cited
All publications
Germany Italy
UK
France Spain
Sweden
Patents
Publications
Italy
France
Denmark
Belgium
Netherlands
Spain
Germany
UK
• A composite indicator combining publications, patents and
projects shows that:
• the volume of knowledge production is highly concentrated
in large metropolitan regions, e.g. Paris, London, Munich
• some medium-sized regions are highly productive in terms of
intensity (normalised by population), e.g. Eindhoven and
Heidelberg
• some smaller areas have high volume and intensity, e.g.
Oxfordshire
• Eastern Europe shows low volume and intensity, except major
cities, but all have low intensity (except Ljubljana)
How is European knowledge distributed across regions?
• Technological production is measured by patents
• Scientific production is measured by publications
• These 2 types show different geographical distributions:
technological are more concentrated in space
• In terms of volume, Paris is the biggest cluster for both types
• Within regions, production varies a lot: London is the biggest
producer of both types, while Eindhoven is key in terms of
technological knowledge (both for volume and intensity)
• These findings reflect the different structure of public and private
knowledge
Technological vs scientific knowledge production in genomics
Specialisation Indexes in Biotechnology around Europe
The Semantic Approach
Perspectives on
CO2 capture and
storage
Filipp Johnsson
Published 14-04-11
SC5-20-2014
H2020
Zero Emission
Robot-Boat for
Coastal and Inland
Water Monitoring
What is the innovation
performance of France
on climate change
compared with
Germany?
Policy Ontology Data
6687 2007 0
LED module with
gold bonding.
Processes or
apparatus
specially adapted
for the
manufacture or
treatment of
semiconductor
In a nutshell:
• We need to know which topics each document is talking about
(multi-class classification)
• But we have to connect these topics together coherently
Ontologies connect information
Find more information
about the topicLink related topics
Link with other sources
(Nature.com, skos,
DBpedia…)
From ontology to data
1. Create ontology of topics representing KET and SGC
• From existing classifications, policy documents, expert users,
and data
2. Automatically generate collections of keywords
• NLP techniques (term extraction, word embeddings) from large
training dataset
• Ranking and scoring algorithms to decide:
• Which topic(s) to match the keywords to?
• Which are the best keywords?
• Which are the best keyword combinations?
3. For each document, decide which topics best fit it
• based on keywords and scoring algorithms
energy storage
storage of energy
accumulator
hydraulic accumulator
capacitor
Creating and
populating the
ontology
1. Create ontology
structure (classes &
subclasses)
2. Add extra information
(descriptions, links,
alternate class names)
3. Ontology population:
generate lists of terms
associated with each
class
SGC Topics and SubTopics
Linking information from external sources
Link to more
information
Ontology population
Sustainable development of urban areas is a challenge of key
importance. It requires new, efficient, and user-friendly technologies and
services, in particular in the areas of energy, transport and ICT. However,
these solutions need integrated approaches, both in terms of research and
development of advanced technological solutions, as well as deployment.
The focus on smart cities technologies will result in commercial-scale
solutions with a high market potential.
1. Automatically generate keywords from class names,
descriptions, and related information (e.g. DBpedia, skos,
etc.) using term recognition tools
2. Enrich using word embeddings
3. Score the keywords according to how representative they
are of that class
4. Generate prior probabilities using PMI for term
combinations, based on frequency of co-occurrence
• Data sources are annotated against the ontologies
• each document is associated with one or more topics
• Sophisticated NLP matching and scoring of terms in the
documents with ontology
• A REST service accepts documents, scores and classifies them
according to the ontology, and returns classification and keyword
information
• Several million documents can be processed in about a week
(using around 12 threads)
• Annotated data sources are then used to build indicators
• e.g. for each topic, how many publications and in which
region?
Annotating Data with Ontologies
{"classification":
“http://www.gate.ac.uk/ns/ontologies/knowmak/antibiotics":
{ "boostedBy":
"http://www.gate.ac.uk/ns/ontologies/knowmak/antimicrobials",
"keywords": {
"antibiotics": {
"kinds": [ "generated", "preferred" ],
"score": 1.1527377521613833
},
"bacteria": {
"kinds": ["generated"],
"score": 0.5763688760806917
...
},
"score": [ 4.322766570605188, 4.4159785333 ],
"topicID": "38",
"unboostedScore": [ 2.5936599423631126, 3.75354899915 ],
},
Example of patent annotation
Protein stabilized pharmacologically active agents, methods for the
preparation thereof and methods for the use thereof
In accordance with the present invention, there are provided compositions and
methods useful for the in vivo delivery of substantially water-insoluble
pharmacologically active agents (such as the anti-cancer drug paclitaxel) in which
the pharmacologically active agent is delivered in the form of suspended particles
coated with protein (which acts as a stabilizing agent)…..
• RNA vaccines: (agent, protein, vaccine)
• anti-viral agents: (protein, anti-cancer, drug)
• protein vaccines: (protein, vaccine, antimicrobial)
KET: Industrial biotechnology
SGC: Health
Ongoing Challenges
Inconsistencies
• ontology design has to be tailored to user needs, but these are
not uniform
Automation
• keyword-based approach still requires some manual
intervention for best results
Accuracy
• language processing is never 100% accurate
Evaluation
• how do we know if/when it’s good enough?
• Determine weighting mechanisms; cut-off thresholds…
The future?
• integration of existing classification and modelling approaches
with our semantics
• Built on top of annotated text in GATE
• can be used to index and search over text, annotations,
semantic metadata
• allow queries that arbitrarily mix full-text, structural,
linguistic and semantic annotations
• open source
• you can issue ridiculously complex queries
GATE MIMIR and Prospector
{
{Person gender=male sparql = "SELECT ?inst WHERE {
?inst :party
<http://dbpedia.org/resource/Labour_Party_%28UK%29> .
?inst :almaMater
<http://dbpedia.org/resource/University_of_Edinburgh> }"
}
[0..3] root:say
[0..5] {Money currency=“£” value > 2000000}
}
IN
{Document date > 20050000 topic=uk_health}
Prospecting Biomedical Literature
Choosing A Specific Instance
What diseases are in these documents?
What pathogens?
Disease vs Disease Co-ocurrences
Diseases vs Pathogens
• How do people in different
parts of the country talk
about elections and
political events?
• How do the MPs talk about
different topics?
• How does the public
respond to them?
• What kind of people send
hate speech, who to, and
what about?
Social media and politics
Hate speech against politicians Jan-Feb 2019
Annotation of tweets with simple DBpedia and NUTS linking
lets us index and query our data in interesting ways
Who gets the abuse?
• Click to edit Master text styles
• Second level
• Third level
• Fourth level
• Fifth level
20172015
Top abusive words
Words vs people
20192017
Topics of abuse per party
Local and national news coverage of Brexit
• Green/blue indicates areas where local coverage outweighed
national coverage
• Across the regions, Twitter remainers showed closer congruence
with local press than Twitter leavers
• We have lots of cool technology and data
• It doesn’t always need to be complex, it just needs to be
applied
• There are a number of areas where semantic
technologies are still really unknown
• New emerging domains of interest: journalism, politics,
law, humanities (literature, geography, history…)
• Don’t neglect interdisciplinary “borrowing”
• The usual disclaimers about accuracy – real-world
evaluation
So where are we at?
Acknowledgements
This work supported by the European Union/EU under the Information and
Communication Technologies (ICT) theme of the 7th Framework and H2020
Programmes for R&D:
• KNOWMAK (726992) http://knowmak.eu
• RISIS2 (82409) http://risis2.eu
• SoBigData (654024) http://www.sobigdata.eu
• WeVerify (825297) https://weverify.eu/
• COMRADES (687847) http://www.comrades-project.eu
• EPSRC grant EP/I004327/1 and British Academy under call “The Humanities
and Social Sciences: Tackling the UK’s International Challenges”.
• Nesta http://nesta.org.uk
• Free Press Unlimited pilot project: “Developing a database for the improved
collection and systematisation of information on incidents of violations
against journalists” https://www.freepressunlimited.org
And all the GATE team in Sheffield!

Weitere ähnliche Inhalte

Was ist angesagt?

Profiling Big Data sources to assess their selectivity
Profiling Big Data sources to assess their selectivityProfiling Big Data sources to assess their selectivity
Profiling Big Data sources to assess their selectivity
Piet J.H. Daas
 
Big Data @ CBS
Big Data @ CBSBig Data @ CBS
Big Data @ CBS
Piet J.H. Daas
 
Social media sentiment and consumer confidence
Social media sentiment and consumer confidenceSocial media sentiment and consumer confidence
Social media sentiment and consumer confidence
Piet J.H. Daas
 
Using OSINT in times of social unrest
Using OSINT in times of social unrestUsing OSINT in times of social unrest
Using OSINT in times of social unrest
Shani Wolf
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With Purpose
Tyrone Grandison
 
Sentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhySentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhyDavide Feltoni Gurini
 
Info leakage 200510
Info leakage 200510Info leakage 200510
Who’s in the Gang? Revealing Coordinating Communities in Social Media
Who’s in the Gang? Revealing Coordinating Communities in Social MediaWho’s in the Gang? Revealing Coordinating Communities in Social Media
Who’s in the Gang? Revealing Coordinating Communities in Social Media
Derek Weber
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...
Fredrik Olsson
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social Media
Diana Maynard
 
Building Data-centric Media Organizations
Building Data-centric Media OrganizationsBuilding Data-centric Media Organizations
Building Data-centric Media Organizations
J T "Tom" Johnson
 
Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals
Marco Brambilla
 
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Lewis Shepherd
 
Online Research SRC
Online Research SRCOnline Research SRC
Online Research SRCEmily Litle
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive Datasets
Marina Santini
 
Towards Explainable Fact Checking
Towards Explainable Fact CheckingTowards Explainable Fact Checking
Towards Explainable Fact Checking
Isabelle Augenstein
 
Explainability for NLP
Explainability for NLPExplainability for NLP
Explainability for NLP
Isabelle Augenstein
 
The Coming Explosion of Records at FamilySearch Syllabus
The Coming Explosion of Records at FamilySearch SyllabusThe Coming Explosion of Records at FamilySearch Syllabus
The Coming Explosion of Records at FamilySearch Syllabus
bakers84
 
Maps and data esri health care 2012
Maps and data   esri health care 2012Maps and data   esri health care 2012
Maps and data esri health care 2012
J T "Tom" Johnson
 
Lect6 technologies fall 2017
Lect6 technologies fall 2017Lect6 technologies fall 2017
Lect6 technologies fall 2017
SukaynaAmeen
 

Was ist angesagt? (20)

Profiling Big Data sources to assess their selectivity
Profiling Big Data sources to assess their selectivityProfiling Big Data sources to assess their selectivity
Profiling Big Data sources to assess their selectivity
 
Big Data @ CBS
Big Data @ CBSBig Data @ CBS
Big Data @ CBS
 
Social media sentiment and consumer confidence
Social media sentiment and consumer confidenceSocial media sentiment and consumer confidence
Social media sentiment and consumer confidence
 
Using OSINT in times of social unrest
Using OSINT in times of social unrestUsing OSINT in times of social unrest
Using OSINT in times of social unrest
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With Purpose
 
Sentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhySentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and Why
 
Info leakage 200510
Info leakage 200510Info leakage 200510
Info leakage 200510
 
Who’s in the Gang? Revealing Coordinating Communities in Social Media
Who’s in the Gang? Revealing Coordinating Communities in Social MediaWho’s in the Gang? Revealing Coordinating Communities in Social Media
Who’s in the Gang? Revealing Coordinating Communities in Social Media
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social Media
 
Building Data-centric Media Organizations
Building Data-centric Media OrganizationsBuilding Data-centric Media Organizations
Building Data-centric Media Organizations
 
Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals
 
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
Manichean Progress: Positive and Negative States of the Art in Web-Scale Data...
 
Online Research SRC
Online Research SRCOnline Research SRC
Online Research SRC
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive Datasets
 
Towards Explainable Fact Checking
Towards Explainable Fact CheckingTowards Explainable Fact Checking
Towards Explainable Fact Checking
 
Explainability for NLP
Explainability for NLPExplainability for NLP
Explainability for NLP
 
The Coming Explosion of Records at FamilySearch Syllabus
The Coming Explosion of Records at FamilySearch SyllabusThe Coming Explosion of Records at FamilySearch Syllabus
The Coming Explosion of Records at FamilySearch Syllabus
 
Maps and data esri health care 2012
Maps and data   esri health care 2012Maps and data   esri health care 2012
Maps and data esri health care 2012
 
Lect6 technologies fall 2017
Lect6 technologies fall 2017Lect6 technologies fall 2017
Lect6 technologies fall 2017
 

Ähnlich wie Adding value to NLP: a little semantics goes a long way

Big data and Digital Transformations in the Humanities
Big data and Digital Transformations in the HumanitiesBig data and Digital Transformations in the Humanities
Big data and Digital Transformations in the Humanities
Martin Wynne
 
F.Blin IFLA Trend Report English_dk
F.Blin IFLA Trend Report English_dkF.Blin IFLA Trend Report English_dk
F.Blin IFLA Trend Report English_dk
Frederic Blin
 
NLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de BarcelonaNLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de Barcelona
SergiPons5
 
V Rolfe - open education and innovation
V Rolfe - open education and innovationV Rolfe - open education and innovation
V Rolfe - open education and innovation
Vivien Rolfe
 
Open education projects - through the lens of innovation
Open education projects - through the lens of innovation Open education projects - through the lens of innovation
Open education projects - through the lens of innovation
BCcampus
 
Chris hankin tril_presentation_pdf.pptx
Chris hankin tril_presentation_pdf.pptxChris hankin tril_presentation_pdf.pptx
Chris hankin tril_presentation_pdf.pptx
Centre for Information Rights, University of Winchester, UK
 
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
Liliana Bounegru
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social Sciences
Chantal van Son
 
Artificial Intelligence For Investigative Reporting
Artificial Intelligence For Investigative ReportingArtificial Intelligence For Investigative Reporting
Artificial Intelligence For Investigative Reporting
Jennifer Strong
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Micah Altman
 
What the hack? – Fostering diversity through tinkering with algorithms and di...
What the hack? – Fostering diversity through tinkering with algorithms and di...What the hack? – Fostering diversity through tinkering with algorithms and di...
What the hack? – Fostering diversity through tinkering with algorithms and di...
Dan Verständig
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
Liliana Bounegru
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
Jonathan Gray
 
Introduction to Computational Social Science - Lecture 1
Introduction to Computational Social Science - Lecture 1Introduction to Computational Social Science - Lecture 1
Introduction to Computational Social Science - Lecture 1
Lauri Eloranta
 
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
Liliana Bounegru
 
Flew dmrc 11 sept 15
Flew dmrc 11 sept 15Flew dmrc 11 sept 15
Flew dmrc 11 sept 15
Terry Flew
 
Open source and journalism: Toward new frameworks for imagining news innovation
Open source and journalism: Toward new frameworks for imagining news innovationOpen source and journalism: Toward new frameworks for imagining news innovation
Open source and journalism: Toward new frameworks for imagining news innovation
Seth Lewis
 
Data Literacy as a Human Right.pdf
Data Literacy as a Human Right.pdfData Literacy as a Human Right.pdf
Data Literacy as a Human Right.pdf
li1smc
 
SoBigData - Exploring human mobility and migration with BigData @ NTTS2017
SoBigData - Exploring human mobility and migration with BigData @ NTTS2017SoBigData - Exploring human mobility and migration with BigData @ NTTS2017
SoBigData - Exploring human mobility and migration with BigData @ NTTS2017
Vittorio Romano
 
ODC BarCamp 2013 - Introduction to Data Journalism
ODC BarCamp 2013 - Introduction to Data JournalismODC BarCamp 2013 - Introduction to Data Journalism
ODC BarCamp 2013 - Introduction to Data Journalism
Open Development Cambodia
 

Ähnlich wie Adding value to NLP: a little semantics goes a long way (20)

Big data and Digital Transformations in the Humanities
Big data and Digital Transformations in the HumanitiesBig data and Digital Transformations in the Humanities
Big data and Digital Transformations in the Humanities
 
F.Blin IFLA Trend Report English_dk
F.Blin IFLA Trend Report English_dkF.Blin IFLA Trend Report English_dk
F.Blin IFLA Trend Report English_dk
 
NLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de BarcelonaNLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de Barcelona
 
V Rolfe - open education and innovation
V Rolfe - open education and innovationV Rolfe - open education and innovation
V Rolfe - open education and innovation
 
Open education projects - through the lens of innovation
Open education projects - through the lens of innovation Open education projects - through the lens of innovation
Open education projects - through the lens of innovation
 
Chris hankin tril_presentation_pdf.pptx
Chris hankin tril_presentation_pdf.pptxChris hankin tril_presentation_pdf.pptx
Chris hankin tril_presentation_pdf.pptx
 
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social Sciences
 
Artificial Intelligence For Investigative Reporting
Artificial Intelligence For Investigative ReportingArtificial Intelligence For Investigative Reporting
Artificial Intelligence For Investigative Reporting
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
What the hack? – Fostering diversity through tinkering with algorithms and di...
What the hack? – Fostering diversity through tinkering with algorithms and di...What the hack? – Fostering diversity through tinkering with algorithms and di...
What the hack? – Fostering diversity through tinkering with algorithms and di...
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
 
Introduction to Computational Social Science - Lecture 1
Introduction to Computational Social Science - Lecture 1Introduction to Computational Social Science - Lecture 1
Introduction to Computational Social Science - Lecture 1
 
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua...
 
Flew dmrc 11 sept 15
Flew dmrc 11 sept 15Flew dmrc 11 sept 15
Flew dmrc 11 sept 15
 
Open source and journalism: Toward new frameworks for imagining news innovation
Open source and journalism: Toward new frameworks for imagining news innovationOpen source and journalism: Toward new frameworks for imagining news innovation
Open source and journalism: Toward new frameworks for imagining news innovation
 
Data Literacy as a Human Right.pdf
Data Literacy as a Human Right.pdfData Literacy as a Human Right.pdf
Data Literacy as a Human Right.pdf
 
SoBigData - Exploring human mobility and migration with BigData @ NTTS2017
SoBigData - Exploring human mobility and migration with BigData @ NTTS2017SoBigData - Exploring human mobility and migration with BigData @ NTTS2017
SoBigData - Exploring human mobility and migration with BigData @ NTTS2017
 
ODC BarCamp 2013 - Introduction to Data Journalism
ODC BarCamp 2013 - Introduction to Data JournalismODC BarCamp 2013 - Introduction to Data Journalism
ODC BarCamp 2013 - Introduction to Data Journalism
 

Mehr von Diana Maynard

Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...
Diana Maynard
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferences
Diana Maynard
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutions
Diana Maynard
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Diana Maynard
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
Diana Maynard
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
Diana Maynard
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
Diana Maynard
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATE
Diana Maynard
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
Diana Maynard
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Diana Maynard
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Diana Maynard
 
Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?
Diana Maynard
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged Sword
Diana Maynard
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Diana Maynard
 
Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social media
Diana Maynard
 
What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...
Diana Maynard
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysis
Diana Maynard
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
Diana Maynard
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social media
Diana Maynard
 

Mehr von Diana Maynard (20)

Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferences
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutions
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATE
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Cls8 decarbonet
Cls8 decarbonetCls8 decarbonet
Cls8 decarbonet
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged Sword
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
 
Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social media
 
What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...What do you really mean when you tweet? Challenges for opinion mining on soci...
What do you really mean when you tweet? Challenges for opinion mining on soci...
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysis
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social media
 

Kürzlich hochgeladen

Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Kürzlich hochgeladen (20)

Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Adding value to NLP: a little semantics goes a long way

  • 1. Adding value to NLP: a little semantics goes a long way Dr. Diana Maynard University of Sheffield, UK
  • 2. 15 years ago in NLP, we were… • Using ontologies for adding reasoning/additional information to text • Developing linguistic tools for conceptual modelling • Semantic enrichment for ontology mapping • Developing ways to get from text to ontologies and back • Trying to persuade the Semantic Web community that NLP was important
  • 3. • Paul Buitelaar, Daniel Olejnik, Michael Sintek: A Protégé Plug-In for Ontology Extraction from Text Based on Linguistic Analysis • Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yorick Wilks: Learning to Harvest Information for the Semantic Web • Sanghee Kim, Paul H. Lewis, Kirk Martinez, Simon Goodall: Question Answering Towards Automatic Augmentations of Ontology Instances • Naoki Sugiura, Yoshihiro Shigeta, Naoki Fukuta, Noriaki Izumi, Takahira Yamaguchi: Towards On-the-Fly Ontology Construction - Focusing on Ontology Quality Improvement • John Domingue, Martin Dzbor, Enrico Motta: Collaborative Semantic Web Browsing with Magpie NLP at ESWS 2004
  • 4. • Finding “hidden” semantic links between concepts using gloss information from dictionaries • (We want to know that Henry Ford makes cars) • These days we could just check Wikipedia J Mapping semantic relations
  • 6. Semantic annotation 15 years ago at Ontotext
  • 7. And at the OU…
  • 8. • The techniques may have changed a bit: we now have • lots of lovely big data to play with • open source knowledge bases • powerful GPUs (crack cocaine for NLP students) • deep learning (all the cool kidz J ) • But there are a number of fields where people still haven’t even heard of the word “ontology” (or at least, they don’t know what it means) Many of the underlying technologies are still relevant
  • 9. How to impress a “freedom of the media” researcher “What is this wizardry?”
  • 10. The problem: too much unstructured information Feral databases On 26 June 1996, while driving her red Opel Calibra, Guerin stopped at a red traffic light on the Naas Dual Carriageway near Newlands Cross, on the outskirts of Dublin, unaware she was being followed. She was shot six times, by one of two men sitting on a motorcycle. The solution: categorise it and stick it in a spreadsheet “Everybody loves spreadsheets!” How many journalists died in 1996?
  • 11. That’s great, we have lots of NLP tools to do that! Name Date Location City Location Country Event type Type of death Guerin 26061996 Dublin Ireland Killing Shooting
  • 12. Woe betide anyone who opens the sacred spreadsheet! Mass panic Let’s start a new spreadsheet! Aaaaggghhhh!!!
  • 13. Harry Potter and the Spreadsheet of Doom
  • 14. Once the latest in calendar technology…
  • 15. • Violations against journalists • Disaster relief • Scientometrics • The B-word Case studies: how can NLP and semantics help?
  • 16. Global monitoring of violations against journalists • The situation: cases of killing, kidnapping, enforced disappearance, arbitrary detention and torture of journalists, associated media personnel, often with impunity • The task: UN SDG agenda (indicator 16.10.1): we need a comprehensive monitoring system capturing the scope & nature of violations (lethal & non-lethal) • The problem: the data is a horrible mess! • accessibility of reliable information on violations • gaps in the data beyond killings; cannot compile data on a wide range of violations; practical challenges of collection • systematisation and compiling of information • conceptual inconsistency; much information is unstructured
  • 17. Events are complex The family of murdered Maltese anti-corruption journalist Daphne Caruana Galizia is demanding an independent public inquiry because she had suffered years of intimidation. She was killed by a car bomb near her home in October. Her widely-read blog accused top politicians of corruption. One of her sons, Paul, said three pet dogs were killed and attempts were made to burn down the journalist's home. • Several separate events, but • are they related? • was it the same perpetrator(s)? • Traditional methods use a person-centric approach with no relation between events
  • 18. But we know how to understand complex events SemEval 2018 Task on Counting Events and Participants in the Long Tail
  • 19. Information categories in current monitoring efforts
  • 22. Putting it all together
  • 23. Reconciling information from different sources Relation Example Issue Solution Same Event? Exact match Location: London Location: London Identical facts Two events can be merged Very likely Equivalent match Type: murder Type: assassination Semantically equivalent facts Two events can be merged Very likely No-conflict Location: London Date: 2019 Different but semantically compatible facts Facts from two events can be merged Probably Specificity Conflict Location: London Location: UK One fact is more specific than the other, but both are semantically compatible Facts can be merged at either specific or generic level Probably Direct conflict Location: London Location: Paris Facts conflict semantically: both cannot be correct More info needed to resolve conflict Unlikely
  • 24. Reconciling our information Name Date City Country Event type Type of death Guerin 26061996 Dublin Ireland Killing Shooting Name Date Organisation Location Event type Veronica Guerin 26 June 1996 Sunday Independent Ireland Murder Name Date Organisation Location Event type Sources Veronica Guerin 26061996 Sunday Independent Dublin, Ireland Shooting: deliberate, fatal 2
  • 25. Automatic categorization of free text Entities and events are extracted, along with features, and linked to other knowledge sources, e.g. Wikipedia, other categorization schemes
  • 26. Social Media and Disaster Relief • Hurricane Sandy in the US: • 1.1 million tweets in the first day; over 20 million in total • > 800K photos with #Sandy hashtag on Instagram • Haze in Singapore: > 23 million • Nepal earthquake in 2015: more than half a million posts
  • 27. Aid workers in Nepal discussing strategy
  • 28. Tools to help disaster victims get aid quickly: pinpointing geographic locations 28 • Many NGOs are not local to the disaster area and may not have a good grasp of the geography • Place names are ambiguous • Find mentions of locations in the text, match them to a knowledge base, and plot them on a map
  • 29. Adding semantic information (from BabelNet) improves cross-lingual crisis event classification CREES Google Sheets Add-on
  • 30. Semantic Technologies in Scientometrics Opportunities: • Ability to link different kinds of data sources to provide a richer view of knowledge production in Europe Challenges • Need for a robust approach to identify and model relevant topics • Language (connect different kinds of data due to terminology differences) • Commensurability (cannot connect different kinds of classifications) • Flexibility (model changes over time and space)
  • 31. What kind of questions do we want to answer? • Which country published most about waste management and recycling in 2014? • What happens when you look only at the top 10% most cited? • What kind of international collaborations do we see? • What about patents? The problem: • topics for different document types don’t match – different classification systems • and they don’t correlate with EU policies
  • 32. Which countries published most about waste management and recycling in 2014?
  • 33. Top 10% cited All publications Germany Italy UK France Spain Sweden
  • 35. • A composite indicator combining publications, patents and projects shows that: • the volume of knowledge production is highly concentrated in large metropolitan regions, e.g. Paris, London, Munich • some medium-sized regions are highly productive in terms of intensity (normalised by population), e.g. Eindhoven and Heidelberg • some smaller areas have high volume and intensity, e.g. Oxfordshire • Eastern Europe shows low volume and intensity, except major cities, but all have low intensity (except Ljubljana) How is European knowledge distributed across regions?
  • 36. • Technological production is measured by patents • Scientific production is measured by publications • These 2 types show different geographical distributions: technological are more concentrated in space • In terms of volume, Paris is the biggest cluster for both types • Within regions, production varies a lot: London is the biggest producer of both types, while Eindhoven is key in terms of technological knowledge (both for volume and intensity) • These findings reflect the different structure of public and private knowledge Technological vs scientific knowledge production in genomics
  • 37. Specialisation Indexes in Biotechnology around Europe
  • 38. The Semantic Approach Perspectives on CO2 capture and storage Filipp Johnsson Published 14-04-11 SC5-20-2014 H2020 Zero Emission Robot-Boat for Coastal and Inland Water Monitoring What is the innovation performance of France on climate change compared with Germany? Policy Ontology Data 6687 2007 0 LED module with gold bonding. Processes or apparatus specially adapted for the manufacture or treatment of semiconductor In a nutshell: • We need to know which topics each document is talking about (multi-class classification) • But we have to connect these topics together coherently
  • 39. Ontologies connect information Find more information about the topicLink related topics Link with other sources (Nature.com, skos, DBpedia…)
  • 40. From ontology to data 1. Create ontology of topics representing KET and SGC • From existing classifications, policy documents, expert users, and data 2. Automatically generate collections of keywords • NLP techniques (term extraction, word embeddings) from large training dataset • Ranking and scoring algorithms to decide: • Which topic(s) to match the keywords to? • Which are the best keywords? • Which are the best keyword combinations? 3. For each document, decide which topics best fit it • based on keywords and scoring algorithms energy storage storage of energy accumulator hydraulic accumulator capacitor
  • 41. Creating and populating the ontology 1. Create ontology structure (classes & subclasses) 2. Add extra information (descriptions, links, alternate class names) 3. Ontology population: generate lists of terms associated with each class
  • 42. SGC Topics and SubTopics
  • 43. Linking information from external sources Link to more information
  • 44.
  • 45. Ontology population Sustainable development of urban areas is a challenge of key importance. It requires new, efficient, and user-friendly technologies and services, in particular in the areas of energy, transport and ICT. However, these solutions need integrated approaches, both in terms of research and development of advanced technological solutions, as well as deployment. The focus on smart cities technologies will result in commercial-scale solutions with a high market potential. 1. Automatically generate keywords from class names, descriptions, and related information (e.g. DBpedia, skos, etc.) using term recognition tools 2. Enrich using word embeddings 3. Score the keywords according to how representative they are of that class 4. Generate prior probabilities using PMI for term combinations, based on frequency of co-occurrence
  • 46. • Data sources are annotated against the ontologies • each document is associated with one or more topics • Sophisticated NLP matching and scoring of terms in the documents with ontology • A REST service accepts documents, scores and classifies them according to the ontology, and returns classification and keyword information • Several million documents can be processed in about a week (using around 12 threads) • Annotated data sources are then used to build indicators • e.g. for each topic, how many publications and in which region? Annotating Data with Ontologies
  • 47. {"classification": “http://www.gate.ac.uk/ns/ontologies/knowmak/antibiotics": { "boostedBy": "http://www.gate.ac.uk/ns/ontologies/knowmak/antimicrobials", "keywords": { "antibiotics": { "kinds": [ "generated", "preferred" ], "score": 1.1527377521613833 }, "bacteria": { "kinds": ["generated"], "score": 0.5763688760806917 ...
}, "score": [ 4.322766570605188, 4.4159785333 ], "topicID": "38", "unboostedScore": [ 2.5936599423631126, 3.75354899915 ], },
  • 48. Example of patent annotation Protein stabilized pharmacologically active agents, methods for the preparation thereof and methods for the use thereof In accordance with the present invention, there are provided compositions and methods useful for the in vivo delivery of substantially water-insoluble pharmacologically active agents (such as the anti-cancer drug paclitaxel) in which the pharmacologically active agent is delivered in the form of suspended particles coated with protein (which acts as a stabilizing agent)….. • RNA vaccines: (agent, protein, vaccine) • anti-viral agents: (protein, anti-cancer, drug) • protein vaccines: (protein, vaccine, antimicrobial) KET: Industrial biotechnology SGC: Health
  • 49. Ongoing Challenges Inconsistencies • ontology design has to be tailored to user needs, but these are not uniform Automation • keyword-based approach still requires some manual intervention for best results Accuracy • language processing is never 100% accurate Evaluation • how do we know if/when it’s good enough? • Determine weighting mechanisms; cut-off thresholds… The future? • integration of existing classification and modelling approaches with our semantics
  • 50. • Built on top of annotated text in GATE • can be used to index and search over text, annotations, semantic metadata • allow queries that arbitrarily mix full-text, structural, linguistic and semantic annotations • open source • you can issue ridiculously complex queries GATE MIMIR and Prospector
  • 51. { {Person gender=male sparql = "SELECT ?inst WHERE { ?inst :party <http://dbpedia.org/resource/Labour_Party_%28UK%29> . ?inst :almaMater <http://dbpedia.org/resource/University_of_Edinburgh> }" } [0..3] root:say [0..5] {Money currency=“£” value > 2000000} } IN {Document date > 20050000 topic=uk_health}
  • 54. What diseases are in these documents?
  • 56. Disease vs Disease Co-ocurrences
  • 58. • How do people in different parts of the country talk about elections and political events? • How do the MPs talk about different topics? • How does the public respond to them? • What kind of people send hate speech, who to, and what about? Social media and politics
  • 59. Hate speech against politicians Jan-Feb 2019 Annotation of tweets with simple DBpedia and NUTS linking lets us index and query our data in interesting ways
  • 60. Who gets the abuse? • Click to edit Master text styles • Second level • Third level • Fourth level • Fifth level 20172015
  • 64. Local and national news coverage of Brexit • Green/blue indicates areas where local coverage outweighed national coverage • Across the regions, Twitter remainers showed closer congruence with local press than Twitter leavers
  • 65. • We have lots of cool technology and data • It doesn’t always need to be complex, it just needs to be applied • There are a number of areas where semantic technologies are still really unknown • New emerging domains of interest: journalism, politics, law, humanities (literature, geography, history…) • Don’t neglect interdisciplinary “borrowing” • The usual disclaimers about accuracy – real-world evaluation So where are we at?
  • 66. Acknowledgements This work supported by the European Union/EU under the Information and Communication Technologies (ICT) theme of the 7th Framework and H2020 Programmes for R&D: • KNOWMAK (726992) http://knowmak.eu • RISIS2 (82409) http://risis2.eu • SoBigData (654024) http://www.sobigdata.eu • WeVerify (825297) https://weverify.eu/ • COMRADES (687847) http://www.comrades-project.eu • EPSRC grant EP/I004327/1 and British Academy under call “The Humanities and Social Sciences: Tackling the UK’s International Challenges”. • Nesta http://nesta.org.uk • Free Press Unlimited pilot project: “Developing a database for the improved collection and systematisation of information on incidents of violations against journalists” https://www.freepressunlimited.org And all the GATE team in Sheffield!