Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of sources of any kind.
This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:
Build ranking reports (e.g. for people and organisations)
View topics linked implicitly (e.g. daughter companies, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources
1. Boost Your Data Analytics with Open Data and Public News Content
Ontotext Webinar, 24 Mar 2016
2. Presentation Outline – PART I
• Quick news-analytics case
• Technology approach
• FactForge-News: Data architecture
• Sample queries on Linked Open Data
• News analytics examples
• Today’s News Map
Mar 2016 | Open Data & News Analytics
3. Quick news-analytics case
• Our Dynamic Semantic Publishing platform already links text to big open data graphs
• One can navigate from text to concepts and get trends, related entities and news
• Try it at http://now.ontotext.com
5. Our approach to Big Data
1. Integrate relevant data from many sources
− Build a Big Knowledge Graph from proprietary databases and taxonomies integrated with millions of facts of Linked Data
2. Infer new facts and unveil relationships
− Perform reasoning across data from different sources
3. Interlink text with big data
− Use text mining to automatically discover references to concepts and entities
4. Use a NoSQL graph database for metadata management, querying and search
6. NoSQL Graph Database
[Diagram: an example of inference in the graph. Resources myData:Maria and myData:Ivan; classes ptop:Agent, ptop:Person and ptop:Woman; properties ptop:childOf, ptop:parentOf and owl:relativeOf; schema links rdfs:range, owl:inverseOf, rdfs:subPropertyOf, owl:SymmetricProperty and rdf:type; some edges are marked as inferred.]
• The hottest NoSQL trend
• W3C standards
• Efficient Data Integration
− Using logical inference
− For data integration and BI
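The slide's diagram shows facts being inferred from property definitions such as ptop:childOf and ptop:parentOf. As a minimal sketch of that idea, the Python below applies a hand-picked subset of RDFS/OWL rules (inverseOf, subPropertyOf, SymmetricProperty, range, subClassOf) until no new triple appears. The triples are assumptions loosely modeled on the diagram; ptop:relativeOf stands in for its relativeOf property. This is not how GraphDB's rule engine is implemented.

```python
"""Toy forward chaining over a few RDFS/OWL rules (illustrative only)."""

Triple = tuple[str, str, str]
SYMMETRIC = "owl:SymmetricProperty"


def infer(explicit: set[Triple]) -> set[Triple]:
    """Apply the rules repeatedly until a fixpoint is reached."""
    closure = set(explicit)
    while True:
        derived: set[Triple] = set()
        for s, p, o in closure:
            # Rules driven by schema triples whose subject is the predicate p.
            for a, rel, b in closure:
                if a != p:
                    continue
                if rel == "owl:inverseOf":            # (s p o), p inverseOf q => (o q s)
                    derived.add((o, b, s))
                elif rel == "rdfs:subPropertyOf":     # (s p o), p subPropertyOf q => (s q o)
                    derived.add((s, b, o))
                elif rel == "rdfs:range":             # (s p o), p range C => (o type C)
                    derived.add((o, "rdf:type", b))
                elif rel == "rdf:type" and b == SYMMETRIC:  # p symmetric => (o p s)
                    derived.add((o, p, s))
            # Class hierarchy: (s type C), C subClassOf D => (s type D)
            if p == "rdf:type":
                for a, rel, b in closure:
                    if a == o and rel == "rdfs:subClassOf":
                        derived.add((s, "rdf:type", b))
        if derived <= closure:
            return closure
        closure |= derived


# Hypothetical explicit facts and schema, loosely following the diagram.
explicit = {
    ("myData:Maria", "ptop:childOf", "myData:Ivan"),
    ("myData:Maria", "rdf:type", "ptop:Woman"),
    ("ptop:childOf", "owl:inverseOf", "ptop:parentOf"),
    ("owl:inverseOf", "rdf:type", SYMMETRIC),
    ("ptop:parentOf", "rdfs:subPropertyOf", "ptop:relativeOf"),
    ("ptop:relativeOf", "rdf:type", SYMMETRIC),
    ("ptop:childOf", "rdfs:range", "ptop:Person"),
    ("ptop:Woman", "rdfs:subClassOf", "ptop:Person"),
    ("ptop:Person", "rdfs:subClassOf", "ptop:Agent"),
}

closure = infer(explicit)
for triple in sorted(closure - explicit):
    print("inferred:", *triple)
```

Only one family fact is stated explicitly, yet the closure contains Ivan's parentOf link, the symmetric relativeOf links in both directions, and the ptop:Agent type for both people.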
7. Analyzing Text
• Full spectrum of NLP weaponry
• Semantic indexing
− Tag references with entity IDs
− Generate semantic metadata descriptions of documents
− Store metadata in GraphDB
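The semantic-indexing bullets can be illustrated with a toy example. This sketch replaces a full NLP pipeline with a hand-made gazetteer: it does longest-match dictionary lookup and emits one triple per mention, ready to be stored in a graph database. The gazetteer entries and the ex:mentions predicate are hypothetical. Real tagging also needs disambiguation, which this sketch skips.

```python
import re

# Hypothetical gazetteer mapping lower-cased surface forms to entity URIs.
GAZETTEER = {
    "general motors": "dbr:General_Motors",
    "gm": "dbr:General_Motors",
    "volkswagen": "dbr:Volkswagen",
    "ibm": "dbr:IBM",
}
MAX_WORDS = max(len(form.split()) for form in GAZETTEER)


def tag(doc_id: str, text: str) -> list[tuple[str, str, str]]:
    """Emit one (document, ex:mentions, entity) triple per recognised mention."""
    words = re.findall(r"\w+", text.lower())
    triples = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first so "general motors" beats "general".
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            uri = GAZETTEER.get(" ".join(words[i:i + n]))
            if uri is not None:
                triples.append((doc_id, "ex:mentions", uri))
                i += n
                break
        else:
            i += 1  # no known entity starts at this word
    return triples


for triple in tag("ex:news1", "GM and Volkswagen reported sales; General Motors led."):
    print(triple)
```

Both "GM" and "General Motors" resolve to the same URI. Querying by entity ID rather than by string is what makes the press-clipping and ranking queries later in the deck possible.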
9. The Web of Linked Data in 2007
[Figure: the 2007 Linked Data cloud, with callouts for a structured database version of Wikipedia, a database of all locations on Earth, product reviews, and a semantic synonym dictionary]
Note: Each bubble represents a dataset. Arrows represent mappings across datasets, e.g. dbpedia:Paris owl:sameAs geo:2988507
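An owl:sameAs mapping such as dbpedia:Paris owl:sameAs geo:2988507 lets a store treat several URIs as one entity. This is a minimal union-find sketch of that consolidation, assuming the mappings arrive as URI pairs. Apart from the slide's example, the pairs are illustrative, and GraphDB's internal sameAs handling is not shown here.

```python
class SameAsIndex:
    """Groups URIs linked by owl:sameAs into equivalence classes (union-find)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, uri: str) -> str:
        """Return the representative URI of uri's equivalence class."""
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        while uri != root:  # path compression keeps later lookups short
            nxt = self.parent[uri]
            self.parent[uri] = root
            uri = nxt
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record an owl:sameAs link by merging the two classes."""
        self.parent[self.find(a)] = self.find(b)

    def equivalent(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)


idx = SameAsIndex()
idx.same_as("dbpedia:Paris", "geo:2988507")   # the mapping from the slide
idx.same_as("wikidata:Q90", "dbpedia:Paris")  # illustrative second mapping
print(idx.equivalent("wikidata:Q90", "geo:2988507"))
```

Because equivalence is transitive, two URIs that were never linked directly still end up in the same class.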
10. The Web of Linked Data is Gaining Mass
11. The Web of Data is Gaining Mass (2011)
12. The Web of Linked Data is Gaining Mass
• 2013 stats: 2,289 public datasets
− http://stats.lod2.eu/
• Growing exponentially
− see the dotted trend line
• Structured markup
− Schema.org; semantic SEO
• Enables better semantic tagging!
− As there are more concepts and richer descriptions to refer to
[Chart: number of Linked Data datasets per year. 2007: 27; 2008: 43; 2009: 89; 2010: 162; 2011: 295; 2012: 822; 2013: 2,289]
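The "growing exponentially" claim can be put into numbers with the compound annual growth factor between the 2007 count (27 datasets) and the 2013 count (2,289 datasets): (2289 / 27)^(1/6). This small Python check uses only the yearly counts from the chart.

```python
import math

# Dataset counts per year, as read from the chart.
datasets = {2007: 27, 2008: 43, 2009: 89, 2010: 162, 2011: 295, 2012: 822, 2013: 2289}

years = sorted(datasets)
span = years[-1] - years[0]  # 6 yearly steps
cagr = (datasets[years[-1]] / datasets[years[0]]) ** (1 / span)
doubling_time = math.log(2) / math.log(cagr)  # years needed to double at that rate

print(f"growth factor per year: {cagr:.2f}")  # about 2.10
print(f"doubling time: {doubling_time:.2f} years")
```

A factor of about 2.1 per year means the number of datasets roughly doubled every year over that period.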
13. The FactForge Data
• DBpedia (the English version only): 496M statements
• Geonames: 150M statements
− SameAs links between DBpedia and Geonames: 471K statements
• NOW data – metadata about news: 128M statements
• Total size: 938M statements
− 656M explicit statements + 281M inferred statements
− RDFRank and geo-spatial indices are enabled for ranking and efficient geo-region constraints
14. News Metadata
• Metadata from Ontotext’s Dynamic Semantic Publishing platform
− Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News corpus from Google since Feb 2015, about 10k articles per month
• ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph
− Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
18. Class Hierarchy Map (by number of instances)
Left: The big picture
Right: dbo:Agent class (2.7M organizations and persons)
20. Sample queries
• There is a rich set of sample queries for exploring this combination of DBpedia, GeoNames and news metadata
• We will showcase a few of them, starting with the simple ones
• In bold we marked the “parameters” of the queries
21. Query: Big Cities in Eastern Europe
# benefits from inference over transitive gn:parentFeature
# benefits from owl:sameAs mapping between DBPedia and Geonames
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX onto: <http://www.ontotext.com/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX dbo: <http://dbpedia.org/ontology/>
select *
from onto:disable-sameAs
where {
?loc gn:parentFeature dbr:Eastern_Europe ; gn:featureClass gn:P.
?loc dbo:populationTotal ?population ; dbo:country ?country .
FILTER(?population > 300000 )
} order by ?country
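The query works because the store infers that a city whose gn:parentFeature is a region inside Eastern Europe is itself inside Eastern Europe. This Python sketch shows that transitive step together with the population filter. The hierarchy uses hypothetical ex: features and placeholder populations, not real GeoNames data.

```python
# Hypothetical direct gn:parentFeature links (feature -> containing feature).
parent_feature = {
    "ex:CityA": "ex:RegionX",
    "ex:CityB": "ex:RegionX",
    "ex:RegionX": "ex:CountryY",
    "ex:CountryY": "dbr:Eastern_Europe",
    "ex:CityC": "ex:CountryZ",  # not in Eastern Europe
}
# Placeholder populations for the hypothetical cities.
population = {"ex:CityA": 450_000, "ex:CityB": 120_000, "ex:CityC": 900_000}


def ancestors(feature: str) -> set[str]:
    """Follow gn:parentFeature links transitively (stops on cycles)."""
    found: set[str] = set()
    while feature in parent_feature and parent_feature[feature] not in found:
        feature = parent_feature[feature]
        found.add(feature)
    return found


big_cities = sorted(
    city for city, pop in population.items()
    if pop > 300_000 and "dbr:Eastern_Europe" in ancestors(city)
)
print(big_cities)  # ['ex:CityA']
```

In GraphDB the transitive links are materialised by inference, so the SPARQL pattern needs only a single gn:parentFeature hop.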
22. Query: People and Organizations related to Google
# benefits from inference over transitive dbo:parent
# RDFRank makes it easy to see the “top suspects” in a list of 93 entities
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
PREFIX dbr: <http://dbpedia.org/resource/>
select distinct ?related_entity ?rank
where {
BIND (dbr:Google as ?entity)
{ ?related_entity a dbo:Person ; ?p ?entity . } UNION
{ ?related_entity a dbo:Organisation ; dbo:parent ?entity . }
?related_entity rank:hasRDFRank ?rank
} order by desc(?rank)
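Ontotext describes RDFRank as a PageRank-style importance score computed over the RDF graph. This sketch is plain PageRank by power iteration on a hypothetical four-node graph. It is not GraphDB's actual RDFRank algorithm or parameter set, and the damping factor 0.85 is simply the conventional PageRank default.

```python
def pagerank(edges: list[tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical links between resources (subject -> object of some triple).
edges = [
    ("ex:Person1", "dbr:Google"), ("ex:Person2", "dbr:Google"),
    ("ex:Subsidiary", "dbr:Google"), ("dbr:Google", "ex:Person1"),
]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node:15} {score:.3f}")
```

Heavily referenced nodes rise to the top. This is why ordering by rank surfaces the "top suspects" in a long list of related entities.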
23. Query: Airports near London
# GraphDB’s geo-spatial plug-in allows efficient evaluation of nearby
# RDFRank brings the top 6 passenger airports to the top of a list of 80
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX gdb-geo: <http://www.ontotext.com/owlim/geo#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT distinct ?airport ?rrank
WHERE {
{ SELECT * { dbr:London geo-pos:lat ?lat ; geo-pos:long ?long . } LIMIT 10 }
?airport gdb-geo:nearby(?lat ?long "50mi");
a dbo:Airport ;
rank:hasRDFRank ?rrank .
} ORDER BY DESC(?rrank)
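The gdb-geo:nearby(?lat ?long "50mi") pattern keeps resources within 50 miles of a point. This is a minimal Python sketch of the same filter using the haversine great-circle distance. The coordinates are approximate and for illustration only. GraphDB answers this with a geo-spatial index rather than the linear scan shown here.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def nearby(lat: float, lon: float, radius_mi: float,
           points: dict[str, tuple[float, float]]) -> list[str]:
    """Names of points within radius_mi of (lat, lon), nearest first."""
    dist = {name: haversine_mi(lat, lon, *pos) for name, pos in points.items()}
    return sorted((name for name, d in dist.items() if d <= radius_mi), key=dist.get)


london = (51.507, -0.128)  # approximate
airports = {  # approximate coordinates
    "Heathrow": (51.470, -0.454),
    "Gatwick": (51.148, -0.190),
    "Manchester": (53.354, -2.275),
}
print(nearby(*london, 50, airports))  # ['Heathrow', 'Gatwick']
```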
24. Query: Top-level Industries by number of companies
# benefits from mapping and consolidation of industry classifications
# and predicates in DBPedia (ff-map)
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX ff-map: <http://factforge.net/ff2016-mapping/>
select distinct ?topIndustry (count(?company) as ?companies)
where {
?company dbo:industry ?industry .
?industrySum ff-map:industryVariant ?industry .
?industrySum ff-map:industryCenter ?topIndustry .
} group by ?topIndustry order by desc(?companies)
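The ff-map:industryVariant mapping groups DBpedia's many industry identifiers under a top-level industry, and the query then counts companies per group. This Python sketch does the same grouping with a hypothetical mapping and hypothetical companies. One design choice differs from the query: the sketch counts distinct companies, which corresponds to COUNT(DISTINCT ?company). Plain count(?company) counts a company once per matching industry value.

```python
from collections import defaultdict

# Hypothetical mapping: top-level industry -> identifier variants found in the data.
INDUSTRY_VARIANTS = {
    "dbr:FinancialServices": {"dbr:FinancialServices", "dbr:Financial_services", "dbr:Bank"},
    "dbr:Automotive": {"dbr:Automotive", "dbr:Automotive_industry"},
}
# Hypothetical company -> dbo:industry values.
COMPANY_INDUSTRIES = {
    "ex:BankA": {"dbr:Bank"},
    "ex:BankB": {"dbr:Bank", "dbr:Financial_services"},
    "ex:CarMakerC": {"dbr:Automotive_industry"},
}

# Invert the mapping (assumes each variant belongs to exactly one top industry).
variant_to_top = {v: top for top, variants in INDUSTRY_VARIANTS.items() for v in variants}

companies_by_top: dict[str, set[str]] = defaultdict(set)
for company, industries in COMPANY_INDUSTRIES.items():
    for industry in industries:
        if industry in variant_to_top:
            companies_by_top[variant_to_top[industry]].add(company)

ranking = sorted(((top, len(cs)) for top, cs in companies_by_top.items()),
                 key=lambda pair: -pair[1])
print(ranking)  # [('dbr:FinancialServices', 2), ('dbr:Automotive', 1)]
```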
26. Semantic Press-Clipping
• We can trace references to a specific company in the news
− This is fairly standard; however, we can handle syntactic variations in the names, because state-of-the-art Named Entity Recognition technology is used
− More importantly, we correctly distinguish which mentions of “Paris” refer to which of the following: Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Greek hero)
• We can trace and consolidate references to daughter companies
• We have comprehensive industry classification
− The one from DBpedia, refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank is also considered classified as dbr:FinancialServices)
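Telling the different "Paris" entities apart is an entity disambiguation task. This sketch swaps in a deliberately simple technique: it picks the candidate whose keyword profile overlaps most with the words around the mention, a simplified Lesk-style heuristic. It is not Ontotext's NER pipeline, and the candidate URIs and keyword lists are assumptions.

```python
import re
from typing import Optional

# Hypothetical candidate entities for the surface form "Paris", with context keywords.
CANDIDATES = {
    "dbr:Paris": {"france", "french", "capital", "seine", "louvre"},
    "dbr:Paris,_Texas": {"texas", "lamar", "county"},
    "dbr:Paris_Hilton": {"hilton", "heiress", "celebrity", "reality"},
    "dbr:Paris_(mythology)": {"troy", "helen", "trojan", "homer", "greek"},
}


def disambiguate(text: str, window: int = 8) -> Optional[str]:
    """Return the candidate whose keywords best overlap the words near 'paris'."""
    words = re.findall(r"\w+", text.lower())
    if "paris" not in words:
        return None
    i = words.index("paris")
    context = set(words[max(0, i - window): i + window + 1])
    best, best_score = None, 0
    for uri, keywords in CANDIDATES.items():
        score = len(keywords & context)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = uri, score
    return best


print(disambiguate("Paris abducted Helen and sparked the Trojan war"))
print(disambiguate("The mayor of Paris, Texas, spoke to Lamar County officials"))
```

Production systems combine many more signals, such as entity popularity and the types of co-occurring entities. The principle is the same, though: context decides which URI a mention gets.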
27. Query: News Mentioning IBM
# technical example to demonstrate how news metadata can be accessed
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?news ?title ?date ?pub_entity
where {
?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
?pub_entity pub:exactMatch dbr:IBM .
?news pub-old:creationDate ?date; pub-old:title ?title .
FILTER ( (?date > "2015-10-01T00:02:00Z"^^xsd:dateTime) &&
(?date < "2015-11-01T00:02:00Z"^^xsd:dateTime))
} limit 100
28. Query: News Mentioning Gazprom and Its Related Entities
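# benefits from inference over transitive dbo:parent relation and mappings to it
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
select distinct ?news ?title ?date ?related_entity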
where {
{ select distinct ?related_entity {
BIND (dbr:Gazprom as ?entity)
{ ?related_entity a dbo:Person ; ?p ?entity .
FILTER NOT EXISTS { ?related_entity dbo:club ?entity } } UNION
{ ?related_entity a dbo:Organisation ; dbo:parent ?entity . } UNION
{ BIND(?entity as ?related_entity) }
} }
?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
?pub_entity pub:exactMatch ?related_entity .
?news pub-old:creationDate ?date; pub-old:title ?title .
} order by desc(?date) limit 1000
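29. Query: Automotive Companies Most Popular in the News
# benefits from mapping and consolidation of industry classifications
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX ff-map: <http://factforge.net/ff2016-mapping/>
select distinct ?pub_entity (max(?entity_label) as ?label)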
(count(?news) as ?news_count)
where {
?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
?pub_entity pub:exactMatch ?entity; pub:preferredLabel ?entity_label.
dbr:Automotive ff-map:industryVariant ?industry .
?entity dbo:industry ?industry .
?news pub-old:creationDate ?date .
} group by ?pub_entity order by desc(?news_count)
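30. Query: Most Popular in the News, Including Child Companies
# benefits from mapping and consolidation of industry classifications
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX ff-map: <http://factforge.net/ff2016-mapping/>
select distinct ?parent (count(?news) as ?news_count)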
where {
{ select distinct ?parent ?entity {
BIND(dbr:Software as ?industry)
?industry ff-map:industryVariant ?industryVar .
?parent dbo:industry ?industryVar .
?parent a dbo:Company .
FILTER NOT EXISTS { ?parent dbo:parent / dbo:industry / ^ff-map:industryVariant ?industry }
{ ?entity dbo:parent ?parent . } UNION
{ BIND(?parent as ?entity) }
} }
?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
?pub_entity pub:exactMatch ?entity .
?news pub-old:creationDate ?date .
} group by ?parent order by desc(?news_count)
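The "incl. mentions of controlled companies" rankings roll news mentions of subsidiaries up to their top-level parent through the transitive dbo:parent relation. This Python sketch makes a few assumptions. The parent links and news items are hypothetical, and the query's industry filter is ignored. News items are also deduplicated per group, so an article mentioning two companies of the same group counts once. That matches COUNT(DISTINCT ?news) rather than the query's plain count(?news).

```python
from collections import defaultdict

# Hypothetical direct dbo:parent links (child -> parent).
PARENT = {"ex:Audi": "ex:VW_Group", "ex:Porsche": "ex:VW_Group", "ex:Lamborghini": "ex:Audi"}
# Hypothetical news items and the companies each one mentions.
NEWS = {
    "news1": {"ex:VW_Group"},
    "news2": {"ex:Audi", "ex:Porsche"},
    "news3": {"ex:Lamborghini"},
    "news4": {"ex:Tesla"},
}


def top_parent(company: str) -> str:
    """Follow dbo:parent transitively to the top of the ownership chain."""
    seen = {company}
    while company in PARENT and PARENT[company] not in seen:
        company = PARENT[company]
        seen.add(company)
    return company


direct: dict[str, set[str]] = defaultdict(set)
rolled_up: dict[str, set[str]] = defaultdict(set)
for news_id, companies in NEWS.items():
    for company in companies:
        direct[company].add(news_id)
        rolled_up[top_parent(company)].add(news_id)  # sets deduplicate per group

for name, table in (("direct", direct), ("incl. controlled", rolled_up)):
    counts = sorted(((c, len(n)) for c, n in table.items()), key=lambda p: (-p[1], p[0]))
    print(name, counts)
```

Each company gets only one direct mention, yet the group total is three. The same effect explains why, in the tables below, parent groups climb the rankings once controlled companies are included.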
31. News Popularity Ranking: Automotive
Rank | Company | News # | Company (incl. mentions of controlled companies) | News #
1 | General Motors | 2722 | General Motors | 4620
2 | Tesla Motors | 2346 | Volkswagen Group | 3999
3 | Volkswagen | 2299 | Fiat Chrysler Automobiles | 2658
4 | Ford Motor Company | 1934 | Tesla Motors | 2370
5 | Toyota | 1325 | Ford Motor Company | 2125
6 | Chevrolet | 1264 | Toyota | 1656
7 | Chrysler | 1054 | Renault-Nissan Alliance | 1332
8 | Fiat Chrysler Automobiles | 1011 | Honda | 864
9 | Audi AG | 972 | BMW | 715
10 | Honda | 717 | Takata Corporation | 547
32. News Popularity: Finance
Rank | Company | News # | Company (incl. mentions of controlled companies) | News #
1 | Bloomberg L.P. | 3203 | China Merchants Bank | 40940
2 | Goldman Sachs | 1992 | Alphabet Inc. | 24219
3 | JP Morgan Chase | 1712 | Capital Group Companies | 4379
4 | Wells Fargo | 1688 | Bloomberg L.P. | 3893
5 | Citigroup | 1557 | Exor (company) | 2775
6 | HSBC Holdings | 1546 | JP Morgan Chase | 2715
7 | Deutsche Bank | 1414 | Nasdaq, Inc. | 2178
8 | Bank of America | 1335 | Oaktree Capital Management | 1757
9 | Barclays | 1260 | Goldman Sachs | 1085
10 | UBS | 694 | Sentinel Capital Partners | 1064
Note: Including investment funds, stock exchanges, agencies, etc.
33. News Popularity: Banking
Rank | Company | News # | Company (incl. mentions of controlled companies) | News #
1 | Goldman Sachs | 996 | China Merchants Bank * | 38288
2 | JP Morgan Chase | 856 | JP Morgan Chase | 1972
3 | HSBC Holdings | 773 | Goldman Sachs | 1030
4 | Deutsche Bank | 707 | HSBC | 966
5 | Barclays | 630 | Bank of America | 771
6 | Citigroup | 519 | Deutsche Bank | 742
7 | Bank of America | 445 | Barclays | 681
8 | Wells Fargo | 422 | Citigroup | 630
9 | UBS | 347 | Wells Fargo | 428
10 | Chase | 126 | UBS | 347
Note: Including investment funds, stock exchanges, agencies, etc.
36. Today’s News Map: International
37. Expect in Part II
• Mentions of an entity and its related entities by month
• Most relevant co-occurring entities
• Most relevant co-occurring entities per month
• Related News
• and more
38. Thank you!
Experience the technology with NOW: Semantic News Portal
http://now.ontotext.com
Start using GraphDB and text-mining with S4 in the cloud
http://s4.ontotext.com
Learn more at our website or simply get in touch
info@ontotext.com, @ontotext