Slide 1 
International Semantic Web Conference 
Riva del Garda, Italy, 22.10.2014 
Semantic Web Challenge – Big Data Track 
Extending Tables with Data from 
over a Million Websites 
Oliver Lehmberg, Dominique Ritze, Petar Ristoski, 
Kai Eckert, Heiko Paulheim, Christian Bizer
Slide 2 
Goal
Extend a local table with additional columns
using different types of Web data.

Local table (Region, Unemployment) + extension columns (GDP per Capita, Population Growth):

Region      Unemployment  GDP per Capita  Population Growth
Alsace      11 %          45.914 €        0,16 %
Lorraine    12 %          51.233 €        -0,05 %
Guadeloupe  28 %          19.810 €        1,34 %
Centre      10 %          59.502 €        1,76 %
Martinique  25 %          NULL            2,64 %
Slide 3 
Operation 1: Extend Local Table with Single Column 
Given a local table and keywords describing the extension 
column, add the extension column to the table and fill it 
with data from the Web. 
Keywords: "GDP per Capita"

Local table (Region, Unemployment) + extension column (GDP per Capita):

Region      Unemployment  GDP per Capita
Alsace      11 %          45.914 €
Lorraine    12 %          51.233 €
Guadeloupe  28 %          19.810 €
Centre      10 %          59.502 €
Martinique  25 %          21,527 €
…           …             …
Slide 4 
Operation 2: Extend Local Table with Many Columns 
Given a local table, add all columns to the table that can 
be filled beyond a density threshold. 
Local table (Region, Unemp. Rate) + added columns:

Region      Unemp. Rate  GDP per Capita  Population Growth  Overseas departments  …
Alsace      11 %         45.914 €        0,16 %             No                    …
Lorraine    12 %         51.233 €        -0,05 %            No                    …
Guadeloupe  28 %         19.810 €        1,34 %             Yes                   …
Centre      10 %         59.502 €        NULL               NULL                  …
Martinique  25 %         NULL            2,64 %             Yes                   …
…           …            …               …                  …

density >= 0.8
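
The density threshold can be illustrated with a minimal Python sketch. It assumes that density means the fraction of rows with a non-NULL value, which the slide does not spell out. The function names and the sparse "Mayor" column are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Optional

Column = List[Optional[str]]  # None stands for NULL


def density(column: Column) -> float:
    """Fraction of rows in the column that hold a non-NULL value."""
    if not column:
        return 0.0
    return sum(value is not None for value in column) / len(column)


def dense_columns(candidates: Dict[str, Column], threshold: float = 0.8) -> Dict[str, Column]:
    """Keep only the candidate columns whose density reaches the threshold."""
    return {name: col for name, col in candidates.items() if density(col) >= threshold}


# Candidate columns for the five regions shown on the slide.
candidates = {
    "GDP per Capita": ["45.914 €", "51.233 €", "19.810 €", "59.502 €", None],
    "Population Growth": ["0,16 %", "-0,05 %", "1,34 %", None, "2,64 %"],
    "Overseas departments": ["No", "No", "Yes", None, "Yes"],
    "Mayor": [None, None, "hypothetical value", None, None],  # made-up sparse column
}
kept = dense_columns(candidates)
print(list(kept))
```

Each of the three slide columns has four values for five rows, so it just reaches the threshold of 0.8, while the sparse column is dropped.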
Slide 5 
Types of Web Data Used 
Microdata 
(schema.org) 
Wiki Tables 
HTML Tables 
Linked Data
Slide 6 
Billion Triple Challenge Dataset 2014 
4 billion triples crawled from 47,000 websites.
Slide 7 
Web Data Commons - Microdata Corpus 
250 million triples from 463,000 websites.
- Extracted from Common Crawl 2013 web corpus
- 2.2 billion HTML pages from 12.8 million websites
- Mostly using the schema.org vocabulary
- Main topics
  - Products
  - Reviews
  - Organisations / LocalBusiness
  - Events
Download: http://webdatacommons.org/structureddata/
Slide 8 
Web Data Commons – Web Tables Corpus 
Around 1% of all HTML tables contain structured data.
- we used 35 million English HTML tables
- extracted from the Common Crawl 2012 web corpus
- selected out of 11.2 billion raw tables
Slide 9 
Web Data Commons – Web Tables Corpus

Column Statistics

Column        #Tables
name          4,600,000
price         3,700,000
date          2,700,000
artist        2,100,000
location      1,200,000
year          1,000,000
manufacturer  375,000
country       340,000
isbn          99,000
area          95,000
population    86,000

Subject Column Values

Value             #Rows
usa               135,000
germany           91,000
greece            42,000
new york          59,000
london            37,000
athens            11,000
david beckham     3,000
ronaldinho        1,200
oliver kahn       710
twist shout       2,000
yellow submarine  1,400

Download: http://webdatacommons.org/webtables/
Slide 10 
WikiTables 
1.4 million tables from English Wikipedia.
- extracted by Northwestern University
- from the 2013 Wikipedia XML dump
- only tables, no infoboxes
Download: http://downey-n1.cs.northwestern.edu/public/
Slide 11 
Internal Data Model: Entity-Attributes-Tables
- One entity per row
- Subject Column = Name of the entity
  - HTML tables: Most unique string column, break ties by taking leftmost.

Rank  Film                   Studio      Director      Length
1.    Star Wars – Episode 1  Lucasfilm   George Lucas  121 min
2.    Alien                  Brandywine  Ridley Scott  117 min
3.    Black Moon             NEF         Louis Malle   100 min

- Table generation from Linked Data and Microdata
  - generate one table per class and website
  - subject column: rdfs:label, foaf:name, x:name
  - we exploit common vocabularies
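
The subject column heuristic for HTML tables (most unique string column, leftmost on ties) can be sketched as follows. The rule for deciding what counts as a string column is an assumption made for this sketch: a column qualifies if none of its cells parses as a plain number. The function names are hypothetical.

```python
from typing import List


def is_string(value: str) -> bool:
    """Assumed typing rule: a cell is a string if it does not parse as a number."""
    try:
        float(value.replace(",", "."))
        return False
    except ValueError:
        return True


def detect_subject_column(header: List[str], rows: List[List[str]]) -> int:
    """Return the index of the string column with the most distinct values.

    Ties are broken by taking the leftmost column. Returns -1 if the
    table has no string column.
    """
    best_index, best_unique = -1, -1
    for i in range(len(header)):
        values = [row[i] for row in rows]
        if not all(is_string(v) for v in values):
            continue  # skip numeric columns such as "Rank"
        unique = len(set(values))
        if unique > best_unique:  # strict '>' keeps the leftmost column on ties
            best_index, best_unique = i, unique
    return best_index


header = ["Rank", "Film", "Studio", "Director", "Length"]
rows = [
    ["1.", "Star Wars – Episode 1", "Lucasfilm", "George Lucas", "121 min"],
    ["2.", "Alien", "Brandywine", "Ridley Scott", "117 min"],
    ["3.", "Black Moon", "NEF", "Louis Malle", "100 min"],
]
subject = detect_subject_column(header, rows)
print(header[subject])
```

In the film table, "Rank" is skipped as numeric and the remaining columns tie at three distinct values each, so the leftmost one, "Film", is chosen.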
Slide 12 
Indexed Tables
- Selection Conditions:
  1. Minimum size of 3 columns and 5 rows
  2. Subject column detection successful
- Total # of tables: 36.3 million
- Total # of PLDs: ~ 1.5 million
- Total # of triples: 3.0 billion
Slide 13 
The Mannheim Search Joins Engine (MSJE) 
[Architecture diagram with three stages:
1. Table Indexing: collection of tables, table normalization, table storage, table index
2. Table Search: input query table, table preprocessing, Search, top-k candidates
3. Data Consolidation: MultiJoin, data collection, Consolidation, user preferences]
Slide 14 
The Search Operator 
The Search operator determines the set of relevant Web tables.
- Table Ranking
  - subject column value overlap
  - extended Jaccard Similarity (FastJoin)
- Select TopK Tables
  - 1000 tables in the single column experiments
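
The ranking step can be approximated with a short sketch. Note that it substitutes plain, exact-match Jaccard similarity over normalized subject column values for the extended (fuzzy) Jaccard similarity of FastJoin named on the slide. The function names and table identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so that 'Alsace ' matches 'alsace'."""
    return " ".join(value.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Plain Jaccard similarity: |a ∩ b| / |a ∪ b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def search(query_subjects: List[str],
           corpus: Dict[str, List[str]],
           k: int = 1000) -> List[Tuple[str, float]]:
    """Rank tables by subject column value overlap and return the top k."""
    query = {normalize(v) for v in query_subjects}
    scored = [
        (table_id, jaccard(query, {normalize(v) for v in subjects}))
        for table_id, subjects in corpus.items()
    ]
    relevant = [(t, s) for t, s in scored if s > 0.0]
    relevant.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first
    return relevant[:k]


query_subjects = ["Alsace", "Lorraine", "Guadeloupe", "Centre", "Martinique"]
corpus = {  # table id -> subject column values (made-up corpus)
    "t1": ["alsace", "lorraine", "centre"],
    "t2": ["Alsace", "Bavaria"],
    "t3": ["usa", "germany"],
}
ranked = search(query_subjects, corpus, k=1000)
print(ranked)
```

A production system would use an index rather than scanning every table, and a fuzzy similarity would also match spelling variants of the same entity name.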
Slide 15 
Multi-Join Operator 
The MultiJoin operator performs a series of left-outer joins 
between the query table and all tables in the input set. 
No.  Region      Unemploy  Unemploy  GDP       GDP per C
1    Alsace      11 %      NULL      45.914 €  45.000 €
2    Lorraine    12 %      NULL      51.233 €  NULL
3    Guadeloupe  28 %      NULL      NULL      19.000 €
4    Centre      10 %      9.4 %     NULL      59.500 €
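
A series of left-outer joins on the subject column can be sketched as below. The table identifiers t1 to t3, the function names, and the choice to tag each added column with its source table are assumptions for illustration; the slide only shows the joined result.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def left_outer_join(left: List[Row], right: List[Row], key: str, tag: str) -> List[Row]:
    """Keep every left row; add the right table's columns, or None (NULL) if unmatched."""
    lookup = {row[key]: row for row in right}
    # Collect the right table's non-key columns in first-seen order.
    extra: List[str] = []
    for row in right:
        for col in row:
            if col != key and col not in extra:
                extra.append(col)
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        out = dict(row)
        for col in extra:
            out[f"{col} [{tag}]"] = match.get(col)  # tag keeps the source visible
        joined.append(out)
    return joined


def multi_join(query: List[Row], web_tables: Dict[str, List[Row]], key: str) -> List[Row]:
    """Join the query table with every table in the input set, one after another."""
    result = query
    for table_id, table in web_tables.items():
        result = left_outer_join(result, table, key, table_id)
    return result


query = [
    {"Region": "Alsace", "Unemploy": "11 %"},
    {"Region": "Lorraine", "Unemploy": "12 %"},
    {"Region": "Guadeloupe", "Unemploy": "28 %"},
    {"Region": "Centre", "Unemploy": "10 %"},
]
web_tables = {
    "t1": [{"Region": "Centre", "Unemploy": "9.4 %"}],
    "t2": [{"Region": "Alsace", "GDP": "45.914 €"},
           {"Region": "Lorraine", "GDP": "51.233 €"}],
    "t3": [{"Region": "Alsace", "GDP per C": "45.000 €"},
           {"Region": "Guadeloupe", "GDP per C": "19.000 €"},
           {"Region": "Centre", "GDP per C": "59.500 €"}],
}
joined = multi_join(query, web_tables, key="Region")
for row in joined:
    print(row)
```

The result keeps all four query rows and is deliberately wide and sparse; merging the overlapping columns is left to the consolidation step.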
Slide 16 
Consolidation Operator 
The consolidation operator merges corresponding columns
and fuses values in order to return a concise result table.
- Column Matching
  - Combination of label- and instance-based techniques
- Conflict Resolution
  - Strings: majority vote
  - Numeric values: average, median, clustering and vote

No  Region      Unemploy  GDP
1   Alsace      11 %      45.914 €
2   Lorraine    12 %      51.233 €
3   Guadeloupe  28 %      19.000 €
4   Centre      10 %      59.500 €
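
Two of the conflict resolution strategies listed above, majority vote for strings and median for numeric values, might look like this in a minimal sketch. It assumes that column matching has already grouped the candidate values per cell, and that NULLs are ignored when fusing. The function names and sample values are hypothetical, and the clustering-and-vote strategy is not covered.

```python
from collections import Counter
from statistics import median
from typing import List, Optional


def fuse_strings(values: List[Optional[str]]) -> Optional[str]:
    """Majority vote over the non-NULL values; ties go to the value seen first."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]


def fuse_numbers(values: List[Optional[float]]) -> Optional[float]:
    """Median of the non-NULL values, which is robust against single outliers."""
    present = [v for v in values if v is not None]
    return median(present) if present else None


# Candidate values for one cell, collected from several matched columns.
print(fuse_strings(["Yes", "Yes", "No", None]))
print(fuse_numbers([10.0, 12.0, 50.0, None]))
```

The median ignores the outlier 50.0 in the second call, whereas an average would be pulled towards it; this is one reason to offer several fusion strategies for numeric values.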
Slide 17 
http://searchjoins.webdatacommons.org
Slide 18 
Result: Extend with Single Column
Slide 19 
Provenance Summary
Slide 20 
Provenance Details
Slide 21 
Evaluation Results 
[Bar chart: coverage and precision per attribute, y-axis from 0% to 100%]
Entity types (in chart order): Book, Company, Country, Drug, Film, Song, Soccer Player

Attribute    Coverage  Precision
Author       93%       96%
Headquarter  94%       96%
Industry     94%       94%
Area         100%      95%
Capital      100%      100%
Code         100%      94%
Currency     94%       96%
Population   100%      64%
Ingredient   87%       89%
Cast         94%       85%
Director     97%       97%
Genre        97%       86%
Year         96%       97%
Artist       99%       95%
Team         88%       67%

Coverage: Percentage of entities for which a value was found.
Precision: Manually evaluated using Wikipedia, IMDB, Amazon.
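
The two measures can be written down as a small sketch. Coverage follows the definition on the slide. For precision, the sketch assumes the usual reading, the share of found values that a manual check judged correct, which the slide does not state explicitly. All names, values and judgments below are made up.

```python
from typing import Dict, Optional


def coverage(values: Dict[str, Optional[str]]) -> float:
    """Share of entities for which a value was found (non-NULL)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values.values()) / len(values)


def precision(values: Dict[str, Optional[str]], judged_correct: Dict[str, bool]) -> float:
    """Share of found values that were judged correct (assumed definition)."""
    found = [entity for entity, v in values.items() if v is not None]
    if not found:
        return 0.0
    return sum(judged_correct.get(entity, False) for entity in found) / len(found)


# Hypothetical extension column with manual judgments per entity.
values = {"A": "x", "B": "y", "C": None, "D": "z", "E": "w"}
judged_correct = {"A": True, "B": True, "D": False, "E": True}
print(coverage(values), precision(values, judged_correct))
```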
Slide 22 
Result: Extend with Many Columns 
505 columns are added 
and filled with data from 2071 tables.
Slide 23 
Provenance Summary
Slide 24 
Provenance Details for “area (sq. km)”
Slide 25
Conclusion
Search Joins bring together Web Search and DB Joins.
- The prototype shows that simple queries are feasible.
- The Web is one application domain for search joins, corporate intranets are the other.
- The overlooked Big Data Vs: Variety and Veracity

More Related Content

What's hot

2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortarOpen Analytics
 
Graph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science TrackGraph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science TrackChris Bizer
 
Building an open democracy with open data +
Building an open democracy with open data + Building an open democracy with open data +
Building an open democracy with open data + Irina Bolychevsky
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?Martin Hepp
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greaterCristina Sarasua
 
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...Data Beers
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseRDTF-Discovery
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataOntotext
 
Putting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open DataPutting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open DataMartin Kaltenböck
 
The methods and practices of Linked Open Data
The methods and practices of Linked Open DataThe methods and practices of Linked Open Data
The methods and practices of Linked Open DataDongpo Deng
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Herbert Van de Sompel
 
Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseGiorgio Orsi
 
20180226 data driven smart governance
20180226 data driven smart governance20180226 data driven smart governance
20180226 data driven smart governanceDongpo Deng
 
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open DataMuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data21Style
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataMatthew Rowe
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Michele Pasin
 

What's hot (20)

Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
Graph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science TrackGraph Structure in the Web - Revisited. WWW2014 Web Science Track
Graph Structure in the Web - Revisited. WWW2014 Web Science Track
 
Building an open democracy with open data +
Building an open democracy with open data + Building an open democracy with open data +
Building an open democracy with open data +
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
WCIT2010
WCIT2010WCIT2010
WCIT2010
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
Putting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open DataPutting the L in front: from Open Data to Linked Open Data
Putting the L in front: from Open Data to Linked Open Data
 
The methods and practices of Linked Open Data
The methods and practices of Linked Open DataThe methods and practices of Linked Open Data
The methods and practices of Linked Open Data
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)
 
Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
20180226 data driven smart governance
20180226 data driven smart governance20180226 data driven smart governance
20180226 data driven smart governance
 
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open DataMuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
 
Linking Open Data
Linking Open DataLinking Open Data
Linking Open Data
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
 

Viewers also liked

Infographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode LabelingInfographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode LabelingLoftware
 
The Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and TraceabilityThe Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and TraceabilityBlytheco
 
Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...Monika Solanki
 
Realising the Potential of Algal Biomass Production through Semantic Web an...
Realising the Potential of Algal Biomass Production   through Semantic Web an...Realising the Potential of Algal Biomass Production   through Semantic Web an...
Realising the Potential of Algal Biomass Production through Semantic Web an...Monika Solanki
 
Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012Monika Solanki
 
Querying Linked Data and Büchi automata
Querying Linked Data and Büchi automataQuerying Linked Data and Büchi automata
Querying Linked Data and Büchi automataKonstantinos Giannakis
 
From Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked dataFrom Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked dataMonika Solanki
 
The potential role of open data in supply chain integration
The potential role of open data in supply chain integrationThe potential role of open data in supply chain integration
The potential role of open data in supply chain integrationChristopher Brewster
 
The Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food NetworksThe Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food NetworksChristopher Brewster
 
Representing Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of DataRepresenting Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of DataMonika Solanki
 
The curious case of Blockchain Technology
The curious case of Blockchain TechnologyThe curious case of Blockchain Technology
The curious case of Blockchain TechnologyRitesh Mehrotra
 
Linked data driven EPCIS Event-based Traceability across Supply chain busine...
Linked data driven EPCIS Event-based Traceability across  Supply chain busine...Linked data driven EPCIS Event-based Traceability across  Supply chain busine...
Linked data driven EPCIS Event-based Traceability across Supply chain busine...Monika Solanki
 
Detecting EPCIS exceptions in linked traceability streams across supply cha...
Detecting   EPCIS exceptions in linked traceability streams across supply cha...Detecting   EPCIS exceptions in linked traceability streams across supply cha...
Detecting EPCIS exceptions in linked traceability streams across supply cha...Monika Solanki
 
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...Monika Solanki
 
Global Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, ShanghaiGlobal Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, ShanghaiDeborah Weinswig
 
Semantic web and Linked Data
Semantic web and Linked DataSemantic web and Linked Data
Semantic web and Linked DataHyun Namgoong
 
Linked data driven EPCIS Event based Traceability across Supply chain busine...
Linked data driven EPCIS Event based Traceability across  Supply chain busine...Linked data driven EPCIS Event based Traceability across  Supply chain busine...
Linked data driven EPCIS Event based Traceability across Supply chain busine...Monika Solanki
 
Linking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processesLinking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processesMonika Solanki
 
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build BrandsHow Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build BrandsCognizant
 
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...Monika Solanki
 

Viewers also liked (20)

Infographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode LabelingInfographic - Food and Beverage Barcode Labeling
Infographic - Food and Beverage Barcode Labeling
 
The Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and TraceabilityThe Role of Technology in Food Processing Compliance and Traceability
The Role of Technology in Food Processing Compliance and Traceability
 
Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...Open Knowledge Repositories: Enablers of Data Integration across Business Col...
Open Knowledge Repositories: Enablers of Data Integration across Business Col...
 
Realising the Potential of Algal Biomass Production through Semantic Web an...
Realising the Potential of Algal Biomass Production   through Semantic Web an...Realising the Potential of Algal Biomass Production   through Semantic Web an...
Realising the Potential of Algal Biomass Production through Semantic Web an...
 
Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012Building Ontologies for Algal Biomass Operations 2012
Building Ontologies for Algal Biomass Operations 2012
 
Querying Linked Data and Büchi automata
Querying Linked Data and Büchi automataQuerying Linked Data and Büchi automata
Querying Linked Data and Büchi automata
 
From Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked dataFrom Biomass to Energy via Semantic Web and Linked data
From Biomass to Energy via Semantic Web and Linked data
 
The potential role of open data in supply chain integration
The potential role of open data in supply chain integrationThe potential role of open data in supply chain integration
The potential role of open data in supply chain integration
 
The Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food NetworksThe Internet of Lettuces: Legibility, Data and Alternative Food Networks
The Internet of Lettuces: Legibility, Data and Alternative Food Networks
 
Representing Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of DataRepresenting Supply Chain Events on the Web of Data
Representing Supply Chain Events on the Web of Data
 
The curious case of Blockchain Technology
The curious case of Blockchain TechnologyThe curious case of Blockchain Technology
The curious case of Blockchain Technology
 
Linked data driven EPCIS Event-based Traceability across Supply chain busine...
Linked data driven EPCIS Event-based Traceability across  Supply chain busine...Linked data driven EPCIS Event-based Traceability across  Supply chain busine...
Linked data driven EPCIS Event-based Traceability across Supply chain busine...
 
Detecting EPCIS exceptions in linked traceability streams across supply cha...
Detecting   EPCIS exceptions in linked traceability streams across supply cha...Detecting   EPCIS exceptions in linked traceability streams across supply cha...
Detecting EPCIS exceptions in linked traceability streams across supply cha...
 
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
Consuming Linked data in Supply Chains: Enabling data visibility via Linked P...
 
Global Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, ShanghaiGlobal Supply Chain Innovation Summit, Shanghai
Global Supply Chain Innovation Summit, Shanghai
 
Semantic web and Linked Data
Semantic web and Linked DataSemantic web and Linked Data
Semantic web and Linked Data
 
Linked data driven EPCIS Event based Traceability across Supply chain busine...
Linked data driven EPCIS Event based Traceability across  Supply chain busine...Linked data driven EPCIS Event based Traceability across  Supply chain busine...
Linked data driven EPCIS Event based Traceability across Supply chain busine...
 
Linking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processesLinking transformations in EPCIS governing supply chain business processes
Linking transformations in EPCIS governing supply chain business processes
 
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build BrandsHow Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
How Blockchain Can Help Retailers Fight Fraud, Boost Margins and Build Brands
 
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
EPCIS Event-Based Traceability in Pharmaceutical Supply Chains via Automated ...
 

Similar to Extending Tables with Data from over a Million Websites

Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Chris Bizer
 
Unevenly Distributed
Unevenly DistributedUnevenly Distributed
Unevenly DistributedC4Media
 
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe RocherINTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocherapidays
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013MLconf
 
Mining a Large Web Corpus
Mining a Large Web CorpusMining a Large Web Corpus
Mining a Large Web CorpusRobert Meusel
 
Text Mining with Rapid Miner.
Text Mining with Rapid Miner.Text Mining with Rapid Miner.
Text Mining with Rapid Miner.Gurdal Ertek
 
Text Mining with RapidMiner
Text Mining with RapidMinerText Mining with RapidMiner
Text Mining with RapidMinerertekg
 
MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)Peter Presnell
 
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...doenz
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualizationbigdataviz_bay
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docxendawalling
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docxwkyra78
 
Daniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days OcDaniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days OcDaniel Egan
 
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEARASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEARDon Dooley
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?Örjan Lundberg
 

Similar to Extending Tables with Data from over a Million Websites (20)

Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)
 
Unevenly Distributed
Unevenly DistributedUnevenly Distributed
Unevenly Distributed
 
Asp Aug2008
Asp Aug2008Asp Aug2008
Asp Aug2008
 
Diadem 1.0
Diadem 1.0Diadem 1.0
Diadem 1.0
 
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe RocherINTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
INTERFACE by apidays 2023 - API Green Score, Yannick Tremblais, Groupe Rocher
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013
 
Mining a Large Web Corpus
Mining a Large Web CorpusMining a Large Web Corpus
Mining a Large Web Corpus
 
Text Mining with Rapid Miner.
Text Mining with Rapid Miner.Text Mining with Rapid Miner.
Text Mining with Rapid Miner.
 
Text Mining with RapidMiner
Text Mining with RapidMinerText Mining with RapidMiner
Text Mining with RapidMiner
 
MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)MWLUG 2014: Modern Domino (workshop)
MWLUG 2014: Modern Domino (workshop)
 
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Media...
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualization
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docx
 
Math 141 Exam Final Exam Name____________________.docx
Math 141 Exam Final Exam             Name____________________.docxMath 141 Exam Final Exam             Name____________________.docx
Math 141 Exam Final Exam Name____________________.docx
 
Daniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days OcDaniel Egan Msdn Tech Days Oc
Daniel Egan Msdn Tech Days Oc
 
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEARASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
ASSIGNMENT BOOKLET 2018 DIPLOMA IN IT 3 YEARS 1ST YEAR
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?
 
GraphQL Basics
GraphQL BasicsGraphQL Basics
GraphQL Basics
 

Extending Tables with Data from over a Million Websites

  • 1. Slide 1 International Semantic Web Conference Riva del Garda, Italy, 22.10.2014 Semantic Web Challenge – Big Data Track Extending Tables with Data from over a Million Websites Oliver Lehmberg, Dominique Ritze, Petar Ristoski, Kai Eckert, Heiko Paulheim, Christian Bizer
  • 2. Slide 2 Goal: Extend a local table with additional columns using different types of Web data. Local table columns (Region | Unemployment) plus added columns (GDP per Capita | Population Growth): Alsace | 11 % | 45.914 € | 0,16 %; Lorraine | 12 % | 51.233 € | -0,05 %; Guadeloupe | 28 % | 19.810 € | 1,34 %; Centre | 10 % | 59.502 € | 1,76 %; Martinique | 25 % | NULL | 2,64 %
  • 3. Slide 3 Operation 1: Extend Local Table with Single Column. Given a local table and keywords describing the extension column, add the extension column to the table and fill it with data from the Web. Keywords: „GDP per Capita“. Local table columns (Region | Unemployment) plus extension column (GDP per Capita): Alsace | 11 % | 45.914 €; Lorraine | 12 % | 51.233 €; Guadeloupe | 28 % | 19.810 €; Centre | 10 % | 59.502 €; Martinique | 25 % | 21,527 €; … | … | …
  • 4. Slide 4 Operation 2: Extend Local Table with Many Columns. Given a local table, add all columns to the table that can be filled beyond a density threshold (density >= 0.8). Local table columns (Region | Unemp. Rate) plus added columns (GDP per Capita | Population Growth | Overseas departments | …): Alsace | 11 % | 45.914 € | 0,16 % | No | …; Lorraine | 12 % | 51.233 € | -0,05 % | No | …; Guadeloupe | 28 % | 19.810 € | 1,34 % | Yes | …; Centre | 10 % | 59.502 € | NULL | NULL | …; Martinique | 25 % | NULL | 2,64 % | Yes | …
  • 5. Slide 5 Types of Web Data Used Microdata (schema.org) Wiki Tables HTML Tables Linked Data
  • 6. Slide 6 Billion Triple Challenge Dataset 2014 4 billion triples crawled from 47,000 websites.
  • 7. Slide 7 Web Data Commons - Microdata Corpus 250 million triples from 463,000 websites. Extracted from Common Crawl 2013 web corpus; 2.2 billion HTML pages from 12.8 million websites; mostly using the schema.org vocabulary. Main topics: Products, Reviews, Organisations / LocalBusiness, Events. Download: http://webdatacommons.org/structureddata/
  • 8. Slide 8 Web Data Commons – Web Tables Corpus Around 1% of all HTML tables contain structured data. We used 35 million English HTML tables; extracted from the Common Crawl 2012 web corpus; selected out of 11.2 billion raw tables.
  • 9. Slide 9 Web Data Commons – Web Tables Corpus. Column Statistics (Column: #Tables): name 4,600,000; price 3,700,000; date 2,700,000; artist 2,100,000; location 1,200,000; year 1,000,000; manufacturer 375,000; country 340,000; isbn 99,000; area 95,000; population 86,000. Subject Column Values (Value: #Rows): usa 135,000; germany 91,000; greece 42,000; new york 59,000; london 37,000; athens 11,000; david beckham 3,000; ronaldinho 1,200; oliver kahn 710; twist shout 2,000; yellow submarine 1,400. Download: http://webdatacommons.org/webtables/
  • 10. Slide 10 WikiTables 1.4 million tables from English Wikipedia; extracted by Northwestern University; from the 2013 Wikipedia XML dump; only tables, no infoboxes. Download: http://downey-n1.cs.northwestern.edu/public/
  • 11. Slide 11 Internal Data Model: Entity-Attributes-Tables. One entity per row. Subject Column = Name of the entity. HTML tables: most unique string column, break ties by taking leftmost. Example table (Rank | Film | Studio | Director | Length): 1. | Star Wars – Episode 1 | Lucasfilm | George Lucas | 121 min; 2. | Alien | Brandywine | Ridley Scott | 117 min; 3. | Black Moon | NEF | Louis Malle | 100 min. Table generation from Linked Data and Microdata: generate one table per class and website; subject column: rdfs:label, foaf:name, x:name; we exploit common vocabularies.
  • 12. Slide 12 Indexed Tables. Selection Conditions: 1. Minimum size of 3 columns and 5 rows 2. Subject column detection successful. Total # of tables: 36.3 million. Total # of PLDs: ~ 1.5 million. Total # of triples: 3.0 billion.
  • 13. Slide 13 The Mannheim Search Joins Engine (MSJE) Collection of tables Table Normalization Table Storage Table Index 1. Table Indexing Input query table Table Preprocessing Search 2. Table Search 3. Data Consolidation Data collection User Preferences Consolidation MultiJoin Top k Candidates
  • 14. Slide 14 The Search Operator. The Search operator determines the set of relevant Web tables. Table Ranking: subject column value overlap; extended Jaccard Similarity (FastJoin). Select TopK Tables: 1000 tables in the single column experiments.
  • 15. Slide 15 Multi-Join Operator. The MultiJoin operator performs a series of left-outer joins between the query table and all tables in the input set. Joined table (No. | Region | Unemploy | Unemploy | GDP | GDP per C): 1 | Alsace | 11 % | NULL | 45.914 € | 45.000 €; 2 | Lorraine | 12 % | NULL | 51.233 € | NULL; 3 | Guadeloupe | 28 % | NULL | NULL | 19.000 €; 4 | Centre | 10 % | 9.4 % | NULL | 59.500 €
  • 16. Slide 16 Consolidation Operator. The consolidation operator merges corresponding columns and fuses values in order to return a concise result table. Column Matching: combination of label- and instance-based techniques. Conflict Resolution: strings: majority vote; numeric values: average, median, clustering and vote. Result table (No | Region | Unemploy | GDP): 1 | Alsace | 11 % | 45.914 €; 2 | Lorraine | 12 % | 51.233 €; 3 | Guadeloupe | 28 % | 19.000 €; 4 | Centre | 10 % | 59.500 €
  • 18. Slide 18 Result: Extend with Single Column
  • 21. Slide 21 Evaluation Results (bar chart). Attribute labels: Author, Headquarter, Industry, Area, Capital, Code, Currency, Population, Ingredient, Cast, Director, Genre, Year, Artist, Team. Group labels: Book, Company, Country, Drug, Film, Song, Soccer Player. Coverage values: 93% 94% 94% 100% 100% 100% 94% 100% 87% 94% 97% 97% 96% 99% 88%. Precision values: 96% 96% 94% 95% 100% 94% 96% 64% 89% 85% 97% 86% 97% 95% 67%. Coverage: Percentage of entities for which a value was found. Precision: Manually evaluated using Wikipedia, IMDB, Amazon.
  • 22. Slide 22 Result: Extend with Many Columns 505 columns are added and filled with data from 2071 tables.
  • 24. Slide 24 Provenance Details for “area (sq. km)”
  • 25. Slide 25 Conclusion. Search Joins bring together Web Search and DB Joins. The prototype shows that simple queries are feasible. The Web is one application domain for search joins; corporate intranets are the other. The overlooked Big Data Vs: Variety and Veracity.
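
The Search, MultiJoin and Consolidation operators from slides 14 to 16, together with the density threshold from slide 4, can be illustrated with a minimal Python sketch. This is not the MSJE implementation. It makes several simplifying assumptions: plain Jaccard similarity stands in for the extended Jaccard similarity (FastJoin), column matching is assumed to have happened already (matching columns share one label), numeric conflicts are resolved by the median only, and the table layout, function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import median

def jaccard(a, b):
    """Plain Jaccard similarity between two collections of subject values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def search(query_keys, tables, k):
    """Rank tables by subject-column overlap with the query; keep the top k."""
    scored = [(jaccard(query_keys, t["rows"]), t) for t in tables]
    scored = [(s, t) for s, t in scored if s > 0]       # drop non-overlapping tables
    scored.sort(key=lambda st: st[0], reverse=True)
    return [t for _, t in scored[:k]]

def multi_join(query_keys, tables):
    """Left-outer join: every query key is kept, candidate values are collected."""
    joined = {key: {} for key in query_keys}
    for t in tables:
        for key in query_keys:
            for col, val in t["rows"].get(key, {}).items():
                joined[key].setdefault(col, []).append(val)
    return joined

def fuse(values):
    """Numeric candidates: median. Strings: majority vote. No candidates: None."""
    if not values:
        return None
    if all(isinstance(v, (int, float)) for v in values):
        return median(values)
    return Counter(values).most_common(1)[0][0]

def consolidate(joined, min_density):
    """Fuse candidate values per column; keep columns that are dense enough."""
    columns = sorted({c for row in joined.values() for c in row})
    result = {key: {} for key in joined}
    for col in columns:
        fused = {key: fuse(row.get(col, [])) for key, row in joined.items()}
        density = sum(v is not None for v in fused.values()) / len(fused)
        if density >= min_density:
            for key, v in fused.items():
                result[key][col] = v
    return result

# Hypothetical web tables, keyed by subject-column value.
web_tables = [
    {"rows": {"Alsace": {"gdp": 45914}, "Lorraine": {"gdp": 51233}}},
    {"rows": {"Alsace": {"gdp": 45000}, "Guadeloupe": {"gdp": 19000},
              "Centre": {"gdp": 59500}}},
    {"rows": {"Alsace": {"overseas": "No"}, "Guadeloupe": {"overseas": "Yes"}}},
    {"rows": {"Berlin": {"gdp": 40000}}},               # no overlap with the query
]
query = ["Alsace", "Lorraine", "Guadeloupe", "Centre"]

top = search(query, web_tables, k=3)
result = consolidate(multi_join(query, top), min_density=0.8)
print(result)
```

In this toy run the `gdp` column is filled for all four regions and is kept, while `overseas` reaches a density of only 0.5 and is dropped by the 0.8 threshold. Taking the median is one of several numeric strategies the slides mention; it is used here only because it needs no extra parameters.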