Inferring Web Citations using Social Data and SPARQL Rules

Matthew Rowe
Organisations, Information and Knowledge Group
University of Sheffield

Outline
- Problem Setting
- Personal Information Dissemination
- SPARQL Rules: Identifying Web Citations
  - Generating Seed Data
  - Gathering Possible Web Citations
  - Inferring Web Citations
- Evaluation
- Conclusions
- Future Work

Personal Information on the Web
- Personal information on the Web is disseminated:
  - Voluntarily
  - Involuntarily
- The increase in personal information exposes users to:
  - Identity theft
  - Lateral surveillance
- Web users must discover their identity web references, a 2-stage process:
  - Find possible references
  - Identify definite references

Problem Setting
- Performing identification manually:
  - Time consuming
  - Laborious
    - Handle masses of information
  - Repeated often
    - The Web keeps changing
- Solution = automated techniques
  - Alleviate the need for humans
  - Need background knowledge
    - Who am I searching for?
    - What makes them unique?

SPARQL Rules: Identifying Web Citations

Generating Seed Data
- Profiles on the Social Web are leveraged as seed data
- To generate seed data:
  1. Export social graphs
     - Interface with the platform’s API
     - Convert the proprietary response into RDF
       - Biographical information
       - Social network information
  2. Enrich graphs with URIs
  3. Interlink graphs
     - Detect equivalent foaf:Person instances
       - Blocking step
       - Compare values of Inverse Functional Properties
       - Compare Geo URIs
       - Compare Geo data
     - Builds a single social graph
- http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html
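The interlinking step can be pictured with a minimal sketch. It assumes each exported profile is a plain dictionary rather than an RDF graph, treats `mbox_sha1sum` and `homepage` as the inverse functional properties, blocks on a normalised name, and accepts a shared geo URI as sufficient evidence within a block. The record layout, the blocking key and the function names are all illustrative assumptions, not the tool's actual code.

```python
from collections import defaultdict
from itertools import combinations

# Assumed inverse functional properties (both are declared as such in FOAF).
IFPS = ("mbox_sha1sum", "homepage")


def blocking_key(person):
    """Hypothetical blocking key: the lower-cased name with whitespace collapsed."""
    return " ".join(person.get("name", "").lower().split())


def same_person(a, b):
    """Two records match if they share an IFP value or a geo URI (assumption)."""
    for prop in IFPS:
        if a.get(prop) and a.get(prop) == b.get(prop):
            return True
    return bool(a.get("geo_uri")) and a.get("geo_uri") == b.get("geo_uri")


def find_equivalent(people):
    """Return pairs of record ids judged to describe the same person."""
    blocks = defaultdict(list)
    for person in people:
        blocks[blocking_key(person)].append(person)
    pairs = []
    for block in blocks.values():
        # Only records that fall in the same block are compared.
        for a, b in combinations(block, 2):
            if same_person(a, b):
                pairs.append((a["id"], b["id"]))
    return pairs
```

Blocking keeps the number of pairwise comparisons small: records with different names never reach the property comparison, even if they happen to share a homepage value.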

Gathering Possible Web Citations
- Search the WWW and the Semantic Web for possible citations
- Web resources come in many flavours:
  - Data models, HTML documents, XHTML documents
  - Convert into RDF
- XHTML documents:
  - Use GRDDL
  - Automated RDF model lifting
- HTML documents:
  - Apply a person name gazetteer to identify person information
  - Apply a Hidden Markov Model to extract information
  - Build an RDF model from the information
- M Rowe. Data.dcs: Converting Legacy Data into Linked Data. In proceedings of Linked Data on the Web Workshop, WWW 2010. Raleigh, USA. (2010)
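As a rough illustration of the gazetteer step for HTML documents, the sketch below looks for runs of capitalised tokens whose first token appears in a list of known first names. The tiny name list, the regular expression and the function name are assumptions made for this example; the Hidden Markov Model extraction that follows in the pipeline is not shown.

```python
import re

# Hypothetical gazetteer of first names; a real one would be far larger.
FIRST_NAMES = {"matthew", "fabio", "alice"}

# One capitalised token followed by one or more further capitalised tokens.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)((?:\s+[A-Z][a-z]+)+)")


def find_person_names(text):
    """Return capitalised token runs whose first token is in the gazetteer."""
    names = []
    for match in CANDIDATE.finditer(text):
        if match.group(1).lower() in FIRST_NAMES:
            # Collapse internal whitespace so names compare cleanly.
            names.append(" ".join(match.group(0).split()))
    return names
```

A lookup like this only flags where person information is likely to be; deciding which surrounding text belongs to that person is left to the extraction step.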

Inferring Web Citations using SPARQL Rules
- Seed data = solitary example to build rules
- State of the art rule induction strategies are limited
  - E.g. FOIL and C4.5
- Build rules from RDF instances!
  1. Extract instances from the seed data
  2. For each instance, build a rule:
     - Build a skeleton rule
     - Add triples to the rule
     - Create a new rule if a triple’s predicate is Inverse Functional
  3. Apply the rules to the web resources
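One way to read step 2 is sketched below: every rule starts from a skeleton that matches the seed person's name, ordinary predicates of a known person are added to one shared rule, and each inverse functional predicate gets a rule of its own. The set of inverse functional predicates, the variable naming and the function names are assumptions for illustration, not the author's implementation.

```python
PREFIX = "PREFIX foaf:<http://xmlns.com/foaf/0.1/>"

# Assumed set of inverse functional predicates.
INVERSE_FUNCTIONAL = {"foaf:homepage", "foaf:mbox", "foaf:mbox_sha1sum"}


def skeleton(me):
    """Patterns shared by every rule: the resource has a topic with my name."""
    return [
        f"<{me}> foaf:name ?n",
        "?url foaf:topic ?p",
        "?p foaf:name ?n",
    ]


def to_query(me, patterns):
    """Wrap a list of triple patterns in a CONSTRUCT rule."""
    body = " .\n  ".join(patterns)
    return (
        f"{PREFIX}\n"
        f"CONSTRUCT {{ <{me}> foaf:page ?url }}\n"
        f"WHERE {{\n  {body}\n}}"
    )


def build_rules(me, instance_predicates):
    """Build rules for one seed instance (a person that `me` knows).

    `instance_predicates` lists the predicates present on that instance.
    """
    shared = skeleton(me) + [f"<{me}> foaf:knows ?q", "?url foaf:topic ?r"]
    general = list(shared)
    rules = []
    for i, predicate in enumerate(instance_predicates):
        # The same value must appear in the seed data (?q) and on the page (?r).
        pair = [f"?q {predicate} ?v{i}", f"?r {predicate} ?v{i}"]
        if predicate in INVERSE_FUNCTIONAL:
            rules.append(to_query(me, shared + pair))  # a rule of its own
        else:
            general.extend(pair)
    if len(general) > len(shared):
        rules.insert(0, to_query(me, general))
    return rules
```

Calling `build_rules("http://example.org/me", ["foaf:name", "foaf:homepage"])` yields two rules under these assumptions: one that also requires a known person's name on the page, and a separate one keyed on the known person's homepage.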

Example: skeleton rule (matches on the person's name only)

  PREFIX foaf:<http://xmlns.com/foaf/0.1/>
  CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
  WHERE {
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
    ?url foaf:topic ?p .
    ?p foaf:name ?n
  }

Example: rule after adding triples (the resource also mentions, by name, a person known to the seed person)

  PREFIX foaf:<http://xmlns.com/foaf/0.1/>
  CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
  WHERE {
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
    ?url foaf:topic ?p .
    ?p foaf:name ?n .
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
    ?q foaf:name ?m .
    ?url foaf:topic ?r .
    ?r foaf:name ?m
  }

Example: rule created for an Inverse Functional predicate (foaf:homepage of a known person)

  PREFIX foaf:<http://xmlns.com/foaf/0.1/>
  CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
  WHERE {
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
    ?url foaf:topic ?p .
    ?p foaf:name ?n .
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
    ?q foaf:homepage ?h .
    ?url foaf:topic ?r .
    ?r foaf:homepage ?h
  }
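Applying a rule (step 3) amounts to matching its WHERE patterns against the seed data and a web resource's triples together, then emitting a foaf:page triple for every match. The sketch below does this with a naive pattern matcher over plain tuples; it stands in for a real SPARQL engine, and the tuple representation and function names are assumptions for illustration.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched; continue with the remaining patterns.
            yield from match(rest, triples, new)


def apply_rule(me, where, triples):
    """Return the inferred (me, foaf:page, url) triples, as CONSTRUCT would."""
    return {(me, "foaf:page", b["?url"]) for b in match(where, triples)}
```

Because every pattern must match exactly, a page that spells a name differently or omits a homepage produces no binding at all, which is the strictness the results attribute the low recall to.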

Evaluation
- Measures: Precision, Recall, F-Measure
- Dataset
  - 50 participants from the Semantic Web and Web 2.0 communities
  - Seed data collected from Facebook and Twitter
  - ~17300 web resources: 346 web resources for each participant
- Baselines
  - Baseline 1: person name as positive classification (skeleton SPARQL rule)
  - Baseline 2: human processing
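For reference, the three measures can be computed per participant as in the sketch below. It assumes the balanced F-measure (F1) and represents the inferred and the true citations as sets of URLs; the function name and the set representation are choices made for this example.

```python
def precision_recall_f1(predicted, actual):
    """Compare predicted citations against the true ones for one participant."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A strict rule set that returns two pages, both correct, out of four true citations scores a precision of 1.0 but a recall of only 0.5, the pattern of high precision with low recall.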

Results
- High precision
  - Better than humans
  - Triple patterns
- Low recall
  - Rules are strict
  - No room for variability
  - Hard to generalise
  - No learning from disambiguation decisions

Conclusions
- SPARQL Rules are precise
  - Poor generalisation, however
- Outperform humans at low web presence levels
  - “Needle in a haystack” problem
- User profiles provide seed data
  - Inexpensively
  - Capturing:
    - Biographical information
    - Social networking information
- Inability to learn from identifications
  - Plan for future work
- Overcome poor seed data feature coverage

Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: m.rowe@dcs.shef.ac.uk

Questions?

For more information: M Rowe and F Ciravegna. Disambiguating Identity Web References using Web 2.0 Data and Semantics. In Press for special issue on "Web 2.0" in the Journal of Web Semantics. (2010)
