State of Play presentation at the LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web by Amar-Djalil MEZAOUR, Dassault Systèmes Exalead.
LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web
1. Creating Knowledge out of Interlinked Data
WP8: Linked Enterprise Data
“Active Hiring” use case demonstrator
Amar-Djalil MEZAOUR, PhD
Dassault Systèmes Exalead, France
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
2. Creating Knowledge out of Interlinked Data
Luxembourg 1st Review Meeting Report Analysis
• Criticism:
• Use case specifications are not convincing
• RDF use and benefits in search are not clear.
• Concerns on the relevance of the use case application
• Unclear identification of business actors and incentives to use the
application
• Doubts on the volume of HR data that could be used
• Suggestions:
– Innovantage: http://www.innovantage.co.uk
LOD2 Event . 06.09.2010 . Page 2
2 http://lod2.eu
3. Creating Knowledge out of Interlinked Data
Response and action plan (1/5)
• Use case specifications are not convincing
• Set aside (for the moment) the application on resumes:
– Too little data is available to showcase a convincing application
– Few RDF standards exist for managing resumes
– Project resources are better spent on a first release of a convincing application before
the 2nd review
• Refocus the use case on job vacancies:
– More data is available (crawl of job boards or recruitment sections of company web sites)
– Identified market opportunities: recruitment reports, dashboards and analysis for
businesses and administrations (Innovantage is a potential competitor)
– RDF support for representing job posts (JobPosting by http://schema.org)
4. Creating Knowledge out of Interlinked Data
Response and action plan (2/5)
• RDF use and benefits in search are not clear
• Linked data, in RDF format, will help increase the accuracy of the
application because it supports entity recognition:
identification and disambiguation of locations, hiring
organizations, industry domains, job titles …
• Search benefit: key data entities are indexed efficiently, so retrieval and
query suggestions are more relevant.
• Consolidation and analytics will bring clear added value: aggregated
numeric indicators will reflect the actual state of the indexed data
5. Creating Knowledge out of Interlinked Data
Response and action plan (3/5)
• Concerns on the relevance of the use case application
• The existence of two potential competitors in the UK (Innovantage &
Myresourcer) is a good indicator that the market needs
intelligence surveys on job vacancies and skills.
6. Creating Knowledge out of Interlinked Data
Response and action plan (4/5)
• Unclear identification of business actors and incentives to use the
application
• HR departments of businesses and administrations are the target end users.
• Like Innovantage and Myresourcer, the WP8 use case application
will provide tools for producing a job market analysis by domain and by
vacancies/skills. Linked data will be the lever for enriching
the end-user experience with additional information.
• The dashboard of market intelligence widgets is one of the incentives
we are counting on to make businesses use and benefit from the WP8 use case
application.
7. Creating Knowledge out of Interlinked Data
Response and action plan (5/5)
• Doubts on the volume of data that could be used
• Job vacancies are easy to get from the web. Crawling the major
job boards and company web sites is enough to provide a large amount of
job postings.
8. Creating Knowledge out of Interlinked Data
Next target
• The 2nd review in September
[Timeline figure: project months M24, M27 and M48, with two milestones]
• First release of use case application
• Specification of the semantic features in use case dataflow
9. Creating Knowledge out of Interlinked Data
Active Hiring use case specifications
• Job vacancies market dashboard and analytics using linked data
• Monitor hiring market on the web
• Provide insights on job leads
• Compute comprehensive dashboard of analytics on job vacancies trends
• Link and reuse linked datasets in HR enterprise application
• Enrich search experience by mashing up content from different sources
(social networks for example)
10. Creating Knowledge out of Interlinked Data
Architecture and data workflow
[Architecture diagram; component labels only]
• Crawl and scraping of HTML pages
• Import of HR vocabularies
• RDFisation pipeline (D2R Server + NLP)
• RDF store
• CloudView: indexing, analytics, search
11. Creating Knowledge out of Interlinked Data
Requirements of the initial release of enterprise demo
• R1: Geolocation
• R2: Entities identification and linking
• R3: Mapping with HR resources and vocabularies
• R4: Duplicate (repost) detection
• R5: Mapping with legal regulations
• R6: Hiring support
• R7: Analytics
• R8: Mashup
12. Creating Knowledge out of Interlinked Data
R1: Geolocation
• R1.1: Disambiguate job vacancy locations
• Paris, IDF, France vs. Paris, Virginia, United States
• R1.2: Enrich job descriptions with geo coordinates in the RDF store
• For Paris, IDF, France: N 48° 51' 12'' E 2° 20' 55''
• R1.3: Link with a reference resource (GeoNames, DBpedia,
Freebase, …)
• GeoNameId: 2988507
• DBpedia: ??
• Freebase: ??
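R1.1 to R1.3 can be illustrated with a small sketch. It is not the project's implementation: the gazetteer is a hypothetical in-memory stand-in for a reference source such as GeoNames, and the names `GAZETTEER`, `dms_to_decimal` and `resolve_location` are assumptions. Only the Paris, France entry uses values from the slide; the second entry deliberately carries no identifier or coordinates.

```python
from typing import Optional

# Hypothetical gazetteer keyed by (city, country). A real system would query
# a reference source such as GeoNames instead of a hard-coded dictionary.
GAZETTEER = {
    # Values taken from the slide: GeoNameId 2988507, N 48° 51' 12'' E 2° 20' 55''
    ("paris", "france"): {
        "geoname_id": 2988507,
        "lat_dms": (48, 51, 12),
        "lon_dms": (2, 20, 55),
    },
    # No identifier or coordinates are given in the slide, so none are invented.
    ("paris", "united states"): {
        "geoname_id": None,
        "lat_dms": None,
        "lon_dms": None,
    },
}


def dms_to_decimal(degrees: int, minutes: int, seconds: int) -> float:
    """Convert degrees/minutes/seconds (north or east) to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


def resolve_location(raw: str) -> Optional[dict]:
    """Disambiguate 'City, ..., Country' using the country as context (R1.1)."""
    parts = [p.strip().lower().replace("-", " ") for p in raw.split(",")]
    if len(parts) < 2:
        return None  # a bare city name is ambiguous without context
    city, country = parts[0], parts[-1]
    entry = GAZETTEER.get((city, country))
    if entry is None:
        return None
    # R1.3: keep the link to the reference resource
    result = {"city": city, "country": country, "geoname_id": entry["geoname_id"]}
    # R1.2: enrich with decimal coordinates when they are known
    if entry["lat_dms"] and entry["lon_dms"]:
        result["lat"] = round(dms_to_decimal(*entry["lat_dms"]), 4)
        result["lon"] = round(dms_to_decimal(*entry["lon_dms"]), 4)
    return result
```

Using the last comma-separated field as the country is a simplification; real postings would need more robust parsing of the location string.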
13. Creating Knowledge out of Interlinked Data
R2: Entities identification and linking
• R2.1: Extraction of salaries from job postings when available
• R2.2: Linking hiring organizations with a reference source, OpenCorporates
for example
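A minimal sketch of R2.1, assuming salaries appear as a currency symbol followed by an amount, optionally with thousands separators or a "k" suffix. The pattern and the name `extract_salaries` are illustrative assumptions, not the extraction rules used in the demonstrator; R2.2 (linking to a reference source) is not covered here.

```python
import re

# Assumed surface forms: "£30k", "£35,000", "$ 45000".
# Other formats (ranges in words, hourly rates, currency codes) are not handled.
SALARY_PATTERN = re.compile(r"([£$€])\s?(\d+(?:,\d{3})*)(k\b)?", re.IGNORECASE)


def extract_salaries(text: str) -> list:
    """Return (currency_symbol, amount) pairs found in a job posting text."""
    found = []
    for match in SALARY_PATTERN.finditer(text):
        symbol, digits, kilo = match.groups()
        amount = int(digits.replace(",", ""))
        if kilo:
            amount *= 1000  # "30k" means 30,000
        found.append((symbol, amount))
    return found
```

For example, `extract_salaries("Salary: £30k - £35,000 per annum")` yields two pairs, one per amount, which a later step could turn into a salary range.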
14. Creating Knowledge out of Interlinked Data
R3: Mapping with HR resources and vocabularies
• R3.1: Identification and import of occupation taxonomies
• R3.2: Identification and import of industry domain taxonomies
• R3.3: RDFisation of taxonomies
• R3.4: Mapping of job titles to occupation taxonomy labels
• R3.5: Classification of job vacancies by domain
• R3.6: Aggregate and match skills (skills in the job description with
the skills required by the taxonomy)
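R3.4 can be sketched with approximate string matching from the Python standard library. This is one possible technique, not the one specified for the demonstrator; the label list is a small illustrative sample rather than an imported taxonomy, and `map_title` and its `cutoff` default are assumptions.

```python
from difflib import get_close_matches
from typing import Optional

# Illustrative occupation labels; a real run would load them from the
# imported and RDFised taxonomy (R3.1, R3.3).
OCCUPATION_LABELS = [
    "Software Developers",
    "Registered Nurses",
    "Accountants and Auditors",
]


def map_title(title: str, labels: list, cutoff: float = 0.6) -> Optional[str]:
    """Map a free-text job title to the closest taxonomy label, if any."""
    by_lowercase = {label.lower(): label for label in labels}
    matches = get_close_matches(title.lower(), list(by_lowercase), n=1, cutoff=cutoff)
    return by_lowercase[matches[0]] if matches else None
```

Character-level similarity handles plurals and small spelling variants; titles that use different words for the same occupation would need synonyms or alternate titles from the taxonomy.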
15. Creating Knowledge out of Interlinked Data
R4: Duplicate detection
• R4.1: Detect job reposts and duplicates from different sources.
• R4.2: Merge the same job vacancy posted in different sources
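The slide does not name a detection method. One common option for R4.1 is Jaccard similarity over word shingles, sketched below; the shingle size, the threshold and the function names are assumptions chosen for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_repost(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Flag two postings as duplicates when their shingle sets overlap enough."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

Pairs flagged by `is_repost` would then be candidates for the merge step (R4.2). Comparing every pair does not scale to large crawls, so a production version would need a blocking or hashing step first.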
16. Creating Knowledge out of Interlinked Data
R5: Mapping with legal regulations
• R5.1: Map job vacancies with laws and regulations
• ??
17. Creating Knowledge out of Interlinked Data
R6: Hiring support
• R6.1: Integration with social networks
• ??
18. Creating Knowledge out of Interlinked Data
R7: Analytics
• R7.1: Provide analytics of job posts by region
• R7.2: Provide analytics of job posts by occupation
• R7.3: Provide analytics of job posts by industry domain
• R7.4: Provide analytics of job posts in a timeline
• ??
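The aggregations in R7.1 to R7.4 amount to counting postings per facet. A minimal sketch follows, assuming each posting has already been enriched with a region, an occupation, an industry domain and a posting date; the sample records and field names are invented for illustration.

```python
from collections import Counter
from datetime import date

# Invented sample records; field names are assumptions.
POSTINGS = [
    {"region": "Île-de-France", "occupation": "Software Developers",
     "industry": "IT", "posted": date(2012, 1, 15)},
    {"region": "Île-de-France", "occupation": "Registered Nurses",
     "industry": "Health", "posted": date(2012, 1, 28)},
    {"region": "Rhône-Alpes", "occupation": "Software Developers",
     "industry": "IT", "posted": date(2012, 2, 3)},
]


def count_by(postings: list, field: str) -> Counter:
    """Count postings per value of a facet (region, occupation, industry)."""
    return Counter(p[field] for p in postings)


def count_by_month(postings: list) -> Counter:
    """Count postings per calendar month, as a basis for a timeline (R7.4)."""
    return Counter(p["posted"].strftime("%Y-%m") for p in postings)
```

In the architecture described in this deck, such counts would more likely come from the search index or the RDF store than from in-memory lists; the sketch only shows the shape of the computation.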
19. Creating Knowledge out of Interlinked Data
R8: Mashup
• R8.1: Advanced search using identified criteria
• R8.2: Enhanced interface with analytics widgets
• R8.3: Geolocation of search results in a map service
• R8.4: Enhance displayed content with information provided by
external sources (organization info, country info, related
news, stock exchange info, …)
• ??
20. Creating Knowledge out of Interlinked Data
Roadmap
[Roadmap table, one column per requirement]
Columns: R1.1 R1.2 R1.3 R2.1 R2.2 R3.1 R3.2 R3.3 R3.4 R3.5 R3.6 R4.1 R4.2 R5.1 R6.1 R7.1 R7.2 R7.3 R7.4 R8.1 R8.2 R8.3 R8.4
Rows: priority, deadline, component, partner
Cell values: TO BE DISCUSSED DURING WP8 BREAKOUT SESSION
Priority legend: HIGH, MEDIUM, LOW
21. Creating Knowledge out of Interlinked Data
Preexisting assets for the first release: Cloud Platform
• Outscale platform for hosting the demo:
• Outscale is a Dassault Systèmes company providing cloud computing
solutions for businesses and ISVs (independent software vendors)
• TINA is Outscale's IaaS (Infrastructure as a Service) cloud computing
service and software.
• TINA is Amazon EC2 compatible
• An access account will be created for WP8 partners, restricted to their
organization network (network IP mask).
• SSH login to the public IP of the VM host. For the moment, it
is 46.231.151.11
22. Creating Knowledge out of Interlinked Data
Preexisting assets for the first release
23. Creating Knowledge out of Interlinked Data
Preexisting assets for the first release: HR dataset
• Dataset of HRXML v3.2 data crawled from the web:
• 7,035 CVs: 110 in English and the others in French.
• 42,186 job opening descriptions all in English. The format is
documented here:
http://ns.hr-xml.org/schemas/org_hr-xml/3_2/Documentation/ComponentDoc/PositionOpening-noun.php
• XSLT processors to transform HRXML v3.2 to RDF:
• ResumeRDF
• JobPosting: more than 1 million RDF triples in Virtuoso.
• JobPosting is a format for describing job descriptions in HTML pages.
• JobPosting is maintained by http://schema.org (Google, Yahoo! & Microsoft)
• An RDF schema of JobPosting is maintained by the LATC project:
http://schema.rdfs.org/all.rdf
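To make the target representation concrete, here is a minimal sketch that serialises one job posting as N-Triples using the schema.org JobPosting vocabulary. It is not the XSLT transformation mentioned above: the function names, the example.org URIs and the choice of the `http://schema.org/` namespace are assumptions, and only the `title`, `datePosted` and `hiringOrganization` properties are covered.

```python
SCHEMA = "http://schema.org/"  # assumed namespace for JobPosting terms
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def job_to_ntriples(subject: str, job: dict) -> list:
    """Serialise a job posting dictionary as a list of N-Triples lines."""
    lines = [f"<{subject}> <{RDF_TYPE}> <{SCHEMA}JobPosting> ."]
    # Literal-valued properties
    for prop in ("title", "datePosted"):
        if prop in job:
            lines.append(f'<{subject}> <{SCHEMA}{prop}> "{escape_literal(job[prop])}" .')
    # Resource-valued property: the value is the URI of the organization
    if "hiringOrganization" in job:
        lines.append(f"<{subject}> <{SCHEMA}hiringOrganization> <{job['hiringOrganization']}> .")
    return lines


if __name__ == "__main__":
    # Hypothetical posting and URIs, for illustration only
    example = {
        "title": "Java Developer",
        "datePosted": "2012-03-01",
        "hiringOrganization": "http://example.org/org/acme",
    }
    print("\n".join(job_to_ntriples("http://example.org/job/1", example)))
```

Lines in this form can be bulk-loaded into a triple store; linking `hiringOrganization` to a reference source is what R2.2 would add on top.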
24. Creating Knowledge out of Interlinked Data
JobPosting overview
25. Creating Knowledge out of Interlinked Data
JobPosting overview
26. Creating Knowledge out of Interlinked Data
Preexisting assets for the first release: HR domain vocabulary
• O*NET data dictionary database 16.0 (Standard Occupational Classification) +
additional tab-separated txt files:
27. Creating Knowledge out of Interlinked Data
Conclusion
• COME TO THE WP8 BREAKOUT SESSION
28. Creating Knowledge out of Interlinked Data
Thank you for your attention!