I present the design and implementation of an ontology for scholarly event description (SEDE) to provide a backbone to represent, collect, share and allow inference from scholarly event information.
SEDE: An Ontology For Scholarly Event Description
1. SEDE:
An Ontology for Scholarly Event
Description
Senator Jeong
senator@snu.ac.kr
Biomedical Knowledge Engineering Lab.,
Seoul National University
2. Publications
Senator Jeong. “Toward Scholarly Event Digital Library Services”. Bulletin of IEEE Technical
Committee on Digital Libraries. Fall 2008;4(2).
Senator Jeong, Hong-Gee Kim. “SEDE: An Ontology for Scholarly Event
Description”. Journal of Information Science. [in press] DOI: 10.1177/0165551509358487.
Senator Jeong, Sungin Lee, Hong-Gee Kim. “Are You an Invited Speaker?: A Bibliometric
Analysis of Elite Groups for Scholarly Events in Bioinformatics”. Journal of the American
Society for Information Science and Technology. 2009;60(6). pp.1118-1131.
Senator Jeong, Hong-Gee Kim. “Intellectual Structure of Biomedical Informatics reflected
in Scholarly Events”. Scientometrics. [in press].
3. Table of Contents
1 Introduction & Background
2 Generic Event Model
3 The SEDE Model & Implementation
4 Application Use Case Scenarios
5 Ontology Evaluation
6 Discussion & Conclusion
5. Scholarly Events
• Conferences, Workshops, Seminars, Symposia
• A sequentially and spatially organized collection of scholars’
interactions
• with the intention of
• Delivering and Sharing knowledge,
• Exchanging Research Ideas, and
• Performing related activities.
6. Scholarly Events
Publish up-to-date scientific research results,
Get feedback from scientific communities
Exchange research interests and ideas with each other
Demonstrate current research trends
7. Information Needs wrt Scholarly
Events
Basic information needs
• Event Name, Topics
• Event Date, Venue, Organizer
• Due dates for Calls for Papers
A scientist does not get a full and exhaustive picture of
scholarly events held around the world
• Due to the sheer volume of events held by various academic societies
and organizations
• No single information channel has succeeded in keeping track of
the ever-growing number of conferences and providing their information to scientists
8. Information Needs wrt Scholarly
Events
Needs that require scientifically meaningful inference
• prominent scientists
• prominent events
• scientists best suited for consultation and collaboration
These needs are met only partially, at a minimal level
• almost all event websites list leadership members such as
general chairs, committee members, invited speakers and/or award
winners
Users are not able to get the whole picture
• existing library services do not provide this kind of meaningful
information in an integrated and collective manner
9. Research Goal
Satisfy scientists’ basic information needs
• by collecting, archiving and providing access to scholarly event
information.
Satisfy users’ in-depth information needs
• by extracting scholarly meaningful information through reasoning
about knowledge
Define a description base for scholarly events
• to enable software agents to crawl and extract event data, and
• to facilitate unified access to, and reasoning about, the collected
data
10. Previous Work
• EventSeer, PapersInvited, Conference Alerts
– focus on calls for papers
– simple metadata about forthcoming events
– proprietary description formats
• Semantic Web Conference ontology
– best only for the ESWC conference
• Event Driven Model
– ABC ontology, INDECS, OntologyX, FRBR, CIDOC-CRM,
Enterprise Architecture, Event Ontology
11. GENERIC EVENT MODEL
provide enough descriptive power and granularity to
span over multiple scientific disciplines and capture as many varied event
types as possible
14. THE SEDE MODEL &
IMPLEMENTATION
Ontology modelling principle
Scholarly event description structure
Key concepts in the SEDE ontology
n-ary relations and reification heuristics
Ontology improvement
15. Scholarly Event Description Structure
[Figure: nested structure of a scholarly event. A Scholarly Event contains Tracks, Sessions and Atom Events; Tracks contain Sessions; Sessions contain Atom Events; a Scholarly Event can also contain other Scholarly Events.]
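The nesting on this slide (scholarly event, track, session, atom event) can be sketched as a small data structure. This is only an illustration, not the SEDE implementation: the class and field names are hypothetical, and it assumes that sessions may hang either off a track or directly off an event.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class AtomEvent:
    label: str  # e.g. a single talk; the smallest unit in the structure


@dataclass
class Session:
    label: str
    atom_events: List[AtomEvent] = field(default_factory=list)


@dataclass
class Track:
    label: str
    sessions: List[Session] = field(default_factory=list)


@dataclass
class ScholarlyEvent:
    label: str
    tracks: List[Track] = field(default_factory=list)
    sessions: List[Session] = field(default_factory=list)  # assumption: sessions without a track
    child_events: List["ScholarlyEvent"] = field(default_factory=list)


def iter_atom_events(event: ScholarlyEvent) -> Iterator[AtomEvent]:
    """Walk the whole nesting and yield every atom event exactly once."""
    for session in event.sessions:
        yield from session.atom_events
    for track in event.tracks:
        for session in track.sessions:
            yield from session.atom_events
    for child in event.child_events:
        yield from iter_atom_events(child)


# Made-up example data, for illustration only.
workshop = ScholarlyEvent("Workshop", sessions=[Session("W1", [AtomEvent("Talk C")])])
conference = ScholarlyEvent(
    "Conference",
    tracks=[Track("Research", [Session("S1", [AtomEvent("Talk A"), AtomEvent("Talk B")])])],
    child_events=[workshop],
)
labels = [atom.label for atom in iter_atom_events(conference)]
```

Keeping the atom event as the only leaf type means that any co-located workshop or tutorial can be described by nesting, without a dedicated class per event kind.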
16. [Figure: SEDE ontology class diagram]
• Classes: Event, Event Series, Track, Session, AtomEvent, Time, Committee, CommitteeRole, Role, Call, Program, Artifact, Paper, Presentation, VideoClip, Proceedings, Place, Venue, City, Country
• Reused classes: foaf:Agent, foaf:Person, foaf:Group, foaf:Document, skos:Concept, skos:ConceptScheme, geo:SpatialThing
• Properties: hasTrack, hasSession, hasAtomEvent, hasChildEvent, isMemberEventOf, startDate, endDate, heldAt, hasTopic, hasTheme, hasCall, hasProgram, hasProceedings, hasArtifact, hasCommittee, hasCommitteeRole, hasSessionChair, hasPresenter, playedBy, skos:inScheme
27. Ontology-based Information
Extraction
• The limitations of fully automatic information
extraction techniques
• The heterogeneous nature of event web pages
• Strategy
– use a simpler approach to data extraction
– rely on manually defined patterns of text content
and HTML formatting, based on general conventions
for listing data in human-readable formats on the
web
28. Method: Rule based Pattern Matching
[Figure: flowchart of the extraction process, from Start (HTML Document) to End (Extracted Data)]
• Components: HTML Parser (Grammar Parser), Text Tokenizer, Chainer, Realmer, Extender, Exporter
• Lookups: Regular Expression, Keyword, Gazetteer
• Tag form: /aBCD
– a: Tag category
– BCD: Tag description
• Parse HTML
– Opening HTML tags: tr, p, div → newlines; td → tab; li → bullet
– Closing HTML tags: p, table, li, h1-5, br → newlines
• Tokenize
– Tokenize text, pre-tag
– Separate punctuation marks (/n, “”, ,, !, (), :, ;, .)
– Append EOF tag
– Split text by spaces
– Return array of tokens (Token Array)
• Assign Tags
– Directory class calls the ‘createTagIndex’ function
– Match tags using regular expressions, keyword matches and gazetteer lookup
• Chainer
– List of rules for identifying similar patterns of tags
– Output: text string + chain index + chain type
• Realmer
– Holds a hierarchy of realms
– Each realm corresponds to a different chain in the document
• Extender
– Modifies and adds realm data
• Exporter
– Looks up the extraction rule and outputs the extracted data
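The HTML parsing and tokenizing steps on this slide can be approximated in a few lines. The sketch below is not the author's parser: the function names are hypothetical, the literal string "EOF" stands in for the EOF tag (whose real form the slide does not give), and the naive punctuation rule also splits e-mail addresses and abbreviations at every period.

```python
import re

# Replacement text for opening tags, following the slide's mapping.
OPENING = {"tr": "\n", "p": "\n", "div": "\n", "td": "\t", "li": "• "}
# Closing tags that become newlines; br is handled in any form.
CLOSING_NEWLINE = {"p", "table", "li", "h1", "h2", "h3", "h4", "h5"}

TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
PUNCT = re.compile(r'([\n"“”,!():;.])')


def _replace_tag(match):
    is_closing = bool(match.group(1))
    name = match.group(2).lower()
    if name == "br" or (is_closing and name in CLOSING_NEWLINE):
        return "\n"
    if is_closing:
        return ""
    return OPENING.get(name, "")


def html_to_text(html):
    """Turn layout tags into newlines, tabs and bullets; drop every other tag."""
    return TAG.sub(_replace_tag, html)


def tokenize(text):
    """Separate punctuation, split on spaces and tabs, and append an EOF marker."""
    spaced = PUNCT.sub(r" \1 ", text)
    tokens = [tok for tok in re.split(r"[ \t]+", spaced) if tok]
    tokens.append("EOF")  # hypothetical marker name
    return tokens


plain = html_to_text("<p>Hi</p><ul><li>A</li></ul>")
tokens = tokenize("Submission due date: September 5th, 2009")
```

Newlines survive tokenizing as tokens of their own, which keeps the line layout of the page available to later steps such as chaining.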
29. Method: Tag Classification
Tag categories (with one example tag each)
• Punctuations: /pCOM
• Literal: /lEML
• Data & Numbers: /iYEA
• Grammar related: /gOF
• Name-Related: /nTTL
• Keywords: /kUNI
• Additional: /xCAP
Grammar category (tag: meaning)
• /gART: article, e.g. the | this | its | ...
• /gOF: of
• /gFOR: for
• /gON: on
• /gAT: at
• /gIN: in
• /gABT: about
• /gFRM: from
• /gTO: to | through | until
• /gCNJ: conjunction = and | or | &
30. Method: Tag Classification
Tag: meaning, followed by the example pattern
• /UNI: university
– university|college|academy|Universitat...
• /CTR: center
– center|centre|institute|department|division
• /ORG: organization
– society|association|council|consortium
• /EVT: event
– conference|conf|symposium|meeting|congress|roundtable|colloquium|seminar|summit|convention|forum|program
• /QUA: qualifier
– annual|biannual|biennial|interdisciplinary|special|joint|asian|european|international|metropolitan|national|polytechnic|global|graduate|limited|ltd(.)?|incorporated|inc(.)?|int(.)|applied
• /SBJ: subject
– (Aeronautics|aerospace|Agriculture|applications|Astronomy|Biology|Biotechnology|Biochemistry|bioinformatics|business|Chemistry|Cryptology|Ecology|economics|Electronics|Energy|Engineering|Environment|Forensics|Geography|health|informatics|information|Mathematics|Mechanical|medicine|Meteorology|Nanotechnology|Oceanography|Paleontology|Physics|Policy|Psychology|Research|science(s)?|security|securities|solution(s)?|Space|systems|technology|Vibrations|Wireless)
• /OTH: other (webpage-related)
– (Main|Media|Home|you|of|(Us)|((?i)(tutorial|proceeding(s?)|download|PDF|PostScript|HTML|MSWord|LaTex|Format|ASCII|collocated|copyright|see|contact)))
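Tag assignment by regular expression can be sketched as an ordered rule list. This is an assumption-laden illustration, not the slides' Directory class: the rule set is a small excerpt of the patterns above, the "k" prefix on keyword tags follows the /kUNI example and the /aBCD form, /iYEA is assumed to mean a four-digit year, and gazetteer lookup is left out.

```python
import re

# (tag, pattern) pairs, tried in order; the first full match wins.
RULES = [
    ("/gART", re.compile(r"the|this|its", re.IGNORECASE)),
    ("/gOF", re.compile(r"of", re.IGNORECASE)),
    ("/gCNJ", re.compile(r"and|or|&", re.IGNORECASE)),
    ("/kUNI", re.compile(r"university|college|academy|universitat", re.IGNORECASE)),
    ("/kEVT", re.compile(r"conference|conf|symposium|meeting|congress|seminar", re.IGNORECASE)),
    ("/iYEA", re.compile(r"(19|20)\d\d")),  # assumption: /iYEA marks a year
]


def assign_tag(token):
    """Return the first tag whose pattern matches the whole token, else None."""
    for tag, pattern in RULES:
        if pattern.fullmatch(token):
            return tag
    return None


def tag_tokens(tokens):
    """Pair every token with its tag so later steps can look for tag patterns."""
    return [(token, assign_tag(token)) for token in tokens]


tagged = tag_tokens(["University", "of", "Texas", "2009"])
```

Untagged tokens such as "Texas" are where a gazetteer of place and person names would be consulted.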
31. Realms: Example
HTML text, followed by the realms found in it
• “There were few surprises about the submission of the paper. It will take place at the University of Technology, Brahms, Canada.”
– TEXT_CHUNK
– SUBMISSION_MARKER
– UNIVERSITY_NAME
– COUNTRY
• “Submission due date: September 5th, 2009”
– DEADLINE_CONTAINER
– SUBMISSION_MARKER
– DATE
• “Notification date: November 6th, 2009”
– DEADLINE_CONTAINER
– NOTIFICATION_MARKER
– DATE
• “Program Committee:”
– COMMITTEE_MARKER
• “Dolldrum Flannery, University of Texas, USA”
– AFFILIATION_GROUP
– NAME
– UNIVERSITY_NAME
– COUNTRY
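As a rough illustration of how a deadline realm could be recognised, the sketch below pulls a marker and a date out of lines like the two deadline examples on this slide. The regular expression and the function name are hypothetical and cover only the "Month day, year" layout shown here, not the rule set used in the actual extractor.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Marker word, anything up to the colon, then "Month 5th, 2009".
DEADLINE = re.compile(
    r"(?P<marker>Submission|Notification)[^:\n]*:\s*"
    r"(?P<month>[A-Z][a-z]+)\s+(?P<day>\d{1,2})(?:st|nd|rd|th)?,\s*(?P<year>\d{4})"
)


def find_deadlines(text):
    """Return (marker realm, date) pairs for every deadline line in the text."""
    found = []
    for match in DEADLINE.finditer(text):
        month = MONTHS.get(match.group("month").lower())
        if month is None:
            continue  # capitalised word that is not a month name
        realm = match.group("marker").upper() + "_MARKER"
        found.append((realm, date(int(match.group("year")), month, int(match.group("day")))))
    return found


page_text = "Submission due date: September 5th, 2009\nNotification date: November 6th, 2009"
deadlines = find_deadlines(page_text)
```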
34. [Figure: system architecture, top layer to bottom layer]
• Applications: Semantic Search & Retrieval; Event Coupling; Domain Knowledge Structure Analysis; KOS Generation; Academic Prominence Evaluation; …
• APIs
• Knowledge Base; SEDE Ontology
• Event Data Extractor; Ontology Editor
• Event Data Crawler; Crawled Data
• Web
35. Semantic S&R on Scholarly
Events(1)
• Finding events with a specific call-for-paper topic, a
submission deadline, and an event start date
# The sede:, rdfs: and skos: prefixes must be declared before the query is run.
SELECT DISTINCT ?Topic ?Event ?Deadline ?Event_Start
WHERE {
  ?x a sede:Event ;
     rdfs:label ?Event ;
     sede:hasCall ?y ;
     sede:startDate ?Event_Start .
  ?y rdfs:label ?Call ;
     sede:hasTopic ?z ;
     sede:submissionDeadline ?Deadline .
  ?z skos:prefLabel ?Topic .
  # Case-insensitive match on either topic keyword
  FILTER regex(?Topic, "data mining|ontolog", "i")
}
ORDER BY ?Topic
36. Semantic S&R on Scholarly
Events(2)
• Retrieving artifacts from an atom event:
• A user missed an invited talk session on the
topic of “semantic search” at the ESWC2008
Conference. The user therefore searches for the
invited talk session covering that topic to
obtain the URI of its video clip.
37. [Figure: retrieving artifacts through the knowledge base]
• Data Repositories: Bibliographic Repositories, Video Clip Repositories, Presentation Repositories
• Artifacts: Paper, Presentation, VideoClip
• Classes shown: Event, Track, Session, AtomEvent, foaf:Person, skos:Concept
• Properties shown: hasTrack, hasSession, hasAtomEvent, hasArtifact, hasTopic, hasAuthor, hasPresenter, VersionOf (between Paper and Presentation)
• The End User sends a SPARQL Query to the RDF Endpoint: http://eventography.org/query/
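The retrieval path in this use case (event, track, session, atom event, artifact) can be traced over a handful of triples. The sketch below uses plain Python tuples instead of an RDF store; every `ex:` identifier is made up for illustration, and only the property and class names are taken from the slides.

```python
# Toy triple set; all ex: resources are hypothetical.
TRIPLES = {
    ("ex:ESWC2008", "sede:hasTrack", "ex:track1"),
    ("ex:track1", "sede:hasSession", "ex:session1"),
    ("ex:session1", "sede:hasAtomEvent", "ex:talk1"),
    ("ex:talk1", "sede:hasTopic", "ex:semantic-search"),
    ("ex:talk1", "sede:hasArtifact", "ex:clip1"),
    ("ex:talk1", "sede:hasArtifact", "ex:paper1"),
    ("ex:clip1", "rdf:type", "sede:VideoClip"),
    ("ex:paper1", "rdf:type", "sede:Paper"),
}


def objects(subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def video_clips(event, topic):
    """Follow hasTrack, hasSession and hasAtomEvent, then keep video clip artifacts."""
    clips = []
    for track in objects(event, "sede:hasTrack"):
        for session in objects(track, "sede:hasSession"):
            for atom in objects(session, "sede:hasAtomEvent"):
                if topic not in objects(atom, "sede:hasTopic"):
                    continue
                for artifact in objects(atom, "sede:hasArtifact"):
                    if "sede:VideoClip" in objects(artifact, "rdf:type"):
                        clips.append(artifact)
    return clips


result = video_clips("ex:ESWC2008", "ex:semantic-search")
```

In practice the same traversal would be a single SPARQL graph pattern sent to the endpoint; the nested loops only make the join order explicit.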
41. Coupling of Events and Scientists
\[
\mathrm{sim}(E_i, E_j) = \frac{\sum_{t} w_{t,i}\, w_{t,j}}{\sqrt{\sum_{t} w_{t,i}^{2}}\;\sqrt{\sum_{t} w_{t,j}^{2}}}
\]
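Reading the formula on this slide as the usual cosine measure over weight vectors, the coupling of two events can be computed as below. This is a minimal sketch: the dictionary representation (term or scientist mapped to weight) and the function name are assumptions, not part of the original system.

```python
import math


def event_similarity(weights_i, weights_j):
    """Cosine similarity between two events given as {term: weight} dictionaries."""
    shared = set(weights_i) & set(weights_j)
    dot = sum(weights_i[t] * weights_j[t] for t in shared)
    norm_i = math.sqrt(sum(w * w for w in weights_i.values()))
    norm_j = math.sqrt(sum(w * w for w in weights_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # an event without weights is coupled to nothing
    return dot / (norm_i * norm_j)


# Made-up weights, for illustration only.
sim = event_similarity({"ontology": 3.0, "data mining": 4.0}, {"ontology": 3.0})
```

Terms missing from one event contribute nothing to the numerator, so two events with no shared terms get a similarity of 0.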
43. Domain Knowledge Structure
Analysis
[Figure: data mining and its usage context in Bioinformatics; cosine ≥ 0.1; k-nn 2; n = 69]
44. *Co-word Analysis: Assumption
[Figure: articles labelled with Topic A, Topic B and Topic C]
• Topics that occur together in an article, such as Topic A and Topic B, are likely to be related
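45. *Co-word Analysis
Document-term matrix (rows: papers from events; columns: event topics)
      t1  t2  t3  t4
d1     1   0   1   0
d2     0   1   1   0
d3     0   1   1   1
Term-term matrix
      t1  t2  t3
t1     0   1   3
t2     5   0   2
t3     1   2   0
\[
W_{i,j} = TF \times IDF = \frac{f_{i,j}}{\sum_{k} n_{k,j}} \times \log\frac{N}{n_i}
\]
\[
\mathrm{Cosine}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\;\sqrt{\sum_{i=1}^{n} y_i^{2}}} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^{2}\right) \times \left(\sum_{i=1}^{n} y_i^{2}\right)}}
\]
Output: SNA.dat file
[Figure: network of topic nodes]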
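The TF-IDF weighting and the co-occurrence counting behind co-word analysis can be tried on the 3 × 4 document-term matrix from this slide. The sketch assumes the natural logarithm (the slide does not state the base) and uses hypothetical function names; note that the returned weights are indexed [document][term].

```python
import math


def tfidf(doc_term):
    """TF-IDF weights: (term count / terms in document) * log(N / documents with term)."""
    n_docs = len(doc_term)
    n_terms = len(doc_term[0])
    doc_freq = [sum(1 for row in doc_term if row[i] > 0) for i in range(n_terms)]
    weights = []
    for row in doc_term:
        total = sum(row)
        weights.append([
            (row[i] / total) * math.log(n_docs / doc_freq[i]) if total and doc_freq[i] else 0.0
            for i in range(n_terms)
        ])
    return weights


def cooccurrence(doc_term):
    """Count, for each pair of distinct terms, the documents that contain both."""
    n_terms = len(doc_term[0])
    counts = [[0] * n_terms for _ in range(n_terms)]
    for row in doc_term:
        present = [i for i, value in enumerate(row) if value > 0]
        for a in present:
            for b in present:
                if a != b:
                    counts[a][b] += 1
    return counts


# Rows d1..d3, columns t1..t4, as on the slide.
D = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [0, 1, 1, 1]]
W = tfidf(D)
C = cooccurrence(D)
```

A term that occurs in every document, such as t3 here, gets an IDF of log(1) = 0 and therefore no weight, which is why very general topics drop out of the analysis.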
53. Scholar’s Prominence Evaluation
Definition (1)
\[
P(S) = \tau \, \frac{\sum_{t=1}^{n} \left( w_t k_t \mid f \right)}{n_f}, \qquad t \in T
\]
• P(S): prominence of scholar S
• τ: normalizer
• t: elite group type
• w_t: weight
• k_t: # of elite group memberships
• f: field
• n_f: # of events in a specific field
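Definition (1) amounts to a weighted count of elite group memberships divided by the number of events in the field. The sketch below is a hedged reading of it: the weights, the group type names and the default τ = 1 are made up for illustration, since the slide does not give concrete values.

```python
def scholar_prominence(memberships, weights, n_events_in_field, tau=1.0):
    """P(S) = tau * sum_t(w_t * k_t) / n_f for one scholar within one field.

    memberships: {elite group type: number of memberships k_t in the field}
    weights:     {elite group type: weight w_t}
    """
    if n_events_in_field <= 0:
        raise ValueError("the field must contain at least one event")
    weighted = sum(weights[t] * k for t, k in memberships.items())
    return tau * weighted / n_events_in_field


# Hypothetical weights and counts.
WEIGHTS = {"invited_speaker": 3.0, "general_chair": 2.0, "committee_member": 1.0}
p_s = scholar_prominence({"invited_speaker": 2, "committee_member": 4}, WEIGHTS, 10)
```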
55. Scholarly Event’s Prominence
Evaluation Metrics
Definition (2)
\[
P(E) = \tau \, \frac{\sum_{s=1}^{n} P(S)}{c_f}, \qquad s \in S
\]
• P(E): event’s prominence
• P(S): scholar’s prominence (Def. 1)
• c_f: # of elite group members for an event belonging to a specific field
57. Event Series’ Prominence Evaluation
Definition (3)
\[
P(\varepsilon) = \tau \, \frac{\sum_{g=1}^{n} P(E)}{z_f}, \qquad g \in G
\]
• P(ε): event series’ prominence
• P(E): event’s prominence (Def. 2)
• z_f: # of event instances (e.g., AMIA2009) belonging to an event series (AMIA) in a given subject field (Medical Informatics)
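Definitions (2) and (3) both average lower-level prominence scores and scale them by τ. The sketch below assumes that c_f equals the number of scholar scores passed in and that z_f equals the number of event scores passed in; the function names and the sample scores are hypothetical.

```python
def event_prominence(scholar_scores, tau=1.0):
    """P(E): tau times the sum of P(S) over an event's elite group members, divided by c_f."""
    c_f = len(scholar_scores)  # assumption: every scored scholar is an elite group member
    if c_f == 0:
        return 0.0
    return tau * sum(scholar_scores) / c_f


def series_prominence(event_scores, tau=1.0):
    """P(epsilon): tau times the sum of P(E) over a series' event instances, divided by z_f."""
    z_f = len(event_scores)  # assumption: one score per event instance in the field
    if z_f == 0:
        return 0.0
    return tau * sum(event_scores) / z_f


# Made-up scholar scores for two instances of the same series.
p_e_2008 = event_prominence([1.0, 0.5, 1.5])
p_e_2009 = event_prominence([3.0, 3.0])
p_series = series_prominence([p_e_2008, p_e_2009])
```

Because each level divides by its own count, a large event or a long-running series does not score higher merely through size.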
60. Ontology Evaluation
Competency questions, answered for SEDE and for SWC
• Does it have a container for topics?
– SEDE: Yes. It uses SKOS to describe topics.
– SWC: No. It uses SWRC’s research topic, which has a limited number of topics.
• Does it have a container for committees?
– SEDE: Yes. It has the Committee class.
– SWC: No.
• Does it identify various roles in a committee?
– SEDE: Yes. It defines a generic class Role identifiable with a label.
– SWC: No. It enumerates Chair, Delegate, Presenter, Program Committee Member, resulting in no mechanisms to identify variant names such as co-chair, vice-chair, founder, etc.
• Does it support the representation of an event’s structure in a flexible way?
– SEDE: Yes. It is more flexible than SWC, in that it furnishes the class from the top level (Event) down to the leaf level classes (AtomEvent).
– SWC: Arguable. The WorkshopEvent, TutorialEvent, ConferenceEvent, and PanelEvent should be deprecated, since they can be described with the top level classes, such as AcademicEvent, TrackEvent and SessionEvent.
• Does it have a container for Call?
– SEDE: Yes. It has the Call class.
– SWC: No. The Call class was deprecated, and it uses the CfP ontology [1].
[1] CfP Vocabulary Specification, http://sw.deri.org/2005/08/conf/cfp.html
62. Discussion & Conclusion
• The SEDE ontology provides a backbone to represent,
collect, share and allow inference from scholarly event
information in a logical way
• Basic information needs
– semantic search and retrieval using the facts stored in the KB
• Scientifically meaningful information needs
– unearth hidden knowledge for the academic community
• SEDE
– helps to improve information accessibility through greater
semantic interoperability of information.
– makes it possible to build a scholarly semantic web
• isolated pieces of scholarly event data are integrated through
relationships with other scientific data on the web, thus creating
added information.