Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC)
1. Initial Situation Redesign Requirements Design Decisions Deployment Benefits & Difficulties Use Cases Conclusion & Roadmap
Extended Semantic Web Conference 2012
Ch. Lange (1,2,3), P. Ion (4,5,6), A. Dimou (5), Ch. Bratsas (5), W. Sperber (7),
M. Kohlhase (2), I. Antoniou (5)
(1) School of Computer Science, Univ. of Birmingham, UK
(2) Computer Science, Jacobs Univ. Bremen, DE
(3) SFB/TR 8 “Spatial cognition”, Univ. of Bremen, DE
(4) Mathematical Reviews/American Mathematical Society, US
(5) Web Science, Aristotle Univ. Thessaloniki, GR
(6) Univ. of Michigan, Math. Dept., US
(7) Zentralblatt MATH/FIZ Karlsruhe, DE
Project page: http://msc2010.org/mscwork/
2012-05-30
Lange et al. Bringing Mathematics To the Web of Data: the Case of the Mathematics Subject Classification (MSC) 2012-05-30 1
The MSC in Paper Publications
Three-level tree structure:
52 Convex and discrete geometry
53 Differential geometry
53A Classical differential geometry
53A04 Curves in Euclidean space
53A45 Vector and tensor analysis
53B Local differential geometry
Browsing PlanetMath.org by Subject
Searching MathSciNet by Subject
Uploading to arXiv.org
How to Know the Right MSC Code?
The MSC Master Source So Far
\MajorSub 53-++\SubText Differential geometry
\SeeFor{For differential topology, see \SbjNo 57Rxx.
For foundational questions of differentiable manifolds,
see \SbjNo 58Axx}
...
\SecndLvl 53Axx\SubText Classical differential geometry
...
\ThirdLvl 53A45\SubText Vector and tensor analysis
Processing MSC-related information (in applications and for
maintenance) requires specially tailored scripts!
Who knows how to write them?
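To give an idea of what such a tailored script involves, here is a minimal Python sketch that reads the three line types shown in the excerpt. It is not the project's actual script. The function name `parse_master_line` and the regular expression are hypothetical, and the sketch assumes each entry sits on one line, with the macro's leading backslash treated as optional.

```python
import re

# Hypothetical pattern for the three line types in the excerpt.
# The leading backslash of each macro is optional, so copies of the
# source in which it was lost are accepted as well.
LINE = re.compile(
    r"\\?(?P<level>MajorSub|SecndLvl|ThirdLvl)\s+"
    r"(?P<code>\S+?)\\?SubText\s+(?P<label>.+)"
)

def parse_master_line(line):
    """Return (level, code, label), or None for any other kind of line."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return (match.group("level"),
            match.group("code"),
            match.group("label").strip())

sample = [
    r"\MajorSub 53-++\SubText Differential geometry",
    r"\SecndLvl 53Axx\SubText Classical differential geometry",
    r"\ThirdLvl 53A45\SubText Vector and tensor analysis",
]
records = [parse_master_line(line) for line in sample]
```

Even this small sketch ignores the cross-references (SeeFor, SbjNo), which is exactly the kind of special handling that a standard data format avoids.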
Redesign Requirements
1 facilitate use and reuse
for Mathematical Reviews/Zentralblatt MATH services
but also for 3rd-party publishers and authors
2 facilitate maintenance:
preserve all existing information, leave room for semantic
refinements
use standard tools instead of custom scripts
integrate maintenance-related information into the scheme
3 enable knowledge workers and service developers to adapt and
extend the MSC:
connections to related subjects e.g. in science
add unofficial translations
. . . without impairing the editorially controlled core scheme
4 allow end users to explore connections to related subjects
Our Choice: a SKOS Linked Dataset
RDF linked dataset, using SKOS as vocabulary – same as these:
Dewey Decimal Classification (DDC, http://dewey.info)
Library of Congress Subject Headings (LCSH, http://id.loc.gov/authorities/subjects.html)
The Basic Hierarchy (SKOS Core)
63 top-level nodes, 528 second-level nodes, 5606 leaves
Straightforward application of SKOS vocabulary terms:
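[Diagram: a ConceptScheme points to its top-level Concepts via hasTopConcept
(inverse: topConceptOf); Concepts point to each other via narrower and broader;
every Concept points to the ConceptScheme via inScheme.]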
msc2010:53A45 a skos:Concept ;
skos:inScheme msc2010: ;
skos:broader msc2010:53Axx ;
skos:prefLabel "Vector and tensor analysis"@en ;
skos:notation "53A45"^^mscsmpl:MSCNotation .
Multilingual Labels (SKOS Core)
The TeX source had English labels
Trusted parties contributed Chinese, Italian and Russian labels
(stored externally)
msc2010:53A45
skos:prefLabel
"Vector and tensor analysis"@en,
"向量与张量分析"@zh .
Greek labels needed, but no official ones available?
No problem, merge a separate graph!
msc2010:53A45
skos:prefLabel
"Διανυσµατική και τανυστική ανάλυση"@el .
Mathematical Markup in Labels (SKOS Core)
215 out of 6198 labels (approx. 3.5 %) contain mathematical markup
26E10 C^∞-functions, quasi-analytic functions
Unicode covers most of it, e.g. bold Greek letters,
sub-/superscript digits, operators
No two-dimensional markup (fractions, matrices)
23 remaining problematic labels:
expressions in a sub-/superscript: S^{n−1}
non-standard sub-/superscript letters: 1k, H^p, v_n
sub-/superscript symbols: C^∞
overlined operators: ∂̄ (d-bar)
Solution: MathML
msc2010:26E10 skos:prefLabel """<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="$C^\\infty$">
<mml:msup><mml:mi>C</mml:mi><mml:mi>∞</mml:mi></mml:msup>
</mml:math>-functions, quasi-analytic functions"""^^rdf:XMLLiteral .
Linking Partitively Related Concepts (Extension)
Three types of non-symmetric links:
00A08 Recreational mathematics (see also 97A20)
→ straightforward SKOS extension property
20F60 Ordered groups (see mainly 06F15)
→ straightforward SKOS extension property
11Hxx Geometry of numbers (for applications in coding
theory, see 94B75)
→ a bit trickier (next slide)
Faithfully Representing “See For” Links
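[Diagram: the Concept 11Hxx points via seeFor to an intermediate node;
that node has the scope "for applications in coding theory" and points via
forTarget to the Concept 94B75. A direct seeConditionally link from 11Hxx to
94B75 follows from the property chain below.]
mscvocab:seeFor ∘ mscvocab:forTarget ⊑ mscvocab:seeConditionally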
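The effect of the property chain can be sketched in a few lines of Python over plain tuples. This is an illustration only, not how the dataset is actually processed. The helper `apply_property_chain` and the blank node name `_:link1` are hypothetical, and the `mscvocab:scope` property name is read off the diagram.

```python
def apply_property_chain(triples, first, second, implied):
    """Add (s, implied, o) wherever (s, first, m) and (m, second, o) hold."""
    inferred = set(triples)
    for s, p1, middle in triples:
        if p1 != first:
            continue
        for m2, p2, o in triples:
            if m2 == middle and p2 == second:
                inferred.add((s, implied, o))
    return inferred

# The "see for" link of 11Hxx, reified through a hypothetical blank node.
triples = {
    ("msc2010:11Hxx", "mscvocab:seeFor", "_:link1"),
    ("_:link1", "mscvocab:scope", "for applications in coding theory"),
    ("_:link1", "mscvocab:forTarget", "msc2010:94B75"),
}
expanded = apply_property_chain(
    triples,
    "mscvocab:seeFor",
    "mscvocab:forTarget",
    "mscvocab:seeConditionally",
)
```

The reified link keeps the scope text, while the inferred direct link gives simple clients a single triple to follow.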
Linking Across Concept Schemes (SKOS Core)
MSC2000 still widely in use
explicit links to MSC2010 would assist migration
Typical cases:
no change → skos:exactMatch
reclassification → skos:relatedMatch
e.g. 05E40 “Combinatorial aspects of commutative algebra”
partly replacing the MSC2000 classes 05E20 and 05E25
diversification → skos:broadMatch
e.g. 97-XX “Mathematics education”:
MSC2000: 49 concepts
MSC2010: 160 concepts
Linking to non-SKOS Concepts (Extension)
Some relevant classification schemes not fully available in SKOS
DDC dataset only covers the top three levels
(just 9 classes for mathematics)
We know more fine-grained mappings and represent them
using local DDC placeholders
msc:53A45 skos:relatedMatch [
a skos:Concept ;
dcterms:isPartOf ddc:, msc: ;
skos:notation
"515.63"^^<http://dewey.info/schema-terms/Notation> ;
skos:prefLabel "Vector, Tensor, Spinor Analysis"@en ] .
Collections Besides the Hierarchy (SKOS Core)
01 History and biography (see also the classification number
–03 in the other sections) – how to link there?
52 Convex and discrete geometry → 52-03 Historical
53 Differential geometry → 53-03 Historical
msc:HistoricalTopics a skos:Collection ;
skos:prefLabel "Historical topics"@en ;
skos:member msc:03-03, ..., msc:97-03 .
Further candidates:
explicitly given: general reference works (–00), instructional
expositions (–01), works on computational methods (–08)
requiring conceptual analysis: stability of different mathematical
structures (scattered all over the MSC)
Co-Classification Policies (Extension?)
03-03 Historical (must also be assigned at least one
classification number from Section 01)
01A55 19th century
01A70 Biographies, obituaries, personalia, bibliographies
...
Not represented fully explicitly for now, . . .
. . . but kept in one central place, separate from concept labels
msc:HistoricalTopics skos:note "Any resource classified as -03
must also be assigned at least one classification number
from Section 01." .
URI Format
http://msc2010.org/resources/MSC/2010/53A45
expanded dataset has 92,000 triples (7 MB in RDF/XML)
typical linked data clients need few MSC classes:
publications typically classified with two MSC classes
superclasses may also be of interest
Therefore:
“slash URIs” . . .
. . . plus a SPARQL endpoint
. . . plus all-in-one downloads for developers
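A client that only needs one class and its superclasses could derive the relevant slash URIs from the notation alone. The following Python sketch is not an official API. The function names are hypothetical, and it assumes that second-level classes are written like 53Axx and top-level classes like 53-XX, that both use the same URI pattern as the leaf shown above, and that codes such as 53-03 sit directly below the top level.

```python
import re

BASE = "http://msc2010.org/resources/MSC/2010/"

def ancestors(code):
    """Return the superclass notations of an MSC notation, nearest first."""
    if re.fullmatch(r"\d\d[A-Z]\d\d", code):   # leaf such as 53A45
        return [code[:3] + "xx", code[:2] + "-XX"]
    if re.fullmatch(r"\d\d[A-Z]xx", code):     # second level such as 53Axx
        return [code[:2] + "-XX"]
    if re.fullmatch(r"\d\d-\d\d", code):       # leaf such as 53-03 (assumed parent: top level)
        return [code[:2] + "-XX"]
    if re.fullmatch(r"\d\d-XX", code):         # top level has no superclass
        return []
    raise ValueError(f"not an MSC notation: {code!r}")

def uris_of_interest(code):
    """URIs of a class and its superclasses, each dereferenceable on its own."""
    return [BASE + c for c in [code] + ancestors(code)]

uris = uris_of_interest("53A45")
```

For a typical publication with two MSC classes this means at most six small documents to fetch, instead of the 7 MB all-in-one file.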
Development and Deployment of the Dataset
[Workflow diagram:]
TeX master source → custom Perl script → core SKOS (RDF/XML)
core SKOS (RDF/XML) → N3 ruleset (cwm) → expanded SKOS (N-Triples)
expanded SKOS → import → SPARQL endpoint (for “end users”)
expanded SKOS → split per resource (Makefile) → one RDF/XML file
core SKOS and expanded SKOS → cwm/Python rdflib → other serializations
all available from http://msc2010.org/mscwork/, licensed under
CC-BY-NC-SA
Expanding a Dataset with N3 Rules
As little manual maintenance as possible, . . .
. . . with maximum convenience for stupid linked data clients.
# infer skos:broader back-links from skos:narrower
# (actually hard-coding the semantics of owl:inverseOf)
{ ?concept skos:narrower ?narrowerConcept }
=> { ?narrowerConcept skos:broader ?concept }.
similarly for
un-reifying the “see for” links
dumbing down MSC-specific links to rdfs:seeAlso
Makefile applies this using
cwm --rdf msc2010-core.skos --n3 expand-skos-rules.n3 --think
(expansion from 79,000 triples to 92,000 triples = + 16 %)
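What the back-link rule does can be mimicked in plain Python, without cwm. This is a rough sketch on a toy graph, not the deployed ruleset. The function `add_broader_backlinks` and the three sample triples are illustrative assumptions.

```python
def add_broader_backlinks(triples):
    """Forward-chain the rule: narrower(c, n) implies broader(n, c)."""
    expanded = set(triples)
    for s, p, o in triples:
        if p == "skos:narrower":
            expanded.add((o, "skos:broader", s))
    return expanded

# Toy core graph: one back-link is missing, one is already stated.
core = {
    ("msc2010:53-XX", "skos:narrower", "msc2010:53Axx"),
    ("msc2010:53Axx", "skos:narrower", "msc2010:53A45"),
    ("msc2010:53A45", "skos:broader", "msc2010:53Axx"),
}
expanded = add_broader_backlinks(core)

# Growth reported for the real dataset: 79,000 to 92,000 triples.
growth_percent = round(100 * (92_000 - 79_000) / 79_000)
```

Because the result is a set, a back-link that is already stated is not duplicated, so only the missing triple is added.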
Benefits
Benefits experienced immediately:
all information preserved, in most cases more explicitly
(hierarchy, cross-references)
links to other concept schemes and translations included with
the core scheme
rigorous conceptual modeling helped to uncover
conceptualization issues in the MSC
easy maintainability (in our deployment workflow)
Benefits envisaged potentially:
easy maintainability (in the editorial workflow)
promoting widespread adoption thanks to existing search,
query, editing, consistency checking, and annotation tools
supporting reuse in linked data settings, and in legacy settings
(by easier conversion to non-RDF formats)
Multilingual Labels vs. Mathematical Markup (I)
Two warm-up questions:
1 Who thinks that XML literals in RDF are obsolete?
We do not think so!
2 Who knows why RDF literals may either have a language or a
datatype?
I would like to get your advice!
Multilingual Labels vs. Mathematical Markup (II)
Multilingual labels? No problem! (Plain literals)
Mathematical formulas in labels? No problem! (XML literals)
(just violates a “convention” from the SKOS recommendation)
Both plain and datatyped literals? ☇
Potential workaround: Encode language into the XML
<math xml:lang="en">...</math>
removes language information from the RDF data model
slows down SPARQL filtering by language
multiple prefLabels with “no language”?
Note: Cutting mathematical formulas out of the label texts is not
an option!
Not sure how other SKOS tools like this. . .
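The conflict can be made concrete with a small Python model of a literal. This is only a sketch of the constraint as stated on these slides, not the API of any RDF library. The `Literal` class and the `labels_in` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None

    def __post_init__(self):
        # A literal may carry a language tag or a datatype, never both.
        if self.lang is not None and self.datatype is not None:
            raise ValueError("literal cannot have both a language and a datatype")

def labels_in(labels, lang):
    """Filter by the language recorded in the data model."""
    return [label for label in labels if label.lang == lang]

plain = Literal("Vector and tensor analysis", lang="en")
# Workaround: the language is hidden inside the XML, invisible to the model.
workaround = Literal('<math xml:lang="en">...</math>', datatype=XML_LITERAL)

english = labels_in([plain, workaround], "en")
```

The workaround label is English, yet the filter does not find it, because the language lives inside the XML string and a consumer would have to parse every literal to recover it.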
In Use (I): http://www.math.auth.gr
RDF Storage
In Use (II): Exploring Connections within AUTH
connections between three AUTH researchers (using MSC research
topics and other linked data), powered by RelFinder
(http://www.visualdataweb.org/relfinder.php)
In Use (III): http://alpha.planetmath.org
Conclusion & Roadmap
Conclusion
LODified the central classification scheme in mathematics
(actually one of the first mathematical LOD sets)
SKOS and LOD largely satisfied our requirements, . . .
. . . but still semantic web standards are not quite ready for
mathematics.
Roadmap for the MSC dataset itself:
soon official announcement by Math. Reviews/Zentralblatt
adding precise definitions of the MSC classes
adding index terms to classes
introducing a faceted structure (beyond collections)
Roadmap for the Mathematical Web of Data (next slide)
Roadmap: The Mathematical Web of Data
Connection points (besides the obvious DBpedia):
OpenMath Content Dictionaries (defining the semantics of
MathML; we have previously LODified them)
ACM Computing Classification System (soon officially in SKOS)
PlanetMath (soon exposing its metadata as LOD)
Physics and Astronomy Classif. Scheme (on our own agenda)
European Digital Mathematics Library (interested in LOD)
MSC and other datasets enable
fine-grained classification of mathematical resources smaller than
articles (e.g. blog posts)
democratization of scientific publishing
⇒ towards networked science