1) INSPIRE is a new digital library that combines the SPIRES HEP database with the Invenio open-source digital library platform.
2) It provides powerful search, with indexes designed for repositories of up to 2 million records, and a navigable collection tree for organizing HEP documents.
3) DESY contributes HEP ontology keywords to INSPIRE and has assigned keywords and classifications to HEP articles since 1964.
1. Zaven Akopov (DESY -L-)
For the INSPIRE Collaboration
DESY Computing Seminar
2. Joint Project of CERN, DESY, Fermilab and SLAC
SPIRES: wonderful system, largest HEP database, best-curated content, but an old engine (>30 years):
need a modern open-source multimedia digital library
Unify SPIRES content with the Invenio platform
Invenio = open-source digital library
○ http://invenio-software.org
SPIRES + Invenio = InSpire
3. Invenio
Integrated digital library system
written largely in Python
MySQL database
modular design
Navigable collection tree
Documents organized in collections
Regular and virtual collection trees
Customizable portal-boxes for each collection
Powerful search engine
Specially designed indexes to provide fast search speed for repositories of up to 2,000,000 records
Customizable simple and advanced search interfaces
Flexible metadata
Standard metadata format (MARC)
Handling articles, books, theses, photos, videos, museum objects and more
User personalization
Baskets, e-mail notifications, comments, etc.
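The fast search speed comes from specially designed indexes. As a rough illustration of the general idea (not Invenio's actual indexer), here is a minimal Python sketch of a word-level inverted index over MARC-style records. The sample records, the mapping of fields to logical indexes, and all function names are assumptions; field codes are written in a compact tag/indicator/subfield style, with `245__a` for the MARC title statement and `100__a`/`700__a` for personal-name entries.

```python
import re
from collections import defaultdict

# Hypothetical sample records: record ID -> {MARC field code: list of values}.
# 245__a = title statement, 100__a = main personal name, 700__a = added personal names.
RECORDS = {
    1: {"245__a": ["Search for the Higgs boson"], "100__a": ["Chen, G"]},
    2: {"245__a": ["Higgs boson production at HERA"], "100__a": ["Smith, J"],
        "700__a": ["Chen, G"]},
    3: {"245__a": ["Deep inelastic scattering at HERA"], "100__a": ["Muller, K"]},
}

# Assumed mapping from logical index name to the MARC fields that feed it.
INDEX_FIELDS = {
    "title": ["245__a"],
    "author": ["100__a", "700__a"],
}


def tokenize(text):
    """Lower-case word tokens; a real indexer would also stem, strip accents, etc."""
    return re.findall(r"\w+", text.lower())


def build_indexes(records):
    """Build one inverted index per logical field: word -> set of record IDs."""
    indexes = {name: defaultdict(set) for name in INDEX_FIELDS}
    for recid, fields in records.items():
        for name, codes in INDEX_FIELDS.items():
            for code in codes:
                for value in fields.get(code, []):
                    for word in tokenize(value):
                        indexes[name][word].add(recid)
    return indexes


def search(indexes, index_name, query):
    """Return the IDs of records containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() on a defaultdict does not create empty entries for unknown words.
    postings = [indexes[index_name].get(w, set()) for w in words]
    return set.intersection(*postings)
```

With `idx = build_indexes(RECORDS)`, `search(idx, "title", "higgs boson")` gives `{1, 2}`. A lookup only touches the posting sets of the query words, so its cost grows with the number of matches rather than with the size of the whole repository.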
4. DESY participation
Input of Journal/Article Data
HEP Ontology (Keywords) Input
Hierarchy of HEP concepts based on the DESY HEP Thesaurus
DESY has assigned keywords and classifications to HEP articles since 1964
SPIRES/InSPIRE mirror website
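One benefit of keywords drawn from a concept hierarchy is that a search for a broad concept can also find articles tagged only with narrower ones. A minimal sketch of that expansion, assuming a toy hierarchy: the concept names and parent links below are invented for illustration and are not taken from the DESY HEP Thesaurus.

```python
# Hypothetical fragment of a concept hierarchy: narrower concept -> broader concept.
BROADER = {
    "Higgs particle": "boson",
    "W": "boson",
    "boson": "particle",
    "quark": "particle",
    "top": "quark",
}


def ancestors(concept):
    """Return the chain of broader concepts above `concept`, nearest first."""
    chain = []
    seen = {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        if concept in seen:  # guard against accidental cycles in the data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def expand_keywords(keywords):
    """Add every broader concept to an article's assigned keywords."""
    expanded = set(keywords)
    for kw in keywords:
        expanded.update(ancestors(kw))
    return expanded


def articles_about(concept, article_keywords):
    """Find articles whose expanded keyword set contains `concept`."""
    return sorted(aid for aid, kws in article_keywords.items()
                  if concept in expand_keywords(kws))
```

Here `ancestors("top")` walks up to `["quark", "particle"]`, so an article tagged only "top" would be found by a search for "particle".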
5. Where are we?
First beta site released in April 2010
Production beta released a week ago
http://inspirebeta.net
Live Now
Populated with SPIRES content daily
Additional features
Bugs are getting ironed out, but already:
15. More to come
Personal libraries, alerts
Claim my papers (with arXiv and ORCID, the Open Researcher and Contributor ID)
Submit theses and old non-arXiv material
Attach non-text material
OCR of older materials
Even better feeds (with ADS, arXiv, publishers)
16. Automatic Disambiguation
Henning Weiler, PhD student at CERN
For the query "Chen, G" (963 documents), 21 real authors could be identified
22 orphans remain
98% identified
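The 98% figure follows from the counts if orphans are counted per document: (963 - 22) / 963 = 941 / 963 ≈ 0.977, which rounds to 98%. The slide does not describe the disambiguation algorithm itself, so the sketch below is only a deliberately naive stand-in, not Henning Weiler's method: documents signed by an ambiguous name are merged with a union-find structure whenever they share a coauthor, and documents left on their own are reported as orphans. All names and the sample data are hypothetical.

```python
from itertools import combinations


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_signatures(coauthors):
    """Group documents signed by an ambiguous name into presumed real authors.

    `coauthors` maps document ID -> set of coauthor names. Two documents are
    merged when they share at least one coauthor (a deliberately naive rule).
    Returns (clusters, orphans): clusters with 2+ documents, and documents
    that could not be linked to any other.
    """
    uf = UnionFind(coauthors)
    for d1, d2 in combinations(coauthors, 2):
        if coauthors[d1] & coauthors[d2]:
            uf.union(d1, d2)
    groups = {}
    for doc in coauthors:
        groups.setdefault(uf.find(doc), []).append(doc)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    orphans = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, orphans


def identified_fraction(n_docs, n_orphans):
    """Share of documents attributed to some author."""
    return (n_docs - n_orphans) / n_docs
```

The pairwise comparison is quadratic, which is manageable for a single name cluster of about a thousand documents. A production system would likely weigh richer evidence, such as affiliations or subject categories, rather than a single shared coauthor.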
17. User Accounts
Tied to academic affiliation
Ability to correct information and claim papers
Corrections still vetted by staff
Add “corporate accounts” for collaborations
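Since users may correct information but corrections are still vetted by staff, a submitted change should not alter a record until it is approved. A minimal sketch of that review queue, assuming a simple pending/approved/rejected model; the class and method names are hypothetical and do not describe INSPIRE's actual workflow code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Correction:
    """A user-submitted change to one field of a record (hypothetical model)."""
    recid: int
    field_name: str
    new_value: str
    submitted_by: str
    status: Status = Status.PENDING


@dataclass
class Repository:
    records: dict
    queue: list = field(default_factory=list)

    def submit(self, correction):
        """Users can only enqueue corrections; the record is not touched yet."""
        self.queue.append(correction)

    def review(self, correction, reviewer_is_staff, approve):
        """Staff decide; approved corrections are applied to the record."""
        if not reviewer_is_staff:
            raise PermissionError("only staff may vet corrections")
        if correction.status is not Status.PENDING:
            raise ValueError("correction already reviewed")
        correction.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self.records[correction.recid][correction.field_name] = correction.new_value

    def pending(self):
        """Corrections still waiting for a staff decision."""
        return [c for c in self.queue if c.status is Status.PENDING]
```

Keeping submission and application as separate steps means the record always reflects staff-approved content, while the queue preserves who proposed what.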
18. Data - Soon
Partnership and interlinking with HEPData
HepData reloaded: reinventing the HEP data archive.
Andy Buckley, Mike Whalley. Jun 2010.
e-Print: arXiv:1006.0517 [hep-ex]
http://hepdata.cedar.ac.uk/
HEPData and INSPIRE are working with the LHC and other experiments to ease the submission process and interlinking
Move towards citation/tracking use – reputation…
Storage for other objects like ROOT, Mathematica, etc.
20. Full-cycle of a publication
Up to now, we've captured the product:
Papers
Considering Data
Through DPHEP (Data Preservation in High Energy Physics), there is currently an opportunity to build infrastructure for capturing the process:
Internal Notes
Technical/Software Documentation
Logbooks
21. Wikis
Increasingly popular central place to aggregate documentation
Users structure the data for us
Backups and 'dumps' are generally easy to make
And usually in an easily digestible format (such as XML)
22. Tools
For MediaWiki, most of the essential tools already exist.
The Wikimedia Foundation (Wikipedia) is interested in seeing what we do with them.
From our discussions with them, they are supportive of what we're trying to do
23. Nascent BaBar Wiki
MediaWiki instance with:
162 content pages
201 total pages (talk, redirects, etc.)
22 registered users
A simple script can easily produce dumps.
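Once a dump exists, figures like the page counts above can be recomputed from it. In MediaWiki's XML export format each page is a `<page>` element with a `<title>`, a namespace number `<ns>` (0 is the main article namespace) and, for redirects, a `<redirect>` element. The sketch below streams such a dump and counts pages; treating "content page" as a non-redirect page in namespace 0 is an assumption that only approximates MediaWiki's own article-count rules, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_dump(source):
    """Count pages in a MediaWiki XML dump (filename or binary file object).

    A page counts as 'content' here if it is in namespace 0 and is not a
    redirect, which only approximates MediaWiki's own article count.
    """
    total = content = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        total += 1
        children = {local(child.tag): child for child in elem}
        ns = children.get("ns")
        is_main = ns is not None and (ns.text or "").strip() == "0"
        if is_main and "redirect" not in children:
            content += 1
        elem.clear()  # free memory; dumps can be large
    return {"total_pages": total, "content_pages": content}


def summarize_dump_string(xml_text):
    """Convenience wrapper for a dump held in memory as a string."""
    return summarize_dump(io.BytesIO(xml_text.encode("utf-8")))
```

Run on a dump written by MediaWiki's `dumpBackup.php` maintenance script, the two counts should roughly track the "total pages" and "content pages" figures quoted above, with differences where the wiki's own counting rules are stricter.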
24. Scenarios
Level 0 Service: Basic Preservation
Index and store wiki snapshot data as if it were a scientific publication (with many authors)
Level 1 Service: Readable Snapshots
Level 0 + read-only final version respecting formatting, etc.
Level 2 Service: Multiple Snapshots
Level 0 + Level 1 for each of multiple wiki “release points”, with full(?) metadata
Linking with Papers
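The three service levels build on each other. A minimal sketch of how a snapshot might be catalogued under them, assuming that the wiki's contributors serve as the author list (Level 0), that an attached read-only HTML rendering marks Level 1, and that Level 2 is simply an ordered series of such snapshots. The SHA-256 checksum is an added assumption, included so stored copies can later be checked for corruption; all class and field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class WikiSnapshot:
    """Level 0: a wiki dump catalogued like a many-author publication (hypothetical model)."""
    wiki_name: str
    snapshot_date: date
    contributors: list          # wiki usernames treated as the author list
    dump_bytes: bytes           # the raw XML dump being preserved
    rendered_html: dict = field(default_factory=dict)  # Level 1: page title -> read-only HTML

    @property
    def checksum(self):
        """SHA-256 of the dump, so later copies can be checked for corruption."""
        return hashlib.sha256(self.dump_bytes).hexdigest()

    def service_level(self):
        """Level 1 once a read-only rendering is attached, otherwise Level 0."""
        return 1 if self.rendered_html else 0

    def to_record(self):
        """Bibliographic-style metadata for indexing the snapshot."""
        return {
            "title": f"{self.wiki_name} wiki snapshot {self.snapshot_date.isoformat()}",
            "authors": sorted(set(self.contributors)),
            "date": self.snapshot_date.isoformat(),
            "sha256": self.checksum,
            "level": self.service_level(),
        }


def release_history(snapshots):
    """Level 2: several snapshots of one wiki, ordered by release point."""
    return [s.to_record() for s in sorted(snapshots, key=lambda s: s.snapshot_date)]
```

Treating each snapshot as a record with authors and a date is what would let it be indexed, cited and linked to papers like any other item in the library.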
25. Publication/Drafting History: H1 Example
A publication history includes:
Set of preliminary results (typically prepared for/as conference reports), short papers with associated figures.
Actual publication process, which begins with a pre-T0 report and then goes through the T0 talk to the first/second/… draft.
Each draft stage has its set of answers (comments by the collaboration and answers to them); typically a referee report
And a final version that goes to the journal.
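A history like this can be stored as an ordered list of items, one per stage, each carrying the collaboration's comments and the answers to them. A minimal sketch, assuming that stages never go backwards (repeated drafts allowed) and modelling the referee report as its own stage; both are simplifications of the H1 process described above, and every name is hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Stages of an H1-style publication history, in the order they occur."""
    PRELIMINARY = 0   # conference reports, short papers with figures
    PRE_T0 = 1        # pre-T0 report
    T0_TALK = 2       # T0 talk
    DRAFT = 3         # first, second, ... drafts
    REFEREE = 4       # referee report
    FINAL = 5         # version sent to the journal


@dataclass
class Item:
    stage: Stage
    label: str                                    # e.g. "Draft 2"
    comments: list = field(default_factory=list)  # comments by the collaboration
    answers: list = field(default_factory=list)   # answers to those comments


def check_order(history):
    """True if stages never go backwards (repeated drafts are allowed)."""
    return all(a.stage <= b.stage for a, b in zip(history, history[1:]))


def unanswered(history):
    """Labels of draft items that have fewer answers than comments."""
    return [i.label for i in history
            if i.stage is Stage.DRAFT and len(i.answers) < len(i.comments)]
```

Using an ordered enum makes the expected sequence explicit, so a misfiled item (for example a draft recorded after the final version) is easy to detect.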
28. How does it work?
External users can see the links from conference talks to final papers, but nothing in between
Access control: users must be registered and validated (e-mail ping); already planned
“Corporate” accounts for collaborations to update pages
Individual access via connection with a collaboration… (Any paper? Current membership? What about long-term?)
In development
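The visibility rule above can be stated as a small check: conference talks and final papers are public, while anything in between requires a registered, validated user connected to the collaboration. The slide leaves open what "connection" means; the sketch assumes current membership, and the stage names and classes are hypothetical.

```python
from dataclasses import dataclass

# Stages that external users may see (assumed labels).
PUBLIC_STAGES = {"conference_talk", "final_paper"}


@dataclass
class User:
    email: str
    validated: bool = False                  # e-mail ping confirmed
    collaborations: frozenset = frozenset()  # assumed: current memberships only


def can_view(user, stage, collaboration):
    """Public stages are open to all; intermediate ones need a validated member."""
    if stage in PUBLIC_STAGES:
        return True
    return (user is not None and user.validated
            and collaboration in user.collaborations)


def visible_history(user, history, collaboration):
    """Filter (stage, title) pairs down to what `user` may see."""
    return [(s, t) for s, t in history if can_view(user, s, collaboration)]
```

Passing `None` for an anonymous visitor yields only the public ends of the chain; swapping in a different membership rule (any paper ever co-authored, for instance) would only change `can_view`.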
29. Access
Main challenge: access policies and their technical implementation
Need input from collaborations to create policies.
One size does not fit all.
Easy: master access file maintained by the collaboration
But not long-term…
Medium: computation based on author lists (not always correct?)
Harder: individual access lists depending on date of object and date of access
OAIS (ISO 14721) and similar standards can help us implement these in line with archival best practices
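The three policy options differ mainly in what data the access check consults. A minimal sketch of each, assuming for the "harder" case that a user's grant covers objects dated within their membership period and remains usable until an optional expiry date; that rule and all function names are illustrative assumptions, not a policy any collaboration has adopted.

```python
from datetime import date


def easy_policy(user, master_list):
    """Easy: the collaboration maintains one list of authorized users."""
    return user in master_list


def medium_policy(user, author_lists):
    """Medium: anyone who appears on any of the collaboration's author lists."""
    return any(user in authors for authors in author_lists)


def harder_policy(user, object_date: date, access_date: date, grants):
    """Harder: per-user grants keyed on both dates (an assumed rule).

    `grants` maps user -> list of (member_from, member_until, access_until):
    the object must date from the membership period, and the access must
    happen on or before access_until (None means no expiry).
    """
    for member_from, member_until, access_until in grants.get(user, []):
        in_period = member_from <= object_date <= member_until
        still_valid = access_until is None or access_date <= access_until
        if in_period and still_valid:
            return True
    return False
```

The medium policy is only as good as the author lists it reads, as the slide notes, while the harder policy needs per-user records that someone must keep current long after a collaboration winds down.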