Description of the origins and development of the BookServer architecture and the Open Publication Distribution System (OPDS). Why OPDS Catalogs can help build a web of books. Discussion of the challenges ahead.
3. Motivating issues
Entering the digital fold,
a tangled landscape:
1. finding the book
2. format of the book
3. acquiring the book
4. Finding the book
Open web search? (Google, Bing, etc.)
Publisher website? (Tor.com, Sourcebooks)
Online bookstore? (Amazon, Indigo, B&N)
Indie bookstore? (Vroman’s, Powell’s)
Alt. vendor? (Smashwords, Kobo)
5. Format of the book
Highly structured display (pdf)
Downloadable book package (epub, mobi)
Web- or “cloud”-based (Google Editions)
Non-standard enhanced book (Blio)
6. Acquiring the book
Reading systems –
Amazon Kindles, Sony Readers, B&N nook
IBIS Reader, Aldiko, Stanza, Kobo
Standard desktops and laptops
Game consoles (Wii)
Apple iPad
8. What readers want
What readers want to have ...
Be able to find the books they want,
in the formats that they can use,
for the device that they have,
and not have it be painful.
9. Book distributors
What publishers, libraries, bookstores want -
Make books available for discovery,
with accurate descriptive information,
at as many different places as possible,
under the sales / use terms permitted.
11. For the United States
Even the U.S. Dept of Justice is an advocate:
“[book] data provided should be available in
multiple, standard, open formats supported by
a wide variety of different applications, devices,
and screens.”
13. BookServer: A future for books
Creating a new architecture using common,
open standards that permits people to find,
buy, acquire, and read books from any source,
on any device, using many different ebook
applications.
15. Relation: Library catalogs
Library 2.0 Gang (02/09):
Google books and libraries
“Open Catalogue Crawling Protocol”
Google, DLF, Talis, and others
Atom vs Sitemap discussions
17. OPDS “Catalog” launch
“The Open Publication Distribution System
(OPDS) is a generalization of the Atom [XML]
approach used by Stanza's online catalog.
...
I believe this effort has the potential to be a
critical enabler to the growth in access to, and
adoption of, digital books.”
- Bill McCoy, Adobe, 04.09
18. Getting the terms right
1. “BookServer” is the architecture.
2. “OPDS” is the technical specification.
3. “Catalogs” are made using OPDS.
4. “Atom” is the XML format underlying OPDS.
19. How it works
A reader ...
1. Browses a Catalog of titles -
2. selects a title for more information -
3. makes a purchase/borrow decision -
4. obtains book (PayPal, Amazon, Google) -
5. installs and reads the book.
20. What’s in this thing?
Catalogs provide manifests –
List of the titles available
Information about each title
Formats the title is available in
Ways the title can be acquired
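A minimal sketch of such a manifest, built as an Atom feed with Python's standard library. The function name, the dictionary keys, and the example.org URLs used with it are hypothetical, and the acquisition relation is taken from the later-published OPDS 1.x specification, so treat it as an assumption rather than what the catalogs of this period used. A complete Atom feed also needs elements such as a feed-level id and updated, omitted here for brevity.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Assumption: link relation as published in OPDS 1.x.
ACQUISITION_REL = "http://opds-spec.org/acquisition"

# Serialize Atom as the default namespace (no prefix on element names).
ET.register_namespace("", ATOM)


def build_catalog(title, books):
    """Return an Atom feed (as a string) listing each book and its formats."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    for book in books:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        # Information about the title.
        ET.SubElement(entry, f"{{{ATOM}}}id").text = book["id"]
        ET.SubElement(entry, f"{{{ATOM}}}title").text = book["title"]
        author = ET.SubElement(entry, f"{{{ATOM}}}author")
        ET.SubElement(author, f"{{{ATOM}}}name").text = book["author"]
        # One link per available format: how the title can be acquired.
        for mime_type, href in book["formats"]:
            ET.SubElement(
                entry,
                f"{{{ATOM}}}link",
                {"rel": ACQUISITION_REL, "type": mime_type, "href": href},
            )
    return ET.tostring(feed, encoding="unicode")
```

Each entry carries the four things listed above: its place in the list, descriptive information, the available formats (as MIME types), and the acquisition links.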
21. A good catalog ...
Incorporates:
Flexible search
Intelligent hierarchy
Extensive faceting
22. Easily built
Catalogs can be derived from basic
bibliographic metadata. Such as:
ONIX, MARC, (ahem) spreadsheets
(Internally OPDS Catalogs use
simple Dublin Core metadata.)
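As an illustration of the spreadsheet case, here is a rough sketch that turns CSV rows into Atom entries carrying Dublin Core terms. The column names (title, isbn, language, publisher) and the function name are hypothetical; a real export from ONIX or MARC would need its own field mapping.

```python
import csv
import io
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DCTERMS = "http://purl.org/dc/terms/"


def rows_to_entries(csv_text):
    """Convert spreadsheet rows (CSV text with a header row) into Atom entries."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.Element(f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = row["title"]
        # Dublin Core terms carry the bibliographic details.
        ET.SubElement(entry, f"{{{DCTERMS}}}identifier").text = "urn:isbn:" + row["isbn"]
        ET.SubElement(entry, f"{{{DCTERMS}}}language").text = row["language"]
        ET.SubElement(entry, f"{{{DCTERMS}}}publisher").text = row["publisher"]
        entries.append(entry)
    return entries
```

The point of the sketch is only that a flat table of basic metadata is enough to start from.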
23. Catalogs scale
Because Catalogs are easy to make –
Any web site can create a bookstore, incl.
distributors, bookstores, and publishers.
Aggregators can combine multiple catalogs.
Search engines can harvest aggregations.
24. Based on Atom
Because OPDS is based on a commonly
used XML standard, called Atom,
OPDS Catalogs can be read by –
web browsers
news readers (RSS)
mobile applications
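Because a Catalog is ordinary Atom, generic XML tooling is enough to read one. A small sketch using only Python's standard library follows; the sample feed, its example.org URL, and the function name are made up for illustration, and the acquisition relation shown is the one from the later OPDS 1.x specification.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical catalog with a single entry.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Catalog</title>
  <entry>
    <title>Example Book</title>
    <link rel="http://opds-spec.org/acquisition"
          type="application/epub+zip"
          href="https://example.org/book.epub"/>
  </entry>
</feed>"""


def list_titles(feed_xml):
    """Return (title, [format MIME types]) for every entry in the feed."""
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        formats = [link.get("type") for link in entry.findall("atom:link", NS)]
        result.append((title, formats))
    return result
```

No OPDS-specific library is involved, which is the property that lets browsers, feed readers, and mobile applications consume the same document.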
25. Distribution format
Because Catalogs contain simple data
describing books and their availability –
Catalogs can also be used for B2B, to distribute
data to partners for “harvest” instead of using
complicated standards.
(Future: “real time web” notifications.)
26. Catalogs are emergent
Because we use open standards for describing
data, it is possible to link bibliographic book
data more easily.
§ Book reviews
§ Reading lists
§ Annotations
29. Why not ONIX?
ONIX (and BISG “BookDROP”) are:
Designed for different use cases
Complex standard with many options
Not widely used beyond publishing
Not understood by web browsers
Established; change is difficult
30. Make Books Apparent
A workshop sponsored by the Internet Archive
October 19-20, Fort Mason, San Francisco, CA
With the assistance of (among many others):
O’Reilly Media http://oreilly.com/
Threepress http://threepress.org/
Feedbooks http://feedbooks.com/
Book Oven http://bookoven.com/
33. Building the ecosystem
For this to work, we need:
1. Good (independent!) reading systems
2. Books, journals, magazines, and more!
3. Publishers must contribute current content!
4. Revenue in the system.
36. Issues – I
Metadata
Matching title <> reader is not trivial.
FRBR, recommending, clustering
- and then there is plain old GIGO
37. Issues – II
Aggregation
Two roles for OPDS:
1. simple publication
2. catalog aggregation
Aggregating resembles metasearch:
out of many sources must come order.
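A deliberately naive sketch of the aggregation role: merge entries from several catalogs, keep the most recently updated copy of each id, and impose one ordering. Entries are plain dictionaries here, and the keys (id, title, updated) and the deduplication rule are assumptions for illustration. In practice the same book often arrives under different identifiers from different sources, which is where the metasearch-like difficulty lies.

```python
from datetime import datetime


def aggregate(*catalogs):
    """Merge catalogs, keeping the newest entry per id, sorted by title."""
    merged = {}
    for catalog in catalogs:
        for entry in catalog:
            current = merged.get(entry["id"])
            # Keep an entry if it is new or fresher than the one already held.
            if current is None or _updated(entry) > _updated(current):
                merged[entry["id"]] = entry
    return sorted(merged.values(), key=lambda e: e["title"])


def _updated(entry):
    """Parse the entry's ISO 8601 timestamp (with an explicit UTC offset)."""
    return datetime.fromisoformat(entry["updated"])
```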
38. Issues – III
Identifiers
OMG. Where does one start?
- Author, work, and subjects.
Data from publishers (book and journal);
libraries, trade organizations and assns.
39. Issues – IV.a
Territorial Rights
Publishers carve up markets into territories,
geographic and language-based.
Difficult to parse from application+metadata.
Spanish publishers typically retain worldwide
Spanish-language rights.
40. Issues – IV.b
Territorial Rights
Territorial rights make zero sense for
digital editions (n.b. language might).
Publishers must obtain non-geographic
rights for electronic text versions.
(Regional DVD codes are a sad analogy).
41. Issues – V
Search
OPDS defines search via OpenSearch.
OpenSearch version status is “under development”
and not really controlled by anyone (origin: A9).
Could benefit from support and enhancement.
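For context, OpenSearch describes a search endpoint as a URL template with placeholders such as {searchTerms}, where a trailing ? marks an optional parameter. A rough sketch of how a reading application might fill one in; the function name and the example.org template are hypothetical, and substituting an empty string for missing optional parameters is this sketch's reading of the OpenSearch draft.

```python
import re
from urllib.parse import quote


def expand_template(template, **params):
    """Fill an OpenSearch-style URL template with percent-encoded values."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            # Optional parameters that were not supplied become empty.
            return ""
        raise KeyError(name)

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)
```

Usage: expand_template("https://example.org/search?q={searchTerms}&page={startPage?}", searchTerms="moby dick") fills in the query and leaves the page parameter empty.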
42. Issues – VI
Faceting
On a small screen device, faceting must be
a normative discovery user interface form.
What is baked in? – Top-20. Classics. New.
What is algorithmically derived, on the fly?
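One way to picture the "derived on the fly" case: count the values of a field across the entries currently in view and offer the most common ones as facets. This is a sketch under assumptions; entries are plain dictionaries, and the field name "subjects" and both function names are hypothetical.

```python
from collections import Counter


def facet_counts(entries, field):
    """Count how often each value of `field` occurs, most common first."""
    counts = Counter()
    for entry in entries:
        for value in entry.get(field, []):
            counts[value] += 1
    return counts.most_common()


def narrow(entries, field, value):
    """Keep only the entries whose `field` contains `value`."""
    return [entry for entry in entries if value in entry.get(field, [])]
```

A baked-in facet such as "Classics" would simply be a fixed value passed to narrow, while the computed counts decide which facets are worth showing on a small screen.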
43. Issues – VII
Bookshelves
Users should be able to define and maintain
their own book lists in OPDS format.
These might even be portable across book
hosting services.
44. Issues – VIII
DRM
Bad word, but many publishers still reliant.
Best market solution: Adobe ACS4
Desperate need for open source solution.
(Perhaps premised on “social-DRM” spec.)
45. Issues – IX
Vending
Not a trivial problem.
Need an abstracted selling API.
Application elicits essential purchaser data,
then handles transaction “under the covers”
PayPal, Google Checkout, Amazon Checkout
46. Issues – X
Lending
Internet Archive would like to lend books
(directly, not via a third party).
Is every lending a renting? (no ... !)
Is there digital first-sale? (yes ... !)
Options: ACS4, streaming (cloud)
47. Issues – XI
Hello World!
Currently no way for new OPDS Catalogs to
announce themselves to the world.
We have discussed a “ping server” to aid the
auto-aggregation of Catalogs. For now, this remains
a manual notification process.