These are the slides from a presentation I gave about Open Library at the California Association of Museums conference earlier this year. They outline the general approach I took to redesigning the catalog of Open Library.
- http://www.calmuseums.org/index.cfm?fuseaction=Page.ViewPage&pageId=493
- http://openlibrary.org
2. By way of
Introduction
Wednesday, February 3, 2010
3. http://www.flickr.com/photos/dork/4040497259/
The Open Library project is produced out of the Internet Archive. We recently moved in to this
church in The Richmond in San Francisco. We’re all dreaming about ways we might be able to
transform it into a library.
4. Universal Access to
All Knowledge
Since 1996, the non-profit Internet Archive has been building a digital library of Internet sites
and other things in digital form. archive.org has a ton of texts, video, software, live music...
all sorts of things.
7. archive-it.org
Archive-It, a subscription service from the Internet Archive, allows institutions to build and preserve
collections of born digital content. Collections are hosted at the Internet Archive data center and are
accessible to the public with full-text search. If you’re interested, you can see live demos online - the
times are listed on the main page there.
8. nasaimages.org
NASA Images was created through a partnership with NASA to bring public access to NASA's
image, video, and audio collections in a single, searchable resource.
12. openlibrary.org
The Open Library project is about 3 years old, and its overarching goal is to have a page on the web
for every book ever published.
We have gathered about 30 million edition records (23 million are available through the site now) from
people like the Library of Congress, University of Toronto, TALIS, San Francisco Public Library and
others, with more on the way. We have built a database infrastructure and the wiki interface, and you
can search millions of book records, narrow results by facet, and search across the full text of about 1.8
million scanned books.
13. Wandered into The Library.
What did I find?
I started about 8 months ago. I am a web designer by trade, mostly ignorant of the practice
of librarianship and cataloging. In fact, I’d just come from the world of folksonomy at Flickr.
No classification systems, full of humans and gorgeous photography.
14. Hmm...
• Dense library metadata
• Designed for classic institutional
search/retrieve practice
• Data is very “dry”, often of poor
quality (only title and author, for example)
• No insight into the community
Coming in cold... so... what am I dealing with here?
15. Good!
• Loads of data: > 23 million edition records
• Small user base: < 20,000
• Small team: 4 people + 2 advisors
• Small architecture: 12 servers
• Flexible framework: infogami, web.py
- But, there were also some good things!!
16. Understand relationships
http://flic.kr/p/6xCJQS
So, what have we got, and how does it all inter-relate?
Any relationship can be made into a hyperlink.
21. Open to
Exploration
22.
At a guess, ~95% of institutional online collections begin with a search UI
23.
- There’s a presumption of knowledge, not encouragement of exploration
- How do I know what to search for if I don’t know what you’ve got?
24. OBJECTS - MEANS - REASONS
Rules for a Dictionary Catalogue, Charles A. Cutter. 1904, Page 12.
Charles Cutter: Rules for a Dictionary Catalog - particularly interested to hear that a librarian should rightfully expect their patrons to have
a title, author or subject in mind.
25.
The trouble was, with many Open Library records, there just wasn’t much there... virtually useless?
26.
Records in isolation have no story to tell, nothing to engage with, nothing to explore. Nothing to navigate.
31. Catalog as
Landscape?
http://www.flickr.com/photos/nov03/3639455345/
32. Deconstruction
http://flickr.com/photos/tupwanders/3356077817/
I’ve learned a wee bit about the history of library metadata... And museum metadata for that
matter.... It seems like the 1960s are a bit of a blight for human understanding, since that’s
the time when we got all excited about computers and their processing power, and seemingly
overwrote a lot of the crafty, poetic description and allusion...
What happens if you blow everything up?
33. LEADER: 01378cam 2200373I 4500
001 ocmocm01143845
003 OCoLC
005 19951211171151.0
008 750117r19531945nyu 000 1 eng u
019 $a4338553
040 $cSLC$dOCL$dTXA$dSFR$dOCoLC
049 $aSFRA
092 $aF$bSaLinger 1953
100 1 $aSalinger, J. D.$q(Jerome David),$d1919-
245 14 $aThe catcher in the rye.
260 $a[New York] :$bNew American Library,$c[1953, c1951]
300 $a192 p.$c18 cm.
490 0 $aSignet book,$vD1667
500 $aReprint of the 1945 ed. published by Little, Brown, Boston.
590 $aBarbara Grier and Donna McBride collection.
650 0 $aTeenage boys$vFiction.
650 0 $aBrothers and sisters$vFiction.
650 0 $aPreparatory schools$vFiction.
650 4 $aAlienation in teenagers$vFiction.
650 4 $aTeenage boys$xInterpersonal relations$vFiction.
650 4 $aEmotionally disturbed teenage boys$vFiction.
690 $aBarbara Grier and Donna McBride collection.
655 4 $aQueer pulps.
907 $a.b15331775$b10-24-07$c07-20-03
998 $axsf$b07-01-03$cm$da$e-$feng$gnyu$h4$i1
935 $aADM-9576
907 $a.b15331775$b02-23-04$c07-20-03
998 $axsf$b07-01-03$cm$da$e-$feng$gnyu$h4$i1
945 $aF SaLinger 1953$g1$i31223037153153$lxsfgl$o-$p$0.00$q-$rc$so
$t1$u0$v0$w0$x0$y.i25499191$z08-05-03
- I don’t want Open Library to jettison librarianship, or neglect to acknowledge the brilliant
hard work of librarians over the years...
- You could argue that this sort of computer-y librarianship (or any type of “educated
classification”) was (perhaps unintentionally) designed to obscure the personal... the
practical... the human
- How might we adapt or extend (or revert?) this librarians’ work to appeal to a broader
audience?
- Let’s see what happens when you explode Library of Congress Subject Headings. This data
isn’t even in Open Library - we borrowed it from loc.gov then pulled out the dynamite...
35. 650 0 $aTeenage boys$vFiction.
650 0 $aBrothers and sisters$vFiction.
650 0 $aPreparatory schools$vFiction.
650 4 $aAlienation in teenagers$vFiction.
650 4 $aTeenage boys$xInterpersonal relations$vFiction.
650 4 $aEmotionally disturbed teenage boys$vFiction.
36. 650 0 $aTeenage boys$vFiction.
650 0 $aBrothers and sisters$vFiction.
650 0 $aPreparatory schools vFiction.
650 0 $aAlienation in teenagers vFiction.
650 0 $aTeenage boys$xInterpersonal relations vFiction.
650 0 $aEmotionally disturbed teenage boys vFiction.
37. Teenage boys, Fiction, Brothers and sisters,
Preparatory schools, Alienation in teenagers,
Teenage boys, Interpersonal relations,
Emotionally disturbed teenage boys
39.
So, we’ve exploded all the subject headings into constituent parts, retaining their types
(subject, person, place, time, work, org etc), and made them searchable. You can see here a
search for any subjects that mention Brothers and Sisters
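As a rough illustration of that "explosion", here is a minimal Python sketch. It is not Open Library's actual code. It assumes subject fields arrive as text lines like the ones on the slides, and it uses the standard MARC 21 meanings of the subject tags (600 person, 610 org, 630 work, 648 time, 650 topic, 651 place) and subdivision subfields ($x general, $v form, $y chronological, $z geographic). The function names `explode_heading` and `explode_all` are made up for this example.

```python
# MARC 21 subject tags mapped to the heading types named in the talk.
TAG_TYPES = {"600": "person", "610": "org", "630": "work",
             "648": "time", "650": "subject", "651": "place"}
# Subdivision subfields: $x general, $v form, $y chronological, $z geographic.
SUBDIVISION_TYPES = {"x": "subject", "v": "subject", "y": "time", "z": "place"}


def explode_heading(line):
    """Split one subject field line into a list of (type, text) parts."""
    tag = line[:3]
    parts = []
    for subfield in line.split("$")[1:]:
        if not subfield:
            continue
        code = subfield[0]
        text = subfield[1:].strip().rstrip(".").strip()
        if not text:
            continue
        if code == "a":
            # The main heading takes its type from the field tag.
            parts.append((TAG_TYPES.get(tag, "subject"), text))
        elif code in SUBDIVISION_TYPES:
            parts.append((SUBDIVISION_TYPES[code], text))
    return parts


def explode_all(lines):
    """Explode many fields, keeping each (type, text) part only once."""
    seen, result = set(), []
    for line in lines:
        for part in explode_heading(line):
            if part not in seen:
                seen.add(part)
                result.append(part)
    return result
```

Fed the six 650 fields from the Salinger record, `explode_all` yields seven distinct parts, each of which can then be indexed and linked on its own.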
40.
Looking at the subject page, you can see the Works with the most editions in the top panel,
with a handy indicator to tell you if you can read an electronic version....
41.
If I scroll down...we’ve collated all the publish dates of all the editions with that subject
42.
And, we can also display subjects that are used most often in conjunction with “Brothers and
Sisters”, as well as the authors who write most about them, and publishers who publish
books about them
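One simple way to get "subjects used most often in conjunction" is to count co-occurrence across works. The sketch below assumes each work is just a list of subject strings. The function name `related_subjects` and the data shape are illustrative, not how the site necessarily does it.

```python
from collections import Counter


def related_subjects(works, subject, top=5):
    """Return the subjects that most often share a work with `subject`."""
    counts = Counter()
    for subjects in works:
        if subject in subjects:
            # Count each co-occurring subject once per work.
            counts.update(s for s in set(subjects) if s != subject)
    return counts.most_common(top)
```

The same counting works for authors or publishers: swap the list of subjects on each work for its authors or publishers and count those instead.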
43.
We can also collect subjects together at the author level. Here you can see what sorts of
subjects Salinger writes about...
44.
Subjects related to J. D. Salinger - note that we’ve retained the Place/Person/Time categories,
but it’s likely we’ll fold Orgs, Works etc into the more general subject bucket.
45.
Incidentally, my colleague Lance Arthur popped in and updated the Salinger record with a
note of his death.
46. “Books, even after they have been
given a shelf and a number, retain a
mobility of their own. Left to their own
devices, they assemble in unexpected
formations; they follow secret rules of
similarity, unchronicled genealogies,
common interests and themes.”
Alberto Manguel, The Library at Night
Page 163, “The Library as Chance”
Here are some other interesting examples...
57.
Place and subject?
Wondering about whether or not you could actually stand on the surface of Halley’s Comet...
Is that a helpful classification of a place?
Which leads me on to the next chapter - a sort of radical exposure...
61.
- Incidentally, there were only 2 matches for the singular “gold mine”
- Usage volume of any particular heading is indicated by the order of the list: most books ==
most used heading
- I wonder if these lists could help to normalise heading usage somehow... if that’s what we
want to do, of course...
63. U.S. Dept. of Agriculture, Forest Service, Forest Products Laboratory, 127 books
Forest Products Laboratory, 50 books
Dept. of Agriculture, Forest Service, Forest Products Laboratory, 38 books
U.S. Dept. of Agriculture, 37 books
U.S. Dept. of Agriculture, Forest Service, 30 books
Forest Products Laboratory, Forest Service, U.S. Dept. of Agriculture, 13 books
U.S. Forest Products Laboratory, 10 books
Dept. of Agriculture, 7 books
Prolific authors on the subject of Wood
http://upstream.openlibrary.org/subjects/wood
So, diabolically rational, and yet, pretty inconsistent, when you can see it in the aggregate
like this.
64. Norton 5,072 books
W.W. Norton 2,371 books
W. W. Norton & Company 2,320 books
W. W. Norton 1,577 books
W W Norton & Co Inc 933 books
W W Norton & Co Ltd 824 books
W.W. Norton & Co. 490 books
W.W. Norton & Company 281 books
Distributed by W.W. Norton 269 books
W W Norton & Co Inc (Np) 207 books
Jeffrey Norton Pub 151 books
Norton*(ww Norton Co 144 books
W. W. Norton & company, inc. 124 books
W. W. Norton & Co. 120 books
W.W.Norton 112 books
W.W. Norton & Company, inc. 85 books
Distributed by W.W. Norton & Co. 65 books
W W Norton & Co (Sd) 63 books
W.W. Norton & company, inc. 54 books
Distributed to the book trade by W.W. Norton 52 books
W.W. Norton & Company, Inc. 51 books
W.W. Norton & Company Ltd 48 books
W. W. Norton & Company, inc. 46 books
Variants on a publisher’s name
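A machine could at least propose which of these variants belong together. Below is a minimal sketch, assuming a crude normalisation rule is acceptable as a first pass: lowercase, drop punctuation, drop "Distributed by" style prefixes and corporate suffix words. The names `publisher_key` and `group_variants` and the word lists are my own, and a human would still need to confirm the groupings.

```python
import re
from collections import defaultdict

# Hypothetical clean-up rules; a real list would need curating.
PREFIXES = ("distributed to the book trade by ", "distributed by ")
SUFFIX_WORDS = {"and", "company", "co", "inc", "ltd"}


def publisher_key(name):
    """Reduce a publisher name to a rough comparison key."""
    key = name.lower()
    for prefix in PREFIXES:
        if key.startswith(prefix):
            key = key[len(prefix):]
    words = re.findall(r"[a-z0-9]+", key)
    return " ".join(w for w in words if w not in SUFFIX_WORDS)


def group_variants(counts):
    """Group {name: book_count} by key, summing counts per group."""
    groups = defaultdict(lambda: {"books": 0, "variants": []})
    for name, books in counts.items():
        group = groups[publisher_key(name)]
        group["books"] += books
        group["variants"].append(name)
    return dict(groups)
```

With this rule "W.W. Norton", "W. W. Norton & Company" and "Distributed by W.W. Norton" all collapse to the key `w w norton`, while plain "Norton" and "Jeffrey Norton Pub" stay separate, which is probably what you want until someone checks them.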
65.
There’s also the issue of using old headings, for example, the sort of heading about a person
that contains birth/death dates. Useful disambiguation information. But, what happens when
they die?
66.
For those of you who may be unaware, Mr. Bacon is no longer with us.
69. History
Publish dates... I’m wondering if a cataloguing system had a required field somewhere...
70. ?
History
Published in the future? Let alone nine thousand nine hundred and ninety nine?
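Dates like these are easy to flag before they reach a history graph. Here is a small sketch, assuming publish dates are free-text strings and that anything outside a plausible range should be set aside for a human to look at. The bounds `earliest=1000` and `latest=2010` are arbitrary choices for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

# First standalone four-digit number in the string.
YEAR = re.compile(r"\b(\d{4})\b")


def publish_year(date_string):
    """Pull the first four-digit year out of a free-text date, or None."""
    match = YEAR.search(date_string)
    return int(match.group(1)) if match else None


def year_histogram(date_strings, earliest=1000, latest=2010):
    """Count plausible years; collect everything else as suspect."""
    plausible, suspect = Counter(), []
    for raw in date_strings:
        year = publish_year(raw)
        if year is None or not (earliest <= year <= latest):
            suspect.append(raw)
        else:
            plausible[year] += 1
    return plausible, suspect
```

The suspect list is the interesting output: it is a ready-made to-do list for anyone who wants to jump in and fix records.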
71. “Build it so anyone can
contribute any amount.”
Clay Shirky
So, the idea is, someone sees an error like those publishing dates, and can go into the record
to correct them.
72.
Admittedly, this is the editing history of a personal friend, Dinah Sander - who some of you
may know - but, after seeing the history graphs with incorrect info - Dinah couldn’t stop
herself from jumping in and removing the dodgy data.
73. Open to
Elaboration
74. Substrate:
any surface on which a plant or animal lives or on
which a material sticks
http://flic.kr/p/4itJcB
There’s also an alternate definition which suggests a substrate is catalytic; something that
facilitates a reaction.
75. What if we consider the library
records like that?
http://flic.kr/p/4itJcB
- The trick is, these records are like a new language. To use them and operate within them
requires specific training. While this is not necessarily a bad thing, and experts are wonderful,
it means that people like me (reasonably clever, been to Uni) can’t make use of them.
- What we’ve tried to do is reveal the substrate; to show the landscape of librarianship and
the beautiful work of classification that has been happening for centuries.
- We’ve done this using the data we already had in our catalog, and without colliding with the
taxonomy vs folksonomy issue.
76. http://flic.kr/p/6zyU3U Tension?
The Taxonomy vs Folksonomy debate may be represented thusly.
77. 1) Books are for use.
2) Every reader his [or her] book.
3) Every book its reader.
4) Save the time of the User.
5) The library is a growing organism.
So, on the basis of the idea of our current catalog being a substrate, as Ranganathan
suggests in his five laws of library science...
Some suggest that this last law was intended more as a pointer to a library’s physical space:
its staff, its buildings, its shelves. I think it’s also useful to consider the law in the
classification space, particularly if you imagine that the Open Library catalog today is
effectively a substrate for further connections, elaboration and even corrections.
80. http://flic.kr/p/38TZ
What if a catalog looks like this? Is crystalline? What if it is unconstrained by the need to sort,
say, alphabetically?
From the artist of this image, Jared Tarbell: “Lines like crystals form at perpendicular angles
to existing lines. A complex form emerges.
1000 classic computational substrate, color palette stolen from Jackson Pollock: A simple
perpendicular growth rule creates intricate city-like structures. The simple rule, the complex
results, the enormous potential for modification; this has got to be one of my all time favorite
self-discovered algorithms. Lines likes crystals grow on a computational substrate.”
81. Activity/History
One of the key components to any happy social system is the visibility of other people, and a
sense of activity. This is one of the key elements we’re focussed on in the redesign. This
particular list shows all edits by humans on Open Library, and actually, turns out to be a
handy way to spot check what’s happening. You’ll notice too, there’s a special tab for the
variety of edits that we run across the system using bots. Often pretty mechanical and
repetitive, we found that the bots obscure the humans if you just mush everything up in a big
list, so we separated them.
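The separation itself is trivial once each edit says who made it. A tiny sketch, assuming every edit in the recent-changes list is a dict carrying a boolean `is_bot` flag (a hypothetical field name, not necessarily what the site stores):

```python
def split_activity(edits):
    """Partition a recent-changes list into (human_edits, bot_edits)."""
    humans = [edit for edit in edits if not edit["is_bot"]]
    bots = [edit for edit in edits if edit["is_bot"]]
    return humans, bots
```

Order within each list is preserved, so the human tab still reads as a timeline.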
82.
So, here’s an example of a record I happened to spot one day as I glanced through the Recent
Activity list...
83.
If you look closely, you’ll notice that apparently, this person believes the Collected Poems to
be part of the “pooop” series... and that they enjoy bacon.
84.
Open Library is a Wiki. That means that you can see the entire editing history of any one
record, and easily undo any errors (or mischief).
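To make the "undo" idea concrete, here is a toy model of a wiki-style record. It is only a sketch of the general technique (an append-only revision list where a revert is just another revision), not a description of how infogami stores history. The class `Record` and its methods are invented for this example.

```python
class Record:
    """A record whose every change is kept as a numbered revision."""

    def __init__(self, data, author):
        self.history = [{"rev": 1, "author": author, "data": dict(data)}]

    @property
    def current(self):
        """The data as of the latest revision."""
        return self.history[-1]["data"]

    def edit(self, changes, author):
        """Apply changes on top of the current data as a new revision."""
        self._save({**self.current, **changes}, author)

    def revert_to(self, rev, author):
        """Undo by saving a copy of an older revision as the newest one."""
        old = next(h for h in self.history if h["rev"] == rev)
        self._save(dict(old["data"]), author)

    def _save(self, data, author):
        self.history.append(
            {"rev": len(self.history) + 1, "author": author, "data": data}
        )
```

Because nothing is ever deleted, the mischief stays visible in the history even after it has been undone.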
91.
There are lots and lots of sites on the web that deal with bookish information. Goodreads,
LibraryThing etc. Why not connect Open Library records to these sites too?
92.
The Guardian, for example, also broadcasts timely bookish information. We’re wondering how
we might fold that in to providing jump points for people into Open Library...
93. “We shouldn't waste
people's time making fixes
that would be better done
by machine.”
Edward Betts, Chief Data Munger, Open Library
94. Canonical ID?
Collect them.
95. Canonical ID?
Exchange them.
96. http://openlibrary.org/books/olid/OL7440033M
http://openlibrary.org/books/isbn/0385472579
http://openlibrary.org/books/isbn/9780385472579
http://openlibrary.org/books/lccn/93005405
http://openlibrary.org/books/oclc/28419896
http://openlibrary.org/books/id/240727
http://openlibrary.org/books/amazon/...
http://openlibrary.org/books/bookmooch/...
http://openlibrary.org/books/goodreads/...
http://openlibrary.org/books/ocaid/...
http://openlibrary.org/books/librarything/...
http://openlibrary.org/books/paperback_swap/...
http://openlibrary.org/books/Your ID Here/...
You can already ping Open Library with an ID other than the Open Library identifier to see if
we have any matches.
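For anyone wanting to script this, the sketch below builds lookup URLs following the `/books/<id type>/<value>` pattern shown on the slide. It only constructs strings. It makes no network calls and assumes nothing about what the site returns. The helper names are my own. The ISBN-10 to ISBN-13 conversion uses the standard 978 prefix and check-digit rule, which is how the two ISBN URLs on the slide relate to each other.

```python
BASE = "http://openlibrary.org/books"


def lookup_url(id_type, value):
    """Build a lookup URL in the pattern shown on the slide."""
    return f"{BASE}/{id_type}/{value}"


def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13: 978 prefix plus a new check digit."""
    core = "978" + isbn10[:9]  # drop the old ISBN-10 check digit
    # ISBN-13 weights alternate 1, 3, 1, 3... across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)
```

For example, `lookup_url("isbn", isbn10_to_isbn13("0385472579"))` produces the third URL in the list above.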
99. Tools
http://www.flickr.com/photos/genkigecko/3371739666/
Tools to help people help the data.
100.
This is part of the list of Works by J. D. Salinger, which as you can see is far from perfect.
Humans can spot in a moment that some of these should be blended. Computers? Not so
much.
102. Lists
http://www.flickr.com/photos/cibi/3149659494/
- keep track of edits on things you’re interested in, or have edited
- export a small subset of records (bibliography, MARC, XML etc)
- provide another pivot for navigation in the networked catalog - “George has this book on
her “Famous Cheeses” list”
104. Lending
http://www.flickr.com/photos/readinginpublic/3999260222/
Starting to investigate what it might mean to lend out-of-print ebooks.
If anyone would like to join in on that, please let me know.
105. upstream.openlibrary.org
http://www.flickr.com/photos/nationallibrarynz_commons/3326203787/ http://flickr.com/photos/daveynin/560170975/
We still have a lot to do. I’ve shown you some of the muddiness in the catalog, probably
mainly because our data is aggregated from a variety of systems. But, we think it’s a new way
to look for a book to read, and that’s exciting! I hope you’ll take some time to poke around
the new site. And please, do let me know what you think!
106. Thank You!
George Oates, glo@archive.org
http://flickr.com/photos/roadsidepictures/244926428/