2. 2
©2015 All rights reserved.
• Experience Director, Content Strategy, Razorfish New York
• Co-editor of scatter/gather, a content strategy blog:
http://scattergather.razorfish.com
• Author of Nimble: A Razorfish Report on Publishing in the Digital Age (June 2010):
http://nimble.razorfish.com
• Twitter: @rlovinger
29. 29
Semi-structured information
allowed us to map the files to
content types and site sections,
and add some metadata (author,
published date, keywords, etc.)
10 years
× 50 issues per year
× 100 files per issue (approx.)
= 50,000 estimated articles
30. 30
Once in the CMS, we could add
photos, links, formatting, etc.
31. 31
For the content already in the
CMS, keywords had been
manually typed in by authors
• 6,790 "different" keywords
• Removed 12% during cleanup:
  - Typos
  - Redundant
  - Not useful
32.
33. 33
• Star Wars: Episode I -- The Phantom Menace
• Episode 1
• Episode I
• Phantom Menace
• Star Wars Episode I The Phantom Menace
• Star Wars Episode I: The Phantom Menace
• Star Wars prequel
• Star Wars: Episode 1 -- The Phantom Menace
• Star Wars: Episode i -- the Phantom Menace
• Star Wars: Episode I: The Phantom Menace
• Star Wars: Episode I--The Phantom Menace
• Star Wars: Episode I--The Phantom Menance
• Star Wars: Episode One -- The Phantom Menace
• Star Wars: The Phantom Menace
• Star Wars: The Phantom Menace -- Episode I
• The Phantom Menace
• The Phanton Menace
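The variants above suggest how free-typed keywords might be collapsed into a single controlled term. The following is a minimal Python sketch, assuming a hand-curated alias list plus fuzzy matching to catch typos. The canonical label, the alias list, and the 0.9 similarity cutoff are all illustrative assumptions, not the method the team actually used.

```python
import difflib
import re
from typing import Optional

# Assumed canonical label and aliases (hypothetical; a librarian would curate these).
CANONICAL = "Star Wars: Episode I - The Phantom Menace"
ALIASES = ["The Phantom Menace", "Star Wars: The Phantom Menace", "Episode I"]


def normalize(keyword: str) -> str:
    """Lowercase, replace punctuation with spaces, and unify '1'/'one' with 'i'."""
    text = re.sub(r"[^a-z0-9 ]+", " ", keyword.lower())
    text = re.sub(r"\b(1|one)\b", "i", text)
    return " ".join(text.split())


KNOWN_FORMS = [normalize(form) for form in [CANONICAL, *ALIASES]]


def canonicalize(keyword: str, cutoff: float = 0.9) -> Optional[str]:
    """Return CANONICAL if the keyword closely matches a known form, else None."""
    match = difflib.get_close_matches(
        normalize(keyword), KNOWN_FORMS, n=1, cutoff=cutoff
    )
    return CANONICAL if match else None
```

Under these assumptions, typos such as "The Phanton Menace" resolve to the canonical term, while an ambiguous keyword such as "Star Wars prequel" falls below the cutoff and is left for a person to review.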
35. 35
• TAFKAP?
• The Artist
• Artist Formerly Known as Prince
• The Artist Formerly Known As Prince
• The Artist formerly known as Prince
• the Artist Formerly Known as Prince
• The Artist Formerly Known as Prince (PKA)
36.
37. 37
• The magazine was published once a week
• The website published new articles several times a day
• Plus: Over 50,000 past articles!
• How could we make better use of all that content?
38. 38
If you like James Bond, we wanted it to be easy for you to
discover everything we had:
• Cover Story
• Interview
• Photo Gallery
• Etc.
41. 41
We organized the terms in our controlled vocabulary into categories, to make
them more distinct and meaningful.
For example:
• Book > Product > Harry Potter and the Goblet of Fire
• Movie > Product > Harry Potter and the Goblet of Fire
• Person > Individual > Daniel Radcliffe
• Person > Individual > J.K. Rowling
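The category paths above can be pictured as a small data structure. This is only a sketch: the `Term` class, its field names, and the `find_by_label` helper are hypothetical names chosen for illustration, not the schema of the actual CMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One controlled-vocabulary entry, identified by its full category path."""
    category: str  # e.g. "Book", "Movie", "Person"
    subtype: str   # e.g. "Product", "Individual"
    label: str

    def path(self) -> str:
        return f"{self.category} > {self.subtype} > {self.label}"


VOCABULARY = [
    Term("Book", "Product", "Harry Potter and the Goblet of Fire"),
    Term("Movie", "Product", "Harry Potter and the Goblet of Fire"),
    Term("Person", "Individual", "Daniel Radcliffe"),
    Term("Person", "Individual", "J.K. Rowling"),
]


def find_by_label(label: str) -> list[Term]:
    """Return every term that shares a label, regardless of category."""
    return [term for term in VOCABULARY if term.label == label]
```

Because the category is part of each term's identity, the book and the movie with the same title stay distinct even though a plain keyword would merge them.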
43. 43
• Relationships defined for each media type
• Managed separately from the article content
• The full set of metadata was available to all articles
44. 44
• Standard relationships
• For example, for Movie:
  - Lead Performers
  - Director
  - Writer
  - Release Date
  - EW Grade
  - Etc.
• Select a related category for each relationship, as applicable
• Some allow multiple values
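A relationship schema like the one listed above could be expressed as a simple mapping. In this sketch the relationship names come from the slide, but which ones accept multiple values is an assumption made for illustration, and `validate` is a hypothetical helper.

```python
# Relationship name -> whether it accepts multiple values.
# Which relationships are multi-valued is an assumption for illustration.
MOVIE_RELATIONSHIPS = {
    "Lead Performers": True,
    "Director": False,
    "Writer": True,
    "Release Date": False,
    "EW Grade": False,
}


def validate(values: dict) -> list:
    """Return a list of problems found in a movie's relationship values."""
    problems = []
    for name, value in values.items():
        if name not in MOVIE_RELATIONSHIPS:
            problems.append(f"unknown relationship: {name}")
        elif isinstance(value, list) and not MOVIE_RELATIONSHIPS[name]:
            problems.append(f"{name} accepts only one value")
    return problems
```

Defining the schema once per media type lets every movie entry be checked against the same rules.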
45. 45
• Authors just selected the primary category
• Related metadata pulled in automatically
• Updates appeared on all articles
*Metadata categories and relationships were managed by a dedicated data librarian
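One way to get the behavior described above is for each article to store only a reference to its primary category and to look up related metadata when the article is read. The sketch below assumes that design; the store, the category ID, and all values are placeholders, not data from EW.com.

```python
from dataclasses import dataclass

# Shared metadata store, keyed by category ID. IDs and values are placeholders.
CATEGORY_METADATA = {
    "movie/example-film": {"Director": "Example Director", "EW Grade": "B"},
}


@dataclass
class Article:
    title: str
    primary_category: str  # the only metadata the author selects


def related_metadata(article: Article) -> dict:
    """Look up related metadata at read time instead of copying it into the article."""
    return dict(CATEGORY_METADATA.get(article.primary_category, {}))
```

Because nothing is copied into the article itself, a single correction in the shared store shows up on every article that references the category.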
47. 47
• "Best Results" linked directly to an aggregated page based on the category.
• For example:
  - "Cats & Dogs" vs. "The Truth About Cats & Dogs"
  - The Green Mile (Movie) vs. The Green Mile (Book)
48.
49. 49
• Wal-Mart sold gallon jars of Vlasic pickles for $2.97.
• A popular item, priced so low it nearly put Vlasic out of business.
• By achieving their goals, they put themselves in a position they might not survive.
See: http://www.fastcompany.com/47593/wal-mart-you-dont-know
50. 50
• We wanted people to discover older content, and they did!
• By 2006, we had 16 years of magazine and web content.
• Other Time Inc. publications were interested in using our categorization system, too.
54. 54
The creator of Freebase (a semi-semantic UGC site for structured
content, now read-only) said EW.com was way ahead of its time.
60. 60
• An informal post on August 4th
• Notification sent out September 30th
• Shut down October 31st
61. 61
"What happened to my web page on my husband, Bob Champine,
that took me many years to put together on his career and which
meant a lot to me and to the aviation community. I noticed with 9.0
I lost the left margin and the picture of him exiting the X-1. I need to
restore it to the internet as it is history. Please tell me what to do. I
will be glad to retype it, I just don't want it lost to the world. I need
help. Gloria Champine"
63. 63
"Archive Team is a loose collective of rogue archivists,
programmers, writers and loudmouths dedicated to
saving our digital heritage. Since 2009 this variant
force of nature has caught wind of shutdowns,
shutoffs, mergers, and plain old deletions - and done
our best to save the history before it's lost forever."
72. 72
• In 6 months, Archive Team saved 900 GB
• Estimated 4-5 TB total
• Other people saved additional pages, but probably ¼ is gone forever
• For many people, Geocities was their first web presence
76. 76
Those screenshots were automatically generated from
Geocities sites rescued by Archive Team in 2009
See more at One Terabyte of Kilobyte Age Photo Op:
http://oneterabyteofkilobyteage.tumblr.com/
77. 77
Due to lack of metadata:
• The rescued data was less useful
  - Really bulky files
  - Case-sensitive filenames, difficult to access and read
  - Not in a web-ready format (WARC)
• The process was less efficient and more error-prone
  - Poor tracking of completed activity
  - Lots of duplication of data
  - Took way too long (6 months vs. 3 days)
  - Could have gotten all the data in a month (estimated)
79. 79
Mission:
The Internet Archive's purposes include offering permanent access
for researchers, historians, scholars, people with disabilities, and the
general public to historical collections that exist in digital format.
Photo by Ulf Benjaminsson
83. 83
Save the history before it's lost
forever
Offer permanent access to
historical collections that exist in
digital format
84. 84
Internet Archive contains: web pages, texts, videos, audio files,
software, and images. (Plus concerts and collections)
• Media Type makes it Readable or Playable
• Emulator (for software) makes it Executable
• Subject Keywords make it Findable
85.
86. 86
• Is it Accurate?
• Is it Credible?
• What is the Source? (machines or people)
• It's a lot of Effort. Do we have enough people and time?
87.
88. 88
Additional processing takes place, depending on the type
89. 89
• Description and keywords are required, but they are open (free-text) fields
• Other metadata is optional
92. 92
• For user-generated content, it's just easier for people not to.
• Internet Archive will never have enough people on staff to do it properly.
94. 94
• A small pool of volunteers, and their drive didn't last long
• Tools didn't provide immediate feedback/satisfaction. They had
to email their inputs and wait.
Photo by psyberartist
95. 95
• 10 most common words + 10 most common 2-word phrases
• Applied to 200,000 items
• Much more scalable
• Heavily machine assisted: a person can validate data and create collections
Photo by James St. John
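The "10 most common words + 10 most common 2-word phrases" approach can be sketched in a few lines of Python. This is not the Internet Archive's actual code: the stopword list, the tokenization, and the function name `extract_topics` are assumptions, since the slides do not say how common filler words were handled.

```python
import re
from collections import Counter

# Assumed stopword list; the slides do not say how common words were filtered.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "is", "has", "for", "with"}


def extract_topics(text: str, limit: int = 10):
    """Return the most common words and 2-word phrases, ignoring stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    words = Counter(t for t in tokens if t not in STOPWORDS)
    phrases = Counter(
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    )
    top_words = [word for word, _ in words.most_common(limit)]
    top_phrases = [phrase for phrase, _ in phrases.most_common(limit)]
    return top_words, top_phrases
```

The output is only a candidate list; as the slide notes, a person can still validate the topics and group items into collections.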
98. 98
Topics:
switch, atari,
antenna, game,
cable, terminals,
console, television,
video, program,
power supply,
console unit, video
computer, game
program, computer
system, atari game,
power switch,
switch box, atari
video, screw
terminals
99. 99
Having the stuff is vital, the
most important thing. But
it's also vital to have a
system by which these
things are described.
"If a person can't get the
information they need, then
we're failing."
Photo by Rachel Lovinger
100.
101. 101
• Jason had been converted into a metadata advocate
But I realized that...
• Content strategists who care about the long game should
think like historians, archivists and futurists, too.
103. 103
• Dutch leader in academic research and education on
biodiversity and taxonomy.
• Has a collection of 37 million natural history objects.
104. 104
Describe, understand and explore biodiversity for human
wellbeing and the future of our planet.
They do this with:
• Accessible collections
• Contributions to global scientific research
• Awe of natural history
• Openly shared knowledge
105. 105
• From 2010 to June 2015
• 250 staff members & 450 volunteers
• Digitizing 7 million objects in detail
• Adding metadata for the other 30 million objects
106. 106
• Information is more easily discovered, studied, and used.
• Scientists worldwide can access it directly online, without assistance.
• Some of this data has never been available in digital form before.
107. 107
• Scientific name
• Where it was found
• When it was found
• Who found it
"Objects [in the collection] have no scientific value
without this information." - Suzanne de Jong-Kole
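The four fields above amount to a minimal required-metadata record. As a rough sketch, a digitization workflow might flag labels that are missing any of them; the class and field names here are hypothetical, not the museum's actual data model.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SpecimenLabel:
    """The four label fields listed above; field names are illustrative."""
    scientific_name: Optional[str] = None
    where_found: Optional[str] = None
    when_found: Optional[str] = None
    who_found: Optional[str] = None

    def missing_fields(self) -> list:
        """Names of the required fields that are empty or absent."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_complete(self) -> bool:
        return not self.missing_fields()
```

A record with gaps can then be routed back for transcription or review rather than silently published.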
111. 111
• Vele Handen = Many Hands
• People helped transcribe handwritten labels
• In 9 months, people transcribed 200,000 labels,
of which about half were usable.
112. 112
The person who collected the specimen wrote the metadata on the label.
This could be a professional researcher, or a non-professional enthusiast.
115. 115
When they wrote this metadata, they had no idea that nearly
half a millennium later people would be "digitizing" it.
116. 116
The "love note" is when
you behave selflessly for
a partner (or customer)
that doesn't exist yet.
A drawing Jason drew in my notebook in high
school, 20+ years before we ever dated.