Web annotation has been receiving increased attention recently with the organization of the Open Annotation Collaboration and new tools for open annotation, such as Hypothes.is. In this paper, we investigate the prevalence of orphaned annotations, where a live Web page no longer contains the text that had previously been annotated in the
Hypothes.is annotation system (containing 20,953 highlighted text annotations).
1. Quantifying Orphaned Annotations in Hypothes.is
Mohamed Aturban, Michael L. Nelson, and Michele C. Weigle
Department of Computer Science, Old Dominion University, Norfolk, VA 23529
TPDL 2015, Poznan, Poland, September 13-17, 2015
2. What is Web Annotation?
Handwritten Annotations Web Annotations
Haslhofer, B., Simon, R., Sanderson, R., Van de Sompel, H.: The Open Annotation Collaboration (OAC) model. In: Proceedings of the IEEE Workshop on Multimedia on the Web (MMWeb). pp. 5–9. IEEE (2011)
http://networkedlearningcollaborative.com/wp-content/uploads/2015/07/53e12bf10cf2d79877a53311.pdf
3. What is Web Annotation?
• Open Annotation Collaboration (OAC) group defines an annotation as a set of connected resources
5. Why is Web Annotation Important?
• A collaborative tool
• Social criticism
• Education for students and teachers
• Scholarly and academic purposes
• Editing and publishing
Web annotations do not modify the original resource
7. The Annotation is Attached to the Live Web
Live web page (August 2015): http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
Annotation (made in February 2014): https://hypothes.is/a/tp5hnn4PTjuSg7hbc7_cxQ
8. A Review of Annotation Attachment to the Live & Archived Web
Case  Does an annotation attach to its  Does an annotation attach to any archived
      target’s live webpage?            copies (Mementos) of the target webpage?
1     YES                               YES
2     YES                               NO
3     NO                                YES
4     NO                                NO
9. The Annotation is Attached to the Live Web and to an Archived Copy (Memento) of the Webpage
Archived copy (February 2014): https://web.archive.org/web/20140207083733/http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
Live web page (August 2015): http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
Annotation (made in February 2014): https://hypothes.is/a/tp5hnn4PTjuSg7hbc7_cxQ
10. The Annotation is Attached to the Live Web but No Mementos Are Available
Live web page (August 2015): http://tkbr.ccsp.sfu.ca/pub802/2015/01/more-horsepower-to-wattpad/
Annotation (made in February 2015): https://hypothes.is/a/o_2W8QwZRDm8w0F1dqRxUQ and https://hypothes.is/a/NE8AT6R3Tn6Qg6FeYy1C0w
The Annotation is in Danger of being Orphaned
11. The Annotation is Not Attached to the Live Web but It is Attached to Mementos
Archived copy (December 2014): https://web.archive.org/web/20141210121018/http://climatefeedback.org/
Annotation made in December 2014
Live web page (August 2015): climatefeedback.org
12. The Annotation is Not Attached to the Live Web and No Mementos Are Available
Live web page (August 2015): http://renaissancejohnson.weebly.com/spensers-wordcloud.html
Annotation (made in July 2015): https://hypothes.is/a/wFLZKLGqS8S3Zyfr_Rmu4w
The Annotation is Orphaned
13. Four Different Cases For Annotation Attachment
Case  Does an annotation attach to its  Does an annotation attach to any archived  Status
      target’s live webpage?            copies (Mementos) of the target webpage?
1     YES                               YES                                        Safe
2     YES                               NO                                         In Danger of being Orphaned
3     NO                                YES                                        Can Reattach to Memento
4     NO                                NO                                         Orphaned
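The four cases above reduce to a lookup on two yes/no answers. The following is a minimal sketch of that mapping; the function name `attachment_status` is hypothetical and is not taken from the study's own code.

```python
def attachment_status(attached_live: bool, attached_memento: bool) -> str:
    """Map the two yes/no answers from the table to one of the four labels."""
    if attached_live and attached_memento:
        return "Safe"
    if attached_live:
        # Only the live page carries the text; if the page changes, it is lost.
        return "In Danger of being Orphaned"
    if attached_memento:
        # The live page no longer matches, but an archived copy still does.
        return "Can Reattach to Memento"
    return "Orphaned"


if __name__ == "__main__":
    for live in (True, False):
        for memento in (True, False):
            print(live, memento, attachment_status(live, memento))
```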
14. We Studied Hypothes.is Annotations
• How many annotations are orphaned?
• How many annotations are in danger of being orphaned?
• How many annotations can be reattached to mementos in public web archives?
15. Related Work
• OAC introduced the idea to make annotations reusable through different systems
  • Annotations, as web resources, have unique URIs
• Sanderson and Van de Sompel introduced a framework to make web annotations persistent over time
  • Integrating features in the Open Annotation Data Model with the Memento framework
  • Reconstructing annotations for a given memento
  • Retrieving mementos for a given annotation
• Kreymer's Browsertrix provides on-demand web archiving
  • Whenever an annotation is created, a copy of the related webpage could be archived automatically
  • Funded by Hypothes.is
Sanderson, R., Van de Sompel, H.: Making web annotations persistent over time. In: Proceedings of the 10th ACM/IEEE Joint Conference on Digital Libraries (JCDL). pp. 1–10. ACM (2010)
http://blog.webrecorder.io/2015/06/open-annotation-fund-project.html
https://hypothes.is/blog/fund-on-demand-web-archiving-completion/
16. Annotations in Hypothes.is Are Increasing
January 2015: 7744 annotations (dataset used in TPDL 2015 paper)
August 2015: 33,946 annotations (dataset presented here and in arXiv version)
17. Annotation Types in Hypothes.is
Columns: Highlighted Text, Note, Tags, Number of Annotations
Number of annotations for each combination of types: 11,289; 9858; 9252; 1835; 1356; 348; 8
Total: 33,946
18. Annotation Types in Hypothes.is
We studied 20,953 annotations that contain highlighted text
19. Several Academic Sites Use Hypothes.is Widely
Annotations Containing Highlighted Text  Host
1222                                     caseyboyle.net
1191                                     www.perseus.tufts.edu
887                                      rhetoric.eserver.org
875                                      networkedlearningcollaborative.com
749                                      sosol.perseids.org
733                                      tkbr.ccsp.sfu.ca
526                                      shakespeare.mit.edu
391                                      hypothes.is
356                                      renaissancejohnson.weebly.com
336                                      moodle2.wesleyan.edu
20. We Issued HTTP HEAD Requests for Target URIs of All Annotations
Number of    Status Code         Example
Annotations
18,167       200 OK              http://www.w3.org/Talks/9704WWW6-tbl/slide16.htm
820          Unresolvable URIs   file:///Users/peggy/Desktop/CIRCLE-youthvoting-individualPages.pdf
778          Timeout             http://testbelfastgroup.digitalscholarship.emory.edu/
666          404                 https://www.facebook.com/manunymous
190          Soft 4XX            http://www.transmography.net/braineryworkshop/camperforce-by-joseph/
87           401                 http://wiki.shuttleworthfoundation.org/~shuttlew/wiki/index.php?title=Nov_2013_DW_Dogfood_prep
80           503                 https://via.hypothes.is/http://b.pagekite.me/blog/2015-04-27_Roadmap_to_v1.html
68           403                 http://onlinelibrary.wiley.com/store/10.1002/2013EF000191/asset/eft214.pdf?v=1&t=hppouayu&s=e3a980c9e2c6317987306d4e1d76c690c29fe758
48           410                 https://www.scribd.com/word/removal/31126999
21           406                 https://www.scribd.com/deleted/95457320
19           500                 http://androidfrat.com/2015/01/the-new-usb-announcement-just-killed-the-usb-super-position/
9            400, 416, 504, 520  https://ibpublishing.ibo.org/live-exist/rest/app/pub.xql?doc=EX_Instructions_2013_e&part=10&chapter=4&page=1
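A HEAD request per target URI could be sketched roughly as follows, using only the Python standard library. The function name `head_status`, the 10-second timeout, and the outcome labels are assumptions for illustration; the slides do not say which client or timeout was used. Detecting "Soft 4XX" pages (error pages served with a 200 status) needs extra logic that is not shown here.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlparse


def head_status(uri: str, timeout: float = 10.0) -> str:
    """Issue an HTTP HEAD request and return a coarse outcome label."""
    if urlparse(uri).scheme not in ("http", "https"):
        # For example file:/// paths that only exist on the annotator's machine.
        return "Unresolvable URI"
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises 4xx and 5xx responses as exceptions carrying the code.
        return str(err.code)
    except urllib.error.URLError as err:
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "Timeout"
        return "Unresolvable URI"  # DNS failure, refused connection, etc.
    except (socket.timeout, TimeoutError):
        return "Timeout"
    except ValueError:
        return "Unresolvable URI"  # malformed URI rejected by the client


if __name__ == "__main__":
    # A local file path is classified without touching the network.
    print(head_status("file:///Users/peggy/Desktop/CIRCLE-youthvoting-individualPages.pdf"))
```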
21. Out of 33,946 Annotations,
We Investigated 20,133 Target URIs
22. An Annotation: JSON (L) and Visualized (R)
{
  "updated": "2014-02-10T22:51:03.920650+00:00",
  "target": [
    {
      "source": "http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web",
      "selector": [
        {
          "endContainer": "/div[3]/form[1]/div[2]/article[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]",
          "endOffset": 187,
          "type": "RangeSelector",
          "startOffset": 0,
          "startContainer": "/div[3]/form[1]/div[2]/article[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]"
        },
        {
          "exact": "Twenty-five years on from the web's inception, its creator has urged the public to re-engage with its original design: a decentralised internet that at its very core, remains open to all.",
          "prefix": "anChris Woods / chrismwoods.com",
          "type": "TextQuoteSelector",
          "suffix": "Speaking with Wired editor Davi"
        },
        {
          "start": 307,
          "end": 494,
          "type": "TextPositionSelector"
        }
      ]
    }
  ],
  "tags": ["w3", "re-decentrilization"],
  "text": "",
  "created": "2014-02-10T22:51:03.920636+00:00",
  "uri": "http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web",
  "user": "acct:aculich@hypothes.is",
  "consumer": "00000000-0000-0000-0000-000000000000",
  "id": "tp5hnn4PTjuSg7hbc7_cxQ",
  "permissions": {
    "admin": ["acct:aculich@hypothes.is"],
    …
Visualized on the live web page (August 2015): http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
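The highlighted text of an annotation lives in the `exact` field of its `TextQuoteSelector`. A minimal sketch of pulling it out is shown below; the helper name `highlighted_text` is hypothetical, and the embedded JSON is a trimmed copy of the annotation above that keeps only the fields the helper reads.

```python
import json

# Trimmed copy of the annotation shown on the slide (other fields omitted).
ANNOTATION_JSON = """
{
  "id": "tp5hnn4PTjuSg7hbc7_cxQ",
  "created": "2014-02-10T22:51:03.920636+00:00",
  "uri": "http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web",
  "target": [
    {
      "selector": [
        {"type": "RangeSelector", "startOffset": 0, "endOffset": 187},
        {
          "type": "TextQuoteSelector",
          "exact": "Twenty-five years on from the web's inception, its creator has urged the public to re-engage with its original design: a decentralised internet that at its very core, remains open to all.",
          "prefix": "anChris Woods / chrismwoods.com",
          "suffix": "Speaking with Wired editor Davi"
        },
        {"type": "TextPositionSelector", "start": 307, "end": 494}
      ]
    }
  ]
}
"""


def highlighted_text(annotation: dict):
    """Return the 'exact' string of the first TextQuoteSelector, or None."""
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact")
    return None


annotation = json.loads(ANNOTATION_JSON)
quote = highlighted_text(annotation)

if __name__ == "__main__":
    print(annotation["uri"])
    print(quote)
```

Annotations that carry only a note or tags have no `TextQuoteSelector`, in which case the helper returns `None`.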
23. Methodology
• Compare an annotation’s highlighted text with a live webpage’s content.
  • Download annotation JSON from Hypothes.is
  • Extract the text from the annotation’s target URI
  • If the annotation’s highlighted text is found in the webpage, the annotation is attached to the live web
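The comparison step can be sketched as a substring test on the page's visible text. This is only an illustration under stated assumptions: the slides do not specify how text was extracted or matched, so the whitespace normalization, the skipping of script and style content, and the names `page_text` and `is_attached` are all choices made here, not the study's method.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip()


def page_text(html: str) -> str:
    """Return the visible text of an HTML document as one normalized string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space is a simplification: it can split a word that is
    # broken up by inline tags.
    return normalize(" ".join(parser.chunks))


def is_attached(highlight: str, html: str) -> bool:
    """True if the highlighted text occurs in the page's visible text."""
    return normalize(highlight) in page_text(html)


if __name__ == "__main__":
    sample = "<html><body><p>a decentralised  internet\nthat remains open</p></body></html>"
    print(is_attached("decentralised internet that remains open", sample))
```

The same check applies unchanged to the HTML of a memento, which is how the live-web and archived-web results can be compared.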
24. Only 78% of Highlighted Text Annotations Attach to the Live Web
25. Discovering Mementos for All Resolvable Target URIs
• Using LANL Memento Aggregator
• Considering only mementos with datetime immediately before or after the annotation’s creation date
• Four cases regarding the availability of mementos for the annotation’s target URI:
  • Mementos exist before and after the annotation’s creation date
  • Mementos exist only before the annotation’s creation date
  • Mementos exist only after the annotation’s creation date
  • No mementos exist
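Given the memento datetimes for a target URI, picking the closest memento on each side of the annotation's creation date and naming the resulting case might look like the sketch below. The function names are hypothetical, fetching the datetimes from an aggregator is not shown, and treating a memento captured at exactly the creation time as "before" is an assumption.

```python
from datetime import datetime, timezone


def closest_mementos(created, memento_datetimes):
    """Return (latest memento at or before created, earliest memento after)."""
    before = max((m for m in memento_datetimes if m <= created), default=None)
    after = min((m for m in memento_datetimes if m > created), default=None)
    return before, after


def availability_case(created, memento_datetimes):
    """Name which of the four availability cases applies to a target URI."""
    before, after = closest_mementos(created, memento_datetimes)
    if before and after:
        return "before and after"
    if before:
        return "only before"
    if after:
        return "only after"
    return "no mementos"


# Creation date of the example annotation and the capture time of the
# February 2014 archived copy (20140207083733) shown earlier in the deck.
created = datetime.fromisoformat("2014-02-10T22:51:03.920636+00:00")
capture = datetime.strptime("20140207083733", "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

if __name__ == "__main__":
    print(availability_case(created, [capture]))
```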
26. Mementos Exist Before and After the Annotation Creation Date
25% (4986) of resolvable target URIs
27. Mementos Exist Only Before the Annotation Creation Date
12% (2477) of resolvable target URIs
28. Mementos Exist Only After the Annotation Creation Date
7% (1397) of resolvable target URIs
30. Are Annotations Attached to Existing Mementos?
• Similar to checking if an annotation is attached to the live web
• If the highlighted text is found in a memento, the annotation is attached and could be recovered.
• 8860 annotations have a target URI with at least one memento
• Of these, 90% (7963) can be attached to a memento
31. Annotation Targets with Existing Mementos Before and After the Annotation Creation Date
Attached to Live  Attached to       Attached to      Number of
Web Page          Memento (Before)  Memento (After)  Annotations
Yes               Yes               Yes              4091
Yes               Yes               No               93
Yes               No                Yes              100
Yes               No                No               182
No                Yes               Yes              251
No                Yes               No               69
No                No                Yes              44
No                No                No               156
Total                                                4986
32. Annotation Targets with Existing Mementos Before and After the Annotation Creation Date
In Danger: attached to the live web page but to neither memento (182 annotations)
Orphaned: attached to neither the live web page nor a memento (156 annotations)
33. Annotation Targets with Existing Mementos Only Before the Annotation Creation Date
Attached to Live  Attached to       Number of
Web Page          Memento (Before)  Annotations
Yes               Yes               1984
Yes               No                235
No                Yes               133
No                No                125
Total                               2477
34. Annotation Targets with Existing Mementos Only Before the Annotation Creation Date
In Danger: attached to the live web page but not to the memento (235 annotations)
Orphaned: attached to neither the live web page nor the memento (125 annotations)
35. Annotation Targets with Existing Mementos Only After the Annotation Creation Date
Attached to Live  Attached to      Number of
Web Page          Memento (After)  Annotations
Yes               Yes              1148
Yes               No               101
No                Yes              50
No                No               98
Total                              1397
36. Annotation Targets with Existing Mementos Only After the Annotation Creation Date
In Danger: attached to the live web page but not to the memento (101 annotations)
Orphaned: attached to neither the live web page nor the memento (98 annotations)
37. Annotation Targets with No Existing Mementos
Attached to    Number of
Live Web Page  Annotations
Yes            7839
No             3434
Total          11,273
38. Annotation Targets with No Existing Mementos
In Danger: attached to the live web page, with no mementos available (7839 annotations)
Orphaned: not attached to the live web page, with no mementos available (3434 annotations)
39. How Many Orphaned Annotations
Does Hypothes.is Have?
• 19% (3813) of annotations are orphaned
• 41% (8357) of annotations are in danger of
being orphaned
• In total, 60% (12,170) of annotations are either
orphaned or in danger of being orphaned
40. How Many Annotations Can Be
Reattached Using Web Archives?
• Archives could only save 3% (547) of annotations that
would otherwise be orphaned
• 37% (7416) of annotations are safe -- attached to the
live web and also attached to one or more
mementos.
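The summary counts on these two slides can be reproduced from the attachment tables on the preceding slides. The sketch below collapses each table row to two flags (attached to the live web, attached to at least one memento) and sums the annotation counts per status; the variable names and the short status labels are chosen here for illustration.

```python
from collections import Counter

# (attached to live web, attached to at least one memento, annotations)
ROWS = [
    # Mementos before and after the creation date (4986 annotations)
    (True, True, 4091), (True, True, 93), (True, True, 100), (True, False, 182),
    (False, True, 251), (False, True, 69), (False, True, 44), (False, False, 156),
    # Mementos only before the creation date (2477 annotations)
    (True, True, 1984), (True, False, 235), (False, True, 133), (False, False, 125),
    # Mementos only after the creation date (1397 annotations)
    (True, True, 1148), (True, False, 101), (False, True, 50), (False, False, 98),
    # No mementos (11,273 annotations)
    (True, False, 7839), (False, False, 3434),
]

LABELS = {
    (True, True): "safe",
    (True, False): "in danger",
    (False, True): "can reattach",
    (False, False): "orphaned",
}

totals = Counter()
for live, memento, count in ROWS:
    totals[LABELS[(live, memento)]] += count

grand_total = sum(totals.values())

if __name__ == "__main__":
    for label, count in totals.items():
        print(f"{label}: {count} ({100 * count / grand_total:.1f}%)")
    print("total:", grand_total)
```

Dividing each count by the 20,133 annotations with resolvable target URIs gives the percentages quoted on the slides, up to rounding.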
41. Archives Used to Attach Annotations to Mementos
Archive                             Attached to     Not Attached to
                                    Live Web        Live Web
web.archive.org                     6997 (94.3%)    455 (83.1%)
archive.is                          679 (9.15%)     39 (7.12%)
wayback.archive-it.org              562 (7.57%)     47 (8.59%)
github.com                          80 (1.07%)      21 (3.83%)
wayback.vefsafn.is                  71 (0.95%)      53 (9.68%)
arxiv.org                           18 (0.24%)      0
webarchive.loc.gov                  3 (0.04%)       0
webarchive.org.uk                   4 (0.05%)       0
webarchive.nationalarchives.gov.uk  2 (0.02%)       0
discordia.wikia.com                 1 (0.01%)       0
Total                               8417 (113.4%)   615 (112.32%)
The totals exceed 100% because a single annotation may reattach to mementos from multiple archives
42. The Status of Current Hypothes.is Annotations
Highlighted Text Annotations with Resolvable Target URIs: 20,133
43. Conclusion
• We analyzed the attachment of 20,953 highlighted
text annotations in Hypothes.is.
• 60% of annotations are orphaned or in danger of
being orphaned
• 7963 mementos from 10 different web archives
could be used to keep the remaining 40% of
annotations safe.
Archiving webpages at the time of annotation is
important to avoid orphaned annotations.