David Newbury presented on fuzzy dates and how to represent temporal information digitally. He discussed how computers use a single timeline approach that can lose context, while humans describe intervals using natural language and varying precisions. Newbury proposed a model using four dates to describe a state's beginning, ongoing duration, and end to address these issues. This allows computational handling of dates alongside human-readable descriptions, addressing the needs of technologists, cataloguers, and humanists.
Fuzzy Dates & the Digital Humanities
1. Fuzzy Dates & the Digital Humanities
David Newbury, Lead Developer, Art Tracks, Carnegie Museum of Art
Keystone DH 2017, Philadelphia, PA
A computational and linguistic technique for digitally encoding a
description of when events took place.
David Newbury (@workergnome): Fuzzy Dates & the Digital Humanities. Keystone DH, July 13, 2017.
2. What we're going to talk about:
1. How should we think about "when"?
2. Dates! How do they work?
3. How do computers help with this?
3. Time is the Simplest Thing.
Computer Scientists solved time back in 1971,
when they defined the first Thursday in 1970
as the beginning of history.
Nothing important happened before that day,
and history will end on January 19th, 2038.
...according to UNIX time.
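Both boundary dates can be checked in a few lines. This is a minimal sketch, assuming the common convention of a signed 32-bit count of seconds since the epoch; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# UNIX time counts seconds from midnight UTC on January 1, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit seconds counter can hold (an assumption
# about how the timestamp is stored, not a property of time itself).
MAX_INT32 = 2**31 - 1

# weekday() numbers Monday as 0, so 3 means Thursday.
epoch_is_thursday = EPOCH.weekday() == 3

# The last second such a counter can represent.
end_of_history = EPOCH + timedelta(seconds=MAX_INT32)

print(epoch_is_thursday)           # True
print(end_of_history.isoformat())  # 2038-01-19T03:14:07+00:00
```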
4. The humanities take
a slightly more nuanced view.
...As do astronomers, geologists, and my mother.
5. Computability & Correctness
As technologists, we want to sort and compare using math.
As humanists, we want to maintain the uncertainty of knowledge.
As catalogers, we want a system that's easy to read and write.
These goals are fundamentally in conflict.
6. Why'd you have to go
and make things
so complicated?[1]
[1] Lavigne, A., Christy, L., Spock, S., Edwards, G. (2001). Complicated. [Recorded by Lavigne, A.] On Let Go, CD. New York, New York: Arista.
7. Thankfully, we're not alone.
Humans have been figuring out ways to
talk about "whens" since January 1, 4713 BC.
...when history began.
8. How do we describe "whens"?
• the 19th Century
• July 15, 2016
• The Tokugawa period
• Julian Day 19962.123
• 1499798191 (UNIX Time)
We describe Intervals.
9. Intervals
An interval is a position and an extent
within a temporal reference system.
If you want to learn more about the theory behind these terms (and much of this talk),
start with Allen Interval Calculus[2]
and the Time Ontology in OWL[3].
[2] Allen, J.F. (1984). Towards a general theory of action and time. Artificial Intelligence 23, pp. 123-154.
[3] Time Ontology in OWL. W3C Candidate Recommendation, 06 June 2017. https://www.w3.org/TR/owl-time
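To give a flavor of interval calculus, here is a rough sketch of the thirteen relations usually associated with Allen's work, computed by comparing endpoints. The function name and the choice to model an interval as a (start, end) pair with start < end are assumptions made for illustration.

```python
from datetime import date
from typing import Tuple

Interval = Tuple[date, date]  # (start, end), assuming start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Name the relation that holds from interval a to interval b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    # Whatever remains is the mirror image of a case handled above.
    mirror = {"before": "after", "meets": "met-by", "overlaps": "overlapped-by"}
    return mirror[allen_relation(b, a)]


day = (date(2016, 7, 15), date(2016, 7, 16))
month = (date(2016, 7, 1), date(2016, 8, 1))
year_2016 = (date(2016, 1, 1), date(2017, 1, 1))
year_2017 = (date(2017, 1, 1), date(2018, 1, 1))

print(allen_relation(day, month))            # during
print(allen_relation(year_2016, year_2017))  # meets
print(allen_relation(year_2017, month))      # after
```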
10. Position & Timelines
It's useful to discuss position
by analogy to Timelines.
A timeline is a number line with a start
position, or epoch, and a clock.
11. Epochs
The epoch is a defined
starting point for a timeline.
It can be any moment in time
that we can agree to start counting from.
(For example, January 1, 1 CE.)
12. Clocks
The Clock is a regularly repeating physical
event (a 'tick') and a way to count the 'ticks'.
Each 'tick' is a single instance of some
temporal unit.
(Usually, ticks are counted in "seconds", "days", or "years".)
13. Calendars
A Calendar is an algorithm for turning
specific counts of ticks into human-readable
forms, usually based on the
movement of astronomical bodies.
(1 tick is a day, January has 31 days, a year has 12 months, etc.)
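Epoch, clock, and calendar can be seen together in Python's standard `date` ordinals. This is only an analogy: Python uses a proleptic Gregorian calendar, counts one tick per day, and places tick 1 on January 1 of year 1.

```python
from datetime import date

# Position on the timeline: the number of day-ticks counted from the epoch.
tick = date(2016, 7, 15).toordinal()

# The calendar algorithm turns a bare tick count back into year-month-day.
readable = date.fromordinal(tick)

# Seventeen ticks later the calendar rolls over, because July has 31 days.
later = date.fromordinal(tick + 17)

print(date.fromordinal(1).isoformat())  # 0001-01-01
print(readable.isoformat())             # 2016-07-15
print(later.isoformat())                # 2016-08-01
```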
14. Extent
Extent captures the duration of an interval.
It's typically described with a number
of temporal units, but can also be
described using a calendar.
15. So, July 15, 2016 is a specific interval:
It has an extent
of 86400 seconds,
and a position
of the 15th day,
of the 7th month,
of the 2016th year
from the epoch
defined by the
Gregorian calendar.
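As a small illustration, a day can be modeled as a position plus an extent. A civil day with no leap second or clock change lasts 24 × 60 × 60 = 86,400 seconds. The `Interval` class below is a hypothetical sketch, not part of any library mentioned in this talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Interval:
    position: datetime  # where the interval starts on the timeline
    extent: timedelta   # how long it lasts

    @property
    def end(self) -> datetime:
        """The first moment that is no longer inside the interval."""
        return self.position + self.extent


july_15_2016 = Interval(position=datetime(2016, 7, 15), extent=timedelta(days=1))

print(july_15_2016.extent.total_seconds())  # 86400.0
print(july_15_2016.end.isoformat())         # 2016-07-16T00:00:00
```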
16. So...who cares?
Computers calculate time using a single clock.
Historians record dates with differing calendars.
This model allows us to describe events and states
using the original calendar from a primary source,
while remaining easily computable.
17. A brief moment on instants.
Instants are just intervals with an extent of zero, like geometric points.
Forget about them. You don't want to talk about instants; they're
philosophical catnip for people who want to fight about the boundaries
of the imaginary.
(Unless that's your thing...in which case come talk to me afterwards.)
(Bring scotch.)
18. This is important:
Temporal coordinate systems[4]
define "when"
using a position and an extent.
Every position, even one without explicit duration,
has a finite extent, based on the minimum precision
of the clock.
There are NO moments in time.
[4] See http://www.iso.org/iso/isocatalogue/cataloguedetail?csnumber=26013
19. Circa Dates
considered Harmful.
Circa dates encode a lot of this information, but in your own super-secret
special way.
20. How do computers think about "when"?
Computers use a format called ISO 8601,
which allows dates of varying precision.
2016-07-15
2016-07
2016
These could describe the same instant,
but they're recorded with different units.
21. Comparing and Sorting dates
Computers really want everything
to be on a single-precision timeline.
2016-07-15
2016-07-01/2016-07-31
2016-01-01/2016-12-31
These are the same instants, but
described using ISO 8601 intervals.
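The flattening onto a single day-precision timeline might look like the sketch below. The `expand` function is hypothetical and handles only the three forms shown here (YYYY, YYYY-MM, YYYY-MM-DD) with positive four-digit years.

```python
import calendar
from datetime import date
from typing import Tuple


def expand(iso: str) -> Tuple[date, date]:
    """Return the first and last day covered by a reduced-precision date."""
    parts = [int(p) for p in iso.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        # monthrange returns (weekday of the 1st, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    year, month, day = parts
    return date(year, month, day), date(year, month, day)


for text in ("2016-07-15", "2016-07", "2016"):
    first, last = expand(text)
    print(f"{text:<10} -> {first.isoformat()}/{last.isoformat()}")
```

Note what the returned pair no longer says: which precision the cataloger originally wrote.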
22. Do you hear that?
It's the sad sound of context being lost.
"Something happened in 2016"
"Something happened between January 1, 2016 and December 31, 2016"
are not equivalent.
23. When you record dates,
you must document
the precision of that date.
Otherwise, knowledge is lost.
24. Library Folk to the rescue!
Extended Date/Time Format (EDTF) is a
Library of Congress profile/extension to
ISO 8601.[5]
It defines a standard for recording
precision & certainty for dates.
[5] http://www.loc.gov/standards/datetime/pre-submission.html
25. Do I have to learn about EDTF?
(Yet another standards document that I have to read...)
26. Humanized EDTF
As part of the cultural_dates library,
we have a tool that will take April 1990?
and turn it into:
1990-04?
for cataloging as EDTF.
27. Humanized EDTF
As part of the cultural_dates library,
we have a tool that will take April 1990?
and turn it into:
beginning=1990-04-01; ending=1990-04-30;
precision=month; certain=false
for computational tasks.
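To make the computational form concrete, here is a minimal sketch that derives those four fields from a simple EDTF string. It is not the cultural_dates API: `FuzzyDate` and `parse_simple_edtf` are hypothetical names, and only YYYY, YYYY-MM, and YYYY-MM-DD with an optional trailing `?` are handled.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FuzzyDate:
    beginning: date
    ending: date
    precision: str  # "year", "month" or "day"
    certain: bool


def parse_simple_edtf(text: str) -> FuzzyDate:
    # A trailing "?" marks the date as uncertain.
    certain = not text.endswith("?")
    parts = [int(p) for p in text.rstrip("?").split("-")]
    precision = ("year", "month", "day")[len(parts) - 1]

    year = parts[0]
    first_month = parts[1] if len(parts) > 1 else 1
    last_month = parts[1] if len(parts) > 1 else 12
    first_day = parts[2] if len(parts) > 2 else 1
    # Without an explicit day, the ending is the last day of the last month.
    last_day = parts[2] if len(parts) > 2 else calendar.monthrange(year, last_month)[1]

    return FuzzyDate(
        beginning=date(year, first_month, first_day),
        ending=date(year, last_month, last_day),
        precision=precision,
        certain=certain,
    )


result = parse_simple_edtf("1990-04?")
print(result.beginning, result.ending, result.precision, result.certain)
# 1990-04-01 1990-04-30 month False
```

Keeping `precision` and `certain` next to the two dates is what preserves the context that a bare interval would lose.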
28. Human Readable Forms
For Humanists      For Catalogers   For Technologists
the 20th century   19xx             1900-01-01/1999-12-31
the 1990s          199x             1990-01-01/1999-12-31
1990               1990             1990-01-01/1990-12-31
October 1990       1990-10          1990-10-01/1990-10-31
October 17, 1990   1990-10-17       1990-10-17
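The step from the cataloger's `x` placeholders to the technologist's interval can be sketched by filling each `x` with its lowest and highest digit. The helper name is hypothetical, and only four-character year patterns are covered.

```python
from datetime import date
from typing import Tuple


def expand_masked_year(pattern: str) -> Tuple[date, date]:
    """Expand a year pattern such as 19xx, 199x or 1990 to a day range."""
    first_year = int(pattern.replace("x", "0"))  # earliest year the mask allows
    last_year = int(pattern.replace("x", "9"))   # latest year the mask allows
    return date(first_year, 1, 1), date(last_year, 12, 31)


for pattern in ("19xx", "199x", "1990"):
    first, last = expand_masked_year(pattern)
    print(f"{pattern} -> {first.isoformat()}/{last.isoformat()}")
```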
29. Similar work in the field:
• Python: https://github.com/ixc/python-edtf
• Ruby: https://github.com/duke-libraries/edtf-humanize
• Javascript: https://github.com/nicompte/edtfy
30. On to the clever bit!
...but first, a bit more theory.
31. States and Events
A state is a period of time
during which some condition is true.
An event is an instantaneous action
that changes some condition.
States begin and end with events.
32. States or Events
You can infer either from the other
so there's no need to record both.
Doing so can create paradoxes.
We usually think about states,but we tend
to have documentation about events.
33. Events: are they really instantaneous?
"This work was purchased in July of 1850."
Using the described units, the event
took place at a position represented by
July, 1850, with an extent of one month.
However, our timeline counts in days, and
July, 1850 has an extent of 31 days.
34. Four Instants, Four Intervals
On a timeline using a specific clock,
a state can be described using four
instants.
These instants can be combined
into sub-intervals in two ways:
one describing the events,
and the other describing the state.
35. The Clever Bit.
We can describe the extent of a state
by recording four dates.
These four dates are not just useful for
recording clock precision.
"Purchased sometime between
July 15, 1850 and July 18, 1850.
Sold in the 1950s."
36. The Really Clever Bit.
Some of these dates can be missing.
"Purchased before July 18, 1850.
Sold in the 1950s."
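One way to picture the four-date model, including missing dates, is the sketch below. The class, its field names, and the reading of the two groupings (event windows versus state spans) are assumptions for illustration, not the cultural_dates implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

Span = Tuple[Optional[date], Optional[date]]


@dataclass(frozen=True)
class State:
    earliest_start: Optional[date] = None  # the state began no earlier than this
    latest_start: Optional[date] = None    # ...and no later than this
    earliest_end: Optional[date] = None    # the state ended no earlier than this
    latest_end: Optional[date] = None      # ...and no later than this

    def events(self) -> Tuple[Span, Span]:
        """Windows in which the beginning and ending events fell."""
        return (
            (self.earliest_start, self.latest_start),
            (self.earliest_end, self.latest_end),
        )

    def certain(self) -> Span:
        """Span during which the state definitely held."""
        return self.latest_start, self.earliest_end

    def possible(self) -> Span:
        """Widest span during which the state could have held."""
        return self.earliest_start, self.latest_end


# "Purchased sometime between July 15, 1850 and July 18, 1850. Sold in the 1950s."
bounded = State(
    earliest_start=date(1850, 7, 15),
    latest_start=date(1850, 7, 18),
    earliest_end=date(1950, 1, 1),
    latest_end=date(1959, 12, 31),
)

# "Purchased before July 18, 1850. Sold in the 1950s."
# The named day is kept as the boundary; whether "before" includes it is
# a cataloging decision this sketch does not make.
open_start = State(
    latest_start=date(1850, 7, 18),
    earliest_end=date(1950, 1, 1),
    latest_end=date(1959, 12, 31),
)

print(bounded.certain())      # (1850-07-18, 1950-01-01)
print(open_start.possible())  # (None, 1959-12-31)
```

A missing date is simply `None`, so an unknown boundary stays unknown instead of being replaced by an invented value.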
37. Beginning Events
Describing the event that
starts a state:
38. Ending Events
Describing the event that
completes a state:
39. Special Cases
Describing specific cases that result in
strange grammatical constructions.
40. So many combinations
"1990 until July 1995"
"the 16th century until sometime after the 1850s"
"sometime between 1990 and 1991 until at least July 15, 1995"
You can mix different date precisions and patterns.
41. ISO 8601, EDTF, and Prose.
The Technologist gets to work with four ISO 8601 dates.
The Cataloger gets to work with one or two EDTF intervals.
The Humanist gets to work with decent, readable prose.
Each can read and write in their preferred form.
42. The CIDOC-CRM.
If you care about the CIDOC-CRM,
this model is based on and compatible
with their model of temporality.
If you don't care...
please ignore this slide.
43. Museum Provenance
Documentation on how all this works is at:
http://www.museumprovenance.org/reference/dates
44. Github
(Mostly) working code is available at:
https://github.com/arttracks/cultural_dates
45. One Last Thing...
This is not the only computable way to
catalog "when".
Ordinal temporal reference systems
record names, order and relationships,
but not the explicit extent or position.
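A tiny sketch of the ordinal idea: named periods in a known order, with no dates attached. The geological periods of the Mesozoic are used here only as a familiar example, and the helper name is hypothetical.

```python
from typing import List

# Only names and their order are recorded, with no position or extent.
MESOZOIC: List[str] = ["Triassic", "Jurassic", "Cretaceous"]


def is_before(earlier: str, later: str, system: List[str]) -> bool:
    """True when the first named period precedes the second in the system."""
    return system.index(earlier) < system.index(later)


print(is_before("Triassic", "Cretaceous", MESOZOIC))  # True
print(is_before("Cretaceous", "Jurassic", MESOZOIC))  # False
```

Sorting and comparison still work, even though nothing here says when a period started or how long it lasted.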
46. Thank you.
Art Tracks is a project of Carnegie Museum of Art.
For more details, visit
www.museumprovenance.org
Initial funding for Art Tracks was provided in part by a generous grant by the
Institute of Museum and Library Services. Funding for Phase II has been
provided by the National Endowment for the Humanities with additional
research support provided by the Samuel H. Kress Foundation and the
Paul Mellon Centre for Studies in British Art.