Bibliotheca Digitalis. Reconstitution of Early Modern Cultural Networks. From Primary Source to Data.
DARIAH / Biblissima Summer School, 4-8 July 2017, Le Mans, France.
5th and last day, July 8th – Digital representation and data accuracy for Humanities.
Visualisation in Digital Humanities for Understanding, Cleaning, and Explaining.
Jean-Daniel Fekete – Research Scientist, INRIA.
Abstract: https://bvh.hypotheses.org/3330#conf-JDFekete
Visualisation in Digital Humanities for
Understanding, Cleaning, and Explaining
Jean-Daniel Fekete
INRIA
http://www.aviz.fr/~fekete
Visualization?
Visualization is any technique for creating
images, diagrams, or animations to
communicate a message
[Wikipedia, Visualization, May 2016]
Information visualization is the study of
(interactive) visual representations of abstract
data to reinforce human cognition
[Card, S. and Mackinlay, J. and Shneiderman B., Readings in Information Visualization, 1999]
July 8th 2017 Summer School Le Mans
Visualization and Visual Perception
• Visualization is grounded in the visual and
cognitive capabilities of humans
– Inferring from visual forms
• Relies on visual capabilities of the human eye
and brain
– Preattentive processing
– Ready…is there a red circle in the next slide?
[Image slides: preattentive processing demonstrations, images not preserved]
Preattentive Processing
• Preattentive processing
– 200 ms response time (at a glance)
– Effortless
– Reliable estimates
• Many visual features can be perceived preattentively:
– Orientation of line/block, length, width, size, curvature, cardinality, etc.
• Problems:
– Preattentive features interfere with each other
• Except one
– Preattentive features have limitations
• 7 colors max (Healey, 96)
• 2 or 3 shapes
Where Does Visualization Stand?
Theory / Law
Model
Descriptive statistics
Facts / Measurements
Support xor Contradict
Induces?
Fits
Describes
Example
I             II            III           IV
   x      y      x      y      x      y      x      y
10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
 8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
 9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
 6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
 4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
 7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
 5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
Raw Data from Anscombe’s Quartet
[Source: Anscombe's quartet, Wikipedia]
Statistical Analysis
Mean of x: 9.0
Variance of x: 11.0
Mean of y: 7.5
Variance of y: 4.12
Correlation between x and y: 0.816
Linear regression line: y = 3 + 0.5x
For all four datasets, the main descriptive statistics are nearly identical
[Source: Anscombe's quartet, Wikipedia]
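The statistics on this slide can be checked directly from the raw table. The following is a minimal sketch using only the Python standard library; the names `QUARTET`, `summarize`, and `SUMMARIES` are illustrative and not part of the talk. It uses the sample (n - 1) variance, which is the convention that yields the value 11.0 for x.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, copied from the raw data table of the slide.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the descriptive statistics listed on the slide for one dataset."""
    mx, my = mean(xs), mean(ys)
    # Sample (n - 1) variance and covariance.
    vx, vy = variance(xs), variance(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    slope = cov / vx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": vx,
        "mean_y": my,
        "var_y": vy,
        "corr": cov / sqrt(vx * vy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four summaries agree only up to rounding (the variance of y, for instance, differs in the third decimal), which is why plotting the points remains necessary to tell the datasets apart.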
Visual Representation of the Data
Visual representation reveals a different story
[Source: Anscombe's quartet, Wikipedia]
Same Stats, Different Graphs: Generating Datasets with Varied Appearance
and Identical Statistics through Simulated Annealing [CHI17]
https://www.autodeskresearch.com/publications/samestats
Where Does Visualization Stand?
Theory / Law
Model
Visualization
Facts / Measurements
Support xor Contradict
Induces?
Fits
Describes
Descriptive Statistics
Four Scales
• Most DH projects rely on the concept of
collections of documents or artifacts
• Visualization can be effective to make sense of
these collections
– But there is no “one size fits all”
• I will present visualizations to manage the
four scales
• With queries, smaller scales can be extracted
from larger scales
Scale Matters!
• 10^0 – 10^3: Small corpus (Master’s thesis / PhD)
• 10^3 – 10^6: Collaborative project
• 10^6 – 10^9: Institutional project (BnF, LoC) or portal
• > 10^9: Large scale
– Europeana, Google
Powers of Ten™ (1977)
https://www.youtube.com/watch?v=0fKBhvDjuy0
10^0 – 10^3: Small Corpus
• A myriad of visualizations are available for small
corpora
– Text, network, genealogy, manuscripts, maps, etc.
• Exploring small corpora with these visualizations
ALWAYS reveals interesting, unexpected
information
• On Web sites dedicated to small corpora,
visualization will help navigate and understand
the scope of the corpus
10^0: One document
• N. McCurdy, J. Lein, K. Coles, M. Meyer. Poemage: Visualizing the Sonic Topology of
a Poem. IEEE Transactions on Visualization and Computer Graphics (Proceedings of
InfoVis 2015), pages 439-448, January 2016
http://www.sci.utah.edu/~nmccurdy/Poemage/
https://vimeo.com/136205958
http://xkcd.com/657/
10^0: One document
http://vis.cs.ucdavis.edu/~tanahashi/storylines/
10^0 – 10^3: Small(ish) Networks
http://vistorian.net/
10^0 – 10^3: Small Corpus
N. Dufournaud
Thesis
~1000 documents
http://nicole.dufournaud.org/
Migration Map
Space&Time: GeoTime
10^0 – 10^3: Archeological Collection
Create a spreadsheet
• 1 line per object found
• 1 column per feature
• 1 black dot at the
intersection when an object
has a feature
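The spreadsheet described above is a binary object-by-feature matrix. Here is a minimal sketch of how such a matrix could be built and drawn as text; the finds and feature names in `FINDS` are invented for illustration, and the helper names `build_matrix` and `render` are assumptions, not part of any tool mentioned in the talk.

```python
# Hypothetical finds: each object is listed with the features it exhibits.
FINDS = {
    "pot-1": {"glazed", "handle"},
    "pot-2": {"glazed", "handle", "painted"},
    "bowl-1": {"painted"},
    "bowl-2": {"glazed", "painted"},
}


def build_matrix(finds):
    """Return (objects, features, rows); rows[i][j] is True when
    object i has feature j."""
    objects = sorted(finds)
    features = sorted(set().union(*finds.values()))
    rows = [[feature in finds[obj] for feature in features] for obj in objects]
    return objects, features, rows


def render(objects, rows):
    """Draw one text line per object: a black dot where the object has
    the feature, a small dot otherwise."""
    width = max(len(name) for name in objects)
    lines = []
    for name, row in zip(objects, rows):
        cells = " ".join("●" if present else "·" for present in row)
        lines.append(f"{name:<{width}} {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    objects, features, rows = build_matrix(FINDS)
    print("columns:", ", ".join(features))
    print(render(objects, rows))
```

Rows and columns are simply sorted alphabetically here; reordering them so that similar objects and features end up adjacent is what makes patterns visible in such a matrix, and is not attempted in this sketch.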
10^0 – 10^3: Bertifier
• Play with our tool online
http://www.aviz.fr/bertifier
https://www.youtube.com/watch?v=tJxAF_a_yBQ
Visualizing an XML Corpus: Compus
• Transform the following XML document:
0         1         2         3         4
012345678901234567890123456789012345678901234567
<A>abcd<B>efgh</B><C>ijkl<D>mnop</D></C>qrst</A>
• into a set of half-open character intervals:
A=[0,48[, B=[7,18[, C=[18,40[, D=[25,36[
• One color is given to each element
• Only XML elements are visualized
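The transformation above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Compus implementation: the function name `element_intervals` and the regular expression are assumptions, and the scanner only handles plain start, end, and self-closing tags (no comments, CDATA, or `>` inside attribute values). Each interval runs from the `<` of the start tag to just past the `>` of the end tag, matching the `[start, end[` notation of the slide.

```python
import re

# Matches a start tag, an end tag, or a self-closing tag.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*>")

DOC = "<A>abcd<B>efgh</B><C>ijkl<D>mnop</D></C>qrst</A>"


def element_intervals(xml):
    """Return (name, start, end) triples, one per element, where
    [start, end[ is the half-open character interval it occupies."""
    stack = []      # (name, start offset) of elements that are still open
    intervals = []
    for match in TAG.finditer(xml):
        closing, name = match.group(1) == "/", match.group(2)
        if match.group(0).endswith("/>"):
            # Self-closing element: the tag itself is the whole interval.
            intervals.append((name, match.start(), match.end()))
        elif not closing:
            stack.append((name, match.start()))
        else:
            if not stack or stack[-1][0] != name:
                raise ValueError(f"unexpected </{name}> at offset {match.start()}")
            _, start = stack.pop()
            intervals.append((name, start, match.end()))
    if stack:
        raise ValueError(f"unclosed element <{stack[-1][0]}>")
    return sorted(intervals, key=lambda item: item[1])


if __name__ == "__main__":
    for name, start, end in element_intervals(DOC):
        print(f"{name}=[{start},{end}[")
```

Assigning one color per element name and painting each interval over the document's character range would then give the kind of overview the slide describes.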
10^0 – 10^3: Diffamation
(Chevalier et al. CHI 2010, http://www.aviz.fr/diffamation/)
10^0 – 10^3: Multidimensional Data
10^0 – 10^3: Small Corpus
http://multiviz.gforge.inria.fr/scatterdice/oscars/
10^0 – 10^3: Small Corpus
• A myriad of visualizations are available for small
corpora
– Text, network, genealogy, manuscripts, maps, etc.
• Exploring small corpora with these visualizations
ALWAYS reveals interesting, unexpected
information
• On Web sites dedicated to small corpora,
visualization will help navigate and understand
the scope of the corpus
10^3 – 10^6: Library/Coll. Project
• Too many items to show each of them in detail
• Still need to provide guidance to users
• Many tools exist, but entering data becomes
technical
10^3 – 10^6: Jigsaw
10^3 – 10^6: Parallel Tag Clouds
Parallel Tag Clouds to Explore Faceted Text Corpora (Collins et al., VAST 2009)
http://vialab.science.uoit.ca/portfolio/parallel-tag-clouds-to-explore-faceted-text-corpora
De-duplication
D-Dupe: An Interactive Tool for Entity Resolution in Social Networks (Mustafa Bilgic, Louis Licamele,
Lise Getoor, Ben Shneiderman), In Visual Analytics Science and Technology (VAST), 2006.
• Resolving named entities using the relation network
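The idea of using the relation network for de-duplication can be illustrated with a small sketch. This is not the D-Dupe algorithm: it is a simplified stand-in that combines a string similarity on names (Python's `difflib.SequenceMatcher`) with the Jaccard overlap of each candidate's neighbors. The names in `COAUTHORS`, the function names, and the `relation_weight` parameter are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical co-authorship network: name -> set of collaborators.
COAUTHORS = {
    "J. Smith": {"A. Jones", "B. Lee"},
    "John Smith": {"A. Jones", "B. Lee", "C. Wu"},
    "J. Smyth": {"D. Kim"},
}


def name_similarity(a, b):
    """String similarity of two names, between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbour_similarity(a, b, network):
    """Jaccard overlap of the two entities' neighbors in the network."""
    na, nb = network[a], network[b]
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def rank_pairs(network, relation_weight=0.5):
    """Score every pair of names as a possible duplicate, best first."""
    scored = []
    for a, b in combinations(sorted(network), 2):
        score = ((1 - relation_weight) * name_similarity(a, b)
                 + relation_weight * neighbour_similarity(a, b, network))
        scored.append((score, a, b))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, a, b in rank_pairs(COAUTHORS):
        print(f"{score:.2f}  {a}  <->  {b}")
```

With names alone (`relation_weight=0.0`), "J. Smith" looks closest to "J. Smyth"; once shared collaborators are taken into account, "J. Smith" and "John Smith" become the top candidate pair. A human would still confirm or reject each suggested merge.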
10^3 – 10^6: Genealogies
10^6 – 10^9: Institutional project
• Only aggregated information can be presented
• Faceted browsing / search very useful!
– Use it!
• e.g. Europeana: 53 × 10^6 items
10^6 – 10^9: Institutional project (HAL)
http://traces1.saclay.inria.fr/inria/
10^6 – 10^9: EU Project Cendari
https://notes.cendari.dariah.eu/
10^6 – 10^9: Institutional project
• Only aggregated information can be presented
• Faceted browsing / search very useful!
– Use it!
• e.g. Europeana: 53 × 10^6 items
• Problem: metadata quality and semantics
• What is the date of a book?
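Faceted browsing, as recommended on this slide, amounts to showing aggregated counts per metadata field and letting the user narrow the collection by picking values. A minimal sketch follows, assuming a tiny in-memory list of records; the field names and values in `RECORDS` and the helpers `facet_counts` and `refine` are invented for illustration and do not reflect any portal's actual API.

```python
from collections import Counter

# Hypothetical catalogue records; fields and values are illustrative only.
RECORDS = [
    {"type": "book", "language": "fr", "century": 16},
    {"type": "book", "language": "la", "century": 16},
    {"type": "map", "language": "fr", "century": 17},
    {"type": "letter", "language": "fr", "century": 16},
]


def facet_counts(records, fields):
    """Aggregate a collection into per-field value counts (the facets)."""
    return {field: Counter(record[field] for record in records)
            for field in fields}


def refine(records, **selection):
    """Keep only the records matching every selected facet value."""
    return [record for record in records
            if all(record[key] == value for key, value in selection.items())]


if __name__ == "__main__":
    print(facet_counts(RECORDS, ["type", "language", "century"]))
    subset = refine(RECORDS, language="fr", century=16)
    print(len(subset), "records after refinement")
    print(facet_counts(subset, ["type"]))
```

The counts are only as reliable as the metadata behind them: a field such as `century` presupposes that "the date of a book" has a single well-defined value, which is exactly the problem the slide raises.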
> 10^9: World Scale
• Few providers
– Google
– Photo collections (Flickr)
– Astronomical databases
• The cost of computing facets is too high for
interactive response times
• No good general solution
> 10^9: Internet Backbone
• Where are you?
• Who cares?
> 10^9: Query Previews
• Query over very large data about the Earth
http://www.cs.umd.edu/hcil/eosdis/
Conclusion
• Larger collections are harder to manage
– Big data problem
• A large collection can always be queried to
extract a smaller collection
– Scaling down the results increases the number of
usable techniques
• Still, current technologies are limited for DH
– No management of uncertainty
– No reasonable model of old geographical concepts
– No good model of time and date
• Still, use the tools and ask for improvements!
References
• Jacques Bertin, Semiology of Graphics: Diagrams, Networks, Maps.
ESRI Press; Nov. 2010. ISBN: 9781589482616
• Edward Tufte. The Visual Display of Quantitative Information.
Cheshire, CT: Graphics Press, 2010 ISBN 0-9613921-4-2
• Tamara Munzner. Visualization Analysis and Design. A K Peters
Visualization Series, CRC Press, 2014. ISBN 9781466508910
• Alberto Cairo. The Truthful Art: Data, Charts, and Maps for
Communication. New Riders, 2016. ISBN 0321934075
• Tableau for Students: https://www.tableau.com/academic/students
• Jänicke, Stefan; Franzini, Greta; Cheema, Muhammad Faisal;
Scheuermann, Gerik. On Close and Distant Reading in Digital
Humanities: A Survey and Future Challenges. Eurographics
Conference on Visualization (EuroVis) – STARs. 2015.
http://dx.doi.org/10.2312/eurovisstar.20151113