1. Finding Ostriches in the Courtroom
Enabling Insight with Linguistic Visualization
Christopher Collins
University of Toronto (to Dec 2009)
University of Ontario Institute of Technology (Jan 2010-)
2. Target Audience
General Public, Domain Experts, Language Researchers
Real-time Single Document Linguistic
Discrete Corpus NLP
Continuous Corpus CL
3. Problem Areas
Real-time Single Document Linguistic
Discrete Corpus NLP
Continuous Corpus CL
8. External Cognition
• External cognition is the interaction
between internal and external
representations when performing cognitive
tasks.
• Computational offloading is the extent to
which external representations can reduce
the amount of cognitive effort to solve a
problem.
Yvonne Rogers, New Theoretical Approaches for Human-Computer Interaction, 2004.
9. Document Visualization
Collins, C.; Carpendale, S.; Penn, G.
DocuBurst: Visualizing Document Content using Language Structure.
Proceedings of Eurographics/IEEE VGTC Symposium on Visualization, June, 2009.
11. DocuBurst
games → game
taken → take
absolute,noun,10
chair,noun,2
moment,noun,11
game,noun,30
reality,noun,3
take,verb,13
represent,verb,17
...
WordNet:
game IS activity
chair IS furniture
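The steps on this slide (reduce words to lemmas, count them, then relate the counts through WordNet IS-A links) can be sketched roughly as follows. This is a minimal illustration, not the DocuBurst implementation: `LEMMAS` and `HYPERNYMS` are hypothetical toy tables holding only the pairs shown on the slide, standing in for a real lemmatizer and for WordNet, and `lemma_counts` and `roll_up` are names invented here.

```python
from collections import Counter

# Hypothetical toy tables: only the pairs that appear on the slide.
# A real system would use a lemmatizer and the WordNet hypernym graph.
LEMMAS = {"games": "game", "taken": "take"}
HYPERNYMS = {"game": "activity", "chair": "furniture"}


def lemma_counts(tokens):
    """Count tokens after mapping each one to its lemma (if known)."""
    return Counter(LEMMAS.get(t.lower(), t.lower()) for t in tokens)


def roll_up(counts):
    """Add each word's count to every ancestor along its IS-A chain."""
    totals = Counter(counts)
    for word, n in counts.items():
        parent = HYPERNYMS.get(word)
        while parent is not None:
            totals[parent] += n
            parent = HYPERNYMS.get(parent)
    return totals


if __name__ == "__main__":
    counts = lemma_counts(["games", "game", "taken", "chair"])
    for word, n in sorted(roll_up(counts).items()):
        print(word, n)
```

Rolling counts up the hierarchy is what lets a broad concept such as "activity" reflect every more specific word beneath it.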
15. Corpus Visualization
• Beyond similarity and clustering
– How do we discern differences within and between
document collections?
Collins, C.; Viégas, F.; Wattenberg, M.
Parallel Tag Clouds to Explore and Analyze Faceted Text Corpora.
To appear in Proc. IEEE Symposium on Visual Analytics Science & Technology (VAST), 2009.
16. Our Data: U.S. Federal Court Decisions
Data from public.resource.org
17. Visualization Design Patent Invention
• Size = significance of difference (G² score)
• Order = alphabetic
• Edges = word occurring in multiple columns
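For readers unfamiliar with the G² score, here is a minimal sketch of one common two-term form of the log-likelihood statistic, comparing a word's frequency in one column's text against the rest of the corpus. The slide does not give the formula, so the exact formulation used in Parallel Tag Clouds may differ; the function name `g2` and its parameters are assumptions made for illustration.

```python
import math


def g2(a, b, c, d):
    """Log-likelihood (G^2) score for one word.

    a: occurrences of the word in the target text
    b: occurrences of the word in the reference text
    c: total words in the target text
    d: total words in the reference text
    """
    total = a + b
    # Expected counts if the word were equally frequent in both texts.
    e1 = c * total / (c + d)
    e2 = d * total / (c + d)
    score = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


if __name__ == "__main__":
    # Same relative frequency in both texts gives a score of zero;
    # an over-represented word gives a positive score.
    print(g2(10, 10, 1000, 1000))
    print(g2(20, 5, 1000, 1000))
```

A larger score means stronger evidence that the word's frequency differs between the two texts, which is the quantity the slide maps to word size.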
21. Bridging the Linguistic Divide
Open APIs for data
NYT, Twitter, Google
?
Open APIs for NLP
- Summarization
- Keyword extraction
- Sentiment analysis
Toolkits and APIs for Visualization
Processing, Raphaël, Flare, Flash