This document provides an introduction to data visualization and basic programming concepts. It discusses different types of visualizations and tools for creating visualizations, including ManyEyes and SIMILE. It also touches on data cleaning, joining additional data sources, and dealing with complex data. The document encourages beginning with small projects and provides links for learning more about visualization and programming.
Data visualisations as a gateway to programming
1. Data visualisations as a gateway to programming
Mia Ridge @mia_out
THATCamp Feminisms West
Scripps College, California, March 2013
2. AKA: a whirlwind tour of data visualisation
(and some bits to tempt you into playing with code)
3. ‘Start small, make things, and then when you’re done, make some more things.’
Jake Levine, http://www.niemanlab.org/2013/03/jake-levine-why-learning-to-code-isnt-as-important-as-learning-to-build-something/
5. Some points about code
• Computers are annoyingly pedantic
• Scripting isn't rocket science (but it is 'hard fun')
6. Overview
• What is data visualisation?
• Tools and types of visualisations
• A bit of programming jargon
• Activity options: play with data in ManyEyes or tweak timeline/map code to try basic programming
7. Registering with Many Eyes
• In your browser, go to http://www-958.ibm.com/software/analytics/manyeyes/register and register for a Many Eyes account
– Check your email to make sure the registration has come through, so you can use it later
• There’s a dataset loaded into ManyEyes that you can experiment with, but you might find that you want to make tweaked versions of it to achieve particular effects
9. Who are you?
• One sentence on your interest in data visualisation: do you have any potential uses in mind?
10. What is data visualisation?
• ‘…the graphical display of abstract information for two purposes: sense-making (also called data analysis) and communication’ (Stephen Few)
• ‘…showing quantitative and qualitative information so that a viewer can see patterns, trends, or anomalies, constancy or variation, in ways that other forms – text and tables – do not allow.’ (Michael Friendly)
• ‘…interactive, visual representations of abstract data to amplify cognition’ (Card et al., 1999)
11. Scholarly data visualisations
• Visualisations as ‘distant reading’ where distance is ‘a specific form of knowledge: fewer elements, hence a sharper sense of their overall interconnection’ (Moretti, 2005)
• Inspiring curiosity and research questions
• But - what do they leave out?
12. Types of visualisations
• Different types of data in: quantitative, qualitative, geographic, time series, entities (people, places, events, concepts, things)
• Static, interactive
• Exploratory, explanatory: find new insights, or tell a story?
• Pragmatic, analytic? Abstract, emotive?
• http://infosthetics.com/archives/infovis/
13. Visualisation types in Many Eyes
http://www-958.ibm.com/software/analytics/manyeyes/page/Visualization_Options.html
14. Considerations for humanities data
• Commercial tools often assume complete, born-digital datasets – no missing fields, consistent data entry over time
• Humanities and GLAM (galleries, libraries, museums, archives) records contain uncertainty and fuzziness (e.g. date ranges, uncertain places, creators, etc.)
15. Messiness in data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
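Variants like these are exactly what trips software up: to a computer, every spelling is a different value. A minimal sketch in Python (chosen here for readability; the timeline exercise itself runs as JavaScript in the browser) shows one way to collapse the USA variants into a single canonical value while keeping the uncertainty visible. The lookup table, the choice of '/', '&', '?' and 'or' as separators between candidate places, and the function names are all assumptions built only from the examples on this slide; anything not in the table is kept as-is for manual review.

```python
import re

# Hypothetical lookup table built from the variants on this slide.
# Keys are "squashed" forms: lower case, with dots and whitespace removed.
CANONICAL = {
    "us": "United States of America",
    "usa": "United States of America",
    "unitedstates": "United States of America",
    "unitedstatesofamerica": "United States of America",
    "england": "England",
}


def squash(value):
    """Lower-case a value and drop dots and whitespace, so 'U. S. A.' matches 'usa'."""
    return re.sub(r"[.\s]", "", value).lower()


def normalise_place(raw):
    """Return (canonical places, uncertain?) for one messy place field."""
    uncertain = "?" in raw
    # Assumption: '/', '&', '?' and the word 'or' separate candidate places.
    parts = re.split(r"/|&|\?|\bor\b", raw)
    places = []
    for part in parts:
        part = part.strip()
        if part:
            # Unknown values are kept unchanged rather than silently dropped.
            places.append(CANONICAL.get(squash(part), part))
    return places, uncertain


for value in ["U.S.", "U. S. A.", "USA ?", "U.S.A./England ?", "Bali? Java? Mexico?"]:
    print(repr(value), "->", normalise_place(value))
```

A lookup table keeps every decision visible and easy to correct, which matters when the 'right' answer is a judgment call, and the uncertain flag preserves the '?' instead of throwing it away.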
17. Cleaning data for visualisations
Humanities data often needs manual cleaning to:
• remove rows where vital information is missing
• tidy inconsistencies in term lists or spelling
• convert words to numbers (e.g. dates)
• remove hard returns and non-ASCII characters (or change the data format)
• split multiple values in one field into separate columns (e.g. author name and date in one field)
• expand coded values (e.g. countries, languages)
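Several of these cleaning steps can be partly automated once you know what your data looks like. A minimal sketch, assuming each record is a Python dict of strings with hypothetical `title`, `creator`, `date` and `language` fields, and that a combined creator field looks like 'Name (1812)'. The language codes are an illustrative lookup, not a complete list. Non-ASCII characters are deliberately left alone here, since stripping them can mangle legitimate accented names; changing the data format (e.g. saving as UTF-8) is often the less destructive option.

```python
import re

# Hypothetical lookup for coded values; a real dataset needs its own table.
LANGUAGE_CODES = {"eng": "English", "fre": "French", "ger": "German"}

# Assumption: a record without these fields can't be placed on a chart.
VITAL_FIELDS = ("title", "creator")


def clean_rows(rows):
    """Apply simplified versions of the cleaning steps to a list of dict records."""
    cleaned = []
    for row in rows:
        # Remove rows where vital information is missing or blank.
        if any(not (row.get(field) or "").strip() for field in VITAL_FIELDS):
            continue
        # Replace hard returns (and the spaces around them) with a single space.
        out = {key: re.sub(r"\s*[\r\n]+\s*", " ", value).strip()
               for key, value in row.items()}
        # Split a combined 'Name (1812)' creator field into two columns.
        match = re.fullmatch(r"(.*?)\s*\((\d{4})\)", out["creator"])
        if match:
            out["creator"], out["creator_date"] = match.group(1), match.group(2)
        # Convert a date written as text into a number: first 4-digit year, if any.
        year = re.search(r"\b(\d{4})\b", out.get("date", ""))
        out["year"] = int(year.group(1)) if year else None
        # Expand coded values, leaving unknown codes untouched for review.
        if "language" in out:
            out["language"] = LANGUAGE_CODES.get(out["language"], out["language"])
        cleaned.append(out)
    return cleaned


rows = [
    {"title": "Letter\nto a friend", "creator": "Jane Doe (1812)",
     "date": "1836 (probably)", "language": "eng"},
    {"title": "", "creator": "Someone", "date": "1900"},
    {"title": "Sampler", "creator": "Unknown", "date": "early 18th century"},
]
for record in clean_rows(rows):
    print(record)
```

Note that 'early 18th century' ends up with no year at all: turning fuzzy dates into numbers is a judgment call that still needs a human.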
18. What other data can you join to yours?
• Information from general sites like Wikipedia, Freebase, VIAF
• Information from other GLAMs
• Other information about the same event, place, person, object, etc.
• General contextualising information – science, history, reviews, citations?
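Joining is easiest when both sources share an identifier. A minimal sketch of a 'left join', assuming your records and the external data (say, details you've looked up on Wikipedia or VIAF) are both keyed by the same hypothetical `person_id`; the identifiers and values below are made up for illustration.

```python
def left_join(records, extra, key):
    """Attach fields from `extra` (a dict keyed by identifier) to each record.

    Every record is kept, even without a match, so gaps stay visible instead
    of silently shrinking the dataset. Existing fields are never overwritten.
    """
    joined = []
    for record in records:
        merged = dict(record)
        for field, value in extra.get(record.get(key), {}).items():
            merged.setdefault(field, value)
        joined.append(merged)
    return joined


# Made-up example data: 'p1' and 'p2' are placeholder identifiers.
people = [{"person_id": "p1", "name": "A. Person"}, {"person_id": "p2", "name": "B. Person"}]
lookup = {"p1": {"born": "1850", "name": "Person, A."}}

for row in left_join(people, lookup, "person_id"):
    print(row)
```

Keeping your own value when the two sources disagree (here, `name`) is one design choice; you might instead store both versions and compare them.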
19. Dealing with complex data
• Find a visualisation type that can accommodate the data in a meaningful way, or reduce the data in a meaningful way.
– e.g. go from individual values to a distribution of values
– e.g. introduce interaction: overview, zoom and filter, details on demand (Ben Shneiderman)
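Going from individual values to a distribution of values can be as simple as counting. A minimal sketch, assuming you already have a list of years (with `None` for unknown dates), that groups them into decades and prints a rough text histogram as a quick overview before choosing a proper chart; the years are made up.

```python
from collections import Counter


def decade_distribution(years):
    """Count how many values fall in each decade, skipping unknown (None) years."""
    counts = Counter((year // 10) * 10 for year in years if year is not None)
    return dict(sorted(counts.items()))


def text_histogram(distribution):
    """Turn {decade: count} into one line of '#' marks per decade."""
    return [f"{decade}s {'#' * count}" for decade, count in distribution.items()]


years = [1836, 1839, 1841, None, 1850, 1852, 1857]
for line in text_histogram(decade_distribution(years)):
    print(line)
```

Skipping unknown years keeps the chart honest only if you also report how many were left out, which is worth noting alongside any humanities visualisation.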
24. Variables and comments
• Variables: named containers that store things (numbers, text, lists)
• Comments: leave messages for other programmers; the computer can't see them
• Operators: small, simple bits of functionality (e.g. + to add, > to compare)
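The three ideas on this slide look much the same in most languages. A short illustration in Python (in the JavaScript used for the timeline exercise, comments start with // rather than #); the variable names and values are arbitrary.

```python
# A comment: Python ignores everything after the '#' on a line.

# Variables are named containers; each of these stores one value.
first_year = 1836
last_year = 1901
place = "Kiryu, Japan"

# Operators are small, simple bits of functionality.
span = last_year - first_year                 # '-' subtracts numbers: 65
label = place + " (" + str(span) + " years)"  # '+' joins pieces of text
is_long = span > 50                           # '>' compares, giving True or False

print(label)    # Kiryu, Japan (65 years)
print(is_long)  # True
```

`str(span)` is needed because Python refuses to add a number to a piece of text: a small example of computers being pedantic. JavaScript, by contrast, would quietly convert the number for you.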
25. Getting unstuck
• Try copy/pasting or typing the error message into Google.
• Save different versions as you go, and use software to compare two versions of a file
• Asking for help: what steps would someone need to take to reproduce the problem? What did you expect the output to be, and what happened instead?
• Most browsers have built-in tools to help you debug JavaScript.
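Comparing a working version with a broken one is often the quickest way to find what changed. Dedicated comparison tools give a friendlier display, but a minimal sketch using Python's standard `difflib` module shows the idea; the file names and the two code snippets are placeholders.

```python
import difflib


def compare_versions(old_text, new_text, old_name="working.html", new_name="broken.html"):
    """Return the lines of a unified diff between two versions of a file."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    ))


working = 'var title = "Inspiring Women";\nvar year = 1900;'
broken = 'var title = "Inspiring Women\';\nvar year = 1900;'

# Lines starting with '-' were removed and lines starting with '+' were added.
for line in compare_versions(working, broken):
    print(line)
```

To compare real files, read them in first, for example with `pathlib.Path("working.html").read_text()`.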
26. Getting unstuck
• Make a copy of the exercise file first so you can always compare with one that works
• If it breaks or doesn't work:
– Check that “quotes’ and {brackets) are matched
– Check that any named thing is spelt consistently
– Check upper/lower case
– Ask the person next to you (sometimes explaining it helps you spot the issue)
– If the last version works, use software to compare two versions of a file
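Checking that quotes and brackets are matched is mechanical enough to automate. A simplified sketch of the usual stack-based approach: each opening bracket is pushed onto a stack, and each closing bracket must match the most recently opened one. It assumes straight quotes only and ignores escape characters and comments, and `find_unmatched` is a hypothetical helper, so treat its answer as a hint about where to look rather than a verdict.

```python
# Closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def find_unmatched(code):
    """Describe the first unmatched bracket or quote in `code`, or return None.

    Simplified: straight quotes only, no escape characters, no comments.
    """
    stack = []        # (bracket, position) for every bracket still open
    quote = None      # the quote character we are currently inside, if any
    quote_pos = None
    for pos, char in enumerate(code):
        if quote:
            # Inside a string, only the matching quote matters.
            if char == quote:
                quote = None
            continue
        if char in "\"'":
            quote, quote_pos = char, pos
        elif char in "([{":
            stack.append((char, pos))
        elif char in PAIRS:
            # A closing bracket must match the most recently opened one.
            if not stack or stack[-1][0] != PAIRS[char]:
                return f"unexpected {char!r} at position {pos}"
            stack.pop()
    if quote:
        return f"unclosed {quote!r} opened at position {quote_pos}"
    if stack:
        char, pos = stack[-1]
        return f"unclosed {char!r} opened at position {pos}"
    return None


print(find_unmatched('var a = {"x": [1, 2]};'))  # None
print(find_unmatched("f(x]"))                    # unexpected ']' at position 3
```

Curly quotes like the ones on the slide aren't treated as quotes at all here, which mirrors JavaScript, where they aren't valid string delimiters either.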
27. Visualising ‘Inspiring Women’
• ManyEyes – online tool, no code required
• SIMILE – start with a working example, read through the commented code and try the exercises listed in the comments
28. ‘Inspiring Women’ in ManyEyes
• Log into ManyEyes
• Go to http://ibm.co/ZP7UKI – visualisation options are available from there
• Choose a type of visualisation and evaluate the results
– What cleaning, extra data or transformation might be needed?
– You may need to iterate with different versions of the data from http://bit.ly/ZwH6iy
29. Review: visualisation tools
• What did the tools you tried do well? Poorly?
• Were the tool and the data a good match for each other?
• Which tools might be useful in the future?
30. ‘Start small, make things, and then when you’re done, make some more things.’
Some links: http://bit.ly/UJwgEz
Thank you!
Mia Ridge, Open University
http://openobjects.org.uk/
@mia_out
Editor's Notes
Learn the basics of programming by fiddling with existing visualisations and prepared exercises.
Background: participants will be thinking about how to structure data for use in software, learning basic programming concepts, and moving towards tinkering with scripts. This is a great workshop for humanists who want a friendly intro to the world of programming.
Find out more at http://www.miaridge.com/resources-for-data-visualisation-for-analysis-in-scholarly-research/
This is the “don't be scared” slide! Computers are really picky about spelling, white space, matching quote marks, how sentences end... Think of your most pedantic friend, and multiply that by 1000. It's like dealing with a grumpy six-year-old: it might be tricky to negotiate, but it's not going to kill either of you. Thinking computationally is like cooking a few courses for a fancy dinner party – you learn what needs to be prepped in advance or just before serving, which steps must be done in a particular order and what can be done at any time.
Hard fun – the phrase comes from gaming – means that when something is challenging, it's even more rewarding when you finally crack it. A lot of my 'don't be scared' message is aimed at getting you over those first hurdles and into the rewarding stuff. Persistence (or stubbornness) is one of the key characteristics of a good programmer. The process of finding a path through something you're still figuring out is something programmers and researchers have in common.
This is a short workshop, so it leaves loads out. I've prepared two routes you can take: one uses pre-made data in a tool called ManyEyes to learn about how different types of visualisations work; the other loads a page that draws a timeline based on data in a Google Spreadsheet, so you can play with bits of code and start to learn how it all comes together on a web page.
When you’re working with your own data, about 80% of your time is spent massaging it into shape. Researching data also takes a long time – I spent several evenings putting together this list, and it’s nowhere near complete and lots of values are still missing. This starts to bring in questions about writing history – it’s not like working with born-digital scientific or similar datasets.
There’s a bit of me talking at the start, but I want to let you get stuck into trying things out as soon as possible. This does mean it’s up to you to get the most out of it – ask questions, let me know when you get stuck, follow your own curiosity in thinking about what to try in the time.
Knowing your way around a browser will help, but no hardcore technical skills are required. Making good visualisations takes time, but I hope you’ll get a taste of what can be done.
You can load this and have a play while I talk. I created this as an excuse to play with software called Neatline that’s designed for hand-crafted visualisations with maps and timelines. One nice thing about this is that it illustrates how far some technical skills can take you – and it’s not all about code; some of it has a big overlap with things like design and library science.
Currently a PhD student in Digital Humanities in the Department of History, Open University
PhD and MSc (Human-Computer Interaction) research on crowdsourcing
I call myself a cultural heritage technologist (Science Museum, Museum of London, Melbourne Museum) because it encompasses my background as a programmer and business analyst, my later interest in user experience design and research, and now my Digital Humanities research.
Data visualisation is about creating insight, or the formation of a mental model – a new way of thinking about data.
Few, Stephen. 2013. ‘Data Visualization for Human Perception’. Ed. Mads Soegaard and Rikke Friis Dam. The Encyclopedia of Human-Computer Interaction, 2nd Ed. Aarhus, Denmark: The Interaction Design Foundation. Accessed January 14. http://www.interaction-design.org/encyclopedia/data_visualization_for_human_perception.html.
Michael Friendly quoted at http://www.visualcomplexity.com/vc/blog/?p=1076
If you're interested in the history of visualisation, find out more at Milestones in the history of data visualisation (http://datavis.ca/milestones/) or CABINET // A Timeline of Timelines (http://www.cabinetmagazine.org/issues/13/timelines.php)
Hopefully you have some ideas now for how visualisations can enable ‘scholars to ask increasingly complex research questions by analysing large scale datasets with freely available tools.’ We're now thinking about how visualisations can be used to understand, analyse and present large-scale datasets in the humanities and sciences, and the value of visualisation tools in understanding the shape of a dataset. In digital humanities, this is part of the discourse around distant and close reading. Visualisation enables an overview of many sources over long periods of time, highlighting changes in style, genre or content. It allows a view of large numbers of items and, with tools like entity recognition, can help put them in spatial, historical or cultural context. Ultimately it's about making patterns easier to spot; patterns can lead to hypotheses.
Lots of different ways to think about types... Do you want to find new insights, or to communicate or convince? Visualisations can be exploratory (find stories) or explanatory (tell stories) in purpose, and range along an analytic/pragmatic to abstract/emotive axis. Source: http://www.slideshare.net/visualisingdata/andy-kirks-facebook-talk
A Tale of Two Types of Visualization and Much Confusion, Robert Kosara: ‘two major types of data-based visualization, and understanding the differences. … Pragmatic Visualization…even if understanding this requires some work and experience, the goal of this method is to communicate the data, as efficiently as possible. ... If a visualization is designed to visually represent data, and to do that in such a way as to gain new insights into that data, it shall be called a pragmatic visualization. The basic idea is that using the human visual system (instead of automatic means like data mining or statistics), we can gain insight into data, and develop an understanding of the data and the structures in it. To determine whether a visualization is pragmatic, we simply ask if it allows us to efficiently read the data (or at least the relationships between subsets) from the display.’ Cf. Artistic Visualization
• Scatterplots: good for relationships between variables
• Matrix chart: good for multi-dimensional data
• Bubble chart: good for data with big variations in numbers
• Line, stack graphs: good for changes in numbers over time
• Pie charts: good for showing proportions
• Treemap: good for hierarchical structures
• Word tree: good for unstructured text
• Phrase Net: display common relationships between words in text
• Maps: display data by location
What types of data are suitable for visualisation? What issues do researchers commonly encounter when applying tools designed for the commercial sector to typically fuzzy, incomplete and complex humanities data? Data within one dataset might have been prepared by different departments, in different original systems or at different times, so when cleaning data, some content might be more likely to drop out than others.
Examples from the Cooper Hewitt collection. I spent 3/5 of my time at the Cooper Hewitt just trying to get the data clean enough to vaguely represent the collection. The problem is that computers think U.S., U. S., U.S.A., U. S. A., United States and United States of America are six different places.
Fields also contain things like internal notes about potential duplicates and unexpected extra information (e.g. notes on what type of location it is). There are lots of inconsistencies: uncertainty and date ranges are expressed in different ways.
More common museum issues: what year is 'early 18th century'? What do you do with '1836 (probably)'?
Tools die when they encounter messy data
There are also lots of software libraries for creating visualisations (http://selection.datavisualization.ch/ lets you toggle between ones that require you to code and ones that don’t), but many require some programming knowledge.
If you want to do really interesting things, invent new types of visualisation or find ways of presenting your specific data, you might need to get stuck into some code. Finding someone to work with can be a good way of learning if you don’t have any training available to you.
Visualization Options Available in Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/page/Visualization_Options.html
Data formats for uploading data
1. Prepare your data. First, find the data set that you want to put into Many Eyes. The size limit is 5 megabytes.
Data tables: If your data is a list of values, first format it into a table with informative column headers. If your columns have different units of measure, be sure to include the units in the headers. Use a spreadsheet program such as Microsoft Excel or a text file where columns are separated with tabs. If this is your first upload, read the format guidelines. If you have a specific visualization in mind, take a look at its explanation page for additional information.
Free text: If your data is free text (such as an essay or a speech), open the data in a word processor or web browser, select the text, and copy it to the clipboard by typing control-C (Windows) or command-C (Macintosh).
http://www-958.ibm.com/software/analytics/manyeyes/datasets/new
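The upload advice for data tables (informative column headers, units in the headers, tab-separated columns, a 5 megabyte limit) can be checked before uploading. A minimal sketch using Python's standard `csv` module; the column names are hypothetical, and reading '5 megabytes' as 5,000,000 bytes is an assumption (the stricter of the two common readings).

```python
import csv
import io

# Assumption: "5 megabytes" read as 5,000,000 bytes, the stricter interpretation.
MAX_BYTES = 5_000_000


def to_tab_separated(header, rows):
    """Return tab-separated text: one header row, then one line per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def fits_size_limit(text, limit=MAX_BYTES):
    """Check the size of the text once encoded as UTF-8 bytes."""
    return len(text.encode("utf-8")) <= limit


# Hypothetical columns: units go in the headers, as the advice suggests.
header = ["Name", "Born (year)", "Height (cm)"]
rows = [["A. Person", 1850, 160], ["B. Person", 1862, 171]]
tsv = to_tab_separated(header, rows)
print(tsv)
print("Under the size limit:", fits_size_limit(tsv))
```

The csv writer wraps any value containing a tab, quote mark or line break in quote marks. Since it isn't clear how an upload tool treats those, removing hard returns and stray tabs during cleaning is probably the safer route.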
The code is heavily (and chattily) commented with things to try so that you can start to see how the code affects what happens on the page.
CSDiff (Windows)
It physically hurts me to see unmatched quotes because they have been the cause of so much trauma in the past
Visualisation type: review previous slides, and think about whether you're:
• Comparing categories
• Assessing hierarchies & part-to-whole relationships
• Showing changes over time
• Charting connections and relationships
• Mapping geo-spatial data
You might get further working in pairs… [Exercises must include: creating a data visualisation (learn how to use online tools to create visualisations that explore British Library datasets such as the British National Bibliography or 19th Century books, designed to result in something to take home to mum); using Google Refine to clean and prepare data. Do, clean, re-do? How to design so that failure is a learning experience? Small, controlled 'compare and contrast' experiments with ManyEyes? An exercise discussing how visualisations are good or bad in terms of design?]
Find out more at http://www.miaridge.com/resources-for-data-visualisation-for-analysis-in-scholarly-research/