Conversations with Data
Tony Hirst
Computing and Communications,
The Open University
(Recognising 
and addressing 
a skills gap)
“The Technical Tools of Statistics” read at the 125th Anniversary Meeting of the American Statistical Association, 
Boston, November 1964, published in April 1965 American Statistician. 
http://cm.bell-labs.com/cm/ms/departments/sia/tukey/memo/techtools.html 
/via Adam Cooper, “Exploratory Data Analysis” 
http://blogs.cetis.ac.uk/adam/2012/05/18/exploratory-data-analysis/ 
John Tukey 
“journeyman 
carpenter of data-analytical 
tools”
“A Boy's Work is Never Done”, KellyB. (flickr: foreverphoto/2467694199/)
“Exploratory data analysis 
is an attitude, 
a flexibility, 
and reliance on display, 
not a bundle of techniques 
and should be so taught.” 
John Tukey 
Tukey, John W. "We need both exploratory and confirmatory." The 
American Statistician 34.1 (1980): 23-25. 
http://www.ece.rice.edu/~fk1/classes/ELEC697/TukeyEDA.pdf
“I … cannot disagree strongly enough with statements 
about the dangers of putting powerful tools in the 
hands of novices. Computer algebra, statistics, and 
graphics systems provide plenty of rope for novices to 
hang themselves and may even help to inhibit the 
learning of essential skills needed by researchers. The 
obvious problems caused by this situation do not 
justify blunting our tools, however. They require better 
education in the imaginative and disciplined use of 
these tools. And they call for more attention to the 
way powerful and sophisticated tools are presented to 
novice users.” 
Leland Wilkinson, The Grammar of Graphics, Springer-Verlag, 1999, 
ISBN 0-387-98774-6, p15-16.
Data 
accessibility 
Data 
sensemaking
Clean 
Shape 
Augment 
Look
Dirty Data
openrefine.org
Shapes…
I see trees…
See also: IPython notebook demo 
http://nbviewer.ipython.org/gist/psychemedia/9c54721e853403b43d21/pivotTable_demo.ipynb
“There is no more reason to expect 
one graph to ‘tell all’ than to expect 
one number to do the same.” 
-- John Tukey
If quantities are conserved, 
can you think of them in terms of flow?
“[T]he picture examining eye 
is the best finder we have 
of the wholly unanticipated.” 
Tukey, John W. "We need both exploratory and confirmatory." The 
American Statistician 34.1 (1980): 23-25. 
http://www.ece.rice.edu/~fk1/classes/ELEC697/TukeyEDA.pdf 
John Tukey
How can we 
look at data?
How do we 
ask questions 
of data?
underspend filetype:xls site:gov.uk 
Search limits
Structured queries 
underspend filetype:xls site:gov.uk 
select webPages where 
text like “%underspend%” 
and filetype=“xls” 
and domain=“gov.uk” 
SQL
Count things 
Sort things
http://www.coolinfographics.com/blog/2014/8/29/false-visualizations-sizing-circles-in-infographics.html
How do we 
interpret the 
answers?
Look for 
outliers 
Top 3… 
…bottom 3
Outliers may be rare occurrences 
over time too… 
Streaks and runs…
Look for 
similarities & 
differences
Look for 
trends
Look for 
patterns & 
structure
“Hand-drawing of graphs, except 
perhaps for reproduction in books 
and in some journals, is now 
economically wasteful, slow, and 
on the way out.” 
– John Tukey
Recording your 
conversations
Rstudio.org
IPython Notebook
“I know of no person or group that is 
taking nearly adequate advantage of 
the graphical potentialities of the 
computer.” 
– John Tukey
Hopefully, that 
contained some 
ouseful.info 
-- @psychemedia


Editor's notes

  1. Wikipedia – Journeyman: “A journeyman is an individual who has completed an apprenticeship and is fully educated in a trade or craft, but not yet a master. To become a master, a journeyman has to submit a master work piece to a guild for evaluation and be admitted to the guild as a master.” “In parts of Europe, as in later medieval Germany, spending time as a wandering journeyman (Wandergeselle), moving from one town to another to gain experience of different workshops, was an important part of the training of an aspirant master. Carpenters in Germany have retained the tradition of traveling journeymen even today, although only a few still practice.”
  2. Bar charts are a very effective way of displaying particular sorts of information, such as counts. But what other ways are there of displaying data?
  3. Bar charts are a very effective way of displaying particular sorts of information, such as counts. But what other ways are there of displaying data?
  4. Datawrapper provides a variety of chart types, including:
     - horizontal and vertical (column) bar charts;
     - grouped bars that collate different bars according to groups (for example, election-on-election percentage of the vote for different political parties);
     - stacked column charts (for example, for a selection of countries we could display a column showing the total number of medals, constructed by stacking the individual gold, silver and bronze medal counts for those countries);
     - line charts, which are widely used for plotting some value on the vertical y-axis against time on the horizontal x-axis;
     - pie charts, to show proportions of a whole, and variants thereof, such as the donut chart (a pie chart with the middle cut out);
     - simple data tables (never underestimate the power of a table: they can be really useful for showing specific values, and can be very powerful when they allow the user to sort the table by ascending or descending values in particular columns);
     - maps, which, as we shall see, can draw out very powerful relationships across data elements.
  5. We’ve also seen some other “basic” charts that can be useful for displaying the distribution of data elements:
     - the block histogram shows a count on the y-axis of data elements falling within particular ranges of values on the x-axis;
     - the scatterplot allows us to plot two values against each other, for example height versus weight.
     Scatterplots can provide us with clues about possible correlations or relationships between the two values. Some scatterplot tools further allow us to colour each point according to group membership so that we can look to see whether points are clustered or grouped according to group membership.
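The counting behind a block histogram, as described in note 5, can be sketched in a few lines of Python. This is only an illustration: the height values, the 10 cm bin width and the `bin_start` helper are all made up for the example, and a charting tool would normally do this binning for you.

```python
from collections import Counter

# Made-up sample data (heights in cm); not taken from the slides.
heights_cm = [152, 158, 161, 163, 167, 168, 171, 174, 176, 183]
bin_width = 10  # assumed range covered by each block on the x-axis


def bin_start(value, width):
    """Return the lower edge of the bin that a value falls into."""
    return (value // width) * width


# Count how many data elements fall within each range of values.
counts = Counter(bin_start(h, bin_width) for h in heights_cm)

# Print a crude text histogram: one '#' per data element in the bin.
for start in sorted(counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * counts[start]}")
```

Changing `bin_width` changes the shape of the histogram, which is one reason a single chart rarely tells the whole story.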
  6. Visualising data is a powerful way of asking questions of data – what data points you choose to display and how you display them represent the framing of the question. What the data looks like is the response, but a response that often takes careful reading. The data source has drawn you the answer – you need to turn it into words that you can use to formulate further questions to check your understanding of the answer first provided. (Each question (each chart) typically leads to another… or more than one other…) Asking questions that have a graphical answer is one way of querying a data source – but are there other approaches? Let’s explore that a little more – what do we mean by asking questions of data?
  7. Custom search engines are a powerful tool for helping us develop focused web search tools that limit results to a particular part of the web we are interested in, either by location or topic. We can also use (advanced) search limits in ‘everyday’ web queries using the major web search engines. For example, the query shown on this slide searches for the word underspend appearing in Excel spreadsheets (filetype:xls) that can be found on UK government websites (or, more specifically, websites hosted on the gov.uk domain (site:gov.uk)). Another query limit combination I have found useful is:
     confidential filetype:ppt
     This can turn up presentations that have been delivered at closed corporate events but that have leaked on to the web…
  8. Even if you don’t consider yourself a geek or database expert, writing advanced search queries using search limits is but a small step away from writing queries over databases themselves. One of the most widely used languages for querying databases is SQL. The above slide shows a simple, made-up SQL query that, run over a very simple search engine database, could have a similar effect to the search engine query. The idea is that we select those webPages where the text content of the web page contains the word underspend anywhere: the % signs denote wildcards, so the word underspend can be preceded or followed by any number of arbitrary characters. We also want the query to be limited to pages that have a particular filetype and domain. Far more complicated queries can be written over far more complex databases. What’s important is that you develop an idea of what sorts of database structure and query are possible, not necessarily that you can run and query such databases yourself.
     For more examples, see:
     Asking Questions of Data – Garment Factories Data Expedition: http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/
     Asking Questions of Data – Some Simple One-Liners: http://schoolofdata.org/2013/05/13/asking-questions-of-data-some-simple-one-liners/
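To make the made-up query in note 8 concrete, here is a minimal sketch that runs it against a toy, in-memory SQLite database using Python's standard `sqlite3` module. The `webPages` table layout, the `url` column and the sample rows are assumptions invented for illustration; the slide does not describe a real search engine database. The slide's query is also tidied into runnable SQL (an explicit column list, a `FROM` clause and single-quoted strings).

```python
import sqlite3

# Hypothetical toy "search engine database": one row per indexed document.
# Table and column names mirror the made-up query on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webPages (url TEXT, text TEXT, filetype TEXT, domain TEXT)"
)
conn.executemany(
    "INSERT INTO webPages VALUES (?, ?, ?, ?)",
    [
        ("http://example.gov.uk/budget.xls", "Departmental underspend by quarter", "xls", "gov.uk"),
        ("http://example.gov.uk/report.pdf", "Notes on the underspend", "pdf", "gov.uk"),
        ("http://example.com/sales.xls", "Regional underspend figures", "xls", "example.com"),
        ("http://example.gov.uk/staff.xls", "Staff headcount", "xls", "gov.uk"),
    ],
)

# The % wildcards let any characters appear before or after 'underspend'.
query = """
    SELECT url FROM webPages
    WHERE text LIKE '%underspend%'
      AND filetype = 'xls'
      AND domain = 'gov.uk'
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)
```

Only the first row satisfies all three conditions; each of the others fails exactly one of them, which is the same narrowing effect the search limits have in the web query.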
  9. One of the simplest, but often one of the most useful, things we can do is to count things. You just need to be creative in what you count! One of the nice features of working with database query languages such as SQL is that we can write queries that count the number of responses and allow us to rank results on that basis. For example, in a database of public spending transactions with different companies, we could count the number of transactions with a particular company, sum the value of transactions carried out with a particular company, or find the companies with the largest total amount spent.
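The counting, summing and ranking described in note 9 might look something like the following sketch, again using Python's standard `sqlite3` module. The `transactions` table, the supplier names and the amounts are all invented for the example and do not come from any real spending dataset.

```python
import sqlite3

# Hypothetical spending data: one row per transaction with a supplier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (supplier TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("Acme Books", 1200),
        ("Acme Books", 300),
        ("Telco Ltd", 5000),
        ("Telco Ltd", 4500),
        ("Telco Ltd", 250),
        ("Paper Co", 80),
    ],
)

# Count transactions and sum their value per supplier,
# then rank suppliers by total spend (largest first).
query = """
    SELECT supplier, COUNT(*) AS n_transactions, SUM(amount) AS total_spend
    FROM transactions
    GROUP BY supplier
    ORDER BY total_spend DESC
"""
ranked = conn.execute(query).fetchall()
for supplier, n_transactions, total_spend in ranked:
    print(f"{supplier}: {n_transactions} transactions, total {total_spend}")
```

Swapping `ORDER BY total_spend` for `ORDER BY n_transactions` ranks by how often a supplier is paid rather than how much, which can surface a different set of questions.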
  10. This further refinement of the same graphic shows how the two values can be compared. On the left, each column is rank ordered and lines connect similar items, offering a direct column-based comparison. On the right, the ordering is according to the rank order of the right-hand column, allowing direct comparison across the rows.
  11. As has already been mentioned, a key part of the journalistic exercise is putting things into context. When working with data, interpreting what the data says often depends on understanding the context and more importantly, the caveats, that arise by virtue of asking a particular question of a particular dataset that has been collected in a particular way under particular conditions. That said, given a particular data set, are there any obvious questions we can ask of it?
  12. When results are ranked, as for example in the case of league tables, there are often easy picking stories to be had around top 3/bottom three positions. In national rankings, local news stories can be identified if your local schools or council appears in either of those extremes. For contextualisation purposes, it often makes sense to look at distributions. Many summary statistics report on the mean value, but looking at measures of variation, or spread, about a mean, as well as the position of a median value, can often change the context of a story. If the lecture room has 20 students in it on an income of £6,000 maintenance loan per year, the total income is £120,000 and their average mean income is £6,000. If an academic in the room is on £40,000, the total income for the room is £160,000. The average mean income is now just a little over £7,500. If we define a poverty level as a mean income below £10, 000, the members of the room are, on average, in poverty. If a senior academic such as professor on an income over £65,000 wanders into the room, the total income goes to over £225,000. With 22 people now in the room, the average mean income is now over £10,000: the room is out of poverty. The median average income, however, is still at £6,000. As well as top, bottom, mean and median, we should also look to outliers. If Bill Gates or Mark Zuckerberg walks into a bar, the average net worth of people in that bar is likely to go up to a level of previously unimagined wealth. Here are several reasons why you should pay attention to outliers: they may be ‘dirty’ or incorrect data points that need to be corrected and that may well raise questions about data quality; the outlier may truly be an outlier, a remarkable point and a story in its own right; the outlier may skew other measures, such as mean values or other summary statistics. 
In such cases, it may make sense to use other measures or to rerun the summary statistic without including the outlier values to get a better feel for how the other members of the distribution relate to each other.
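The lecture room arithmetic above can be checked with a few lines of Python. This is only a sketch: the professor's income is taken as exactly £65,000 (the text says "over £65,000"), and the rule used to drop outliers before recomputing the mean is a crude illustrative choice, not a recommended statistical method.

```python
from statistics import mean, median

POVERTY_LINE = 10_000  # illustrative threshold from the text

students = [6_000] * 20                      # 20 students on a £6,000 loan
with_academic = students + [40_000]          # an academic is also in the room
with_professor = with_academic + [65_000]    # assumption: exactly £65,000


def summarise(incomes):
    """Return the count, total, mean and median of a list of incomes."""
    return {
        "n": len(incomes),
        "total": sum(incomes),
        "mean": mean(incomes),
        "median": median(incomes),
    }


for label, incomes in [
    ("students only", students),
    ("plus academic", with_academic),
    ("plus professor", with_professor),
]:
    s = summarise(incomes)
    status = "below" if s["mean"] < POVERTY_LINE else "at or above"
    print(
        f"{label}: n={s['n']}, total=£{s['total']:,}, "
        f"mean=£{s['mean']:,.0f} ({status} the line), "
        f"median=£{s['median']:,.0f}"
    )

# Rerun the mean without the outliers. The cut-off (twice the median)
# is an arbitrary rule chosen only for this illustration.
cutoff = 2 * median(with_professor)
trimmed = [x for x in with_professor if x <= cutoff]
print(f"mean without outliers: £{mean(trimmed):,.0f}")
```

The mean crosses the poverty line when the professor walks in, while the median stays where it was, which is the point the example is making.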
  13. This rather dense graphic is a view over local council spending data in my local area as it relates to spend on libraries. The separate charts show the accumulated spend over a period of time with different suppliers. The intention of the display was to provide an at-a-glance view of accumulated spend with different companies across different directorates and spending areas, to see whether any companies had a significant spend compared to other companies. The table at the bottom shows the top of a league table of companies with the largest accumulated spend by directorate and expense type. At first glance, the spend on phone lines with different suppliers seems to outweigh the spend on books. How can that be? Are the librarians spending their time calling premium rate phone lines? If we guess at 20 libraries and a six-month spend period, then assume that the phone lines correspond to broadband data bills, do the monthly payments per library still seem outrageous? These assumptions are testable via questions to the relevant authorities, of course, but they demonstrate the care we need to take when trying to understand why a number that appears to be large is that large. See also: Local Council Spending Data – Time Series Charts http://blog.ouseful.info/2013/11/06/local-council-spending-data-time-series-charts/
  14. As well as looking for outliers, we should also look for similarities between things we expect to be different and differences between things we expect to be the same, or at least, similar.
  15. Looking again at some of my local council’s spending data, I noticed that a search on “music” pulled back what appeared to be a shift in responsibility between directorates for spend on school music service provision. An obvious question that follows is: if the service did change hands (something we can check), was there a resulting difference in the way that the directorates were spending? Could we, for example, identify whether any projects got dropped (or at least, renamed out of scope!)? This forensic approach can also be used to track the consequences of a shift in control of a service, if we know it has happened. When a service changes hands, we can keep a note of the fact and then, a year on, look for evidence of whether treatment of the service has changed, at least in its consequences for spending. See also: What Role, If Any, Does Spending Data Have to Play in Local Council Budget Consultations? http://blog.ouseful.info/2013/11/03/what-role-if-any-does-spending-data-have-to-play-in-local-council-budget-consultations/
  16. If you are in the position of paying energy supply bills – electricity and gas – you’ll probably be familiar with the idea that payments are set so you tend to overpay on a monthly basis. After collecting the interest on your overpayments, the utility companies may eventually get round to sending you a small repayment to cover the excess (ex- of any interest, of course…). Is the same true at the council level? One thing I noticed in my local council’s spend with supplier Southern Electric was that there appeared to be more than a few “negative payments”. So where were these coming from? The chart shown in this slide has positive payments by date (not ordered on an evenly spaced timeline) in black, and the magnitude of negative payments shown in red. Where a red triangle sits over a black dot, a positive and a negative payment of the same amount were made on the same day. Why’s that? Some days show several negative payments – again, what’s happening? There’s not necessarily anything suspicious going on, but what story does this chart appear to tell us, particularly in terms of the similarities in amount of certain positive and negative spends?
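The two patterns picked out in the chart (a positive and a negative payment of the same amount on the same day, and days with several negative payments) can also be looked for directly in the payment records. The sketch below is one possible way of doing that, not the method used to produce the chart. The rows are invented purely for illustration and are not taken from the council data; amounts are held as whole pence so that equal amounts compare exactly.

```python
from collections import defaultdict

# Hypothetical (date, amount in pence) rows, made up for this example.
payments = [
    ("2013-04-02", 125_000),
    ("2013-04-02", -125_000),
    ("2013-04-02", 31_040),
    ("2013-05-07", 98_015),
    ("2013-06-04", -7_500),
    ("2013-06-04", -12_050),
]


def group_by_day(rows):
    """Collect the payment amounts made on each date."""
    by_day = defaultdict(list)
    for day, amount in rows:
        by_day[day].append(amount)
    return by_day


def matched_reversals(rows):
    """Find same-day pairs where a negative payment cancels a positive one."""
    matches = []
    for day, amounts in sorted(group_by_day(rows).items()):
        positives = [a for a in amounts if a > 0]
        for negative in (a for a in amounts if a < 0):
            if -negative in positives:
                positives.remove(-negative)  # pair each positive only once
                matches.append((day, -negative))
    return matches


def days_with_multiple_negatives(rows):
    """List the dates on which more than one negative payment was made."""
    return [
        day
        for day, amounts in sorted(group_by_day(rows).items())
        if sum(1 for a in amounts if a < 0) > 1
    ]


print(matched_reversals(payments))
print(days_with_multiple_negatives(payments))
```

A list of matched pairs like this gives a set of specific dates and amounts to ask the council about.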
  17. Just by the by, this chart refines the question I’m asking of the spend with Southern Electric, asking for more information about positive and negative payments made on the gas and electricity accounts separately.
  18. As well as similarities and differences, data can tell us tales about trends…
  19. Regular releases from the ONS – the Office for National Statistics – provide bread and butter news stories on a regular basis according to a known schedule. For example, monthly job seeker figures get a monthly write-up in OnTheWight, the hyperlocal news blog local to me. The report compares the current figures with figures from the previous month and from the same month of the previous year, so that we can see how the numbers have changed month on month and year on year. I started to explore a simple script that would take data directly from the ONS and produce assets that could be reused in a news story – for example, a table showing the change in figures over recent months. I also started to explore ways in which we could automate the production of prose from the data [code: https://gist.github.com/psychemedia/7536017]. For example, the following phrase was generated automatically from monthly figures: The total number of people claiming Job Seeker's Allowance (JSA) on the Isle of Wight in October was 2781, up 94 from 2687 in September, 2013, and down 377 from 3158 in October, 2012. The words up and down were selected based on a simple if-then rule that compared figures to see which was the greater. The numbers and dates are pulled in from the data. The other words are canned phrases. The automated production of text from data is something that has received attention from several companies, particularly in the areas of baseball reports and financial reporting. See for example: http://blog.ouseful.info/2013/05/22/notes-on-narrative-science-and-automated-insight/ Being able to define sentences and natural language constructions that can be used as templates to display data in textual form is a skill that could well feed into specialist areas of data driven reporting. 
Identifying the patterns in the data that can be mapped onto natural language explanations of those patterns in a reliable way is another area in which wordsmiths, statisticians and developers may have to work together in the future.
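The if-then rule and canned phrases described above can be sketched in a few lines of Python. This is not the script linked from the gist; the function names `change_phrase` and `jsa_sentence` are hypothetical, and the figures are simply those quoted in the example sentence rather than being fetched from the ONS.

```python
def change_phrase(current, previous, when):
    """Pick 'up', 'down' or 'unchanged' by comparing two figures."""
    diff = current - previous
    if diff > 0:
        return f"up {diff} from {previous} in {when}"
    if diff < 0:
        return f"down {-diff} from {previous} in {when}"
    return f"unchanged from {previous} in {when}"


def jsa_sentence(area, month, current, last_month, last_month_label,
                 last_year, last_year_label):
    """Fill a canned sentence template with figures and dates."""
    return (
        "The total number of people claiming Job Seeker's Allowance (JSA) "
        f"on {area} in {month} was {current}, "
        f"{change_phrase(current, last_month, last_month_label)}, and "
        f"{change_phrase(current, last_year, last_year_label)}."
    )


sentence = jsa_sentence(
    area="the Isle of Wight",
    month="October",
    current=2781,
    last_month=2687,
    last_month_label="September, 2013",
    last_year=3158,
    last_year_label="October, 2012",
)
print(sentence)
```

The "unchanged" branch is an addition for completeness: a template that only knows "up" and "down" would produce an odd sentence in a month where the figure did not move.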
  20. If we plot a line chart with some quantity against a time axis, we can often see increasing or decreasing trends over time. If we are looking for constant rates of increase in some value, it often makes sense to use a logarithmic (log) scale to display that value on the y-axis, because a constant percentage rate of growth then appears as a straight line. Periodic trends can also be seen as ‘waves’ appearing in the line over time, but other displays can draw out periodicity or seasonality in a more visually compelling way. For example, in these charts – of jobless figures on the Isle of Wight once again – we have months ordered along the horizontal x-axis and the number of Job Seeker's Allowance claimants on the vertical y-axis. The separate coloured lines represent different years. On the left, we use a legend to identify the lines; on the right is an example of labelling the lines directly. The lines show strong seasonality in behaviour. With the island being a tourist destination, job seeker figures tend to fall over the summer months. Putting lines for several years on the same axis allows us to compare annual cycles over time.
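The reason a log scale suits constant rates of increase can be shown numerically. In this sketch the starting value (100) and growth rate (10% per period) are arbitrary assumptions chosen for illustration. The raw values rise by a larger amount each period, while their logarithms rise by the same amount each time, which is what produces a straight line on a log axis.

```python
import math


def growth_series(start, rate, periods):
    """Values growing at a constant percentage rate per period."""
    return [start * (1 + rate) ** t for t in range(periods)]


values = growth_series(100.0, 0.10, 6)  # assumed: 10% growth per period

# Step sizes on a linear scale get bigger every period...
linear_steps = [b - a for a, b in zip(values, values[1:])]

# ...but step sizes on a log scale are all the same.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print([round(step, 2) for step in linear_steps])
print([round(step, 4) for step in log_steps])
```

In a plotting library such as matplotlib, the corresponding display choice is a log-scaled y-axis, for example via `ax.set_yscale("log")`.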
  21. Another trend we can try to pull out is the change over years for each given month. Here, the horizontal x-axis blocks out the months, as before, but within each month we have an ordered range of years. The line within each block thus represents the year-on-year change in numbers within a given month. The step change within each month suggests that the way the figures were calculated changed significantly several years ago. Further reading: a good guide to statistics as used by government, including a description of the way that “seasonal adjustments” are handled, is provided by the House of Commons Library’s Statistical Literacy Guide http://www.parliament.uk/business/publications/research/briefing-papers/SN04944/statistical-literacy-guide
  22. As well as the patterns we can see over time by plotting data against a time axis, we can also look for patterns in space…
  23. In part because they are so recognisable to the majority of people as an idea as well as an artefact, maps are widely used in many publications. I have already mentioned how the use of a map to compare travel claims by MPs based on their constituency locations provided a way of making a particular sort of comparison between MPs (in particular, a comparison based on geographical location). But we can take the idea of a map more generally, as a spatial distribution of points that are related in some way, with strong relations represented as spatial proximity. Things that are close together on the page are taken to be close together in some sort of space, a space which may be conceptual or social, not just (or not even) geographic.
  24. Take this map, for example, a map of Twitter users commonly followed by a sample of followers of @UL_journalism. The map has been laid out so that Twitter users who are heavily interlinked are grouped closely together (for the most part, at least). A network statistic has been used in an attempt to colour clusters of nodes with high interconnection. The coloured regions thus represent a first attempt at identifying different groupings of Twitter user. You will note how the spatial layout algorithm and the grouping/colouring algorithm complement each other well – they both seem to tell a similar story, where the story is that certain groups of individuals are somehow alike. About the technique: http://schoolofdata.org/2014/02/14/mapping-social-positioning-on-twitter/ Let’s have a closer look at some of the regions…