SlideShare ist ein Scribd-Unternehmen logo
1 von 76
Big Data as the Fuel and Visual
Analytics as the Engine Mount of
the Digital Transformation
Prof. Dr. Diego Kuonen, CStat PStat CSci
Statoo Consulting, Berne, Switzerland
@DiegoKuonen + kuonen@statoo.com + www.statoo.info
‘CAS Data Visualization Lecture #4’, HKB, Berne, Switzerland — July 7, 2017
About myself (about.me/DiegoKuonen)
PhD in Statistics, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
MSc in Mathematics, EPFL, Lausanne, Switzerland.
• CStat (‘Chartered Statistician’), Royal Statistical Society, UK.
• PStat (‘Accredited Professional Statistician’), American Statistical Association, USA.
• CSci (‘Chartered Scientist’), Science Council, UK.
• Elected Member, International Statistical Institute, NL.
• Senior Member, American Society for Quality, USA.
• President of the Swiss Statistical Society (2009-2015).
Founder, CEO & CAO, Statoo Consulting, Switzerland (since 2001).
Professor of Data Science, Research Center for Statistics (RCS), Geneva School of Economics
and Management (GSEM), University of Geneva, Switzerland (since 2016).
Founding Director of GSEM’s new MSc in Business Analytics program (starting fall 2017).
Principal Scientific and Strategic Big Data Analytics Advisor for the Directorate and Board of
Management, Swiss Federal Statistical Office (FSO), Neuchˆatel, Switzerland (since 2016).
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
2
About Statoo Consulting (www.statoo.info)
• Founded Statoo Consulting in 2001.
2017 − 2001 = 16 + .
• Statoo Consulting is a software-vendor independent Swiss consulting firm
specialised in statistical consulting and training, data analysis, data mining
(data science) and big data analytics services.
• Statoo Consulting offers consulting and training in statistical thinking, statistics,
data mining and big data analytics in English, French and German.
Are you drowning in uncertainty and starving for knowledge?
Have you ever been Statooed?
Source: Andy Kirk, June 21, 2012 (slideshare.net/visualisingdata/the-8-hats-of-data-visualisation).
‘DeepArt, the computer that paints your portrait’ (deepart.io)
See also actu.epfl.ch/news/deepart-the-computer-that-paints-your-portrait/ (April 11, 2016).
Something to aspire to?
Source: ‘Visualizing Facebook friendships’, Paul Butler, Facebook Engineering, December 14, 2010 (goo.gl/x3SXm2).
Contents
Contents 9
1. Demystifying the ‘big data’ hype 11
2. Demystifying the two approaches of ‘learning from data’ 20
3. Demystifying ‘visual analytics’ 27
4. Demystifying ‘exploratory data analysis’ 33
5. ‘Visual thinking’ goals and principles 36
6. Conclusion and opportunities 61
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
9
‘Data is arguably the most important natural
resource of this century. ... Big data is big news just
about everywhere you go these days. Here in Texas,
everything is big, so we just call it data.’
Michael Dell, 2014
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
10
1. Demystifying the ‘big data’ hype
• ‘Big data’ have hit the business, government and scientific sectors.
The term ‘big data’ — coined in 1997 by two researchers at the NASA — has
acquired the trappings of religion.
• But, what exactly are ‘big data’?
The term ‘big data’ applies to an accumulation of data that can not be
processed or handled using traditional data management processes or tools.
Big data are a data management infrastructure which should ensure that the
underlying hardware, software and architecture have the ability to enable ‘learning
from data’, i.e. ‘analytics’.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
11
• The following characteristics — ‘the four Vs’ — provide a definition:
– ‘Volume’ : ‘data at rest’, i.e. the amount of data ( ‘data explosion problem’),
with respect to the number of observations ( ‘size’ of the data), but also with
respect to the number of variables ( ‘dimensionality’ of the data);
– ‘Variety’ : ‘data in many forms’, ‘mixed data’ or ‘broad data’, i.e. different
types of data (e.g. structured, semi-structured and unstructured, e.g. log files,
text, web or multimedia data such as images, videos, audio), data sources (e.g.
internal, external, open, public), data resolutions (e.g. measurement scales and
aggregation levels) and data granularities;
– ‘Velocity’ : ‘data in motion’ or ‘fast data’, i.e. the speed by which data are
generated and need to be handled (e.g. streaming data from devices, machines,
sensors and social data);
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
12
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
13
– ‘Veracity’ : ‘data in doubt’ or ‘trust in data’, i.e. the varying levels of noise and
processing errors, including the reliability (‘quality over time’), capability and
validity of the data.
• ‘Volume’ is often the least important issue: it is definitely not a requirement to
have a minimum of a petabyte of data, say.
Bigger challenges are ‘variety’ (e.g. combining different data sources such as
internal data with social networking data and public data) and ‘velocity’, but most
important is ‘veracity’ and the related quality of the data .
Indeed, big data come with the data quality and data governance challenges of
‘small’ data along with new challenges of its own!
Existing ‘small’ data quality frameworks need to be extended, i.e. augmented!
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
14
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
15
Big data in 1939
Source: ‘The history of the Tube in pictures’ (goo.gl/dmJymR).
Criticism of sceptics: these four Vs have always been there! But, what is new?
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
16
‘Data is part of Switzerland’s infrastructure, such as
road, railways and power networks, and is of great
value. The government and the economy are obliged
to generate added value from these data.’
digitalswitzerland, November 22, 2016
Source: digitalswitzerland’s ‘Digital Manifesto for Switzerland’ (digitalswitzerland.com).
The 5th V of big data: ‘Value’ , i.e. the ‘usefulness of data’.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
17
Intermediate summary: the ‘five Vs’ of (big) data
‘Volume’, ‘Variety’ and ‘Velocity’ are the ‘essential’ characteristics of (big) data;
‘Veracity’ and ‘Value’ are the ‘qualification for use’ characteristics of (big) data.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
18
‘Data are not taken for museum purposes; they are
taken as a basis for doing something. If nothing is to
be done with the data, then there is no use in
collecting any. The ultimate purpose of taking data
is to provide a basis for action or a recommendation
for action.’
W. Edwards Deming, 1942
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
19
2. Demystifying the two approaches of ‘learning from data’
Data science, statistics and their connection
• The demand for ‘data scientists’ — the ‘magicians of the big data era’ — is
unprecedented in sectors where value, competitiveness and efficiency are data-driven.
The term ‘data science’ was originally coined in 1997 by a statistician.
Data science — a rebranding of ‘data mining’ — is the non-trivial
process of identifying valid (that is, the patterns hold in general, i.e. being
valid on new data in the face of uncertainty), novel, potentially useful
and ultimately understandable patterns or structures or models or trends or
relationships in data to enable ‘data-driven decision making’.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
20
• Is data science ‘statistical d´ej`a vu’?
But, what is ‘statistics’?
Statistics is the science of ‘learning from data’ (or of making sense out
of data), and of measuring, controlling and communicating uncertainty.
It is a process that includes everything from planning for the collection of data and
subsequent data management to end-of-the-line activities such as drawing conclusions
of data and presentation of results.
Uncertainty is measured in units of probability, which is the currency (or grammar)
of statistics.
Statistics is concerned with the study of data-driven decision making in the face
of uncertainty.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
21
What distinguishes data science from statistics?
• Statistics traditionally is concerned with analysing primary (e.g. experimental or
‘made’ or ‘designed’) data that have been collected to explain and check the validity
of specific existing ‘ideas’, i.e. through operationalisation of theoretical concepts.
Primary data analytics or top-down (explanatory and confirmatory) analytics.
‘Idea (theory) evaluation or testing’ .
‘Deductive reasoning’ as ‘idea (theory) first’.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
22
• Data science, on the other hand, typically is concerned with analysing secondary
(e.g. observational or ‘found’ or ‘organic’) data that have been collected (and
designed) for other reasons (and often not ‘under control’ of the investigator) to
create new ideas (‘theories’).
Secondary data analytics or bottom-up (exploratory and predictive) analytics.
‘Idea (theory) generation’ .
‘Inductive reasoning’ as ‘data first’.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
23
‘We see the data analyst’s insistence on ‘letting the
data speak to us’ by plots and displays as an
instinctive understanding of the need to encourage
and to stimulate the pattern recognition and model
generating capability of the right brain. Also, it
expresses his concern that we not allow our pushy
deductive left brain to take over too quickly and
perhaps forcibly produce unwarranted conclusions
based on an inadequate model.’
George E. P. Box, 1984
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
24
• The two approaches of analytics, i.e. ‘learning from data’, are complementary and
should proceed side by side — in order to enable proper data-driven decision making
and to enable continuous improvement.
Source: Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
25
‘Neither exploratory nor confirmatory is sufficient
alone. To try to replace either by the other is
madness. We need them both.’
John W. Tukey, 1980
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
26
3. Demystifying ‘visual analytics’
Visual analytics is the representation and presentation of data that
exploits our visual perception abilities in order to amplify cognition.
Visual analytics is also known as exploratory or visual data mining or visual
exploration ( ‘exploratory data analysis’).
The goal is simply to explore the data without any clear ideas of what we are
looking for ( discovering the unexpected by inductive reasoning).
A good enough description of a behaviour will often suggest an explanation for it
as well.
At the very least, a good description suggests where to start looking for an
explanation by deductive reasoning.
Visual analytics is an example of so-called ‘unsupervised learning’ from data.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
27
‘The greatest value of a picture is when it forces us
to notice what we never expected to see.’
John W. Tukey, 1977
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
28
The importance of data visualisation (Anscombe, 1973)
Anscombe’s quartet : four data sets that have same mean, same variance, same
correlation and same ‘simple linear regression line’. They seem to be very similar!
But the visual impression of each of the scatterplots is very different! Indeed:
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
29
Only data set ‘I’ (top left) is well represented by a ‘simple linear regression line’.
Always look at a scatterplot of the data before blindly computing a numerical
summary like a correlation coefficient!
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
30
Alberto Cairo’s Datasaurus (goo.gl/NQLvvD)
Source: ‘Same Stats, Different Graphs’ (autodeskresearch.com/publications/samestats).
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
31
‘A computer should make both calculations and
graphs. Both sorts of output should be studied; each
will contribute to understanding.’
Francis J. Anscombe, 1973
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
32
4. Demystifying ‘exploratory data analysis’
• Exploratory data analysis (introduced by John W. Tukey in the 1970s) is an
inductive approach to data analysis (i.e. ‘data first’) that emphasises the use of
informal graphical procedures not based on prior assumptions about the structure of
the data or on formal models for the data.
‘Unsupervised learning’ from data.
• The essence of this approach is that, broadly speaking, data are assumed to possess
the following structure:
data = smooth + rough,
where the ‘smooth’ is the underlying regularity or pattern in the data.
The objective of the exploratory approach is to separate the ‘smooth’ from the
‘rough’ with minimal use of formal mathematics or statistical methods.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
33
• Exploratory data analysis is an approach and philosophy for data analysis that
employs a variety of techniques to, for example,
– maximise insight into data by exploring them;
– see overall patterns and detailed behaviour;
– uncover underlying structure;
– extract ‘important’ variables;
– detect ‘outliers’ and ‘anomalies’;
– test underlying ‘assumptions’; and
– develop ‘parsimonious’ models.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
34
‘No catalog of techniques can convey a willingness to
look for what can be seen, whether or not
anticipated. Yet this is at the heart of exploratory
data analysis. The graph paper — and transparencies
— are there, not as a technique, but rather as a
recognition that the picture-examining eye is the best
finder we have of the wholly unanticipated.’
John W. Tukey, 1980
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
35
5. ‘Visual thinking’ goals and principles
Visual representations of data
• Visual representations of data are used primarily for two purposes:
to search for meaningful patterns and then examine them once they are found
to gain understanding ( graphics for exploration);
to communicate meaningful findings to others ( graphics for presentation).
• The goal of visualisation is to aid our understanding of data by leveraging the
human visual system’s highly tuned ability to, for example, see patterns, spot trends
and identify outliers.
Well-designed visual representations can replace cognitive calculations with simple
perceptual ‘inferences’ and improve comprehension, memory and decision making.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
36
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
37
‘Visual thinking tools are especially important
because they harness the visual pattern finding part
of the brain. Almost half the brain is devoted to the
visual sense and the visual brain is exquisitely capable
of interpreting graphical patterns, even scribbles, in
many different ways. Often, to see a pattern is to
understand the solution to a problem.’
Colin Ware, 2008
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
38
Visual analytics in 1854: John Snow & the cholera epidemic
• Map drawn by the epidemiologist John Snow in 1854 to show a suspected
cause-effect relationship between deaths from the cholera epidemic in London
during September 1854 and locations of water pump-wells:
The affected well is clearly identified by the concentration of cases in its vicinity.
A good data representation is the key of solving the problem.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
39
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
40
‘For close upon 100 years we have been free in this
country from epidemic cholera, and it is a freedom
which, basically, we owe to the logical thinking, acute
observations and simple sums of John Snow.’
Austin B. Hill, 1955
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
41
Visual analytics in 1861: Charles J. Minard & Napoleon’s march to Moscow
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
42
‘Minard’s graphic tells a rich, coherent story with its multivariate data, far
more enlightening than just a single number bouncing along over time. It
may well be the best statistical graphic ever drawn .’
Edward R. Tufte, 2001
‘The chart gives a fantastic summary of Napoleon’s campaign. It is
possible that Tolstoy did the same thing more artistically in ‘War and
Peace’, but he needed a thousand pages to do so.’
Anders Wallgren et al., 1996
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
43
‘Graphical excellence is that which gives to the viewer
the greatest number of ideas in the shortest time
with the least ink in the smallest space.’
Edward R. Tufte
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
44
Visual perception
Do you perceive the following figure correctly?
Source: Heuer (1999, Chapter 2).
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
45
The article is written twice in each of the three phases. This is commonly
overlooked because visual perception is influenced by our expectations about how
these familiar phrases are normally written.
This simple experiment demonstrates one of the most fundamental principles
concerning visual perception:
we tend to perceive what we expect to perceive .
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
46
‘It takes more information, and more unambiguous
information, to recognise an unexpected phenomenon
than an expected one.’
Richards J. Heuer, 1999
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
47
Visual exploration paradigm
• Visual data exploration usually follows a three step process:
overview first,
zoom and filter ( ‘drill-down’), and
then details on demand.
This process is called the ‘Information Seeking Mantra’ (Shneiderman, 1996).
• Analytical navigation is often cyclical in nature.
We do not just start with the overview, zoom and filter, then access details on
demand, and end the process there.
An examination of the details often prompts us to expand the picture to an
overview once again.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
48
Asia at night
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
49
South and North Korea at night
Notice Seoul, South Korea, and how dark it is in North Korea.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
50
NASA image taken from the ISS on January 30, 2014
Source: earthobservatory.nasa.gov/IOTD/view.php?id=83182
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
51
‘The only limits to exploratory data analysis are those
imposed by time constraints and the creativity of the
data analyst.’
Stephan Morgenthaler, 2009
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
52
Goals involving the visual display of quantitative information
• Gelman and Unwin (2013) propose the following six goals — the first three ( )
being discovery goals and the second three (◦) being communication goals :
give an overview;
convey the sense of the scale and the complexity of the data;
explore the data set using flexible displays to discover unexpected aspects of the
data;
◦ communicate information (to self and others);
◦ tell a story (see Minard’s map of Napoleon’s Russian campaign);
◦ attract attention and stimulate interest.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
53
‘Each of these goals can be important, but it would
be meaningless to try to achieve all of them at once.
... discovery ... and communication ... can go
together — we want to communicate our discoveries.
... To put it simply, we communicate when we display
a convincing pattern, and we discover when we
observe deviations from our expectations.’
Andrew Gelman and Antony Unwin, 2013
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
54
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
55
‘Dichtigkeit der Bev¨olkerung im Jahre 1888’ (goo.gl/Q51yvU)
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
56
‘Dichtigkeit des Rindviehbestandes im Jahre 1911’ (goo.gl/q1G9Rt)
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
57
‘150 Jahre schweiz. Bundesstaat im Lichte der Statistik’ (goo.gl/2Xm69Q)
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
58
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
59
‘Graphs are indispensable in data analysis, because
the human visual system is so good at recognising
patterns that the unexpected can leap out and hit
the investigator between the eyes.’
Anthony C. Davison, 2003
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
60
6. Conclusion and opportunities
• Graphics are worth more than a thousand words; they are priceless!
However, a ‘poor’ graphic is worse than no graphic at all.
• ‘Better’ graphics will promote understanding of the data and of the process from
which they came.
As such, they will focus on the data and will ‘let the data speak to us’ ( avoid
‘chartjunk’!).
Graphics must show data variation, not design variation!
• Despite an awful lot of marketing hype, big data are here to stay — as well as the
‘Internet of Things’ (IoT; a term coined in 1999!) — and will impact every single
domain!
The future of (visual) analytics is limitless!
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
61
‘The job is to learn from data with the help of
mathematical tools. This process of fully
understanding data can only be obtained through
visual aid.’
Egon S. Pearson, 1956
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
62
• The key elements for a successful (big) data analytics, data science and visual
analytics future are statistical principles and rigour of humans!
• Statistics, (big) data analytics, data science and visual analytics are aids to thinking
and not replacements for it!
They should be envisaged to complement and augment humans, not replacements
for it!
Nowadays, with the digital transformation and the related data revolution, humans
need to augment their strengths to become more ‘powerful’: by automating any
routinisable work and by focusing on their core competences.
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
63
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
64
‘As yet I know of no person or group that is taking
nearly adequate advantage of the graphical
potentialities of the computer. ... In exploration they
[computer-drawn graphs] are going to be the data
analyst’s greatest single resource.’
John W. Tukey, 1965
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
65
‘The data may not contain the answer. The
combination of some data and an aching desire for an
answer does not ensure that a reasonable answer can
be extracted from a given body of data.’
John W. Tukey, 1986
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
66
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
67
‘To properly analyze data, we must first understand
the process that produced the data. While many ...
take the view that data are innocent until proven
guilty, ... it is more prudent to take the opposite
approach, that data are guilty until proven innocent.’
Roger W. Hoerl, Ronald D. Snee and Richard D. De Veaux, 2014
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
68
Given the underlying data veracity, still something to aspire to?
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
69
Something else to aspire to?
Source: Cook, D., Lee, E.-K. & Majumder, M. (2016; Video 1 at goo.gl/YbtG7o).
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
70
‘Exploratory data analysis can never be the whole
story, but nothing else can serve as the foundation
stone — as the first step.’
John W. Tukey, 1977
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
71
Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved.
72
Have you been Statooed?
Prof. Dr. Diego Kuonen, CStat PStat CSci
Statoo Consulting
Morgenstrasse 129
3018 Berne
Switzerland
email kuonen@statoo.com
@DiegoKuonen
web www.statoo.info
Selective list of references
• Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21.
• Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.
• Box, G. E. P. (1984). The importance of practice in the development of statistics. Technometrics, 26, 1–8.
• Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D. F. & Wickham, H. (2009). Statistical inference for exploratory data analysis
and model diagnostics. Philosophical Transactions of The Royal Society, A, 367, 4361–4383.
• Cairo, A. (2013). The Functional Art — An Introduction to Information Graphics and Visualization. Berkeley, CA: New Riders.
• Cairo, A. (2016). The Truthful Art: Data, Charts, and Maps for Communication. Berkeley, CA: New Riders.
• Cleveland, W. S. (1993). Visualizing Data. Summit, NJ: Hobart Press.
• Cleveland, W. S. (1994). The Elements of Graphing Data. Summit, NJ: Hobart Press.
• Cook, D., Lee, E.-K. & Majumder, M. (2016). Data visualization and statistical graphics in big data analysis. Annual Review of Statistics and Its
Application, 3, 133–159.
• Cook. D. & Swayne, D. F. (2007) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. New York: Springer.
• Gelman, A. & Unwin, A. (2013). Infovis and statistical graphics: different goals, different looks (with Discussion). Journal of Computational and
Graphical Statistics, 22, 2–49.
• Heuer, R. J. (1999). Psychology of Intelligence Analysis. Center for the Study of Intelligence, U.S. Central Intelligence Agency.
• Hoerl, R. W., Snee, R. D. & De Veaux, R. D. (2014). Applying statistical thinking to ‘big data’ problems. Wiley Interdisciplinary Reviews: Computational
Statistics, 6, 222–232.
• Kirk, A., Timms, S., Rininsland, Æ. & Teller, S. (2016). Data Visualization: Representing Information on Modern Web. Birmingham, UK: Packt
Publishing.
• Shneiderman, B. (1996). The eyes have it: a task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on
Visual Languages. Los Alamos, CA: IEEE, 336–343.
• Theus, M. & Urbanek, S. (2009). Interactive Graphics for Data Analysis: Principles and Examples. Boca Raton, FL: Chapman & Hall/CRC.
• Thomas, J. J. & Cook, K. A., Eds. (2005). Illuminating the Path — The Research and Development Agenda for Visual Analytics. U.S. National
Visualization and Analytics Center.
• Tufte, E. R. (1990). Envisioning Information. Cheshire, CT: Graphics Press.
• Tufte, E. R. (1997). Visual Explanations. Cheshire, CT: Graphics Press.
• Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd edition). Cheshire, CT: Graphics Press.
• Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics, 33, 1–67.
• Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MT: Addison-Wesley.
• Tukey, J. W. (1980). We need both exploratory and confirmatory. The American Statistician, 34, 23–25.
• Tukey, J. W. & Wilk, M. B. (1966). Data analysis and statistics: an expository overview. In The Collected Works of John W. Tukey, L. V. Jones (Ed.).
Monterey, CA: Wadsworth & Brooks/Cole, 549–578.
• Unwin, A., Theus, M. & Hofmann, H. (2006). Graphics of Large Datasets: Visualizing a Million. New York: Springer.
• Wallgren, A., Wallgren, B., Persson, R., Jorner, U., & and Haaland, J. (1996). Graphing Statistics & Data. London: Sage Publications.
• Ware, C. (2008). Visual Thinking for Design. Morgan Kaufmann Publishers.
Copyright c 2001–2017 by Statoo Consulting, Switzerland. All rights reserved.
No part of this presentation may be reprinted, reproduced, stored in, or introduced
into a retrieval system or transmitted, in any form or by any means (electronic,
mechanical, photocopying, recording, scanning or otherwise), without the prior
written permission of Statoo Consulting, Switzerland.
Warranty: none.
Trademarks: Statoo is a registered trademark of Statoo Consulting, Switzerland.
Other product names, company names, marks, logos and symbols referenced herein
may be trademarks or registered trademarks of their respective owners.
Presentation code: ‘Dataviz.HKB.BFH.2017’.
Typesetting: LATEX, version 2 . PDF producer: pdfTEX, version 3.141592-1.40.3-2.2 (Web2C 7.5.6).
Compilation date: 07.07.2017.

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Demystifying Big Data, Data Science and Statistics, along with Machine Intell...
Demystifying Big Data, Data Science and Statistics, along with Machine Intell...Demystifying Big Data, Data Science and Statistics, along with Machine Intell...
Demystifying Big Data, Data Science and Statistics, along with Machine Intell...
 
Managing Uncertainty to Improve Decision Making - Statistical Thinking for Qu...
Managing Uncertainty to Improve Decision Making - Statistical Thinking for Qu...Managing Uncertainty to Improve Decision Making - Statistical Thinking for Qu...
Managing Uncertainty to Improve Decision Making - Statistical Thinking for Qu...
 
A Statistician's `Big Tent' View on Big Data and Data Science in Health Scien...
A Statistician's `Big Tent' View on Big Data and Data Science in Health Scien...A Statistician's `Big Tent' View on Big Data and Data Science in Health Scien...
A Statistician's `Big Tent' View on Big Data and Data Science in Health Scien...
 
A Swiss Statistician's 'Big Tent' Overview of Big Data and Data Science in Ph...
A Swiss Statistician's 'Big Tent' Overview of Big Data and Data Science in Ph...A Swiss Statistician's 'Big Tent' Overview of Big Data and Data Science in Ph...
A Swiss Statistician's 'Big Tent' Overview of Big Data and Data Science in Ph...
 
Big Data, Data-Driven Decision Making and Statistics Towards Data-Informed Po...
Big Data, Data-Driven Decision Making and Statistics Towards Data-Informed Po...Big Data, Data-Driven Decision Making and Statistics Towards Data-Informed Po...
Big Data, Data-Driven Decision Making and Statistics Towards Data-Informed Po...
 
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
 
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 9)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 9)A Statistician's 'Big Tent' View on Big Data and Data Science (Version 9)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 9)
 
A Statistician's View on Big Data and Data Science (Version 2)
A Statistician's View on Big Data and Data Science (Version 2)A Statistician's View on Big Data and Data Science (Version 2)
A Statistician's View on Big Data and Data Science (Version 2)
 
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)
 
A Statistician's View on Big Data and Data Science in Pharmaceutical Developm...
A Statistician's View on Big Data and Data Science in Pharmaceutical Developm...A Statistician's View on Big Data and Data Science in Pharmaceutical Developm...
A Statistician's View on Big Data and Data Science in Pharmaceutical Developm...
 
A Swiss Statistician's 'Big Tent' View on Big Data and Data Science (Version 10)
A Swiss Statistician's 'Big Tent' View on Big Data and Data Science (Version 10)A Swiss Statistician's 'Big Tent' View on Big Data and Data Science (Version 10)
A Swiss Statistician's 'Big Tent' View on Big Data and Data Science (Version 10)
 
A Statistician's Introductory View on Big Data and Data Science (Version 7)
A Statistician's Introductory View on Big Data and Data Science (Version 7)A Statistician's Introductory View on Big Data and Data Science (Version 7)
A Statistician's Introductory View on Big Data and Data Science (Version 7)
 
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 8)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 8)A Statistician's 'Big Tent' View on Big Data and Data Science (Version 8)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 8)
 
A Statistician's View on Big Data and Data Science (Version 3)
A Statistician's View on Big Data and Data Science (Version 3)A Statistician's View on Big Data and Data Science (Version 3)
A Statistician's View on Big Data and Data Science (Version 3)
 
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
 
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -..."Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
 
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
Production Processes of Official Statistics & Data Innovation Processes Augme...
Production Processes of Official Statistics & Data Innovation Processes Augme...Production Processes of Official Statistics & Data Innovation Processes Augme...
Production Processes of Official Statistics & Data Innovation Processes Augme...
 
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
 

Ähnlich wie Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital Transformation

Making an impact with data science
Making an impact  with data scienceMaking an impact  with data science
Making an impact with data science
Jordan Engbers
 

Ähnlich wie Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital Transformation (11)

Data Science definition
Data Science definitionData Science definition
Data Science definition
 
Let's talk about Data Science
Let's talk about Data ScienceLet's talk about Data Science
Let's talk about Data Science
 
Why Data Science is a Science
Why Data Science is a ScienceWhy Data Science is a Science
Why Data Science is a Science
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Data science innovations
Data science innovations Data science innovations
Data science innovations
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learning
 
Digital cultural heritage spring 2015 day 2
Digital cultural heritage spring 2015 day 2Digital cultural heritage spring 2015 day 2
Digital cultural heritage spring 2015 day 2
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Big Data: Big Issues for IP
Big Data: Big Issues for IPBig Data: Big Issues for IP
Big Data: Big Issues for IP
 
Making an impact with data science
Making an impact  with data scienceMaking an impact  with data science
Making an impact with data science
 

Kürzlich hochgeladen

Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
raffaeleoman
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
Kayode Fayemi
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
Sheetaleventcompany
 

Kürzlich hochgeladen (20)

Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 

Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital Transformation

  • 1. Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital Transformation Prof. Dr. Diego Kuonen, CStat PStat CSci Statoo Consulting, Berne, Switzerland @DiegoKuonen + kuonen@statoo.com + www.statoo.info ‘CAS Data Visualization Lecture #4’, HKB, Berne, Switzerland — July 7, 2017
  • 2. About myself (about.me/DiegoKuonen) PhD in Statistics, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. MSc in Mathematics, EPFL, Lausanne, Switzerland. • CStat (‘Chartered Statistician’), Royal Statistical Society, UK. • PStat (‘Accredited Professional Statistician’), American Statistical Association, USA. • CSci (‘Chartered Scientist’), Science Council, UK. • Elected Member, International Statistical Institute, NL. • Senior Member, American Society for Quality, USA. • President of the Swiss Statistical Society (2009-2015). Founder, CEO & CAO, Statoo Consulting, Switzerland (since 2001). Professor of Data Science, Research Center for Statistics (RCS), Geneva School of Economics and Management (GSEM), University of Geneva, Switzerland (since 2016). Founding Director of GSEM’s new MSc in Business Analytics program (starting fall 2017). Principal Scientific and Strategic Big Data Analytics Advisor for the Directorate and Board of Management, Swiss Federal Statistical Office (FSO), Neuchˆatel, Switzerland (since 2016). Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 2
  • 3.
  • 4. About Statoo Consulting (www.statoo.info) • Founded Statoo Consulting in 2001. 2017 − 2001 = 16 + . • Statoo Consulting is a software-vendor independent Swiss consulting firm specialised in statistical consulting and training, data analysis, data mining (data science) and big data analytics services. • Statoo Consulting offers consulting and training in statistical thinking, statistics, data mining and big data analytics in English, French and German. Are you drowning in uncertainty and starving for knowledge? Have you ever been Statooed?
  • 5.
  • 6. Source: Andy Kirk, June 21, 2012 (slideshare.net/visualisingdata/the-8-hats-of-data-visualisation).
  • 7. ‘DeepArt, the computer that paints your portrait’ (deepart.io) See also actu.epfl.ch/news/deepart-the-computer-that-paints-your-portrait/ (April 11, 2016).
  • 8. Something to aspire to? Source: ‘Visualizing Facebook friendships’, Paul Butler, Facebook Engineering, December 14, 2010 (goo.gl/x3SXm2).
  • 9. Contents Contents 9 1. Demystifying the ‘big data’ hype 11 2. Demystifying the two approaches of ‘learning from data’ 20 3. Demystifying ‘visual analytics’ 27 4. Demystifying ‘exploratory data analysis’ 33 5. ‘Visual thinking’ goals and principles 36 6. Conclusion and opportunities 61 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 9
  • 10. ‘Data is arguably the most important natural resource of this century. ... Big data is big news just about everywhere you go these days. Here in Texas, everything is big, so we just call it data.’ Michael Dell, 2014 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 10
  • 11. 1. Demystifying the ‘big data’ hype • ‘Big data’ have hit the business, government and scientific sectors. The term ‘big data’ — coined in 1997 by two researchers at the NASA — has acquired the trappings of religion. • But, what exactly are ‘big data’? The term ‘big data’ applies to an accumulation of data that can not be processed or handled using traditional data management processes or tools. Big data are a data management infrastructure which should ensure that the underlying hardware, software and architecture have the ability to enable ‘learning from data’, i.e. ‘analytics’. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 11
  • 12. • The following characteristics — ‘the four Vs’ — provide a definition: – ‘Volume’ : ‘data at rest’, i.e. the amount of data ( ‘data explosion problem’), with respect to the number of observations ( ‘size’ of the data), but also with respect to the number of variables ( ‘dimensionality’ of the data); – ‘Variety’ : ‘data in many forms’, ‘mixed data’ or ‘broad data’, i.e. different types of data (e.g. structured, semi-structured and unstructured, e.g. log files, text, web or multimedia data such as images, videos, audio), data sources (e.g. internal, external, open, public), data resolutions (e.g. measurement scales and aggregation levels) and data granularities; – ‘Velocity’ : ‘data in motion’ or ‘fast data’, i.e. the speed by which data are generated and need to be handled (e.g. streaming data from devices, machines, sensors and social data); Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 12
  • 13. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 13
  • 14. – ‘Veracity’ : ‘data in doubt’ or ‘trust in data’, i.e. the varying levels of noise and processing errors, including the reliability (‘quality over time’), capability and validity of the data. • ‘Volume’ is often the least important issue: it is definitely not a requirement to have a minimum of a petabyte of data, say. Bigger challenges are ‘variety’ (e.g. combining different data sources such as internal data with social networking data and public data) and ‘velocity’, but most important is ‘veracity’ and the related quality of the data . Indeed, big data come with the data quality and data governance challenges of ‘small’ data along with new challenges of its own! Existing ‘small’ data quality frameworks need to be extended, i.e. augmented! Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 14
  • 15. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 15
  • 16. Big data in 1939 Source: ‘The history of the Tube in pictures’ (goo.gl/dmJymR). Criticism of sceptics: these four Vs have always been there! But, what is new? Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 16
  • 17. ‘Data is part of Switzerland’s infrastructure, such as road, railways and power networks, and is of great value. The government and the economy are obliged to generate added value from these data.’ digitalswitzerland, November 22, 2016 Source: digitalswitzerland’s ‘Digital Manifesto for Switzerland’ (digitalswitzerland.com). The 5th V of big data: ‘Value’ , i.e. the ‘usefulness of data’. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 17
  • 18. Intermediate summary: the ‘five Vs’ of (big) data ‘Volume’, ‘Variety’ and ‘Velocity’ are the ‘essential’ characteristics of (big) data; ‘Veracity’ and ‘Value’ are the ‘qualification for use’ characteristics of (big) data. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 18
  • 19. ‘Data are not taken for museum purposes; they are taken as a basis for doing something. If nothing is to be done with the data, then there is no use in collecting any. The ultimate purpose of taking data is to provide a basis for action or a recommendation for action.’ W. Edwards Deming, 1942 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 19
  • 20. 2. Demystifying the two approaches of ‘learning from data’ Data science, statistics and their connection • The demand for ‘data scientists’ — the ‘magicians of the big data era’ — is unprecedented in sectors where value, competitiveness and efficiency are data-driven. The term ‘data science’ was originally coined in 1997 by a statistician. Data science — a rebranding of ‘data mining’ — is the non-trivial process of identifying valid (that is, the patterns hold in general, i.e. being valid on new data in the face of uncertainty), novel, potentially useful and ultimately understandable patterns or structures or models or trends or relationships in data to enable ‘data-driven decision making’. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 20
  • 21. • Is data science ‘statistical d´ej`a vu’? But, what is ‘statistics’? Statistics is the science of ‘learning from data’ (or of making sense out of data), and of measuring, controlling and communicating uncertainty. It is a process that includes everything from planning for the collection of data and subsequent data management to end-of-the-line activities such as drawing conclusions of data and presentation of results. Uncertainty is measured in units of probability, which is the currency (or grammar) of statistics. Statistics is concerned with the study of data-driven decision making in the face of uncertainty. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 21
  • 22. What distinguishes data science from statistics? • Statistics traditionally is concerned with analysing primary (e.g. experimental or ‘made’ or ‘designed’) data that have been collected to explain and check the validity of specific existing ‘ideas’, i.e. through operationalisation of theoretical concepts. Primary data analytics or top-down (explanatory and confirmatory) analytics. ‘Idea (theory) evaluation or testing’ . ‘Deductive reasoning’ as ‘idea (theory) first’. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 22
  • 23. • Data science, on the other hand, typically is concerned with analysing secondary (e.g. observational or ‘found’ or ‘organic’) data that have been collected (and designed) for other reasons (and often not ‘under control’ of the investigator) to create new ideas (‘theories’). Secondary data analytics or bottom-up (exploratory and predictive) analytics. ‘Idea (theory) generation’ . ‘Inductive reasoning’ as ‘data first’. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 23
  • 24. ‘We see the data analyst’s insistence on ‘letting the data speak to us’ by plots and displays as an instinctive understanding of the need to encourage and to stimulate the pattern recognition and model generating capability of the right brain. Also, it expresses his concern that we not allow our pushy deductive left brain to take over too quickly and perhaps forcibly produce unwarranted conclusions based on an inadequate model.’ George E. P. Box, 1984 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 24
  • 25. • The two approaches of analytics, i.e. ‘learning from data’, are complementary and should proceed side by side — in order to enable proper data-driven decision making and to enable continuous improvement. Source: Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 25
  • 26. ‘Neither exploratory nor confirmatory is sufficient alone. To try to replace either by the other is madness. We need them both.’ John W. Tukey, 1980 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 26
  • 27. 3. Demystifying ‘visual analytics’ Visual analytics is the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition. Visual analytics is also known as exploratory or visual data mining or visual exploration ( ‘exploratory data analysis’). The goal is simply to explore the data without any clear ideas of what we are looking for ( discovering the unexpected by inductive reasoning). A good enough description of a behaviour will often suggest an explanation for it as well. At the very least, a good description suggests where to start looking for an explanation by deductive reasoning. Visual analytics is an example of so-called ‘unsupervised learning’ from data. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 27
  • 28. ‘The greatest value of a picture is when it forces us to notice what we never expected to see.’ John W. Tukey, 1977 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 28
  • 29. The importance of data visualisation (Anscombe, 1973) Anscombe’s quartet : four data sets that have same mean, same variance, same correlation and same ‘simple linear regression line’. They seem to be very similar! But the visual impression of each of the scatterplots is very different! Indeed: Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 29
  • 30. Only data set ‘I’ (top left) is well represented by a ‘simple linear regression line’. Always look at a scatterplot of the data before blindly computing a numerical summary like a correlation coefficient! Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 30
  • 31. Alberto Cairo’s Datasaurus (goo.gl/NQLvvD) Source: ‘Same Stats, Different Graphs’ (autodeskresearch.com/publications/samestats). Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 31
  • 32. ‘A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.’ Francis J. Anscombe, 1973 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 32
  • 33. 4. Demystifying ‘exploratory data analysis’ • Exploratory data analysis (introduced by John W. Tukey in the 1970s) is an inductive approach to data analysis (i.e. ‘data first’) that emphasises the use of informal graphical procedures not based on prior assumptions about the structure of the data or on formal models for the data. ‘Unsupervised learning’ from data. • The essence of this approach is that, broadly speaking, data are assumed to possess the following structure: data = smooth + rough, where the ‘smooth’ is the underlying regularity or pattern in the data. The objective of the exploratory approach is to separate the ‘smooth’ from the ‘rough’ with minimal use of formal mathematics or statistical methods. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 33
  • 34. • Exploratory data analysis is an approach and philosophy for data analysis that employs a variety of techniques to, for example, – maximise insight into data by exploring them; – see overall patterns and detailed behaviour; – uncover underlying structure; – extract ‘important’ variables; – detect ‘outliers’ and ‘anomalies’; – test underlying ‘assumptions’; and – develop ‘parsimonious’ models. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 34
  • 35. ‘No catalog of techniques can convey a willingness to look for what can be seen, whether or not anticipated. Yet this is at the heart of exploratory data analysis. The graph paper — and transparencies — are there, not as a technique, but rather as a recognition that the picture-examining eye is the best finder we have of the wholly unanticipated.’ John W. Tukey, 1980 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 35
  • 36. 5. ‘Visual thinking’ goals and principles Visual representations of data • Visual representations of data are used primarily for two purposes: to search for meaningful patterns and then examine them once they are found to gain understanding ( graphics for exploration); to communicate meaningful findings to others ( graphics for presentation). • The goal of visualisation is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to, for example, see patterns, spot trends and identify outliers. Well-designed visual representations can replace cognitive calculations with simple perceptual ‘inferences’ and improve comprehension, memory and decision making. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 36
  • 37. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 37
  • 38. ‘Visual thinking tools are especially important because they harness the visual pattern finding part of the brain. Almost half the brain is devoted to the visual sense and the visual brain is exquisitely capable of interpreting graphical patterns, even scribbles, in many different ways. Often, to see a pattern is to understand the solution to a problem.’ Colin Ware, 2008 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 38
  • 39. Visual analytics in 1854: John Snow & the cholera epidemic • Map drawn by the epidemiologist John Snow in 1854 to show a suspected cause-effect relationship between deaths from the cholera epidemic in London during September 1854 and locations of water pump-wells: The affected well is clearly identified by the concentration of cases in its vicinity. A good data representation is the key of solving the problem. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 39
  • 40. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 40
  • 41. ‘For close upon 100 years we have been free in this country from epidemic cholera, and it is a freedom which, basically, we owe to the logical thinking, acute observations and simple sums of John Snow.’ Austin B. Hill, 1955 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 41
  • 42. Visual analytics in 1861: Charles J. Minard & Napoleon’s march to Moscow Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 42
  • 43. ‘Minard’s graphic tells a rich, coherent story with its multivariate data, far more enlightening than just a single number bouncing along over time. It may well be the best statistical graphic ever drawn .’ Edward R. Tufte, 2001 ‘The chart gives a fantastic summary of Napoleon’s campaign. It is possible that Tolstoy did the same thing more artistically in ‘War and Peace’, but he needed a thousand pages to do so.’ Anders Wallgren et al., 1996 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 43
  • 44. ‘Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.’ Edward R. Tufte Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 44
  • 45. Visual perception Do you perceive the following figure correctly? Source: Heuer (1999, Chapter 2). Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 45
  • 46. The article is written twice in each of the three phases. This is commonly overlooked because visual perception is influenced by our expectations about how these familiar phrases are normally written. This simple experiment demonstrates one of the most fundamental principles concerning visual perception: we tend to perceive what we expect to perceive . Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 46
  • 47. ‘It takes more information, and more unambiguous information, to recognise an unexpected phenomenon than an expected one.’ Richards J. Heuer, 1999 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 47
  • 48. Visual exploration paradigm • Visual data exploration usually follows a three step process: overview first, zoom and filter ( ‘drill-down’), and then details on demand. This process is called the ‘Information Seeking Mantra’ (Shneiderman, 1996). • Analytical navigation is often cyclical in nature. We do not just start with the overview, zoom and filter, then access details on demand, and end the process there. An examination of the details often prompts us to expand the picture to an overview once again. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 48
  • 49. Asia at night Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 49
  • 50. South and North Korea at night Notice Seoul, South Korea, and how dark it is in North Korea. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 50
  • 51. NASA image taken from the ISS on January 30, 2014 Source: earthobservatory.nasa.gov/IOTD/view.php?id=83182 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 51
  • 52. ‘The only limits to exploratory data analysis are those imposed by time constraints and the creativity of the data analyst.’ Stephan Morgenthaler, 2009 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 52
  • 53. Goals involving the visual display of quantitative information • Gelman and Unwin (2013) propose the following six goals — the first three ( ) being discovery goals and the second three (◦) being communication goals : give an overview; convey the sense of the scale and the complexity of the data; explore the data set using flexible displays to discover unexpected aspects of the data; ◦ communicate information (to self and others); ◦ tell a story (see Minard’s map of Napoleon’s Russian campaign); ◦ attract attention and stimulate interest. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 53
  • 54. ‘Each of these goals can be important, but it would be meaningless to try to achieve all of them at once. ... discovery ... and communication ... can go together — we want to communicate our discoveries. ... To put it simply, we communicate when we display a convincing pattern, and we discover when we observe deviations from our expectations.’ Andrew Gelman and Antony Unwin, 2013 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 54
  • 55. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 55
  • 56. ‘Dichtigkeit der Bev¨olkerung im Jahre 1888’ (goo.gl/Q51yvU) Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 56
  • 57. ‘Dichtigkeit des Rindviehbestandes im Jahre 1911’ (goo.gl/q1G9Rt) Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 57
  • 58. ‘150 Jahre schweiz. Bundesstaat im Lichte der Statistik’ (goo.gl/2Xm69Q) Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 58
  • 59. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 59
  • 60. ‘Graphs are indispensable in data analysis, because the human visual system is so good at recognising patterns that the unexpected can leap out and hit the investigator between the eyes.’ Anthony C. Davison, 2003 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 60
  • 61. 6. Conclusion and opportunities • Graphics are worth more than a thousand words; they are priceless! However, a ‘poor’ graphic is worse than no graphic at all. • ‘Better’ graphics will promote understanding of the data and of the process from which they came. As such, they will focus on the data and will ‘let the data speak to us’ ( avoid ‘chartjunk’!). Graphics must show data variation, not design variation! • Despite an awful lot of marketing hype, big data are here to stay — as well as the ‘Internet of Things’ (IoT; a term coined in 1999!) — and will impact every single domain! The future of (visual) analytics is limitless! Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 61
  • 62. ‘The job is to learn from data with the help of mathematical tools. This process of fully understanding data can only be obtained through visual aid.’ Egon S. Pearson, 1956 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 62
  • 63. • The key elements for a successful (big) data analytics, data science and visual analytics future are statistical principles and rigour of humans! • Statistics, (big) data analytics, data science and visual analytics are aids to thinking and not replacements for it! They should be envisaged to complement and augment humans, not replacements for it! Nowadays, with the digital transformation and the related data revolution, humans need to augment their strengths to become more ‘powerful’: by automating any routinisable work and by focusing on their core competences. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 63
  • 64. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 64
  • 65. ‘As yet I know of no person or group that is taking nearly adequate advantage of the graphical potentialities of the computer. ... In exploration they [computer-drawn graphs] are going to be the data analyst’s greatest single resource.’ John W. Tukey, 1965 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 65
  • 66. ‘The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.’ John W. Tukey, 1986 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 66
  • 67. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 67
  • 68. ‘To properly analyze data, we must first understand the process that produced the data. While many ... take the view that data are innocent until proven guilty, ... it is more prudent to take the opposite approach, that data are guilty until proven innocent.’ Roger W. Hoerl, Ronald D. Snee and Richard D. De Veaux, 2014 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 68
  • 69. Given the underlying data veracity, still something to aspire to? Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 69
  • 70. Something else to aspire to? Source: Cook, D., Lee, E.-K. & Majumder, M. (2016; Video 1 at goo.gl/YbtG7o). Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 70
  • 71. ‘Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone — as the first step.’ John W. Tukey, 1977 Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 71
  • 72. Copyright c 2001–2017, Statoo Consulting, Switzerland. All rights reserved. 72
  • 73. Have you been Statooed? Prof. Dr. Diego Kuonen, CStat PStat CSci Statoo Consulting Morgenstrasse 129 3018 Berne Switzerland email kuonen@statoo.com @DiegoKuonen web www.statoo.info
  • 74. Selective list of references • Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21. • Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799. • Box, G. E. P. (1984). The importance of practice in the development of statistics. Technometrics, 26, 1–8. • Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D. F. & Wickham, H. (2009). Statistical inference for exploratory data analysis and model diagnostics. Philosophical Transactions of The Royal Society, A, 367, 4361–4383. • Cairo, A. (2013). The Functional Art — An Introduction to Information Graphics and Visualization. Berkeley, CA: New Riders. • Cairo, A. (2016). The Truthful Art: Data, Charts, and Maps for Communication. Berkeley, CA: New Riders. • Cleveland, W. S. (1993). Visualizing Data. Summit, NJ: Hobart Press. • Cleveland, W. S. (1994). The Elements of Graphing Data. Summit, NJ: Hobart Press. • Cook, D., Lee, E.-K. & Majumder, M. (2016). Data visualization and statistical graphics in big data analysis. Annual Review of Statistics and Its Application, 3, 133–159. • Cook. D. & Swayne, D. F. (2007) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. New York: Springer. • Gelman, A. & Unwin, A. (2013). Infovis and statistical graphics: different goals, different looks (with Discussion). Journal of Computational and Graphical Statistics, 22, 2–49. • Heuer, R. J. (1999). Psychology of Intelligence Analysis. Center for the Study of Intelligence, U.S. Central Intelligence Agency. • Hoerl, R. W., Snee, R. D. & De Veaux, R. D. (2014). Applying statistical thinking to ‘big data’ problems. Wiley Interdisciplinary Reviews: Computational Statistics, 6, 222–232. • Kirk, A., Timms, S., Rininsland, Æ. & Teller, S. (2016). Data Visualization: Representing Information on Modern Web. Birmingham, UK: Packt Publishing. • Shneiderman, B. (1996). The eyes have it: a task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages. Los Alamos, CA: IEEE, 336–343.
  • 75. • Theus, M. & Urbanek, S. (2009). Interactive Graphics for Data Analysis: Principles and Examples. Boca Raton, FL: Chapman & Hall/CRC. • Thomas, J. J. & Cook, K. A., Eds. (2005). Illuminating the Path — The Research and Development Agenda for Visual Analytics. U.S. National Visualization and Analytics Center. • Tufte, E. R. (1990). Envisioning Information. Cheshire, CT: Graphics Press. • Tufte, E. R. (1997). Visual Explanations. Cheshire, CT: Graphics Press. • Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd edition). Cheshire, CT: Graphics Press. • Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics, 33, 1–67. • Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MT: Addison-Wesley. • Tukey, J. W. (1980). We need both exploratory and confirmatory. The American Statistician, 34, 23–25. • Tukey, J. W. & Wilk, M. B. (1966). Data analysis and statistics: an expository overview. In The Collected Works of John W. Tukey, L. V. Jones (Ed.). Monterey, CA: Wadsworth & Brooks/Cole, 549–578. • Unwin, A., Theus, M. & Hofmann, H. (2006). Graphics of Large Datasets: Visualizing a Million. New York: Springer. • Wallgren, A., Wallgren, B., Persson, R., Jorner, U., & and Haaland, J. (1996). Graphing Statistics & Data. London: Sage Publications. • Ware, C. (2008). Visual Thinking for Design. Morgan Kaufmann Publishers.
  • 76. Copyright c 2001–2017 by Statoo Consulting, Switzerland. All rights reserved. No part of this presentation may be reprinted, reproduced, stored in, or introduced into a retrieval system or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording, scanning or otherwise), without the prior written permission of Statoo Consulting, Switzerland. Warranty: none. Trademarks: Statoo is a registered trademark of Statoo Consulting, Switzerland. Other product names, company names, marks, logos and symbols referenced herein may be trademarks or registered trademarks of their respective owners. Presentation code: ‘Dataviz.HKB.BFH.2017’. Typesetting: LATEX, version 2 . PDF producer: pdfTEX, version 3.141592-1.40.3-2.2 (Web2C 7.5.6). Compilation date: 07.07.2017.