Slides for my talk on our paper "Enhancing a Social Science Model-building Workflow with Interactive Visualisation" by Turkay, C., Slingsby, A., Lahtinen, K., Butt, S., & Dykes, J., presented at ESANN 2016 in Brugge in April 2016. The talk details our collaborative work as a team of social scientists and visualisation researchers investigating novel ways to improve the model-building process through interactive approaches. The related publication can be found at: http://openaccess.city.ac.uk/14232/
Enhancing Social Science Models with Interactive Visualization
1. Enhancing a Social Science
Model-building Workflow with
Interactive Visualisation
Cagatay Turkay, Aidan Slingsby,
Kaisa Lahtinen, Sarah Butt and Jason Dykes
giCentre & Centre for Comparative Social Surveys at City University London
ESANN 2016, 29 April 2016
2. “We (social scientists) need (data-based)
models that we can understand and
explain so that we can defend them to
our peers in full confidence.”
A quote that motivates this work (from collaborators within our AddResponse project)
Image from: Lahtinen, K. et al. (2015). Informing
Non-Response Bias Model Creation in Social
Surveys with Visualisation. Poster VIS 2015
3. Numerical models predict phenomena or act as a
simulation of the phenomena being investigated
Good predictive power is often desired in models, BUT (in
some fields) explanatory power is also crucial (see Shmueli, 2010 [*] for a detailed
discussion)
[*] Shmueli, Galit. "To explain or to predict?" Statistical Science (2010): 289-310.
4.
5. AddResponse Project -- https://blogs.city.ac.uk/addresponse/
… utilise organically generated auxiliary data (from commercial
transactions, public administration and other sources) to understand propensity
to respond and eventually tackle nonresponse bias (i.e., the bias that
arises when respondents differ from nonrespondents).
6. AddResponse - Details
• European Social Survey (ESS) UK 2012-13
• 4,520 households
• linked to auxiliary data from:
• administrative sources
• commercial consumer profiling
• open-source data
• 401 auxiliary variables
• 32 survey response variables
(only for the respondents)
e.g., Proportion of house-sharing adults
e.g., Sports facilities within walking distance
7.
8. Existing workflow
• Iteratively adding and/or removing variables from a
logistic regression model
• Assess the changes through model fitness metrics
(e.g., AIC, McFadden)
• Put up a sticker!
• Highly manual but involved!
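
The two fitness metrics named on this slide can be computed directly from a model's predicted probabilities. The following is a minimal sketch in Python, not the project's own scripts: the function names and the toy data are illustrative assumptions, and "McFadden" is read here as McFadden's pseudo-R².

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))

def aic(loglik, n_params):
    """Akaike information criterion, 2k - 2 ln L; lower is better."""
    return 2 * n_params - 2 * loglik

def mcfadden_r2(y, p):
    """McFadden's pseudo-R^2: 1 - ln L(model) / ln L(intercept-only model)."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null

# Toy data (made up): 1 = household responded, 0 = did not respond.
y = [1, 0, 1, 1, 0, 0, 1, 0]
p = [0.8, 0.3, 0.7, 0.6, 0.4, 0.2, 0.9, 0.35]  # hypothetical fitted probabilities

ll = log_likelihood(y, p)
model_aic = aic(ll, n_params=3)  # e.g. intercept plus two variables
model_r2 = mcfadden_r2(y, p)
```

Adding a variable always raises the log-likelihood a little, so AIC's penalty of 2 per parameter is what makes it usable for comparing models of different sizes.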
9. Key roles for interactive visualisation
• Incorporating Theory
• Exploring variables
• Interactively building models
• Considering Geography
• Recording the model-building process, i.e., provenance
VarXplorer ModelBuilder
12. Exploring variables – 1: Investigate Covariation
- Compute pairwise correlations between all
401 variables
- Use this as a distance matrix and
project to 2D (using MDS)
- Visualise on a scatterplot where each
point is a variable
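
The three steps above can be sketched as follows. This is an illustration under assumptions rather than the VarXplorer implementation: the slide does not say how correlations become distances, so 1 - |r| is assumed here, classical (Torgerson) MDS stands in for whichever MDS variant was used, and the data are synthetic.

```python
import numpy as np

def correlation_distance(data):
    """data: (n_households, n_variables). Returns variable-by-variable distances 1 - |r|.
    The 1 - |r| conversion is an assumption, not taken from the slides."""
    corr = np.corrcoef(data, rowvar=False)
    return 1.0 - np.abs(corr)

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate dist."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)             # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]            # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Synthetic stand-in for the variables: columns (0,1), (2,3), (4,5) are strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
data = np.column_stack([base[:, i // 2] + 0.1 * rng.normal(size=200) for i in range(6)])

coords = classical_mds(correlation_distance(data))  # one 2D point per variable
```

In the resulting scatterplot, variables that carry similar information sit close together, which makes redundant candidates easy to spot before they enter a model.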
13. Exploring variables – 2: Correlation with indicators
- Compute correlations of each variable with all 32
response variables + response rate
- Use this as meta-data on variables to
check whether they relate to indicators
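
A minimal sketch of this step, assuming Pearson correlation and complete data for the households being compared; the function name, array shapes and synthetic data are illustrative assumptions, not the project's code.

```python
import numpy as np

def cross_correlation(aux, indicators):
    """Pearson correlation of every auxiliary column with every indicator column.
    aux: (n, a) array, indicators: (n, k) array. Returns an (a, k) array."""
    aux_z = (aux - aux.mean(axis=0)) / aux.std(axis=0)
    ind_z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
    return aux_z.T @ ind_z / aux.shape[0]

# Synthetic stand-in: 3 auxiliary variables, 1 indicator driven by the first variable.
rng = np.random.default_rng(1)
aux = rng.normal(size=(500, 3))
indicators = (aux[:, 0] + 0.3 * rng.normal(size=500)).reshape(-1, 1)

meta = cross_correlation(aux, indicators)  # row i: correlations of variable i with each indicator
```

Each row of `meta` can then be attached to its variable as meta-data and used to filter or colour the points in the variable scatterplot.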
14. Incorporating Theory-related data
- Associate variables to social-science
concepts and theory
- Concepts relate to theories
- Variables act as proxies for concepts
- Use these as meta-data on variables
and visualise through histograms
Concepts, e.g.,
deprivation or quality
of life
Theories, e.g., social
isolation or social
disorganisation
17. Interactively building models & evaluating them
- R scripts are called with the variable
selections and the variable to predict
(response or ESS variable)
- Quality metrics (AIC, McFadden) &
variable weights visualised
Interactive model building
also in VarXplorer
with variable weights
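
The talk's tools call R scripts for this step. As a rough Python illustration of what such a call does with a variable selection, the sketch below fits a logistic regression by Newton's method and returns the variable weights and AIC. All names (`fit_logistic`, `evaluate_selection`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton's method.
    X: (n, d) selected variables (an intercept column is added here); y: (n,) in {0, 1}.
    Returns (weights including intercept, log-likelihood)."""
    Xb = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        gradient = Xb.T @ (y - p)
        hessian = Xb.T @ (Xb * (p * (1.0 - p))[:, None])
        w = w + np.linalg.solve(hessian, gradient)
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return w, loglik

def evaluate_selection(data, y, selected):
    """Fit a model on the selected columns and report its variable weights and AIC."""
    w, loglik = fit_logistic(data[:, selected], y)
    return {"selected": selected, "weights": w[1:], "aic": 2 * len(w) - 2 * loglik}

# Synthetic data: only column 0 actually influences the outcome.
rng = np.random.default_rng(2)
data = rng.normal(size=(400, 4))
p_true = 1.0 / (1.0 + np.exp(-1.5 * data[:, 0]))
y = (rng.random(400) < p_true).astype(float)

informative = evaluate_selection(data, y, [0])
uninformative = evaluate_selection(data, y, [3])
```

Comparing the two results side by side mirrors the interactive loop: change the selection, refit, and look at how the quality metric and the weights move.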
19. Model provenance & annotations
- Save and analyse the model-building
trail
- Mark dead-ends and good models
- Attach notes to models
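
One way to picture such a trail is as a list of model records, each pointing at the model it was derived from. The sketch below is a hypothetical data structure, not the one used in the tools; the variable names and AIC values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ModelRecord:
    """One step in the model-building trail."""
    variables: List[str]
    target: str
    aic: float
    parent: Optional[int] = None      # index of the model this one was derived from
    status: str = "open"              # "open", "dead-end" or "good"
    notes: List[str] = field(default_factory=list)

class Trail:
    def __init__(self):
        self.records: List[ModelRecord] = []

    def add(self, record: ModelRecord) -> int:
        """Store a model and return its index in the trail."""
        self.records.append(record)
        return len(self.records) - 1

    def annotate(self, index: int, note: str, status: Optional[str] = None) -> None:
        """Attach a note to a model and optionally mark it as a dead-end or good model."""
        self.records[index].notes.append(note)
        if status is not None:
            self.records[index].status = status

    def path_to(self, index: int) -> List[int]:
        """Indices from the first ancestor down to the given model."""
        path = []
        current: Optional[int] = index
        while current is not None:
            path.append(current)
            current = self.records[current].parent
        return path[::-1]

# Made-up example: two alternatives branching from the same starting model.
trail = Trail()
first = trail.add(ModelRecord(["var_a"], "response", aic=512.0))
second = trail.add(ModelRecord(["var_a", "var_b"], "response", aic=498.5, parent=first))
third = trail.add(ModelRecord(["var_a", "var_c"], "response", aic=515.2, parent=first))
trail.annotate(third, "var_c adds nothing over var_a", status="dead-end")
trail.annotate(second, "lowest AIC so far", status="good")
good_models = [i for i, r in enumerate(trail.records) if r.status == "good"]
```

Keeping the parent link makes it possible to replay how a good model was reached, and the notes make the reasoning behind each step defensible later.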
20. A brief example of the modelling process
1. Select two
concepts,
economic
circumstances and
quality of life
21. A brief example of the modelling process
2. Select variables
that are distinct
and relevant
22. A brief example of the modelling process
3. Select variables
that correlate
with an ESS
indicator
(happiness)
3.1 Observe that
they relate to
“Social Isolation”
23. A brief example of the modelling process
4. Use these variables as a
starting point, check local
variations and plug into
existing scripts
4.1 Model performs
“better” in South-East UK
and in Greater London
24. Lessons learned
• Enhanced analysis through informed use of computation
• Interactive visual methods improve reliability and
interpretability
• Improved trust in models
• Tight integration enables quick hypothesis prototyping
• Important to communicate the certainty of the findings
25. Looking into the future
• Explanatory models, not only predictive models
• Incorporating more complex methods (already
incorporated random forests)
• Other ways to make models more accessible?
• Use models & findings as scientific evidence?
26. Acknowledgments
• giCentre team @ City
• ADDResponse project funded by the UK Economic
and Social Research Council (grant ES/L013118/1)