Enhancing Social Science Models with Interactive Visualization

Enhancing a Social Science
Model-building Workflow with
Interactive Visualisation
Cagatay Turkay, Aidan Slingsby,
Kaisa Lahtinen, Sarah Butt and Jason Dykes
giCentre & Centre for Comparative Social Surveys at City University London
ESANN 2016, 29 April 2016

“We (social scientists) need (data-based)
models that we can understand and
explain so that we can defend them to
our peers in full confidence.”
A quote that motivates this work (from collaborators within our AddResponse project)
Image from: Lahtinen, K. et al. (2015). Informing
Non-Response Bias Model Creation in Social
Surveys with Visualisation. Poster VIS 2015

Numerical models to predict phenomena or act as a
simulation of the phenomena being investigated
Good predictive power is often desired in models, BUT (in
some fields) explanatory power is also crucial (see Shmueli, 2010 [*] for a detailed
discussion)
[*] Shmueli, Galit. "To explain or to predict?" Statistical Science (2010): 289-310.

AddResponse Project -- https://blogs.city.ac.uk/addresponse/
… utilise organically generated auxiliary data (from commercial
transactions, public administration and other sources) to understand propensity
to respond and eventually tackle nonresponse bias (i.e.,
respondents differ from nonrespondents).

AddResponse - Details
• European Social Survey (ESS) UK 2012-13
• 4,520 households
• linked to auxiliary data from:
• administrative sources
• commercial consumer profiling
• open-source data
• 401 auxiliary variables
• 32 survey response variables
(only for the respondents)
e.g., Proportion of house-sharing adults
e.g., Sports facilities within walking distance

Existing workflow
• Iteratively add and/or remove variables from a
logistic regression model
• Assess the changes through model fitness metrics
(e.g., AIC, McFadden's pseudo-R²)
• Put up a sticker!
• Highly manual but involved!
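
The two fitness metrics named above can be computed directly from model log-likelihoods. The following is a minimal sketch, not the project's own scripts: the outcome vector, the fitted probabilities and the parameter counts are made-up values for illustration, and the function names are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(loglik, n_params):
    """Akaike information criterion, 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik

def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R^2: 1 - ln L(model) / ln L(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null

# Toy data (made up): 1 = household responded, 0 = did not respond.
y = [1, 1, 1, 0, 0, 1, 0, 1]
# Fitted probabilities from some candidate model (made up for illustration).
p_model = [0.8, 0.7, 0.9, 0.3, 0.2, 0.6, 0.4, 0.7]
# The intercept-only model predicts the overall response rate for everyone.
rate = sum(y) / len(y)
p_null = [rate] * len(y)

ll_model = log_likelihood(y, p_model)
ll_null = log_likelihood(y, p_null)
model_aic = aic(ll_model, n_params=3)  # assumed: intercept + 2 variables
null_aic = aic(ll_null, n_params=1)    # intercept only
pseudo_r2 = mcfadden_r2(ll_model, ll_null)
```

Comparing `model_aic` with the AIC of the previous candidate is the check that each add/remove step in the manual workflow relies on.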

Key roles for interactive visualisation
• Incorporating Theory
• Exploring variables
• Interactively building models
• Considering Geography
• Recording the model-building process, i.e., provenance
Prototypes: VarXplorer and ModelBuilder

Prototype-1: VarXplorer
- Co-variation plot
- Correlations with indicators
- Theory-related meta-data
- Interactive modelling

Link to the Video: http://goo.gl/XNiOIX

Exploring variables – 1: Investigate Covariation
- Compute pairwise correlations between all
401 variables
- Use this as a distance matrix and
project to 2D (using MDS)
- Visualise on a scatterplot where each
point is a variable
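
The three steps above can be sketched as follows. The slides do not say how correlations become distances or which MDS variant is used, so this sketch assumes a distance of 1 - |r| and classical (Torgerson) MDS; the data are synthetic and the function name is hypothetical.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: place points so that their Euclidean
    distances approximate the entries of the distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:n_components]     # largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against negatives
    return eigvecs[:, top] * scale

# Synthetic stand-in for the auxiliary data: rows are households,
# columns are variables (5 here instead of 401).
rng = np.random.default_rng(0)
n_households = 500
base_a = rng.normal(size=n_households)
base_b = rng.normal(size=n_households)
data = np.column_stack([
    base_a,
    base_a + 0.1 * rng.normal(size=n_households),  # near-duplicate of column 0
    base_b,
    base_b + 0.1 * rng.normal(size=n_households),  # near-duplicate of column 2
    rng.normal(size=n_households),                 # unrelated variable
])

corr = np.corrcoef(data, rowvar=False)  # pairwise correlations between variables
dist = 1.0 - np.abs(corr)               # assumption: strong correlation = close
coords = classical_mds(dist)            # one 2D point per variable
```

Each row of `coords` is the scatterplot position of one variable, so strongly co-varying variables end up as nearby points.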

Exploring variables – 2: Correlation with indicators
- Compute correlations of each variable with all 32
response variables + response rate
- Use this as meta-data on variables to
check whether they relate to indicators
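
A minimal sketch of how such correlation meta-data could be built and queried, assuming Pearson correlation and a cut-off of 0.5 (neither is stated in the slides). All variable names and values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Made-up values for six respondents (hypothetical variable names).
auxiliary = {
    "aux_a": [1, 2, 3, 4, 5, 6],
    "aux_b": [6, 5, 4, 3, 2, 1],
    "aux_c": [1, -1, 1, -1, 1, -1],
}
indicators = {"indicator_x": [2, 4, 6, 8, 10, 12]}

# Meta-data table: one correlation per (auxiliary variable, indicator) pair.
meta = {
    var: {ind: pearson(values, scores) for ind, scores in indicators.items()}
    for var, values in auxiliary.items()
}

def related_to(indicator, threshold=0.5):
    """Auxiliary variables whose absolute correlation reaches the threshold."""
    return sorted(v for v, row in meta.items() if abs(row[indicator]) >= threshold)

related = related_to("indicator_x")
```

Using the absolute value keeps strongly negative relationships in view as well as positive ones.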

Incorporating Theory-related data
- Associate variables to social-science
concepts and theory
- Concepts relate to theories
- Variables act as proxies for concepts
- Use these as meta-data on variables
and visualise through histograms
Concepts, e.g., deprivation or quality of life
Theories, e.g., social isolation or social disorganisation
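
One way to hold this meta-data is a pair of lookup tables, from variables to concepts and from concepts to theories, with counts feeding the histograms. The sketch below is only an illustration: the variable names and every mapping in it are hypothetical and do not reflect the project's actual coding of variables.

```python
from collections import Counter

# Hypothetical mappings, for illustration only.
variable_concepts = {
    "aux_a": ["deprivation"],
    "aux_b": ["quality of life"],
    "aux_c": ["deprivation", "quality of life"],
}
concept_theories = {
    "deprivation": ["social disorganisation"],
    "quality of life": ["social isolation"],
}

def concept_counts(selected):
    """How many of the selected variables act as a proxy for each concept."""
    return Counter(c for var in selected for c in variable_concepts[var])

def theory_counts(selected):
    """How many selected variables reach each theory through any concept.
    A variable is counted at most once per theory."""
    counts = Counter()
    for var in selected:
        theories = {t for c in variable_concepts[var] for t in concept_theories[c]}
        counts.update(theories)
    return counts

selection = ["aux_a", "aux_c"]
by_concept = concept_counts(selection)
by_theory = theory_counts(selection)
```

The two counters hold the bar heights for a concept histogram and a theory histogram of the current selection.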

Prototype-2: ModelBuilder
- Variable selection
- Model provenance
- Interactive modelling (through R)
- Model quality metrics

Prototype-2: ModelBuilder
Link to the Video: http://goo.gl/itUlm2

Interactively building models & evaluating them
- R scripts are called with the variable
selections and the variable to predict
(response or ESS variable)
- Quality metrics (AIC, McFadden) &
variable weights visualised
Interactive model building with variable weights
is also available in VarXplorer
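
The slides do not show how the R scripts are invoked. As a rough sketch under that caveat, a front end could turn the current selection into an R model formula and pass it to a script through the `Rscript` command-line tool. The script name `fit_model.R`, the variable names and the helper functions are all hypothetical, and variable names are assumed to be valid R identifiers.

```python
def build_r_formula(target, predictors):
    """Return an R model formula such as 'responded ~ aux_a + aux_b'."""
    if not predictors:
        return f"{target} ~ 1"  # intercept-only model
    return f"{target} ~ " + " + ".join(predictors)

def build_rscript_command(script_path, target, predictors):
    """Argument list that runs an R script with the formula as its only argument."""
    return ["Rscript", script_path, build_r_formula(target, predictors)]

# Hypothetical script and variable names.
command = build_rscript_command("fit_model.R", "responded", ["aux_a", "aux_b"])
# The list could then be passed to subprocess.run(command, capture_output=True,
# text=True), with the script printing its quality metrics for the front end.
```

Passing the formula as a single list element avoids shell quoting problems with the spaces and the `~` character.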

Considering Geography
- Facet data (geographically) into 12 regions
- Build local models
- Evaluate locally
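
The facet, build and evaluate steps can be sketched as below. To stay short, the "local model" here is only a stand-in, the regional response rate (an intercept-only model), scored by mean log-likelihood; the project's local models are logistic regressions. The region labels and rows are invented.

```python
import math
from collections import defaultdict

def facet_by_region(rows):
    """Group household rows by their region label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["region"]].append(row)
    return dict(groups)

def fit_response_rate(rows):
    """Stand-in local model: the regional response rate (intercept only)."""
    return sum(r["responded"] for r in rows) / len(rows)

def mean_log_likelihood(rows, p):
    """Average Bernoulli log-likelihood of the rows under probability p."""
    total = sum(math.log(p) if r["responded"] == 1 else math.log(1.0 - p)
                for r in rows)
    return total / len(rows)

# Made-up rows: two regions instead of twelve.
rows = [
    {"region": "region_1", "responded": 1},
    {"region": "region_1", "responded": 1},
    {"region": "region_1", "responded": 0},
    {"region": "region_1", "responded": 1},
    {"region": "region_2", "responded": 0},
    {"region": "region_2", "responded": 1},
    {"region": "region_2", "responded": 0},
    {"region": "region_2", "responded": 0},
]

facets = facet_by_region(rows)
local_models = {region: fit_response_rate(group) for region, group in facets.items()}
local_scores = {region: mean_log_likelihood(facets[region], local_models[region])
                for region in facets}
```

Swapping `fit_response_rate` for a real model-fitting function keeps the same facet-then-fit-then-score structure.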

Model provenance & annotations
- Save and analyse the model-building
trail
- Mark dead-ends and good models
- Attach notes to models
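
A model-building trail of this kind can be represented as a tree of model records, each pointing at the model it was derived from. The sketch below is one possible data structure, not the ModelBuilder implementation: the class names, status labels, variable names and AIC values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class ModelRecord:
    model_id: int
    parent_id: Optional[int]       # model this one was derived from, if any
    variables: Tuple[str, ...]
    metrics: Dict[str, float]
    status: str = "unmarked"       # "unmarked", "good" or "dead-end"
    notes: List[str] = field(default_factory=list)

class ModelTrail:
    """Stores every model tried, so the trail can be saved and analysed."""

    STATUSES = ("unmarked", "good", "dead-end")

    def __init__(self):
        self.records: Dict[int, ModelRecord] = {}

    def add(self, variables, metrics, parent_id=None):
        model_id = len(self.records) + 1
        self.records[model_id] = ModelRecord(
            model_id, parent_id, tuple(variables), dict(metrics))
        return model_id

    def mark(self, model_id, status):
        if status not in self.STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.records[model_id].status = status

    def annotate(self, model_id, note):
        self.records[model_id].notes.append(note)

    def path_to(self, model_id):
        """Model ids from the first model in the trail down to model_id."""
        path = []
        current = model_id
        while current is not None:
            path.append(current)
            current = self.records[current].parent_id
        return list(reversed(path))

# Made-up trail: one starting model and two variations of it.
trail = ModelTrail()
first = trail.add(["aux_a"], {"aic": 120.0})
second = trail.add(["aux_a", "aux_b"], {"aic": 112.5}, parent_id=first)
third = trail.add(["aux_a", "aux_c"], {"aic": 125.1}, parent_id=first)
trail.mark(second, "good")
trail.mark(third, "dead-end")
trail.annotate(third, "Adding aux_c made the fit worse.")
```

Keeping the parent link on every record is what lets a dead end be traced back to the step where it branched off.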

A brief example of the modelling process
1. Select two concepts: economic circumstances and quality of life

2. Select variables that are distinct and relevant

3. Select variables that correlate with an ESS indicator (happiness)
3.1 Observe that they relate to “Social Isolation”

4. Use these variables as a starting point, check local
variations and plug into existing scripts
4.1 Model performs “better” in South-East UK and in Greater London

Lessons learned
• Enhanced analysis through informed use of computation
• Interactive visual methods improve reliability and
interpretability
• Improved trust in models
• Tight integration enables quick hypothesis prototyping
• Important to communicate the certainty of the findings

Looking into the future
• Explanatory models, not only predictive models
• Incorporating more complex methods (already
incorporated random forests)
• Other ways to make models more accessible?
• Use models & findings as scientific evidence?

Acknowledgments
• giCentre team @ City
• ADDResponse project funded by the UK Economic
and Social Research Council (grant ES/L013118/1)

Thank you!
Cagatay.Turkay.1@city.ac.uk
@cagatay_turkay
http://staff.city.ac.uk/cagatay.turkay.1/
https://blogs.city.ac.uk/addresponse/
http://www.gicentre.net/
!! We are hiring !!
* Researcher in visualisation of cyber-security data
(H2020 funded RIA)
* PhD studentships
Deadlines in late May and June
check giCentre.net
