a presentation explaining the what, how, and why of some features of science 2.0 (replication, registration, high power, bayesian statistics, estimation, the co-pilot multi-software approach, the distinction between confirmatory and exploratory analyses, and open science), using Steegen et al. (2014) as a running example.
3. Why can we definitively say that? Because psychology often does not meet the
five basic requirements for a field to be considered scientifically rigorous:
clearly defined terminology, quantifiability, highly controlled experimental
conditions, reproducibility and, finally, predictability and testability.
8. - mundane 'regular' misbehaviours present greater threats to the scientific
enterprise than those posed by high-profile misconduct cases such as
fraud.
- first assessment of questionable research practices (QRP)
- 2002 assessment: NIH-funded researchers
1768 mid-career respondents (52% response rate)
1479 early-career respondents (43% response rate)
10. - first assessment of QRP in psychology
- 2155 respondents (36% response rate)
12. the problems of QRP are widespread and have very severe
consequences
why is that the case?
“never attribute to malice what can be adequately explained
by incompetence”
the main reasons are a lack of guidelines and high
publication pressure
13. i’m not interested in fraud (e.g., diederik stapel, who made
up his own data)
preventing fraud requires a different approach
15. a new way of doing science that aims to increase the
confidence in research results
not one single, coherent whole
17. a demonstration of science 2.0 with a real study
reference:
Steegen, S., Dewitte, L., Tuerlinckx, F., & Vanpaemel, W.
(2014). Measuring the crowd within again: A pre-registered
replication study. Frontiers in Psychology, 5, 786, 1-8.
doi:10.3389/fpsyg.2014.00786
paper:
http://ppw.kuleuven.be/okp/_pdf/Steegen2014MTCWA.pdf
OSF page:
https://osf.io/ivfu6/
19. based on some recommendations on good research practices made in the
literature
• not exhaustive
• non-directive examples
• for inspiration
most recommendations can be implemented separately from each other
• not an all or none package deal
22. crowd within effect (vul & pashler, 2008)
• averaging multiple guesses from one
person provides a better estimate than
either guess alone
experiment
• 8 general knowledge questions
e.g., what percent of the world's roads
are in India?
• guess 1
guess 2
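a minimal sketch in R (simulated numbers, not the real data) of why averaging can help: if two guesses scatter independently around the truth, the noise partly cancels in the average; real guesses from one person are correlated, so the actual effect is smaller

# crowd within, simulated: two noisy guesses around a true value
set.seed(42)
truth <- 50
guess1 <- truth + rnorm(1000, sd = 10)  # independent noise: an idealization
guess2 <- truth + rnorm(1000, sd = 10)
avg <- (guess1 + guess2) / 2
mean(abs(guess1 - truth))               # mean error of a single guess
mean(abs(avg - truth))                  # mean error of the average: smaller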
23. 1. replication
2. registration
3. high power
4. bayesian statistics
5. alpha level
6. estimation
7. co-pilot multi-software approach
8. distinction between confirmatory and exploratory analyses
9. open science
what? how? why?
features of science 2.0
before data collection
after data collection/during data analysis
after data analysis
27. replication
how?
communicate with the original authors; ask for information and
feedback
ideal for a master's thesis
not much focus on creativity, but more on skill building
28. replication
why?
- lots of variability between studied phenomena
- lots of variability between labs/replications
- what can we learn from a single study?
39. registration
what?
we specified all research details before data
collection
data collection
• sample size planning (stopping rule; see
below)
• recruitment: how to recruit participants
(e.g., pool)
data analysis
• data cleaning plan (when to delete data)
• analysis plan
- which exact hypotheses to test
- which variables to use
- analyses for testing the hypotheses
• code for the analyses
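to make the analysis plan concrete, the analysis code can be written and tested on simulated data before any real data exist; a minimal sketch in R with hypothetical variable names (not the actual Steegen et al. script):

# pre-registered confirmatory analysis, written before data collection
run_confirmatory <- function(dat) {
  # planned test: is the error of guess 1 larger than the error of the average?
  t.test(dat$err_guess1, dat$err_avg, paired = TRUE, alternative = "greater")
}
# verify the code runs on placeholder data before collecting real data
sim <- data.frame(err_guess1 = rexp(60), err_avg = rexp(60))
run_confirmatory(sim)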
41. registration
what?
we specified all research details before data
collection
experimental details (optional)
• experimental materials
- stimuli (questions)
- exact instructions
• experimental procedure
- randomization etc
42. registration
how?
• Registered Report
- new format of publishing
- review prior to data collection
- accepted papers are then (almost)
guaranteed publication if the authors
follow through with the registered
methodology
AIMS Neuroscience; Attention,
Perception & Psychophysics; Cortex;
Drug and Alcohol Dependence;
Experimental Psychology; Frontiers in
Cognition; Perspectives on Psychological
Science; Social Psychology; …
50. registration
why?
prevent readers from thinking you might have exploited your
researcher degrees of freedom
extreme flexibility in
• data collection
- e.g., data peeking
• data analysis
- what is an outlier?
- when to add covariates?
- when to transform the data?
• reporting
- did you report all variables, conditions, experiments, analyses?
52. registration
why?
prevent readers from thinking you might have exploited your
researcher degrees of freedom
exploiting researcher degrees of freedom can lead to an increase in
false positives
-- without adjustment, a true null hypothesis will always be
rejected if sampling continues long enough
if you can convince readers that you didn’t exploit the researcher
degrees of freedom, they will put more confidence in your result; it
will be seen as more trustworthy
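a minimal sketch in R of the data-peeking point: even when the null hypothesis is true, testing after every batch and stopping at the first p < .05 pushes the false positive rate far above the nominal 5%

# optional stopping under a true null hypothesis
set.seed(123)
false_pos <- replicate(1000, {
  x <- numeric(0)
  hit <- FALSE
  while (length(x) < 200 && !hit) {
    x <- c(x, rnorm(10))             # collect 10 more participants
    hit <- t.test(x)$p.value < .05   # peek at the data
  }
  hit
})
mean(false_pos)                      # well above .05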
54. high power
what?
among the decisions you have to make and
register in advance is when you’ll stop
collecting data
our stopping rule was based on fixing the
sample size
fixing the sample size was based on a
power calculation
power = P(reject null hypothesis | null
hypothesis is false)
55. high power
what?
as far as constraining the researcher
degrees of freedom is concerned, low power
is as good as high power
we aimed for high power (95%)
57. high power
how?
compute sample size needed to achieve
desired power level
- given the statistical test
- given the significance level
- given the effect size (e.g., based on previous
studies)
G*Power, R packages (pwr), …
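a minimal sketch with the pwr package (the effect size d = 0.45 is an illustrative assumption, not the value used in the study):

library(pwr)
# sample size for a paired t test at alpha = .05 and power = .95
pwr.t.test(d = 0.45, sig.level = .05, power = .95, type = "paired")
# the n in the output is the required number of pairs; round up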
58. high power
why?
• low power reduces the probability of discovering effects that are
there
• low power reduces the probability that a significant result reflects a
true effect (button et al., 2013)
• low power leads to an inflation of estimated effect sizes
• only overestimates will be significant
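a minimal sketch in R of that last point: when power is low, the studies that happen to reach significance are exactly those that overestimated the effect

# true effect d = 0.3, but only n = 20: an underpowered design
set.seed(7)
res <- replicate(5000, {
  x <- rnorm(20, mean = 0.3)
  c(d = mean(x) / sd(x), sig = t.test(x)$p.value < .05)
})
mean(res["d", ])                     # across all studies: close to 0.3
mean(res["d", res["sig", ] == 1])    # significant studies only: inflated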
59. there are other stopping rules!
ways to decide when to stop collecting data:
- when I have a participant with the name of my mother
- availability
--- when the day/test week is over
- when I have a fixed number of participants
--- 100
--- based on power calculations
--- based on accuracy in parameter estimation
60. in general, the most important thing is that you do it, more
than how you do it
all these stopping rules are equally valid to constrain the
researcher degrees of freedom
but some will lead to better research than others
--- more informative
--- more precise and less biased estimates of, e.g.,
effect size
62. NHST & Bayesian testing
what?
we did not just use Null Hypothesis
Significance Testing (NHST, i.e., p-values) but
also Bayes factors (the p-value of Bayesian
statistics)
the core of bayesian statistics is bayes’ rule
p(a|b) = p(b|a) p(a) / p(b)
bayes treats probabilities as degrees of
belief
63. NHST & Bayesian testing
what?
we can use bayes to compute the belief in
our hypothesis H, given the data d
p(H|d) = p(d|H) p(H) / p(d)
bayes’ rule tells us how we should update
our belief about H after observing data d
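a minimal worked example with made-up numbers: start undecided about H and observe data that are four times more likely under H than under not-H

p_H  <- 0.5                               # prior belief in H
p_dH <- 0.8                               # p(d | H)
p_dn <- 0.2                               # p(d | not H)
p_d  <- p_dH * p_H + p_dn * (1 - p_H)     # p(d), by total probability
p_dH * p_H / p_d                          # posterior p(H | d) = 0.8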
64. NHST & Bayesian testing
how?
• several online tools (e.g., Rouder’s
website)
• BayesFactor package in R (Morey &
Rouder, 2014)
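a minimal sketch with the BayesFactor package (simulated guess errors, hypothetical variable names):

library(BayesFactor)
set.seed(1)
err_guess1 <- rexp(60, rate = 0.1)     # made-up error scores
err_avg <- err_guess1 - rnorm(60, 1, 2)
# paired Bayesian t test; the output is the Bayes factor BF10
ttestBF(x = err_guess1, y = err_avg, paired = TRUE)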
65. NHST & Bayesian testing
why?
• p(H|d) seems exactly what science
needs
• evidence for null hypothesis
• intuitive to interpret
• consistent: correct answer in large
sample limit
• exact for small sample size
• clear interpretation of evidence
• based on the observed data, not on
hypothetical replications of experiments
69. NHST & estimation
what?
we did not just use p-values and Bayes
factors but also effect size estimates and
their confidence intervals
how?
MATLAB, R, SPSS, ESCI (Cumming, 2013), …
why?
diverts focus from the presence of an effect
to the more informative size of an effect
and its precision
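a minimal sketch in base R (same simulated errors as above): t.test already returns the mean difference with its confidence interval, and a standardized effect size takes one more line

set.seed(1)
err_guess1 <- rexp(60, rate = 0.1)
err_avg <- err_guess1 - rnorm(60, 1, 2)
tt <- t.test(err_guess1, err_avg, paired = TRUE)
tt$estimate                            # mean difference
tt$conf.int                            # its 95% confidence interval
mean(err_guess1 - err_avg) / sd(err_guess1 - err_avg)  # Cohen's d, paired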
71. co-pilot multi-software approach
what/how?
• two people independently processed and
analyzed the same data …
• … using different software (MATLAB,
SPSS)
why?
decreases the likelihood of errors
errors are easily made:
50% of published papers in psychology
contain reporting errors (bakker &
wicherts, 2011)
e.g., an error in sample size planning (G*Power)
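a minimal sketch of how the final co-pilot check might look (hypothetical file names): pilot A exports the test statistic from MATLAB, pilot B recomputes it independently in R, and the two values are compared

dat <- read.csv("guesses.csv")         # shared raw data, hypothetical file
t_r <- t.test(dat$err_guess1, dat$err_avg, paired = TRUE)$statistic
t_matlab <- scan("t_from_matlab.txt")  # value exported by pilot A
all.equal(unname(t_r), t_matlab, tolerance = 1e-6)  # TRUE if pipelines agree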
73. clear distinction between confirmatory and
exploratory (post hoc) analyses
what?
we indicated whether the analyses were
specified before seeing the data, or based
on the data (see registration)
how?
be transparent
easy when you have registered
why?
you still want to report analyses you
thought about too late! they can be useful
for generating hypotheses
75. open science
what?
we made our full research output
publicly available to everybody
- experimental materials (stimuli,
questionnaire items, instructions, and so
on)
- raw data
- processed data
- code for data processing
- code for confirmatory analyses
- code for post-hoc analyses
- paper
76. open science
how?
Open Science Framework (public)
-online repository
-free
-under development
goal: share and find research materials
make study materials (experimental
material, data, code, …) public so that
other researchers can find, use and cite
them
several other sharing possibilities
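a minimal sketch with the osfr R package (assuming it is installed; the GUID comes from the project URL above):

library(osfr)
proj <- osf_retrieve_node("ivfu6")     # the Steegen et al. OSF project
osf_ls_files(proj)                     # list the shared materials, data and code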
78. open science
how?
Open Science Framework (public)
make sure OSF is not the only place
where your stuff is!
who knows what will happen with these
servers in 20 years?
unclear what the best data format is
79. open science
why?
• the current standards of what is
considered research output (paper with
summary statistics and conclusion) are
not inspired by desiderata for good
science, but rather by arbitrary and
outdated technical constraints (paper +
publishing costs)
• if we were starting science from scratch right
now, in the computer and internet age,
we would probably set a completely
different standard
80. open science
why?
• facilitates
- replication studies
- follow-up studies (e.g., use same
stimuli)
- new or re-analyses
- meta-analyses
- accumulation of scientific
knowledge
- detection of errors or fraud
• yields useful teaching material
81. open science
why?
• increases visibility
• increases citability
• decreases number of emails about
experiments, data or analyses, …
• is a moral obligation to the taxpayer
(publicly funded research is a public
good)
85. 1. replication
2. registration
3. high power
4. bayesian statistics
5. alpha level
6. estimation
7. co-pilot multi-software approach
8. distinction between confirmatory and exploratory analyses
9. open science
what? how? why?
why not?
features of science 2.0
before data collection
after data collection/during data analysis
after data analysis
86. replication
why not
- it is impossible!
--- things are never exactly the same (e.g., the population)
--- the details of the original study are lost (e.g., which questions
were used in a post-experimental interview)
- it is a waste of time and resources!
--- should we value novelty more than truth?
- it is not good for my career
--- can I publish this?
87. registration
why not?
• it takes time, thought and effort
• it is harder than it seems!
• writing the code helps a lot
• exploration might be the only possibility
• domain specific (qualitative studies? complex studies?)
88. high power
why not?
• can be hard to guess expected effect size or trust published effect
size
• often requires large sample size
• collaborate!
• restricted to NHST framework
94. this illustration used a very simple study
• replication study
• easily administered 8-item questionnaire
• basic t test
this made pre-registration, sample size planning, high power,
estimation, bayesian statistics, sharing protocol, code and data,
the co-pilot multi-software approach, etc. probably much easier than
in most other studies
but everything is also possible (though harder) for
non-replication studies!
feasibility will depend on the type and scope of your research
95. science 2.0 is no package deal
--- you can register, but not share
--- you can share, but not use bayes
some practices are graded
--- you can register without code
--- you can estimate without reporting CI
97. • the (psychological) literature is littered with spurious
findings
• which results can you trust?
– has this result been replicated?
– did the researchers exploit their researcher degrees of
freedom?
– is the evidence based on NHST with a liberal alpha level?
– was the analysis correct? (e.g., at least check the dfs; better,
redo the analysis yourself with the shared data and code)
– ???