Presentation given in January 2012 by David Leon, Professor of Epidemiology at the London School of Hygiene and Tropical Medicine. Subsequently reused at various internal events.
2. Sharing Data imperative
• Consensus among Research Funders (MRC, Wellcome
Trust, ESRC, NIH etc. etc.) that studies they fund should
make their data accessible to wider scientific
community
• This does NOT mean data is “dumped” onto the web
and made freely accessible without restriction
• Emphasis on establishing principle, mechanisms and
transparent procedures
• Underpinned by belief that replication, pooling and
“new ideas for old data” advance scientific knowledge
3. Perceived and real costs of data sharing
(the researcher’s perspective)
• Intellectual property
• Motivation of researchers (why bother if other
researchers feed off my life’s work)
• Huge time commitment to educate 3rd party
users and tell them what data there is,
what its strengths and weaknesses are etc.
• My team’s time taken up providing datasets
and servicing endless requests and queries
4. LSHTM initiative on
research data (2009‐10)
David Leon (Chair)
Taane Clark, Paul Fine, Judy Green
(Faculty representatives)
Carolyn Lloyd (Librarian)
Victoria Cranna (Archivist)
Sheena Wakefield (NST)
7. Key recommendations
• Develop portal/gateway for discovery of key data assets;
• Develop web‐based resources for researchers;
• Develop policies/guidance on:
– obtaining appropriately wide consent to permit data sharing,
– maintenance of confidentiality and minimising risk of disclosure of identities,
– establishment of data access processes and model data sharing agreements,
– inclusion of adequate budget lines on new grant applications (reflected in pFACT),
– best practice and minimal standards for data documentation;
• Review institutional incentives for developing meta‐data;
• Review career pathways for information specialists;
• Introduce staff, taught course and doctoral training on
documentation and meta‐data – principles and practice;
• Encourage flagship data sets/resources to develop high
standard meta‐data and access procedures
13. What is meta‐data?
• It’s data about data
• Two levels:
– Study‐level description
• Setting, numbers of subjects, endpoints, exposure
variables, biological samples collected
• Who to contact to find out more etc.
– Variable level description
• Instrument/questionnaire used
• Frequencies/means/missing values
• Comments on validity/utility
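
As a rough illustration of the two levels above, the following Python sketch holds a study-level record that contains variable-level records. It is a minimal sketch under stated assumptions, not the School's schema and not DDI: every class name, field name and data value here is hypothetical, and missing values are assumed to be coded as None.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class VariableMetadata:
    """Variable-level description (hypothetical field names)."""
    name: str
    instrument: str               # instrument/questionnaire used
    n_missing: int                # count of missing values
    mean_value: Optional[float]   # None for non-numeric variables
    frequencies: dict             # value -> count, missing excluded
    comment: str = ""             # notes on validity/utility


@dataclass
class StudyMetadata:
    """Study-level description (hypothetical field names)."""
    title: str
    setting: str
    n_subjects: int
    contact: str                  # who to contact to find out more
    variables: list = field(default_factory=list)


def describe_variable(name, values, instrument, comment=""):
    """Summarise one variable; None is assumed to mark a missing value."""
    present = [v for v in values if v is not None]
    numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return VariableMetadata(
        name=name,
        instrument=instrument,
        n_missing=len(values) - len(present),
        mean_value=mean(present) if numeric else None,
        frequencies=dict(Counter(present)),
        comment=comment,
    )


# Made-up example data for an imaginary study of five subjects.
study = StudyMetadata(
    title="Example cohort",
    setting="Hypothetical setting",
    n_subjects=5,
    contact="data-manager@example.org",
)
study.variables.append(
    describe_variable("smoker", ["yes", "no", None, "no", "yes"],
                      instrument="Baseline questionnaire")
)
study.variables.append(
    describe_variable("sbp", [120, 135, None, 128, 141],
                      instrument="Clinic examination",
                      comment="Single reading; treat with caution")
)
```

Nesting the variable records inside the study record mirrors the study-level to variable-level structure described on this slide; a real catalogue would more likely follow an established standard than an ad hoc structure like this one.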
14. Web‐based applications provide
high level of functionality
• Enables discovery of data
• Easy to navigate (hyperlinks are great
strength)
• Can combine access to meta‐data with
documentation of instruments including
protocols, questionnaires etc.
• As appropriate allows “drill‐down” from study
level to variable‐level
15. Up‐sides of good (web‐based)
variable‐level meta‐data
• Facilitate analysis by existing researchers
• Reduce induction time for new researchers in own
group and for visitors
• Reduce costs of providing data to bona fide 3rd
party researchers
• Easy to edit, add to and update
• Can have “shopping‐basket” facility
16. Down‐sides of web‐based variable‐
level meta‐data
• Requires investment
– Funders claim they will pay
• Not appropriate for all studies (scale, duration
of future use)
• Lack of clarity about best platform
– DDI3 – open source looks very promising
• Limited experience at LSHTM
22. Research Data Management
Steering Group
• Established January 2012 by the Senior
Leadership Team
• Chaired by Professor Anne Mills (Deputy
Director Research)
• Time limited
• Links to other initiatives (e.g. LSHTM Research
Online)
• Resources obtained from Wellcome Trust (> Gareth Knight)