Hughes, V. and Wormald, J. (2018) Data sharing: solutions. Paper presented at WYRED Project: Data Sharing Satellite Event, University of Huddersfield. 2 August 2018.
Data sharing: solutions
1. WYRED Project: Data Sharing Satellite Event
2nd August 2018
Data Sharing: Solutions
Vincent Hughes and Jessica Wormald
2. Re-cap
• Increasing focus on data sharing across the sciences (‘open science’)
• Especially useful for sociolinguistics…
o expands scale of analyses (reveal subtle effects)
o replication
• …and forensics
o for empirical estimation of typicality / validation
o move away from experience-based approaches
• Importance of collaboration
3. What is data?
• Anything is useful (and better than nothing):
o knowledge
o literature
o recordings
o raw data
o platforms / scripts for data extraction
o code (statistical modelling)
4. Solutions: Quantitative
General
• Recordings / raw data sharing
• Online platforms – transcriptions + recordings
Forensics
• Different types of features
• Different types of analysis
• Forensic-friendly data collection
9. WikiDialects
• How would it be used?
• Casework in forensics – assessing typicality
• Research by academics – useful resource for finding out what is out there / sharing research
• But other beneficiaries / users too
• Speech and language therapy – assessing typicality
• L2 English studies – understanding variation
• Students – resource for studies
• Lay audience – general interest
Platforms:
SLAAP, ONZE, FAVE suite, SPADE
Forced alignment
Searching for internal and external sources of variation
Easy extraction of large amounts of data
Continually updated (longitudinal resource)
Forensic-specific features:
e.g. MFCCs / LTFDs from across an entire recording
means of summarising data easily
incorporating large-scale analyses into case reports
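One way to picture "summarising data easily" is to collapse frame-level features from a whole recording into a few per-coefficient statistics. The sketch below is only an illustration under stated assumptions: the function name `summarise_recording` and the toy values are hypothetical, the frames stand in for features (e.g. MFCCs) that would be extracted by some other tool, and mean and standard deviation are just one possible choice of summary.

```python
from statistics import mean, stdev


def summarise_recording(frames):
    """Collapse per-frame feature vectors into per-coefficient mean and SD.

    `frames` is a list of equal-length lists, one list per analysis frame.
    This is a hypothetical helper, not a method described in the talk.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to estimate a standard deviation")
    columns = list(zip(*frames))  # regroup values so there is one tuple per coefficient
    return {
        "n_frames": len(frames),
        "mean": [mean(col) for col in columns],
        "sd": [stdev(col) for col in columns],  # sample standard deviation
    }


# Made-up values standing in for three frames of two coefficients each.
toy_frames = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
summary = summarise_recording(toy_frames)
```

A fixed-size summary like this is the kind of object that could be shared or compared across recordings even where the recordings themselves cannot be.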
Forensic-specific collection methods
capture more real world variation
forensically realistic conditions
e.g. multiple recordings per speaker, technical factors…
Combined qualitative and quantitative resource – qualitative collation of research as a ‘first port of call’
Community in this case = researchers interested in accents (sociolinguists, forensic speech scientists, speech and language therapists)
Not – lay individuals with a view on language patterns (‘the youth of today can’t say th anymore – they all think it’s f!’)
Academics from across disciplines (e.g. sociolinguistics, phonetics) could contribute descriptions of linguistic features for different regional and social groups
Could use the repository to search for a feature in a given region and find out what has already been done on that feature – e.g….
Select feature and region
Pulls up resources (and summaries) on that feature – resources could be project platforms, larger platforms (e.g. SPADE), individual publications, academic website / blog
Would include metadata (how many speakers, how old were they?)
Provides a quick starting point to find out info available (useful for all – caseworkers / researchers / students)
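The select-feature-and-region lookup described above could be sketched roughly as follows. This is a minimal illustration, not a design from the talk: the `Resource` fields, the `search` function and every catalogue entry are hypothetical placeholders, chosen only to mirror the notes (resource type, summary, and metadata such as number of speakers and age range).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """One catalogue entry; all field names are assumptions for illustration."""
    title: str
    kind: str                         # e.g. "project platform", "publication", "blog"
    feature: str
    region: str
    summary: str
    n_speakers: Optional[int] = None  # metadata: how many speakers
    age_range: Optional[str] = None   # metadata: how old they were


def search(catalogue: List[Resource], feature: str, region: str) -> List[Resource]:
    """Return entries matching the selected feature and region (case-insensitive)."""
    feature, region = feature.lower(), region.lower()
    return [
        r for r in catalogue
        if r.feature.lower() == feature and r.region.lower() == region
    ]


# Placeholder entries only; none of these refers to a real resource.
catalogue = [
    Resource("Example project platform", "project platform", "th-fronting",
             "Region A", "Placeholder summary.", n_speakers=40, age_range="18-30"),
    Resource("Example publication", "publication", "th-fronting",
             "Region A", "Placeholder summary."),
    Resource("Example blog post", "blog", "th-fronting",
             "Region B", "Placeholder summary."),
]

hits = search(catalogue, "TH-fronting", "region a")
```

Keeping the metadata optional reflects that contributed descriptions will vary in how much detail they report; a real repository would need an agreed set of feature and region labels for the matching to work.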
Investment in the enterprise
requires us to explain the importance
both theoretical and practical benefits
Becoming part of ‘standard practice’
General
Forensics
Understandable concerns about how data is used (especially where the term forensic is used)
Difficult to get funding for tools (unless there is a specific call from the research councils)
Especially if no direct research question associated
Impact funding is one potential avenue
Also need funding for long-term maintenance of resources
certainly for data platforms
maybe less so for a wiki (?)