never trust a

scientist

datajournalist

dataset

	
Tilburg University - data journalism
missing data, no value stored	
“I need to solve this”	
missing data, no value stored	
“I need to write a story about this”	
forreporters.com/andrew-lehren/	
scientist to journalist: “You twist everything”	
journalist to scientist: “Your articles are useless”	
 
	
  
“I am right”	
can I trust (and use) this dataset?	
“Trustworthiness and data
management are vital to the success of
qualitative studies … There is a lack of
scientific literature regarding the
structures and processes for managing
large qualitative data sets.”	
	
(White, Oelke, Friesen, 2012)
	
	
“A simple answer to objective reporting
is the kind of reporting that uses relevant
and reliable sources which is not bias or
slanted to a certain party.”	
	
Ibrahim, Pawanteh, Kee (2011)	
	
	
question:	
how to validate	
a dataset?	
check the data source	
	
what are his/her/its intentions?	
what is the citation index	
of the data owner?	
	
	
do other journalists	
cite the data owner?	
	
	
check the data	
	
benefit	
	
do I need this?	
	
	
	
do I need to use it?	
	
	
  
check	
	
data gathering? 	
is this correct?	
	
	
clarification of the data?
do I understand?	
	
	
missing data	
	
what is wrong? 	
I need to solve	
	
	
what is the story?	
I need to write	
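
A minimal sketch of how both readings of missing data could be run on the same table. The runner IDs, checkpoint names and split times are invented for illustration and are not taken from the marathon dataset discussed in the talk; `None` stands for "no value stored".

```python
# Hypothetical split times (seconds) per runner; None marks "no value stored".
splits = [
    {"runner": "A", "5k": 1500, "10k": 3050, "finish": 13000},
    {"runner": "B", "5k": 1620, "10k": None, "finish": 14100},
    {"runner": "C", "5k": None, "10k": None, "finish": 15500},
]
CHECKPOINTS = ["5k", "10k", "finish"]


def missing_by_checkpoint(rows):
    """Scientist's view: how many values are missing per checkpoint?"""
    return {cp: sum(1 for r in rows if r.get(cp) is None) for cp in CHECKPOINTS}


def runners_with_gaps(rows):
    """Journalist's view: who has gaps, and where? These are the people to call."""
    gaps = {}
    for r in rows:
        missed = [cp for cp in CHECKPOINTS if r.get(cp) is None]
        if missed:
            gaps[r["runner"]] = missed
    return gaps


print(missing_by_checkpoint(splits))
print(runners_with_gaps(splits))
```

The first function points at a possible technical flaw (one checkpoint losing many values); the second gives a list of individual cases that may hide a story.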
	
  
trouble?	
	
TEST!	
	
	
	
CALL!	
	
  
I need more sources! (do I?)	
	
give me data	
check consistency	
	
	
give me humans	
check my story	
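
One possible way to "check consistency" against a second data source, sketched under assumptions: both sources are reduced to a mapping from a shared key (here a year) to a number, and the figures below are made up. The function name `compare_sources` and the `tolerance` parameter are hypothetical.

```python
def compare_sources(source_a, source_b, tolerance=0.0):
    """Compare two {key: number} mappings.

    Reports values that differ by more than `tolerance`
    and keys that appear in only one of the sources.
    """
    report = {"mismatch": {}, "only_in_a": [], "only_in_b": []}
    for key in sorted(set(source_a) | set(source_b)):
        if key not in source_b:
            report["only_in_a"].append(key)
        elif key not in source_a:
            report["only_in_b"].append(key)
        elif abs(source_a[key] - source_b[key]) > tolerance:
            report["mismatch"][key] = (source_a[key], source_b[key])
    return report


# Invented figures for two sources that should describe the same thing.
source_a = {"2011": 100, "2012": 120, "2013": 90}
source_b = {"2011": 100, "2012": 125, "2014": 80}

print(compare_sources(source_a, source_b))
```

Anything the report flags is a reason to test again or to call the data owner, not proof that either source is wrong.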
	
  
same steps	
different interpretation	
	
  
“Dear datajournalist,	
	
Please take a look at the
research method yourself
and act a bit more like a
scientist.”	
“Dear scientist,	
	
Try to avoid intellectual
arrogance. There are
other people who are just
as smart.”	
	
“practice what you preach”	
	
  
scientists
  1. check the source (citation)
  2. check the data
  3. check benefit
  4. check data gathering
  5. TEST!
  6. more data sources

data journalists
  1. check the source (citation)
  2. check the data
  3. check benefit
  4. check clarification
  5. CALL!
  6. more human sources
@Hillevanderkaa	
Tilburg University

How to validate a dataset? Six steps.

Editor's notes

  1. Name. Work at university; work as a writer / data journalist. Somewhere in between: I do research, sometimes with a scientific goal and sometimes with a journalistic aim.
  2. If you are in between, it is interesting that the worlds of social science and data journalism are sometimes really different, but sometimes not. If we take for example this dataset, which is the dataset Andrew Lehren from the New York Times used in a Pulitzer Prize winning story about the New York Marathon, you can see a blind spot.
  3. … if a scientist sees this, in general his first response is that the dataset is technically not right. There is some missing data: a problem which needs to be solved.
  4. While, if a journalist sees a white spot, he is really interested in the story behind the missing data. Why is the data missing?
  5. In this case, both approaches were right: some runners missed a checkpoint, but there were also some technical flaws.
  6. If I talk about journalists with scientists, they are not always as enthusiastic as they could be: journalists can't deal with data; they use data superficially.
  7. Journalists, in turn, say scientists are really egocentric and their stories are not useful for the real world. They just do research to please themselves and their colleagues at university.
  8. At least on one thing they agree: they both assume they are right.
  9. Because I live in both worlds, I am interested to see whether the differences are real or not. One of those possible differences is how scientists as well as data journalists decide whether they trust and use a dataset. What I would like to discuss today is really just a starting point on this topic.
  10. So if you dig into the literature on the trustworthiness of data from the perspective of a scientist, you will find a broad variety of articles in different scientific fields. It is not easy to detect a specific line in the variety of articles in all these different fields. And there is a lack of specific guidelines on how scientists determine the trustworthiness of a dataset.
  11. And if you read scientific articles about what makes a dataset trustworthy for journalists, you will find nothing. You will only find general readings about the trustworthiness of a news source in general, like the main principles of Gans. A dataset could simply be one of these news sources. But on a literature level, it is hard to compare.
  12. So, with no clear starting point, it seemed right to start with a very general question. And that's what I did. I asked ten of my scientific as well a
  13. Do the intentions have any influence on the dataset?
  14. So they both use their colleagues as peers
  15. Using a dataset from another source is not really common in social science -
  16. Experiments – case study