AI and investigative journalism
Josh Nicholas
Data journalist
The Guardian
Agenda
● Introduction
● What is AI
○ Different forms
○ More than a black box
● Case studies
○ Extracting useful info from text
○ Fuzzy matching between datasets
○ Finding a needle in a haystack
● Homework
● Q + A
Code for all examples is on my Github
More resources in HANDOUT
After the session:
● Recording
● Handout
● Homework in our LinkedIn Group
● LINK to join
What is AI?
1
Artificial intelligence is a catch-all
● Many AI terms are used interchangeably
● We are going to focus on machine learning models
● These are algorithms that can learn their own rules from data
This graphic was adapted from Build a Large Language Model (From Scratch) by Sebastian Raschka
What are ‘rules’?
Learning from the data
● Machines are great at identifying patterns that aren’t obvious to humans
● Given some examples to learn from, an algorithm can find more
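To make "learning rules from data" concrete, here is a minimal sketch of a classifier that is never told which words matter. It works out a weight for each word from a handful of labelled posts (a naive-Bayes-style score with equal class priors). The example posts, the `EXAMPLES` name and the "funding announcement" label are all invented for illustration and are not from the session.

```python
import math
from collections import Counter

# Hypothetical training data: (post text, is it a funding announcement?)
EXAMPLES = [
    ("we are delivering a new grant for the local club", True),
    ("proud to announce funding for the hospital upgrade", True),
    ("a new grant to upgrade the sports ground", True),
    ("great to meet volunteers at the market today", False),
    ("happy birthday to our wonderful local volunteers", False),
    ("lovely morning at the school fete today", False),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Learn one weight per word: positive means 'seen more in announcements'."""
    pos, neg = Counter(), Counter()
    for text, label in examples:
        (pos if label else neg).update(tokenize(text))
    vocab = set(pos) | set(neg)
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    # Add-one smoothing so a word seen in only one class never gives log(0).
    return {
        word: math.log((pos[word] + 1) / (pos_total + len(vocab)))
        - math.log((neg[word] + 1) / (neg_total + len(vocab)))
        for word in vocab
    }

def predict(weights, text):
    """Sum the learned weights; words never seen in training count as 0."""
    return sum(weights.get(word, 0.0) for word in tokenize(text)) > 0

WEIGHTS = train(EXAMPLES)
print(predict(WEIGHTS, "announcing a grant for the school"))
print(predict(WEIGHTS, "lovely to meet volunteers today"))
```

The "rules" here are just the numbers in `WEIGHTS`. Nobody wrote "grant means funding"; the algorithm inferred it from the examples, and more examples would refine it.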
AI and newsgathering
● Machine-learning algorithms are trained on large datasets
○ They can be fine-tuned on smaller datasets
● They are useful for “fuzzy” problems, when it’s hard to write explicit rules/instructions
● You can access many pre-trained algorithms for free, e.g.
○ Huggingface.co
○ Google, OpenAI, Mistral, Facebook etc.
and…
● If we can’t find an algorithm that fits our purpose, we can fine-tune an existing one
Examples we can borrow from
• Email spam filters
• Recommendation systems (Netflix, Spotify etc.)
• Language translation
• Audio transcription
• Facial recognition
• Object detection
• Predictive text
• Search engines
■ Google BERT etc.
Case studies
2
1) Extraction
The problem:
● Extracting names, locations and dollar amounts from thousands of text documents:
○ 34k+ Facebook posts
○ 2.4k media releases
● What if we don’t know the names they’ll use?
● What if they say something vague like “a million for x”?
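A quick way to see why explicit rules struggle here is to write one. The sketch below is a hand-written regular expression for dollar amounts. The pattern and the sample sentences are assumptions made up for illustration, not the rule used in the original project.

```python
import re

# Hand-written rule: a dollar sign, digits, and an optional magnitude word.
MONEY_RULE = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:million|billion|m|bn|k))?\b",
    re.IGNORECASE,
)

def find_amounts(text):
    """Return every substring that the explicit rule recognises as money."""
    return [match.group(0) for match in MONEY_RULE.finditer(text)]

print(find_amounts("Delighted to announce $1.5 million for the new pool."))
print(find_amounts("$500,000 in grants and $2m for roads"))
# The vague phrasing from the slide slips straight past the rule:
print(find_amounts("We secured a million for the surf club."))
```

Each new phrasing needs another special case in the pattern, which is the kind of open-ended problem where a trained model tends to be the better tool.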
● We scraped thousands of Facebook posts and media releases from official websites
● We used a pre-trained model from spaCy, a common Python library
● The model identified names, locations and references to money in the texts
● Since 2022 these tools have become even easier to use
● You can also achieve similar results with GenAI tools like ChatGPT
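The extraction step might look roughly like the sketch below. It assumes spaCy and its small English model `en_core_web_sm` are installed. The slides do not say which spaCy model was used, so the model name, the function names and the choice of labels are illustrative assumptions.

```python
from collections import defaultdict

# spaCy's English models tag people as PERSON, countries/cities/states as GPE,
# other locations as LOC, and monetary values as MONEY.
WANTED_LABELS = {"PERSON", "GPE", "LOC", "MONEY"}

def collect_entities(doc, wanted=WANTED_LABELS):
    """Group the entity strings found in one processed document by label."""
    found = defaultdict(list)
    for ent in doc.ents:
        if ent.label_ in wanted:
            found[ent.label_].append(ent.text)
    return dict(found)

def extract_from_texts(texts, model_name="en_core_web_sm"):
    """Run a pre-trained spaCy pipeline over many texts.

    Assumes spaCy and the named model are installed; the import lives here
    so collect_entities can be reused without spaCy present.
    """
    import spacy
    nlp = spacy.load(model_name)
    # nlp.pipe processes texts in batches, which matters for tens of thousands of posts.
    return [collect_entities(doc) for doc in nlp.pipe(texts)]
```

After `pip install spacy` and `python -m spacy download en_core_web_sm`, calling `extract_from_texts(["..."])` returns one dictionary per text. Results vary by model and should be spot-checked by hand before publication.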
2) Fuzzy matching
The problem:
● We need to connect datasets that are slightly different
○ Josh Nicholas vs Joshua Nicholas
● Previously we used a method called Levenshtein distance
○ Matching every name against every other name
○ It took ages!!
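For reference, Levenshtein distance counts the single-character edits needed to turn one string into another. The sketch below is a standard textbook implementation plus a brute-force matcher. The `best_matches` helper and the second candidate name are invented for illustration. The nested comparison is what makes this approach slow on large datasets.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if the characters match
            ))
        previous = current
    return previous[-1]

def best_matches(names_a, names_b):
    """Compare every name in names_a with every name in names_b."""
    return {
        name: min(names_b, key=lambda other: levenshtein(name.lower(), other.lower()))
        for name in names_a
    }

print(levenshtein("Josh Nicholas", "Joshua Nicholas"))
print(best_matches(["Josh Nicholas"], ["Joshua Nicholas", "John Nichols"]))
```

Matching two lists of 10,000 names each means 100 million distance calculations, each of which is itself a loop over both strings.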
Making use of the AI ecosystem
● When you input text into a chatbot, it turns the text into a series of numbers (an embedding)
● We can use this same technique to match names
• Find the numbers that are most similar
● This same technique can be scaled to full sentences or even entire documents
● It can also be run in reverse: which things are least similar?
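The matching idea can be sketched without downloading a model. In the example below, `embed` is a deliberately crude stand-in that counts character trigrams. A real embedding model would return a dense vector that also captures meaning, but the comparison step (cosine similarity between vectors) has the same shape. All names and candidate strings are invented for illustration.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Stand-in embedding: counts of character n-grams (NOT a learned model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_similarity(query, candidates):
    """Candidates sorted from most to least similar to the query."""
    query_vector = embed(query)
    return sorted(candidates, key=lambda c: cosine(query_vector, embed(c)), reverse=True)

ranked = rank_by_similarity(
    "Josh Nicholas", ["Jane Citizen", "Joshua Nicholas", "Joseph Nicholson"]
)
print(ranked[0])   # most similar candidate
print(ranked[-1])  # "run in reverse": the least similar candidate
```

Because each name is embedded once and then compared as a vector, the same pattern works with approximate nearest-neighbour search, which avoids comparing every pair directly.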
3) Finding a needle in a haystack
The problem:
● Who poses most with dogs, babies, hi-vis etc.?
● We need to search through thousands of images, many of them not captioned
Training a detection model
● There are loads of models that are immediately useful
• E.g. ones for workplace safety that can identify hard hats etc.
• Also lots of free datasets online
● We manually created a training dataset with novelty cheques and hi-vis vests
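Once a detection model has been run over the images, answering "who poses most with X?" becomes a counting exercise. The sketch below assumes each image's detections arrive as a list of dictionaries with a `label` and a confidence `score`, similar to what common object-detection tools return. The file names, labels, scores and people are all made up.

```python
from collections import Counter

# Hypothetical detector output: image file -> detections found in it.
DETECTIONS = {
    "img1.jpg": [{"label": "hi_vis_vest", "score": 0.93},
                 {"label": "hi_vis_vest", "score": 0.88}],
    "img2.jpg": [{"label": "novelty_cheque", "score": 0.91}],
    "img3.jpg": [{"label": "hi_vis_vest", "score": 0.42}],
    "img4.jpg": [{"label": "hi_vis_vest", "score": 0.85}],
}

# Hypothetical lookup: which person's page each image was scraped from.
OWNERS = {"img1.jpg": "MP A", "img2.jpg": "MP A", "img3.jpg": "MP B", "img4.jpg": "MP B"}

def count_images_with(detections_by_image, owners, label, min_score=0.8):
    """Count, per person, the images where `label` was detected at or above min_score."""
    tally = Counter()
    for image_id, detections in detections_by_image.items():
        # An image counts once, however many matching objects it contains.
        if any(d["label"] == label and d["score"] >= min_score for d in detections):
            tally[owners[image_id]] += 1
    return tally

print(count_images_with(DETECTIONS, OWNERS, "hi_vis_vest").most_common())
```

The `min_score` threshold is a judgment call. Lowering it finds more images but lets in more false positives, so a manual check of a sample is worth doing before reporting any ranking.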
Quick summary
● Machine learning models can learn their own rules from the patterns in data
● This helps us when we need to work with fuzzier/unlabelled data
○ Images, entire documents etc.
● There are thousands of models available for free online
● We can fine-tune them for specific tasks if necessary
● They can be run directly or built into interfaces for common problems
● GenAI tools can often do the same tasks, but they are harder to scale
Homework
● Homework 1 (if you can code):
○ Open the Hugging Face MODELS tab and choose a model that would solve an editorial problem for you
○ Try out the tool and share your results in the LinkedIn Group
■ Why/what did you choose?
● Homework 2 (if you can't code yet):
○ Open the Hugging Face SPACES tab and choose one of the tools
○ Give it a prompt and share your results in the LinkedIn Group
■ Why/what did you choose?
● How would this help in a journalism context?
How homework works
1. Join the Closed LinkedIn Group
2. Post your work for trainer feedback within 4 weeks
3. Leave constructive feedback on at least one other person’s post within 2 weeks
4. Follow the Group Rules!
Any questions?
Josh Nicholas
Data journalist
The Guardian
josh.nicholas@theguardian.com
Thank you!

Webinar 3 - AI & Investigative Journalism - Training Slidedeck
