Data Science for Social Good and Ushahidi


The Eric and Wendy Schmidt Data Science for Social Good Summer Fellowship 2013: preliminary update, July 2013.

About the DSSG rock stars: http://dssg.io/ and https://twitter.com/datascifellows/
Their project: http://dssg.io/2013/07/15/ushahidi-machine-learning-for-human-rights.html
More at ushahidi.com, wiki.ushahidi.com, and blog.ushahidi.com


Project Update - July 11, 2013
The Eric & Wendy Schmidt
Data Science
for Social Good
Summer Fellowship 2013
www.dssg.io | dssg-ushahidi@googlegroups.com

Data Sets
23,000 reports from 20 datasets
• 22% English
• 35% non-English
• 43% mixed languages
Each report includes text, category, location,
and sometimes more data

Data Sets
Additional unusable datasets for various reasons (e.g. overly formulaic language)
What is the quality of the existing "gold standard" annotation?
Working on translations of

Data Set Differences
• Afghanistan election (peaceful)
• Kenyan election (less peaceful)

Current Task Status [July 11]
1) Suggest categories
2) Extract named entities (especially locations)
3) Detect language
End of presentation has more extensive technical details

Toy Demo
http://ec2-54-218-196-140.us-west-2.compute.amazonaws.com/home
Note: this is ONLY a basic "toy" user interface to demonstrate the current prototype functionality.
Our plan is to deliver an open-source code library,
which Ushahidi will incorporate into the existing user interface.
If the link doesn't work, just look at the screenshots in the next slides. :)

Secondary Project Ideas
1. Detect private info to strip
2. Urgency assessment
3. Filtering irrelevant reports (not strictly spam)
4. Automatically proposing new [sub-]categories
5. Cluster similar (non-identical) reports
6. Hierarchical topic modelling / visualization
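
For idea 1, a minimal Python sketch of how phone numbers and e-mail addresses might be masked before a report is published. The regular expressions, the placeholder tokens, and the function name `strip_private_info` are illustrative assumptions, not part of the DSSG prototype. Names and street addresses would need something stronger, such as the named-entity recognizer from task 2.

```python
import re

# Assumed patterns: deliberately simple and likely to need tuning per deployment.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A phone number here is a run of at least 8 digits/separators,
# optionally starting with "+", that begins and ends with a digit.
PHONE_PATTERN = re.compile(r"(?<!\w)\+?\d[\d\s().-]{6,}\d(?!\w)")


def strip_private_info(report_text):
    """Replace e-mail addresses and phone numbers with placeholder tokens."""
    masked = EMAIL_PATTERN.sub("[EMAIL]", report_text)
    masked = PHONE_PATTERN.sub("[PHONE]", masked)
    return masked


if __name__ == "__main__":
    print(strip_private_info(
        "Call +254 712 345 678 or email jane.doe@example.org now"
    ))
```

Short numbers such as casualty counts are left untouched, because the phone pattern requires a longer run of digits and separators.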

Evaluation Plans
• Tap into Ushahidi and crisis mapping
communities for feedback
• Simulate past event with our system
• Success metrics:
o Increased annotator speed
o Increased annotator categorization accuracy
o Decreased annotator frustration/tedium

Feedback welcome!
Contact us at dssg-ushahidi@googlegroups.com
We would love your input!
See next 4 slides for technical details on our 4 tasks...
or skip if you're happy to stay unaware... :)

1) Suggest categories
Currently:
• Simple bag-of-words unigram features
• 1-vs.-all classification (scikit-learn)
• Little categories merged into fewer big categories
• Performance uninspiring :(
Future:
• Bigrams, word frequency filter, ...
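
A minimal sketch of the current setup, assuming scikit-learn's `CountVectorizer` for the unigram bag-of-words features and `OneVsRestClassifier` around logistic regression for the 1-vs.-all step. The slides do not say which base classifier was used, so logistic regression is an assumption, and the training reports and category names below are invented toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; real reports come from the Ushahidi datasets.
TRAIN_REPORTS = [
    "gunshots heard near the polling station",
    "violent clash between two groups",
    "ballot papers arrived late",
    "station opened late without ballot papers",
    "voters bribed with cash at the market",
    "officials caught stuffing boxes with fake votes",
]
TRAIN_CATEGORIES = [
    "violence", "violence",
    "logistics", "logistics",
    "fraud", "fraud",
]


def train_category_suggester(reports, categories):
    """Fit unigram bag-of-words features plus one-vs-rest classifiers."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 1)),  # unigrams only
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(reports, categories)
    return model


def suggest_categories(model, report_text, top_k=2):
    """Return the top_k categories, best first, for one report."""
    scores = model.predict_proba([report_text])[0]
    ranked = sorted(
        zip(model.classes_, scores), key=lambda pair: pair[1], reverse=True
    )
    return [category for category, _score in ranked[:top_k]]


if __name__ == "__main__":
    suggester = train_category_suggester(TRAIN_REPORTS, TRAIN_CATEGORIES)
    print(suggest_categories(suggester, "violent clash and gunshots"))
```

Returning a ranked short list rather than a single label fits the goal of suggesting categories to a human annotator. Changing `ngram_range` to `(1, 2)` would add the bigram features listed under Future.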

2) Extract named entities
Currently:
• NLTK's Named Entity Recognizer
• Eval: pretty good
Future:
• Train location-recognizer on datasets
• Merge types for non-location NEs
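
A rough sketch of how NLTK's recognizer could be wrapped, including the proposed merging of non-location entity types. The choice of which NLTK labels count as locations, the merged label `OTHER`, and both function names are assumptions for illustration. `extract_entities` also needs NLTK's tokenizer, tagger, and chunker data to be downloaded first, and the exact resource names depend on the NLTK version.

```python
from nltk import ne_chunk, pos_tag, word_tokenize
from nltk.tree import Tree

# Assumption: these labels from NLTK's default chunker are treated as locations.
LOCATION_LABELS = {"GPE", "LOCATION", "FACILITY"}


def collect_entities(chunked):
    """Return (text, merged_type) pairs from an ne_chunk-style tree.

    Location-like labels become "LOCATION"; every other entity type
    (PERSON, ORGANIZATION, ...) is merged into "OTHER".
    """
    entities = []
    for node in chunked:
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            merged = "LOCATION" if node.label() in LOCATION_LABELS else "OTHER"
            entities.append((text, merged))
    return entities


def extract_entities(report_text):
    """Tokenize, POS-tag, and chunk a report, then collect its entities."""
    return collect_entities(ne_chunk(pos_tag(word_tokenize(report_text))))
```

Keeping `collect_entities` separate from the NLTK pipeline means a location recognizer trained on the report datasets could later replace `ne_chunk` without changing the merging step.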

3) Detect Language
Currently:
• Existing packages (Bing, Python, ...)
Future:
• Evaluate quality
• Allow event-specific language bias
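
One possible reading of "event-specific language bias" is to reweight a detector's per-language scores by how common each language is expected to be for a given event. The sketch below assumes the detector returns a dictionary of scores. The function names, the `floor` value for unlisted languages, and the example numbers are all hypothetical, and no particular detection package is implied.

```python
def apply_event_language_prior(detector_scores, event_prior, floor=0.01):
    """Reweight detector scores by an event-specific language prior.

    detector_scores: {language_code: score} from any language detector.
    event_prior: {language_code: expected share of reports for this event}.
    Languages missing from the prior get the small `floor` weight.
    """
    weighted = {
        lang: score * event_prior.get(lang, floor)
        for lang, score in detector_scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # Nothing to normalize by; fall back to the raw scores.
        return dict(detector_scores)
    return {lang: weight / total for lang, weight in weighted.items()}


def most_likely_language(detector_scores, event_prior):
    """Pick the language with the highest reweighted score."""
    adjusted = apply_event_language_prior(detector_scores, event_prior)
    return max(adjusted, key=adjusted.get)


if __name__ == "__main__":
    # A short message the detector narrowly scores as Indonesian ("id"),
    # in an event where Swahili ("sw") and English ("en") are expected.
    scores = {"id": 0.55, "sw": 0.45}
    prior = {"sw": 0.7, "en": 0.3}
    print(most_likely_language(scores, prior))
```

The prior only shifts close calls. A message the detector scores very confidently as an unexpected language can still win if `floor` is not too small.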

4) Near-Duplicate Detection
Currently:
• SimHash efficiently compares distances between
hashes of message text
Future:
• Evaluate quality more rigorously
• Explore other methods
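
A compact, self-contained sketch of the SimHash idea: each report is reduced to a 64-bit fingerprint, and near-duplicates are the reports whose fingerprints differ in only a few bits. The tokenization, the use of MD5 as the per-token hash, and the `max_distance` threshold are assumptions made for illustration. The prototype's actual settings are not given in the slides.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a SimHash fingerprint from the word tokens of `text`.

    `bits` must be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Assumed rule: fingerprints within max_distance bits are near-duplicates."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= max_distance
```

For short SMS-length reports a single changed word can flip several bits, so `max_distance` would have to be tuned on real data, which ties in with the plan to evaluate quality more rigorously.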

2. Ushahidi Workflow

3. Ushahidi Workflow + DSSG

9. Demo: Example #1

10. Demo: Example #2

Editor's notes

We're happy to give an update on our Ushahidi project. [Abe Gong]
Citizens submit reports (via SMS, Twitter, and the web), which are reviewed by annotators. Categorizing, geolocating, stripping private info, etc. is a slow manual process.
We're building a data wizardry system to support the manual annotation process.
Notes on the secondary project ideas:
1. Since Ushahidi reports are mostly public, private info should be hidden. Example: names, phone numbers, and addresses.
4. Example: after the Haiti earthquake, we might observe unexpected robbery reports arising.
5. This is mainly for a better workflow, because annotators can work better when they process similar reports together.
6. To see which topics commonly occur in elections in general, and which topics occur only in the Kenyan election specifically.