Project Update - July 11, 2013
The Eric & Wendy Schmidt
Data Science
for Social Good
Summer Fellowship 2013
www.dssg.io | dssg-ushahidi@googlegroups.com
Ushahidi Workflow
Ushahidi Workflow +
DSSG
Data Sets
23,000 reports from 20 datasets
• 22% English
• 35% non-English
• 43% mixed languages
Each report includes text, category, location,
sometimes more data
Data Sets
Additional unusable datasets for various reasons (e.g. overly formulaic language)
What is the quality of the existing "gold standard" annotation?
Working on translations of
Data Set Differences
Afghanistan election (peaceful)
Kenyan election (less peaceful)
Current Task Status [July 11]
1) Suggest categories
2) Extract named entities (especially locations)
3) Detect language
End of presentation has more extensive technical details
Toy Demo
http://ec2-54-218-196-140.us-west-2.compute.amazonaws.com/home
Note this is ONLY a basic "toy" user interface to demonstrate the current prototype functionality.
Our plan is to deliver an open-source code library,
which Ushahidi will incorporate into the existing user interface.
If link doesn't work -- just look at the screenshots in the next slides. :)
Demo: Example #1
Demo: Example #2
Secondary Project Ideas
1. Detect private info to strip
2. Urgency assessment
3. Filtering irrelevant reports (not strictly spam)
4. Automatically proposing new [sub-]categories
5. Cluster similar (non-identical) reports
6. Hierarchical topic modelling / visualization
Evaluation Plans
• Tap into Ushahidi and crisis mapping
communities for feedback
• Simulate past event with our system
• Success metrics:
o Increased annotator speed
o Increased annotator categorization accuracy
o Decreased annotator frustration/tedium
Feedback welcome!
Contact us at dssg-ushahidi@googlegroups.com
We would love your input!
See next 4 slides for technical details on our 4 tasks...
or skip if you're happy to stay unaware... :)
1) Suggest categories
Currently:
• Simple bag-of-words unigram features
• 1-vs.-all classification (scikit-learn)
• Little categories merged into fewer big categories
• Performance uninspiring :(
Future:
Bigrams... word frequency filter...
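
The current approach above (unigram bag-of-words features feeding a one-vs.-all classifier in scikit-learn) might look roughly like the sketch below. The training reports, category names, the choice of logistic regression as the base classifier, and the helper names are all assumptions made for illustration; the slides do not say which base classifier the prototype uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Hypothetical toy data; real Ushahidi reports and categories will differ.
TRAIN_REPORTS = [
    "bridge collapsed road blocked",
    "road flooded bridge closed",
    "clinic needs medicine doctors",
    "hospital out of medicine",
    "gunshots heard near polling station",
    "armed men attacked polling station",
]
TRAIN_LABELS = [
    "infrastructure", "infrastructure",
    "health", "health",
    "security", "security",
]


def train_category_suggester(texts, labels):
    """Fit unigram bag-of-words features and a one-vs.-all classifier."""
    vectorizer = CountVectorizer(ngram_range=(1, 1))  # unigrams only
    features = vectorizer.fit_transform(texts)
    classifier = OneVsRestClassifier(LogisticRegression())
    classifier.fit(features, labels)
    return vectorizer, classifier


def suggest_categories(vectorizer, classifier, text, top_k=2):
    """Return the top_k category names, best score first."""
    scores = classifier.decision_function(vectorizer.transform([text]))[0]
    ranked = sorted(zip(classifier.classes_, scores),
                    key=lambda pair: pair[1], reverse=True)
    return [str(label) for label, _score in ranked[:top_k]]


vectorizer, classifier = train_category_suggester(TRAIN_REPORTS, TRAIN_LABELS)
print(suggest_categories(vectorizer, classifier, "bridge road damaged"))
```

Switching `ngram_range` to `(1, 2)` would add the bigrams listed under Future, and `CountVectorizer`'s `min_df` / `max_df` parameters are one possible way to apply a word frequency filter.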
2) Extract named entities
Currently:
• NLTK's Named Entity Recognizer
• Eval: pretty good
Future:
• Train location-recognizer on datasets
• Merge types for non-location NEs
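
A minimal sketch of how the output of NLTK's stock recognizer could be reduced to locations, with the other entity types merged into one bucket as proposed under Future. The set of labels treated as locations and the `OTHER` bucket name are assumptions, and `tag_entities` only works once the NLTK tokenizer, tagger, and chunker data have been downloaded.

```python
from nltk import Tree

# Assumption: which NLTK chunk labels count as a location for crisis mapping.
LOCATION_LABELS = {"GPE", "LOCATION"}


def tag_entities(text):
    """Run NLTK's tokenizer, POS tagger, and named entity chunker on a report."""
    import nltk
    tokens = nltk.word_tokenize(text)
    return nltk.ne_chunk(nltk.pos_tag(tokens))


def extract_entities(chunked):
    """Collect (phrase, kind) pairs, merging all non-location types into OTHER."""
    entities = []
    for node in chunked:
        if isinstance(node, Tree):  # entity chunks are subtrees; plain tokens are tuples
            phrase = " ".join(word for word, _tag in node.leaves())
            kind = "LOCATION" if node.label() in LOCATION_LABELS else "OTHER"
            entities.append((phrase, kind))
    return entities


# Hand-built tree in the shape nltk.ne_chunk returns, so no model data is needed here.
example = Tree("S", [
    ("Fire", "NN"), ("near", "IN"),
    Tree("GPE", [("Nairobi", "NNP")]),
    ("reported", "VBD"), ("by", "IN"),
    Tree("PERSON", [("Jane", "NNP"), ("Doe", "NNP")]),
])
print(extract_entities(example))
```

With the data installed, `extract_entities(tag_entities(report_text))` would run the full pipeline on a real report.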
3) Detect Language
Currently:
• Existing packages (Bing, python, ...)
Future:
• Evaluate quality
• Allow event-specific language bias
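
One possible reading of "event-specific language bias" is to reweight whatever scores an existing detector returns by a prior over the languages expected for that event. The sketch below assumes the detector output is a dictionary of language code to score; the function names, the prior values, and the `floor` parameter are hypothetical.

```python
def apply_language_prior(detector_scores, event_prior, floor=0.01):
    """Multiply detector scores by an event-specific prior and renormalise.

    Languages missing from the prior get the small `floor` weight, so they
    are penalised but never ruled out entirely.
    """
    weighted = {
        lang: score * event_prior.get(lang, floor)
        for lang, score in detector_scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        return dict(detector_scores)  # nothing to reweight; keep the raw scores
    return {lang: value / total for lang, value in weighted.items()}


def best_language(detector_scores, event_prior):
    """Return the language with the highest reweighted score."""
    posterior = apply_language_prior(detector_scores, event_prior)
    return max(posterior, key=posterior.get)


# Hypothetical numbers: the detector slightly prefers English, but the
# event prior (e.g. a Kenyan deployment) expects a lot of Swahili.
scores = {"en": 0.5, "sw": 0.4, "fr": 0.1}
kenya_prior = {"en": 0.4, "sw": 0.55, "fr": 0.05}
print(best_language(scores, kenya_prior))
```

The prior could be estimated from the language mix of reports already annotated for the same deployment.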
4) Near-Duplicate Detection
Currently:
• SimHash compares distances of message text hashes efficiently
Future:
• Evaluate quality more rigorously
• Explore other methods
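
For readers unfamiliar with SimHash, here is a dependency-free sketch of the idea: each message becomes a 64-bit fingerprint, and near-duplicates are found by comparing Hamming distances between fingerprints. Whitespace tokenization, MD5 as the per-token hash, and the distance threshold are assumptions; the slides do not describe the prototype's settings.

```python
import hashlib

BITS = 64  # fingerprint width; uses the first 8 bytes of each token hash


def simhash(text):
    """Compute a 64-bit SimHash fingerprint from whitespace-separated tokens."""
    counts = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Treat two messages as near-duplicates if their fingerprints are close."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= max_distance


print(is_near_duplicate("Road blocked near the market",
                        "road blocked near the MARKET"))
```

Because every token votes independently, word order and letter case do not change the fingerprint in this sketch. A suitable `max_distance` would have to be tuned on real reports, which ties in with the plan to evaluate quality more rigorously.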

Data Science for Social Good and Ushahidi

Editor's notes

  1. We're happy to give an update on our Ushahidi project. [Abe Gong]
  2. Citizens submit reports (via SMS, Twitter, and the web), which are reviewed by annotators. Categorizing, geolocating, and stripping private info by hand is a slow process.
  3. We're building a data wizardry system to support the manual annotation process
  4. Notes on the Secondary Project Ideas list, by item number:
     1. Since Ushahidi reports are mostly public, private info should be hidden. Examples: names, phone numbers, and addresses.
     4. Example: in the Haiti earthquake, we might observe unexpected robbery reports arising.
     5. This is mainly for a better workflow, because annotators work better when they process similar reports together.
     6. To see which topics commonly occur in elections in general, and which occur only in the Kenyan election specifically.