Big Data and Me: experiences
from the front line
Sara-Jayne Farmer
Change Assembly
April 22nd 2013
ME
Me
• Data Scientist
• Using data to:
– connect communities
– improve access to information
– so people can make better decisions
– on both small and large scales
• It’s all about people:
– Local people: know their needs; need more information
– Local technologists: have skills; need connections
– Large organisations: have resources; need guidance
Some of those People
(smart, talented, dedicated hackers in Haiti, January 2013)
My Personal Three Vs
• Variety
– Data all over the place
– CSV, JSON, XML, Excel, PDF, text, web pages, RSS, scanned
pages, images, videos, audio files, maps, proprietary formats, etc.
• Velocity
– Streams updating too fast for a mapping team (100-200 people)
to handle
– Pages updating too frequently to check by hand
• Volume
– Can’t open the data in a spreadsheet
– Can’t fit the data on my laptop
– Maxes out my credit card (thank you Amazon!)
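One common way around the "can't open it in a spreadsheet" problem is to stream the file a row at a time instead of loading it whole. The sketch below is only an illustration, not something from the talk: the function name, the column names and the sample rows are all made up.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Count how often each value appears in one column.

    `lines` is any iterable of text lines (an open file, a stream, ...),
    so only one row is held in memory at a time.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Hypothetical sample data standing in for a file too big to open by hand.
sample = io.StringIO("country,reports\nHaiti,3\nHaiti,5\nKenya,1\n")
print(count_by_column(sample, "country"))
```

For a real file, the same function can be given the handle from open(path, newline=""); memory use then depends on the number of distinct values, not on the number of rows.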
VARIETY
“more people have mobile phones than toilets”
– UN, March 2013
But… but… there are always data issues…
• Datasets were difficult to find
• No data available after 2010
• Hard to track provenance – e.g. what decisions did
the people creating these datasets make? What
assumptions?
• Data was rounded up
• Country names didn’t match between sets
• Multiple character sets (e.g. Å, A, Ԇ)
• Messy formatting (merges, ‘explanations’, etc.)
e.g. Country Names
DR Congo in Data.UN.Org:
• “Congo, Democratic Republic of the”, “Congo
Democratic”, “Democratic Republic of the Congo”, “Congo
(Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo
Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep.
of Congo”, “Dem. Rep. of the Congo”
DR Congo in common standards:
• “Democratic Republic of the Congo” (UN
Stats), “Congo, The Democratic Republic of the”
(ISO3166), “Congo, Democratic Republic of the”
(FIPS10, Stanag), “180” (UN Stats), “COD”
(ISO3166, Stanag), “CG” (FIPS10)
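A minimal sketch of how the name variants above could be folded onto a single code. This is an assumption about one workable approach, not the method used in the talk: the abbreviation list, the stopword list and the lookup table are hand-built for this one example, and a real project would need a much larger, reviewed table.

```python
import re
import unicodedata

# Hand-built for this example only (assumptions, not a standard list).
ABBREVIATIONS = {"dem": "democratic", "rep": "republic"}
STOPWORDS = {"of", "the"}


def name_key(name):
    """Reduce a country name to an order-independent tuple of words."""
    # Split accented letters into base letter + mark, then drop the marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep letter runs only, which drops commas, dots and brackets.
    words = re.findall(r"[a-z]+", text.casefold())
    words = [ABBREVIATIONS.get(w, w) for w in words]
    return tuple(sorted(set(words) - STOPWORDS))


# Keys are name_key() outputs; values are ISO 3166 alpha-3 codes.
ISO3_BY_KEY = {
    ("congo", "democratic", "republic"): "COD",
    ("congo", "democratic"): "COD",  # covers the short form "Congo Democratic"
}


def to_iso3(name):
    """Return the alpha-3 code for a known variant, or None if unknown."""
    return ISO3_BY_KEY.get(name_key(name))


print(to_iso3("Congo, Dem. Rep."))
print(to_iso3("Republic of the Congo"))
```

Unknown names return None rather than a best guess, so the neighbouring "Republic of the Congo" is flagged for a human to check instead of being silently merged with DR Congo.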
And coding
And interpretation
• Hang on… don’t some people have more than one
phone?
• And how do you count the people without toilets?
• What if the cities have lots of phones and toilets, and
the rural areas don’t?
• Where does my composting toilet fit in this?
• How big were these surveys?
• What do we do with the zeros?
• Etc…
And purpose
And Communication
And Alternative Data Sources
And alternative alternatives…
• Social media proxies
• Grassroots maps
• Etc.
VELOCITY AND VOLUME
2013 Boston bombings
The Humans+Tools Solution: Crisismapping
Find…
Listen…
Estimate…
Geolocate…
Create maps…
Analyse
Explain
Use
BUT WE NEED MORE DATA
SCIENTISTS…
Build and Connect Communities
Train Non-Techies
Create Higher-level Tools
Big Data and Me: experiences
from the front line
Sara-Jayne Farmer
http://www.changeassembly.com/
@bodaceacat
MORE REFERENCES
strataconf.com
datasciencecentral.com
analytictalent.com
Tools
Formal (Free) Training
NYC Meetups (see meetup.com)
Volunteering: datakind.org
