EXPOSING ALGORITHMS
COMPUTATIONAL
JOURNALISM LAB,
UNIVERSITY OF MARYLAND
COMPUTATIONAL
JOURNALISM
▸ Develop tools for Newsrooms
▸ Data gathering
▸ Story tracking
▸ Personalized news
▸ Comment moderation
▸ Using computational methods to
investigate a story
▸ Algorithmic accountability and
transparency
Applying computer
science to journalism
http://www.wordclouds.com
ALGORITHM: POWER, AUTHORITY
GOOGLE
CASE STUDY
GOOGLE AUTOCOMPLETE FAQ
▸ “…we exclude a narrow class of search queries related to
pornography, violence, hate speech, and copyright
infringement.”
▸ Criteria: Boundaries of censorship; Differences among
search engines; Mistakes?
INPUT - OUTPUT STUDY
Input → Output
Warning!
This presentation contains explicit language.
N. Diakopoulos. Sex, Violence, and Autocomplete Algorithms. Slate. 2013.
What are the criteria?
SEARCH ENGINES ARE COMPLICATED!
▸ Are we using search terms
that people in real life use?
▸ Personalization (IP, profile,
history)
▸ Randomization, A/B tests
▸ …not to mention Google
doesn't want people
scraping their results (ack!)
UBER
CASE STUDY
▸ Discriminatory/unfair
▸ Mistake that denies a service
▸ Censorship
▸ Breaks law or social norm
▸ False prediction
▸ Violation of privacy
PREVIOUS
WORK
▸ Surge pricing triggered by
car requests outnumbering
available cars (demand >
supply)
▸ Goal of surge pricing:
▸ Encourage more drivers
on the road
▸ Redistribute current
drivers to areas of high
demand
CURRENT
▸ Hypothesis: service quality
may not be the same
across D.C.
▸ Expected Wait Time as a
proxy for service quality:
combines car availability,
current and historical surge
pricing, and other hidden factors.
▸ If true, can this be
predicted by census data?
APPROACHES, TOOLS
▸ Data sources
▸ Uber API, `uber.py`, census.gov resources (tons, free)
▸ Spatial sampling across the District
▸ Python GIS-related libraries (`geopy`, `address`, `cenpy`)
▸ The http://data.fcc.gov/ API returns an address when given a latitude and longitude
▸ Sample grid-style, averaged to census tracts
▸ Data wrangling and statistics
▸ `pandas`, `numpy`, `statsmodels`
▸ Visualization
▸ CARTO for mapping (3 maps for free) + Adobe Illustrator
▸ `matplotlib` or `seaborn` for graphs
▸ with a touch of Adobe Illustrator
APPROACH - BASICALLY ALL PYTHON
COLLECTION
▸ Determine our sampling locations:
▸ Spatial sampling DC -> grid (how dense?)
▸ Temporal sampling -> 3 min (why?)
▸ Uber API rate limits,
▸ #API key access
▸ Address validation
▸ https://github.com/comp-journalism/2016-03-wapo-uber/blob/master/Mapping_points_across_DC.ipynb
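The grid-density and sampling-interval questions above trade off against each other: more points or a shorter interval means more API requests per hour. A minimal sketch of that trade-off, assuming a simple rectangular grid; the bounding box is only a rough approximation of D.C., and `make_grid` and `requests_per_hour` are hypothetical helper names, not functions from the linked notebook:

```python
from typing import List, Tuple


def make_grid(south: float, west: float, north: float, east: float,
              rows: int, cols: int) -> List[Tuple[float, float]]:
    """Return rows x cols evenly spaced (lat, lon) points, box edges included."""
    if rows < 2 or cols < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    lat_step = (north - south) / (rows - 1)
    lon_step = (east - west) / (cols - 1)
    return [(south + r * lat_step, west + c * lon_step)
            for r in range(rows) for c in range(cols)]


def requests_per_hour(n_points: int, interval_minutes: float) -> float:
    """Requests needed per hour if every point is polled once per interval."""
    return n_points * 60.0 / interval_minutes


# Approximate bounding box around Washington, D.C. (illustrative values only).
points = make_grid(south=38.79, west=-77.12, north=39.00, east=-76.91,
                   rows=10, cols=10)
print(len(points), "candidate sampling points")
print(requests_per_hour(len(points), interval_minutes=3), "requests per hour")
```

Points that fall outside the District (water, Maryland, Virginia) would still need to be filtered out, which is where the address validation step comes in.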
LOCATIONS PASSED TO UBER API
UBER DATA
▸ Expected Wait Time from
Uber API for each location
every 3 minutes over 4 weeks
▸ Calculated as mean
expected wait time per
tract (MEWT)
▸ Surge proportion calculated
as the percentage of time each
tract spent with a surge price
multiplier > 1
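The two per-tract summaries can be sketched as follows. This is a minimal standard-library illustration on made-up records; the tuple layout and the names `summarize_by_tract`, `mewt`, and `pct_surge` are assumptions for this example, not the project's actual schema (which used `pandas`):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (tract_id, expected_wait_seconds, surge_multiplier)
samples = [
    ("tract_A", 240, 1.0),
    ("tract_A", 300, 1.5),
    ("tract_A", 180, 1.0),
    ("tract_B", 420, 1.0),
    ("tract_B", 480, 1.0),
]


def summarize_by_tract(records):
    """Mean expected wait time and % of samples with surge > 1, per tract."""
    waits = defaultdict(list)
    surging = defaultdict(list)
    for tract, wait, surge in records:
        waits[tract].append(wait)
        surging[tract].append(surge > 1)
    return {
        tract: {
            "mewt": mean(waits[tract]),
            "pct_surge": 100.0 * sum(surging[tract]) / len(surging[tract]),
        }
        for tract in waits
    }


summary = summarize_by_tract(samples)
for tract, stats in summary.items():
    print(tract, stats)
```

Because samples are taken at a fixed interval, the share of samples with a multiplier above 1 stands in for the share of time spent surging.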
AMERICAN
COMMUNITY
SURVEY 2014
▸ % People of Color (POC)
▸ % Poverty
▸ Population Density
▸ Median Household
Income
▸ Z-score normalized
APPROACH - STILL BASICALLY ALL PYTHON
DATA PROCESSING
▸ Collapse data across time (4 weeks in February 2016)
▸ Average data within census tracts
▸ Select only uberX “product_types”
▸ One “ETA” and one “Surge Price Multiplier” value per tract
▸ Census / American Community Survey data:
▸ Poverty -> Calculate % in each tract
▸ Income -> Median income per tract
▸ Race/Ethnicity -> Dichotomized %
▸ Population density (population ÷ tract land area)
▸ Normalized to z-scores
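A minimal sketch of the census-side processing, using made-up tract counts. Density is computed here as population divided by land area (the usual definition), and the z-score uses the population standard deviation; both the field names and that choice of standard deviation are assumptions, since the slides do not specify them:

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-score a constant column")
    return [(v - mu) / sigma for v in values]


# Hypothetical tract-level counts (not real ACS figures).
tracts = [
    {"tract": "A", "population": 4000, "below_poverty": 400, "land_sq_km": 2.0},
    {"tract": "B", "population": 2000, "below_poverty": 600, "land_sq_km": 0.5},
    {"tract": "C", "population": 3000, "below_poverty": 600, "land_sq_km": 1.0},
]

pct_poverty = [100.0 * t["below_poverty"] / t["population"] for t in tracts]
density = [t["population"] / t["land_sq_km"] for t in tracts]

pct_poverty_z = z_scores(pct_poverty)
density_z = z_scores(density)
print(pct_poverty_z)
print(density_z)
```

Putting every explanatory variable on the same z-score scale makes regression coefficients comparable across variables measured in different units.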
ESTIMATED WAIT TIMES FOR UBERX
Map showing
average ETA for
an uberX.
Northwest DC
has a mostly
white racial
demographic,
whereas
southeast is
mostly people of
color.
Tract 92.03.
75% POC, Short wait times
Universities, restaurants, bars…
APPROACH - PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON
REGRESSION (GLM, STATSMODELS)
Explanatory Variables:
▸ % POC***
▸ Population Density***
▸ Median Income
▸ % Poverty
▸ % POC : % Poverty**
▸ % POC : Income
WHAT NEXT - MORE DATA
▸ Does it reflect differences in
Supply/Demand? -> Taxi FOIA
▸ Crime stats -> perception vs facts
▸ Banked / unbanked stats (~14%
in DC)
▸ Smart phone ownership
▸ Would the results differ in a
different month or city?
DESIGNING FOR TRANSPARENCY AND ACCESSIBILITY
WHAT NEXT - DESIGN?
▸ What if:
▸ Taxi demand is high in census tracts underserved
by Uber in DC?
▸ Difference in price? Accessibility? Marketing?
▸ People without bank accounts or smartphones
could hail via voice? Pay with cash?
▸ Crime perception is different from real life?
▸ Could we indicate crime stats in-app?
▸ Should we?
▸ TRANSPARENCY! https://github.com/comp-journalism/2016-03-wapo-uber
▸ datalensdc.com, Houston, Georgetown, UBER,
AARP…
ALGORITHMIC
ACCOUNTABILITY
IN JOURNALISM
▸ Opportunity for UBER to
check our work
▸ Opportunity for
audience to check
▸ Spurs us to write better,
documented code,
check our conclusions
and assumptions
▸ Others can use code /
data for other stories
https://github.com/comp-journalism
▸ Code: GitHub
▸ IPython Notebook
▸ Documentation:
README.md
▸ Data: Google Drive
▸ Save wrangled data at
intervals in .csv files
▸ Programmatic solutions
where possible
https://github.com/comp-journalism
Free
Open Source
QUESTIONS?
COLLABORATIONS?
Jennifer A. Stark
@_JAStark
starkja@umd.edu
https://github.com/comp-journalism

Exposing algorithms pydatadc2016
