Building AI That Works for Everyone
AI Ethics for Technical People
About Me
• Ph.D. Statistician
• Labor Economist
• Software Developer
• Artist
• Midwest Farm Girl
• Pronouns: she, her, hers
About This Talk
Focused on “high-stakes AI”.
• Defined by Sambasivan, Kapania, Highfill, Akrong, Paritosh, and Aroyo (2021)
• I do recommend these exercises for everyone.
AI Ethics problems require input from technical people.
Many of our biggest issues come from manual verification of
automated systems.
When I say “AI that works for everyone,” I mean everyone.
• People using the model
• People affected by the model
• Data labelers
• Data engineers
• Machine learning engineers
• Data scientists
An Actual LinkedIn Poll from an AI Ethics Expert
• Has Cancer → Predicted Cancer: TP (True Positive); Predicted No Cancer: FN (False Negative)
• Does Not Have Cancer → Predicted Cancer: FP (False Positive); Predicted No Cancer: TN (True Negative)
Accuracy = (True Positives + True Negatives) / Total Patients
Recall = True Positives / (True Positives + False Negatives)
Which model would you rather have?
A black box cancer screening model with 99% accuracy?
An explainable cancer screening model with 90% accuracy?
This is the wrong question!
• Has Cancer (1% of Patients) → Predicted Cancer: Has Cancer, More Screening; Predicted No Cancer: Has Cancer and Does Not Know
• Does Not Have Cancer (99% of Patients) → Predicted Cancer: No Cancer, More Screening; Predicted No Cancer: No Cancer, No Extra Screening
• Has Cancer (1% of Patients) → Predicted Cancer: TP (True Positive); Predicted No Cancer: FN (False Negative)
• Does Not Have Cancer (99% of Patients) → Predicted Cancer: FP (False Positive); Predicted No Cancer: TN (True Negative)
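The row percents are the whole point. A minimal sketch (hypothetical numbers, assuming the 1% prevalence above) of why "99% accuracy" is the wrong question: a model that never predicts cancer still scores 99% accuracy while catching zero cancers.

```python
# Hypothetical illustration: with 1% prevalence, a "model" that predicts
# "no cancer" for every patient scores 99% accuracy with 0% recall.

def accuracy(tp, fn, fp, tn):
    # (True Positives + True Negatives) / Total Patients
    return (tp + tn) / (tp + fn + fp + tn)

def recall(tp, fn):
    # True Positives / (True Positives + False Negatives)
    return tp / (tp + fn) if (tp + fn) else 0.0

# 10,000 patients, 1% have cancer; the model never predicts cancer.
tp, fn, fp, tn = 0, 100, 0, 9900
print(accuracy(tp, fn, fp, tn))  # 0.99
print(recall(tp, fn))            # 0.0
```

Accuracy alone rewards agreeing with the majority class; recall exposes the missed cancers.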
Typical AI/ML Pipeline
Failure Analysis
Fairness Analysis
Impact Analysis
Feedback on model
performance in production is
the cornerstone of an AI Ethics
practice.
People
affected by
Decisions give
feedback
Operator
reviews
Decisions
Training Data and
Code produce a
model
Scoring Data
and Model
produce decisions
P.Yes P.No
A.Yes TP FN
A.No FP TN
Typical AI/ML Pipeline
In practice, anything that isn't model training or scoring is:
• Ad hoc
• Manual
• Prone to data errors
People
affected by
Decisions give
feedback
Operator
reviews
Decisions
Training Data and
Code produce a
model
P.Yes P.No
A.Yes TP FN
A.No FP TN
Failure
Analysis
Fairness Analysis
Impact Analysis
Human Agency
and Oversight
Fairness
Accountability
Prevention
of Harm
Social and
Environmental
Well-Being
Technical
Robustness and
Safety
Privacy and
Data
Governance
Technical Pillars of Trustworthy AI
Does this model work
for everyone?
Human Agency
and Oversight
Fairness
Accountability
Prevention
of Harm
Social and
Environmental
Well-Being
Technical
Robustness and
Safety
Privacy and
Data
Governance
Prevention
of Harm
How often does the model
fail, and what is the
impact?
Are model failures the
same for everyone?
How do we know the
model is failing?
Fairness Analysis
Failure Analysis Failure Monitoring Impact Analysis
Typical AI/ML Pipeline
Failure Analysis
Fairness Analysis
Impact Analysis
Technical leaders and
individual contributors have a
role in each of these pillars.
People affected
by Decisions
give feedback
Operator
reviews
Decisions
Training Data and
Code produce a
model
Scoring Data and
Model produce
decisions
P.Yes P.No
A.Yes TP FN
A.No FP TN
Human Agency
and Oversight
Prevention
of Harm
Fairness
Social and
Environmental
Well-Being
Privacy and
Data
Governance
Privacy and
Data
Governance
Accountability
Technical
Robustness and
Safety
Technical Pillars of Trustworthy AI
Does this model work
for everyone?
Human Agency
and Oversight
Fairness
Accountability
Prevention
of Harm
Social and
Environmental
Well-Being
Technical
Robustness and
Safety
Privacy and
Data
Governance
Prevention
of Harm
How often does the model
fail, and what is the
impact?
Are model failures the
same for everyone?
How do we know the
model is failing?
Fairness Analysis
Failure Analysis Failure Monitoring Impact Analysis
Failure Analysis
Cancer Screening:
• Has Cancer → Predicted Cancer: Has Cancer, More Screening; Predicted No Cancer: Has Cancer and Does Not Know
• Does Not Have Cancer → Predicted Cancer: No Cancer, More Screening; Predicted No Cancer: No Cancer, No Extra Screening
1. Find the cell in the confusion
matrix that causes the most harm
to the least advantaged group.
2. Analyze rates and outcomes for
that cell.
Fairness
Prevention
of Harm
Fraud Screening:
• Fraudulent Account → Predicted Fraud: Audit, Model Makes $; Predicted No Fraud: Fraud and No Audit, Model Loses $
• Honest Account → Predicted Fraud: No Fraud, Customer Audit; Predicted No Fraud: No Fraud, No Audit
Aequitas Fairness Tree
Is being predicted positive punitive or assistive?
Which group is harmed most by mistakes?
Can you intervene with most
people or just a subset?
Which group is harmed most by mistakes?
# False Positives / Group Size
False Discovery Rate (FDR)
False Positive Rate
True Positive Rate (Recall)
# False Negatives / Group Size
False Negative Rate
False Omission Rate
Fairness Tree: Data Science and Public Policy, Carnegie Mellon University
http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/
Everyone
People who get
intervention
People who
do not get
intervention
Most
Subset
Everyone
People Not
Assisted
People with
Actual Need
Accountability
Technical
Robustness and
Safety
Failure Analysis: Pre-Deployment
• Failure analysis is often ad-hoc and depends heavily on
the data sources available.
• e.g. We may not know how many cancers human screeners
miss.
• Deployment should include automating failure analysis.
• Deployment should include plans for cadence of failure
analysis.
Accountability
Technical
Robustness and
Safety
Tools for Failure Analysis
• Every model will produce the statistics listed in the
fairness tree. (e.g. sklearn.metrics)
• It is up to the modeling team to decide which statistics
are the most important and to display them in a way that
communicates impact to stakeholders.
• Deciding on a set of metrics that should be monitored
post-deployment is part of the analysis.
• Once the analysis is done, it should be automated so it
can be re-done at regular intervals. These scripts are
usually tailored to the business problem.
• AWS Clarify has a nice set of tools for calculating and displaying
statistics.
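As a concrete starting point, the fairness-tree statistics all fall out of one confusion matrix. A minimal sketch in plain Python (sklearn.metrics offers equivalents; the cell counts here are made up for illustration):

```python
# Compute the fairness-tree statistics from one confusion matrix.
# Guard against empty denominators so the sketch works on any group.

def fairness_tree_stats(tp, fn, fp, tn):
    group_size = tp + fn + fp + tn
    return {
        "fp_per_group": fp / group_size,
        "fdr": fp / (tp + fp) if (tp + fp) else 0.0,     # False Discovery Rate
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,     # False Positive Rate
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,  # True Positive Rate
        "fn_per_group": fn / group_size,
        "fnr": fn / (tp + fn) if (tp + fn) else 0.0,     # False Negative Rate
        "for": fn / (fn + tn) if (fn + tn) else 0.0,     # False Omission Rate
    }

stats = fairness_tree_stats(tp=80, fn=20, fp=50, tn=850)
print(stats["recall"])  # 0.8
print(stats["fpr"])     # 50/900, roughly 0.056
```

Once a function like this exists, re-running the analysis on a cadence is a scheduling problem, not a research project.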
Accountability
Technical
Robustness and
Safety
Failure Analysis Depends on Good Data
Failure
Analysis
Fairness Analysis
Impact Analysis
People
affected by
Decisions give
feedback
Operator
reviews
Decisions
Training Data and
Code produce a
model
Scoring Data
and Model
produce decisions
P.Yes P.No
A.Yes TP FN
A.No FP TN
Accountability
Technical
Robustness and
Safety
"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI.
Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Kumar Paritosh, and Lora Mois Aroyo (2021).
Technical Pillars of Trustworthy AI
Does this model work
for everyone?
Human Agency
and Oversight
Fairness
Accountability
Prevention
of Harm
Social and
Environmental
Well-Being
Technical
Robustness and
Safety
Privacy and
Data
Governance
Prevention
of Harm
How often does the model
fail, and what is the
impact?
Are model failures the
same for everyone?
How do we know the
model is failing?
Fairness Analysis
Failure Analysis Failure Monitoring Impact Analysis
Fairness Analysis
1. Focus on cell where most
harm occurs.
2. Compare performance for
underrepresented and/or
unprivileged groups.
Fairness
Prevention
of Harm
Fraud Screening, Group A (the same table is shown for Group B and Group C):
• Fraudulent Account → Predicted Fraud: Audit, Model Makes $; Predicted No Fraud: Fraud and No Audit, Model Loses $
• Honest Account → Predicted Fraud: No Fraud, Customer Audit; Predicted No Fraud: No Fraud, No Audit
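Comparing the harm cell across groups can be sketched directly. Here the harm cell is the false positive (honest account audited), so we compare false positive rates; the group counts and the 1.25x tolerance are hypothetical, chosen only to illustrate the comparison:

```python
# Hypothetical sketch: compare the false positive rate (honest accounts
# flagged for audit) across groups and flag disparities beyond a tolerance.

def false_positive_rate(fp, tn):
    return fp / (fp + tn) if (fp + tn) else 0.0

groups = {  # (false positives, true negatives) per group; made-up numbers
    "A": (30, 970),
    "B": (90, 910),
    "C": (25, 975),
}

reference = false_positive_rate(*groups["A"])
for name, (fp, tn) in groups.items():
    ratio = false_positive_rate(fp, tn) / reference
    # Illustrative rule of thumb: flag a group whose rate is >1.25x the reference.
    print(name, round(ratio, 2), "DISPARITY" if ratio > 1.25 else "ok")
```

Which group serves as the reference, and how much disparity is tolerable, are policy decisions the modeling team has to surface to stakeholders, not defaults to bury in code.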
Aequitas Fairness Tree
Is being predicted positive punitive or assistive?
Which group is harmed most by mistakes?
Can you intervene with most
people or just a subset?
Which group is harmed most by mistakes?
True Positive Rate (Recall)
False Negative Rate
Fairness Tree: Data Science and Public Policy, Carnegie Mellon University
http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/
Everyone
People who get
intervention
People who
do not get
intervention
Most
Subset
Everyone
People Not
Assisted
People with
Actual Need
FP/GS Parity, FDR Parity, FPR Parity
Recall Parity, FN/GS Parity, FOR Parity, FNR Parity
# False Positives / Group Size
False Discovery Rate (FDR)
False Positive Rate
# False Negatives / Group Size
False Omission Rate
Fairness
Prevention
of Harm
Technical Pillars of Trustworthy AI
Does this model work
for everyone?
Human Agency
and Oversight
Fairness
Accountability
Prevention
of Harm
Social and
Environmental
Well-Being
Technical
Robustness and
Safety
Privacy and
Data
Governance
Prevention
of Harm
How often does the model
fail, and what is the
impact?
Are model failures the
same for everyone?
How do we know the
model is failing?
Fairness Analysis
Failure Analysis Failure Monitoring Impact Analysis
Failure Monitoring
Failure
Analysis
Fairness Analysis
Impact Analysis
People
affected by
Decisions give
feedback
Operator
reviews
Decisions
Training Data and
Code produce a
model
Scoring Data
and Model
produce decisions
P.Yes P.No
A.Yes TP FN
A.No FP TN
Human Agency
and Oversight
Prevention
of Harm
How do we know the model is failing?
• What pipelines exist for people to give feedback on model
performance?
• Experts/Operators who are using the models.
• People who are affected by the model.
• How do we automate monitoring the most critical model
performance metrics?
• What outside data is available as a check against our assumptions
about the model?
• There are no great tools for checking failures.
• Cloud providers do offer some tools if you are using their cloud (e.g. AWS,
Azure, and Google).
Human Agency
and Oversight
Prevention
of Harm
Lowest Hanging Fruit: Automate All Data Pipelines
• dbt: runs the code pipeline and data checks. Built-in tests and SQL-based user-defined tests. SQL-based, with open source dbt Core and a subscription-based cloud option.
• Soda and SodaCL: data checks only. Built-in tests and SQL-based user-defined tests. SQL-based, with open source Soda Core and subscription-based Soda Cloud.
• great-expectations: data checks only. Built-in tests for Python. Python-based.
• deequ: data checks only. Built-in tests and Spark/PySpark-based user-defined tests. Spark/PySpark-based.
Human Agency
and Oversight
Prevention
of Harm
Hardening Pipelines: Obvious Tests for Tabular Data
• Uniqueness: “This column/combination of columns
should be unique by row.”
• Correctness: “Only these values allowed in this column.”
• Missingness: “These columns should be populated for
X% of rows.”
• Range: “Nothing bigger/smaller than [a,b] should be in
this column.”
• … You get the picture.
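The four checks above can be sketched in a few lines of standard-library Python; tools like great-expectations and Soda express the same ideas declaratively. The column names and thresholds here are hypothetical:

```python
# Minimal data checks over rows represented as dicts (stdlib only).

def check_unique(rows, col):
    # Uniqueness: no duplicate values in this column.
    values = [r[col] for r in rows]
    return len(values) == len(set(values))

def check_allowed(rows, col, allowed):
    # Correctness: only these values allowed in this column.
    return all(r[col] in allowed for r in rows)

def check_populated(rows, col, min_fraction):
    # Missingness: column populated for at least min_fraction of rows.
    filled = sum(1 for r in rows if r.get(col) is not None)
    return filled / len(rows) >= min_fraction

def check_range(rows, col, lo, hi):
    # Range: non-null values must fall in [lo, hi].
    return all(lo <= r[col] <= hi for r in rows if r[col] is not None)

rows = [
    {"id": 1, "state": "CA", "age": 34},
    {"id": 2, "state": "NY", "age": 41},
    {"id": 3, "state": "CA", "age": None},
]
print(check_unique(rows, "id"))                    # True
print(check_allowed(rows, "state", {"CA", "NY"}))  # True
print(check_populated(rows, "age", 0.9))           # False: only 2/3 populated
print(check_range(rows, "age", 0, 120))            # True
```

The value of the declarative tools is not the checks themselves but running them on every pipeline execution and alerting on failures.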
Human Agency
and Oversight
Prevention
of Harm
Hardening Pipelines: Less Obvious Tests for Tabular Data
• Feature Drift: Are distributions of inputs changing?
• Model Drift: Are the model predictions changing?
• Kolmogorov-Smirnov: What is the probability of
observing the data we see today (or something weirder)
compared to what we think the data should look like?
• A p-value of 0.05 means this test alarms 5% of the time when
all is normal. Use False Discovery Rate to find true errors.
• KL Divergence (Population Stability Index)
• Sensitive to the bins you pick.
• These tests are sensitive to outliers. Outliers happen all
the time.
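The Kolmogorov-Smirnov statistic itself is simple: the largest gap between the two empirical CDFs. A standard-library sketch (scipy.stats.ks_2samp adds the p-value; the samples here are made up to show a shift):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    # Two-sample KS statistic: max |F_a(x) - F_b(x)| over the pooled values.
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect.bisect_right(a, x) / len(a)  # empirical CDF of a at x
        fb = bisect.bisect_right(b, x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d

baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
today = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]  # shifted distribution
print(ks_statistic(baseline, baseline))  # 0.0 -- identical samples
print(ks_statistic(baseline, today))     # 0.5 -- half the mass has moved
```

In production you would compare today's scoring inputs against a training-time reference window and, per the note above, correct for the many tests you run before alarming.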
Human Agency
and Oversight
Prevention
of Harm
Data Pipelines Hardened? Automate the Workflow
• Model-card-toolkit: open source system for creating model cards. Python-based.
• Metaflow: runs the code pipeline and data checks. Developed specifically for data science. Python-based.
• deepchecks: data checks and performance checks for the full model pipeline. Python-based.
• Luigi: full featured; lets you automate all of your scripts for everything. Python-based.
• Airflow: like Luigi, but automates some of the more tedious parts. Python-based.
DAG: Directed Acyclic Graph
• A collection of tasks and their dependencies.
• Directed: Each task that requires output from previous tasks knows its own
dependencies.
• Acyclic: A graph term. It means there are no cycles: no task can depend, directly or indirectly, on its own output.
Model Card
• Simplified explanation of model inputs, outputs, and assumptions.
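The DAG idea can be shown with the standard library alone; Luigi and Airflow add scheduling, retries, and persistence on top. The task names here are hypothetical:

```python
# Minimal DAG ordering with the stdlib (Python 3.9+).
from graphlib import TopologicalSorter

# Task name -> set of tasks it depends on (the directed edges).
pipeline = {
    "load_data": set(),
    "check_data": {"load_data"},
    "train_model": {"check_data"},
    "score_model": {"train_model", "check_data"},
}

# static_order() yields tasks so every task comes after its dependencies;
# a cycle raises graphlib.CycleError instead of looping forever.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # e.g. ['load_data', 'check_data', 'train_model', 'score_model']
```

Everything the workflow tools do (parallel execution of independent tasks, resuming after a failed task) builds on this ordering.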
Human Agency
and Oversight
Prevention
of Harm
Technical Pillars of Trustworthy AI
Does this model work
for everyone?
Human Agency
and Oversight
Fairness
Accountability
Prevention
of Harm
Social and
Environmental
Well-Being
Technical
Robustness and
Safety
Privacy and
Data
Governance
Prevention
of Harm
How often does the model
fail, and what is the
impact?
Are model failures the
same for everyone?
How do we know the
model is failing?
Fairness Analysis
Failure Analysis Failure Monitoring Impact Analysis
Impact Analysis: AI That Works for Everyone
• The least technical part of AI Ethics.
• Arguably the part of AI Ethics that most needs technical
assistance.
• Part of the initial project plan.
• Local Impacts: This model’s impact on its stakeholders.
• Social Impacts: How does this model contribute to AI’s
larger issues?
• Mitigation Analysis: What can we do within the scope of
this project to mitigate negative impacts?
Social and
Environmental
Well-Being
Privacy and
Data
Governance
Local Impact of an AI Model
• Does this model improve working conditions for the people
who use it?
• e.g. An AI model that requires a lot of data input from nurses and
doctors may increase their job responsibilities without
compensating or rewarding them for extra effort.
• Does this model improve outcomes for people affected by the
model?
• e.g. A fraud detection model may speed payment for most
individuals.
• Does this model make things worse for some individuals?
• e.g. A fraud detection model may speed payment for most
individuals and slow payment for others to an unacceptable level.
• Are we collecting only the data we need? Are we keeping that
data safe?
• e.g. Does my word game really need my location?
Social and
Environmental
Well-Being
Privacy and
Data
Governance
Social Impact of an AI Model
• Environmental cost of an AI model is non-negligible:
https://openai.com/blog/ai-and-compute/
• We need efficient computation, and that is a technical problem.
• Many AI models profit from free or underpaid labor:
https://www.wired.com/story/foundations-ai-riddled-errors/
• Labeling software should be good software.
• Large-scale adoption of AI models has other effects.
• Never mind the trolley problem: Suppose 10% of the cars on the road are self-driving. Now, suppose there's a network outage during a heavy traffic period.
Social and
Environmental
Well-Being
Privacy and
Data
Governance
AI Ethics and Model Development
• Pre-Development
• Impact Analysis: Who will use the model and how?
• Failure Analysis: What is the most impactful failure? What is an acceptable
level of failure?
• Fairness Analysis: What are the underrepresented/unprivileged groups?
• Failure Monitoring: What development is needed for Human-to-Model
feedback?
• Model Development
• Design and hardening of data pipelines, including privacy.
• Model’s ability to meet failure thresholds.
• Deployment
• Does the model meet criteria set during pre-development?
• Are the requirements in place?
Ethical AI is Good AI and Good AI is Ethical AI
• Ethical AI knows when it fails and the impact of those failures.
• Ethical AI fails in the same way for everyone.
• Ethical AI is monitored for failures and has strong feedback loops that
surface problems quickly.
• Ethical AI is designed for positive impact on the communities where it
is implemented and for society as a whole.
Who doesn’t want that?
Ellis-Lee, Mia (2018). "Accessible Design is Good Design & Good Design is Accessible Design." Flywheel hosted blog.
https://www.flywheelstrategic.com/thinking/post/flywheel-blog/2018/04/06/accessible-design-is-good-design-good-design-is-accessible-design

Weitere ähnliche Inhalte

Ähnlich wie Data Con LA 2022 - AI Ethics

Risk management planExecutive SummaryThe past.docx
Risk management planExecutive SummaryThe past.docxRisk management planExecutive SummaryThe past.docx
Risk management planExecutive SummaryThe past.docx
SUBHI7
 
CRITERIA DISTINGUISHED Analyze the origins and evolution of th.docx
CRITERIA DISTINGUISHED Analyze the origins and evolution of th.docxCRITERIA DISTINGUISHED Analyze the origins and evolution of th.docx
CRITERIA DISTINGUISHED Analyze the origins and evolution of th.docx
willcoxjanay
 
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
Big Data Week
 

Ähnlich wie Data Con LA 2022 - AI Ethics (20)

A.I.pptx
A.I.pptxA.I.pptx
A.I.pptx
 
AI in Healthcare: Real-World Machine Learning Use Cases
AI in Healthcare: Real-World Machine Learning Use CasesAI in Healthcare: Real-World Machine Learning Use Cases
AI in Healthcare: Real-World Machine Learning Use Cases
 
The Dark side of AI: Psychology of automation for data scientists - Alex Pall...
The Dark side of AI: Psychology of automation for data scientists - Alex Pall...The Dark side of AI: Psychology of automation for data scientists - Alex Pall...
The Dark side of AI: Psychology of automation for data scientists - Alex Pall...
 
IE_expressyourself_EssayH
IE_expressyourself_EssayHIE_expressyourself_EssayH
IE_expressyourself_EssayH
 
Risk management planExecutive SummaryThe past.docx
Risk management planExecutive SummaryThe past.docxRisk management planExecutive SummaryThe past.docx
Risk management planExecutive SummaryThe past.docx
 
Learning from the People: Responsibly Encouraging Adoption of Contact Tracing...
Learning from the People: Responsibly Encouraging Adoption of Contact Tracing...Learning from the People: Responsibly Encouraging Adoption of Contact Tracing...
Learning from the People: Responsibly Encouraging Adoption of Contact Tracing...
 
Towards Responsible AI - Global AI Student Conference 2022.pptx
Towards Responsible AI - Global AI Student Conference 2022.pptxTowards Responsible AI - Global AI Student Conference 2022.pptx
Towards Responsible AI - Global AI Student Conference 2022.pptx
 
Towards Responsible AI - KC.pptx
Towards Responsible AI - KC.pptxTowards Responsible AI - KC.pptx
Towards Responsible AI - KC.pptx
 
People Analytics_Introduction
People Analytics_IntroductionPeople Analytics_Introduction
People Analytics_Introduction
 
How ml can improve purchase conversions
How ml can improve purchase conversionsHow ml can improve purchase conversions
How ml can improve purchase conversions
 
CRITERIA DISTINGUISHED Analyze the origins and evolution of th.docx
CRITERIA DISTINGUISHED Analyze the origins and evolution of th.docxCRITERIA DISTINGUISHED Analyze the origins and evolution of th.docx
CRITERIA DISTINGUISHED Analyze the origins and evolution of th.docx
 
AI Governance – The Responsible Use of AI
AI Governance – The Responsible Use of AIAI Governance – The Responsible Use of AI
AI Governance – The Responsible Use of AI
 
Regression and correlation
Regression and correlationRegression and correlation
Regression and correlation
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New Boss
 
Calculating a Sample Size
Calculating a Sample SizeCalculating a Sample Size
Calculating a Sample Size
 
Pragmatic Device Risk Management
Pragmatic Device Risk Management Pragmatic Device Risk Management
Pragmatic Device Risk Management
 
Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it? Algorithmic Bias - What is it? Why should we care? What can we do about it?
Algorithmic Bias - What is it? Why should we care? What can we do about it?
 
ANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
ANIn Kolkata April 2024 |Ethics of AI by Abhishek NandyANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
ANIn Kolkata April 2024 |Ethics of AI by Abhishek Nandy
 
Supporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentationSupporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentation
 
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
BDW16 London - Amjad Zaim, Cognitro Analytics: How Deep is Your Learning
 

Mehr von Data Con LA

Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA
 

Mehr von Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 
Data Con LA 2022 - Building Field-level Lineage from Scratch for Modern Data ...
Data Con LA 2022 - Building Field-level Lineage from Scratch for Modern Data ...Data Con LA 2022 - Building Field-level Lineage from Scratch for Modern Data ...
Data Con LA 2022 - Building Field-level Lineage from Scratch for Modern Data ...
 

Kürzlich hochgeladen

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 

Kürzlich hochgeladen (20)

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

Data Con LA 2022 - AI Ethics

  • 1. Building AI That Works for Everyone AI Ethics for Technical People
  • 2. About Me • Ph.D. Statistician • Labor Economist • Software Developer • Artist • Midwest Farm Girl • Pronouns: she, her, hers 3/1/20XX SAMPLE FOOTER TEXT 2
  • 3. About This Talk Focused on “high-stakes AI”. • Defined by Sambasivan, Kapania, Highfill, Akrong, Paritosh, and Aroyo (2021) • I do recommend these exercises for everyone. AI Ethics problems require input from technical people. Many of our biggest issues come from manual verification of automated systems. When I say “AI that works for everyone,” I mean everyone. • People using the model • People affected by the model • Data labelers • Data engineers • Machine learning engineers • Data scientists
  • 4. An Actual LinkedIn Poll from an AI Ethics Expert
                           Predicted Cancer       Predicted No Cancer
  Has Cancer               TP (True Positive)     FN (False Negative)
  Does Not Have Cancer     FP (False Positive)    TN (True Negative)
  Accuracy = (True Positives + True Negatives) / Total Patients
  Recall = True Positives / (True Positives + False Negatives)
  Which model would you rather have? A black box cancer screening model with 99% accuracy? An explainable cancer screening model with 90% accuracy? This is the wrong question!
                           Predicted Cancer             Predicted No Cancer            Row Percents
  Has Cancer               Has Cancer, More Screening   Has Cancer and Does Not Know   1% of Patients
  Does Not Have Cancer     No Cancer, More Screening    No Extra Screening, No Cancer  99% of Patients
                           Predicted Cancer       Predicted No Cancer    Row Percents
  Has Cancer               TP (True Positive)     FN (False Negative)    1% of Patients
  Does Not Have Cancer     FP (False Positive)    TN (True Negative)     99% of Patients
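The poll's trap can be made concrete with a back-of-the-envelope check. This is a minimal sketch: the 1% prevalence matches the row percents above, but the two models' exact confusion counts are illustrative assumptions, not numbers from the poll.

```python
# Illustrative population: 1,000 patients, 1% cancer prevalence.
total = 1000
has_cancer = 10
no_cancer = total - has_cancer

# "Black box" model that simply predicts "no cancer" for everyone.
tp_a, fn_a = 0, has_cancer           # misses every cancer
fp_a, tn_a = 0, no_cancer            # never flags a healthy patient
accuracy_a = (tp_a + tn_a) / total   # 0.99 -- looks great
recall_a = tp_a / (tp_a + fn_a)      # 0.0  -- catches no cancers at all

# Explainable model: 90% accuracy, but it catches 9 of the 10 cancers.
tp_b, fn_b = 9, 1
fp_b = 99                            # extra screening for ~10% of healthy patients
tn_b = no_cancer - fp_b
accuracy_b = (tp_b + tn_b) / total   # 0.90
recall_b = tp_b / (tp_b + fn_b)      # 0.9

print(accuracy_a, recall_a)  # 0.99 0.0
print(accuracy_b, recall_b)  # 0.9 0.9
```

With a 1% base rate, accuracy alone cannot distinguish a useless model from a useful one, which is why the harm cell (the false negative: has cancer and does not know) is the number to argue about.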
  • 5. Typical AI/ML Pipeline Failure Analysis Fairness Analysis Impact Analysis Feedback on model performance in production is the cornerstone of an AI Ethics practice. People affected by Decisions give feedback Operator reviews Decisions Training Data and Code produce a model Scoring Data and Model produce decisions P.Yes P.No A.Yes TP FN A.No FP TN
  • 6. Scoring Data and Model produce decisions Typical AI/ML Pipeline In practice, anything that isn’t model training or scoring is: • Ad hoc • Manual • Prone to data errors People affected by Decisions give feedback Operator reviews Decisions Training Data and Code produce a model P.Yes P.No A.Yes TP FN A.No FP TN Failure Analysis Fairness Analysis Impact Analysis
  • 7. Human Agency and Oversight Fairness Accountability Prevention of Harm Social and Environmental Well-Being Technical Robustness and Safety Privacy and Data Governance
  • 8. Technical Pillars of Trustworthy AI Does this model work for everyone? Human Agency and Oversight Fairness Accountability Prevention of Harm Social and Environmental Well-Being Technical Robustness and Safety Privacy and Data Governance Prevention of Harm How often does the model fail, and what is the impact? Are model failures the same for everyone? How do we know the model is failing? Fairness Analysis Failure Analysis Failure Monitoring Impact Analysis
  • 9. Typical AI/ML Pipeline Failure Analysis Fairness Analysis Impact Analysis Technical leaders and individual contributors have a role in each of these pillars. People affected by Decisions give feedback Operator reviews Decisions Training Data and Code produce a model Scoring Data and Model produce decisions P.Yes P.No A.Yes TP FN A.No FP TN Human Agency and Oversight Prevention of Harm Fairness Social and Environmental Well-Being Privacy and Data Governance Accountability Technical Robustness and Safety
  • 10. Technical Pillars of Trustworthy AI Does this model work for everyone? Human Agency and Oversight Fairness Accountability Prevention of Harm Social and Environmental Well-Being Technical Robustness and Safety Privacy and Data Governance Prevention of Harm How often does the model fail, and what is the impact? Are model failures the same for everyone? How do we know the model is failing? Fairness Analysis Failure Analysis Failure Monitoring Impact Analysis
  • 11. Failure Analysis Cancer Screening Predicted Cancer Predicted No Cancer Has Cancer Has Cancer More Screening Has Cancer and Does Not Know Does Not Have Cancer No Cancer More Screening No Extra Screening No Cancer 1. Find the cell in the confusion matrix that causes the most harm to the least advantaged group. 2. Analyze rates and outcomes for that cell. Fairness Prevention of Harm Fraud Screening Predicted Fraud Predicted No Fraud Fraudulent Account Audit, Model Makes $ Fraud and No Audit, Model Loses $ Honest Account No Fraud, Customer Audit No Fraud No Audit
  • 12. Aequitas Fairness Tree Is being predicted positive punitive or assistive? Which group is harmed most by mistakes? Can you intervene with most people or just a subset? Which group is harmed most by mistakes? # False Positives / Group Size False Discovery Rate (FDR) False Positive Rate True Positive Rate (Recall) # False Negatives / Group Size False Negative Rate False Omission Rate Fairness Tree: Data Science and Public Policy, Carnegie Mellon University http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/ Everyone People who get intervention People who do not get intervention Most Subset Everyone People Not Assisted People with Actual Need Accountability Technical Robustness and Safety
  • 13. Failure Analysis: Pre-Deployment • Failure analysis is often ad hoc and depends heavily on the data sources available. • e.g. We may not know how many cancers human screeners miss. • Deployment should include automating failure analysis. • Deployment should include plans for cadence of failure analysis. Accountability Technical Robustness and Safety
  • 14. Tools for Failure Analysis • Every model will produce the statistics listed in the fairness tree. (e.g. sklearn.metrics) • It is up to the modeling team to decide which statistics are the most important and to display them in a way that communicates impact to stakeholders. • Deciding on a set of metrics that should be monitored post-deployment is part of the analysis. • Once the analysis is done, it should be automated so it can be re-done at regular intervals. These scripts are usually tailored to the business problem. • AWS Clarify has a nice set of tools for calculating and displaying statistics. Accountability Technical Robustness and Safety
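As a sketch of what "the statistics listed in the fairness tree" reduce to in code: each is a one-line ratio over confusion-matrix counts. Pure Python for illustration; in practice the team would pull these from sklearn.metrics or AWS Clarify as the slide suggests, and the counts below are hypothetical.

```python
def fairness_tree_stats(tp, fp, fn, tn):
    """Rates from the Aequitas fairness tree, computed from raw counts."""
    group_size = tp + fp + fn + tn
    return {
        "fp_over_group": fp / group_size,  # punitive branch, intervene with everyone
        "fdr": fp / (fp + tp),             # punitive, intervention subset
        "fpr": fp / (fp + tn),             # punitive, non-intervention subset
        "recall": tp / (tp + fn),          # assistive branch (true positive rate)
        "fn_over_group": fn / group_size,  # assistive, assist everyone
        "for": fn / (fn + tn),             # false omission rate
        "fnr": fn / (fn + tp),             # assistive, people with actual need
    }

# Hypothetical counts for one scoring period.
stats = fairness_tree_stats(tp=80, fp=20, fn=10, tn=890)
print(stats["fdr"])  # 0.2
```

Which of these becomes the headline dashboard metric is exactly the judgment call the slide assigns to the modeling team; the code only makes the candidates cheap to compute and re-compute on a schedule.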
  • 15. Failure Analysis Depends on Good Data Failure Analysis Fairness Analysis Impact Analysis People affected by Decisions give feedback Operator reviews Decisions Training Data and Code produce a model Scoring Data and Model produce decisions P.Yes P.No A.Yes TP FN A.No FP TN Accountability Technical Robustness and Safety
  • 16. "Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI, Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Kumar Paritosh, and Lora Mois Aroyo (2021)
  • 17. Technical Pillars of Trustworthy AI Does this model work for everyone? Human Agency and Oversight Fairness Accountability Prevention of Harm Social and Environmental Well-Being Technical Robustness and Safety Privacy and Data Governance Prevention of Harm How often does the model fail, and what is the impact? Are model failures the same for everyone? How do we know the model is failing? Fairness Analysis Failure Analysis Failure Monitoring Impact Analysis
  • 18. Fairness Analysis 1. Focus on cell where most harm occurs. 2. Compare performance for underrepresented and/or unprivileged groups. Fairness Prevention of Harm Fraud Screening Group A Predicted Fraud Predicted No Fraud Fraudulent Account Audit, Model Makes $ Fraud and No Audit, Model Loses $ Honest Account No Fraud, Customer Audit No Fraud No Audit Fraud Screening Group B Predicted Fraud Predicted No Fraud Fraudulent Account Audit, Model Makes $ Fraud and No Audit, Model Loses $ Honest Account No Fraud, Customer Audit No Fraud No Audit Fraud Screening Group C Predicted Fraud Predicted No Fraud Fraudulent Account Audit, Model Makes $ Fraud and No Audit, Model Loses $ Honest Account No Fraud, Customer Audit No Fraud No Audit
  • 19. Aequitas Fairness Tree Is being predicted positive punitive or assistive? Which group is harmed most by mistakes? Can you intervene with most people or just a subset? Which group is harmed most by mistakes? True Positive Rate (Recall) False Negative Rate Fairness Tree: Data Science and Public Policy, Carnegie Mellon University http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/ Everyone People who get intervention People who do not get intervention Most Subset Everyone People Not Assisted People with Actual Need FP/GS Parity FDR Parity FPR Parity Recall Parity FN/GS Parity FOR Parity FNR Parity # False Positives / Group Size False Discovery Rate (FDR) False Positive Rate # False Negatives / Group Size False Omission Rate Fairness Prevention of Harm
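A per-group comparison like the fraud-screening tables above can be sketched as a parity check on the harm cell. The counts are hypothetical, and the 0.8–1.25 acceptance band mirrors the common "four-fifths" rule of thumb rather than an Aequitas default.

```python
def fnr(tp, fn):
    """False negative rate: the harm cell for an assistive intervention."""
    return fn / (fn + tp)

# Hypothetical per-group counts for the harm cell comparison.
groups = {
    "A": {"tp": 90, "fn": 10},
    "B": {"tp": 70, "fn": 30},
    "C": {"tp": 88, "fn": 12},
}

reference = fnr(**groups["A"])  # compare every group against group A
for name, counts in groups.items():
    ratio = fnr(**counts) / reference
    flag = "" if 0.8 <= ratio <= 1.25 else "  <-- disparity"
    print(f"Group {name}: FNR ratio vs A = {ratio:.2f}{flag}")
```

Here group B's false negative rate is three times group A's: the model "works" on average while failing one group far more often, which is the situation the fairness tree is designed to surface.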
  • 20. Technical Pillars of Trustworthy AI Does this model work for everyone? Human Agency and Oversight Fairness Accountability Prevention of Harm Social and Environmental Well-Being Technical Robustness and Safety Privacy and Data Governance Prevention of Harm How often does the model fail, and what is the impact? Are model failures the same for everyone? How do we know the model is failing? Fairness Analysis Failure Analysis Failure Monitoring Impact Analysis
  • 21. Failure Monitoring Failure Analysis Fairness Analysis Impact Analysis People affected by Decisions give feedback Operator reviews Decisions Training Data and Code produce a model Scoring Data and Model produce decisions P.Yes P.No A.Yes TP FN A.No FP TN Human Agency and Oversight Prevention of Harm
  • 22. How do we know the model is failing? • What pipelines exist for people to give feedback on model performance? • Experts/Operators who are using the models. • People who are affected by the model. • How do we automate monitoring the most critical model performance metrics? • What outside data is available as a check against our assumptions about the model? • There are no great tools for checking failures. • Cloud providers do offer some tools if you are using their cloud (e.g. AWS, Azure, and Google). Human Agency and Oversight Prevention of Harm
  • 23. Lowest Hanging Fruit: Automate All Data Pipelines • dbt: runs code pipelines and data checks; built-in tests and SQL-based user-defined tests; SQL-based, with open source dbt Core and a subscription-based cloud option. • Soda and SodaCL: data checks only; built-in tests and SQL-based user-defined tests; SQL-based, with open source Soda Core and subscription-based Soda Cloud. • great-expectations: data checks only; built-in tests for Python; Python-based. • deequ: data checks only; built-in tests and Spark/PySpark-based user-defined tests; Spark/PySpark-based. Human Agency and Oversight Prevention of Harm
  • 24. Hardening Pipelines: Obvious Tests for Tabular Data • Uniqueness: “This column/combination of columns should be unique by row.” • Correctness: “Only these values allowed in this column.” • Missingness: “These columns should be populated for X% of rows.” • Range: “Nothing bigger/smaller than [a,b] should be in this column.” • … You get the picture. Human Agency and Oversight Prevention of Harm
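Each of the "obvious" tests above is a one-line predicate; the tools on the previous slide express the same checks declaratively. A minimal pure-Python sketch over hypothetical rows:

```python
# Hypothetical tabular data: one dict per row.
rows = [
    {"id": 1, "status": "open",   "amount": 120.0},
    {"id": 2, "status": "closed", "amount": 80.0},
    {"id": 3, "status": "open",   "amount": None},
]

# Uniqueness: "id" should be unique by row.
ids = [r["id"] for r in rows]
assert len(ids) == len(set(ids))

# Correctness: only these values allowed in "status".
assert all(r["status"] in {"open", "closed"} for r in rows)

# Missingness: "amount" should be populated for at least 60% of rows.
populated = sum(r["amount"] is not None for r in rows)
assert populated / len(rows) >= 0.60

# Range: nothing outside [0, 10_000] should be in "amount".
assert all(0 <= r["amount"] <= 10_000 for r in rows if r["amount"] is not None)

print("all checks passed")
```

The value of the dedicated tools is not the predicates themselves but running them on every load, with alerting, so a silent upstream change surfaces before it reaches the model.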
  • 25. Hardening Pipelines: Less Obvious Tests for Tabular Data • Feature Drift: Are distributions of inputs changing? • Model Drift: Are the model predictions changing? • Kolmogorov-Smirnov: What is the probability of observing the data we see today (or something weirder) compared to what we think the data should look like? • A p-value of 0.05 means this test alarms 5% of the time when all is normal. Use False Discovery Rate to find true errors. • KL Divergence (Population Stability Index) • Sensitive to the bins you pick. • These tests are sensitive to outliers. Outliers happen all the time. Human Agency and Oversight Prevention of Harm
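To illustrate the bin sensitivity noted above, the Population Stability Index fits in a few lines. The bin edges, the empty-bin floor, and the common "investigate above 0.25" threshold are all conventions, not universal constants, and the feature values below are synthetic.

```python
import math

def psi(expected, actual, edges):
    """Population Stability Index between two samples over fixed bin edges."""
    def shares(sample):
        bins = [0] * (len(edges) - 1)
        for x in sample:
            for i in range(len(bins)):
                if edges[i] <= x < edges[i + 1]:
                    bins[i] += 1
                    break
        # Small floor avoids log(0) in empty bins -- this choice itself
        # changes the score, one facet of the bin sensitivity.
        return [max(b / len(sample), 1e-4) for b in bins]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1 * i for i in range(100)]        # training-time feature values
shifted = [0.1 * i + 3.0 for i in range(100)]   # drifted production values
edges = [0.0, 2.5, 5.0, 7.5, 10.0, 15.0]

print(psi(baseline, baseline, edges))  # 0.0 -- identical distributions
# psi(baseline, shifted, edges) is large; > 0.25 is a common "investigate" level
```

Move the edges and the score moves with them, which is why a PSI alarm is a prompt for investigation rather than a verdict.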
  • 26. Data Pipelines Hardened? Automate the Workflow • Model-card-toolkit: open source system for creating model cards; Python-based. • Metaflow: runs code pipelines and data checks; developed specifically for data science; Python-based. • deepchecks: data checks and performance checks for the full model pipeline; Python-based. • Luigi: full-featured; lets you automate all of your scripts for everything; Python-based. • Airflow: like Luigi, but automates some of the more tedious parts; Python-based. DAG: Directed Acyclic Graph • A collection of tasks and their dependencies. • Directed: Each task that requires output from previous tasks knows its own dependencies. • Acyclic: A graph term. It means there’s no point where a task depends on output from a task that can’t be performed before the current task. Model Card • Simplified explanation of model inputs, outputs, and assumptions. Human Agency and Oversight Prevention of Harm
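The "directed acyclic" property in the definition above is what lets a scheduler find a valid run order at all. A toy sketch of what Luigi or Airflow do under the hood, using the standard library's graphlib (Python 3.9+); the task names are hypothetical:

```python
from graphlib import TopologicalSorter

# Each task lists the tasks whose output it depends on.
pipeline = {
    "ingest": [],
    "validate": ["ingest"],        # data checks run right after ingest
    "train": ["validate"],
    "score": ["train", "validate"],
    "failure_report": ["score"],
}

# A topological order runs each task only after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)

# If the graph had a cycle (e.g. "ingest" depending on "failure_report"),
# static_order() would raise CycleError instead of returning an order.
```

Real orchestrators add retries, scheduling, and logging on top, but the core contract is the same: declare dependencies, let the DAG determine execution.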
  • 27. Technical Pillars of Trustworthy AI Does this model work for everyone? Human Agency and Oversight Fairness Accountability Prevention of Harm Social and Environmental Well-Being Technical Robustness and Safety Privacy and Data Governance Prevention of Harm How often does the model fail, and what is the impact? Are model failures the same for everyone? How do we know the model is failing? Fairness Analysis Failure Analysis Failure Monitoring Impact Analysis
  • 28. Impact Analysis: AI That Works for Everyone • The least technical part of AI Ethics. • Arguably the part of AI Ethics that most needs technical assistance. • Part of the initial project plan. • Local Impacts: This model’s impact on its stakeholders. • Social Impacts: How does this model contribute to AI’s larger issues? • Mitigation Analysis: What can we do within the scope of this project to mitigate negative impacts? Social and Environmental Well-Being Privacy and Data Governance
  • 29. Local Impact of an AI Model • Does this model improve working conditions for the people who use it? • e.g. An AI model that requires a lot of data input from nurses and doctors may increase their job responsibilities without compensating or rewarding them for extra effort. • Does this model improve outcomes for people affected by the model? • e.g. A fraud detection model may speed payment for most individuals. • Does this model make things worse for some individuals? • e.g. A fraud detection model may speed payment for most individuals and slow payment for others to an unacceptable level. • Are we collecting only the data we need? Are we keeping that data safe? • e.g. Does my word game really need my location? Social and Environmental Well-Being Privacy and Data Governance
  • 30. Social Impact of an AI Model • Environmental cost of an AI model is non-negligible: https://openai.com/blog/ai-and-compute/ • We need efficient computation, and that is a technical problem. • Many AI models profit from free or underpaid labor: https://www.wired.com/story/foundations-ai-riddled-errors/ • Labeling software should be good software. • Large-scale adoption of AI models has other effects. • Never mind the trolley problem: Suppose 10% of the cars on the road are self-driving. Now, suppose there’s a network outage during a heavy traffic period. Social and Environmental Well-Being Privacy and Data Governance
  • 31. AI Ethics and Model Development • Pre-Development • Impact Analysis: Who will use the model and how? • Failure Analysis: What is the most impactful failure? What is an acceptable level of failure? • Fairness Analysis: What are the underrepresented/unprivileged groups? • Failure Monitoring: What development is needed for Human-to-Model feedback? • Model Development • Design and hardening of data pipelines, including privacy. • Model’s ability to meet failure thresholds. • Deployment • Does the model meet criteria set during pre-development? • Are the requirements in place?
  • 32. Ethical AI is Good AI and Good AI is Ethical AI • Ethical AI knows when it fails and the impact of those failures. • Ethical AI fails in the same way for everyone. • Ethical AI is monitored for failures and has strong feedback loops that surface problems quickly. • Ethical AI is designed for positive impact on the communities where it is implemented and for society as a whole. Who doesn’t want that? Ellis-Lee, Mia. (2018) “Accessible Design is Good Design & Good Design is Accessible Design.” Flywheel hosted blog. https://www.flywheelstrategic.com/thinking/post/flywheel-blog/2018/04/06/accessible-design-is-good-design-good- design-is-accessible-design

Editor's notes

  1. Deloitte’s Trustworthy AI Framework: https://www2.deloitte.com/us/en/pages/deloitte-analytics/solutions/ethics-of-ai-framework.html, https://www.technologyreview.com/2020/03/25/950291/trustworthy-ai-is-a-framework-to-help-manage-unique-risk/ US ai.gov: https://www.ai.gov/strategic-pillars/advancing-trustworthy-ai/ OECD Publishing (2021) “Trustworthy AI: A Framework to Compare Implementation Tools for Trustworthy AI Systems”. https://www.oecd.org/science/tools-for-trustworthy-ai-008232ec-en.htm
  2. Fang, Huanming, Hui Miao (2020) “Introducing the Model Card Toolkit for Easier Model Transparency and Reporting.” Google AI Blog. https://ai.googleblog.com/2020/07/introducing-model-card-toolkit-for.html Tagliabue, J., Tuulos, V., Greco, C. and Dave, V., 2021. DAG Card is the new Model Card. arXiv preprint arXiv:2110.13601. https://arxiv.org/pdf/2110.13601.pdf
  3. “Where State Farm Sees ‘a Lot of Fraud,’ Black Customers See Discrimination” https://www.nytimes.com/2022/03/18/business/state-farm-fraud-black-customers.html “Aiming for truth, fairness, and equity in your company’s use of AI” https://www.ftc.gov/business-guidance/blog/2021/04/aiming-truth-fairness-equity-your-companys-use-ai “Weighing Big Tech’s Promise to Black America” https://www.wired.com/story/big-techs-promise-to-black-america/
  4. Self-driving cars will make you forget how to drive: Javadi, AH., Emo, B., Howard, L. et al. Hippocampal and prefrontal processing of network topology to simulate the future. Nat Commun 8, 14652 (2017). https://doi.org/10.1038/ncomms14652