Towards Responsible AI - NY.pptx

  1. REGISTER & WIN GLOBAL AI DEVELOPER DAYS - TRI-STATE Register & Win Exciting Smart Goodies, T-shirts & Amazon Gift cards
  2. YOUR FEEDBACK GLOBAL AI DEVELOPER DAYS - TRI-STATE Provide us feedback & Win Exciting Smart Goodies, T-shirts & Amazon Gift cards
  4. Keynote Suzanne George Partner - Modern Workplace & Experience Lead Peter Ward CEO @ SoHo Dragon
  5. Stream 1 Nov 2nd, 8 a.m. EST GLOBAL AI DEVELOPER DAYS - TRI-STATE
  6. Towards Responsible AI Luis Beltrán
  7. Luis Beltrán @darkicebeam
  8. Trust: how confident are consumers with how organizations implement AI? 35% trust in AI; 65% don't trust in AI. Responsibility: should organizations be held accountable for AI misuse? 77% yes; 23% no. Sources: Accenture 2022 Tech Vision Research; Accenture 2019 Global Risk Study
  9. What is Responsible AI?
  10. Responsible AI is an approach to evaluating, developing, and deploying AI systems in a safe, reliable, and ethical manner, and to making responsible decisions and taking responsible actions. Generally speaking, Responsible AI is the practice of upholding the principles of AI when designing, building, and using artificial intelligence systems.
  11. Responsible AI Principles
  12. 1. Privacy
  13. A Data Privacy problem
  14. Differential Privacy
  15. SmartNoise
  16. Differential privacy adds noise so the maximum impact of an individual on the outcome of an aggregative analysis is at most epsilon (ϵ)​ • The incremental privacy risk between opting out vs participation for any individual is governed by ϵ • Lower ϵ values result in greater privacy but lower accuracy​ • Higher ϵ values result in greater accuracy with higher risk of individual identification
  17. 2. Fairness Absence of negative impact on groups based on: Ethnicity Gender Age Physical disability Other sensitive features
  18. Mitigating Unfairness Create models with parity constraints: Algorithms: • Exponentiated Gradient - A *reduction* technique that applies a cost-minimization approach to learning the optimal trade-off of overall predictive performance and fairness disparity (Binary classification and regression) • Grid Search - A simplified version of the Exponentiated Gradient algorithm that works efficiently with small numbers of constraints (Binary classification and regression) • Threshold Optimizer - A *post-processing* technique that applies a constraint to an existing classifier, transforming the prediction as appropriate (Binary classification)
  19. Mitigating Unfairness Constraints: • Demographic parity: Minimize disparity in the selection rate across sensitive feature groups. • True positive rate parity: Minimize disparity in true positive rate across sensitive feature groups​ • False positive rate parity: Minimize disparity in false positive rate across sensitive feature groups​ • Equalized odds: Minimize disparity in combined true positive rate and false positive rate across sensitive feature groups​ • Error rate parity: Ensure that the error for each sensitive feature group does not deviate from the overall error rate by more than a specified amount​ • Bounded group loss: Restrict the loss for each sensitive feature group in a regression model​
  20. 3. Transparency. Packages that contribute to the explainability and transparency of a model: Interpret-Community, InterpretML, Fairlearn
  21. Feature Importance • Global Feature Importance: the general importance of each feature across the whole test dataset, indicating its relative influence on the predicted label • Local Feature Importance: the importance of features for an individual prediction; in classification, this shows the relative support for each possible class per feature
  22. Follow-up / Feedback: questions to ask at each lifecycle stage (problem statement, building datasets, algorithm selection, training, evaluation/testing, deployment). Is an algorithm an ethical solution to the problem? Is the training data representative of different groups? Are there biases in the labels or features? Do I need to modify the data to mitigate biases? Is it necessary to include fairness constraints in the objective function? Are there side effects among users? Is the model used in a population for which it has not been trained or evaluated? Has the model been evaluated using relevant fairness metrics? Does the model encourage feedback loops that can produce increasingly unfair results? 4. Reliability
  23. 5. Inclusiveness
  24. 6. Accountability
  25. Responsible AI Toolbox
  26. General recommendations towards Responsible AI: Clarify what the intelligent system is going to do. Clarify system performance. Display relevant contextual information. Mitigate social bias.
  27. Consider ignoring undesirable features. Consider an efficient correction. Clearly explain why the system made a certain decision.
  28. Remember recent interactions. Learn from user behavior. Update and adapt cautiously. Encourage feedback.
  29. Benefits of Responsible AI: Minimize unintentional bias. Ensure AI transparency. Create opportunities. Protect data privacy and security. Benefit customers and markets.
  30. Reminder: Responsible AI
  31. Q & A
  32. Towards responsible AI Luis Beltrán Thank you for your attention!

Editor's notes

  1. When we talk about AI, we usually refer to a machine learning model that is used within a system to automate something. For example, a self-driving car can take images using sensors. A machine learning model can use these images to make predictions (for example, the object in the image is a tree). These predictions are used by the car to make decisions (for example, turn left to avoid the tree). We refer to this whole system as AI. When AI is developed, there are risks that it will be unfair or seen as a black box that makes decisions for humans. For example, another model that analyzes a person's information (such as their salary, nationality, age, etc.) and decides whether to grant them a loan or not. Human participation is limited in those decisions made by the system. This can lead to many potential problems and companies need to define a clear approach to the use of AI. Responsible AI is a governance framework meant to do exactly that.
  2. AI brings unprecedented opportunities to businesses, but it also comes with incredible responsibility.  Its direct impact on people's lives has raised considerable questions around AI ethics, data governance, trust and legality. In fact, Accenture's 2022 Tech Vision research found that only 35% of global consumers trust how organizations implement AI. And 77% think organizations should be held accountable for their misuse of AI. As organizations begin to expand their use of AI to capture business benefits, they need to consider regulations and the steps they need to take to make sure their organizations are compliant. That's where responsible AI comes into play where also data scientists and machine learning engineers have an ethical (and possibly legal) responsibility to create models that don't negatively affect individuals or groups of people.
  3. Responsible AI is the practice of designing, developing, and deploying AI with good intent to empower employees and businesses, and impact customers and society fairly, safely, and ethically, enabling organizations to build trust and scale AI more securely. AI systems are the product of many decisions made by those who develop and implement them. From the purpose of the system to the way people interact with it, responsible AI can help proactively guide decisions toward more beneficial and equitable outcomes. That means keeping people and their goals at the center of system design decisions and respecting enduring values like fairness, reliability, and transparency. Evaluating and researching ML models before their implementation remains at the core of reliable and responsible AI development.
  4. Microsoft has developed a Responsible AI Standard. It's a framework for building AI systems according to six key principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. For Microsoft, these principles are the foundations of a responsible and trustworthy approach to AI, especially as intelligent technology becomes more prevalent in products and services that people use every day. Let's talk about some of these principles.
  5. AI systems like facial recognition or voice tagging can definitely be used to breach an individual's privacy and threaten security. How an individual's online footprint is used to track, deduce, and influence their preferences or perspectives is a serious concern that needs to be addressed. The way in which "fake news" or "deep fakes" influence public opinion also represents a threat to individual or social security. AI systems are increasingly misused in this domain. There is a pertinent need to establish a framework that protects an individual's privacy and security. Private data is any data that can identify an individual and/or their location, activities, and interests. Such data is generally subject to strict privacy and compliance laws, for example GDPR in Europe. AI systems must comply with privacy laws that require transparency about the collection, use, and storage of data, and they should give consumers adequate controls over how their data is used.
  6. Data science projects, including machine learning projects, involve analysis of data; and often that data includes sensitive personal details that should be kept private.​ In practice, most reports that are published from the data include aggregations of the data, which you may think would provide some privacy – after all, the aggregated results do not reveal the individual data values.​ ​ However, consider a case where multiple analyses of the data result in reported aggregations that when combined, could be used to  work out information about individuals in the source dataset. In the example on the slide, 10 participants share data about their location and salary. The aggregated salary data tells us the average salary in Seattle; and the location data tells us that 10% of the study participants (in other words, a single person) is based in Seattle – so we can easily determine the specific salary of the Seattle-based participant.​ ​ Anyone reviewing both studies who happens to know a person from Seattle who participated, now knows that person's salary.​
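The re-identification described above is easy to reproduce. A sketch in plain Python with toy, invented numbers (the cities and salaries are illustrative only):

```python
from statistics import mean

# Ten study participants (toy data): (city, salary).
participants = [
    ("Seattle", 125_000),
    ("Portland", 95_000), ("Portland", 98_000),
    ("Boise", 71_000), ("Boise", 69_000), ("Boise", 75_000),
    ("Spokane", 88_000), ("Spokane", 90_000),
    ("Tacoma", 81_000), ("Tacoma", 83_000),
]
cities = {c for c, _ in participants}

# Report 1: average salary per city (looks like a harmless aggregate).
avg_by_city = {c: mean(s for c2, s in participants if c2 == c) for c in cities}

# Report 2: participant count per city (also looks harmless on its own).
count_by_city = {c: sum(1 for c2, _ in participants if c2 == c) for c in cities}

# Attack: wherever the count is 1, the "average" is one person's exact
# salary, so combining the two reports re-identifies that individual.
leaked = {c: avg_by_city[c] for c in cities if count_by_city[c] == 1}
```

Neither report is sensitive in isolation; the leak only appears when they are combined, which is exactly the failure mode differential privacy is designed to prevent.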
  7. Differential privacy seeks to protect individual data values by adding statistical "noise" to the analysis process. The math involved in adding the noise is quite complex, but the principle is fairly intuitive – the noise ensures that data aggregations stay statistically consistent with the actual data values allowing for some random variation, but make it impossible to work out the individual values from the aggregated data. In addition, the noise is different for each analysis, so the results are non-deterministic – in other words, two analyses that perform the same aggregation may produce slightly different results.​
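A common way to add such noise is the Laplace mechanism. The following is a minimal sketch in plain Python, not SmartNoise's implementation; the point is that the noise scale is sensitivity divided by epsilon, so a smaller epsilon means more noise and stronger privacy:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from a zero-centred Laplace distribution
    via inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon, seed=0):
    """Differentially private count of values matching predicate.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the Laplace scale is 1/epsilon.
    """
    rng = random.Random(seed)
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

salaries = [52_000, 61_000, 45_000, 70_000, 58_000, 49_000, 66_000]
# Lower epsilon -> larger noise scale -> stronger privacy, lower accuracy.
low_eps = private_count(salaries, lambda s: s > 50_000, epsilon=0.1)
high_eps = private_count(salaries, lambda s: s > 50_000, epsilon=10.0)
```

Because the noise is freshly sampled for each analysis, repeated queries return slightly different results, matching the non-deterministic behavior described above.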
  8. Two open-source packages can help implement the privacy and security principles: Counterfit, an open-source project comprising a command-line tool and generic automation layer that enables developers to simulate cyberattacks against AI systems and verify their security; and SmartNoise, a project (co-developed by Microsoft) that contains components for building global differentially private systems.
  9. You can use SmartNoise to create an analysis in which noise is added to the source data. The underlying mathematics of how the noise is added can be quite complex, but SmartNoise takes care of most of the details for you. It has built-in support for training simple machine learning models like linear and logistic regression, and is compatible with open-source training libraries such as TensorFlow Privacy.
  10. Epsilon: The amount of variation caused by adding noise is configurable through a parameter called epsilon. This value governs the amount of additional risk that your personal data can be identified. The key thing is that it applies this privacy principle for every member in the data. A low epsilon value provides the most privacy, at the expense of less accuracy when aggregating the data. A higher epsilon value results in aggregations that are more true to the actual data distribution, but in which the individual contribution of a single individual to the aggregated value is less obscured by noise.​
  11. However, there are a few concepts it's useful to be aware of. Upper and lower bounds: Clamping is used to set upper and lower bounds on values for a variable. This is required to ensure that the noise generated by SmartNoise is consistent with the expected distribution of the original data. Sample size: To generate consistent differentially private data for some aggregations, SmartNoise needs to know the size of the data sample to be generated.
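Why bounds are required can be sketched in plain Python (illustrative only, not SmartNoise's internals): clamping caps each record's influence on an aggregate, which in turn fixes the sensitivity used to scale the noise.

```python
def clamp(value, lower, upper):
    """Restrict a value to [lower, upper] so one outlier cannot
    dominate an aggregation."""
    return max(lower, min(upper, value))

def bounded_mean_sensitivity(lower, upper, n):
    """The most one clamped record can move a mean of n values:
    this is the sensitivity that scales the Laplace noise."""
    return (upper - lower) / n

ages = [23, 41, 35, 230, 19, 52]            # 230 is a data-entry error
clamped = [clamp(a, 0, 120) for a in ages]  # bounds come from the metadata
```

Without the bounds, a single extreme value would give the mean unbounded sensitivity, and no finite amount of noise could hide an individual's contribution.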
  12. It's common when analyzing data to examine the distribution of a variable using a histogram. For example, let's look at the true distribution of ages in the diabetes dataset. The histograms are similar enough to ensure that reports based on the differentially private data provide the same insights as reports from the raw data.
  13. Now let's compare that with a differentially private histogram of Age.
  14. Another common goal of analysis is to establish relationships between variables. SmartNoise provides a differentially private covariance function that can help with this. In this case, the covariance between Age and DiastolicBloodPressure is positive, indicating that older patients tend to have higher blood pressure.
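For reference, the quantity being privatized is ordinary sample covariance; SmartNoise adds calibrated noise to it. A plain (non-private) version, with toy values standing in for the diabetes dataset:

```python
def covariance(xs, ys):
    """Sample covariance: positive when larger x values tend to
    occur together with larger y values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Toy stand-ins for the Age and DiastolicBloodPressure columns.
ages = [25, 34, 48, 52, 63, 71]
diastolic = [70, 74, 80, 82, 88, 91]
```

A positive result here carries the same interpretation as in the slide: older patients tend to have higher blood pressure.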
  15. In addition to the Analysis functionality, SmartNoise enables you to use SQL queries against data sources to retrieve differentially private aggregated results. First, you need to define the metadata for the tables in your data schema. You can do this in a .yml file, such as the diabetes.yml file in the /metadata folder. The metadata describes the fields in the tables, including data types and minimum and maximum values for numeric fields.
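A metadata file along these lines might look as follows. The exact layout and field names are an assumption (they vary between SmartNoise versions), so treat this as a sketch and consult the SmartNoise documentation for the schema your version expects:

```yaml
# diabetes.yml - sketch of table metadata for differentially private SQL.
# Layout and field names are illustrative.
Diabetes:
  dbo:
    diabetes:
      row_privacy: true        # each row belongs to one individual
      rows: 10000              # approximate table size
      PatientID:
        type: int
        private_id: true       # identifies the individual; never released
      Age:
        type: int
        lower: 0               # clamping bounds for numeric fields
        upper: 120
      Diabetic:
        type: boolean
```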
  16. With the metadata defined, you can create readers that you can query. In the following example, we'll create a PandasReader to read the raw data from a Pandas dataframe, and a PrivateReader that adds a differential privacy layer to the PandasReader. Now you can submit a SQL query that returns an aggregated resultset to the private reader. Let's compare the result to the same aggregation from the raw data.
  17. You can customize the behavior of a PrivateReader with the epsilon_per_column parameter. Let's try a reader with a high epsilon (low privacy) value, and another with a low epsilon (high privacy) value. Note that the results of the high epsilon (low privacy) reader are closer to the true results from the raw data than the results from the low epsilon (high privacy) reader.
  18. ML models can encapsulate unintentional biases that result in unfairness. In Responsible AI, you must detect and mitigate unfairness in your models. AI systems must treat everyone fairly and avoid affecting similarly situated groups of people in different ways. For example, when AI systems provide guidance on medical treatment, loan applications, or employment, they should make the same recommendations to everyone with similar symptoms, financial circumstances, or professional qualifications, to increase confidence in the model's predictions and ensure that they do not discriminate for or against subsets of the population based on ethnicity, gender, age, or other factors.
  19. The fairness assessment component of the Responsible AI dashboard enables data scientists and developers to assess model fairness across sensitive groups defined in terms of gender, ethnicity, age, and other characteristics. The Responsible AI dashboard provides a single interface to help you implement Responsible AI in practice effectively and efficiently. It brings together several mature Responsible AI tools in the areas of model performance and fairness assessment, data exploration, machine learning interpretability, error analysis, counterfactual analysis and perturbations, and causal inference. The dashboard offers holistic assessment and debugging of models so you can make informed data-driven decisions. Having access to all of these tools in one interface empowers you to: evaluate and debug your machine learning models by identifying model errors and fairness issues, diagnosing why those errors are happening, and informing your mitigation steps; and boost your data-driven decision-making abilities by addressing questions such as "What is the minimum change that users can apply to their features to get a different outcome from the model?" and "What is the causal effect of reducing or increasing a feature (for example, red meat consumption) on a real-world outcome (for example, diabetes progression)?"
  20. You can use the Fairlearn package to analyze a model and explore disparity in prediction performance for different subsets of data based on specific features, such as age. After training a model, you can use the Fairlearn package to compare its behavior for different sensitive feature values. A mix of fairlearn and scikit-learn metric functions is used to calculate the performance values: use scikit-learn metric functions to calculate overall accuracy, recall, and precision metrics; use the fairlearn selection_rate function to return the selection rate (percentage of positive predictions) for the overall population; and use a MetricFrame to calculate selection rate, accuracy, recall, and precision for each age group in the Age sensitive feature. From these metrics, you should be able to discern that a larger proportion of the older patients are predicted to be diabetic. Accuracy should be more or less equal for the two groups, but a closer inspection of precision and recall indicates some disparity in how well the model predicts for each age group. The model does a better job for patients in the older age group than for younger patients.
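The per-group computation that MetricFrame performs can be sketched in plain Python. This is an illustrative stand-in with toy data, not the Fairlearn implementation:

```python
from collections import defaultdict

def group_metrics(y_true, y_pred, sensitive):
    """Selection rate and recall per sensitive-feature group,
    roughly what fairlearn.metrics.MetricFrame reports."""
    groups = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, sensitive):
        groups[g].append((t, p))
    out = {}
    for g, pairs in groups.items():
        preds = [p for _, p in pairs]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        out[g] = {
            "selection_rate": sum(preds) / len(preds),
            "recall": tp / (tp + fn) if tp + fn else float("nan"),
        }
    return out

# Toy data shaped like the slide's scenario: the older group gets more
# positive predictions and higher recall than the younger group.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
age_group = ["50+", "50+", "50+", "50+", "<50", "<50", "<50", "<50"]
metrics = group_metrics(y_true, y_pred, age_group)
```

Comparing the per-group values side by side is exactly how the disparity described above becomes visible.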
  21. It's often easier to compare metrics visually. To do this, you'll use the Fairlearn fairness dashboard: When the widget is displayed, use the Get started link to start configuring your visualization. Select the sensitive features you want to compare (in this case, there's only one: Age). Select the model performance metric you want to compare (in this case, it's a binary classification model so the options are Accuracy, Balanced accuracy, Precision, and Recall). Start with Recall. Select the type of fairness comparison you want to view. Start with Demographic parity difference. The choice of parity constraint depends on the technique being used and the specific fairness criteria you want to apply. Constraints include:​ - Demographic parity: Use this constraint with any of the mitigation algorithms to minimize disparity in the selection rate across sensitive feature groups. For example, in a binary classification scenario, this constraint tries to ensure that an equal number of positive predictions are made in each group.​
  22. View the dashboard charts, which show: Selection rate - A comparison of the number of positive cases per subpopulation. False positive and false negative rates - how the selected performance metric compares for the subpopulations, including underprediction (false negatives) and overprediction (false positives). Edit the configuration to compare the predictions based on different performance and fairness metrics. The results show a much higher selection rate for patients over 50 than for younger patients. However, in reality, age is a genuine factor in diabetes, so you would expect more positive cases among older patients. If we base model performance on accuracy (in other words, the percentage of predictions the model gets right), then it seems to work more or less equally for both subpopulations. However, based on the precision and recall metrics, the model tends to perform better for patients who are over 50 years old.
  23. A common approach to mitigation is to use one of the algorithms and constraints to train multiple models, and then compare their performance, selection rate, and disparity metrics to find the optimal model for your needs. Often, the choice of model involves a trade-off between raw predictive performance and fairness. Generally, fairness is measured by a reduction in disparity of feature selection or by a reduction in disparity of a performance metric. To train the models for comparison, you use mitigation algorithms to create alternative models that apply parity constraints to produce comparable metrics across sensitive feature groups. For example, GridSearch trains multiple models in an attempt to minimize the disparity of predictive performance for the sensitive features in the dataset (in this case, the age groups). Common algorithms for optimizing models for fairness: - Exponentiated Gradient - A *reduction* technique that applies a cost-minimization approach to learning the optimal trade-off of overall predictive performance and fairness disparity (binary classification and regression) - Grid Search - A simplified version of the Exponentiated Gradient algorithm that works efficiently with small numbers of constraints (binary classification and regression) - Threshold Optimizer - A *post-processing* technique that applies a constraint to an existing classifier, transforming the prediction as appropriate (binary classification)
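The post-processing idea behind Threshold Optimizer can be sketched in plain Python. This is a deliberate simplification with illustrative names, not Fairlearn's algorithm: pick a score threshold per group so that each group's selection rate matches a target, which corresponds to a demographic-parity constraint.

```python
def per_group_thresholds(scores, groups, target_rate):
    """Choose a score threshold per group so each group's selection
    rate is close to target_rate (ties at the threshold may cause a
    slight over-selection)."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    thresholds = {}
    for g, gs in by_group.items():
        gs = sorted(gs, reverse=True)
        k = round(target_rate * len(gs))     # how many to select in this group
        thresholds[g] = gs[k - 1] if k > 0 else float("inf")
    return thresholds

def predict(score, group, thresholds):
    """Post-processed prediction: compare the score to the group's threshold."""
    return 1 if score >= thresholds[group] else 0
```

With different thresholds per group, the existing classifier is left untouched; only its outputs are transformed, which is what makes this a post-processing technique.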
  24. The choice of parity constraint depends on the technique being used and the specific fairness criteria you want to apply. The EqualizedOdds parity constraint tries to ensure that models exhibit similar true and false positive rates for each sensitive feature grouping.
  25. The models are shown on a scatter plot. You can compare the models by measuring the disparity in predictions (in other words, the selection rate) or the disparity in the selected performance metric (in this case, recall). In this scenario, we expect disparity in selection rates (because we know that age is a factor in diabetes, with more positive cases in the older age group). What we're interested in is the disparity in predictive performance, so select the option to measure Disparity in recall. The chart shows clusters of models with the overall recall metric on the X axis, and the disparity in recall on the Y axis. Therefore, the ideal model (with high recall and low disparity) would be at the bottom right corner of the plot. You can choose the right balance of predictive performance and fairness for your particular needs, and select an appropriate model to see its details. An important point to reinforce is that applying fairness mitigation to a model is a trade-off between overall predictive performance and disparity across sensitive feature groups - generally you must sacrifice some overall predictive performance to ensure that the model predicts fairly for all segments of the population.
  27. It is important to be able to understand how machine learning models make predictions; and be able to explain the justification for decisions made by the system by identifying and mitigating biases. Model interpretability has become a key element in helping model predictions to be explainable, not seen as a black box making random decisions. Transparency then makes it possible to explain why a model makes the predictions it does. What characteristics affect the behavior of a model? Why was a specific customer's loan application approved or denied?
  28. Model explainers use statistical techniques to calculate *feature importance*. This allows you to quantify the relative influence that each feature of the training dataset has on a prediction. Explainers evaluate a test dataset of feature cases and the labels the model predicts for them. Global feature importance quantifies the relative importance of each feature in the test dataset as a whole: how much each feature influences predictions overall. For example, a binary classification model to predict loan default risk could be trained from features such as loan amount, income, marital status, and age to predict a label of 1 for loans likely to be repaid and 0 for loans with a significant risk of default (which therefore should not be approved). An explainer could then use a sufficiently representative test dataset to produce the following global feature importance values: - Income: 0.98 - Loan amount: 0.67 - Age: 0.54 - Marital status: 0.32 It is clear from these values that income is the most important feature in predicting whether or not a borrower will default on a loan, followed by the loan amount, then age, and finally marital status. Local feature importance measures the influence of each feature value on a specific individual prediction. For example, suppose Sam applies for a loan that the model approves. You can use an explainer on Sam's application to determine which factors influenced the prediction. You might get a result like the one shown in the second image, which indicates the amount of support for each class based on each feature value. Since this is a binary classification model, there are only two possible classes (0 and 1). In Sam's case, overall support for class 0 is -1.4, support for class 1 is correspondingly 1.4, and the loan is approved.
The most important feature for a class 1 prediction is the loan amount, followed by income: the opposite of their order in the global feature importance values (which indicated that income is the most important factor for the dataset as a whole). There could be multiple reasons why local importance for an individual prediction differs from global importance for the overall dataset; for example, Sam might have a lower-than-average income, but the loan amount in this case might be unusually small.
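One simple way to estimate global feature importance is permutation importance: shuffle one feature's column and measure how much accuracy drops. This is a simpler technique than the SHAP and LIME methods the toolkits use; the model and data below are toy illustrations.

```python
import random

def permutation_importance(model, X, y, n_features, seed=0):
    """Global importance of each feature: the drop in accuracy when
    that feature's column is randomly shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(1 for r, t in zip(rows, y) if model(r) == t) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(n_features):
        col = [row[j] for row in X]
        rng.shuffle(col)
        shuffled = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
        importances.append(baseline - accuracy(shuffled))
    return importances

# Toy "loan" model that only looks at feature 0 (income); feature 1 is noise.
model = lambda row: 1 if row[0] > 50 else 0
X = [[60, 5], [40, 9], [70, 1], [30, 7], [55, 2], [45, 8]]
y = [1, 0, 1, 0, 1, 0]
```

Because the toy model ignores feature 1, shuffling it changes nothing and its importance is exactly zero, while shuffling the income column degrades accuracy.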
  29. The **Interpret-Community** package is a wrapper around a collection of *explainers* based on proven and emerging model interpretation algorithms, such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME).
  30. AI systems must be secure in order to be trusted. It is important that a system works as originally designed and responds safely to new situations. It must be inherently resilient to intentional or unintentional manipulation. Rigorous testing and validation under operating conditions must be established to ensure that the system responds safely to edge cases. The performance of an AI system can degrade over time, so a robust model monitoring and tracking process must be established to reactively and proactively measure model performance and retrain the model as needed.
  31. The world around us is diverse. There are people from all walks of life. People with disabilities, nonprofits, and government agencies need AI systems as much as any other person or company. AI systems must be inclusive and in tune with the needs of this diverse ecosystem. When designing AI systems for inclusion, the following questions must be answered: Was the AI system developed to ensure that it includes different categories of individuals or organizations? Are there any categories of data that need to be handled exceptionally to ensure they are included? Does the expertise provided by the AI system exclude any specific categories? If so, is there anything that can be done about it? Inclusive design practices can help developers understand and address potential barriers that might unintentionally exclude people. Whenever possible, speech-to-text, text-to-speech, and visual recognition technology should be used to empower people with hearing, vision, and other disabilities.
  32. Accountability is an essential pillar of responsible AI. The people who design and implement the AI system need to be held accountable for their actions and decisions, especially as we move toward more autonomous systems. Organizations should consider establishing an internal review body that provides oversight, information, and guidance on the development and implementation of AI systems. While this guide may vary by company and region, it should reflect an organization's AI journey. Imagine that the algorithm of an autonomous car causes an accident. Who is responsible for this? The driver, the car owner, the creator of AI? 
  33. Responsible AI Toolbox, a set of tools for a personalized and responsible AI experience with unique and complementary functionalities. It allows the exploration and evaluation of models and data that help a better understanding of AI systems.  In addition, it enables developers and stakeholders of AI systems to build and monitor AI more responsibly, and take better data-driven actions. It aims to serve as a collaborative framework for research in the field of responsible AI through the use of interactive visualizations.
  34. The toolkit has several dashboards available: the Error Analysis dashboard, to identify errors in the model and discover groups of data for which the model performs poorly; the Interpretability dashboard, to understand the predictions of the model (this dashboard works with InterpretML); the Fairness dashboard, to understand the model's fairness issues using various group fairness metrics across sensitive features and cohorts (this dashboard works with Fairlearn); and the Responsible AI Dashboard, which brings together various tools for responsible model assessment and debugging and informed business decision-making. To achieve these capabilities, the dashboard integrates ideas and technologies from various open-source toolkits in the areas of: Error analysis. Model interpretability. Counterfactual analysis, which shows perturbed versions of features from the same data point that would have received a different prediction result; for example, a person's loan was rejected by the model, but would have been approved if their income were $10,000 higher. Causal analysis, which focuses on answering what-if questions to apply data-driven decision-making: How would revenue be affected if a corporation pursued a new pricing strategy? Would a new drug improve a patient's condition? Data balance, which helps users gain a general understanding of their data, identify features that receive the positive result more than others, and visualize feature distributions.
  35. If the AI system uses or generates metrics, it is important to show them all and how they are tracked. This helps users understand that AI will not be completely accurate and sets expectations about when the AI system might make mistakes. Provide visual information related to the user's current context and environment, such as nearby hotels and details relevant to the destination and travel dates. Make sure language and behavior don't introduce unwanted stereotypes or biases; for example, an autocomplete function must recognize multiple genders.
  36. It provides an easy mechanism to ignore or dismiss undesirable features or services. It provides an intuitive way to make it easier to edit, refine, or retrieve models. Optimizes explainable AI to provide insights into AI system decisions.
  37. Keep a history of interactions for future reference. Personalize the interaction based on user behavior. Limit disruptive changes and update based on the user's profile. Collect user feedback from their interactions with the AI system.
  38. It ensures that models are as unbiased and representative as possible. Transparent and explainable AI builds trust among users. It creates opportunities without stifling innovation. It ensures that personal and sensitive data is never used unethically. Creating an ethical basis for AI establishes systems that benefit shareholders, employees, and society at large.
  39. By way of conclusion, recall the principles recommended for developing responsible AI: Reliability: make sure that the systems we develop are consistent with our ideas, values, and design principles so that they don't create harm in the world. Privacy: AI systems are complex and need ever more data, and our software must ensure that this data is protected and is not leaked or disclosed. Inclusiveness: empower and engage people by making sure no one is left out; consider inclusion and diversity in your models so that the entire spectrum of communities is covered. Transparency: people creating AI systems must be open about how and why they are using AI, and open about the limitations of their systems. Transparency also means interpretability: people must be able to understand the behavior of AI systems. As a result, transparency helps gain more trust from users. Accountability: define best practices and processes that AI professionals can follow, such as a commitment to fairness, to consider at every step of the AI lifecycle.