Camlis


While data privacy challenges long predate current trends in machine-learning-as-a-service (MLAAS) offerings, predictive APIs do expose significant new attack vectors. To provide users with tailored recommendations, these applications often expose endpoints either to dynamic models or to pre-trained model artifacts, which learn patterns from data to surface insights. Problems arise when training data are collected, stored, and modeled in ways that jeopardize privacy. Even when user data is not exposed directly, private information can often be inferred using a technique called model inversion. In this talk, I discuss current research in black-box model inversion and present a machine learning approach to discovering the model families of deployed black-box models using only their decision topologies. Prior work suggests the efficacy of model-family-specific attack vectors (i.e., once the model is no longer a black box, it is easier to exploit). As such, we approach only the problem of model discovery, not model inversion, reasoning that by solving the problem of model identification, we clear a path for information security and cryptography experts to use domain-specific tools for model inversion.


  1. Inferring Model Families from Deployed Black Boxes. Dr. Rebecca Bilbro, CAMLIS 2018
  2. Rebecca Bilbro. Co-creator & Core Contrib, Scikit-Yb; Adjunct Faculty, Georgetown Univ.; Emeritus, Data Community DC. github.com/rebeccabilbro | twitter.com/rebeccabilbro
  3. Data science! What could go wrong?
  4. Just anonymize the data?

       ID  Name      SSN       Age  Ethnicity         Condition
       1   redacted  redacted  15   African American  Bronchitis
       2   redacted  redacted  15   Caucasian         Bronchitis
       3   redacted  redacted  17   Hispanic          Asthma
       4   redacted  redacted  17   Hispanic          Eczema
       5   redacted  redacted  17   African American  Eczema
       6   redacted  redacted  18   Asian American    HIV/AIDS
       7   redacted  redacted  18   Asian American    HIV/AIDS
  5. Nope, not differentially private (same table as slide 4; see the re-identification sketch following the transcript)
  6. Safety in black boxes? (diagram labels: Automated Build, Data, Insight)
  7. (diagram: training data → fitted model → application interface → user)
  8. (same diagram as slide 7, annotated "Oops")
  9. Useful for Model Inversion ● Linearity: the more linear the model, the easier to perturb (Goodfellow et al. 2015) ● Prediction metadata: confidence scores, class prediction probabilities, or decision functions make inversion easier (Fredrikson et al. 2015) ● Commercial MLAAS: reverse-engineering is easy because the models and hyperparameters used for training are known (Tramèr et al. 2016) ● Deployed black boxes: private training data can be extracted from prediction behavior (Song et al. 2017)
  10. How much can be determined about a fitted model?
  11. Yellowbrick ● Open source Python library, extends the Scikit-Learn API. ● Model (not data) visualization. ● Tools for feature engineering, visual diagnostics, evaluation, and steering. ● Enhances the model selection process. E.g., ScoreVisualizers to gauge accuracy and diagnose problems like overfitting and heteroskedasticity. (A minimal ScoreVisualizer example follows the transcript.)
  12. How can we anticipate model-specific attack vectors?
  13. First, some definitions. “‘Model’ is an overloaded term.” - Hadley Wickham (2015) ● Model family: high-level relationships between variables of interest. ● Model form: specific relationships between variables inside the model family framework. ● Fitted model: concrete instance of a model form where all parameters have been estimated from data; used to generate predictions. Do fitted models exhibit distinctive topologies you could use to infer family or form? (A decision-surface sketch of this question follows the transcript.)
  14. Decision Topologies
  15. Linear Models
  16. Trees and Ensembles
  17. Nearest Neighbors
  18. Radial Basis Function Kernels
  19. Strategic Perturbations?
  20. How noisy was the original data? How much noise to subvert inversion?
  21. Add more smoothing than is strictly necessary, so long as it doesn’t increase error? (A toy perturbation experiment follows the transcript.)
  22. Inspect the spread of class predictions from the average?
  23. Thank you!
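The table on slides 4 and 5 shows why redaction alone fails. A minimal Python sketch of the re-identification check, assuming pandas and a hypothetical DataFrame with the same quasi-identifier and condition columns (the values are illustrative, not the original dataset):

    # Hypothetical DataFrame shaped like the redacted table (illustrative values only).
    import pandas as pd

    records = pd.DataFrame({
        "age":       [15, 15, 17, 17, 17, 18, 18],
        "ethnicity": ["African American", "Caucasian", "Hispanic", "Hispanic",
                      "African American", "Asian American", "Asian American"],
        "condition": ["Bronchitis", "Bronchitis", "Asthma", "Eczema",
                      "Eczema", "HIV/AIDS", "HIV/AIDS"],
    })

    # Group on the quasi-identifiers and measure how concentrated the sensitive attribute is.
    leakage = records.groupby(["age", "ethnicity"])["condition"].agg(["count", "nunique"])

    # Any group with nunique == 1 maps the quasi-identifiers straight to the sensitive value,
    # e.g. every 18-year-old Asian American in the table is listed with HIV/AIDS.
    print(leakage[leakage["nunique"] == 1])
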
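A minimal sketch of the ScoreVisualizer idea from slide 11, using Yellowbrick's ResidualsPlot to look for heteroskedasticity in a regression model; the synthetic data and Ridge estimator are illustrative assumptions, not from the talk:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split
    from yellowbrick.regressor import ResidualsPlot

    # Synthetic regression data standing in for a real training set.
    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Wrap the estimator in a ScoreVisualizer: fit on train, score on test, then render.
    # (show() in current Yellowbrick releases; poof() in older ones.)
    viz = ResidualsPlot(Ridge())
    viz.fit(X_train, y_train)
    viz.score(X_test, y_test)
    viz.show()
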
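A sketch of the decision-topology comparison behind slides 13-18: fit one model from each family on the same toy 2D dataset and plot the resulting decision surfaces side by side. The dataset, estimators, and hyperparameters are illustrative assumptions:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

    # One representative per model family from slides 15-18.
    models = {
        "Linear": LogisticRegression(),
        "Trees / Ensembles": RandomForestClassifier(n_estimators=50, random_state=42),
        "Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
        "RBF Kernel": SVC(kernel="rbf", gamma=2, probability=True, random_state=42),
    }

    # Predictions over a grid of the feature space trace out each model's decision topology.
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    grid = np.c_[xx.ravel(), yy.ravel()]

    fig, axes = plt.subplots(1, len(models), figsize=(16, 4))
    for ax, (name, model) in zip(axes, models.items()):
        model.fit(X, y)
        zz = model.predict_proba(grid)[:, 1].reshape(xx.shape)
        ax.contourf(xx, yy, zz, levels=20, cmap="RdBu", alpha=0.8)
        ax.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolor="k", s=15)
        ax.set_title(name)
    plt.show()
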
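One way to experiment with the perturbation questions on slides 19-22, purely as a toy: add noise of increasing scale to the class probabilities a model would return, re-normalize, and track how much held-out accuracy degrades. The Gaussian noise model and scales are assumptions, not a recommendation from the talk:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    proba = model.predict_proba(X_test)

    rng = np.random.default_rng(0)
    for scale in (0.0, 0.05, 0.1, 0.2):
        # Perturb the returned probabilities, clip, and re-normalize each row.
        noisy = np.clip(proba + rng.normal(0.0, scale, proba.shape), 1e-9, None)
        noisy /= noisy.sum(axis=1, keepdims=True)
        acc = (noisy.argmax(axis=1) == y_test).mean()
        print(f"noise scale {scale:.2f}: held-out accuracy {acc:.3f}")
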

Editor's Notes

  • A bit about me: I’m a data scientist, a generalist, interested in NLP and Visual Diagnostics
  • Data Science is often about consuming data for a purpose it wasn’t originally intended for. This can be tricky because security and privacy are not standard parts of most data science curricula yet.
  • So when data scientists move from doing just downstream analytics, get access to data further up the chain, or start potentially collecting their own data via deployed applications, we can run into problems.
  • Even though the name and SSN have been scrubbed, 100% of the 18-year-old Asian Americans are listed as having HIV/AIDS. In communities where the population of Asian Americans is sufficiently small, this is tantamount to directly exposing PII.

    I’ve learned a lot as a data scientist from the differential privacy discussion, and from people like Jim Klucar
  • Now with the GDPR, more and more app developers are thinking about data security issues.

    Strava's online exercise-tracking map unwittingly revealed remote military outposts in Afghanistan, Iraq, Syria, and Djibouti — and even the identities of soldiers based there. (Nov 2017)
  • But, there is a sense that black box models are relatively secure.
    This is part of the promise of Machine Learning as a Service offerings.
  • So how does MLAAS work? Data is used to train a model, and the model is serialized and hosted as an application artifact together with the other compiled source and executables. Users enter data, which is transformed at the application layer into REST-like calls to the model, which passes back a prediction. (A minimal sketch of such a prediction endpoint follows these notes.)
  • But, given enough API calls, this deployed black box could expose more than just predictions.
    Each prediction generates a kind of new training vector -> (input data, ŷ)
    We could exploit this.
    Given some parts of other users’ data, we might be able to reverse engineer the rest. (A sketch of harvesting these prediction pairs also follows these notes.)
  • Research is increasingly finding more evidence of the vulnerabilities of black box models
  • As I’ve said, I’m no security researcher, but I do think a lot about what we can determine about fitted models.
  • Yellowbrick is an open source Python library I started building with my colleague Benjamin Bengfort about 4 years ago.
    Yellowbrick is for…

    Data scientists to evaluate the stability and predictive value of their models.
    Data engineers to monitor model performance in real world applications.
    Users of models to interpret model behavior in high dimensional space.
    Students to understand a large variety of algorithms and methods.
    Information security specialists…?
  • Could visual diagnostics be used to identify model-specific attack vectors?
  • A visual signature?
  • RBF kernels give models a distinct signature
  • Use these signatures to steer strategic perturbations in our models before we deploy them?
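A minimal sketch of the serving pattern described in the MLAAS note above, assuming Flask and a pickled scikit-learn model; the file name, route, and payload shape are hypothetical:

    import pickle
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # Load the pre-trained model artifact shipped with the application.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        # The application layer turns user input into a feature vector for the model.
        features = request.get_json()["features"]
        yhat = model.predict([features])[0]
        return jsonify({"prediction": int(yhat)})

    if __name__ == "__main__":
        app.run()
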
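And a sketch of the observation that every prediction hands back a labeled pair (x, ŷ): probing an endpoint like the one above repeatedly assembles a surrogate dataset that approximates the black box. The URL and feature dimensionality are assumptions:

    import numpy as np
    import requests

    rng = np.random.default_rng(0)
    surrogate_X, surrogate_y = [], []

    for _ in range(1000):
        x = rng.normal(size=20).tolist()                      # probe input
        resp = requests.post("http://localhost:5000/predict", json={"features": x})
        surrogate_X.append(x)
        surrogate_y.append(resp.json()["prediction"])         # the model's label ŷ

    # (surrogate_X, surrogate_y) can now train a local stand-in model whose decision
    # topology approximates the deployed black box.
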
