Surveillance Platform for Bank Compliance
Mayur Thakur, Goldman Sachs
Stakes Are High
• “Banks pay out £166bn over six years: a history of banking misdeeds and fines” – The Guardian
• “Banks 'pay 60%' of profits in fines and customer payments” – BBC News
• “Deutsche Bank to Pay $2.5 Billion to Settle Libor Investigation” – The Wall Street Journal
• “$1.2 Billion Fine for Hedge Fund SAC Capital in Insider Case” – NY Times
Key Technical Challenges
– Diverse data sets and formats (SQL, flat files, proprietary, etc.)
– Large data volumes, updated frequently
  • ~1B* pieces of text per year
  • ~1B edges in a graph
  • 100s of millions of trading events per day
– Past data can change (e.g., manual trade correction)
  • Causes a cascade of changes
– Surveillance decisions need to be debuggable
  • Why was trade X on Oct 25, 2015 not flagged?
– Not real time, but often needs time guarantees (say, T+1)
* All numbers are orders of magnitude
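A minimal sketch of how a correction to past data could cascade, assuming derived data sets are tracked in a dependency graph. The data set names and the `recompute_plan` helper are hypothetical and are not taken from the talk; the point is only that one corrected input marks every downstream artifact for that trade date as stale, in dependency order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical graph: each derived data set maps to the data sets it is built from.
DEPENDS_ON = {
    "flattened_orders": ["raw_orders"],
    "flattened_execs": ["raw_execs"],
    "related_transactions": ["flattened_orders", "flattened_execs"],
    "spoofing_alerts": ["related_transactions"],
}


def affected(changed, depends_on):
    """Return every data set that directly or transitively reads `changed`."""
    hit = {changed}
    grew = True
    while grew:
        grew = False
        for target, sources in depends_on.items():
            if target not in hit and hit.intersection(sources):
                hit.add(target)
                grew = True
    return hit - {changed}


def recompute_plan(changed, trade_date, depends_on=DEPENDS_ON):
    """List (data set, date) pairs to rebuild, upstream data sets first."""
    todo = affected(changed, depends_on)
    ordered = TopologicalSorter(depends_on).static_order()
    return [(name, trade_date) for name in ordered if name in todo]


if __name__ == "__main__":
    # A manual correction to an execution on Oct 25, 2015.
    for step in recompute_plan("raw_execs", "2015-10-25"):
        print(step)
```

Keeping the plan as explicit (data set, date) pairs is one way to leave a record of what was rebuilt and why, which helps with the debuggability requirement.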
Surveillance Architecture
[Architecture diagram. Data sources: SQL 1…n, Flatfile 1…m, Prop 1…k, HDFS 1…q. Stages: Flattened 1…n, Preprocessing pipeline 1…n, Surv. 1…, Alerts. Also shown: Bookkeeping.]
Spoofing Illustration
A Real World Spoofing Case
Navinder Singh Sarao was accused of spoofing
• …and even of contributing to the flash crash of 2010
Sarao pled guilty to spoofing in Nov 2016
He allegedly made $40M in illegal profit over several years.
Review of Regulatory Cases
– Analyzed six regulatory enforcement cases related to spoofing
– Identified common factors indicative of spoofing behavior
  • Creating a false impression of demand by placing spoof orders on the opposite side to trigger a price movement (“order imbalance”)
  • Cancellation of spoof orders within a short time after the pivot execution (“time to cancel post execution”)

Case                  Factor: Order Imbalance   Factor: Time to Cancel Post Execution
                      (> 2.5 times)             (< 1 sec)
Sarao / Flash Crash   ✓                         ✓
Hold Brothers         ✓                         ✓
Coscia / Panther      ✓                         ✓
Visionary Trading     NA                        ✓
Swift                 ✓                         5 secs
3 Red                 ✓                         ✓
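The two factors in the table can be sketched as simple checks on one pivot execution. This is an illustration only: the thresholds (2.5 times, 1 second) come from the table, but the `PivotContext` fields and the definition of imbalance as opposite-side quantity divided by pivot-side quantity are assumptions, since the slides do not spell out the formula.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the case review table; field names below are hypothetical.
IMBALANCE_RATIO = 2.5
MAX_CANCEL_DELAY_SEC = 1.0


@dataclass
class PivotContext:
    pivot_side_qty: float                  # trader's order quantity on the pivot side
    opposite_side_qty: float               # trader's order quantity on the opposite side
    pivot_exec_time: float                 # seconds
    opposite_cancel_time: Optional[float]  # None if the opposite orders were never cancelled


def order_imbalance(ctx):
    """Assumed definition: opposite-side quantity relative to pivot-side quantity."""
    if ctx.pivot_side_qty <= 0:
        return float("inf") if ctx.opposite_side_qty > 0 else 0.0
    return ctx.opposite_side_qty / ctx.pivot_side_qty


def time_to_cancel(ctx):
    """Seconds between the pivot execution and the opposite-side cancel, if any."""
    if ctx.opposite_cancel_time is None:
        return None
    return ctx.opposite_cancel_time - ctx.pivot_exec_time


def spoofing_factors(ctx):
    """Evaluate both factors; a real surveillance would combine more signals."""
    ttc = time_to_cancel(ctx)
    return {
        "order_imbalance": order_imbalance(ctx) > IMBALANCE_RATIO,
        "fast_cancel": ttc is not None and 0 <= ttc < MAX_CANCEL_DELAY_SEC,
    }
```

Returning the individual factor results, rather than a single yes/no, keeps each decision explainable when someone later asks why a trade was or was not flagged.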
Transactions Data Pipeline
– The spoofing implementation has two parts: data preprocessing and surveillance logic
– The data preprocessing pipeline is reused for multiple surveillances
– ~100M orders, 1B market data points, 100K products, multiple order management systems
[Pipeline diagram. Inputs: Order 1…n, Exec. 1…m, Market 1…k, Account, Product. Stages: Flattened Order, Flattened Exec, Flattened Market; Order Processing Pipeline, Exec. Processing Pipeline, Mkt. Processing Pipeline; Related Transactions. Surveillances: Spoofing, Front running, … Surv. n, each producing Alerts.]
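The split between shared preprocessing and per-surveillance logic might look like the sketch below. The `run_surveillances` function and its arguments are hypothetical; it only illustrates that preprocessing runs once and every surveillance reads the same rows.

```python
def run_surveillances(raw_events, preprocess, surveillances):
    """Preprocess once, then hand the same rows to every surveillance.

    `preprocess` turns raw events into rows; `surveillances` maps a name
    to a predicate that returns True when a row should raise an alert.
    """
    rows = preprocess(raw_events)  # the expensive step, computed a single time
    return {
        name: [row for row in rows if flag(row)]
        for name, flag in surveillances.items()
    }
```

Adding a new surveillance then means registering one more predicate, without touching the preprocessing step.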
Related Transactions Table
[Table illustration: Related Transactions, with columns Pivot Exec, Orders, Execs/Cancels, MktData (e.g., 216.8, 216.9, …)]
One row of the related transactions table contains information about one pivot execution and all the activity around the time of that execution.
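A rough sketch of building one such row, assuming events carry a timestamp and a kind. The `Event` type, the kind labels, and the 5-second window are all hypothetical; the slides do not say how wide "around the time of that execution" is.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str   # "order", "exec", "cancel", or "mkt" (hypothetical labels)
    ts: float   # seconds
    payload: dict = field(default_factory=dict)


def related_row(pivot, events, window_sec=5.0):
    """Collect all activity within `window_sec` of the pivot execution."""
    lo, hi = pivot.ts - window_sec, pivot.ts + window_sec
    near = [e for e in events if lo <= e.ts <= hi and e is not pivot]
    return {
        "pivot_exec": pivot,
        "orders": [e for e in near if e.kind == "order"],
        "execs_cancels": [e for e in near if e.kind in ("exec", "cancel")],
        "mkt_data": [e.payload.get("price") for e in near if e.kind == "mkt"],
    }
```

Because each row is self-contained, a surveillance can evaluate one pivot execution without going back to the raw order, execution, and market data feeds.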
Search Problem
Given a semi-structured corpus of about 1B documents in a Hadoop cluster, design a search engine over YARN that is fast and satisfies the investigative needs of a variety of users.
Unique Challenges
– Cannot move data outside of an already existing Hadoop cluster
– Support deep scoring algorithms for GS-specific signals (colloquial language, trades, etc.)
– Unstructured and structured signals
Search Workflow
[Workflow diagram. Components: Web Client, Search Master, Ranker, Fast Index Servers, Slow Index Servers, HBase, YARN containers, HDFS.]
• Implemented as YARN apps
• Auth enabled
• Slow index servers can scale as much as HBase
• Indexed documents: > 1 billion
• Indexed tokens: > 500 billion
• Current index size runs to several TBs (memory and disk)
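One way the master, index servers, and ranker could fit together is sketched below with in-memory dictionaries standing in for the fast and slow index tiers. How the real system divides work between the tiers is not stated in the slides, so querying both and merging the candidates is an assumption, as are the function names and the `score` callback.

```python
from typing import Callable, Dict, List, Set

Index = Dict[str, Set[str]]  # token -> ids of documents containing it


def lookup(index: Index, tokens: List[str]) -> Set[str]:
    """Return ids of documents that contain every query token."""
    postings = [index.get(t, set()) for t in tokens]
    return set.intersection(*postings) if postings else set()


def search(
    query: str,
    fast_index: Index,
    slow_index: Index,
    score: Callable[[str, List[str]], float],
    limit: int = 10,
) -> List[str]:
    """Search master: gather candidates from both tiers, then rank them."""
    tokens = query.lower().split()
    candidates = lookup(fast_index, tokens) | lookup(slow_index, tokens)
    # Ranker: highest score first, document id as a deterministic tie-break.
    ranked = sorted(candidates, key=lambda doc: (-score(doc, tokens), doc))
    return ranked[:limit]
```

Passing the scoring function in as a parameter leaves room for the firm-specific signals mentioned earlier without changing the retrieval code.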

Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
