Presto Summit 2018 - 08 - FINRA

Presto at FINRA – supporting market surveillance at scale (John Hitchingham, FINRA)
Presto Summit 2018 (https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/)

Published in: Data & Analytics

  1. Presto at FINRA – Supporting market surveillance at scale. John Hitchingham, FINRA Engineering, John.Hitchingham@finra.org
  2. Market Regulation surveillance workflow: BDs, Exchanges, Reference Data Providers; 100B+ events; 25+ PB of Data; 3+ Yrs Prod; Major Exchange Clients; Market Manipulation, Insider Trading, Fraud, Abuse
  3. Data volume (incoming records)
     • 6000+ business objects
     • 7+ million data partitions
     • 160+ million data objects
     • 25+ data publishers
     • 5+ PB of data
  4. Data Fragmentation makes analytics difficult
  5. Scale by separating storage and compute
  6. Cloud Migration – Siloed Databases to Data Lake
  7. Workflow in AWS Cloud
  8. Herd Catalog http://finraos.github.io/herd/
  9. ETL
  10. Isolate workloads and tune capacity per process
  11. Interactive
  12. Ad-Hoc query design for data lake
  13. Main production “data warehouse” bucket growth…
  14. Managed Data Lake (MDL) – Data Lake “in a box”: just released as open source; a data lake implementation on AWS featuring Presto as query endpoint. https://finraos.github.io/herd-mdl/
  15. Portfolio of interactive apps on data lake
  16. Data Science Ecosystem
  17. Query tool use at FINRA
      Tools: Hive, Spark, Presto, HBase
      Status (same order): Deprecated, General use, General use, Limited use
      Used for: ETL/ELT; ETL/ELT (replace Hive); Data Science; Machine Learning; Data Engineering; Data Profiling; BI Reporting; Custom Apps requiring rapid “indexed” lookups
  18. Future exploration with Presto
      • CBO
      • AuthN/AuthZ
        • Hive metastore: column, row; Ranger?
        • Federated database access (Postgres): model to control authorization unique to principal
        • Federated AuthN (SAML, OAuth)
      • Athena?
