Fundamentals Big Data and AI Architecture

The right architecture is key for any IT project. This is especially true for big data projects, where no standard architectures have yet proven their suitability over the years. This session discusses the different Big Data architectures that have evolved over time, including the traditional Big Data architecture, the Streaming Analytics architecture, and the Lambda and Kappa architectures, and maps components from both the open source and the Oracle stack onto these architectures.

The right architecture is key for any IT project. This holds for big data projects as well, yet few standard architectures have proven their suitability over the years.
This session discusses different Big Data architectures that have evolved over time, including the traditional Big Data architecture, the event-driven architecture, and the Lambda and Kappa architectures.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help implement a solution based on it.

  1. Fundamentals of Big Data and AI Architecture. DOAG Data Centric Day, 25 September 2019, Cologne. Guido Schmutz, @gschmutz, http://guidoschmutz.wordpress.com. BASEL | BERN | BRUGG | BUCHAREST | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I.BR. | GENEVA | HAMBURG | COPENHAGEN | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH
  2. Guido Schmutz. Working at Trivadis for more than 22 years. Consultant, trainer, and software architect for Java, Oracle, SOA and Big Data / Fast Data. Oracle Groundbreaker Ambassador & Oracle ACE Director. Head of Trivadis Architecture Board. Technology Manager @ Trivadis. More than 30 years of software development experience. Contact: guido.schmutz@trivadis.com. Blog: http://guidoschmutz.wordpress.com. Slideshare: http://www.slideshare.net/gschmutz. Twitter: gschmutz. 169th edition
  3. From Data Warehouse …
  4. Data Warehouse Architecture. Diagram: bulk sources (DB extract, file, DB) → ETL engine (Extract, Transform & Load; high latency) → Enterprise Data Warehouse (RDBMS) → consumer (BI tools).
  5. Data Warehouse is an architecture. Layered model, controlled ETL, single point of truth, query-optimized data marts. Tested, optimized, quality assured, "operated". Standard reporting and ad hoc reporting on the DWH base. Perfect and fast for new requirements on known and prepared data and structures. The Data Warehouse is not "agile": no free definition and shaping of arbitrary analytical questions. = Data Production. Image source: https://www.flickr.com/photos/128950981@N04/15452926858
  6. DWH Architecture – what about Streaming Data? Diagram: the Data Warehouse architecture of slide 4 (bulk sources, ETL engine, Enterprise Data Warehouse, BI tools) plus event sources (location, weather, IoT data, mobile apps, social). Rating scales: elasticity (yes/no), end-to-end latency (low/high), ad-hoc (SQL) queries (yes/no), storage costs (low/high), supports raw data (yes/no), supports streaming data (yes/no), access latency (low/high).
  7. … to Big Data / Data Lake
  8. Initial Idea of a Data Lake … (adapted from Wikipedia.org): "Reporting, visualization, analytics and machine learning"; "Single store of all data in the enterprise"; "Should put an end to data silos."; "Example: Distributed file system used in Apache Hadoop".
  9. Data Lake is an infrastructure (Data Lab interpretation). Permanently new data and structures. Schema on read. Really large amounts of data. Explorative working (research). Established error culture. New user groups ([data] scientists). Freedom of data choice. Freedom of source choice. Self-service data labs. Ad hoc and one-shot implementations. Query + advanced analytics. = Research & Development. Image source: https://www.flickr.com/photos/ian-arlett/34233379390
  10. Schema on Read instead of (only) Schema on Write. "Schema on Write": data quality is managed by a formalized ETL process; data is persisted in tabular, agreed and consistent form; data integration happens in ETL; structure must be decided before writing. "Schema on Read": interpretation of data is captured in code for each program accessing the data; data quality depends on code quality; data integration happens in code. Diagram: data source → ETL → EDWH (RDBMS) → consumer (BI tools), versus data source → Data Lake (storage) → script (transform) → consumer (Data Science Workbench).
  11. Big Data / Data Lake Architecture. Diagram: bulk sources (DB extract, file, DB) → file import / SQL import (high latency) → Big Data Platform (Hadoop cluster: parallel processing; storage for raw and refined/usage-optimized data) → consumer (Data Science Workbench with "native" raw access: machine learning, graph algorithms, natural language processing). Rating scales: elasticity (yes/no), end-to-end latency (low/high), ad-hoc (SQL) queries (yes/no), storage costs (low/high), supports raw data (yes/no), supports streaming data (yes/no), access latency (low/high).
  12. Big Data / Data Lake Architecture. Diagram: as on slide 11, with a query engine added to the Big Data Platform and BI tools added as a consumer via SQL; the Data Science Workbench keeps "native" raw access. Rating scales: elasticity, end-to-end latency, ad-hoc (SQL) queries, storage costs, supports raw data, supports streaming data, access latency.
  13. Data Lake & EDWH Architecture. Diagram: bulk sources (DB extract, file, DB) → file import / SQL import → Big Data Platform (Hadoop cluster: parallel processing, query engine, storage for raw and refined/usage-optimized data) → SQL export → Enterprise Data Warehouse (RDBMS). Consumers: BI apps (SQL) and Data Science Workbench ("native" raw access). Rating scales: elasticity, end-to-end latency, ad-hoc (SQL) queries, storage costs, supports raw data, supports streaming data, access latency.
  14. Data Lake & EDWH Architecture. Diagram: as on slide 13, with consumer access via SQL / search in addition to SQL and "native" raw access. Rating scales: elasticity, end-to-end latency, ad-hoc (SQL) queries, storage costs, supports raw data, supports streaming data, access latency.
  15. Data Lake & EDWH Architecture with Streaming Data. Diagram: the Data Lake & EDWH architecture of slide 14 (bulk sources, file import / SQL import, Big Data Platform, SQL export, Enterprise Data Warehouse, BI apps, Data Science Workbench) plus event sources (location, weather, IoT data, mobile apps, social).
  16. Data Lake & EDWH Architecture with Streaming Data. Diagram: as on slide 15, with the event sources connected through an event hub (event stream) and a bulk data import (high latency) into the Big Data Platform. Rating scales: elasticity (yes/no), end-to-end latency (low/high), ad-hoc (SQL) queries (yes/no), storage costs (low/high), supports raw data (yes/no), supports streaming data (yes/no), access latency (low/high).
  17. Keep the data in motion … Data at Rest vs. Data in Motion. Diagram steps: store, analyze, visualize, (re)act.
  18. Event Processing Architecture. Diagram: bulk sources (DB extract, file, DB) and event sources (location, weather, IoT data, mobile apps, social) → event hub (event stream) → stream processing cluster (stream processor, model / state, rules engine) → event stream → service and consumers (enterprise app, dashboard, "SQL" / search). Low(est) latency, no history. Processing capabilities: Complex Event Processing (CEP), machine learning model execution (inference), state transition. Rating scales: elasticity (yes/no), end-to-end latency (low/high), ad-hoc (SQL) queries (yes/no), storage costs (low/high), supports raw data (yes/no), supports streaming data (yes/no), access latency (low/high).
  19. Event Processing & Data Lake. Diagram: combines the event processing architecture of slide 18 (event hub, stream processing cluster with stream processor, model / state and rules engine, service, dashboard, enterprise app) with the Data Lake & EDWH architecture (file import / SQL import, Big Data Platform with parallel processing, query engine and raw and refined/usage-optimized storage, SQL export, Enterprise Data Warehouse on RDBMS, BI apps, Data Science Workbench). The event hub feeds both the stream processing cluster (event stream) and the Big Data Platform (data flow). Rating scales: elasticity, end-to-end latency, ad-hoc (SQL) queries, storage costs, supports raw data, supports streaming data, access latency.
  20. Event Processing & Data Lake: Lambda Architecture. Diagram: event sources (location, weather, IoT data, mobile apps, social) → event hub. Batch layer: bulk data flow into the Big Data Platform (Hadoop cluster: parallel processing, query engine, storage for raw and refined/usage-optimized data), producing the batch result. Speed layer: event stream into the stream processing cluster (stream processor, model / state, ML inference server), producing the speed result. Serving: an API (merger) combines batch result and speed result for the consumers (BI apps, dashboard).
  21. Event Processing & Data Lake: Kappa Architecture. Diagram: event sources (location, weather, IoT data, mobile apps, social) → event hub (event stream, replay). Speed layer: stream processing cluster running stream processor V1.0 with state V1.0 and stream processor V2.0 with state V2.0, producing result V1.0 and result V2.0. Serving: an API (switcher) selects between the two results for the consumers (BI apps, dashboard). A bulk data flow also feeds the Big Data Platform (Hadoop cluster: parallel processing, query engine, storage for raw and refined/usage-optimized data).
  22. Integrate existing systems with CDC. Diagram: the Event Processing & Data Lake architecture of slide 19, with Change Data Capture added between the bulk sources (DB extract, file, DB) and the event hub, next to the existing file import / SQL import (bulk data flow).
  23. Applications participate Event-Driven. Diagram: as on slide 22, with a microservice platform (microservice, data, API) attached to the event hub alongside the stream processing platform (stream processor, model / state, rules engine), so that services both consume and produce event streams.
  24. Move Processing to Edge. Diagram: as on slide 23 (microservice cluster, stream processing cluster, Change Data Capture, Big Data Platform, Enterprise Data Warehouse), with an edge node (rules, event hub, storage) placed between the event sources and the central event hub.
  25. But be careful … Data Swamp = No Data Architecture. Anyone does what they want. No (central?) documentation. No unique data structure. No unique transformations. No unique KPI definitions. No quality assurance. No data flow analysis. Silo thinking. Data availability? Security? Auditability? Image source: https://www.flickr.com/photos/82134796@N03/10603438015
  26. Data Lake Zones & Data Catalog
  27. Data Lake Zones. Diagram: data sources and data access (each via disk or service) around the data storage zones: landing zone, raw zone, refined zone, usage-optimized zone, sandbox zone, archive zone. Storage technologies shown for the zones: object store, file system, event hub / store, RDBMS, NoSQL, in-memory grid, tape.
  28. Data Catalog. Diagram: the architecture of slide 24 (bulk and event sources, edge node, Change Data Capture, event hub, stream processing cluster, microservice cluster, Big Data Platform, Enterprise Data Warehouse, consumers), with a governance component added: the data catalog.
  29. (Machine Learning Augmented) Data Catalog. A data catalog creates and maintains an inventory of data assets through discovery, description and organization of distributed datasets. It provides context to enable data stewards, data/business analysts, data engineers, data scientists and other line of business (LOB) data consumers to find and understand relevant datasets for the purpose of extracting business value. Modern machine-learning-augmented data catalogs automate various tedious tasks involved in data cataloging, including metadata discovery, ingestion, translation, enrichment and the creation of semantic relationships between metadata.
  30. Data Catalog Features: Ranking on Utilization Rate Catalog Objects; Maintain Multiple Versions of Catalog Object; Search & Navigation for Content; Content Check in/out; Certify Official Versions of Metadata; Analyze and Audit Decision Processes; Integrate Data Lineage; Levels of Access to Catalog Objects; Impact Analysis; API for Search / Catalog / Mgmt Functions; Track Usage of Catalog Objects; Integration with IAM; Automated Crawling of Source System; Catalog Cloud-Deployed Sources; Catalog Hadoop-based Sources; Catalog BI & Data Visualization Tools; Catalog Databases; Integration with Self-Service Tools; Classify Catalog Objects by Business Glossary; Supports User-Defined Tagging; Integrates with Data Profiling; Supports Data Sampling; Quality Metrics; Catalog Machine/IoT Data; Supports Discussion Threads on Catalog Objects; Annotate & Comment on Catalog Objects; Catalog Unstructured Data with NLP Functionality; Semantic Search; Classify Catalog Objects by Domain; Publish/Subscribe on Changes of Catalog Objects; AI/ML-Based Recommendation; Detect Similar/Duplicate or Related Data; Easy to Use, Intuitive GUI; Supports Manual Curation; Supports Automated (ML-Based) Tagging; Supports Ongoing Discovery of New Data Sets; Natural Language Search; Faceted Search; Catalog Object Value Estimation; Incentive-Based Participation Encouragement; Assign Data Steward.
  31. Traditional vs. Cloud Native Big Data Platforms
  32. Traditional vs. Cloud Native Big Data. Data Local Compute (traditional): a master node and workers #1 to #3, each worker with its own disk and processing, connected by a network. Separate Compute and Storage (cloud native): a master node, compute nodes #1 to #3 (processing only) and a separate storage tier (disks), connected by a network. Separation of compute and storage is the fundamental difference: store data in Object Storage instead of HDFS; bring up compute nodes only for data processing; multiple workloads on separate clusters can access the same data.
  33. Traditional vs. Cloud Native Big Data (traditional / cloud native). Data local compute: yes / no. Network bandwidth requirement: low / high. Scalable, shared usage of data: no (only within cluster) / yes. Persistence: HDFS / Object Storage. Data lifecycle: tiered storage / built-in (cloud). Compute: Hadoop, Spark / Hadoop, Spark. Serverless processing: no / yes. Infrastructure: Hadoop cluster / cloud, container orchestration. Entry threshold: high / low.
  34. Modern Data Platform
  35. Modern Data Platform. Diagram (data platform blueprint): bulk sources (DB extract, file, DB) and event sources (location, weather, IoT data, mobile apps, social); edge node (rules, event hub, storage); file import / SQL import and Change Data Capture; event hub; stream processing cluster (stream processor, model / state, rules engine); microservice cluster (microservice, data, API); Big Data Platform (Hadoop cluster: parallel processing, query engine, storage for raw and refined/usage-optimized data); SQL export to the Enterprise Data Warehouse (RDBMS); governance (data catalog); services; consumers (BI apps, Data Science Workbench, enterprise app) with access via SQL, "SQL" / search and "native" raw. Flows: event stream and bulk data flow.
  36. On-Premises – Traditional. Technologies mapped onto the data platform blueprint of slide 35: Hadoop YARN, Pig, HDFS, Kafka, Confluent, Hive, Kafka Streams, Spring Boot, NoSQL, RDBMS, Atlas, Debezium, Streamsets, Flume, Sqoop, Impala, MapReduce, Spark, SparkSQL, Spark Streaming, Zeppelin, Jupyter.
  37. Oracle Cloud. Technologies mapped onto the data platform blueprint of slide 35: Kafka, Confluent, Streamsets, Nifi, Object Storage, Archive Storage, Data Science, Big Data Cloud Service, Machine Learning, Streaming, Functions, Visual Builder, Java, NoSQL DB, Data Catalog, Autonomous Transaction Proc, Autonomous DWH, Big Data SQL Cloud Service, GoldenGate Cloud Service, Kafka Streams / KSQL, SOA Cloud Service, Container Engine for Kubernetes, Zeppelin, Jupyter, Transfer Service, Container Pipelines, Container Registry.
  38. AWS Cloud. Technologies mapped onto the data platform blueprint of slide 35: Kafka, Confluent, Streamsets, Nifi, Zeppelin, Jupyter, S3, S3 Glacier Deep Archive, Dynamo DB, Redshift, Redshift Spectrum, Spark on EMR, Glue, Snowball, Data Sync, Athena, Presto on EMR, SageMaker, Deep Learning Containers, Spark Streaming on EMR, Databricks on AWS, Kinesis Data Analytics, Lambda, Batch, Spring Boot, QuickSight, Zeppelin on EMR, RStudio on EMR, API Gateway, Managed Streaming for Kafka, Kinesis Data Firehose, Confluent Cloud.
  39. On-Premises – Cloud Native. Technologies mapped onto the data platform blueprint of slide 35: Istio, Kubernetes, Docker, Spark, MinIO S3, Kafka, Confluent, NoSQL, Presto, Kafka Streams, Spring Boot, RDBMS, Atlas, Debezium, Streamsets, Nifi, SparkSQL, Spark Streaming, Zeppelin, Jupyter.
  40. Physical Data Lake vs. Virtual Data Lake
  41. Physical Data Lake. Diagram: data sources 1 to 4 (file, RDBMS, NoSQL, enterprise app) → data ingest → Data Lake (Hadoop cluster: parallel processing, query engine, storage for raw and refined/usage-optimized data) → query → consumer (BI apps). Governance: data catalog, data lineage, encryption, policy management; discovery and catalog.
  42. Virtual Data Lake. Diagram: data sources 1 to 4 (file, RDBMS, NoSQL, enterprise app) are queried in place through a data virtualization layer (query engine) by the consumer (BI apps). Governance: logical data catalog, data lineage, encryption, policy management; discovery and catalog.
  43. Physical Data Lake as part of Virtual Data Lake. Diagram: the data virtualization layer (query engine) queries data sources 1 to 4 (file, RDBMS, NoSQL, enterprise app) as well as a physical Data Lake (Hadoop cluster: parallel processing, query engine, storage for raw and refined/usage-optimized data) that is filled by data ingest. Consumer: BI apps. Governance: logical data catalog, data lineage, encryption, policy management; discovery and catalog.
  44. Multiple Data Lakes form a Virtual Data Lake. Diagram: the data virtualization layer (query engine) queries Data Lake 1 and Data Lake 2 (each a Hadoop cluster with parallel processing, query engine, and storage for raw and refined/usage-optimized data) together with data source 1 (file) and data source 2 (RDBMS). Consumer: BI apps. Governance: logical data catalog, data lineage, encryption, policy management; discovery and catalog.
  45. AI & Machine Learning
  46. AI & Machine Learning: Training vs. Inference (© 2019 Gartner, Inc., ID: 354956). Training, learning a new capability from existing data: raw data from the logical data warehouse → training dataset (labeled "cat", "dog", "cat") → deep-learning framework → trained model. Inference, applying this capability to new data: new data ("?") → app or service featuring the capability → "cat". Deployment labels in the diagram: "Edge Device, On-Premises or Cloud-Hosted" and "On-Premises or Cloud-Hosted". Arrows show the logical flow of data.
  47. Modern Data Platform. Diagram: the data platform blueprint of slide 35 (sources, edge node, Change Data Capture, event hub, stream processing cluster, microservice cluster, Big Data Platform, Enterprise Data Warehouse, governance with data catalog, consumers), with an ML inference server added to the stream processing cluster and to the microservice cluster.
  48. AI & Machine Learning: Model Training & Deployment
  49. Integration of Machine Learning Model in application. Diagram: four integration patterns, each built from a trained ML model and ML serving: ML in Application; ML as an API (backing service); ML and Stream Processing (with event hub); ML as a Cloud Service.
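
The contrast on slide 10 between "Schema on Write" and "Schema on Read" can be made concrete with a small sketch. It is not taken from the slides: the sample records, the `reading` table, and the function names `load_schema_on_write` and `average_schema_on_read` are hypothetical, and an in-memory SQLite database stands in for the EDWH. In the first function the structure is decided before writing and bad records are rejected during loading; in the second the raw lines stay untouched and the reading program carries the interpretation, so data quality depends on that code.

```python
import json
import sqlite3

# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    '{"sensor": "s1", "temp_c": 21.5}',
    '{"sensor": "s2", "temp_c": "n/a"}',                 # malformed reading
    '{"sensor": "s3", "temp_c": 19.0, "humidity": 40}',  # extra field
]


def load_schema_on_write(lines):
    """Schema on write: validate and shape records before they are stored."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE reading (sensor TEXT NOT NULL, temp_c REAL NOT NULL)")
    rejected = []
    for line in lines:
        rec = json.loads(line)
        if isinstance(rec.get("temp_c"), (int, float)):
            # Only the agreed columns are persisted; extra fields are dropped.
            db.execute(
                "INSERT INTO reading VALUES (?, ?)",
                (rec["sensor"], float(rec["temp_c"])),
            )
        else:
            rejected.append(line)
    return db, rejected


def average_schema_on_read(lines):
    """Schema on read: raw lines are kept as-is; this reader interprets them."""
    temps = []
    for line in lines:
        value = json.loads(line).get("temp_c")
        if isinstance(value, (int, float)):
            temps.append(float(value))
    return sum(temps) / len(temps)


db, rejected = load_schema_on_write(RAW_EVENTS)
avg_written = db.execute("SELECT AVG(temp_c) FROM reading").fetchone()[0]
avg_read = average_schema_on_read(RAW_EVENTS)
print(avg_written, avg_read, len(rejected))
```

Both paths reach the same average here, but the humidity field survives only in the raw data, which is the flexibility (and the quality risk) that schema on read trades for.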

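
The serving step of the Lambda architecture on slide 20, where an API (merger) combines the batch result and the speed result, can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation behind the slides: the event fields `ts` and `key`, the `cutoff` of the last batch run, and the functions `batch_view`, `speed_view` and `serve` are all hypothetical, and the speed layer is recomputed from a list here, whereas a real stream processor would update its state incrementally.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute counts from all events up to the last batch run."""
    return Counter(e["key"] for e in events if e["ts"] <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: count only the events that arrived after the last batch run."""
    return Counter(e["key"] for e in events if e["ts"] > cutoff)


def serve(batch, speed, key):
    """Serving API (merger): combine batch result and speed result for one key."""
    return batch.get(key, 0) + speed.get(key, 0)


# Hypothetical event log; "ts" is a logical timestamp.
events = [
    {"ts": 1, "key": "sensor-a"},
    {"ts": 2, "key": "sensor-b"},
    {"ts": 3, "key": "sensor-a"},
    {"ts": 4, "key": "sensor-a"},
]
cutoff = 2  # the last batch run covered events with ts <= 2

batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
total_a = serve(batch, speed, "sensor-a")
total_b = serve(batch, speed, "sensor-b")
print(total_a, total_b)
```

Because the two views partition the events at the cutoff, the merged count matches a full recount; when the next batch run advances the cutoff, the speed view for the covered range can be discarded.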