Real-time Freight Visibility
How Trimble uses NiFi and SAM to create sub-second transportation visibility
Krishna Potluri and Donnie Wheat
1
Agenda
▪ Transportation Industry Overview
▪ Adding Visibility To Transportation
▪ Reflections On HDF Application Development
2
Safe Harbor Notice
The information presented is for informational purposes only and should not
be relied upon in making a purchasing decision. Trimble is under no legal
obligation to deliver any future products, features or functions within any
specified time frame, if at all. Release dates and content are subject to
change at Trimble’s sole discretion.
3
Transportation Industry Overview
4
Transportation Industry
▪ Freight moves by truck, train, ferry, etc.,
and any combination
▪ Trucks carry 10.55B tons of freight annually,
70.9% of the 14.88B total (ATA)
▪ Shippers increasing demand for visibility of status
and estimation
▪ Industry continues to rely on 1980s EDI technology
▪ Most carriers run Transportation Management
Systems on in-house databases
5
The Shipper Dilemma
6
Visibility, Historically Speaking
▪ Common Surface Transportation Issues
– Manual Customer Service Process
– No Proactive, Reliable Notifications
– Dynamic ETAs Not Available
– Stale Transit Data
– Lack Of Shipment Visibility
7
Adding Visibility To Transportation
8
Transportation Visibility
➢ Truck check calls sent multiple times
per hour
➢ End-to-end Visibility With Automated,
Geo-fenced Notifications
➢ Dynamic ETAs
➢ Proactive Customer Service Interaction
➢ Real-time Transit Data
➢ Full Shipment Visibility
9
Technical Requirements
▪ Streaming data application
– If data is not a stream, make
it a stream
▪ Source data from
– Database
– Web services
– Message bus
▪ Rapid development
▪ Start small and grow
infrastructure with data growth
10
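One way to read "if data is not a stream, make it a stream" is to poll a table with a watermark and emit each new row as an event. The sketch below is only an illustration of that idea: the `order_updates` table, its columns, and the `stream_new_rows` helper are hypothetical, and SQLite stands in for whatever source database is used.

```python
import sqlite3


def stream_new_rows(conn, last_seen_id=0, batch_size=100):
    """Yield rows with id greater than `last_seen_id`, oldest first.

    The generator keeps its own watermark, so iterating over it turns a
    table into an ordered sequence of events. It stops when a poll finds
    no new rows; a long-running collector would sleep and poll again.
    """
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM order_updates "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_seen_id, batch_size),
        ).fetchall()
        if not rows:
            return
        for row_id, payload in rows:
            last_seen_id = row_id  # advance the watermark
            yield {"id": row_id, "payload": payload}


# Hypothetical source table with five pending updates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_updates (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO order_updates (payload) VALUES (?)",
    [(f"update-{n}",) for n in range(1, 6)],
)

events = list(stream_new_rows(conn, batch_size=2))
print([event["id"] for event in events])
```

A small batch size is used here only to show that the watermark carries across polls.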
Processing Approach
▪ Minimal Client Impact, heavy lifting in SaaS world
▪ Customers store order data in 10-20 tables in a relational
database
▪ Collect key data elements from the customer database for
lookup and processing
▪ Receive updates from the customer every few minutes, or as
often as the customer wants
▪ As Trucks move, check calls are sent
– Look up order details
– Provide Visibility
▪ Zero touch client side for new functionality
11
[Diagram labels: Look Up Order Data; Truck + Order Visibility; Phoenix; Customer DB; Check Calls; Constant Updates]
Data Estimation
12
Data Reality
13
▪ 3 NiFi, 3 Kafka, 4 HDFS/RegionServer VMs
– Originally 1 NiFi, 1 Kafka, 3 HDFS/RegionServers
▪ 2,700,000 records saved per day average
▪ 700,000 Check Calls processed per day average
▪ 9,000,000 records initial data set per customer average
▪ 100,000,000 records saved maximum in a day (with smaller setup)
▪ 330,000,000 records stored in Phoenix
▪ 687 ms average process time for each Check Call
– 4-8 Phoenix database reads
▪ 12-21 ms average
– 2 MSSQL configuration reads
▪ 150 ms average
▪ 47 ms Phoenix record save average
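As a rough sanity check on the figures above, the sketch below adds up the quoted database timings and converts the daily volumes to per-second rates. It assumes each quoted average applies to a single read or save; the slide does not say whether 150 ms covers one MSSQL read or both, so treat the result as an estimate. The constant and function names are illustrative.

```python
# Figures quoted on the slide. Reading each average as "per operation"
# is an assumption, not something the slide states.
AVG_CHECK_CALL_MS = 687
PHOENIX_READS = (4, 8)        # reads per check call (min, max)
PHOENIX_READ_MS = (12, 21)    # average per read (min, max)
MSSQL_READS = 2
MSSQL_READ_MS = 150
PHOENIX_SAVE_MS = 47
SECONDS_PER_DAY = 24 * 60 * 60


def database_time_range_ms():
    """Return (low, high) estimated database time per check call in ms."""
    low = PHOENIX_READS[0] * PHOENIX_READ_MS[0]
    high = PHOENIX_READS[1] * PHOENIX_READ_MS[1]
    fixed = MSSQL_READS * MSSQL_READ_MS + PHOENIX_SAVE_MS
    return low + fixed, high + fixed


def per_second(daily_count):
    """Convert an average daily count to an average per-second rate."""
    return daily_count / SECONDS_PER_DAY


low, high = database_time_range_ms()
print(f"database time per check call: {low}-{high} ms")
print(f"left for flow processing: "
      f"{AVG_CHECK_CALL_MS - high}-{AVG_CHECK_CALL_MS - low} ms")
print(f"check calls per second: {per_second(700_000):.1f}")
print(f"records saved per second: {per_second(2_700_000):.1f}")
```

Under that per-operation assumption, database access would account for most of the 687 ms average, which is consistent with the later emphasis on indexes and caching.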
Transportation Data Flow Architecture
14
HDF Architecture
[Architecture diagram labels: Analytics; Data Providers/Consumers; Trimble Identity & Authorization; Enterprise Service Bus; API Gateway; Microservices; Collect, Config, Consume; Hadoop Cluster]
Apache NiFi
▪ Processors handle CRUD and
conversions of data
▪ Expression Language adds incredible
flexibility
▪ Jolt handles most JSON
processing
▪ Few custom components, but custom
components are easy to add
▪ Scriptable, to handle moderate
complexity
16
NiFi Optimization
▪ Enable Higher Concurrent Tasks for
intensive processors
▪ NiFi automatically balances where
threads go
▪ Increase threads in controller settings
to optimize concurrency
▪ Real time and historical visibility for
performance improvement
▪ Balance Thread Pool size against
Database Pool size
17
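The last point, balancing thread pool size against database pool size, can be illustrated outside NiFi. The sketch below is a toy model, not NiFi code: `ConnectionPool` and `run_tasks` are hypothetical names, and a semaphore stands in for a real connection pool. It shows that worker threads beyond the pool size only queue for connections.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConnectionPool:
    """Toy stand-in for a database connection pool with a fixed size."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak_in_use = 0

    def __enter__(self):
        self._slots.acquire()  # blocks while the pool is exhausted
        with self._lock:
            self._in_use += 1
            self.peak_in_use = max(self.peak_in_use, self._in_use)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._in_use -= 1
        self._slots.release()
        return False


def run_tasks(concurrent_tasks, pool_size, task_count=20, query_seconds=0.01):
    """Run simulated lookups; return the peak number of connections in use."""
    pool = ConnectionPool(pool_size)

    def lookup(_):
        with pool:
            time.sleep(query_seconds)  # pretend to run a query

    with ThreadPoolExecutor(max_workers=concurrent_tasks) as executor:
        list(executor.map(lookup, range(task_count)))
    return pool.peak_in_use


print(run_tasks(concurrent_tasks=8, pool_size=3))
```

With 8 worker threads and 3 connections, at most 3 lookups run at once, so the extra threads add waiting rather than throughput.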
Micro NiFi Apps
▪ Begin and end each Process Group with
a Kafka queue
▪ Process Groups focused on simple
data flows that solve simple problems
▪ Taking the micro-service concept to NiFi
▪ No master flow; simply manage
Kafka queues, consumers and
producers
18
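The micro-app pattern above can be sketched with in-memory queues standing in for Kafka topics. Everything here is illustrative: the function names, the message fields, and the queue names are made up, and a real deployment would use NiFi Process Groups with Kafka consumers and producers rather than Python.

```python
import queue


def run_micro_app(source, sink, transform):
    """Drain `source`, apply `transform`, and publish results to `sink`.

    `transform` returns a message to forward, or None to drop the input.
    """
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            break
        result = transform(message)
        if result is not None:
            sink.put(result)


def validate(message):
    """Keep only check calls that carry a truck id and a position."""
    required = ("truck_id", "lat", "lon")
    return message if all(key in message for key in required) else None


def enrich(message):
    """Attach a placeholder order id; a real flow would look this up."""
    return {**message, "order_id": f"ORD-{message['truck_id']}"}


def drain(q):
    """Return all messages currently waiting in a queue."""
    items = []
    while not q.empty():
        items.append(q.get_nowait())
    return items


# Queues stand in for Kafka topics (the names are hypothetical).
raw_check_calls = queue.Queue()
valid_check_calls = queue.Queue()
visibility_events = queue.Queue()

raw_check_calls.put({"truck_id": "T1", "lat": 41.5, "lon": -81.7})
raw_check_calls.put({"truck_id": "T2"})  # missing position, gets dropped

# Each app only knows its input and output queue; there is no master flow.
run_micro_app(raw_check_calls, valid_check_calls, validate)
run_micro_app(valid_check_calls, visibility_events, enrich)

events = drain(visibility_events)
print(events)
```

Because each app touches only its own input and output queue, an app can be replaced or scaled without editing the others.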
HDF Application
▪ Kafka allows data ingestion from services
– Used to scale NiFi processing across the cluster
– Enables Micro NiFi Apps to handle specific processing
▪ Schema Registry
– Schema with version control
– Seamless integration with NiFi, Kafka, and SAM
▪ SAM (Streaming Analytics Manager)
– Easy ingestion to HBase, Druid
– Easy to scale to millions of transactions
– Custom processor capabilities
– Event/rules-driven workflow
19
HDP Integration
▪ Phoenix / HBase for fast-access storage
– 330,000,000+ records persistently stored in first 6 months
▪ Phoenix Indexes provide significant Query Performance
improvement
– Optimized Indexes for reference data, 1 to many lookup
– Sequence of columns in index crucial to performance
– Primary Key is efficient for 1 to 1 lookup of columns
▪ Hive for archive and Data Science Access
20
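Why the sequence of columns in an index matters can be shown with any engine that keeps index entries sorted by the indexed columns. The sketch below uses SQLite as a stand-in, not Phoenix; Phoenix has its own `EXPLAIN` output and index types. The `stops` table and its columns are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan descriptions for `sql` as a list of strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stops (customer_id INTEGER, order_id INTEGER, "
    "stop_seq INTEGER, city TEXT)"
)
conn.executemany(
    "INSERT INTO stops VALUES (?, ?, ?, ?)",
    [(c, o, s, "city") for c in range(5) for o in range(20) for s in range(3)],
)
# Composite index: customer_id leads, order_id follows.
conn.execute("CREATE INDEX idx_stops ON stops (customer_id, order_id)")

# A filter that includes the leading column can seek into the index.
leading = query_plan(
    conn,
    "SELECT city FROM stops WHERE customer_id = ? AND order_id = ?",
    (1, 7),
)
# A filter on the second column alone cannot seek, so the table is scanned.
trailing = query_plan(conn, "SELECT city FROM stops WHERE order_id = ?", (7,))

print(leading)
print(trailing)
```

The first plan is a search using `idx_stops`, while the second falls back to a scan, which is the behavior the bullet about column sequence warns about.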
Custom NiFi Processor
▪ Custom Processor: JDBC Results To Attributes
▪ Flow needed to quickly look up reference data
from Phoenix
▪ Reading straight to attributes increases
performance and reduces flow complexity
▪ Planned to be replaced by an Ignite cache, but it sped
time to market
21
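The idea behind reading query results straight into attributes can be sketched as follows. This is not the authors' processor and not NiFi API code: a plain dictionary stands in for FlowFile attributes, SQLite stands in for Phoenix over JDBC, and the `results_to_attributes` helper, the `lookup.` prefix, and the `orders` table are all hypothetical.

```python
import sqlite3


def results_to_attributes(conn, sql, params, attributes, prefix="lookup."):
    """Copy the first result row of `sql` into a copy of `attributes`.

    Each column becomes an attribute named prefix + column name, with the
    value converted to a string. If no row matches, the attributes are
    returned unchanged.
    """
    cursor = conn.execute(sql, params)
    row = cursor.fetchone()
    enriched = dict(attributes)
    if row is not None:
        for column, value in zip(cursor.description, row):
            name = prefix + column[0].lower()
            enriched[name] = "" if value is None else str(value)
    return enriched


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, truck_id TEXT, destination TEXT)"
)
conn.execute("INSERT INTO orders VALUES (1001, 'T1', 'Cleveland')")

flowfile_attributes = {"truck_id": "T1"}
enriched = results_to_attributes(
    conn,
    "SELECT order_id, destination FROM orders WHERE truck_id = ?",
    (flowfile_attributes["truck_id"],),
    flowfile_attributes,
)
print(enriched)
```

Keeping the lookup result in attributes leaves the message content untouched, so later steps can route on the looked-up values without parsing the payload again.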
Custom and 3rd Party
▪ Data Collector
– Change Data Capture aware
– Multiple database type support
– Converts database data to events in messages
▪ Java APIs
– Manage centralized configuration of Data Collection
– Ability to configure data to collect per customer
– Zero touch remote sites
▪ Trimble Identity with WSO2
– API Gateway
– Identity Management
22
Deployment model
▪ Azure environment
▪ Cloudbreak Deployment
– Deploy HDP to Azure Resource group
– Customize Template to add HDF components as Compute Nodes
▪ Dockerized Deployment
– Microservices
– ESB, API Gateway
– Trimble Identity & Authorization
23
Reflections On HDF Application Development
24
HDF Successes
▪ Out of the box, NiFi has processors for pretty much everything
▪ First customer processing within 120 days
▪ NiFi for data flow, but also data warehousing
– Used NiFi to collect reporting metrics and make them available in an MSSQL
Data Warehouse
▪ Performance
– Initial 6 node cluster processed over 100 million records in a day
▪ Bug forced select clients to re-push full database
▪ Each record processed by minimum 10 NiFi processors
▪ 1 Billion NiFi Tasks
▪ 4 Core, 14 GB Ram - Small Machines
▪ 1 NiFi, 3 Datanodes for Phoenix
25
HDF Challenges
26
▪ Initial workflows are long and sequential
– Breaking into Micro NiFi apps
– Leveraging Kafka for simpler flows
▪ Phoenix coupling to HBase requires re-thinking databases
– Manage Security In HBase
– JOIN Optimization for complex queries
– Small cluster increases difficulty
▪ SAM: feature-rich DIY abilities, but we needed fast
development, so we relied on NiFi
SAM Integration
27
SAM Custom Processors
1. SqlServerEnrichmentProcessor
2. SqlServerEnrichmentCacheableProcessor (Cacheable and
with Hikari Pool)
3. PhoenixEnrichmentProcessor
4. PhoenixEnrichmentCacheableProcessor
5. JSONTransformationProcessor
6. RestApiSinkCustomProcessor
28
Apache Phoenix JOIN Optimization
29
▪ Traditional JOIN of 2 large datasets caused timeouts
▪ Indexing did not improve performance
▪ Subqueries did not improve performance
▪ Traditional Query
– SELECT A.NAME, B.REFERENCE
  FROM A
  INNER JOIN B ON A.ID = B.ID
  WHERE A.ID = <SOME_ID>
▪ JOIN to query with reduced data set
– SELECT A.NAME, B.REFERENCE
  FROM A
  LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = <SOME_ID>) AS B ON B.ID = A.ID
  WHERE A.ID = <SOME_ID>
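The two query shapes can be compared on a toy dataset. The sketch below runs them in SQLite, which says nothing about Phoenix performance; it only checks what each form returns. The sample rows are made up, and the subquery here selects `ID` as well as `REFERENCE` so that the join condition can refer to it.

```python
import sqlite3

TRADITIONAL = """
    SELECT A.NAME, B.REFERENCE
    FROM A
    INNER JOIN B ON A.ID = B.ID
    WHERE A.ID = ?
"""

# The right-hand side is filtered down to one ID before the join.
REDUCED = """
    SELECT A.NAME, B.REFERENCE
    FROM A
    LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = ?) AS B
        ON B.ID = A.ID
    WHERE A.ID = ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (ID INTEGER, NAME TEXT)")
conn.execute("CREATE TABLE B (ID INTEGER, REFERENCE TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(1, "Load 1"), (2, "Load 2")])
conn.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [(1, "REF-1a"), (1, "REF-1b"), (3, "REF-3")],
)


def traditional(some_id):
    """Rows from the plain INNER JOIN form."""
    return sorted(conn.execute(TRADITIONAL, (some_id,)).fetchall())


def reduced(some_id):
    """Rows from the form that joins against a pre-filtered subquery."""
    return conn.execute(REDUCED, (some_id, some_id)).fetchall()


print(traditional(1), sorted(reduced(1)))
print(traditional(2), reduced(2))
```

When matching rows exist in `B`, both forms return the same rows. When none exist, the `LEFT JOIN` form still returns the `A` row with a null reference while the `INNER JOIN` form returns nothing, so callers of the rewritten query need to handle that case.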
Adding Master Data Management
▪ Applied to internal and
customer data
▪ Visibility is also required for
stakeholders
▪ Created NiFi flows to harvest
operational data
▪ Aggregated data sent to cloud
database for executive reports
30
Next Steps
▪ Better Data Warehouse and Data Science Integration
▪ Full integration to Ignite for lookups for complex processing
▪ Integration of additional Source Data
▪ Add additional Visibility Providers
31

HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 

Kürzlich hochgeladen (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 

Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub-second transportation visibility

  • 1. Real-time Freight Visibility: How Trimble uses NiFi and SAM to create sub-second transportation visibility. Krishna Potluri and Donnie Wheat
  • 2. Agenda ▪ Transportation Industry Overview ▪ Adding Visibility To Transportation ▪ Reflections On HDF Application Development
  • 3. Safe Harbor Notice The information presented is for informational purposes only and should not be relied upon in making a purchasing decision. Trimble is under no legal obligation to deliver any future products, features or functions within any specified time frame, if at all. Release dates and content are subject to change at Trimble’s sole discretion.
  • 5. Transportation Industry ▪ Freight is moved via Truck, Train, Rail, Ferry, etc., and any Combination ▪ Trucks carry 10.55B tons of freight annually, 70.9% of 14.88B total (ATA) ▪ Shippers are increasing demand for visibility of status and estimation ▪ Industry continues to rely on 1980s EDI technology ▪ Most carriers run Transportation Management Systems on in-house Databases
  • 7. Visibility, Historically Speaking ▪ Common Surface Transportation Issues – Manual Customer Service Process – No Proactive, Reliable Notifications – Dynamic ETAs Not Available – Stale Transit Data – Lack Of Shipment Visibility
  • 8. Adding Visibility To Transportation
  • 9. Transportation Visibility ➢ Truck Check Calls sent multiple times per hour ➢ End-to-end Visibility With Automated, Geo-fenced Notifications ➢ Dynamic ETAs ➢ Proactive Customer Service Interaction ➢ Real-time Transit Data ➢ Full Shipment Visibility
  • 10. Technical Requirements ▪ Streaming data application – If data is not a stream, make it a stream ▪ Source data from – Database – Web services – Message bus ▪ Rapid development ▪ Start small and grow infrastructure with data growth
  • 11. Processing Approach ▪ Minimal Client Impact, heavy lifting in SaaS world ▪ Customers store order data in 10-20 tables in a Relational Database ▪ Collect key data elements from customer database for lookup and processing ▪ Receive updates from the customer every few minutes, as the customer desires ▪ As Trucks move, check calls are sent – Look up order details – Provide Visibility ▪ Zero-touch client side for new functionality [Diagram labels: Customer DB, Constant Updates, Phoenix, Check Calls, Look Order Data, Truck + Order Visibility]
  • 13. Data Reality ▪ 3 NiFi, 3 Kafka, 4 HDFS/RegionServers VMs – Originally 1 NiFi, 1 Kafka, 3 HDFS/RegionServers ▪ 2,700,000 records saved per day average ▪ 700,000 Check Calls processed per day average ▪ 9,000,000 records initial data set per customer average ▪ 100,000,000 records saved maximum in a day (with smaller setup) ▪ 330,000,000 records stored in Phoenix ▪ 687 ms average process time for each Check Call – 4-8 Phoenix database reads (12-21 ms average) – 2 MSSQL configuration reads (150 ms average) ▪ 47 ms Phoenix record save average
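The figures on this slide can be turned into per-second rates and a rough I/O budget for one Check Call. The following is a back-of-the-envelope sketch, not a measurement from the talk. It assumes the quoted read times are per read (the slide does not say whether 150 ms covers one MSSQL read or both) and that all reads run sequentially; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(records_per_day: int) -> float:
    """Convert a daily volume into an average per-second rate."""
    return records_per_day / SECONDS_PER_DAY


check_calls_per_second = per_second(700_000)        # about 8.1 per second
records_saved_per_second = per_second(2_700_000)    # 31.25 per second
peak_records_per_second = per_second(100_000_000)   # about 1157 per second


def database_io_ms(phoenix_reads, phoenix_read_ms,
                   mssql_reads=2, mssql_read_ms=150.0, save_ms=47.0):
    """Sum sequential database time for one Check Call.

    Assumption: every read time is per read and nothing runs in parallel.
    """
    return phoenix_reads * phoenix_read_ms + mssql_reads * mssql_read_ms + save_ms


best_case = database_io_ms(4, 12.0)    # 395.0 ms
worst_case = database_io_ms(8, 21.0)   # 515.0 ms

print(f"check calls/s: {check_calls_per_second:.2f}")
print(f"records saved/s: {records_saved_per_second:.2f}")
print(f"database I/O per check call: {best_case:.0f}-{worst_case:.0f} ms of 687 ms")
```

Under these assumptions, database access would account for most of the 687 ms average, with the two MSSQL configuration reads as the largest single share.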
  • 14. Transportation Data Flow Architecture
  • 15. Analytics HDF Architecture [Diagram labels: Data Providers/Consumers; Trimble Identity & Authorization; Enterprise Service Bus; API Gateway; Microservices; Collect, Config, Consume; Hadoop Cluster]
  • 16. Apache NiFi ▪ Processors handle CRUD and conversions of data ▪ Expression Language adds incredible flexibility ▪ JSON Jolt handles most JSON processing ▪ Few custom components, but custom components are easy to add ▪ Scripting support handles moderate complexity
  • 17. NiFi Optimization ▪ Enable Higher Concurrent Tasks for intensive processors ▪ NiFi automatically balances where threads go ▪ Increase threads in controller settings to optimize concurrency ▪ Real time and historical visibility for performance improvement ▪ Balance Thread Pool size against Database Pool size
  • 18. Micro NiFi Apps ▪ Begin and End each Process Group with a Kafka Queue ▪ Process Groups focused on simple data flows, solving simple problems ▪ Taking the micro-service concept to NiFi ▪ No master flow, simply manage Kafka Queues, consumers and producers
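The "begin and end with a queue" pattern can be sketched outside NiFi. The example below is only an illustration of the idea: `queue.Queue` objects stand in for Kafka topics, and the two transforms (`normalize_check_call`, `add_grid_cell`) and the event fields are hypothetical, not taken from the talk.

```python
import json
import queue


def run_micro_app(source, sink, transform):
    """Drain `source`, apply `transform` to each event, publish results to `sink`."""
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            return
        result = transform(json.loads(message))
        if result is not None:  # a transform may drop an event by returning None
            sink.put(json.dumps(result))


def normalize_check_call(event):
    # Hypothetical app 1: clean up identifiers and field types.
    return {
        "truck_id": str(event["truck"]).strip().upper(),
        "lat": float(event["lat"]),
        "lon": float(event["lon"]),
    }


def add_grid_cell(event):
    # Hypothetical app 2: tag the event with a coarse location cell.
    event["cell"] = f"{round(event['lat'], 1)}:{round(event['lon'], 1)}"
    return event


# Stand-ins for three Kafka topics; each app only knows its input and output.
raw_topic, clean_topic, enriched_topic = queue.Queue(), queue.Queue(), queue.Queue()
raw_topic.put(json.dumps({"truck": " tx-1042 ", "lat": "35.4676", "lon": "-97.5164"}))

run_micro_app(raw_topic, clean_topic, normalize_check_call)
run_micro_app(clean_topic, enriched_topic, add_grid_cell)

final_event = json.loads(enriched_topic.get_nowait())
print(final_event)
```

Because each app reads from one queue and writes to another, either one can be replaced or scaled without touching the other, which is the property the slide attributes to having no master flow.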
  • 19. HDF Application ▪ Kafka allows data ingestion from services – Used to scale NiFi processing across the cluster – Enables Micro NiFi Apps to handle specific processing ▪ Schema Registry – Schema with version control – Seamless integration with NiFi, Kafka, and SAM ▪ SAM – Easy Ingestion to HBase, Druid – Easy to scale to millions of transactions – Custom processor capabilities – Event/Rules driven workflow
  • 20. HDP Integration ▪ Phoenix / HBase for fast-access storage – 330,000,000+ records persistently stored in first 6 months ▪ Phoenix Indexes provide significant Query Performance improvement – Optimized Indexes for reference data, 1 to many lookup – Sequence of columns in index crucial to performance – Primary Key is efficient for 1 to 1 lookup of columns ▪ Hive for archive and Data Science Access
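Why the sequence of columns in an index matters can be shown with a sorted list of composite keys, which is roughly how a sorted key-value store lays out index rows. This is an analogy in plain Python, not Phoenix code; the `(customer_id, order_id)` key and the data are invented for the example.

```python
from bisect import bisect_left

# Composite "index" sorted by (customer_id, order_id): 100 customers x 10 orders.
index = sorted(
    (f"C{customer:03d}", customer * 10 + order)
    for customer in range(100)
    for order in range(10)
)


def lookup_by_leading_column(index, customer_id):
    """Seek to the first key for customer_id, then scan only that key range."""
    start = bisect_left(index, (customer_id,))
    rows, examined = [], 0
    for key in index[start:]:
        examined += 1
        if key[0] != customer_id:
            break  # keys are sorted, so the matching range has ended
        rows.append(key)
    return rows, examined


def lookup_by_trailing_column(index, order_id):
    """Without the leading column there is nothing to seek on: scan every key."""
    rows = [key for key in index if key[1] == order_id]
    return rows, len(index)


leading_rows, leading_examined = lookup_by_leading_column(index, "C042")
trailing_rows, trailing_examined = lookup_by_trailing_column(index, 425)

print(len(leading_rows), "rows after examining", leading_examined, "keys")
print(len(trailing_rows), "row after examining", trailing_examined, "keys")
```

The one-to-many lookup on the leading column touches 11 keys, while the lookup on the trailing column touches all 1,000, which is the effect the slide describes for index column order.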
  • 21. Custom NiFi Processor ▪ Custom Processor: JDBC Results To Attributes ▪ Flow required quick lookup of reference data from Phoenix ▪ Reading straight to attributes increases performance and reduces flow complexity ▪ Planned to be replaced by an Ignite cache, but it sped time to market
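The talk does not show the processor's code. As a rough sketch of the idea only, the function below runs a lookup query and copies the first result row into a flat attribute map, so later steps can read the values without parsing content. Here `sqlite3` stands in for the Phoenix JDBC connection, and the function name, the `lookup.` prefix, and the `orders` table are all assumptions made for illustration.

```python
import sqlite3


def results_to_attributes(conn, sql, params, attributes, prefix="lookup."):
    """Return a copy of `attributes` extended with the first row of a lookup query."""
    cursor = conn.execute(sql, params)
    row = cursor.fetchone()
    enriched = dict(attributes)  # leave the caller's attribute map untouched
    enriched[prefix + "found"] = "true" if row is not None else "false"
    if row is not None:
        column_names = [description[0] for description in cursor.description]
        for name, value in zip(column_names, row):
            # Attributes are strings, so every value is stringified.
            enriched[prefix + name.lower()] = "" if value is None else str(value)
    return enriched


# Hypothetical reference data standing in for order details kept in Phoenix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (truck_id TEXT, order_id INTEGER, destination TEXT)")
conn.execute("INSERT INTO orders VALUES ('TX-1042', 9001, 'Tulsa')")

enriched = results_to_attributes(
    conn,
    "SELECT order_id, destination FROM orders WHERE truck_id = ?",
    ("TX-1042",),
    {"truck_id": "TX-1042"},
)
print(enriched)
```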
  • 22. Custom and 3rd Party ▪ Data Collector – Change Data Capture aware – Multiple database type support – Converts database data to events in messages ▪ Java APIs – Manage centralized configuration of Data Collection – Ability to configure data to collect per customer – Zero touch remote sites ▪ Trimble Identity with WSO2 – API Gateway – Identity Management
  • 23. Deployment model ▪ Azure environment ▪ Cloudbreak Deployment – Deploy HDP to Azure Resource group – Customize Template to add HDF components as Compute Nodes ▪ Dockerized Deployment – Microservices – ESB, API Gateway – Trimble Identity & Authorization
  • 24. Reflections On HDF Application Development
  • 25. HDF Successes ▪ Out of the box, NiFi has processors for pretty much everything ▪ First customer processing within 120 days ▪ NiFi for data flow, but also data warehousing – Used NiFi to collect reporting metrics and make them available in MSSQL Data Warehouse ▪ Performance – Initial 6 node cluster processed over 100 million records in a day ▪ Bug forced select clients to re-push full database ▪ Each record processed by minimum 10 NiFi processors ▪ 1 Billion NiFi Tasks ▪ 4 Core, 14 GB RAM - Small Machines ▪ 1 NiFi, 3 Datanodes for Phoenix
  • 26. HDF Challenges ▪ Initial workflows are long and sequential – Breaking into Micro NiFi apps – Leveraging Kafka for simpler flows ▪ Phoenix coupling to HBase requires re-thinking databases – Manage Security In HBase – JOIN Optimization for complex queries – Small cluster increases difficulty ▪ SAM: feature-rich DIY abilities, but we needed fast development and relied on NiFi
  • 28. SAM Custom Processors 1. SqlServerEnrichmentProcessor 2. SqlServerEnrichmentCacheableProcessor (Cacheable and with Hikari Pool) 3. PhoenixEnrichmentProcessor 4. PhoenixEnrichmentCacheableProcessor 5. JSONTransformationProcessor 6. RestApiSinkCustomProcessor
  • 29. Apache Phoenix JOIN Optimization ▪ Traditional JOIN of 2 Large Datasets creates timeouts ▪ Indexing did not improve performance ▪ Subqueries did not improve performance ▪ Traditional Query – SELECT A.NAME, B.REFERENCE FROM A INNER JOIN B ON A.ID = B.ID WHERE A.ID = <SOME_ID> ▪ JOIN to query with reduced data set – SELECT A.NAME, B.REFERENCE FROM A LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = <SOME_ID>) AS B ON B.ID = A.ID WHERE A.ID = <SOME_ID>
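The two query shapes can be compared on a toy data set. The sketch below uses Python's built-in `sqlite3` rather than Phoenix, so it says nothing about Phoenix performance or timeouts; it only checks what rows each shape returns. The sample rows are invented, and the derived table selects `B.ID` alongside `B.REFERENCE` so that the `ON` clause has a column to join on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, NAME TEXT);
    CREATE TABLE B (ID INTEGER, REFERENCE TEXT);
    INSERT INTO A VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO B VALUES (1, 'PO-100'), (1, 'PO-101'), (3, 'PO-300');
""")

TRADITIONAL = """
    SELECT A.NAME, B.REFERENCE
    FROM A INNER JOIN B ON A.ID = B.ID
    WHERE A.ID = ?
"""

# The derived table filters B first, so the join only sees matching rows.
REDUCED = """
    SELECT A.NAME, B.REFERENCE
    FROM A LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = ?) AS B
        ON B.ID = A.ID
    WHERE A.ID = ?
"""


def traditional(some_id):
    return sorted(conn.execute(TRADITIONAL, (some_id,)).fetchall())


def reduced(some_id):
    return conn.execute(REDUCED, (some_id, some_id)).fetchall()


# ID 1 has rows in both tables: both shapes return the same pairs.
print(traditional(1), sorted(reduced(1)))

# ID 2 has no rows in B: the INNER JOIN returns nothing,
# while the LEFT JOIN keeps the A row with a NULL reference.
print(traditional(2), reduced(2))
```

The results match whenever the ID exists in both tables. They differ when the ID has no rows in B, so callers of the rewritten query may need to handle a NULL reference.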
  • 30. Adding Master Data Management ▪ Applied to internal and customer data ▪ Visibility is also required for stakeholders ▪ Created NiFi flows to harvest operational data ▪ Aggregated data sent to cloud database for executive reports
  • 31. Next Steps ▪ Better Data Warehouse and Data Science Integration ▪ Full integration to Ignite for lookups for complex processing ▪ Integration of additional Source Data ▪ Add additional Visibility Providers