TMW Systems, A Trimble Company, is an industry-leading provider of transportation management software. 3PLs, brokers, distribution and supply operations, dedicated and private fleets, commercial carriers, and energy service providers rely on our transportation management systems, our fleet maintenance management software, or our routing and scheduling software to become more efficient and profitable. Billions of data points exist in the trucking industry, and we at TMW Systems are pioneers in tracking millions of trucks, loads, and assets.
The architecture team at TMW leverages Apache NiFi and Streaming Analytics Manager (SAM) to deliver this immense volume of data in real time. In this session, you will get a thorough understanding of all the streaming components. We have utilized Apache Kafka, Apache NiFi, and Streaming Analytics Manager to build our real-time data pipeline. We will also discuss real-time event processing using SAM and Schema Registry. Lastly, we will show custom processors in NiFi and SAM that helped us with complex event processing.
Speakers
Krishna Potluri, TMW Systems, A Trimble Company, Big Data Architect
Donnie Wheat, Trimble, Senior Big Data Architect
3. Safe Harbor Notice
The information presented is for informational purposes only and should not
be relied upon in making a purchasing decision. Trimble is under no legal
obligation to deliver any future products, features or functions within any
specified time frame, if at all. Release dates and content are subject to
change at Trimble’s sole discretion.
5. Transportation Industry
▪ Freight is moved via truck, rail, ferry, etc., and any combination of modes
▪ Trucks carry 10.55B tons of freight annually, 70.9% of the 14.88B-ton total (ATA)
▪ Shippers have an increasing demand for visibility into shipment status and estimated arrival
▪ The industry continues to rely on 1980s EDI technology
▪ Most carriers run Transportation Management Systems on in-house databases
7. Visibility, Historically Speaking
▪ Common Surface Transportation Issues
– Manual Customer Service Process
– No Proactive, Reliable Notifications
– Dynamic ETAs Not Available
– Stale Transit Data
– Lack Of Shipment Visibility
9. Transportation Visibility
➢ Truck Check Calls sent multiple times per hour
➢ End-to-end Visibility With Automated,
Geo-fenced Notifications
➢ Dynamic ETAs
➢ Proactive Customer Service Interaction
➢ Real-time Transit Data
➢ Full Shipment Visibility
10. Technical Requirements
▪ Streaming data application
– If data is not a stream, make it a stream (see the sketch after this list)
▪ Source data from
– Database
– Web services
– Message bus
▪ Rapid development
▪ Start small and grow
infrastructure with data growth
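As a hedged illustration of "make it a stream": the sketch below polls recent check-call rows from a relational source and publishes each row to a Kafka topic as JSON. The connection string, table, columns, and topic name are hypothetical, only the stock JDBC and Kafka producer APIs are assumed, and this is not the actual TMW collector.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal sketch: poll recent check-call rows from a relational source and
// publish each one to a Kafka topic so downstream NiFi/SAM flows see a stream.
// The connection string, table, columns, and topic are hypothetical.
public class CheckCallStreamer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection conn = DriverManager.getConnection("jdbc:sqlserver://customer-db;databaseName=tms");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT truck_id, latitude, longitude, event_time FROM check_calls WHERE event_time > ?")) {

            // Placeholder high-water mark: in practice the last successful poll time is persisted
            stmt.setTimestamp(1, new Timestamp(System.currentTimeMillis() - 5 * 60 * 1000));
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    String json = String.format(
                            "{\"truckId\":\"%s\",\"lat\":%f,\"lon\":%f,\"time\":\"%s\"}",
                            rs.getString("truck_id"), rs.getDouble("latitude"),
                            rs.getDouble("longitude"), rs.getTimestamp("event_time"));
                    // Key by truck so all events for one truck land on the same partition
                    producer.send(new ProducerRecord<>("check-calls", rs.getString("truck_id"), json));
                }
            }
        }
    }
}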
11. Processing Approach
▪ Minimal Client Impact, heavy lifting in SaaS world
▪ Customers store order data in 10-20 tables in Relational
Database
▪ Collect key data elements from customer database for
lookup and processing
▪ Receive updates from the customer every few minutes, as the customer desires
▪ As Trucks move, check calls are sent
– Look up order details
– Provide Visibility
▪ Zero touch client side for new functionality
[Diagram: the customer DB sends constant updates and check calls; order data is looked up in Phoenix to produce Truck + Order visibility]
13. Data Reality
▪ 3 NiFi, 3 Kafka, 4 HDFS/RegionServer VMs
– Originally 1 NiFi, 1 Kafka, 3 HDFS/RegionServers
▪ 2,700,000 records saved per day average
▪ 700,000 Check Calls processed per day average
▪ 9,000,000 records initial data set per customer average
▪ 100,000,000 records saved maximum in a day (with smaller setup)
▪ 330,000,000 records stored in Phoenix
▪ 687 ms average process time for each Check Call
– 4-8 Phoenix database reads
▪ 12-21 ms average
– 2 MSSQL configuration reads
▪ 150 ms average
▪ 47 ms Phoenix record save average
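For rough context, the itemized database work accounts for roughly 400-500 ms of the 687 ms average: 4-8 Phoenix reads at 12-21 ms each come to about 50-170 ms, the 2 MSSQL configuration reads at about 150 ms each add roughly 300 ms, and the Phoenix save adds another 47 ms; the remaining time is presumably flow and queuing overhead.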
16. Apache NiFi
▪ Processors handle CRUD and
conversions of data
▪ Expression Language adds incredible
flexibility
▪ Jolt handles most JSON processing
▪ Few custom components needed, and those are easy to add
▪ Scripting handles moderate complexity
17. NiFi Optimization
▪ Enable Higher Concurrent Tasks for
intensive processors
▪ NiFi automatically balances where
threads go
▪ Increase threads in controller settings
to optimize concurrency
▪ Real-time and historical visibility into flow performance guides improvements
▪ Balance Thread Pool size against
Database Pool size
18. Micro NiFi Apps
▪ Begin and End Process Group with
Kafka Queue
▪ Process Groups focused on simple data flows that solve simple problems
▪ Taking the microservice concept to NiFi
▪ No master flow; simply manage Kafka queues, consumers, and producers
19. HDF Application
▪ Kafka allows data ingestion from services
– Used to scale NiFi processing across the cluster
– Enables Micro NiFi Apps to handle specific processing
▪ Schema Registry (see the sketch after this list)
– Schemas with version control
– Seamless integration with NiFi, Kafka, and SAM
▪ SAM
– Easy ingestion to HBase and Druid
– Easy to scale to millions of transactions
– Custom processor capabilities
– Event/rules-driven workflow
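To make the Schema Registry bullet concrete, the sketch below defines a hypothetical Avro schema for a check-call event using Avro's SchemaBuilder. The field names are illustrative only; registration and versioning of such a schema happen in Schema Registry rather than in this code.

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// Hypothetical Avro schema for a check-call event; versions of such a schema are
// what Schema Registry manages and what NiFi, the Kafka serializers, and SAM share.
public class CheckCallSchema {
    public static Schema build() {
        return SchemaBuilder.record("CheckCall").namespace("com.tmw.visibility")
                .fields()
                .requiredString("truckId")
                .requiredDouble("latitude")
                .requiredDouble("longitude")
                .requiredLong("eventTime")
                .optionalString("orderId")
                .endRecord();
    }

    public static void main(String[] args) {
        // Prints the schema as JSON, i.e. the text that gets registered and versioned
        System.out.println(build().toString(true));
    }
}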
20. HDP Integration
▪ Phoenix / HBase for fast-access storage
– 330,000,000+ records persistently stored in the first 6 months
▪ Phoenix indexes provide significant query performance improvement (see the sketch after this list)
– Optimized indexes for reference data and 1-to-many lookups
– Sequence of columns in the index is crucial to performance
– The primary key is efficient for 1-to-1 lookups
▪ Hive for archive and Data Science Access
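A minimal sketch of the covered-index pattern described above, assuming only the stock Phoenix JDBC driver; the table, column, and index names are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of the covered-index pattern: leading index columns match the lookup
// predicate so Phoenix can answer the query from the index alone. Names are hypothetical.
public class PhoenixIndexExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-1,zk-2,zk-3:2181:/hbase")) {
            try (Statement ddl = conn.createStatement()) {
                // Column order matters: the most selective lookup column goes first
                ddl.execute("CREATE INDEX IF NOT EXISTS IDX_ORDER_REF "
                        + "ON ORDER_REFERENCE (CUSTOMER_ID, ORDER_ID) "
                        + "INCLUDE (REFERENCE_VALUE)");
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT REFERENCE_VALUE FROM ORDER_REFERENCE WHERE CUSTOMER_ID = ? AND ORDER_ID = ?")) {
                ps.setString(1, "CUST-001");
                ps.setString(2, "ORD-123");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("REFERENCE_VALUE"));
                    }
                }
            }
        }
    }
}

The leading index columns mirror the WHERE clause, which is the "sequence of columns" point above.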
21. Custom NiFi Processor
▪ Custom Processor: JDBC Results To Attributes
▪ The flow required fast lookup of referential data from Phoenix
▪ Reading results straight into attributes increases performance and reduces flow complexity (see the sketch after this list)
▪ Planned to be replaced by an Ignite cache, but it sped time to market
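A stripped-down sketch of what a "JDBC results to attributes" processor could look like against the standard NiFi processor API and a DBCP controller service. The query, property, and attribute names are hypothetical, and this is not the actual TMW implementation.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.dbcp.DBCPService;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

// Sketch: run one lookup query per flowfile and copy the first row's columns onto
// the flowfile as attributes, leaving the content untouched. The query, property,
// and attribute names are hypothetical; getSupportedPropertyDescriptors() and
// getRelationships() overrides are omitted for brevity.
public class JdbcResultsToAttributes extends AbstractProcessor {

    static final PropertyDescriptor DBCP = new PropertyDescriptor.Builder()
            .name("Database Connection Pooling Service")
            .identifiesControllerService(DBCPService.class)
            .required(true)
            .build();

    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();
    static final Relationship REL_FAILURE = new Relationship.Builder().name("failure").build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        DBCPService dbcp = context.getProperty(DBCP).asControllerService(DBCPService.class);
        String orderId = flowFile.getAttribute("order.id"); // set earlier in the flow
        try (Connection conn = dbcp.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT STATUS, DEST_CITY FROM ORDER_SUMMARY WHERE ORDER_ID = ?")) {
            ps.setString(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // Column values become attributes instead of being written to content
                    flowFile = session.putAttribute(flowFile, "order.status", rs.getString("STATUS"));
                    flowFile = session.putAttribute(flowFile, "order.destCity", rs.getString("DEST_CITY"));
                }
            }
            session.transfer(flowFile, REL_SUCCESS);
        } catch (Exception e) {
            session.transfer(flowFile, REL_FAILURE);
        }
    }
}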
22. Custom and 3rd Party
▪ Data Collector
– Change Data Capture aware
– Multiple database type support
– Converts database data to events in messages
▪ Java APIs
– Manage centralized configuration of Data Collection
– Ability to configure data to collect per customer
– Zero touch remote sites
▪ Trimble Identity with WSO2
– API Gateway
– Identity Management
23. Deployment model
▪ Azure environment
▪ Cloudbreak Deployment
– Deploy HDP to Azure Resource group
– Customize Template to add HDF components as Compute Nodes
▪ Dockerized Deployment
– Microservices
– ESB, API Gateway
– Trimble Identity & Authorization
25. HDF Successes
▪ Out of the box, NiFi has processors for pretty much everything
▪ First customer processing within 120 days
▪ NiFi for data flow, but also data warehousing
– Used NiFi to collect reporting metrics and make them available in an MSSQL Data Warehouse
▪ Performance
– Initial 6-node cluster processed over 100 million records in a day
▪ A bug forced select clients to re-push their full database
▪ Each record processed by a minimum of 10 NiFi processors
▪ 1 billion NiFi tasks
▪ 4-core, 14 GB RAM - small machines
▪ 1 NiFi, 3 datanodes for Phoenix
26. HDF Challenges
▪ Initial workflows are long and sequential
– Breaking into Micro NiFi apps
– Leveraging Kafka for simpler flows
▪ Phoenix coupling to HBase requires re-thinking databases
– Manage Security In HBase
– JOIN Optimization for complex queries
– Small cluster increases difficulty
▪ SAM - feature-rich DIY abilities, but we needed fast development and relied on NiFi
28. SAM Custom Processors
1. SqlServerEnrichmentProcessor
2. SqlServerEnrichmentCacheableProcessor (cacheable, with a HikariCP connection pool; see the sketch after this list)
3. PhoenixEnrichmentProcessor
4. PhoenixEnrichmentCacheableProcessor
5. JSONTransformationProcessor
6. RestApiSinkCustomProcessor
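The enrichment processors above share one core idea: a keyed lookup against Phoenix or SQL Server whose result is merged into the streaming event, optionally cached and pooled. The sketch below illustrates the cacheable Phoenix variant with a HikariCP pool and an in-memory cache; it deliberately omits the SAM runtime wrapper, and all names are hypothetical.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Sketch of the cacheable enrichment idea: look up reference data in Phoenix
// through a HikariCP pool and cache the answer so repeated check calls for the
// same order skip the database. Names are hypothetical; the SAM processor
// wrapper around this logic is omitted.
public class CachedPhoenixEnrichment {
    private final HikariDataSource pool;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public CachedPhoenixEnrichment(String phoenixUrl) {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl(phoenixUrl);   // e.g. jdbc:phoenix:zk-1:2181:/hbase
        cfg.setMaximumPoolSize(8);    // balance against the flow's thread count
        this.pool = new HikariDataSource(cfg);
    }

    public String lookupReference(String orderId) {
        // Misses are not cached because computeIfAbsent ignores null results
        return cache.computeIfAbsent(orderId, this::queryPhoenix);
    }

    private String queryPhoenix(String orderId) {
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT REFERENCE_VALUE FROM ORDER_REFERENCE WHERE ORDER_ID = ?")) {
            ps.setString(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        } catch (Exception e) {
            throw new RuntimeException("Enrichment lookup failed for " + orderId, e);
        }
    }
}

Balancing the pool size against the processing thread count mirrors the NiFi optimization advice earlier in the deck.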
29. Apache Phoenix JOIN Optimization
▪ A traditional JOIN of 2 large datasets creates timeouts
▪ Indexing did not improve performance
▪ Subqueries did not improve performance
▪ Traditional Query
– SELECT A.NAME, B.REFERENCE
FROM A
INNER JOIN B ON A.ID = B.ID
WHERE A.ID = <SOME_ID>
▪ JOIN against a subquery with a reduced data set
– SELECT A.NAME, B.REFERENCE
FROM A
LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = <SOME_ID>) AS B ON B.ID = A.ID
WHERE A.ID = <SOME_ID>
30. Adding Master Data Management
▪ Applied to internal and
customer data
▪ Visibility is also required for
stakeholders
▪ Created NiFi flows to harvest
operational data
▪ Aggregated data sent to cloud
database for executive reports
31. Next Steps
▪ Better Data Warehouse and Data Science Integration
▪ Full integration with Ignite for lookups in complex processing
▪ Integration of additional Source Data
▪ Add additional Visibility Providers