In recent years, Deep Neural Networks (DNNs) have rapidly advanced and reached a sufficiently mature state to be adopted in real-world applications. Especially, as DNNs solve difficult problems in computer vision (e.g., image classification and object detection), community has started exploring the use of DNNs in real-time or nearly real-time vision applications. Video content analysis (VCA) is one such application that often utilizes DNNs as its core engine and offers plentiful capabilities for a wide range of domains including safety and security, flame and smoke detection, automotive, health-care, home automation, and retail. Apache Spark Streaming has been the de-facto standard platform where real-time Big data applications such as VCA are run at hyper-scale. While the integration of DNNs and real-time vision applications promise ample opportunities for Spark Streaming community, the massive compute demand to accommodate (1) the ever-increasing DNN model size, and (2) the growing scale of data (e.g., billions of high-resolution video data) significantly limits its practicality. In this work, we seek to address this challenge and provide a solution to meet this gigantic compute demand by leveraging FPGA acceleration. We develop SparkWeaver, a full-stack solution that, from DNN-based real-time vision applications (e.g., VCA), automatically offloads the heavy DNN computations to our FPGA accelerators without the developers' intervention. We use FPGAs as our DNN acceleration platform since they not only offer low inference-latency and high power-efficiency, oftentimes required for real-time vision applications, but also provide a programmable substrate for acceleration of non-DNN components of the applications. To demonstrate the easy use of the solution, we will do a live-demo that shows the SparkWeaver's automated workflow that takes a DNN-based VCA application written using Spark Streaming APIs and runs the VCA application on a Spark cluster, while offloading DNN computations to FPGAs, without imposing additional manual efforts on the developers.
5. Applications with Ingest Bottleneck
● Many big data applications
○ Lots of raw data
○ Video surveillance
■ Industrial camera market is projected to increase 2.3x by 2024
[1]
■ For a 4k camera in 60fps, the amount of data per hour is 5.2
TB (1TB for 10fps)
○ Voice recognition
○ Fraud detection
5
1. https://www.gminsights.com/industry-analysis/ip-camera-market
6. Traditional Architecture does not Scale
6
Kafka
DNN
HDFS
Spark
Spark Streaming
- Online analytics
- Cross camera
Raw image frames
Batch processing
- Offline Training
- Offline analytics
Image features
1) Ingest stage
2) Data streaming
3) Online analytics
4) Offline analytics
7. Use Cases
● How many people went from the shoe department
to the jewelry department?
● How many people were observed walking around
the entire building on a given day?
7
Requires cross-camera online and
offline analytics
8. Traditional Architecture does not Scale
8
Kafka
DNN
HDFS
Spark
Spark
Streaming
Online analytics
Raw image frames
Batch processing
Offline analytics
Image features
Detection, Tracking, Anomaly detection, cross camera
Re-Identification, …
9. Semantic Compression with DNNs
● DNNs can be used for compression
○ Converting raw data into condensed, semantic data
○ For video analytics, we observed a ~5x
compression rate
9
1. https://www.gminsights.com/industry-analysis/ip-camera-market
10. Large Scale Image Processing
● Deep learning on traditional big data clusters
presents many challenges
○ Computationally intensive
■ Adds pressure to the entire ETL toolchain
○ Traditional CPUs are not ideal for evaluating DNN models
○ Doing many levels of DL processing on every input frame requires
■ Storing a lot of raw data
■ Storing and managing all interim data
10
13. DNNs with FPGA
● FPGAs are a good candidate
○ Faster than CPUs
○ More power efficient than GPUs
○ More programmable than ASICs
● Programmability
○ Need HDL
13
14. Solution: DNN+FPGA for Ingest
● Ingest only the data you need
○ Run DNNs on the edge
○ Condensed, meaningful features instead of raw, largely meaningless data
● Accelerate with FPGAs
○ Power efficient
○ Can be deployed with minimal infrastructure
○ Using DNNWEAVER technology for programmability
■ Compiler and full stack for automatic DNN acceleration
14
16. DNNWEAVER
● Ease DNN Deployment to FPGAs
○ Tensorflow and ONNX
○ No code changes
○ No hardware expertise needed
● Open source implementation based on original paper[1]
● Enterprise version under development by Bigstream
16
1: https://github.com/hsharma35/dnnweaver2
21. YOLO
Detection and Tracking with YOLO[1]
and Deep SORT[3]
[1] Redmon et al. “You Only Look Once, Unified, Real-Time Object Detection”
[2] https://github.com/thtrieu/darkflow
[3] Wojke et. al. “Simple Online and Real Time Tracking with a Deep Association Metric”
Bounding Boxes
Tagged Bounding
Boxes
[2] [3]
21
Deep SORT
Image sources:
23. SparkWeaver Architecture
23
Kafka
HDFS
Image
Features
Spark
Detection: YOLOv2
Tracking: Deep_SORT, ...
● Multiple cameras stream video
to an FPGA cluster
● FPGA clusters implement
YOLOv2 with DNNWEAVER
● YOLO’s image features are
streamed to a Kafka server
● Features are aggregated by
Spark and written to HDFS
● Detection performed on the fog
layer, tracking on the cluster
24. Single Node Max FPS
24
Benchmark FPS
Traditional Architecture 7.3
Detection only 10
Tracking only 12.8
YOLO on DNNWEAVER 13.2
SparkWeaver 12.8
*Dependent on the number of people in a frame
25. Next Release Max FPS (projected)
25
Benchmark FPS
SparkWeaver 46.1
*Dependent on the number of people in a frame
26. Single Node Compression Rate
• 5.5x (82%)
– Deep_SORT tracking only needs the pixels within the
bounding boxes, and their locations
26
29. Streaming Analysis
● Person Re-Identification[1]
○ Multiple solution (hot area of research)
○ Some solutions use pre-trained DNNs[2]
■ Generate a feature vector and apply similarity check on vectors
● Anomaly Detection[3]
○ Suspicious events/threats
29
1. Zheng et al “Person Re-identification: Past, Present, and Future”
2. Hermans et al “In Defense of the Triplet Loss for Person Re-Identification”
3. https://databricks.com/blog/2018/09/13/identify-suspicious-behavior-in-video-with-databricks-runtime-for-machine-learning.html
30. Use Case 1
How many people went from department X
to department Y?
30
select * from people_present where camera_id == X as peopleX
select * from people_present where camera_id == Y as peopleY
count(select person_id from peopleX inner join peopleY where
WITHIN_THRESHOLD(peopleX.exit, peopleY.enter))
people_present
person_id: INT,
camera_id: INT,
enter: TIMESTAMP,
exit: TIMESTAMP
31. Use Case 2
How many people were observed walking around the entire
building on a given day?
31
select id from unique_people where
count(select * from people_present where camera_id == CAMERA1) > 1 and
count(select * from people_present where camera_id == CAMERA2) > 1 and
count(select * from people_present where camera_id == CAMERA3) > 1 and
...
unique_people
id: INT
32. Conclusion
● DNN-optimized ingest
○ Smart compression
○ Use network resources on dense, highly meaningful data rather than sparse, raw
data
32
33. Conclusion
● FPGAs are well suited to DNN acceleration edge computing
○ Highly parallel
○ Power efficient
○ Can be deployed with minimal resources
○ DNNWEAVER can compile Tensorflow/ONNX to FPGA
● Not just DNNs, also infrastructure
○ Online and offline analytics
○ Moving data
33