Economies of scale achieved by Infrastructure as a Service (IaaS) clouds has enabled organizations to elastically scale their stream processing applications to public clouds. However, current approaches for elastic stream processing do not sufficiently address the potential security vulnerabilities in cloud environments. This presentation explains how to do secure elastic data stream processing within latency guarantees in such a way that third parties do not get access to the encrypted data. Homomorphic encryption allows for computing over encrypted data without access to the secret key. Results from homomorphic computation remains encrypted and can only be decrypted at the receiving end. Using a real world test setup, which includes an email filter benchmark and a web server access log processor benchmark (EDGAR) on WSO2 Stream Processor we demonstrate the effectiveness of our approach. We compare latency values of operating on premise vs elastic scaling to public cloud. Experiments on Amazon EC2 indicate that the proposed approach provides significant results which is 10% to 17% improvement of average latency in the case of email filter benchmark and EDGAR benchmarks respectively. See the following research paper for more information.
http://miyurud.github.io/papers/2019/dasfaa2019-homomorphic-encryption.pdf
Privacy Preserving Elastic Stream Processing with Clouds using Homomorphic Encryption
1. Privacy Preserving Elastic Stream
Processing with Clouds Using
Homomorphic Encryption
Arosha Rodrigo, Miyuru Dayarathna, and Sanath Jayasena
WSO2, Inc, USA
University of Moratuwa, Sri Lanka
24th International Conference on Database Systems for Advanced Applications (DASFAA 2019)
Chiang Mai, Thailand
25/4/2019
2. Introduction
• Data Stream Processing Conducts Online
Analytical Processing on Streams of Data
2
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
Detecting Malicious Attacks to a Web Server
(unauthorizedCount + forbiddenCount) / totalAccessCount > 0.3
In last 30 seconds
3. Introduction - Applications of Stream
Processing
3
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
Heart rate monitoring
https://blogs.mathworks.com/iot/20
17/10/27/collecting-resting-heart-rat
e-data-using-thingspeak/
Highway toll processing
http://www.cs.brandeis.edu/~linearr
oad/
Weather sensor networks
http://hpwren.ucsd.edu/news/20070911/
4. Introduction - WSO2 Stream Processor
4
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
WSO2 Stream Processor is an
open source, cloud native and
lightweight stream processing
platform that understands
streaming SQL queries in order
to capture, analyze, process
and act on events in real time.
https://wso2.com/analytics-and-stream-processing/
5. Introduction - Siddhi - Streaming and
Complex Event Processing engine
5http://siddhi.io/
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
6. Introduction (Contd.)
• Stream processing systems often face resource
limitations due to unexpected workloads [1]
• With time the stream processor deployment may
expand
• Possible Solutions
– Load shedding
– Approximate query processing
– Elastically scale into external cluster
6
[1] Miyuru Dayarathna and Toyotaro Suzumura, "A Mechanism for Stream Program Performance Recovery in Resource Limited Compute Clusters",
DASFAA 2013 (The 18th International Conference on Database Systems for Advanced Applications), Wuhan, China, 2013/4
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
7. Introduction - Elastic Switching
Mechanism (ESM)
ESM reduces the overall
average latency of event
processing jobs by
significant amount
considering the cost of
operating the system
7
Sajith Ravindra, Miyuru Dayarathna, and Sanath Jayasena. 2017. Latency Aware Elastic Switching-based Stream Processing Over
Compressed Data Streams. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering
(ICPE '17). ACM, New York, NY, USA, 91-102
(Ravindra et al. ACM/SPEC ICPE '17)
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
8. Research Problem
• Privacy preserving data stream processing
– There can be situations such as,
• You do not want to send sensitive data to public cloud
• But you need to elastically scale due to resource limitations of the on
prem deployment.
– Sending data as encrypted over the network and decrypting at the server is
not completely secure.
8
Stream Processor in
Private Cloud
Stream Processor
in Public Cloud
Input
Stream
Output
Stream
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
9. Fully Homomorphic Encryption (FHE)
• FHE is an advanced encryption technique which allows
data to be stored and processes in encrypted form.
• Current FHE techniques are computationally expensive
– Need additional space for keys and cypher texts
9
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
10. Selected FHE Implementation
• HElib - https://github.com/homenc/HElib
• Microsoft SEAL - https://github.com/Microsoft/SEAL
• PALISADE - https://git.njit.edu/palisade/PALISADE
• HEAAN - https://github.com/snucrypto/HEAAN
• Experiments done with HElib indicated that it is
practical to implement applications such as stream Data
Processing.
10
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
11. Proposed Solution
Elastic scaling in hybrid cloud scenario with
privacy preserving data stream processing.
11
Stream Processor in
Private Cloud
Stream Processor
in Public Cloud
Input
Stream
Output
Stream
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
12. Related Work
• Elastic Stream Processing
• Data Stream Compression
12
Hummer et al. [13] - Elastic Stream Processing on top of Cloud Computing
Stormy [18] - Stream processing as a service
Cervion et al. [2] - Adaptive provisioning of VMs for streaming system to scale up/down
Cuzzocrea et al. [13] - lossy compression method for efficient OLAP
[13] W. Hummer, B. Satzger, and S. Dustdar. Elastic stream processing in the cloud. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 3(5):333–345, 2013
[18] S. Loesing, M. Hentschel, T. Kraska, and D. Kossmann. Stormy: An elastic and highly available streaming service in the cloud. In Proceedings of the 2012 Joint EDBT/ICDT Workshops,
EDBT-ICDT ’12, pages 55–60, New York, NY, USA, 2012. ACM.
[2] J. Cervino, E. Kalyvianaki, J. Salvachua, and P. Pietzuch. Adaptive provisioning of stream processing systems in the cloud. In Data Engineering Workshops (ICDEW), 2012 IEEE 28th International
Conference on, pages 295–301, April 2012.
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
13. Related Work (Contd.)
• Performance optimizations of homomorphic
encryption
13
Dai et al. [4] on cuHE - Use of GPUs to accelerate computations in homomorphic level
Shaon [22] - secure data analytics in untrusted cloud setup
[4] W. Dai and B. Sunar. cuhe: A homomorphic encryption accelerator library. Cryptology ePrint Archive, Report 2015/818, 2015. https://eprint.iacr.org/2015/818.
[22] F. Shaon, M. Kantarcioglu, Z. Lin, and L. Khan. Sgx-bigmatrix: A practical encrypted data analytic framework with trusted processors. In Proceedings of the 2017 ACM SIGSAC Conference on
Computer and Communications Security, CCS ’17, pages 1211–1228, New York, NY, USA, 2017. ACM.
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
14. HomoESM System Design
• Scheduler - Collects events from input
stream
• Encryptor
• Publisher - Publishes the events to public
cloud
• Public Cloud Stream Processor
– Homomorphic CEP Engine
– Homomorphic Encryption Extension
– HElib API
• Receiver - Receives data from both private
and public stream processors
• Decryptor - Decrypts events from public
cloud stream processor
• Profiler - Measures the performance
information of the output events stream
14
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
15. Switching Function
15
- VM startup Latency
- The threshold for total amount of data received by the VM from private cloud
tt - 1 t + 1
Timeline
𝜏 - Time period to wait until the VM startup process triggers
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
Binary switching
Function for single
VM
17. Implementation - Lower Level
Optimizations
17
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
18. Implementation - Composite Event Encode
Worker
18
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
A thread pool which handles event
Encryptions and composite event creation.
19. Implementation - Composite Event Encode
Worker (Contd.)
• Combines non-operational fields of
each of the plain events in the queue
• Converts operational fields into
binary form and combines them
together
• Pads the operational fields with zeros
• Performs encryption on the
operational fields and puts the newly
created composite event into the
queue
19
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
20. Implementation - Encrypted Events
Publisher
20
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
21. Implementation - Encrypted Events
Publisher (Contd.)
• Periodically checks for
encrypted events in
the encrypted queue.
• If there are encrypted
events, they are sent at
once to public stream
processor.
21
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
22. Implementation - Lower Level
Optimizations
22
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
23. Implementation -
Event Receiver
• If the received event has been
encrypted with homomorphic
encryption it sends the
composite event into ‘Composite
Event Decode Worker’
• If not it will reload the payload
data and calculate the
performance metrics.
23
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
24. Implementation - Lower Level
Optimizations
24
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
25. Implementation -
Composite Event
Decode Worker
Handles all
decomposition and
decryptions of the
composite event.
25
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
26. Benchmarks
• Email Filter - Benchmark based on Enron data
set
• EDGAR - Dataset published by Division of
Economic and Risk Analysis, USA
– Filter
– Comparison
– Add/Subtract
26
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
https://www.cs.cmu.edu/~./enron/
https://www.sec.gov/dera/data/edgar-log-file-data-set.html
27. Benchmarks - Email Filter
27
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
28. Benchmarks - EDGAR Filter
28
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
29. Input Data Rate Variation of the Two
Benchmarks
29
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
30. Evaluation - Experiment Setup
30
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
Experiments were done using three Virtual Machines (VMs) in Amazon EC2 cloud.
Evaluation Criteria: Can the average latency of our stream processing jobs be reduced through
homomorphic elastic scaling?
31. Email Filter Benchmark
31
Homomorphic elastic scaling provides 10.24% reduction of average latency per event.
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
32. EDGAR Filter Benchmark
32
Homomorphic elastic scaling provides 17% reduction of average latency per event.
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
33. EDGAR Comparison Benchmark
33
Homomorphic elastic scaling provides 3% reduction of average latency per event.
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
34. EDGAR Add/Subtract Benchmark
34
Homomorphic elastic scaling provides 3.68% reduction of average latency per event.
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
Compared to equal only operation, less than & greater than operations consume more XOR and AND gates
operations in the Homomorphic Encryption level.
35. Conclusion
• We presented the elastic privacy preserving data
stream processing
– Real world implementation - HomoESM
– Considerable performance benefits - 10 - 17%
improvement for filter operation
• The level of performance gain obtained varies based on
the types of operations conducted.
35
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM
36. Futurework
• Handle more complicated stream processing
operations.
• Experiment with multiple query based tuning
for privacy preserving elastic scaling.
• Fine tune the switching mechanism to handle
edge cases.
• Applications in cloud native environments.
36
25/4/2019 24th International Conference on Database Systems for Advanced Applications (DASFAA 2019) HomoESM