Overview of a project to build a distributed cluster from systems on chips (SoCs), specifically the XU4, with the goal of creating a massively parallel backtesting platform for quantitative finance. The project was a success.
SoC HPC: Design, Optimization, and Application to Algorithmic Trading
1. SoC HPC: Design and Optimization
Mark Delgado
BS in Nuclear Engineering from NC State
Python User
2. Presentation Topics
Why?
SoC Choices and Economics
Software and IT Stack
Cluster Design and Decisions
Optimizations and Improvements
What has been done today
What will be done tomorrow
3. Not Presentation Topics
Calculation and Data Decisions
Data Acquisition
Parameter Selections
Application Strategies
Broker Selection and Integration
Heavy Quantitative Finance
Cluster Application Strategies
Source Code
5. Project Hypothesis
Can I build a system that can perform
massive amounts of calculations?
Can I then use this system to solve problems,
find relationships, and find strategies?
Can I build or modify the system to take any
strategies and apply them?
6. What Kind of System? What is HPC?
Titan Supercomputer
7. Titan Economics
18,688 Nodes with 16 Cores per Node
299,008 Total CPU Cores
18,688 GPUs
Total Cost: $97,000,000
Individual Unit? Only $15,000!
9. SoC vs Server?
XU4 Energy Requirements:
20 Watts
Server Energy Requirements:
750 Watts
Total Yearly Cost of Server
~$725
Total Yearly Cost of XU4
~$89
11. Software and IT Stack
1 Gbps switch
Cat6 Cables
1 Gbps Supporting SoC and Laptop
Configured and Mounted NFSv4 Folders and
Partitions
SSH access
12. Software and IT Stack
What is the System Being Designed for?
Ease of Use and Support?
Less Ease, less support, more performance?
13. Software and IT Stack
Pure and Raw Performance?
Less Support, More Difficult to Use
Difficult to Set Up, Difficult to Hand Off
5-10% performance increase, modern software
14. Software and IT Stack
Languages Used:
Python, Cython, C/C++
Message Passing
OpenMPI, 0mq
Networking
0mq
Database
MongoDB
15. Software and IT Stack
Python Modules:
Message Passing
pyzmq, mpi4py
Networking
pyzmq
Database
PyMongo
All modules are available via pip!
17. Cluster Design and Decisions
The Buy Strategy: MACD Crossover
The Sell Strategy: TP/SL
Timeframe: Weekly
Data Resolution: Minute
Question: Using a MACD crossover as the buy
strategy and a TP/SL (take-profit/stop-loss) as the
sell strategy, is there a parameter combination that
yields a higher ROI than the weekly ROI of that equity?
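The question above can be sketched as a single backtest function. This is a minimal illustration, not the presenter's code: the function names, the 12/26/9 MACD periods (the conventional defaults), and the 2% take-profit and 1% stop-loss values are all assumptions chosen for the example.

```python
def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_crossover_buys(prices, fast=12, slow=26, signal=9):
    """Return indices where the MACD line crosses above its signal line."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd = [f - s for f, s in zip(fast_ema, slow_ema)]
    sig = ema(macd, signal)
    return [i for i in range(1, len(prices))
            if macd[i - 1] <= sig[i - 1] and macd[i] > sig[i]]


def backtest(prices, tp=0.02, sl=0.01, **macd_params):
    """Buy on a crossover when flat; sell at take-profit or stop-loss.

    Returns the compounded ROI over the whole price series.
    tp and sl are fractions of the entry price (assumed values).
    """
    buys = set(macd_crossover_buys(prices, **macd_params))
    growth = 1.0
    entry = None
    for i, price in enumerate(prices):
        if entry is None:
            if i in buys:
                entry = price
        else:
            change = price / entry - 1
            if change >= tp or change <= -sl:
                growth *= 1 + change
                entry = None
    if entry is not None:  # close any open position at the last price
        growth *= prices[-1] / entry
    return growth - 1
```

Each (fast, slow, signal, tp, sl) tuple is one calculation; the cluster's job is to run `backtest` over many such tuples and compare the result with the equity's own weekly ROI.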
25. Cluster Design and Decisions
Different Designs Yield Different Results
Control time = 0.6s
Pub/Sub = ~1s = 11.1 years
Pub/Sub/Modified = 0.83s = 10.2 years
Pub/Sub/Modified/Parallel = 0.78s = 9.5 years
26. Cluster Design and Decisions
Lesson 4: Cython isn’t always the answer
Still slow, worth exploring?
27. Cluster Design and Decisions
Different types of clusters for different
problems
Previous cluster designs = Centralized
Streaming and Centralized Storage
28. Cluster Design and Decisions
Introducing Decentralized Streaming and
Centralized Storage
29. Cluster Design and Decisions
Lesson 5: Good Memory Management = Good
Results
30. Cluster Design and Decisions
Removing the network stream reduces the
data transmission time to 0s
New Calculation time = 1s
New Total time = 5.56 years
31. Optimizations and Improvements
Lesson 6: Profile, Profile, Profile
What are the pain points in the algo?
Given the current algo design, what can be
ported to C/Cython?
Are the parameters ‘good’?
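One way to find the pain points is the standard-library profiler. The sketch below is an assumption about how that could look, not the presenter's workflow: `slow_ema` and `run_backtest_once` are hypothetical stand-ins for the real algorithm.

```python
import cProfile
import io
import pstats


def slow_ema(values, period):
    # Hypothetical stand-in for one piece of the backtest math.
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def run_backtest_once():
    # Hypothetical workload: two moving averages over synthetic prices.
    prices = [100 + (i % 50) for i in range(20000)]
    return slow_ema(prices, 12)[-1] - slow_ema(prices, 26)[-1]


def profile_report(func, top=5):
    """Profile func() and return the top entries by cumulative time as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()
```

Calling `print(profile_report(run_backtest_once))` lists the functions that dominate the run time; those are the candidates for porting to C/Cython.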
32. Optimizations and Improvements
Choosing ‘good’ parameters = .5s
New time = 2.78 years
Exporting math to C/Cython = .2s
New time = 1.1 years
Combining C/Cython and PyPy = .09s
New time = 0.5 years
Choosing ‘actually good’ parameters = .06s
***Speculating***
New Time = .33 years
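The projected times on this slide scale linearly with the per-calculation time. A minimal sketch of that arithmetic, assuming the scale factor implied by the earlier slide (1s per calculation corresponds to 5.56 years in total); the constant and function names are illustrative:

```python
# Scale factor implied by the slides: 1 s per calculation <-> 5.56 years total.
YEARS_PER_SECOND_OF_CALC = 5.56


def projected_years(seconds_per_calc):
    """Linear projection of total run time from the per-calculation time."""
    return seconds_per_calc * YEARS_PER_SECOND_OF_CALC


for label, secs in [
    ("good parameters", 0.5),
    ("math in C/Cython", 0.2),
    ("C/Cython + PyPy", 0.09),
    ("actually good parameters (speculative)", 0.06),
]:
    print(f"{label}: {secs}s -> {projected_years(secs):.2f} years")
```

The loop prints 2.78, 1.11, 0.50, and 0.33 years, which agree with the slide's figures to rounding.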
34. Optimizations and Improvements
Total number of calculations: C = 98**3 * 10**2 = 94,119,200
Decrease Resolution of C = Cn
Cn = C*.99
New time after Cn = .021 years
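The total count can be read as a five-dimensional parameter grid. The sketch below assumes 98 candidate values for each of three MACD parameters and 10 each for take-profit and stop-loss (an interpretation of the formula, not something the slides state), and shows one hypothetical way to split the flat index space across cluster nodes.

```python
from math import prod

# Assumed grid shape: 98 values for each of three MACD parameters,
# 10 values each for take-profit and stop-loss.
GRID_SHAPE = (98, 98, 98, 10, 10)
TOTAL = prod(GRID_SHAPE)  # 94,119,200


def index_to_combo(index, shape=GRID_SHAPE):
    """Decode a flat index into per-parameter grid positions (mixed radix)."""
    combo = []
    for size in reversed(shape):
        index, pos = divmod(index, size)
        combo.append(pos)
    return tuple(reversed(combo))


def node_range(node, n_nodes, total=TOTAL):
    """Contiguous [start, stop) slice of flat indices assigned to one node."""
    base, extra = divmod(total, n_nodes)
    start = node * base + min(node, extra)
    stop = start + base + (1 if node < extra else 0)
    return start, stop
```

With this layout each node only needs its node number and the node count to know which combinations to evaluate, so no work list has to be streamed over the network.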
35. Optimizations and Improvements
Lesson 7: IT Automation is Awesome!
Especially when applied to math!
Use IT automation to determine new values of
Cn and automatically parallelize calculations
New time = ***0.005-0.01 years***
37. Optimizations and Improvements
What did we just do?
S(M1,M2,M3,TP,SL)
M1=x1→x1*
M2=x2→x2*
M3=x3→x3*
TP=x4→x4*
SL=x5→x5*
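Read as an optimization, the notation above says: search each parameter's range for the value (marked with *) that maximizes the strategy score S. A minimal exhaustive-search sketch under that reading; `toy_score` is a hypothetical stand-in for S, and its peak location is invented for the example.

```python
from itertools import product


def best_parameters(score, m1_values, m2_values, m3_values, tp_values, sl_values):
    """Evaluate score over the full grid; return (best_score, best_combo)."""
    grid = product(m1_values, m2_values, m3_values, tp_values, sl_values)
    return max((score(*combo), combo) for combo in grid)


def toy_score(m1, m2, m3, tp, sl):
    # Hypothetical stand-in for S, peaking at (12, 26, 9, 0.03, 0.01).
    return -((m1 - 12) ** 2 + (m2 - 26) ** 2 + (m3 - 9) ** 2
             + (100 * (tp - 0.03)) ** 2 + (100 * (sl - 0.01)) ** 2)
```

In the real system the score would be the backtested ROI, and the grid would be the (reduced) set of combinations distributed across the cluster.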
38. What Has Been Done Today
Everything except PyPy and C/Cython merge,
IT Automation, and IT Automation + Math
What can I show you?
Fully functioning cluster without automation
Real performance differences between Python and
PyPy
NFS to aggregate the results
39. What Will Be Done Tomorrow?
PyPy and C/Cython merge, IT Automation,
and IT Automation + Math
Pandas to handle data
Matplotlib to graph potential strategies