This document discusses analysis and improvements made to the Proof of Work (PoW) implementation used in IOTA. It begins with introductions of the authors and provides background on IOTA and how transactions work. It then analyzes the performance of different PoW implementations in the gIOTA library and identifies issues with the OpenCL version. A new library called dcurl is created to optimize PoW performance by enabling multi-threaded execution and leveraging different hardware. dcurl is integrated into IRI and shown to provide significant performance improvements over the existing PoW implementation when attached to the Tangle. Future work areas are also discussed.
[Sitcon2018] Analysis and Improvement of IOTA PoW Implementation
1. Analysis and Improvement of IOTA PoW Implementation
chenwei (魏禛)
<zhenwei.tw@gmail.com>
AndyYang (楊子賢)
<kukry5566@gmail.com>
March 10, 2018 / SITCON2018
2. chenwei (魏禛)
● From Tainan, Taiwan
● Studying for a Master's degree at National Taiwan University
● Recent work
○ Learning how to implement an interpreter
○ Learning Golang
○ Optimizing neural networks on multiple GPUs
● GitHub <https://github.com/chenwei-tw>
3. AndyYang (楊子賢)
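● From Taipei
● First-year graduate student at the Graduate Institute of Computer Science and Information Engineering, National Taiwan University
● Research areas:
○ Machine learning
○ Computer architecture
● Recent work:
○ ReRAM-based accelerator for convolutional neural networks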
5. Brief Introduction to IOTA
● IRI (IOTA Reference Implementation)
○ Provides RESTful API to participate in Tangle
○ Exchange transactions with other nodes
○ Maintain Database for storing transactions
Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” <https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet-full-wallet-and-full-node/>
Referenced: “IOTA API Reference”
<https://iota.readme.io/v1.2.0/reference>
6. Brief Introduction to IOTA
● (Light) Wallet
○ Check balances, receive payments, and transfer funds
○ Because it does not run a full node, the wallet must get all of its information from a full node through the RESTful API mentioned earlier
○ Before doing anything with your wallet, check that the host it is connected to is available
Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” <https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet-full-wallet-and-full-node/>
8. How I got involved in
● <attachToTangle> in IRI
Referenced: “iotaledger/iri” <https://github.com/iotaledger/iri>
9. How I got involved in
● Too many IOTA PoW implementations are hidden in these libraries
○ curl.lib.js <https://github.com/iotaledger/curl.lib.js>
○ gIOTA <https://github.com/iotaledger/gIOTA>
○ ccurl <https://github.com/iotaledger/ccurl>
○ iota-pearldiver <https://github.com/mlouielu/iota-pearldiver>
11. Trinary Structure
● Alternative to binary: trinary is a base-3 numeral system
● Trits: analogous to bits, a ternary digit is a trit. Each trit can have the value 1, 0, or -1
● Trytes: a tryte consists of 3 trits and can represent 27 values
○ In IOTA, trytes are represented by the characters '9,A-Z'
Referenced: “IOTA Glossary” <https://iota.readme.io/docs/glossary>
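The tryte-to-trit mapping described on this slide can be sketched in a few lines of Python. This is a minimal sketch, not code from any IOTA library. It assumes the commonly used convention that the alphabet `9ABCDEFGHIJKLMNOPQRSTUVWXYZ` maps to the values 0, 1 to 13, then -13 to -1, and that trits are stored least significant first. The function names are hypothetical.

```python
# Assumed tryte alphabet: index 0 is '9', then 'A'..'Z'.
TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def value_to_trits(value):
    """Convert an integer in [-13, 13] to 3 balanced trits, least significant first."""
    trits = []
    for _ in range(3):
        remainder = value % 3          # Python yields 0, 1, or 2, even for negatives
        if remainder == 2:
            remainder = -1             # balanced ternary uses -1 instead of 2
        trits.append(remainder)
        value = (value - remainder) // 3
    return trits


def tryte_to_trits(tryte):
    """Map one tryte character to its 3 trits."""
    index = TRYTE_ALPHABET.index(tryte)
    # Indices 0..13 are the values 0..13; indices 14..26 are the values -13..-1.
    value = index if index <= 13 else index - 27
    return value_to_trits(value)


def trits_to_tryte(trits):
    """Map 3 trits (least significant first) back to one tryte character."""
    value = trits[0] + 3 * trits[1] + 9 * trits[2]
    return TRYTE_ALPHABET[value % 27]
```

For example, `tryte_to_trits("9")` gives `[0, 0, 0]`, and `trits_to_tryte(tryte_to_trits(c))` returns `c` for every character in the alphabet.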
13. What is PoW (Proof Of Work)?
● 9 in tryte = {0,0,0} in trits
[Figure: a Hash ending in "...0000...0", with the run of zero trits labeled MWM]
Referenced: “The Anatomy of a Transaction” <https://domschiener.gitbooks.io/iota-guide/content/chapter1/transactions-and-bundles.html>
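To make the MWM condition concrete, the sketch below checks whether a hash ends in at least MWM zero trits and brute-forces a nonce until it does. It is only an illustration under stated assumptions: IOTA's PoW uses the Curl hash and 243-trit hashes, whereas this sketch substitutes SHA-256 folded into 81 trits as a stand-in. All function names and the `HASH_TRITS` constant are hypothetical.

```python
import hashlib

HASH_TRITS = 81  # stand-in length for this sketch; real IOTA hashes are 243 trits


def int_to_trits(value, length):
    """Convert a non-negative integer to `length` balanced trits, least significant first."""
    trits = []
    for _ in range(length):
        remainder = value % 3
        if remainder == 2:
            remainder = -1
        trits.append(remainder)
        value = (value - remainder) // 3
    return trits


def stand_in_hash(data, nonce):
    """NOT Curl: SHA-256 of data plus nonce, reduced to HASH_TRITS trits."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int_to_trits(int.from_bytes(digest, "big") % 3 ** HASH_TRITS, HASH_TRITS)


def trailing_zero_trits(trits):
    """Count how many trits at the end of the sequence are 0."""
    count = 0
    for trit in reversed(trits):
        if trit != 0:
            break
        count += 1
    return count


def satisfies_mwm(hash_trits, mwm):
    """A hash is valid when it ends in at least `mwm` zero trits."""
    return trailing_zero_trits(hash_trits) >= mwm


def find_nonce(data, mwm, max_tries=1_000_000):
    """Try nonces in order until the stand-in hash satisfies the MWM, or give up."""
    for nonce in range(max_tries):
        if satisfies_mwm(stand_in_hash(data, nonce), mwm):
            return nonce
    return None
```

Calling `find_nonce(b"example transaction", mwm=3)` returns the first nonce whose stand-in hash ends in three zero trits. Each extra unit of MWM triples the expected number of attempts, which is why the speed of the search loop matters so much.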
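20. Roll our own CUDA version!
● The most intuitive idea was to rewrite the OpenCL implementation in CUDA and then examine it with nvprof, one of the tools in the CUDA Toolkit
● The nvprof results (shown as a figure on the slide) did not directly reveal the cause of the slowdown
Further Reading: “Profiler :: CUDA Toolkit Documentation” <http://docs.nvidia.com/cuda/profiler-users-guide/index.html>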
21. Trying another profiling tool
● Later we found another profiling tool on GitHub, uftrace, which reports information such as:
○ Duration
○ TID
○ Number of function calls
○ Total time
● Although uftrace cannot analyze GPU profiling information, what it reports still shows where the performance bottleneck is
Referenced: “namhyung/uftrace” <https://github.com/namhyung/uftrace>
22. uftrace measurement results
● record : runs a program and saves the trace data
● graph : shows the function call graph in the trace data
$ uftrace record pow_cl
$ uftrace graph main
33. Integrate dcurl into IRI
● Use javah to generate the header file for the C program
$ javah com.iota.iri.hash.PearlDiver
34. Integrate dcurl into IRI
● <jni.h> provides many functions to convert Java objects to C objects, such as ...
○ GetIntArrayElements() takes a Java int array and returns a pointer to its elements as a C int array
○ SetIntArrayRegion() copies a region of a C int array into a Java int array
Further Reading: “JNI Functions” <https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html>
Further Reading: “Java Programming Tutorial Java Native Interface (JNI)” <https://www.ntu.edu.sg/home/ehchua/programming/java/JavaNativeInterface.html>
35. Integrate dcurl into IRI
● Reminder
○ Pass the OpenJDK include path to the compiler
○ Set the Java library path before launching your JVM
● Let's compile it!
○ This produces a shared library for the JVM to load
○ Done!
Source code: “chenwei-tw/iri” <https://github.com/chenwei-tw/iri/tree/task/integrate_dcurl>
36. Performance between IRI and dcurl
[Figure: <attachToTangle> Performance Comparison; axes: Frequency, Time (s)]
Different Hardware Platform
● Intel(R) Core(TM) i7-8700K Processor
● Nvidia GeForce GTX 1080 Ti
● 32 GB Memory
37. Something in progress ...
● Fix AVX implementation
● Let dcurl configure its environment and support multiple GPUs
● dcurl crashes if there is not enough GPU memory
● Have dcurl choose a suitable parameter set automatically
38. Future Work
● Add a new interface for PearlDiver in IRI, so everyone can load a PoW implementation suited to their hardware environment
● Search for other bottlenecks in IRI and try to improve them