Analysis and Improvement of IOTA PoW Implementation
chenwei (魏禛) <zhenwei.tw@gmail.com>
AndyYang (楊子賢) <kukry5566@gmail.com>
March 10, 2018 / SITCON2018
chenwei (魏禛)
● From Tainan, Taiwan
● Master's student at National Taiwan University
● Recent work
○ Learning how to implement an interpreter
○ Learning Golang
○ Optimizing neural networks on multiple GPUs
● GitHub <https://github.com/chenwei-tw>
AndyYang (楊子賢)
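● From Taipei
● First-year graduate student in Computer Science and Information Engineering at National Taiwan University
● Research areas:
○ Machine learning
○ Computer architecture
● Recent work:
○ ReRAM-based accelerator for convolutional neural networks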
Brief Introduction to IOTA
from: “Iota Tangle Visualization” <https://simulation1.tangle.works/>
Brief Introduction to IOTA
● IRI (IOTA Reference Implementation)
○ Provides a RESTful API for participating in the Tangle
○ Exchanges transactions with other nodes
○ Maintains a database for storing transactions
Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” (The relationship between the IOTA light wallet, full wallet, and IOTA node)
<https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet-full-wallet-and-full-node/>
Referenced: “IOTA API Reference”
<https://iota.readme.io/v1.2.0/reference>
Brief Introduction to IOTA
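● (Light) Wallet
○ Checks balances, receives payments, and sends transfers
○ Because it does not run a full node, the wallet has to get all of its information from a full node through the RESTful API described earlier
○ Before doing any operation with your wallet, check that the host it is connected to is available
Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” (The relationship between the IOTA light wallet, full wallet, and IOTA node)
<https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet-full-wallet-and-full-node/>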
Brief Introduction to IOTA
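● How is a transaction issued?
○ The node selects two transactions to validate
○ It checks whether the two transactions conflict (e.g. a negative account balance)
○ It solves a cryptographic puzzle (PoW), which costs computing power
Referenced: “Tangle 白皮書” (Tangle white paper) <https://hackmd.io/s/ryriSgvAW>
Further Reading: “深入理解 IOTA 交易方式” (An in-depth look at how IOTA transactions work)
<https://blog.louie.lu/2018/01/10/in-depth-explain-iota-transaction/>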
How I got involved
● <attachToTangle> in IRI
Referenced: “iotaledger/iri” <https://github.com/iotaledger/iri>
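To make the <attachToTangle> call a little more concrete, here is a minimal sketch of how a client might build that request for an IRI node. The field names follow the IOTA API Reference cited earlier; the node URL, the MWM value of 14, and the all-'9' dummy trytes are assumptions for illustration, and the request is only built, not sent.

```python
import json
import urllib.request

# Placeholder node address (assumption); point this at your own IRI node.
NODE_URL = "http://localhost:14265"


def build_attach_request(trunk, branch, trytes, mwm=14, node_url=NODE_URL):
    """Build (but do not send) an attachToTangle request for an IRI node."""
    body = {
        "command": "attachToTangle",
        "trunkTransaction": trunk,
        "branchTransaction": branch,
        "minWeightMagnitude": mwm,  # 14 is an assumed value, not from the slides
        "trytes": trytes,
    }
    return urllib.request.Request(
        node_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-IOTA-API-Version": "1",
        },
        method="POST",
    )


# Dummy inputs: two 81-tryte hashes and one 2673-tryte transaction, all '9'.
request = build_attach_request("9" * 81, "9" * 81, ["9" * 2673])
payload = json.loads(request.data)
print(payload["command"], payload["minWeightMagnitude"])
# Sending it would be urllib.request.urlopen(request), which needs a running node.
```

The node answers this call only after it has finished the PoW for the given trytes, which is why PoW speed shows up directly in <attachToTangle> latency.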
How I got involved
● Many IOTA PoW implementations are hidden in these libraries
○ curl.lib.js <https://github.com/iotaledger/curl.lib.js>
○ gIOTA <https://github.com/iotaledger/gIOTA>
○ ccurl <https://github.com/iotaledger/ccurl>
○ iota-pearldiver <https://github.com/mlouielu/iota-pearldiver>
Why choose gIOTA?
● gIOTA collects several PoW implementations (C, SSE, AVX, OpenCL)
○ Most of them are embedded in Golang as C code
● So once we build IOTA's underlying trinary structure in C, we can port these implementations over quickly
Trinary Structure
● Trinary is a base-3 numeral system, an alternative to binary
● Trits: analogous to a bit, a trit is a ternary digit. Its value may be 1, 0, or -1
● Trytes: a tryte consists of 3 trits, so it can represent 27 values
○ In IOTA, trytes are written with the characters '9' and 'A'-'Z'
Referenced: “IOTA Glossary” <https://iota.readme.io/docs/glossary>
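As a small illustration of the trit and tryte definitions, the sketch below converts between the two in Python. It assumes the usual IOTA convention: the alphabet order is '9', 'A' to 'Z', values above 13 wrap to negative numbers, and the least significant trit comes first. It is not dcurl's trinary.h, only a toy version of the same idea.

```python
# Tryte alphabet: index 0 is '9', then 'A'..'Z'.
TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def tryte_to_trits(tryte):
    """Convert one tryte character to 3 balanced trits, least significant first."""
    value = TRYTE_ALPHABET.index(tryte)
    if value > 13:
        value -= 27  # 'N'..'Z' stand for -13..-1
    trits = []
    for _ in range(3):
        remainder = value % 3  # 0, 1 or 2 in Python, even for negative values
        if remainder == 2:
            remainder = -1  # balanced ternary uses -1 instead of 2
        trits.append(remainder)
        value = (value - remainder) // 3
    return trits


def trits_to_tryte(trits):
    """Convert 3 balanced trits (least significant first) to one tryte character."""
    value = trits[0] + 3 * trits[1] + 9 * trits[2]
    return TRYTE_ALPHABET[value % 27]


def trytes_to_trits(trytes):
    """Convert a tryte string to a flat list of trits."""
    return [trit for tryte in trytes for trit in tryte_to_trits(tryte)]


print(tryte_to_trits("9"))  # [0, 0, 0]
print(trytes_to_trits("9AZ"))
```

A flat list of small integers is the simplest layout to read; a C implementation would more likely use a byte array plus a length field.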
Our Trinary Structure
Source Code: “chenwei-tw/dcurl” <https://github.com/chenwei-tw/dcurl/blob/dev/src/trinary/trinary.h>
What is PoW (Proof of Work)?
● 9 in tryte = {0,0,0} in trits
[Figure: a transaction hash ending in "...0000...0"; the run of trailing zeros is labeled MWM]
Referenced: “The Anatomy of a Transaction”
<https://domschiener.gitbooks.io/iota-guide/content/chapter1/transactions-and-bundles.html>
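The figure's condition can be sketched as a simple check: a hash passes PoW when it ends in at least MWM (minimum weight magnitude) zero trits. The function names below are hypothetical and the hash is a hand-made list of trits rather than the output of IOTA's real hash function.

```python
def trailing_zero_trits(trits):
    """Count how many zero trits sit at the end of the list."""
    count = 0
    for trit in reversed(trits):
        if trit != 0:
            break
        count += 1
    return count


def meets_mwm(hash_trits, mwm):
    """True when the hash ends in at least `mwm` zero trits."""
    return trailing_zero_trits(hash_trits) >= mwm


# A made-up 243-trit hash whose last 14 trits are zero.
fake_hash = [1] * (243 - 14) + [0] * 14
print(meets_mwm(fake_hash, 14), meets_mwm(fake_hash, 15))
```

Because three zero trits form the tryte '9', a valid hash printed as trytes ends in a run of 9s.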
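How is the nonce found?
● The multi-threaded code in the implementations collected by giota does not really divide the computation among threads. It is a brute-force approach: several copies of the same function run at once, and whichever finds an answer first wins
● Each thread starts from a different seed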
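The "race" described above can be sketched as follows. This is not giota's code: SHA-256 from Python's standard library stands in for IOTA's hash function, the difficulty is "digest ends in N zero bytes" instead of trailing zero trits, and all names are hypothetical. What it keeps is the structure: every thread runs the same search loop from its own starting nonce, and the first hit stops the others.

```python
import hashlib
import threading


def is_valid(data, nonce, zero_bytes):
    """Stand-in PoW check: the SHA-256 digest must end in `zero_bytes` zero bytes."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "little")).digest()
    return digest.endswith(b"\x00" * zero_bytes)


def search(data, start, zero_bytes, found, result, lock):
    """Same loop in every thread; only the starting nonce differs."""
    nonce = start
    while not found.is_set():
        if is_valid(data, nonce, zero_bytes):
            with lock:
                if not found.is_set():  # keep only the first winner
                    result.append(nonce)
                    found.set()
            return
        nonce += 1


def race_for_nonce(data, n_threads=4, zero_bytes=1):
    found = threading.Event()
    result = []
    lock = threading.Lock()
    stride = 1 << 32  # spread the starting points far apart
    threads = [
        threading.Thread(
            target=search,
            args=(data, i * stride, zero_bytes, found, result, lock),
        )
        for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result[0]


nonce = race_for_nonce(b"example transaction")
print("found nonce:", nonce)
```

The threads share no work, so the expected gain comes only from trying more candidates per second, and which thread wins depends on luck as much as on speed.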
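Testing the correctness of giota
● The C, Go, and SSE implementations pass without problems
Referenced: “用 C 開發 IOTA PoW 的各種實作” (Developing various IOTA PoW implementations in C) <https://hackmd.io/s/HyNw4VM-z>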
Testing the correctness of giota
● The AVX and OpenCL implementations, however, fail the tests
pow_avx_test.go:47: pow is illegal
J9QTUNNMONCMIR9JBNMRC9SC9QTBRKBUVCBYBUITBHEICYVQ9HXEXSPWPU9KACTSDRSQBDOJPOOEAFVMP
pow_cl_test.go:46: pow is illegal
IIHYVX9VHSMQWSNDJYWZOJBCBTPVQBLVBF9UYIYSTEKJVEFVY9JPJJMRLFWOJFKNWKAANSZKLXDBWMALI
● We later found that iotaledger/ccurl and gIOTA use the same OpenCL kernel function, yet ccurl produces correct results. We suspect the problem occurs when gIOTA launches the kernel
● The GPU performance evaluation and the later design are therefore based on a modified version of iotaledger/ccurl
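Measuring the performance of each PoW implementation
● We first measured three PoW implementations with a single trytes input
● We then found that the time needed to find a nonce differs from one trytes input to another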
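Measuring the performance of each PoW implementation
● Measure with a large number of trytes inputs and plot the distribution to compare the implementations
● Results for 30 trytes, 200 samples
[Figure annotation: 47 samples take about 10 seconds to run, the price of initializing the OpenCL context over and over]
Source Code: “chenwei-tw/iota-pow-in-c” <https://github.com/chenwei-tw/iota-pow-in-c>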
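A measurement of this kind can be sketched in a few lines: time the PoW function on many inputs, then count how many runs fall into each time bin to get the Frequency vs. Time distribution. The helper names are hypothetical, the PoW function is a dummy, and how inputs and samples are combined here is an assumption rather than the exact procedure used for the slides.

```python
import time
from collections import Counter


def time_once(pow_fn, trytes):
    """Return the wall-clock seconds taken by one PoW call."""
    start = time.perf_counter()
    pow_fn(trytes)
    return time.perf_counter() - start


def measure(pow_fn, inputs, samples_per_input):
    """Time every input several times and return all durations in seconds."""
    return [
        time_once(pow_fn, trytes)
        for trytes in inputs
        for _ in range(samples_per_input)
    ]


def histogram(durations, bin_width):
    """Map bin index -> number of durations in [index * width, (index + 1) * width)."""
    return Counter(int(duration // bin_width) for duration in durations)


def dummy_pow(trytes):
    # Placeholder workload standing in for a real PoW implementation.
    return sum(ord(ch) for ch in trytes * 1000)


durations = measure(dummy_pow, ["9" * 81, "A" * 81, "Z" * 81], samples_per_input=5)
for bin_index, frequency in sorted(histogram(durations, bin_width=0.001).items()):
    print(f"{bin_index * 0.001:.3f} s: {frequency}")
```

Looking at the whole distribution instead of one run matters here because the nonce search time varies from input to input.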
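Why does OpenCL perform poorly?
● Question: why is the OpenCL implementation on the GPU so slow?
● Possible causes:
○ Does the kernel function that searches for the nonce take a long time to compute?
○ Is the communication overhead between device and host too large?
○ Or is one of the OpenCL API calls at fault?
● Another problem:
○ The GPU in our test environment is from Nvidia, and Nvidia does not provide a profiling tool for its OpenCL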
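Rolling our own CUDA version!
● The most intuitive idea is to rewrite the OpenCL implementation in CUDA and examine it with nvprof, one of the tools in the CUDA Toolkit
● The nvprof results (a figure on the slide) do not directly reveal the cause of the slowdown
Further Reading: “Profiler :: CUDA Toolkit Documentation”
<http://docs.nvidia.com/cuda/profiler-users-guide/index.html>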
Trying another profiling tool
● We later found another profiling tool on GitHub, uftrace, which reports information such as:
○ Duration
○ TID
○ Number of function calls
○ Total time
● Although uftrace cannot profile what happens on the GPU, the information it provides still shows where the time is going
Referenced: “namhyung/uftrace” <https://github.com/namhyung/uftrace>
uftrace measurement results
● record : runs a program and saves the trace data
● graph : shows function call graph in the trace data
$ uftrace record pow_cl
$ uftrace graph main
uftrace measurement results
● GPU initialization accounts for nearly 70% of the total time
Function                 Time
total time               1.938 (unit not shown on the slide; presumably s)
init_clcontext           1.354 s
init_cl_kernel           14.362 us
write_cl_buffer          1.541 ms
clEnqueueWriteBuffer     1.538 ms
clWaitForEvents          569.901 ms
clEnqueueReadBuffer      84.981 us
Hash                     5.502 ms
Phases marked on the slide: OpenCL context initialization, then OpenCL searching nonce
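The "nearly 70%" figure can be checked with a little arithmetic on the uftrace numbers above. The sketch assumes the total of 1.938 is in seconds and that the durations are listed in the same order as the function names.

```python
# Durations from the uftrace output above, converted to milliseconds.
TOTAL_MS = 1938.0  # assumed to be 1.938 s
durations_ms = {
    "init_clcontext": 1354.0,
    "init_cl_kernel": 0.014362,
    "write_cl_buffer": 1.541,
    "clEnqueueWriteBuffer": 1.538,
    "clWaitForEvents": 569.901,
    "clEnqueueReadBuffer": 0.084981,
    "Hash": 5.502,
}


def share(name):
    """Percentage of the total run time spent in the named function."""
    return 100.0 * durations_ms[name] / TOTAL_MS


for name in durations_ms:
    print(f"{name:22s} {share(name):6.2f} %")
```

Under these assumptions init_clcontext takes just under 70% of the run and clWaitForEvents, the wait for the nonce-search kernel, a bit under 30%, so avoiding the repeated initialization is the obvious first target.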
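How to fix the problems of the OpenCL version
● Find a way to avoid initializing the OpenCL context repeatedly
○ ccurl's solution is to run only one PoW task at a time and reuse the same context
● After reading ccurl's code, we believe its data structures were also designed with multi-threaded PoW tasks in mind. But when we launched several <ccurl_pow> calls at the same time in one address space, the resulting hashes were wrong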
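The "initialize once, reuse, one task at a time" idea can be sketched without any GPU code. Everything here is hypothetical: PowContext merely simulates a slow setup step such as creating an OpenCL context, and search() returns a dummy value instead of a real nonce.

```python
import threading
import time


class PowContext:
    """Stand-in for an expensive resource such as an OpenCL context."""

    init_count = 0  # how many times the slow setup actually ran

    def __init__(self):
        time.sleep(0.05)  # pretend that initialization is slow
        PowContext.init_count += 1

    def search(self, trytes):
        return len(trytes)  # dummy result in place of a real nonce


_context = None
_context_lock = threading.Lock()
_task_lock = threading.Lock()


def get_context():
    """Create the context on first use, then hand back the same object."""
    global _context
    with _context_lock:
        if _context is None:
            _context = PowContext()
        return _context


def run_pow_task(trytes):
    context = get_context()
    with _task_lock:  # only one PoW task uses the shared context at a time
        return context.search(trytes)


threads = [
    threading.Thread(target=run_pow_task, args=("9" * 81,)) for _ in range(5)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print("initializations:", PowContext.init_count)
```

Serializing the tasks keeps the shared context safe but gives up parallelism between PoW tasks, which is the gap a multi-threaded design has to close.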
New IOTA PoW Library - dcurl
● Goal
○ Make PoW run as fast as possible on the given hardware
○ Integrate it into IRI and check whether performance improves
● Our ideas
○ PoW tasks can be executed by multiple threads
○ Integrate high-performance IOTA PoW implementations
New IOTA PoW Library - dcurl
● Hardware Environment
○ Ubuntu 16.04
○ Intel(R) Xeon(R) CPU E5-2650 v4 @2.2GHz 48 cores
○ Nvidia Titan Xp
○ 94.2 GB RAM
New IOTA PoW Library - dcurl
● It is important to find the respective lock
Does multi-threading really bring a speedup?
[Two histogram figures; axes: Frequency vs. Time (s)]
Compare dcurl with other PoW Libraries
[Histogram figure; axes: Frequency vs. Time (s)]
Integrate dcurl into IRI
● Use javah to generate the header file for the C program
$ javah com.iota.iri.hash.PearlDiver
Integrate dcurl into IRI
● <jni.h> provides many functions for converting Java objects to C objects, such as ...
○ GetIntArrayElements() takes a Java int array and returns a C int array
○ SetIntArrayRegion() copies a C int array into a Java int array
Further Reading: “JNI Functions”
<https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html>
Further Reading: “Java Programming Tutorial Java Native Interface (JNI)”
<https://www.ntu.edu.sg/home/ehchua/programming/java/JavaNativeInterface.html>
Integrate dcurl into IRI
● Reminder
○ Give the compiler the include path of OpenJDK
○ Set the Java library path before launching your JVM
● Let's compile it!
○ We get a shared library that the JVM can load
○ Done!
Source code: “chenwei-tw/iri” <https://github.com/chenwei-tw/iri/tree/task/integrate_dcurl>
Performance between IRI and dcurl
[Figure: <attachToTangle> Performance Comparison; axes: Frequency vs. Time (s)]
Different Hardware Platform
● Intel(R) Core(TM) i7-8700K Processor
● Nvidia GeForce GTX 1080 Ti
● 32 GB Memory
Work in progress ...
● Fix the AVX implementation
● Let dcurl configure its environment and support multiple GPUs
● dcurl crashes if there is not enough GPU memory
● Have dcurl choose a suitable parameter set automatically
Future Work
● Add a new interface for PearlDiver in IRI, so that everyone can load a PoW implementation suited to their hardware environment
● Look for other bottlenecks in IRI and try to improve them

[Sitcon2018] Analysis and Improvement of IOTA PoW Implementation

  • 1. Analysis and Improvement of IOTA PoW Implementation chenwei (魏禛) <zhenwei.tw@gmail.com> AndyYang (楊子賢) <kukry5566@gmail.com> March 10, 2018 / SITCON2018 1
  • 2. chenwei (魏禛) ● From Tainan, Taiwan ● Study Master degree at National Taiwan University ● Recent work ○ Learning how to implement a interpreter ○ Learning Golang ○ Optimize Neural Network on multiple GPUs ● GitHub <https://github.com/chenwei-tw> 2
  • 3. AndyYang (楊子賢) ● 來自台北 ● 目前就讀台大資工所一年級 ● 研究領域 : ○ 機器學習 ○ 計算機結構 ● Recent Work : ○ ReRam Based Accelerator for Convolutional Neural Network 3
  • 4. Brief Introduction to IOTA from: “Iota Tangle Visualization” <https://simulation1.tangle.works/> 4
  • 5. Brief Introduction to IOTA ● IRI (IOTA Reference Implementation) ○ Provides RESTful API to participate in Tangle ○ Exchange transactions with other nodes ○ Maintain Database for storing transactions Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” <https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet- full-wallet-and-full-node/> Referenced: “IOTA API Reference” <https://iota.readme.io/v1.2.0/reference> 5
  • 6. Brief Introduction to IOTA ● (Light) Wallet ○ 查詢餘額、收款、轉帳 ○ 因為沒有運行完整的 Node,所以 Wallet 的資訊都必 須透過前述的 RESTful API 與一個 full node 做溝通 ○ Before doing any operation with your wallet, check host connected is available Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” <https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet- full-wallet-and-full-node/> 6
  • 7. Brief Introduction to IOTA ● 如何發起一筆交易 ? ○ Node 選擇兩個交易 (transaction) 做驗證 ○ 檢查該兩筆交易是否有衝突 (conflict) (e.g. 帳戶餘額為負) ○ 解出一道加密問題 (PoW),耗費計算力 Referenced: “Tangle 白皮書” <https://hackmd.io/s/ryriSgvAW> Further Reading: “深入理解 IOTA 交易方式” <https://blog.louie.lu/2018/01/10/in-depth-explain-iota-transaction/> 7
  • 8. How I get involved in ● <attachToTangle> in IRI Referenced: “iotaledger/iri” <https://github.com/iotaledger/iri> 8
  • 9. How I get involved in ● There are too many IOTA PoW Implementation hided in these libraries ○ curl.lib.js <https://github.com/iotaledger/curl.lib.js> ○ gIOTA <https://github.com/iotaledger/gIOTA> ○ ccurl <https://github.com/iotaledger/ccurl> ○ iota-pearldiver <https://github.com/mlouielu/iota-pearldiver> 9
  • 10. Why choose gIOTA? ● gIOTA collects several PoW implementations (C, SSE, AVX, OpenCL) ○ Most of them are embedded in Golang as C code ● So once we build IOTA's underlying trinary structure in C, we can port those implementations over quickly
  • 11. Trinary Structure ● As an alternative to binary, trinary is a base-3 numeral system ● Trits: analogous to bits, a ternary digit is a trit. Each trit takes the value 1, 0, or -1 ● Trytes: a tryte consists of 3 trits and can represent 27 values ○ In IOTA, trytes are represented by the characters '9, A-Z' Referenced: “IOTA Glossary” <https://iota.readme.io/docs/glossary>
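
To make the trit/tryte relationship concrete, here is a minimal sketch of converting one tryte character into its three balanced trits. It assumes the usual IOTA alphabet order ('9' for 0, 'A'..'M' for 1..13, 'N'..'Z' for -13..-1) and least-significant-trit-first ordering. The names `TRYTE_ALPHABET` and `tryte_to_trits` are made up for this illustration and are not taken from dcurl's trinary.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed IOTA tryte alphabet: index 0 is '9', indices 1..26 are 'A'..'Z'. */
static const char TRYTE_ALPHABET[] = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Convert one tryte character into 3 balanced trits (least significant first).
 * Returns 0 on success, -1 if c is not a valid tryte character. */
int tryte_to_trits(char c, signed char trits[3])
{
    assert(trits != NULL);

    const char *pos = (c != '\0') ? strchr(TRYTE_ALPHABET, c) : NULL;
    if (pos == NULL)
        return -1;

    /* Alphabet indices 0..13 map to values 0..13; 14..26 map to -13..-1. */
    int value = (int)(pos - TRYTE_ALPHABET);
    if (value > 13)
        value -= 27;

    for (size_t i = 0; i < 3; i++) {
        int r = ((value % 3) + 3) % 3;  /* remainder normalized to 0..2 */
        if (r == 2)                     /* digit 2 becomes -1 plus a carry */
            r = -1;
        trits[i] = (signed char)r;
        value = (value - r) / 3;
    }
    return 0;
}
```

With this mapping, '9' yields {0,0,0}, which matches the example given on the PoW slide.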
  • 12. Our Trinary Structure Source Code: “chenwei-tw/dcurl” <https://github.com/chenwei-tw/dcurl/blob/dev/src/trinary/trinary.h>
  • 13. What is PoW (Proof of Work)? ● 9 in trytes = {0,0,0} in trits [Figure: a hash ending in ...0000...0; the run of trailing zeros is labeled MWM] Referenced: “The Anatomy of a Transaction” <https://domschiener.gitbooks.io/iota-guide/content/chapter1/transactions-and-bundles.html>
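
The validity condition hinted at by the figure can be sketched as follows, assuming that a PoW result is accepted when the hash ends in at least MWM zero trits. The function names are illustrative and do not come from any of the libraries mentioned in the talk, and a real IOTA hash is much longer than the toy array used in testing.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many trits at the end of the hash are zero. */
size_t trailing_zero_trits(const signed char *hash_trits, size_t len)
{
    assert(hash_trits != NULL || len == 0);

    size_t count = 0;
    while (count < len && hash_trits[len - 1 - count] == 0)
        count++;
    return count;
}

/* Assumed rule: the PoW is valid if at least mwm trailing trits are zero. */
int pow_is_valid(const signed char *hash_trits, size_t len, size_t mwm)
{
    return trailing_zero_trits(hash_trits, len) >= mwm;
}
```

A PoW search would keep changing the nonce and rehashing until `pow_is_valid` holds, which is why a larger MWM costs more computation.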
  • 15. Testing gIOTA's correctness ● The C, Go, and SSE implementations have no problems Referenced: “用 C 開發 IOTA PoW 的各種實作” <https://hackmd.io/s/HyNw4VM-z>
  • 16. Testing gIOTA's correctness ● AVX and OpenCL, however, did not pass: pow_avx_test.go:47: pow is illegal J9QTUNNMONCMIR9JBNMRC9SC9QTBRKBUVCBYBUITBHEICYVQ9HXEXSPWPU9KACTSDRSQBDOJPOOEAFVMP pow_cl_test.go:46: pow is illegal IIHYVX9VHSMQWSNDJYWZOJBCBTPVQBLVBF9UYIYSTEKJVEFVY9JPJJMRLFWOJFKNWKAANSZKLXDBWMALI ● We later found that iotaledger/ccurl and gIOTA use the same OpenCL kernel function, yet ccurl's results are correct, so we suspect that gIOTA goes wrong when it launches the kernel ● The GPU performance evaluation and all later design work are therefore based on a modified version of iotaledger/ccurl
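  • 17. Measuring the performance of the PoW implementations ● Measure the three PoW implementations with a single trytes input ● But we later found that the time needed to find a nonce differs from one trytes input to another
  • 18. Measuring the performance of the PoW implementations ● Measure with a large number of trytes inputs and plot the distribution to compare the implementations ● Results for 30 trytes, 200 samples [Figure annotation: 47 samples take about 10 seconds to run, the price of repeatedly initializing the OpenCL context] Source Code: “chenwei-tw/iota-pow-in-c” <https://github.com/chenwei-tw/iota-pow-in-c>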
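  • 19. Why is OpenCL so slow? ● Question: why does the OpenCL implementation on the GPU perform so poorly? ● Possible causes: ○ Does the nonce-searching kernel function take a long time to compute? ○ Is the communication overhead between device and host too large? ○ Or is one of the OpenCL API calls at fault? ● A further problem: ○ The GPU in our test environment is from Nvidia, and Nvidia does not provide a profiling tool for its OpenCL
  • 20. Let's roll our own CUDA version! ● The most intuitive idea is to rewrite the OpenCL implementation in CUDA and then observe it with nvprof, one of the tools in the CUDA Toolkit ● The resulting profile (figure on the slide) does not directly reveal the cause of the slowdown Further Reading: “Profiler :: CUDA Toolkit Documentation” <http://docs.nvidia.com/cuda/profiler-users-guide/index.html>
  • 21. Trying another profiling tool ● We later found another profiling tool on GitHub, uftrace, which reports information such as: ○ Duration ○ TID ○ Number of function calls ○ Total time ● Although uftrace cannot analyze GPU profiling information, what it reports still shows us where the performance bottleneck lies Referenced: “namhyung/uftrace” <https://github.com/namhyung/uftrace>
  • 22. uftrace measurement results ● record: runs a program and saves the trace data ● graph: shows the function call graph in the trace data $ uftrace record pow_cl $ uftrace graph main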
  • 23. uftrace measurement results ● GPU initialization accounts for nearly 70% of the total time (1.354 s of 1.938 s) ● Timings: total time 1.938 s; init_clcontext 1.354 s; init_cl_kernel 14.362 us; write_cl_buffer 1.541 ms; clEnqueueWriteBuffer 1.538 ms; clWaitForEvents 569.901 ms; clEnqueueReadBuffer 84.981 us; Hash 5.502 ms ● Column groups on the slide: OpenCL context initialization, OpenCL nonce search
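  • 24. How to fix the problems of the OpenCL version ● Find a way to avoid initializing the OpenCL context repeatedly ○ ccurl's solution is to run only one PoW task at a time and reuse the same context ● After reading the ccurl source, we believe its data structures were also designed with multi-threaded PoW tasks in mind, but when we launched several <ccurl_pow> calls concurrently in the same address space, the resulting hashes were wrong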
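
The "initialize once, reuse for every task" idea can be sketched without any GPU code. In the sketch below, `pow_context`, `expensive_init`, `get_context`, and `run_pow_task` are hypothetical stand-ins, not ccurl or OpenCL APIs; the counter only exists to show that the costly setup runs a single time. Like the ccurl approach described above, it assumes one PoW task at a time and is not thread-safe.

```c
#include <assert.h>

/* Hypothetical stand-in for an expensive GPU context (not a real OpenCL type). */
typedef struct {
    int initialized;
    int init_count;   /* how many times the costly setup actually ran */
} pow_context;

/* Placeholder for the costly setup step (device discovery, kernel build, ...). */
static void expensive_init(pow_context *ctx)
{
    ctx->init_count++;
    ctx->initialized = 1;
}

/* Return a ready context, paying the setup cost only on the first call. */
pow_context *get_context(void)
{
    static pow_context ctx = {0, 0};
    if (!ctx.initialized)
        expensive_init(&ctx);
    return &ctx;
}

/* Hypothetical PoW entry point: every task reuses the shared context.
 * Returns how many times the context has been initialized so far. */
int run_pow_task(int task_id)
{
    pow_context *ctx = get_context();
    assert(ctx->initialized);
    (void)task_id;   /* a real task would search for a nonce here */
    return ctx->init_count;
}
```

Running many tasks in a row then pays the initialization cost once instead of once per sample.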
  • 25. New IOTA PoW Library - dcurl ● Goal ○ Make PoW run as fast as possible on the given hardware ○ Integrate it into IRI and check whether performance improves ● Our ideas ○ PoW tasks can be executed by multiple threads ○ Integrate powerful IOTA PoW implementations
  • 26. New IOTA PoW Library - dcurl ● Hardware Environment ○ Ubuntu 16.04 ○ Intel(R) Xeon(R) CPU E5-2650 v4 @2.2GHz 48 cores ○ Nvidia Titan Xp ○ 94.2 GB RAM
  • 27. New IOTA PoW Library - dcurl
  • 28. New IOTA PoW Library - dcurl ● It's important to find the respective lock
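
The slide's diagram is not available in this transcript, so the following is only one possible reading of "find the respective lock": each PoW context is guarded by its own lock, and a task scans for one it can take without blocking. The sketch uses C11 `atomic_flag` as a try-lock. `NUM_SLOTS`, `acquire_slot`, and `release_slot` are names invented for this illustration and are not dcurl's API.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_SLOTS 4   /* hypothetical number of PoW contexts */

/* One busy flag per slot; a cleared flag means the slot is free. */
static atomic_flag slot_busy[NUM_SLOTS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Try each slot in turn; return the index acquired, or -1 if all are busy. */
int acquire_slot(void)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        /* test_and_set returns the previous value: false means we got it. */
        if (!atomic_flag_test_and_set(&slot_busy[i]))
            return i;
    }
    return -1;
}

/* Release a slot so that another task can use it. */
void release_slot(int slot)
{
    assert(slot >= 0 && slot < NUM_SLOTS);
    atomic_flag_clear(&slot_busy[slot]);
}
```

A task would call `acquire_slot`, run its PoW on the context that belongs to that slot, then call `release_slot`. What to do when every slot is busy (wait, retry, or fall back to another implementation) is a design choice this sketch leaves open.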
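  • 29. Does multi-threading really bring a speedup? [Figure: distribution plot; axes: Frequency, Time (s)]
  • 30. Does multi-threading really bring a speedup? [Figure: distribution plot; axes: Frequency, Time (s)]
  • 31. Compare dcurl with other PoW libraries [Figure: distribution plot; axes: Frequency, Time (s)]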
  • 33. Integrate dcurl into IRI ● Use javah to produce a header file for the C program $ javah com.iota.iri.hash.PearlDiver
  • 34. Integrate dcurl into IRI ● <jni.h> provides many functions for converting between Java objects and C objects, such as ... ○ GetIntArrayElements() takes a Java int array and returns a C int array ○ SetIntArrayRegion() copies a C int array into a Java int array Further Reading: “JNI Functions” <https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html> Further Reading: “Java Programming Tutorial Java Native Interface (JNI)” <https://www.ntu.edu.sg/home/ehchua/programming/java/JavaNativeInterface.html>
  • 35. Integrate dcurl into IRI ● Reminder ○ Give the compiler the include path to the OpenJDK headers ○ Set the Java library path before launching your JVM ● Let's compile it! ○ We get a shared library for the JVM to load ○ Done! Source code: “chenwei-tw/iri” <https://github.com/chenwei-tw/iri/tree/task/integrate_dcurl>
  • 36. Performance between IRI and dcurl ● <attachToTangle> performance comparison [Figure: distribution plot; axes: Frequency, Time (s)] ● Different hardware platform ○ Intel(R) Core(™) i7-8700K Processor ○ Nvidia GeForce GTX 1080 Ti ○ 32 GB Memory
  • 37. Something in progress ... ● Fix the AVX implementation ● Let dcurl configure its environment and support multiple GPUs ● dcurl crashes if there is not enough GPU memory ● dcurl should choose a suitable parameter set automatically
  • 38. Future Work ● Add a new interface for PearlDiver in IRI, so that everyone can load the PoW implementation that suits their hardware environment ● Search for other bottlenecks in IRI and try to improve them

Editor's Notes

  1. Anything that can perform these operations can be called a “full node”
  2. cue: