
Profiling on steroids: Making Apache Spark Fast & Furious

We would like to share the story of how we troubleshot the performance of our Spark jobs using a JVM profiler and InfluxDB.
Speaker - Igor Mastesnyi, Senior Data Engineer @ AppsFlyer Data Group.

Published in: Data & Analytics


  1. Profiling on steroids: Making Apache Spark Fast & Furious (Proprietary & Confidential)
  2. Who am I: Big data engineer at AppsFlyer; ML enthusiast; IoT amateur
  3. A few things about AppsFlyer: XX TBs data, x0000 applications history, x000 EC2s, 70B events a day
  4. DAG; Metrics; Event timeline; Performance Optimization
  5. Profiling: Find the hot path; What was really optimized; Quick look under the hood
  6. Profiling tools
  7. Profiling: System profilers; JVM profilers; Distributed profiling?
  8. StatsD JVM profiler: JVM Profiler agent (ThreadMXBean.dumpThreads0); Application
  9. JVM Profiler agent (ThreadMXBean.dumpThreads0); Application (the diagram repeats this agent/application pair six times)
  10. Configuring: -javaagent:./statsd-...jar-with-dependencies.jar=server=influxdb.master.msp.com,reporter=InfluxDBReporter,database=profiler,username=profiler,password=pass,port=8086,prefix=20190413,tagMapping=barge,httpServerEnabled=false,packageBlacklist=io.netty.util.concurrent:io.netty.channel.nio:org.spark_project.jetty.util.thread:org.apache.hadoop.net.unix:org.spark_project.jetty.serve
  11. Export: python influxdb_dump.py -o influxdb.imasternsinf.msp.com -r 8086 -u profiler -p profiler -d profiler -e 20190413 -t barge > test/executors-thread-dump-2019-04-13 Output sample: (‘cpu.trace.com-amazonaws-AmazonWebServiceClient-computeServiceName-703.com-amazonaws-AmazonWebServiceClient-getServiceNameIntern-676.com-amazonaws-AmazonWebServiceClient-computeSignerByURI-278.com-amazonaws-AmazonWebServiceClient-setEndpoint-160.com-amazonaws-services-s3-AmazonS3Client-setEndpoint-475.com-amazonaws-services-s3-AmazonS3Client-init-447.com-amazonaws-services-s3-AmazonS3Client-<init>-391.com-amazonaws-services-s3-AmazonS3Client-<init>-371.org-apache-hadoop-fs-s3a-S3AFileSystem-initialize-235.org-apache-hadoop-fs-FileSystem-createFileSystem-2669.org-apache-hadoop-fs-FileSystem-access$200-94.org-apache-hadoop-fs-FileSystem$Cache-getInternal-2703.org-apache-hadoop-fs-FileSystem$Cache-get-2685.org-apache-hadoop-fs-FileSystem-get-373.org-apache-hadoop-fs-Path-getFileSystem-295.org-apache-parquet-hadoop-ParquetFileReader-<init>-565...’, 1
  12. Flame graphs: Rectangle = a stack frame (a function on the stack). Y axis = stack depth. X axis = the set of stack samples, sorted alphabetically! Width = the share of stack traces that contain the frame, as a percentage of all stack traces.
  13. Analyzing results
  19. What we've got: An instrument for analyzing Spark-oriented code in depth; a tool for checking the performance effect of code changes and optimizations; a performance boost of up to 16%
  20. Links: http://psy-lob-saw.blogspot.com/2016/02/why-most-sampling-java-profilers-are.html ; https://github.com/etsy/statsd-jvm-profiler ; https://github.com/cerndb/Hadoop-Profiler/tree/master/src ; http://www.brendangregg.com/ ; https://www.youtube.com/watch?v=QiGrTvsCZmA
  21. Thanks! igor.masternyi@appsflyer.com @igormasternoy
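
The record format on slide 11 and the width definition on slide 12 can be tied together in a short sketch. It assumes, based only on the sample shown on slide 11, that frames in a `cpu.trace.` key are separated by `.`, that the leaf frame comes first, and that each frame is `package-Class-method-line` with `-` standing in for the package dots. The function names are hypothetical, and this is not the code of `influxdb_dump.py`.

```python
from collections import Counter

PREFIX = "cpu.trace."

def parse_key(key):
    """Split a 'cpu.trace.' key into (class, method, line) frames, root first."""
    body = key[len(PREFIX):] if key.startswith(PREFIX) else key
    frames = []
    for raw in body.split("."):
        # Assumed layout: package-Class-method-line; peel off the last two fields.
        cls, method, line = raw.rsplit("-", 2)
        frames.append((cls.replace("-", "."), method, int(line)))
    frames.reverse()  # the sample on slide 11 lists the leaf frame first
    return frames

def frame_names(key):
    """Return 'Class.method' names for a key, root first."""
    return [f"{cls}.{method}" for cls, method, _ in parse_key(key)]

def to_folded(key, count):
    """Render one sample as a folded-stack line: 'root;...;leaf count'."""
    return ";".join(frame_names(key)) + f" {count}"

def frame_widths(samples):
    """Width of every rectangle as a percentage of all stack traces."""
    total = sum(count for _, count in samples)
    hits = Counter()
    for key, count in samples:
        names = frame_names(key)
        # A rectangle is identified by its full path from the root.
        for depth in range(1, len(names) + 1):
            hits[tuple(names[:depth])] += count
    return {path: 100.0 * n / total for path, n in hits.items()}

# Two made-up samples built from frames that appear on slide 11.
samples = [
    ("cpu.trace.org-apache-hadoop-fs-FileSystem-get-373."
     "org-apache-hadoop-fs-Path-getFileSystem-295", 3),
    ("cpu.trace.org-apache-hadoop-fs-Path-getFileSystem-295", 1),
]

for key, count in samples:
    print(to_folded(key, count))

for path, width in sorted(frame_widths(samples).items()):
    print(f"{width:5.1f}%  {' > '.join(path)}")
```

In this toy input, `Path.getFileSystem` appears in all four traces and gets the full width, while `FileSystem.get` on top of it appears in three of four and gets 75%. The folded lines follow the `frame;frame;... count` layout that flame graph scripts commonly consume.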

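
The agent string on slide 10 is a comma-separated list of `key=value` pairs appended to the jar path after `=`, with blacklisted packages joined by `:`. Below is a minimal sketch of assembling it programmatically so that values such as `prefix` can change per run. The helper name and the jar path are hypothetical (the slide elides the jar name), and passing the result through Spark's `spark.executor.extraJavaOptions` setting is an assumption; the slides do not say how the option was delivered to the executors.

```python
def build_agent_option(jar_path, options, package_blacklist=()):
    """Assemble a -javaagent JVM argument from key=value agent options."""
    pairs = [f"{key}={value}" for key, value in options.items()]
    if package_blacklist:
        # Slide 10 separates blacklisted packages with ':'.
        pairs.append("packageBlacklist=" + ":".join(package_blacklist))
    return f"-javaagent:{jar_path}=" + ",".join(pairs)

agent = build_agent_option(
    "/path/to/profiler-agent.jar",  # hypothetical path, not from the slides
    {
        "server": "influxdb.master.msp.com",
        "port": 8086,
        "reporter": "InfluxDBReporter",
        "database": "profiler",
        "prefix": "20190413",
        "tagMapping": "barge",
        "httpServerEnabled": "false",
    },
    package_blacklist=["io.netty.util.concurrent", "io.netty.channel.nio"],
)

# Assumed delivery mechanism: a spark-submit --conf flag for executor JVMs.
print(f'--conf "spark.executor.extraJavaOptions={agent}"')
```

Credentials are left out of the sketch on purpose; in practice they would be added to the options mapping in the same way as the other keys.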