gluent.com 1
Low-Level CPU Performance Profiling Examples
Tanel Poder
A long-time computer performance geek
@tanelpoder
blog.tanelpoder.com
gluent.com 2
Intro: About me
• Tanel Põder
• RDBMS Performance geek 20+ years (Oracle)
• Unix/Linux Performance geek
• Hadoop Performance geek
• Spark Performance geek?
• http://blog.tanelpoder.com
• @tanelpoder
Expert Oracle Exadata
book
gluent.com 3
[Diagram: Gluent sits between the data sources (Oracle, Teradata, NoSQL, Big Data Sources, MSSQL) and the applications (App X, App Y, App Z)]
A data sharing platform for enterprise applications
Gluent as a data virtualization layer
gluent.com 4
Some microscopic-level stuff to talk about…
1. Some things worth knowing about modern CPUs
2. Measuring internal CPU efficiency (C++)
3. A columnar database scanning example (Oracle)
4. Low-level analysis of Spark performance
• RDD vs DataFrame
• DataFrame with bad code
This is gonna be a
(hopefully fun)
hacking session!
gluent.com 5
“100%” busy?
A CPU close to
100% busy?
What if I told you your CPU is not that busy?
gluent.com 6
CPU Performance Counters on Linux
# perf stat -d -p PID sleep 30

 Performance counter stats for process id '34783':

      27373.819908 task-clock               #    0.912 CPUs utilized
    86,428,653,040 cycles                   #    3.157 GHz
    32,115,412,877 instructions             #    0.37  insns per cycle
                                            #    2.39  stalled cycles per insn
     7,386,220,210 branches                 #  269.828 M/sec
        22,056,397 branch-misses            #    0.30% of all branches
    76,697,049,420 stalled-cycles-frontend  #   88.74% frontend cycles idle
    58,627,393,395 stalled-cycles-backend   #   67.83% backend cycles idle
       256,440,384 cache-references         #    9.368 M/sec
       222,036,981 cache-misses             #   86.584 % of all cache refs
       234,361,189 LLC-loads                #    8.562 M/sec
       218,570,294 LLC-load-misses          #   93.26% of all LL-cache hits
        18,493,582 LLC-stores               #    0.676 M/sec
         3,233,231 LLC-store-misses         #    0.118 M/sec
     7,324,946,042 L1-dcache-loads          #  267.589 M/sec
       305,276,341 L1-dcache-load-misses    #    4.17% of all L1-dcache hits
        36,890,302 L1-dcache-prefetches     #    1.348 M/sec

      30.000601214 seconds time elapsed
Measure what’s
going on inside a
CPU!
Metrics explained in
my blog entry:
http://bit.ly/1PBIlde
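The derived figures in the right-hand column of the perf stat output can be recomputed from the raw counters. The following is a minimal Python sketch using the counter values shown on this slide; the variable names are chosen here for illustration and are not perf identifiers.

```python
# Raw counters copied from the perf stat output on this slide.
task_clock_ms = 27373.819908
cycles = 86_428_653_040
instructions = 32_115_412_877
stalled_frontend = 76_697_049_420
stalled_backend = 58_627_393_395
cache_references = 256_440_384
cache_misses = 222_036_981

# Instructions per cycle: how much work was retired per clock tick.
ipc = instructions / cycles
# Average clock rate while on CPU: cycles per nanosecond equals GHz.
ghz = cycles / (task_clock_ms * 1e6)
# Frontend-stalled cycles per instruction (matches the 2.39 in the output).
stalled_per_insn = stalled_frontend / instructions
# Share of all cycles in which the frontend / backend was stalled.
frontend_idle_pct = 100 * stalled_frontend / cycles
backend_idle_pct = 100 * stalled_backend / cycles
# Share of cache references that missed.
cache_miss_pct = 100 * cache_misses / cache_references

print(f"IPC                  {ipc:.2f}")
print(f"Clock                {ghz:.3f} GHz")
print(f"Stalled cycles/insn  {stalled_per_insn:.2f}")
print(f"Frontend idle        {frontend_idle_pct:.2f}%")
print(f"Backend idle         {backend_idle_pct:.2f}%")
print(f"Cache miss ratio     {cache_miss_pct:.3f}%")
```

An IPC of 0.37 on a CPU that can retire several instructions per cycle is the point of the slide: the process shows up as busy in OS tools, yet most cycles are spent stalled.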
gluent.com 7
Modern CPUs can run multiple operations concurrently
http://software.intel.com
Multiple
ports/execution
units for
computation &
memory ops
If waiting for RAM
– CPU pipeline
stall!
gluent.com 8
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
Source:
https://gist.github.com/jboner/2841832
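To relate these latencies to the CPU counters, it helps to convert nanoseconds into clock cycles. A rough Python sketch, assuming the 3.157 GHz clock rate from the perf stat example earlier in this deck; the helper name cycles_waiting is made up for this illustration, and real stall times depend on prefetching and out-of-order execution.

```python
# Assumption: clock rate taken from the perf stat example (3.157 GHz),
# i.e. 3.157 cycles elapse per nanosecond.
CLOCK_GHZ = 3.157

# Latencies copied from the table above, in nanoseconds.
latencies_ns = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
}

def cycles_waiting(latency_ns, clock_ghz=CLOCK_GHZ):
    """Clock cycles that elapse while one access of this latency completes."""
    return latency_ns * clock_ghz

for name, ns in latencies_ns.items():
    print(f"{name:<24}{ns:>6} ns  ~{cycles_waiting(ns):4.0f} cycles")
```

Under these assumptions a single uncached main memory reference costs on the order of 300 cycles, during which a stalled pipeline retires no instructions for that dependency chain.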
gluent.com 9
CPU = fast
CPU L2 / L3
cache in between
RAM = slow
gluent.com 10
Tape is dead, disk is tape, flash is disk, RAM locality is king
Jim Gray, 2006
http://research.microsoft.com/en-us/um/people/gray/talks/flash_is_good.ppt
gluent.com 11
Just caching all your data in RAM does not
give you a modern “in-memory” system!
* Columnar data structures to the rescue!
gluent.com 12
Row-Major Data Structures
SELECT
SUM(column)
FROM array
gluent.com 13
Variable field offsets Memory line
(cache line)
size = 64 Bytes
gluent.com 14
Columnar Data Structure (conceptual)
Store values of
a column next
to each other
(data locality)
Much less data
to scan (or filter)
if accessing a
subset of
columns
Better
compression due
to adjacent
repeating (or
slightly differing)
values
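The data-locality argument can be made concrete by counting how many 64-byte cache lines a single-column scan has to touch in each layout. This is a simplified Python model, not a measurement: it assumes fixed-width 8-byte fields and 40 columns per row (both numbers are assumptions for illustration, and real row formats use variable field offsets as noted on the previous slide).

```python
import math

CACHE_LINE = 64  # bytes, as stated on the row-major slide

def lines_touched_row_major(n_rows, row_bytes, col_offset, col_bytes):
    """Distinct cache lines read when scanning one column of a row-major array."""
    touched = set()
    for r in range(n_rows):
        start = r * row_bytes + col_offset
        end = start + col_bytes - 1
        touched.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(touched)

def lines_touched_columnar(n_rows, col_bytes):
    """Cache lines read when the column's values are stored contiguously."""
    return math.ceil(n_rows * col_bytes / CACHE_LINE)

# Hypothetical table: 1000 rows, 40 columns of 8 bytes each.
n_rows, n_cols, col_bytes = 1000, 40, 8
row_major = lines_touched_row_major(n_rows, n_cols * col_bytes, 0, col_bytes)
columnar = lines_touched_columnar(n_rows, col_bytes)

print("row-major cache lines:", row_major)
print("columnar cache lines: ", columnar)
```

In this model the row-major scan pulls in one full cache line per row to use 8 bytes of it, while the columnar scan uses every byte of every line it loads.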
gluent.com 15
Single-Instruction-Multiple-Data (SIMD) processing
• Run an operation (like ADD) on multiple registers/memory locations in a single instruction:
Do the same work with fewer (but more complex) instructions
More concurrency
inside CPU
If the underlying data
structures “feed”
data fast enough …
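The instruction-count effect of SIMD can be modelled in plain Python. This sketch does not use real SIMD instructions; it only simulates a vector unit with a configurable number of lanes (8 is an assumption here) and counts how many ADD operations each approach issues.

```python
def scalar_sum(values):
    """Sum one value per ADD; returns (total, number of ADDs issued)."""
    total, adds = 0, 0
    for v in values:
        total += v
        adds += 1
    return total, adds

def simd_style_sum(values, lanes=8):
    """Sum `lanes` values per simulated vector ADD; returns (total, vector ADDs)."""
    acc = [0] * lanes            # one accumulator per lane
    vector_adds = 0
    for i in range(0, len(values), lanes):
        chunk = values[i:i + lanes]
        chunk = chunk + [0] * (lanes - len(chunk))   # pad the final partial chunk
        acc = [a + c for a, c in zip(acc, chunk)]    # models one vector ADD
        vector_adds += 1
    return sum(acc), vector_adds                     # final horizontal reduction

values = list(range(1000))
print(scalar_sum(values))       # total and scalar ADD count
print(simd_style_sum(values))   # same total, far fewer (wider) ADDs
```

The model only pays off when each chunk is a contiguous run of values, which is what a columnar layout provides.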
gluent.com 16
A database example (Oracle)
gluent.com 17
A simple Data Retrieval test!
• Retrieve about 1% of the rows from an 8 GB table:
SELECT
COUNT(*)
, SUM(order_total)
FROM
orders
WHERE
warehouse_id BETWEEN 500 AND 510
The Warehouse
IDs range between
1 and 999
Test data
generated by
SwingBench tool
gluent.com 18
Data Retrieval: Test Results
• Remember, this is a very simple scanning + filtering query:
TESTNAME                   PLAN_HASH   ELA_MS   CPU_MS      LIOS  BLK_READ
------------------------- ---------- -------- -------- --------- ---------
test1: index range scan *   16715356   265203    37438    782858    511231
test2: full buffered */ C  630573765   132075    48944   1013913    849316
test3: full direct path *  630573765    15567    11808   1013873   1013850
test4: full smart scan */  630573765     2102      729   1013873   1013850
test5: full inmemory scan  630573765      155      155        14         0
test6: full buffer cache   630573765     7850     7831   1014741         0
Test 5 & Test 6
run entirely
from memory
Source:
http://www.slideshare.net/tanelp/oracle-database-inmemory-option-in-action
But why 50x
difference in
CPU usage?
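The "50x" figure comes straight from the CPU_MS column of the results table. A short Python sketch of that arithmetic, using only numbers from the table (the dictionary keys are shortened test labels chosen here):

```python
# (ELA_MS, CPU_MS) per test, copied from the results table.
results = {
    "test1": (265203, 37438),
    "test2": (132075, 48944),
    "test3": (15567, 11808),
    "test4": (2102, 729),
    "test5": (155, 155),
    "test6": (7850, 7831),
}

# Both test5 (in-memory columnar) and test6 (buffer cache) read zero blocks
# from disk, so their difference is CPU work only.
cpu_ratio = results["test6"][1] / results["test5"][1]
print(f"buffer cache vs in-memory CPU time: {cpu_ratio:.1f}x")

# Elapsed-time speedup of every test relative to the slowest one (test1).
baseline_ela = results["test1"][0]
for name, (ela_ms, _cpu_ms) in results.items():
    print(f"{name}: {baseline_ela / ela_ms:8.1f}x faster than test1")
```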
gluent.com 19
CPU & cache friendly data structures are key!
[Diagram: row-oriented data block layout]
Block: headers and ITL entries, a row directory (#0 offset, #1 offset, #2 offset, #3 offset, …), then rows #0 … #8, each with its own row header
Row: header byte, lock byte, CC byte, followed by repeated (column length, column data) pairs
• OLTP: Block->Row->Column format
• 8kB blocks
• Great for writes, changes
• Field-length encoding
• Reading column #100 requires walking
through all preceding columns
• Columns (with similar values) not densely
packed together
• Not CPU cache friendly for analytics!
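Why reading a late column means walking all preceding ones follows from field-length encoding: a column's offset is only known after the lengths of every earlier column have been read. A minimal Python sketch, assuming a simplified row format with a 1-byte length prefix per column (this is an illustration, not Oracle's actual on-disk format; encode_row and read_column are hypothetical helpers):

```python
def encode_row(columns):
    """Encode a row as repeated (1-byte length, data) pairs."""
    out = bytearray()
    for col in columns:
        if len(col) > 255:
            raise ValueError("this simplified format allows at most 255 bytes per column")
        out.append(len(col))
        out += col
    return bytes(out)

def read_column(row, index):
    """Return (column bytes, number of preceding columns that had to be skipped)."""
    offset = 0
    skipped = 0
    for _ in range(index):
        offset += 1 + row[offset]   # jump over this column's length byte and data
        skipped += 1
    length = row[offset]
    return row[offset + 1:offset + 1 + length], skipped

row = encode_row([b"42", b"Alice", b"", b"2017-01-19"])
print(read_column(row, 0))   # first column: nothing to skip
print(read_column(row, 3))   # fourth column: three length bytes read first
```

Each skip is a data-dependent load, so the CPU cannot know where column N lives until it has processed columns 0 to N-1 of that row.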
gluent.com 20
Scanning columnar data structures
Scanning a column in a row-oriented data block
Scanning a column in a column-oriented compression unit
[Diagram: in the row-oriented block the values of col 1 … col 6 are interleaved row by row; in the column-oriented compression unit all values of each column are stored next to each other]
Read filter column(s) first. Access only projected columns if matches found.
Reduced memory traffic. More sequential RAM access, SIMD on adjacent data.
gluent.com 21
Testing data access path differences on Oracle 12c
SELECT COUNT(cust_valid) FROM customers_nopart c WHERE cust_id > 0
Run the same query on the same dataset stored in different formats/layouts.
Full details:
http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles/
Test result data:
http://bit.ly/1RitNMr
gluent.com 22
CPU instructions used for scanning/counting 69M rows
gluent.com 23
Average CPU instructions per row processed
• Knowing that the table has about 69M rows, I can calculate
the average number of instructions issued per row processed
gluent.com 24
CPU cycles consumed (full scans only)
gluent.com 25
CPU efficiency (Instructions-per-Cycle)
Yes, modern superscalar
CPUs can execute multiple
instructions per cycle
gluent.com 26
Reducing memory writes within SQL execution
• Old approach:
1. Read compressed data chunk
2. Decompress data (write data to temporary memory location)
3. Filter out non-matching rows
4. Return data
• New approach:
1. Read and filter compressed columns
2. Decompress only required columns of matching rows
3. Return data
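The difference between the two approaches can be sketched with a dictionary-encoded column. Dictionary encoding is used here only as an example of a compression scheme that can be filtered without decompressing; the slide does not say which scheme is used, and all function names are hypothetical. The second return value counts how many decompressed values each approach writes to memory.

```python
def dictionary_encode(values):
    """Compress a column into (sorted dictionary, list of integer codes)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def old_approach(dictionary, codes, lo, hi):
    """Decompress everything to a temporary buffer, then filter."""
    decompressed = [dictionary[c] for c in codes]        # one write per row
    matches = [v for v in decompressed if lo <= v <= hi]
    return matches, len(decompressed)

def new_approach(dictionary, codes, lo, hi):
    """Filter on the compressed codes, decompress only the matching rows."""
    wanted = {i for i, v in enumerate(dictionary) if lo <= v <= hi}
    matches = [dictionary[c] for c in codes if c in wanted]  # writes only matches
    return matches, len(matches)

warehouse_ids = [1, 505, 999, 500, 510, 2, 505]
dictionary, codes = dictionary_encode(warehouse_ids)
print(old_approach(dictionary, codes, 500, 510))
print(new_approach(dictionary, codes, 500, 510))
```

Both return the same rows; the new approach evaluates the predicate once per distinct value and writes only the matching rows.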
gluent.com 27
Memory reads & writes during internal processing
Unit = MB
Read only
requested columns
Rows counted from
chunk headers
Scan compressed data:
few memory writes
gluent.com 28
Spark Examples
• Will use:
• Spark built in tools
• Perf
• Honest Profiler
• FlameGraphs
gluent.com 29
Apache Spark
Tungsten
Data Structures
Databricks presentation:
http://www.slideshare.net/SparkSummit/deep-dive-into-project-tungsten-josh-rosen
Much denser
data structure
Using
sun.misc.Unsafe
API to bypass JVM
object allocator
gluent.com 30
Apache Spark
Tungsten
Data Structures
Much denser
data structure
“Good memory
locality”
gluent.com 31
Spark test setup (RDD)
[Diagram labels: CSV, RDD (partitioned), RDD (single partition), “For each” sum column X, .cache().repartition(1)]
import scala.util.Try

// yearIndex and log() are used below but are not defined on this slide
val lines = sc.textFile("/tmp/simple_data.csv").repartition(1)
val stringFields = lines.map(line => line.split(","))
val fullFieldLength = stringFields.first.length
// Keep only records with the same number of fields as the first record
val completeFields = stringFields.filter(fields => fields.length == fullFieldLength)
// Replace the "year" field with its Int value (0 if it does not parse)
val data = completeFields.map(fields =>
  fields.patch(yearIndex, Array(Try(fields(yearIndex).toInt).getOrElse(0)), 1))
log("cache entire RDD in memory")
data.cache()
log("run map(length).max to populate cache")
println(data.map(r => r.length).reduce((l1, l2) => Math.max(l1, l2)))
I wanted to simplify this test as much as possible
gluent.com 32
“SELECT” sum (Year) from RDD
// SUM all values of “year” column
println(data.map(d => d(yearIndex).asInstanceOf[Int]).reduce((y1, y2) => y1 + y2))
Cached RDD ~1M records, ~40 columns
1-column sum: 0.349 seconds!
17/01/19 18:43:36 INFO DAGScheduler: ResultStage 123 (reduce at demo.scala:89) finished in 0.349 s
17/01/19 18:43:36 INFO DAGScheduler: Job 61 finished: reduce at demo.scala:89, took 0.353754 s
gluent.com 33
Spark test setup (DataFrame)
[Diagram labels: CSV, RDD (partitioned), RDD (single partition), DataFrame, “For each” sum column X, .cache().repartition(1)]
import scala.util.Try

// yearIndex, schema, ss and log() are used below but are not defined on this slide
val lines = sc.textFile("/tmp/simple_data.csv").repartition(1)
val stringFields = lines.map(line => line.split(","))
val fullFieldLength = stringFields.first.length
// Keep only records with the same number of fields as the first record
val completeFields = stringFields.filter(fields => fields.length == fullFieldLength)
// Replace the "year" field with its Int value (0 if it does not parse)
val data = completeFields.map(fields =>
  fields.patch(yearIndex, Array(Try(fields(yearIndex).toInt).getOrElse(0)), 1))
...
val dataFrame = ss.createDataFrame(data.map(d => Row(d: _*)), schema)
log("cache entire data-frame in memory")
dataFrame.cache()
log("run map(length).max to populate cache")
println(dataFrame.map(r => r.length).reduce((l1, l2) => Math.max(l1, l2)))
gluent.com 34
“SELECT” sum (Year) from DataFrame (silly example!)
// SUM all values of “year” column
println(dataFrame.map(r => r(yearIndex).asInstanceOf[Int]).reduce((y1, y2) => y1 + y2))
17/01/19 19:39:25 INFO DAGScheduler: ResultStage 29 (reduce at demo.scala:71) finished in 4.664 s
17/01/19 19:39:25 INFO DAGScheduler: Job 14 finished: reduce at demo.scala:71, took 4.673204 s
Cached DataFrame: ~1M records, ~40 columns
1-column SUM: 4.67 seconds! (13x more than RDD?)
This does not
make sense!
gluent.com 35
“SELECT” sum (Year) from DataFrame (proper)
// SUM all values of “year” column
println(dataFrame.agg(sum("Year")).first.get(0))
17/01/19 19:32:02 INFO DAGScheduler: ResultStage 118 (first at demo.scala:70) finished in 0.004 s
17/01/19 19:32:02 INFO DAGScheduler: Job 40 finished: first at demo.scala:70, took 0.041698 s
Cached DataFrame ~1M records, ~40 columns
1-column sum with aggregation pushdown: 0.041 seconds!
(Over 100x faster than previous Silly DataFrame and 8.5x
faster than 1st RDD example)
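The speedup figures quoted on the last three slides can be checked against the job durations in the DAGScheduler log lines. A small Python sketch using only those logged "took" values:

```python
# Job durations in seconds, copied from the "Job ... finished ... took" log lines.
rdd_sum_s = 0.353754            # RDD: map + reduce
silly_dataframe_sum_s = 4.673204  # DataFrame: map + reduce (silly example)
proper_dataframe_sum_s = 0.041698  # DataFrame: agg(sum("Year"))

silly_vs_rdd = silly_dataframe_sum_s / rdd_sum_s
proper_vs_silly = silly_dataframe_sum_s / proper_dataframe_sum_s
proper_vs_rdd = rdd_sum_s / proper_dataframe_sum_s

print(f"silly DataFrame vs RDD:        {silly_vs_rdd:.1f}x slower")
print(f"proper vs silly DataFrame:     {proper_vs_silly:.0f}x faster")
print(f"proper DataFrame vs RDD:       {proper_vs_rdd:.1f}x faster")
```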
gluent.com 36
Summary
• New data structures are required for CPU efficiency!
• Columnar …
• On efficient data structures, efficient code becomes possible
• Bad code still performs badly …
• It is possible to measure the CPU efficiency of your code
• That should come after the usual profiling and DAG / execution plan
validation
• All secondary metrics (like efficiency ratios) should be used in
context of how much work got done
gluent.com 37
Past & Future
gluent.com 38
Future-proof Open Data Formats!
• Disk-optimized columnar data structures
• Apache Parquet
• https://parquet.apache.org/
• Apache ORC
• https://orc.apache.org/
• Memory / CPU-cache optimized data structures
• Apache Arrow
• Not only storage format
• … also a cross-system/cross-platform IPC communication framework
• https://arrow.apache.org/
gluent.com 39
Future
1. RAM gets cheaper + bigger, not necessarily faster
2. CPU caches get larger
3. RAM blends with storage and becomes non-volatile
4. IO subsystems (flash) get even closer to CPUs
5. IO latencies shrink
6. The latency difference between non-volatile storage and volatile
RAM shrinks - new database layouts!
7. CPU cache is king – new data structures needed!
gluent.com 40
The tools used here:
• Honest Profiler by Richard Warburton (@RichardWarburto)
• https://github.com/RichardWarburton/honest-profiler
• Flame Graphs by Brendan Gregg (@brendangregg)
• http://www.brendangregg.com/flamegraphs.html
• Linux perf tool
• https://perf.wiki.kernel.org/index.php/Main_Page
• Spark-Prof demos:
• https://github.com/gluent/spark-prof
gluent.com 41
References
• Slides & Video of a similar presentation (about Oracle):
• http://www.slideshare.net/tanelp
• https://vimeo.com/gluent
• RAM is the new disk series:
• http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/
• https://docs.google.com/spreadsheets/d/1ss0rBG8mePAVYP4hlpvjqAAlHnZqmuVmSFbHMLDsjaU/
gluent.com 42
Thanks!
http://gluent.com/
We are hiring developers &
data engineers!!!
http://blog.tanelpoder.com
@tanelpoder

Weitere ähnliche Inhalte

Was ist angesagt?

Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsAnil Nair
 
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1SolarWinds
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsCarlos Sierra
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Oracle sql high performance tuning
Oracle sql high performance tuningOracle sql high performance tuning
Oracle sql high performance tuningGuy Harrison
 
Migration to Oracle Multitenant
Migration to Oracle MultitenantMigration to Oracle Multitenant
Migration to Oracle MultitenantJitendra Singh
 
Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...
Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...
Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...Carlos Sierra
 
SQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12cSQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12cTanel Poder
 
[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...
[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...
[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...オラクルエンジニア通信
 
Adapting and adopting spm v04
Adapting and adopting spm v04Adapting and adopting spm v04
Adapting and adopting spm v04Carlos Sierra
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptChien Chung Shen
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slidesMohamed Farouk
 
Troubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTanel Poder
 
UKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tipsUKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tipsConnor McDonald
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 

Was ist angesagt? (20)

Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret Internals
 
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
 
AWR and ASH Deep Dive
AWR and ASH Deep DiveAWR and ASH Deep Dive
AWR and ASH Deep Dive
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning Fundamentals
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
AWR reports-Measuring CPU
AWR reports-Measuring CPUAWR reports-Measuring CPU
AWR reports-Measuring CPU
 
Oracle sql high performance tuning
Oracle sql high performance tuningOracle sql high performance tuning
Oracle sql high performance tuning
 
Migration to Oracle Multitenant
Migration to Oracle MultitenantMigration to Oracle Multitenant
Migration to Oracle Multitenant
 
Ash and awr deep dive hotsos
Ash and awr deep dive hotsosAsh and awr deep dive hotsos
Ash and awr deep dive hotsos
 
Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...
Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...
Survey of some free Tools to enhance your SQL Tuning and Performance Diagnost...
 
SQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12cSQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12c
 
[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...
[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...
[Oracle DBA & Developer Day 2016] しばちょう先生の特別講義!!ストレージ管理のベストプラクティス ~ASMからExada...
 
Adapting and adopting spm v04
Adapting and adopting spm v04Adapting and adopting spm v04
Adapting and adopting spm v04
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning Concept
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slides
 
Troubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contention
 
UKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tipsUKOUG - 25 years of hints and tips
UKOUG - 25 years of hints and tips
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 

Ähnlich wie Low Level CPU Performance Profiling Examples

Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Lucidworks
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
GNW01: In-Memory Processing for Databases
GNW01: In-Memory Processing for DatabasesGNW01: In-Memory Processing for Databases
GNW01: In-Memory Processing for DatabasesTanel Poder
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 
Mongo db v3_deep_dive
Mongo db v3_deep_diveMongo db v3_deep_dive
Mongo db v3_deep_diveBryan Reinero
 
Architecture at Scale
Architecture at ScaleArchitecture at Scale
Architecture at ScaleElasticsearch
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramChris Adkin
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allenjaxconf
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBJason Terpko
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesHazelcast
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneEnkitec
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionTanel Poder
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBAntonios Giannopoulos
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)RichardWarburton
 

Ähnlich wie Low Level CPU Performance Profiling Examples (20)

Tuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for LogsTuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for Logs
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
GNW01: In-Memory Processing for Databases
GNW01: In-Memory Processing for DatabasesGNW01: In-Memory Processing for Databases
GNW01: In-Memory Processing for Databases
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
Mongo db v3_deep_dive
Mongo db v3_deep_diveMongo db v3_deep_dive
Mongo db v3_deep_dive
 
Architecture at Scale
Architecture at ScaleArchitecture at Scale
Architecture at Scale
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ram
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDB
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
 

Mehr von Tanel Poder

Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTroubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTanel Poder
 
Modern Linux Performance Tools for Application Troubleshooting
Modern Linux Performance Tools for Application TroubleshootingModern Linux Performance Tools for Application Troubleshooting
Modern Linux Performance Tools for Application TroubleshootingTanel Poder
 
SQL in the Hybrid World
SQL in the Hybrid WorldSQL in the Hybrid World
SQL in the Hybrid WorldTanel Poder
 
Tanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools shortTanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools shortTanel Poder
 
Connecting Hadoop and Oracle
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and OracleTanel Poder
 
Oracle Exadata Performance: Latest Improvements and Less Known Features
Oracle Exadata Performance: Latest Improvements and Less Known FeaturesOracle Exadata Performance: Latest Improvements and Less Known Features
Oracle Exadata Performance: Latest Improvements and Less Known FeaturesTanel Poder
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1Tanel Poder
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2

Low Level CPU Performance Profiling Examples

  • 1. Low-Level CPU Performance Profiling Examples. Tanel Poder, a long-time computer performance geek. @tanelpoder, blog.tanelpoder.com
  • 2. Intro: About me • Tanel Põder • RDBMS performance geek for 20+ years (Oracle) • Unix/Linux performance geek • Hadoop performance geek • Spark performance geek? • http://blog.tanelpoder.com • @tanelpoder • Expert Oracle Exadata book
  • 3. Gluent: a data sharing platform for enterprise applications. Gluent as a data virtualization layer. Diagram labels: Oracle, Teradata, NoSQL, Big Data Sources, MSSQL; App X, App Y, App Z.
  • 4. Some microscopic-level stuff to talk about… 1. Some things worth knowing about modern CPUs 2. Measuring internal CPU efficiency (C++) 3. A columnar database scanning example (Oracle) 4. Low-level analysis of Spark performance • RDD vs DataFrame • DataFrame with bad code. This is gonna be a (hopefully fun) hacking session!
  • 5. “100%” busy? A CPU close to 100% busy? What if I told you your CPU is not that busy?
  • 6. CPU Performance Counters on Linux
      # perf stat -d -p PID sleep 30
      Performance counter stats for process id '34783':
          27373.819908  task-clock               #    0.912 CPUs utilized
        86,428,653,040  cycles                   #    3.157 GHz
        32,115,412,877  instructions             #    0.37  insns per cycle
                                                 #    2.39  stalled cycles per insn
         7,386,220,210  branches                 #  269.828 M/sec
            22,056,397  branch-misses            #    0.30% of all branches
        76,697,049,420  stalled-cycles-frontend  #   88.74% frontend cycles idle
        58,627,393,395  stalled-cycles-backend   #   67.83% backend cycles idle
           256,440,384  cache-references         #    9.368 M/sec
           222,036,981  cache-misses             #   86.584 % of all cache refs
           234,361,189  LLC-loads                #    8.562 M/sec
           218,570,294  LLC-load-misses          #   93.26% of all LL-cache hits
            18,493,582  LLC-stores               #    0.676 M/sec
             3,233,231  LLC-store-misses         #    0.118 M/sec
         7,324,946,042  L1-dcache-loads          #  267.589 M/sec
           305,276,341  L1-dcache-load-misses    #    4.17% of all L1-dcache hits
            36,890,302  L1-dcache-prefetches     #    1.348 M/sec
          30.000601214 seconds time elapsed
      Measure what’s going on inside a CPU! Metrics explained in my blog entry: http://bit.ly/1PBIlde
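The derived ratios in the right-hand column of the perf output can be recomputed from the raw counters. The following is a minimal sketch using the counter values from the slide; the variable names are illustrative, and the "stalled cycles per insn" formula is an assumption (it matches the output if perf divides the larger of the two stall counters, here the frontend one, by the instruction count).

```python
# Raw counter values copied from the perf stat output on the slide.
task_clock_ms = 27373.819908
cycles = 86_428_653_040
instructions = 32_115_412_877
stalled_frontend = 76_697_049_420
stalled_backend = 58_627_393_395
cache_references = 256_440_384
cache_misses = 222_036_981

# Instructions per cycle: how much work the CPU retires per clock tick.
ipc = instructions / cycles

# Assumption: perf reports the larger stall counter divided by instructions.
stalled_per_insn = max(stalled_frontend, stalled_backend) / instructions

# Share of cycles in which the frontend could not issue work.
frontend_idle_pct = 100 * stalled_frontend / cycles

# Share of cache references that missed.
cache_miss_pct = 100 * cache_misses / cache_references

# Effective clock rate: cycles per nanosecond of on-CPU time.
ghz = cycles / (task_clock_ms * 1e6)

print(f"IPC                     {ipc:.2f}")
print(f"stalled cycles per insn {stalled_per_insn:.2f}")
print(f"frontend cycles idle    {frontend_idle_pct:.2f}%")
print(f"cache miss ratio        {cache_miss_pct:.3f}%")
print(f"clock rate              {ghz:.3f} GHz")
```

An IPC well below 1 on a superscalar CPU, combined with a high LLC miss ratio, is the pattern the talk calls "not that busy": the process is on the CPU but mostly waiting for memory.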
  • 7. Modern CPUs can run multiple operations concurrently (http://software.intel.com). Multiple ports/execution units for computation & memory ops. If waiting for RAM: CPU pipeline stall!
  • 8. Latency Numbers Every Programmer Should Know
      Latency Comparison Numbers
      --------------------------
      L1 cache reference                           0.5 ns
      Branch mispredict                            5   ns
      L2 cache reference                           7   ns                      14x L1 cache
      Mutex lock/unlock                           25   ns
      Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
      Compress 1K bytes with Zippy             3,000   ns        3 us
      Send 1K bytes over 1 Gbps network       10,000   ns       10 us
      Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
      Read 1 MB sequentially from memory     250,000   ns      250 us
      Round trip within same datacenter      500,000   ns      500 us
      Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
      Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
      Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
      Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
      Source: https://gist.github.com/jboner/2841832
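To relate these latencies to the cycle counts perf reports, it can help to convert nanoseconds into clock cycles. This is a small sketch that assumes the 3.157 GHz clock rate from the perf example above; the dictionary and variable names are illustrative.

```python
# Assumed clock rate, taken from the perf stat example (GHz = cycles per ns).
CLOCK_GHZ = 3.157

# A few entries from the latency table, in nanoseconds.
latency_ns = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
}

# Approximate number of clock cycles that pass during one access of each kind.
latency_cycles = {name: ns * CLOCK_GHZ for name, ns in latency_ns.items()}

# How many L1 references fit into the time of one main memory reference.
ram_vs_l1 = latency_ns["Main memory reference"] / latency_ns["L1 cache reference"]

for name, cyc in latency_cycles.items():
    print(f"{name:<24} ~{cyc:.0f} cycles")
print(f"main memory is {ram_vs_l1:.0f}x slower than L1")
```

At this clock rate a single main memory reference costs roughly 300 cycles, during which a stalled pipeline retires nothing.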
  • 9. CPU = fast. CPU L2 / L3 cache in between. RAM = slow.
  • 10. “Tape is dead, disk is tape, flash is disk, RAM locality is king” (Jim Gray, 2006) http://research.microsoft.com/en-us/um/people/gray/talks/flash_is_good.ppt
  • 11. Just caching all your data in RAM does not give you a modern “in-memory” system! * Columnar data structures to the rescue!
  • 12. Row-Major Data Structures: SELECT SUM(column) FROM array
  • 13. Variable field offsets. Memory line (cache line) size = 64 bytes
  • 14. Columnar Data Structure (conceptual) • Store values of a column next to each other (data locality) • Much less data to scan (or filter) if accessing a subset of columns • Better compression due to adjacent repeating (or slightly differing) values
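A rough way to see the data-locality argument is to count the 64-byte cache lines that a SUM over one column has to touch. The sketch below assumes a hypothetical fixed-width row of 40 eight-byte fields (real row formats have variable field offsets, as slide 13 notes) and compares it with the same column stored contiguously. All names and sizes are illustrative and not taken from the slides.

```python
CACHE_LINE = 64  # bytes, as stated on slide 13


def cache_lines_row_major(n_rows, row_bytes, col_offset, value_bytes=8):
    """Count distinct cache lines touched when reading one field from every row."""
    touched = set()
    for r in range(n_rows):
        start = r * row_bytes + col_offset
        end = start + value_bytes - 1
        touched.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(touched)


def cache_lines_columnar(n_rows, value_bytes=8):
    """Count cache lines touched when the column values are stored back to back."""
    total_bytes = n_rows * value_bytes
    return -(-total_bytes // CACHE_LINE)  # ceiling division


# Hypothetical table: 1000 rows, 40 columns of 8 bytes, summing the third column.
N_ROWS = 1000
ROW_BYTES = 40 * 8
COL_OFFSET = 2 * 8

row_lines = cache_lines_row_major(N_ROWS, ROW_BYTES, COL_OFFSET)
col_lines = cache_lines_columnar(N_ROWS)
print(f"row-major: {row_lines} cache lines, columnar: {col_lines} cache lines")
```

With these assumed sizes every row spans several cache lines, so the row-major scan pulls in one line per row while the columnar scan packs eight values into each line.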
  • 15. Single-Instruction-Multiple-Data (SIMD) processing • Run an operation (like ADD) on multiple registers/memory locations in a single instruction • Do the same work with fewer (but more complex) instructions • More concurrency inside the CPU • If the underlying data structures “feed” data fast enough…
  • 16. A database example (Oracle)
  • 17. A simple data retrieval test! • Retrieve 1% of the rows from an 8 GB table: SELECT COUNT(*), SUM(order_total) FROM orders WHERE warehouse_id BETWEEN 500 AND 510 • The warehouse IDs range between 1 and 999 • Test data generated by the SwingBench tool
  • 18. Data Retrieval: Test Results • Remember, this is a very simple scanning + filtering query:
      TESTNAME                   PLAN_HASH   ELA_MS  CPU_MS     LIOS  BLK_READ
      -------------------------  ---------  -------  ------  -------  --------
      test1: index range scan *   16715356   265203   37438   782858    511231
      test2: full buffered */ C  630573765   132075   48944  1013913    849316
      test3: full direct path *  630573765    15567   11808  1013873   1013850
      test4: full smart scan */  630573765     2102     729  1013873   1013850
      test5: full inmemory scan  630573765      155     155       14         0
      test6: full buffer cache   630573765     7850    7831  1014741         0
      Test 5 & Test 6 run entirely from memory. But why a 50x difference in CPU usage?
      Source: http://www.slideshare.net/tanelp/oracle-database-inmemory-option-in-action
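The "50x" figure follows directly from the CPU_MS column. A minimal sketch of the arithmetic, with the numbers copied from the table and illustrative variable names:

```python
# (test name, elapsed ms, CPU ms) copied from the test results table.
results = [
    ("test1: index range scan", 265203, 37438),
    ("test2: full buffered", 132075, 48944),
    ("test3: full direct path", 15567, 11808),
    ("test4: full smart scan", 2102, 729),
    ("test5: full inmemory scan", 155, 155),
    ("test6: full buffer cache", 7850, 7831),
]

cpu_ms = {name: cpu for name, _ela, cpu in results}

# Both tests read from memory only, so the gap is pure CPU work per row.
inmemory_vs_buffer_cache = (
    cpu_ms["test6: full buffer cache"] / cpu_ms["test5: full inmemory scan"]
)

for name, ela, cpu in results:
    # Time not spent on CPU is mostly waiting for I/O in the disk-based tests.
    print(f"{name:<26} cpu={cpu:>6} ms  non-cpu={ela - cpu:>6} ms")
print(f"buffer cache scan uses {inmemory_vs_buffer_cache:.1f}x the CPU of the in-memory scan")
```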
  • 19. CPU & cache friendly data structures are key! Block diagram labels: headers, ITL entries; row directory (row #0 offset, #1 offset, #2 offset, #3 offset, …); rows #0 to #8, each with a row header; within a row: header byte, lock byte, column count (CC) byte, then repeated column length + column data pairs. • OLTP: Block->Row->Column format • 8kB blocks • Great for writes, changes • Field-length encoding • Reading column #100 requires walking through all preceding columns • Columns (with similar values) not densely packed together • Not CPU cache friendly for analytics!
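The cost of field-length encoding can be illustrated with a toy row format. The sketch below is a simplified, hypothetical layout (one length byte followed by the column data, nothing else) and not Oracle's actual block format; it only shows why reaching column #100 means inspecting the length of every preceding column.

```python
def encode_row(values):
    """Encode a row as repeated [1-byte length][data] pairs (simplified layout)."""
    out = bytearray()
    for value in values:
        data = value.encode("utf-8")
        if len(data) > 255:
            raise ValueError("this toy format only supports fields up to 255 bytes")
        out.append(len(data))
        out += data
    return bytes(out)


def read_column(row, col_index):
    """Return (value, length_bytes_inspected) for the zero-based column index."""
    pos = 0
    inspected = 0
    # No fixed offsets: skip over every preceding column one by one.
    for _ in range(col_index):
        pos += 1 + row[pos]
        inspected += 1
    length = row[pos]
    inspected += 1
    value = row[pos + 1 : pos + 1 + length].decode("utf-8")
    return value, inspected


row = encode_row([f"v{i}" for i in range(100)])
last_value, steps = read_column(row, 99)
print(f"column #100 = {last_value!r}, length bytes inspected = {steps}")
```

Each skipped column is a dependent memory read, so a scan that only needs the last column still touches the whole row.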
  • 20. Scanning columnar data structures. Diagram: scanning a column in a row-oriented data block (values of col 1 to col 6 interleaved row by row) vs. scanning a column in a column-oriented compression unit (values of each column stored together). Read filter column(s) first. Access only projected columns if matches found. Reduced memory traffic. More sequential RAM access, SIMD on adjacent data.
  • 21. Testing data access path differences on Oracle 12c: SELECT COUNT(cust_valid) FROM customers_nopart c WHERE cust_id > 0. Run the same query on the same dataset stored in different formats/layouts. Full details: http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles/ Test result data: http://bit.ly/1RitNMr
  • 22. CPU instructions used for scanning/counting 69M rows
  • 23. Average CPU instructions per row processed • Knowing that the table has about 69M rows, I can calculate the average number of instructions issued per row processed
  • 24. CPU cycles consumed (full scans only)
  • 25. CPU efficiency (Instructions-per-Cycle). Yes, modern superscalar CPUs can execute multiple instructions per cycle
  • 26. Reducing memory writes within SQL execution
      Old approach: 1. Read compressed data chunk 2. Decompress data (write data to temporary memory location) 3. Filter out non-matching rows 4. Return data
      New approach: 1. Read and filter compressed columns 2. Decompress only required columns of matching rows 3. Return data
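The difference between the two approaches can be sketched with a dictionary-encoded column. Dictionary encoding is used here only as a stand-in for "compressed data" (the slides do not say which compression scheme is involved), and all function names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def old_approach(dictionary, codes, lo, hi):
    """Decompress everything into a temporary copy, then filter."""
    decoded = [dictionary[code] for code in codes]  # one memory write per row
    return [value for value in decoded if lo <= value <= hi]


def new_approach(dictionary, codes, lo, hi):
    """Filter on the compressed codes, decompress only the matching rows."""
    # Evaluate the predicate once per distinct value, not once per row.
    matching_codes = {
        code for code, value in enumerate(dictionary) if lo <= value <= hi
    }
    return [dictionary[code] for code in codes if code in matching_codes]


# Hypothetical data: 10,000 rows with warehouse IDs cycling through 1..999.
warehouse_ids = [i % 999 + 1 for i in range(10_000)]
dictionary, codes = dictionary_encode(warehouse_ids)

old_result = old_approach(dictionary, codes, 500, 510)
new_result = new_approach(dictionary, codes, 500, 510)
print(len(old_result), len(new_result))
```

Both functions return the same rows; the new approach simply avoids materializing the 99% of values that the filter discards.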
  • 27. Memory reads & writes during internal processing (unit = MB). Chart annotations: read only requested columns; rows counted from chunk headers; scan compressed data: few memory writes.
  • 28. Spark Examples • Will use: • Spark built-in tools • Perf • Honest Profiler • FlameGraphs
  • 29. Apache Spark Tungsten Data Structures. Databricks presentation: http://www.slideshare.net/SparkSummit/deep-dive-into-project-tungsten-josh-rosen Much denser data structure. Using sun.misc.Unsafe API to bypass JVM object allocator.
  • 30. Apache Spark Tungsten Data Structures. Much denser data structure. “Good memory locality”
  • 31. Spark test setup (RDD). Diagram labels: CSV, RDD (partitioned), RDD (single partition), “for each” sum column X; annotations: .cache(), .repartition(1)
      val lines = sc.textFile("/tmp/simple_data.csv").repartition(1)
      val stringFields = lines.map(line => line.split(","))
      val fullFieldLength = stringFields.first.length
      // keep only rows that have the full number of fields
      val completeFields = stringFields.filter(fields => fields.length == fullFieldLength)
      // replace the year field with its Int value (0 if it does not parse)
      val data = completeFields.map(fields => fields.patch(yearIndex, Array(Try(fields(yearIndex).toInt).getOrElse(0)), 1))
      log("cache entire RDD in memory")
      data.cache()
      log("run map(length).max to populate cache")
      println(data.map(r => r.length).reduce((l1, l2) => Math.max(l1, l2)))
      I wanted to simplify this test as much as possible
  • 32. “SELECT” sum (Year) from RDD
      // SUM all values of “year” column
      println(data.map(d => d(yearIndex).asInstanceOf[Int]).reduce((y1, y2) => y1 + y2))
      17/01/19 18:43:36 INFO DAGScheduler: ResultStage 123 (reduce at demo.scala:89) finished in 0.349 s
      17/01/19 18:43:36 INFO DAGScheduler: Job 61 finished: reduce at demo.scala:89, took 0.353754 s
      Cached RDD: ~1M records, ~40 columns. 1-column sum: 0.349 seconds!
  • 33. Spark test setup (DataFrame). Diagram labels: CSV, RDD partitioned, RDD single partition, DataFrame, “for each” sum column X; annotations: .cache(), .repartition(1)
      val lines = sc.textFile("/tmp/simple_data.csv").repartition(1)
      val stringFields = lines.map(line => line.split(","))
      val fullFieldLength = stringFields.first.length
      val completeFields = stringFields.filter(fields => fields.length == fullFieldLength)
      val data = completeFields.map(fields => fields.patch(yearIndex, Array(Try(fields(yearIndex).toInt).getOrElse(0)), 1))
      ...
      val dataFrame = ss.createDataFrame(data.map(d => Row(d: _*)), schema)
      log("cache entire data-frame in memory")
      dataFrame.cache()
      log("run map(length).max to populate cache")
      println(dataFrame.map(r => r.length).reduce((l1, l2) => Math.max(l1, l2)))
  • 34. “SELECT” sum (Year) from DataFrame (silly example!)
      // SUM all values of “year” column
      println(dataFrame.map(r => r(yearIndex).asInstanceOf[Int]).reduce((y1, y2) => y1 + y2))
      17/01/19 19:39:25 INFO DAGScheduler: ResultStage 29 (reduce at demo.scala:71) finished in 4.664 s
      17/01/19 19:39:25 INFO DAGScheduler: Job 14 finished: reduce at demo.scala:71, took 4.673204 s
      Cached DataFrame: ~1M records, ~40 columns. 1-column SUM: 4.67 seconds! (13x more than RDD?) This does not make sense!
  • 35. “SELECT” sum (Year) from DataFrame (proper)
      // SUM all values of “year” column
      println(dataFrame.agg(sum("Year")).first.get(0))
      17/01/19 19:32:02 INFO DAGScheduler: ResultStage 118 (first at demo.scala:70) finished in 0.004 s
      17/01/19 19:32:02 INFO DAGScheduler: Job 40 finished: first at demo.scala:70, took 0.041698 s
      Cached DataFrame: ~1M records, ~40 columns. 1-column sum with aggregation pushdown: 0.041 seconds! (Over 100x faster than previous Silly DataFrame and 8.5x faster than 1st RDD example)
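The speedup figures quoted on slides 34 and 35 can be checked against the job durations in the DAGScheduler log lines. A minimal sketch of that arithmetic, with the timings copied from the logs and illustrative variable names:

```python
# Job durations in seconds, copied from the "Job ... finished" log lines.
rdd_sum_s = 0.353754        # slide 32: RDD map + reduce
silly_df_sum_s = 4.673204   # slide 34: DataFrame map + reduce
proper_df_sum_s = 0.041698  # slide 35: DataFrame agg(sum("Year"))

silly_vs_rdd = silly_df_sum_s / rdd_sum_s
silly_vs_proper = silly_df_sum_s / proper_df_sum_s
rdd_vs_proper = rdd_sum_s / proper_df_sum_s

print(f"silly DataFrame vs RDD:     {silly_vs_rdd:.1f}x slower")
print(f"proper DataFrame vs silly:  {silly_vs_proper:.0f}x faster")
print(f"proper DataFrame vs RDD:    {rdd_vs_proper:.1f}x faster")
```

The ratios line up with the slide text: roughly 13x, over 100x, and about 8.5x.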
  • 36. Summary • New data structures are required for CPU efficiency! • Columnar … • On efficient data structures, efficient code becomes possible • Bad code still performs badly … • It is possible to measure the CPU efficiency of your code • That should come after the usual profiling and DAG / execution plan validation • All secondary metrics (like efficiency ratios) should be used in context of how much work got done
  • 38. Future-proof Open Data Formats! • Disk-optimized columnar data structures • Apache Parquet • https://parquet.apache.org/ • Apache ORC • https://orc.apache.org/ • Memory / CPU-cache optimized data structures • Apache Arrow • Not only a storage format • … also a cross-system/cross-platform IPC communication framework • https://arrow.apache.org/
  • 39. Future 1. RAM gets cheaper + bigger, not necessarily faster 2. CPU caches get larger 3. RAM blends with storage and becomes non-volatile 4. IO subsystems (flash) get even closer to CPUs 5. IO latencies shrink 6. The latency difference between non-volatile storage and volatile RAM shrinks: new database layouts! 7. CPU cache is king: new data structures needed!
  • 40. The tools used here: • Honest Profiler by Richard Warburton (@RichardWarburto) • https://github.com/RichardWarburton/honest-profiler • Flame Graphs by Brendan Gregg (@brendangregg) • http://www.brendangregg.com/flamegraphs.html • Linux perf tool • https://perf.wiki.kernel.org/index.php/Main_Page • Spark-Prof demos: • https://github.com/gluent/spark-prof
  • 41. References • Slides & video of a similar presentation (about Oracle): • http://www.slideshare.net/tanelp • https://vimeo.com/gluent • RAM is the new disk series: • http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/ • https://docs.google.com/spreadsheets/d/1ss0rBG8mePAVYP4hlpvjqAAlHnZqmuVmSFbHMLDsjaU/
  • 42. Thanks! http://gluent.com/ We are hiring developers & data engineers!!! http://blog.tanelpoder.com @tanelpoder