OOW13: It's a solid state-world
- 1. It’s a Solid State World
How Exadata X3 leverages flash memory
Gwen Shapira
Marc Fielding
- 2. About Gwen
– Solutions Architect, Cloudera
– Oracle ACE Director
– Presents, Blogs, Tweets
– @gwenshap
© 2013 Pythian
- 3. About Marc
• Senior Consultant with Pythian’s Advanced Technology Group
• 12+ years Oracle production systems experience starting with Oracle 7
• Blogger and conference presenter: pythian.com/news/author/fielding
• Occasionally on Twitter: @mfild
- 5. Sh*t people say about SSDs
Too expensive
Fast for reads
Type of SSD matters
Use SSD in SAN
Don’t use for writes
Use SATA SSD
Used for REDO
Use for random writes
Becomes slower over time
Don’t use for REDO
Use PCI SSD
Only used in Exadata
Only Sun flash devices are supported
Unreliable
Is it same as Flash?
- 7. The technology: NAND flash
• Slower than RAM, but both nonvolatile and affordable in large capacities
• SLC
– One bit per cell (states: 0, 1)
– High performance
• MLC
– Two bits per cell (states: 00, 01, 10, 11)
– More capacity = cheaper
- 8. We will talk about
• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs
- 9. Cells, pages, and blocks
• Cell: 1 bit
• Page: 4K
• Block: 128 pages = 512K
• Plane = 1024 blocks = 512MB
• Planes are grouped into dies, which are grouped into packages
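The sizes on this slide follow from simple multiplication. A minimal sketch that reproduces them, assuming binary units (1K = 1024 bytes); the constant names are illustrative and not taken from any vendor tool:

```python
# NAND geometry as stated on the slide; names are illustrative only.
KB = 1024
MB = 1024 * KB

PAGE_BYTES = 4 * KB        # one page = 4K
PAGES_PER_BLOCK = 128      # one block = 128 pages
BLOCKS_PER_PLANE = 1024    # one plane = 1024 blocks

block_bytes = PAGE_BYTES * PAGES_PER_BLOCK
plane_bytes = block_bytes * BLOCKS_PER_PLANE

# With one bit per cell (SLC), a page needs 8 cells for every byte.
slc_cells_per_page = PAGE_BYTES * 8

print(block_bytes // KB, "KB per block")
print(plane_bytes // MB, "MB per plane")
print(slc_cells_per_page, "SLC cells per page")
```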
- 10. The big gotcha
• Reads = 4KB pages
• Writes = 4KB pages
• Deletes = 512KB blocks
- 11. Reads: orders of magnitude
• CPU registers – 0.3* ns (1 cycle)
• CPU Cache L1 – 1.2* ns
• CPU Cache L2 – 3.0* ns
• CPU Cache L3 – 12-24 ns
• Main Memory (RAM) – 60-100 ns
• SSD – 60,000 ns
• Magnetic Storage (“DISK”) – 3,000,000 ns
• SAN devices ~ 15,000,000 ns
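To make the orders of magnitude concrete, the figures above can be turned into ratios. This is a rough sketch: where the slide gives a range, the low end is used, which is a choice made here and not something the slide specifies.

```python
# Read latencies from the slide, in nanoseconds.
# For ranges (L3, RAM) the low end is used; that is an assumption.
LATENCY_NS = {
    "CPU register": 0.3,
    "L1 cache": 1.2,
    "L2 cache": 3.0,
    "L3 cache": 12.0,
    "RAM": 60.0,
    "SSD": 60_000.0,
    "Disk": 3_000_000.0,
    "SAN": 15_000_000.0,
}

def slowdown(slower: str, faster: str) -> float:
    """How many times longer a read takes on `slower` than on `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]

print("Disk vs SSD:", slowdown("Disk", "SSD"))
print("SAN vs SSD:", slowdown("SAN", "SSD"))
print("SSD vs RAM:", slowdown("SSD", "RAM"))
```

By these numbers an SSD read is about 50 times faster than a disk read, yet still about 1,000 times slower than RAM.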
- 13. Writes
• Writes on new SSD – 250,000 ns
• Comparable to rotating disk
How much data can you write to a new 250GB SSD?
- 14. Deletes
• Can’t overwrite data without deleting first
• Can only delete blocks of 128*4K pages
• To overwrite a page:
– Read 127 pages
– Write 127 to a free block
– Delete old block
– Perform the write we originally requested
• Takes 2ms
• Each cell can only be written 100K times
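The four overwrite steps can be sketched as a toy model. The `Block` class and `overwrite_page` function are hypothetical names invented for illustration, and the sketch assumes the requested write lands in the same free block as the 127 copied pages; real controllers are far more elaborate.

```python
PAGES_PER_BLOCK = 128

class Block:
    """Toy flash block: pages can be programmed once, then only erased together."""
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK   # None means erased
        self.erase_count = 0

    def program(self, idx, data):
        if self.pages[idx] is not None:
            raise ValueError("page must be erased before it is written")
        self.pages[idx] = data

    def erase(self):
        # Erase works on the whole block, never on a single page.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1

def overwrite_page(old, free, idx, new_data):
    """Overwrite page `idx`; returns how many pages were physically programmed."""
    programmed = 0
    # Steps 1 and 2: read the other pages and write them to a free block.
    for i, data in enumerate(old.pages):
        if i != idx and data is not None:
            free.program(i, data)
            programmed += 1
    # Step 3: delete the old block.
    old.erase()
    # Step 4: perform the write that was originally requested.
    free.program(idx, new_data)
    return programmed + 1

old, free = Block(), Block()
for i in range(PAGES_PER_BLOCK):
    old.program(i, f"page-{i}")
print(overwrite_page(old, free, 5, "updated"), "pages programmed for one 4K overwrite")
```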
- 15. The SSD controller
• Does the “magic” behind the scenes
• Deletes in the background (“garbage collection”)
• Tracks free space
• Balances I/O over cells (“wear leveling”)
• Manages spare capacity (“overprovisioning”)
• Manages RAM cache
- 16. The consequences
• Write Amplification
– How much data is really written when we write 1MB
– 1 means no overhead
– The closer to 1 the better
– Less than 1 means the vendor is lying
• Never benchmark a brand-new SSD
– Run benchmarks long enough to run out of overprovisioned space
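Write amplification is a ratio: bytes physically written to flash divided by bytes the host asked to write. A minimal sketch follows; the function name is made up here, and the 1.5MB figure in the second example is invented purely to show the arithmetic.

```python
KB = 1024
MB = 1024 * KB

def write_amplification(flash_bytes_written: int, host_bytes_written: int) -> float:
    """Ratio of physical flash writes to host writes; 1.0 means no overhead."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written

# Worst case for a single-page overwrite: the host writes one 4K page,
# but a whole 512K block (128 pages) ends up being rewritten.
print(write_amplification(512 * KB, 4 * KB))

# Hypothetical steady-state example: 1MB of host writes causes 1.5MB of flash writes.
print(write_amplification(3 * MB // 2, 1 * MB))
```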
- 17. We will talk about
• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs
- 19. Solid-state your whole database?
• SSDs solve I/O latency problems
• But not if db file sequential read is not in your top 5 wait events
• And not if you haven’t maxed out your RAM for buffer cache (yet)
• If your CPU utilization is high, solve this first.
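The checklist above can be written down as a small function. This is only a sketch of the reasoning: the function name is hypothetical, and the 80% cutoff for "high" CPU utilization is an assumption made here, since the slide gives no number.

```python
def ssd_blockers(top_wait_events, buffer_cache_maxed, cpu_utilization, cpu_high=0.8):
    """Return the reasons SSDs are unlikely to help yet; an empty list means go ahead.

    cpu_high=0.8 is an assumed threshold, not a value from the slide.
    """
    blockers = []
    if "db file sequential read" not in top_wait_events[:5]:
        blockers.append("db file sequential read is not in the top 5 wait events")
    if not buffer_cache_maxed:
        blockers.append("RAM for buffer cache is not maxed out yet")
    if cpu_utilization >= cpu_high:
        blockers.append("CPU utilization is high; solve this first")
    return blockers

waits = ["log file sync", "db file sequential read", "CPU time"]
print(ssd_blockers(waits, buffer_cache_maxed=True, cpu_utilization=0.35))
print(ssd_blockers(["log file sync"], buffer_cache_maxed=False, cpu_utilization=0.95))
```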
- 20. SSD mistakes
• SSD in primary but not DR site
– I/O capacity to apply real-time updates
– What if you need a switchover
• Over-managing active segments
– If DBAs didn’t have enough to do already…
• Database smart flash cache
- 21. Database “smart” flash cache
• Disk → SGA: Block read from disk
• SGA → Flash Cache: Block evicted from SGA is written to SSD cache by DBWR
• Flash Cache → SGA: If block is needed, it is read from SSD
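The flow on this slide is a victim cache: blocks pushed out of the SGA land in flash and are picked up from there if needed again. A toy sketch, assuming LRU eviction in both tiers and that a block leaves flash when it is promoted back to the SGA; both are simplifications chosen here, and `TwoTierCache` is a hypothetical name, not Oracle code.

```python
from collections import OrderedDict

class TwoTierCache:
    """Toy model of SGA plus flash victim cache (assumed LRU in both tiers)."""
    def __init__(self, sga_blocks, flash_blocks):
        self.sga = OrderedDict()
        self.flash = OrderedDict()
        self.sga_blocks = sga_blocks
        self.flash_blocks = flash_blocks
        self.disk_reads = 0
        self.flash_reads = 0

    def read(self, block_id):
        """Return where the block was found: 'sga', 'flash' or 'disk'."""
        if block_id in self.sga:
            self.sga.move_to_end(block_id)
            return "sga"
        if block_id in self.flash:
            del self.flash[block_id]      # promoted back into the SGA
            self.flash_reads += 1
            source = "flash"
        else:
            self.disk_reads += 1
            source = "disk"
        self._admit(block_id)
        return source

    def _admit(self, block_id):
        self.sga[block_id] = True
        if len(self.sga) > self.sga_blocks:
            # The block evicted from the SGA is written to the flash cache.
            victim, _ = self.sga.popitem(last=False)
            self.flash[victim] = True
            if len(self.flash) > self.flash_blocks:
                self.flash.popitem(last=False)

cache = TwoTierCache(sga_blocks=2, flash_blocks=2)
print([cache.read(b) for b in (1, 2, 3, 1, 3)])
```

In this run the fourth read finds block 1 in flash because reading block 3 pushed it out of the two-block SGA.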
- 22. Database “smart” flash cache
• Pros:
– Automatically keeps active data in SSD
• Cons:
– Large overhead for managing cache, all taken from SGA
– Overhead for DBWR
– No benefit and some overhead for writes
– Only one disk
Using Smart Flash Cache will make your I/O faster than using just disks, but smartly placing data on SSD will be even faster.
- 23. We will talk about
• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs
- 24. In the beginning
• Exadata V1, 2008
• Joint project of HP and Oracle
• Designed for big and long-running queries (think data warehouses)
• No flash cache
- 25. And then
• Exadata V2, 2009
• Brand-new PCI-based flash cache
• Integrated with storage servers
• A full high-performance rack has:
– 4 * 14 Sun F20 flash accelerator cards
– 96GB * 4 * 14 = 5.4TB SLC flash
– 75 GB/sec flash throughput
– 1.5m IOPS
• Note that InfiniBand will limit you to 4GB/sec per DB node
- 26. Fast-forward to 2012
• Exadata X3, 2012
• Still integrated with storage servers
• A full high-performance rack has:
– 4 * 14 Sun F40 flash accelerator cards
– 400GB * 4 * 14 = 22.4TB MLC flash
– 100 GB/sec flash throughput
– 1.5m IOPS
• Same InfiniBand speeds
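The rack capacities quoted for V2 and X3 can be checked with the same multiplication. A small sketch, reading "4 * 14" as 4 flash cards in each of 14 storage servers and using decimal terabytes (1TB = 1000GB); the function name is illustrative.

```python
def rack_flash_tb(card_gb, cards_per_server=4, storage_servers=14):
    """Total flash in a full rack, in decimal TB (1 TB = 1000 GB)."""
    return card_gb * cards_per_server * storage_servers / 1000

v2_tb = rack_flash_tb(96)    # Sun F20 cards, 96GB each
x3_tb = rack_flash_tb(400)   # Sun F40 cards, 400GB each

print(f"V2: {v2_tb:.1f} TB, X3: {x3_tb:.1f} TB, growth: {x3_tb / v2_tb:.1f}x")
```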
- 27. Just announced
• Flash cache compression
– Fit more data into your flash
– Exadata hardware support TBD
– Only if the data isn’t already compressed (HCC)
- 28. Exadata smart flash cache
• Not the database smart flash cache
• No victim caching here
• Flash memory on storage servers
• Can be used for traditional storage too (but you lose capacity to redundancy)
- 29. Uncached reads
1. Uncached data is read from disk first
2. Sent to the database
3. And then copied to cache
[Diagram: Database, cellsrv, Disks, SSD Cache]
- 30. Cached reads
– Cached blocks come from flash cache directly
– Except smart scans: disk only
– If you set cell_flash_cache keep they read from both disk and flash
[Diagram: Database, cellsrv, Disks, SSD Cache]
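The uncached and cached read paths can be summarized as a toy model of the storage cell. Everything here is an illustrative simplification: `CellReadPath` is a made-up name, cache capacity and eviction are ignored, and the sketch assumes smart scans do not populate the cache, which the slides do not state.

```python
class CellReadPath:
    """Toy model of which devices a read touches on a storage cell."""
    def __init__(self):
        self.flash = set()   # block ids currently in the flash cache

    def read(self, block_id, smart_scan=False, keep=False):
        """Return the list of devices used to serve this read."""
        if smart_scan:
            # Smart scans go to disk only, unless cell_flash_cache keep is set,
            # in which case they read from both disk and flash.
            return ["disk", "flash"] if keep else ["disk"]
        if block_id in self.flash:
            return ["flash"]             # cached read served from flash directly
        # Uncached read: disk first, sent to the database, then copied to cache.
        self.flash.add(block_id)
        return ["disk"]

cell = CellReadPath()
print(cell.read(42))                               # first read: uncached
print(cell.read(42))                               # second read: cached
print(cell.read(42, smart_scan=True))              # smart scan
print(cell.read(42, smart_scan=True, keep=True))   # smart scan with keep
```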
- 31. Writes (1)
– Writes go to disk first
– Then copied to cache, sometimes
• Indexes and tables with random read I/O are prioritized
• Or use cell_flash_cache keep
[Diagram: Database, cellsrv, Disks, SSD Cache]
- 32. Writes (2)
– Write back cache
– 11.2.0.3 BP9+
– Writes go to SSD first
– Then copied to disk, eventually
[Diagram: Database, cellsrv, Disks, SSD Cache]
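The two write modes differ only in which device takes the write first. A toy sketch contrasting them; the class, the `cache_worthy` flag (standing in for "sometimes") and the explicit `flush()` call are all hypothetical simplifications, not how cellsrv is implemented.

```python
class FlashCacheWrites:
    """Toy model of write-through (Writes 1) versus write-back (Writes 2)."""
    def __init__(self, write_back):
        self.write_back = write_back
        self.disk = {}
        self.flash = {}
        self.dirty = set()   # blocks in flash that are not yet on disk

    def write(self, block_id, data, cache_worthy=True):
        """Return the device that received the write first."""
        if self.write_back:
            self.flash[block_id] = data      # writes go to SSD first
            self.dirty.add(block_id)
            return "flash"
        self.disk[block_id] = data           # writes go to disk first
        if cache_worthy:
            self.flash[block_id] = data      # then copied to cache, sometimes
        return "disk"

    def flush(self):
        """Copy dirty blocks to disk, eventually; returns how many were copied."""
        for block_id in sorted(self.dirty):
            self.disk[block_id] = self.flash[block_id]
        copied = len(self.dirty)
        self.dirty.clear()
        return copied

wt = FlashCacheWrites(write_back=False)
wb = FlashCacheWrites(write_back=True)
print(wt.write(1, "a"), wb.write(1, "a"))
print("on disk before flush:", 1 in wb.disk, "| after flush:", wb.flush() == 1 and 1 in wb.disk)
```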
- 33. Exadata smart flash logging
• In some Exadata systems: I/O outliers
• Slow log file syncs
• But aren’t flash writes slow?
• We now write to both disk and flash
• Puts an upper limit on latency
• Data corruption bug fixed in 11.2.3.2.1, and ASM resilvering bug fixed in 11.2.0.3 BP9
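Writing the redo to both devices caps latency because an outlier on one device is hidden by the other. A minimal sketch, assuming the write is acknowledged as soon as either copy completes; the latency samples are invented for illustration and are not measurements.

```python
def log_write_latency_ms(disk_ms: float, flash_ms: float) -> float:
    """Latency seen by the database when the first of two parallel writes wins."""
    return min(disk_ms, flash_ms)

# Made-up samples: each device has one outlier, at different times.
disk_samples = [0.5, 0.6, 40.0]
flash_samples = [0.9, 25.0, 1.0]

observed = [log_write_latency_ms(d, f) for d, f in zip(disk_samples, flash_samples)]
print(observed)
print("worst case:", max(observed), "ms instead of", max(disk_samples), "ms on disk alone")
```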
- 34. Mixed workloads
• Classic example: OLTP and DW on same system
• DW does long-running, I/O-intensive queries
• OLTP does relatively little I/O transfer
• But OLTP very latency sensitive
• DW monopolizes the flash cache
• How to prioritize cache for OLTP?
- 35. The workaround
• Control via I/O resource manager
alter iormplan dbplan=((name=dss, level=1, flashcache=off),
(name=other, level=1, flashCache=on));
• Disables flash cache entirely for a DB
• Very coarse control: on or off
• Obvious effect in I/O performance
• Use only if you need it
• cellcli list flashcachecontent can show what is in the cache
- 36. We will talk about
• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs
- 37. Interfaces
• SATA
– 32 outstanding IO
– 6Gb/s = 600MB/s
– significant latency
• SAS
– 256 outstanding IO
– 6Gb/s = 600MB/s
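The "6Gb/s = 600MB/s" figure is not a plain divide-by-eight: SATA and SAS at this speed use 8b/10b encoding, so every 8 data bits travel as 10 bits on the wire. A small sketch of that arithmetic; the function name is illustrative.

```python
def payload_mb_per_s(line_rate_gbps: int, data_bits: int = 8, coded_bits: int = 10) -> int:
    """Usable MB/s for a serial link with 8b/10b-style line encoding."""
    line_rate_bps = line_rate_gbps * 1_000_000_000
    payload_bps = line_rate_bps * data_bits // coded_bits   # strip encoding overhead
    return payload_bps // 8 // 1_000_000                    # bits to bytes to MB

print(payload_mb_per_s(6), "MB/s on a 6Gb/s link")
```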
- 39. Interfaces
• Fibre Channel
– Use existing storage infrastructure
– High latency
– Shared: works with RAC
• Proprietary PCI
– By flash array vendors
– Avoids latency penalty of FC
- 40. We will talk about
• I/O Performance
• Using SSDs for Oracle
• How Exadata uses SSDs
• SSD devices
• Practice: Reading SSD Vendor Specs
- 46. Wrapping up
• SSDs make random reads wicked fast
• Writes and deletes are complicated
• Exadata’s smart flash cache speeds up random reads
• Not all SSDs are the same
• Read vendor specs carefully
- 47. Thank you and Q&A
gshapira@cloudera.com
@gwenshap
fielding@pythian.com
@mfild