Andrei Khurshudov
Sr. Director, SSD Q&R
Seagate Technology

Symposium on Magnetic Storage Tribology and Reliability
Miami, Florida
October 20, 2008

                                                      10/27/2008                                       1
SSD – In One Page
         SSD ≡ Solid State Drive
         SSD is a storage device
         ◦ using solid state memory as components instead of heads and disks
         ◦ appearing to the user as a drive similar to a hard disk drive (HDD)

         SSDs use either non-volatile memory (NAND Flash) or battery-backed
         volatile semiconductor memory (RAM)
         Current SSD products use either SLC (single-level cell)
         or MLC (multi-level cell) NAND Flash
         SSD benefits: higher read performance, higher reliability, and low
         power consumption
         SSD challenges: cost, product reliability over life, and write
         performance


Today and Tomorrow of SSD
         Today’s total revenue: ~$400M

         Projected 2011 revenue: ~$5B

         Today’s unit shipments: ~4M units
         ◦ Dominated by industrial applications
         ◦ Dominated by capacities <1 GB

         Projected 2011 unit shipments: ~50M units
         ◦ Dominated by shipments to portable PCs
         ◦ Dominated by capacities from 64 GB to 128 GB

         Total cost of ownership (TCO) is expected to drive the
         transition from HDDs to SSDs
         ◦ Conclusion: full price parity with HDDs at equivalent
           capacity points is not required

                                                                           | Source: IDC
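Taking the IDC figures above at face value, a few lines of arithmetic make the implied growth explicit. The sketch assumes "today" means 2008 (a three-year horizon to 2011); the constant and function names are illustrative, not from the source.

```python
# Figures quoted on the slide (IDC); "today" is assumed to be 2008.
REVENUE_TODAY = 400e6   # ~$400M
REVENUE_2011 = 5e9      # ~$5B
UNITS_TODAY = 4e6       # ~4M units
UNITS_2011 = 50e6       # ~50M units
YEARS = 2011 - 2008     # assumed 3-year horizon


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0


def average_selling_price(revenue: float, units: float) -> float:
    """Revenue per unit shipped."""
    return revenue / units


if __name__ == "__main__":
    print(f"revenue CAGR: {cagr(REVENUE_TODAY, REVENUE_2011, YEARS):.0%}")
    print(f"unit CAGR:    {cagr(UNITS_TODAY, UNITS_2011, YEARS):.0%}")
    print(f"ASP today:    ${average_selling_price(REVENUE_TODAY, UNITS_TODAY):.0f}")
    print(f"ASP 2011:     ${average_selling_price(REVENUE_2011, UNITS_2011):.0f}")
```

Revenue and units both grow 12.5× (about 130% per year compounded), so the implied average selling price stays near $100 per drive while typical capacity moves from under 1 GB to 64–128 GB. Read literally, the price per GB would have to fall by a factor of 64 or more.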
Basic Flash Operation
        Flash stores data by trapping charge on the floating gate
        Direct access to data:
        ◦ Program (write): a “page” (2 KB or 4 KB + ECC bytes)
        ◦ Read: a page
        ◦ Erase: the smallest unit is a block (64, 128, or more pages)
        ◦ Over-write = Erase (block) + Write (page)

        Program / Erase operations:
        ◦ Program forces electrons in the substrate to tunnel through the oxide
          layer and become trapped on the floating gate (“0”)
        ◦ Erase forces the electrons back to the substrate (“1”)

        Read operation:
        ◦ Apply voltage to the control gate and sense the current
          through the inversion channel:
              “1” if current flows
              “0” if no current flows
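To make the page/block asymmetry concrete, here is a toy Python model of a single NAND block. It is a sketch under simplifying assumptions (one byte per page, no ECC, no limit on partial programming), not how any particular device or controller works. It encodes the conventions above: the erased state reads as all 1s, programming can only turn 1s into 0s, and erasing is only possible for the whole block, so an in-place over-write has to buffer the block, erase it, and re-program it.

```python
ERASED = 0xFF  # assumption: an erased page reads as all 1s


class NandBlock:
    """Toy model of one NAND block: program by page, erase by block.

    Hypothetical simplification: each page holds a single byte and there is no ECC.
    """

    def __init__(self, pages_per_block: int = 64) -> None:
        self.pages = [ERASED] * pages_per_block
        self.erase_count = 0

    def read(self, page: int) -> int:
        return self.pages[page]

    def program(self, page: int, data: int) -> None:
        # Programming adds charge, which can only move bits from 1 to 0.
        if data & ~self.pages[page] & 0xFF:
            raise ValueError("cannot program a 0 back to 1 without a block erase")
        self.pages[page] &= data

    def erase(self) -> None:
        # Erase returns every page in the block to the all-1s state.
        self.pages = [ERASED] * len(self.pages)
        self.erase_count += 1


def overwrite(block: NandBlock, page: int, data: int) -> None:
    """Over-write = erase (block) + write (page), preserving the other pages."""
    saved = list(block.pages)          # buffer the whole block first
    block.erase()
    for i, value in enumerate(saved):
        if i == page:
            block.program(i, data)
        elif value != ERASED:
            block.program(i, value)    # re-program the untouched pages
```

After writing `0xA5` to a page, a direct `program(0, 0x5A)` raises because it would need to flip 0 bits back to 1; `overwrite` succeeds at the cost of one block erase, which is why over-writes consume endurance.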


Program and Erase Cycle
        [Figure: floating-gate cell cross-sections (control gate, dielectric, floating gate, gate oxide, floating source and drain). Program: 20 V on the control gate, 0 V on the substrate, electrons on the floating gate. Erase: 0 V on the control gate, 20 V on the substrate, electrons below the gate oxide.]

        Program                                   Erase
        Equivalent to “data write” in HDD         Equivalent to “data erase” in HDD
        Electrons are moved from the substrate    Electrons are moved from the floating
        and trapped in the floating gate          gate into the substrate
        Programming is done by “pages”            Erasures are done by “blocks”
        Results in a logical “0”                  Results in a logical “1”
        Uses Fowler-Nordheim tunneling            Uses Fowler-Nordheim tunneling

Flash Technology Trends
     | Source: J. Cooke, Micron Technology
     | Source: Samsung

     [Figure: future roadmap for NAND charge storage technology: scaling down and increasing complexity]

             A 10X reduction in reliability needs to be compensated for by
             other means
             The transition from SLC (single-level cells) to MLC (multi-level
             cells) will represent a significant challenge to Flash reliability
             Not just writes but also reads degrade flash data retention
Quality Assurance: HDD vs. SSD
         HDD – Mature industry: mature tests
         SSD – Immature industry: non-uniform, inconsistent tests

         Development and qualification tests
         ◦ HDD: very similar across the industry
         ◦ SSD: inconsistent across the industry
         Test conditions
         ◦ HDD: consistent across the industry
         ◦ SSD: inconsistent across the industry
         Test sample size and environments
         ◦ HDD: very similar across the industry
         ◦ SSD: sample size, environments, and failure criteria inconsistent across the industry
         Firmware testing, validation, and issue handling
         ◦ HDD: years of experience
         ◦ SSD: little experience
         Acceleration factors
         ◦ HDD: temperature – similar; usage – unclear; voltage – not well defined
         ◦ SSD: temperature, usage, and voltage – understood
         Reliability demonstration
         ◦ HDD: standard RDT tests and standard data interpretation
         ◦ SSD: inconsistent across the industry

         Reliability focus
         ◦ HDD: head-disk interface; handling robustness
         ◦ SSD: endurance (wear-out); data retention; read and write disturb; wear-leveling algorithms


Major Failure Modes of NAND Flash
     • Flash-specific failure modes include:
           • Program disturb: cells other than those being programmed receive elevated
           voltage, including cells on pages that are not supposed to be programmed.
           Erase will return the cells to the “normal” state
           • Read disturb: affects pages within the block being read, but not the pages
           actually being read. Erase will return the cells to the “normal” state
           • Data retention: charge loss or gain occurs in the cell over time. Erase will
           return the cells to the “normal” state
           • Endurance (wear-out): the cell fails due to charge trapped in the dielectric
           layer. Not recoverable by erase

     • Other SSD failure modes:
           • Handling damage
           • EOS/ESD (electrical overstress / electrostatic discharge)
           • Firmware / ASIC failures
           • Other failures

     [Figure: programmed cell after P/E cycling (control gate, dielectric, floating gate, gate oxide SiO2, source, drain, P-substrate), showing trapped charge outside the floating gate]
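Because read disturb builds up on the unread pages of a block and is cleared by an erase, controllers commonly track reads per block and refresh (copy and erase) a block before disturb errors accumulate. The class below is a hypothetical sketch of that bookkeeping only; `read_limit` and `safety_margin` are assumed parameters, not values from the slides, and the data relocation itself is left out.

```python
from collections import defaultdict


class ReadDisturbMonitor:
    """Hypothetical FTL helper: count reads per block and flag blocks for refresh.

    Read disturb accumulates on the unread pages of a block and is undone by an
    erase, so the mitigation sketched here is to relocate the block's valid data
    and erase the block before the disturb tolerance is reached.
    """

    def __init__(self, read_limit: int, safety_margin: float = 0.5) -> None:
        # read_limit: assumed read-disturb tolerance of the flash (vendor-specific).
        # safety_margin: fraction of that limit at which to refresh (assumed policy).
        self.threshold = int(read_limit * safety_margin)
        self.reads_since_erase: dict[int, int] = defaultdict(int)

    def on_read(self, block: int) -> bool:
        """Record one page read; return True once the block should be refreshed."""
        self.reads_since_erase[block] += 1
        return self.reads_since_erase[block] >= self.threshold

    def on_erase(self, block: int) -> None:
        # Erasing restores the cells, so the disturb count starts over.
        self.reads_since_erase[block] = 0
```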
SSD Endurance
     [Figure: failure rate (%) vs. true P/E cycles (equivalently time or GB written), with regions labeled β<1, β=1, and β>1 and the P/Emax limit marked]

     Electrical effects:
     ◦ Faster programming, due to charges trapped inside the dielectric instead of
       the FG (floating gate)
     ◦ Slower erasure, because the trapped charges are harder to remove than those
       in the FG

                     Program/Erase (P/E) cycles cause charge to be trapped in the dielectric layer
                     This causes a permanent shift in cell characteristics that is not recovered
                     by erase
                     Observed as a failed program or erase status
                     In most cases, data can be recovered from the failed block
                     Blocks that fail should be retired (marked as bad and no longer used)
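The β labels on the failure-rate plot read naturally as Weibull shape parameters (an interpretation, not stated on the slide): β < 1 gives a falling failure rate (early failures), β = 1 a constant rate, and β > 1 a rising rate, as in wear-out near P/Emax. A minimal Python sketch of the standard Weibull hazard and cumulative failure functions, with stress measured in P/E cycles and a hypothetical characteristic life `eta`:

```python
import math


def weibull_hazard(x: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(x) = (beta/eta) * (x/eta)**(beta - 1).

    x    -- stress accumulated so far (P/E cycles, time, or GB written), x > 0
    beta -- shape parameter: <1 falling, =1 constant, >1 rising failure rate
    eta  -- scale parameter (characteristic life), in the same units as x
    """
    if x <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("x, beta and eta must be positive")
    return (beta / eta) * (x / eta) ** (beta - 1)


def weibull_unreliability(x: float, beta: float, eta: float) -> float:
    """Cumulative fraction failed F(x) = 1 - exp(-(x/eta)**beta)."""
    return 1.0 - math.exp(-((x / eta) ** beta))


if __name__ == "__main__":
    eta = 100_000  # hypothetical characteristic life in P/E cycles
    for beta in (0.5, 1.0, 3.0):
        early = weibull_hazard(1_000, beta, eta)
        late = weibull_hazard(50_000, beta, eta)
        trend = "falling" if late < early else "constant" if late == early else "rising"
        print(f"beta={beta}: hazard {trend} between 1k and 50k cycles")
```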

SSD Endurance: Major Factors
          Stress:
                 Number of P/E cycles
                      External P/E cycles (host write data rate)
                      Internal write multiplication
                          External data entropy (block-size distribution; application-specific)
                          Internal data handling (data buffering, Flash architecture, etc.)
                      Wear-leveling efficiency (write uniformity across Flash cells)
                      Operating environment
                          Temperature (can both stress and help)
          Strength:
                 Flash endurance robustness
                 Device ECC power
                 Design redundancies or excess capacity
                 Bad block identification and data reassignment mechanism
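The stress and strength factors above combine into a common back-of-the-envelope endurance estimate: the raw write budget of the flash (capacity × rated P/E cycles), discounted by imperfect wear leveling and divided by the internal write multiplication (usually called write amplification). The functions below sketch that first-order model only; they ignore ECC, spare capacity, retention, and temperature, and the example inputs are placeholders rather than data.

```python
def lifetime_host_writes_tb(capacity_gb: float, pe_cycles: int,
                            write_amplification: float,
                            wear_leveling_efficiency: float = 1.0) -> float:
    """First-order host-write endurance estimate, in TB (1 TB = 1000 GB).

    Assumptions (hypothetical model, not from the source):
    - the NAND can absorb capacity_gb * pe_cycles GB of raw writes in total;
    - wear_leveling_efficiency in (0, 1] is the fraction of that budget usable
      before the most-worn blocks reach their limit (1.0 = perfectly uniform);
    - each GB written by the host causes write_amplification GB of NAND writes.
    """
    if not 0.0 < wear_leveling_efficiency <= 1.0:
        raise ValueError("wear_leveling_efficiency must be in (0, 1]")
    if write_amplification < 1.0:
        raise ValueError("write amplification below 1 needs compression; not modeled")
    raw_budget_gb = capacity_gb * pe_cycles * wear_leveling_efficiency
    return raw_budget_gb / write_amplification / 1000.0


def lifetime_years(host_writes_tb: float, host_gb_per_day: float) -> float:
    """Years until the host-write budget is used up at a constant daily write rate."""
    return host_writes_tb * 1000.0 / host_gb_per_day / 365.0


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    tb = lifetime_host_writes_tb(capacity_gb=64, pe_cycles=10_000, write_amplification=2.0)
    print(f"{tb:.0f} TB of host writes, ~{lifetime_years(tb, 20.0):.1f} years at 20 GB/day")
```

For example, `lifetime_host_writes_tb(64, 10_000, 2.0)` gives 320 TB of host writes; halving wear-leveling efficiency or doubling write amplification halves the result, which is why both appear as stress factors.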
Endurance: SLC vs. MLC
         Multi-level cells use different charge levels to store two or more
         bits in one cell

         Read/write design margins (and the gaps between the threshold-voltage
         (Vt) levels) are much smaller for MLC, resulting in lower endurance
         | Source: W. Hutsell, Texas Memory Systems

         The transition to MLC would represent a significant
         reliability challenge
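A simple counting argument shows why MLC margins shrink: n bits per cell need 2^n distinguishable Vt levels. If, as a simplifying assumption, every cell type had the same usable Vt window with evenly spaced levels, the gap between adjacent levels would be 1/(2^n - 1) of the SLC gap. Real devices differ in window size and distribution width, so treat this as an illustration only.

```python
def vt_levels(bits_per_cell: int) -> int:
    """Number of threshold-voltage (Vt) states needed to store n bits in one cell."""
    return 2 ** bits_per_cell


def relative_level_spacing(bits_per_cell: int) -> float:
    """Spacing between adjacent Vt levels relative to SLC.

    Simplifying assumption: all cell types share the same usable Vt window and
    the levels are evenly spaced across it, so 2**n levels leave 2**n - 1 gaps.
    """
    return 1.0 / (vt_levels(bits_per_cell) - 1)


if __name__ == "__main__":
    for n, name in [(1, "SLC"), (2, "MLC-2b"), (3, "MLC-3b"), (4, "MLC-4b")]:
        print(f"{name}: {vt_levels(n)} levels, spacing {relative_level_spacing(n):.2f} x SLC")
```

Under that assumption, 2-bit MLC keeps a third of the SLC spacing and 4-bit MLC about a fifteenth.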

SSD Data Retention
        [Figure: four floating-gate cell cross-sections (control gate, dielectric, floating gate, gate oxide, source, drain, P-substrate): programmed cell; programmed cell after P/E cycling; programmed cell after long NOP storage; programmed cell after P/E cycling and long NOP storage]

        Non-operating (NOP) storage causes charge to leak from the floating gate
        P/E cycling leads to even faster charge dissipation and eventual data loss
Data Retention vs. Time and P/E cycles
     P/E cycling shortens data retention
     | Source: Samsung

     | Source: Jim Cooke, Micron
     [Figure panel labels: "No P/E cycling impact on endurance" and "Strong endurance dependence on P/E cycling"]

     Newer technologies shorten data retention

              Exercising flash reduces its long-term data retention
              This problem gets worse as the Flash scales down (60 nm → 4x nm)
              and increases in complexity (SLC → MLC)
Understand and Overcome Fundamental Technology Limitations
              Write endurance (max. program/erase cycles)
                     Degrades with device scaling
                     100k for SLC NAND, 10k for MLC-2b, 1k for MLC-3b, 100 for MLC-4b
                     (2, 3, and 4 bits per cell)
              Data retention
                     Degrades with device scaling
                     Depends on temperature and P/E cycling
                     10-year retention at up to 10% of rated P/E cycles; 1-year retention
                     at 100% of rated P/E cycles
              Read disturb
                     Degrades with device scaling
                     1M for SLC NAND, 100k for MLC-2b
              Write multiplication
                     Block erasure can lead to many additional internal writes for every host write
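The temperature dependence of data retention is often modeled with the Arrhenius equation, which converts hours at an elevated bake temperature into equivalent time at use temperature. The slide does not name a model, so this is an assumed, swapped-in technique: the activation energy `ea_ev` is an input (values around 1.1 eV are often quoted for NAND charge loss, but a supplier's own figure should be used), and the sketch does not capture the P/E-cycling dependence listed above.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor of a bake at t_stress_c relative to use at t_use_c."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def equivalent_bake_hours(retention_years: float, ea_ev: float,
                          t_use_c: float, t_stress_c: float) -> float:
    """Hours at t_stress_c that model retention_years at t_use_c (same charge loss)."""
    hours = retention_years * 365.0 * 24.0
    return hours / arrhenius_acceleration(ea_ev, t_use_c, t_stress_c)


if __name__ == "__main__":
    # Hypothetical inputs: Ea = 1.1 eV, 40 C use, 85 C bake, 1-year retention target.
    af = arrhenius_acceleration(1.1, 40.0, 85.0)
    print(f"acceleration factor ~{af:.0f}")
    print(f"~{equivalent_bake_hours(1.0, 1.1, 40.0, 85.0):.0f} h at 85 C")
```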



        Mitigate Flash Limitations with Advanced Reliability & Test
        Technologies
              Static and dynamic wear leveling to maximize life of the device
              Write reduction solutions
              Deploying increased ECC power
              SSD-specific Test and Qualification process (CERT, DMT, RDT, ORT, etc.)
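Dynamic wear leveling can be sketched as an allocator that always hands out the free block with the fewest erases and retires blocks that reach their P/E limit. This is a hypothetical minimal version (class and method names are illustrative); a static wear leveler would additionally relocate cold data so that rarely rewritten blocks also share the wear.

```python
import heapq


class DynamicWearLeveler:
    """Hypothetical sketch: always write new data to the least-worn free block.

    Dynamic wear leveling only rotates among blocks that are being rewritten;
    static wear leveling (not shown) would also move cold, rarely rewritten data
    so that its blocks join the rotation.
    """

    def __init__(self, num_blocks: int, max_pe_cycles: int) -> None:
        self.max_pe = max_pe_cycles
        self.erase_counts = [0] * num_blocks
        self.free = [(0, b) for b in range(num_blocks)]  # (erase count, block id)
        heapq.heapify(self.free)
        self.retired: set[int] = set()

    def allocate(self) -> int:
        """Pop the free block with the fewest erases."""
        if not self.free:
            raise RuntimeError("no free blocks left")
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block: int) -> None:
        """Erase a block whose data became stale and return it to the free pool."""
        self.erase_counts[block] += 1
        if self.erase_counts[block] >= self.max_pe:
            self.retired.add(block)  # worn out: mark as bad and never reuse
        else:
            heapq.heappush(self.free, (self.erase_counts[block], block))
```

Cycling one write at a time through four blocks keeps their erase counts within one of each other, which is the uniformity that wear-leveling efficiency refers to.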


Predictive Life Modeling
              SSD reliability modeling could potentially be more accurate than
              that for HDD. However, …
              Failure mechanisms are highly interdependent and supplier-specific,
              which makes modeling difficult


        Flash Component Quality and Reliability
              Superb quality control is required to compensate for high lot-to-lot
              and part-to-part variability in a high-volume environment
              The correlation of component reliability to system integration and
              to field reliability needs to be established


        Standardization of the most critical tests and methodologies
              A common language and common definitions need to be established




SSD future is bright and promising but dependent on several critical
         areas, including reliability
          ◦ The HDD-to-SSD transition rate will be a strong function of the total
            cost of ownership (TCO)
          ◦ Reliability plays a critical role in reducing the TCO
          ◦ SSD technology scaling is expected to have a negative impact on reliability


         SSD reliability efforts should focus on the following major areas:
          ◦ Endurance
          ◦ Data retention
          ◦ Read / program disturb
          ◦ Reliability-enhancing technologies (wear leveling, ECC, etc.)


         SSD test standardization is required:
          ◦ No “apples-to-apples” comparison will be possible otherwise
          ◦ TCO is difficult to estimate without standard tests




Backup




• JEDEC JC-64.8 was formed to focus on developing and coordinating SSD standards
    activity
    JC-64.8 Co-Chairs: Alvin Cox (Seagate), Scott Graham (Micron)

    Participating companies: Intel, Microsoft, Micron, Samsung, Toshiba, Hitachi,
    LSI, SanDisk, Seagate, Dell, HP, Tyco, STEC, Marvell, Nvidia, and others

    [Figure: JC-64 organization chart. JC-64: Embedded Memory Storage and Removable Memory Cards, with Editorial TG, Roadmap TG, SaS TG, Enabling TG, and MMCA; subcommittees JC-64.1 Electrical, JC-64.2 Mechanical, JC-64.3 Host Controller, and JC-64.8 SSD; task groups EJTG: eMMC, MJTG: MMC, and UFS TG]

        • JC-64.8 Subcommittee Scope: Solid State Drives
              Define/propose standards for solid state drives used for embedded or
              removable memory storage leveraging existing storage infrastructure…
              Include… quality, reliability, durability methods and procedures that are not
              included in the interface standards…


Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...
 
Ideal 3D Stacked Die Test - IEEE Semiconductor Wafer Test Workshop SWTW 2013
Ideal 3D Stacked Die Test - IEEE Semiconductor Wafer Test Workshop SWTW 2013Ideal 3D Stacked Die Test - IEEE Semiconductor Wafer Test Workshop SWTW 2013
Ideal 3D Stacked Die Test - IEEE Semiconductor Wafer Test Workshop SWTW 2013
 
Plastic Logic - Reliability of OTFTs
Plastic Logic - Reliability of OTFTsPlastic Logic - Reliability of OTFTs
Plastic Logic - Reliability of OTFTs
 
Solid State Drives (SSDs) -What it Takes to Make Data Go Away
Solid State Drives (SSDs) -What it Takes to Make Data Go AwaySolid State Drives (SSDs) -What it Takes to Make Data Go Away
Solid State Drives (SSDs) -What it Takes to Make Data Go Away
 
Esd the broad impact and design challenges part1of2
Esd the broad impact and design challenges part1of2Esd the broad impact and design challenges part1of2
Esd the broad impact and design challenges part1of2
 
SSD PPT BY SAURABH
SSD PPT BY SAURABHSSD PPT BY SAURABH
SSD PPT BY SAURABH
 
FLASH MEMORY: THE BIG DATA from Structure:Data 2012
FLASH MEMORY: THE BIG DATA from Structure:Data 2012FLASH MEMORY: THE BIG DATA from Structure:Data 2012
FLASH MEMORY: THE BIG DATA from Structure:Data 2012
 
Ocz presentation needham_hdd_conference
Ocz presentation needham_hdd_conferenceOcz presentation needham_hdd_conference
Ocz presentation needham_hdd_conference
 
Building software defined clouds - Boyan Ivanov
Building software defined clouds - Boyan Ivanov  Building software defined clouds - Boyan Ivanov
Building software defined clouds - Boyan Ivanov
 
Ditching Fibre Channel & SCSI: Saying hast la vista to your vendors and "ooh ...
Ditching Fibre Channel & SCSI: Saying hast la vista to your vendors and "ooh ...Ditching Fibre Channel & SCSI: Saying hast la vista to your vendors and "ooh ...
Ditching Fibre Channel & SCSI: Saying hast la vista to your vendors and "ooh ...
 
[DSBW Spring 2009] Unit 04: From Requirements to the UX Model
[DSBW Spring 2009] Unit 04: From Requirements to the UX Model[DSBW Spring 2009] Unit 04: From Requirements to the UX Model
[DSBW Spring 2009] Unit 04: From Requirements to the UX Model
 
Application acceleration from the data storage perspective
Application acceleration from the data storage perspectiveApplication acceleration from the data storage perspective
Application acceleration from the data storage perspective
 
Oracle Cloud is Best for Oracle Database - High Availability
Oracle Cloud is Best for Oracle Database - High AvailabilityOracle Cloud is Best for Oracle Database - High Availability
Oracle Cloud is Best for Oracle Database - High Availability
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]
 
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
Storage Benchmarks - Voodoo oder Wissenschaft? – data://disrupted® 2020
 
Cim 20071101 nov_2007
Cim 20071101 nov_2007Cim 20071101 nov_2007
Cim 20071101 nov_2007
 
Intel and DataStax: 3D XPoint and NVME Technology Cassandra Storage Comparison
Intel and DataStax: 3D XPoint and NVME Technology Cassandra Storage ComparisonIntel and DataStax: 3D XPoint and NVME Technology Cassandra Storage Comparison
Intel and DataStax: 3D XPoint and NVME Technology Cassandra Storage Comparison
 
Not about the Big in Big Data
Not about the Big in Big DataNot about the Big in Big Data
Not about the Big in Big Data
 

Mehr von Andrei Khurshudov

Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...
Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...
Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...Andrei Khurshudov
 
Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...Andrei Khurshudov
 
Health monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenterHealth monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenterAndrei Khurshudov
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetAndrei Khurshudov
 
Future Information Growth And Storage Device Reliability 2007
Future Information Growth And Storage Device Reliability 2007Future Information Growth And Storage Device Reliability 2007
Future Information Growth And Storage Device Reliability 2007Andrei Khurshudov
 

Mehr von Andrei Khurshudov (9)

Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...
Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...
Hyper-Converged Infrastructure: Big Data and IoT opportunities and challenges...
 
Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...Short introduction to Big Data Analytics, the Internet of Things, and their s...
Short introduction to Big Data Analytics, the Internet of Things, and their s...
 
Seagate_1
Seagate_1Seagate_1
Seagate_1
 
Health monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenterHealth monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenter
 
Using Big Data Analytics
Using Big Data AnalyticsUsing Big Data Analytics
Using Big Data Analytics
 
Presentation_Final
Presentation_FinalPresentation_Final
Presentation_Final
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheet
 
Long Term Data Storage 2007
Long Term Data Storage 2007Long Term Data Storage 2007
Long Term Data Storage 2007
 
Future Information Growth And Storage Device Reliability 2007
Future Information Growth And Storage Device Reliability 2007Future Information Growth And Storage Device Reliability 2007
Future Information Growth And Storage Device Reliability 2007
 

SSD Reliability Expert Discusses SSD Quality and Failure Modes

  • 1. Andrei Khurshudov, Sr. Director, SSD Q&R, Seagate Technology. Symposium on Magnetic Storage Tribology and Reliability, Miami, Florida, October 20, 2008
  • 2. SSD – In One Page
      ◦ SSD ≡ Solid State Drive
      ◦ An SSD is a storage device that uses solid state memory components instead of heads and disks, and appears to the user as a drive similar to a hard disk drive (HDD)
      ◦ SSDs use non-volatile memory (NAND Flash) or volatile semiconductor memory (RAM) with a battery
      ◦ Current SSD products use either SLC (single-level cell) or MLC (multi-level cell) NAND Flash
      ◦ SSD benefits: read performance, higher reliability, low power consumption
      ◦ SSD challenges: cost, product reliability over life, and write performance
  • 3. Today and Tomorrow of SSD (Source: IDC)
      ◦ Today's total revenue: ~$400M; projected 2011 revenue: ~$5B
      ◦ Today's unit shipments: ~4M units, dominated by industrial applications and by capacities <1 GB
      ◦ Projected 2011 unit shipments: ~50M units, dominated by shipments to portable PCs and by capacities from 64 GB to 128 GB
      ◦ The total cost of ownership (TCO) is expected to drive the transition from HDDs to SSDs
      ◦ Conclusion: complete price parity at equivalent capacity points is not needed
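The IDC projections imply some simple arithmetic worth making explicit. The sketch below computes the implied compound annual growth rate and the average selling price per drive, assuming that "today" means 2008 so the projection spans three years (the slide does not state this).

```python
# Implied growth arithmetic from the IDC figures on slide 3.
# Assumption: "today" is 2008, so the projection to 2011 spans 3 years.

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0

revenue_2008, revenue_2011 = 400e6, 5e9   # USD
units_2008, units_2011 = 4e6, 50e6        # drives shipped
years = 3

revenue_growth = cagr(revenue_2008, revenue_2011, years)
unit_growth = cagr(units_2008, units_2011, years)
asp_2008 = revenue_2008 / units_2008      # average selling price per drive
asp_2011 = revenue_2011 / units_2011

print(f"Revenue CAGR: {revenue_growth:.0%}")
print(f"Unit CAGR:    {unit_growth:.0%}")
print(f"ASP 2008: ${asp_2008:.0f}, ASP 2011: ${asp_2011:.0f}")
```

Revenue and units both grow 12.5× (roughly 132% per year under the three-year assumption). The implied average selling price therefore stays near $100 per drive, while the dominant capacity moves from <1 GB to 64–128 GB, which means a steep drop in price per GB.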
  • 4. Basic Flash Operation
      ◦ Flash stores data by trapping charge on the floating gate
      ◦ Direct access to data: program (write) a “page” (2 KB or 4 KB + ECC bytes); read a page; erase, where the smallest erasable unit is a block (64, 128, or more pages); over-write = erase (block) + write (page)
      ◦ Program operation: forces electrons in the substrate to tunnel through the oxide layer and be trapped on the floating gate (“0”)
      ◦ Erase operation: forces electrons back to the substrate (“1”)
      ◦ Read operation: apply voltage to the control gate and sense the current through the inversion channel: “1” if current flows, “0” if no current flows
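The page/block asymmetry is easiest to see in a toy model. The following Python sketch is illustrative only: `NandBlock` and `overwrite` are hypothetical names, the 64-page block is one of the sizes quoted above, and real controllers add ECC, bad-block handling, and address remapping that are not modeled here.

```python
# Toy model of NAND page/block granularity (hypothetical, illustrative only):
# program works on pages, erase works on whole blocks, and a page must be
# erased before it can be programmed again.
from typing import List, Optional

PAGES_PER_BLOCK = 64  # slide 4 cites 64, 128, or more pages per block

class NandBlock:
    def __init__(self, pages_per_block: int = PAGES_PER_BLOCK) -> None:
        # None marks an erased (programmable) page.
        self.pages: List[Optional[bytes]] = [None] * pages_per_block
        self.erase_count = 0

    def program(self, page: int, data: bytes) -> None:
        if self.pages[page] is not None:
            raise ValueError(f"page {page} must be erased before programming")
        self.pages[page] = data

    def read(self, page: int) -> Optional[bytes]:
        return self.pages[page]

    def erase(self) -> None:
        # The smallest erasable unit is the whole block.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1

def overwrite(block: NandBlock, page: int, data: bytes) -> int:
    """In-place over-write = erase (block) + write (page).

    Every other valid page in the block must be saved and re-programmed,
    so one host write becomes several flash writes. Returns the number of
    page programs performed.
    """
    saved = {i: d for i, d in enumerate(block.pages) if d is not None and i != page}
    block.erase()
    for i, d in saved.items():
        block.program(i, d)
    block.program(page, data)
    return len(saved) + 1
```

Overwriting one page in a block that holds three valid pages costs one erase and three page programs, a miniature version of the "erase (block) + write (page)" cost. Controllers generally avoid this in-place pattern by writing new data to an already-erased page elsewhere and remapping the logical address, which defers the erase to later garbage collection.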
  • 5. Program and Erase Cycle
      [Figure: cell cross-sections (control gate, dielectric, floating gate, gate oxide, source, drain) for program, with 20 V on the control gate and 0 V below, and for erase, with 0 V on the control gate and 20 V below]
      ◦ Program: equivalent to “data write” in an HDD. Electrons are moved from the substrate and trapped in the floating gate. Programming is done by “pages” and results in a logical “0”
      ◦ Erase: equivalent to “data erase” in an HDD. Electrons are moved from the floating gate into the substrate. Erasures are done by “blocks” and result in a logical “1”
      ◦ Both operations use Fowler-Nordheim tunneling
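At the bit level, this asymmetry means a page can be re-programmed without an erase only if the new data needs no “0” → “1” transitions. A minimal sketch, assuming the common convention that an erased byte reads 0xFF; the function names are hypothetical:

```python
# Bit-level view of program vs. erase (hypothetical helper names):
# erase drives every cell to logical "1"; programming can only move
# selected cells from "1" to "0".
ERASED_BYTE = 0xFF

def erase_page(size: int) -> bytearray:
    """Erase sets every bit to 1 (electrons removed from the floating gate)."""
    return bytearray([ERASED_BYTE] * size)

def can_program_without_erase(current: bytes, new: bytes) -> bool:
    """True if `new` only needs 1 -> 0 transitions relative to `current`."""
    return all((c & n) == n for c, n in zip(current, new))

def program_page(page: bytearray, data: bytes) -> None:
    """Programming traps charge, which can only clear bits (1 -> 0)."""
    if not can_program_without_erase(page, data):
        raise ValueError("data needs 0 -> 1 transitions; erase the block first")
    for i, d in enumerate(data):
        page[i] &= d
```

For example, a freshly erased page can be programmed to `b"\xf0\x0f"`, and afterwards to `b"\x00\x0f"` (only more bits cleared), but not back to `b"\xff\x0f"` without an erase.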
  • 6. Flash Technology Trends (Sources: J. Cooke, Micron Technology; Samsung)
      ◦ Future roadmap for NAND charge storage technology: scaling down and increasing complexity
      ◦ A 10X reduction in reliability that needs to be compensated for by other means
      ◦ The transition from SLC (single-level cells) to MLC (multi-level cells) will represent a significant challenge to Flash reliability
      ◦ Not just writes but also reads degrade flash data retention
  • 7. Quality Assurance: HDD vs. SSD
      ◦ Industry: mature, with mature tests (HDD) vs. immature, with non-uniform, inconsistent tests (SSD)
      ◦ Development and qualification tests: very similar across the industry (HDD) vs. inconsistent across the industry (SSD)
      ◦ Test conditions: consistent across the industry (HDD) vs. inconsistent across the industry (SSD)
      ◦ Test sample size and environments: very similar across the industry (HDD) vs. sample size, environments, and failure criteria inconsistent across the industry (SSD)
      ◦ Firmware testing, validation, and issue handling: years of experience (HDD) vs. little experience (SSD)
      ◦ Acceleration factors: temperature similar, usage unclear, voltage not well defined (HDD) vs. temperature, usage, and voltage understood (SSD)
      ◦ Reliability demonstration: standard RDT tests and standard data interpretation (HDD) vs. inconsistent across the industry (SSD)
      ◦ Reliability focus: head-disk interface and handling robustness (HDD) vs. endurance (wear-out), data retention, read and write disturb, and wear-leveling algorithms (SSD)
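Temperature, one of the acceleration factors listed above, is commonly modeled with the Arrhenius relation, AF = exp[(Ea/k)(1/T_use - 1/T_stress)], with temperatures in kelvin. The sketch below assumes an activation energy of 1.1 eV and 40 °C use / 85 °C stress temperatures purely for illustration; the right Ea depends on the failure mechanism being accelerated, and usage and voltage need separate models.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor of a stress temperature relative to a use temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Hypothetical example: 1.1 eV activation energy, 40 C use, 85 C bake.
af = arrhenius_af(1.1, 40.0, 85.0)
bake_hours_per_use_year = 365 * 24 / af
print(f"AF = {af:.0f}; one use-year ~ {bake_hours_per_use_year:.0f} h at 85 C")
```

With these hypothetical inputs the factor comes out on the order of 170, so roughly two days at 85 °C stand in for one year at 40 °C, provided a single activation energy really governs the mechanism under test.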
  • 8. Major Failure Modes of NAND Flash
      ◦ Program disturb: cells other than those being programmed receive elevated voltage; this can happen on a page that is not supposed to be programmed. Erase returns the cells to the “normal” state
      ◦ Read disturb: occurs within the block being read, but on pages that are not being read. Erase returns the cells to the “normal” state
      ◦ Data retention: charge loss or gain occurs in the cell over time. Erase returns the cells to the “normal” state
      ◦ Endurance (wear-out): the cell fails due to charge trapped in the dielectric layer. Not recoverable by erase
      ◦ Other SSD failure modes: handling damage, EOS/ESD, firmware / ASIC failures, other failures
      [Figure: programmed cell after P/E cycling (control gate, dielectric, floating gate, SiO2 gate oxide, source, drain, P-substrate) with trapped charge]
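Because read disturb accumulates per block and is undone by an erase, a common mitigation is to count reads since the last erase and refresh (relocate and erase) a block before it reaches its limit. The sketch below is hypothetical: the class name, the threshold, and the choice of half the 100k MLC-2b read-disturb figure quoted in this deck as a safety margin are illustrative assumptions, not the author's method.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ReadDisturbMonitor:
    """Counts reads per block and flags blocks for refresh (hypothetical policy).

    Once a block has been read `threshold` times since its last erase, its
    valid data should be rewritten elsewhere and the block erased, which
    returns read-disturbed cells to their normal state.
    """
    threshold: int
    reads_since_erase: Dict[int, int] = field(default_factory=dict)

    def on_read(self, block: int) -> bool:
        """Record one page read in `block`; return True if it now needs refresh."""
        count = self.reads_since_erase.get(block, 0) + 1
        self.reads_since_erase[block] = count
        return count >= self.threshold

    def on_erase(self, block: int) -> None:
        """Erasing the block resets its read-disturb budget."""
        self.reads_since_erase[block] = 0

# Assumed margin: refresh MLC-2b blocks at half of a 100k read-disturb limit.
monitor = ReadDisturbMonitor(threshold=50_000)
```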
  • 9. SSD Endurance
      [Figure: failure rate (%) vs. true P/E cycles, time, or GB written, with curves for β < 1, β = 1, and β > 1 and the P/Emax limit marked]
      ◦ Electrical effects: faster programming, because charge is trapped inside the dielectric instead of the floating gate (FG); slower erasure, because the trapped charges are harder to remove than those in the FG
      ◦ Program/Erase (P/E) cycles cause charge to be trapped in the dielectric layer
      ◦ This causes a permanent shift in cell characteristics that is not recovered by erase
      ◦ Observed as failed program or erase status
      ◦ In most cases, data could be recovered from the failed block
      ◦ Blocks that fail should be retired (marked as bad and no longer used)
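The β labels on the endurance chart presumably refer to the Weibull shape parameter: β < 1 gives a decreasing failure rate, β = 1 a constant one, and β > 1 an increasing (wear-out) one. The Weibull hazard function h(t) = (β/η)(t/η)^(β-1) makes this concrete; the characteristic life η = 10,000 P/E cycles below is a hypothetical value.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) of a Weibull(beta, eta) population.

    t can be time, P/E cycles, or GB written, as long as eta uses the same unit.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1.0)

# Hypothetical characteristic life of 10,000 P/E cycles.
ETA = 10_000.0
for beta, label in [(0.5, "beta < 1: decreasing (early life)"),
                    (1.0, "beta = 1: constant (random)"),
                    (3.0, "beta > 1: increasing (wear-out)")]:
    early = weibull_hazard(1_000.0, beta, ETA)
    late = weibull_hazard(9_000.0, beta, ETA)
    print(f"{label}: h(1k)={early:.2e}, h(9k)={late:.2e}")
```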
  • 10. SSD Endurance: Major Factors
      ◦ Stress: number of P/E cycles, made up of external P/E cycles (host write data rate) and internal write multiplication, which depends on external data entropy (block size distribution, application specific) and on internal data handling (data buffering, Flash architecture, etc.)
      ◦ Stress: wear-leveling efficiency (write uniformity across Flash cells)
      ◦ Stress: operating environment, where temperature could both stress and help
      ◦ Strength: Flash endurance robustness; device ECC power; design redundancies or excess capacity; bad block identification and data re-assign mechanism
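Two of the factors above, wear-leveling efficiency and bad block identification, can be illustrated together with a small block allocator. This is a hypothetical sketch of dynamic wear leveling only. Free blocks are handed out least-worn first, and a block whose erase fails is retired instead of being returned to the pool. Static wear leveling, which also moves cold data off lightly worn blocks, is not modeled.

```python
import heapq
from typing import Dict, List, Set, Tuple

class BlockAllocator:
    """Hypothetical dynamic wear-leveling allocator with bad-block retirement.

    Free blocks sit in a min-heap keyed by erase count, so new writes go to
    the least-worn good block. Blocks whose erase fails are retired.
    """

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts: Dict[int, int] = {b: 0 for b in range(num_blocks)}
        self.free: List[Tuple[int, int]] = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)
        self.bad: Set[int] = set()

    def allocate(self) -> int:
        """Return the least-worn free, good block."""
        while self.free:
            _, block = heapq.heappop(self.free)
            if block not in self.bad:  # defensive: retired blocks are never reused
                return block
        raise RuntimeError("no free good blocks left (spare capacity exhausted)")

    def release(self, block: int, erase_ok: bool = True) -> None:
        """Erase a block and return it to the pool, or retire it on failure."""
        if not erase_ok:
            self.bad.add(block)  # marked as bad and no longer used
            return
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))
```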
  • 11. Endurance: SLC vs. MLC (Source: W. Hutsell, Texas Memory Systems)
      ◦ Multi-level cells use different charge levels to store two or more bits in one cell
      ◦ Read/write design margins (and the gaps between the Vt levels) are much smaller for MLC, resulting in lower endurance
      ◦ The transition to MLC would represent a significant reliability challenge
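The margin argument follows from counting states: n bits per cell need 2^n distinguishable threshold-voltage states, so if the usable Vt window stays fixed, each added bit shrinks the spacing between states. The sketch assumes a hypothetical 6 V window and evenly spaced states, a simplification of real Vt distributions.

```python
def vt_levels(bits_per_cell: int) -> int:
    """Number of distinct threshold-voltage states needed for n bits per cell."""
    return 2 ** bits_per_cell

def level_spacing(window_v: float, bits_per_cell: int) -> float:
    """Spacing between adjacent state centers if the window is split evenly.

    Simplifying assumption: states are evenly spaced across a fixed window,
    with the lowest and highest states centered at its edges.
    """
    return window_v / (vt_levels(bits_per_cell) - 1)

WINDOW_V = 6.0  # hypothetical usable Vt window in volts
for name, bits in [("SLC", 1), ("MLC-2b", 2), ("MLC-3b", 3), ("MLC-4b", 4)]:
    print(f"{name}: {vt_levels(bits)} states, ~{level_spacing(WINDOW_V, bits):.2f} V apart")
```

Under these assumptions the spacing falls from 6 V (SLC) to 2 V (MLC-2b), about 0.86 V (MLC-3b), and 0.4 V (MLC-4b), the same direction as the endurance ratings the deck gives for these cell types.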
  • 12. SSD Data Retention
      [Figure: four cell cross-sections: programmed cell; programmed cell after P/E cycling; programmed cell after long NOP (non-operating) storage; programmed cell after P/E cycling and long NOP storage]
      ◦ Non-operating storage causes charge to leak from the floating gate
      ◦ P/E cycling leads to even faster charge dissipation and eventual data loss
  • 13. Data Retention vs. Time and P/E Cycles (Sources: Samsung; Jim Cooke, Micron)
      ◦ P/E cycling shortens data retention
      [Charts: “No P/E cycling impact on endurance” and “Strong endurance dependence on P/E cycling”]
      ◦ Newer technologies shorten data retention
      ◦ Exercising flash reduces its long-term data retention
      ◦ This problem gets worse as Flash scales down (60 nm → 4x nm) and increases in complexity (SLC → MLC)
  • 14. Understand and Overcome Fundamental Technology Limitations
      ◦ Write endurance (max. program/erase cycles): degrades with device scaling; 100k for SLC NAND, 10k for MLC-2b, 1k for MLC-3b, 100 for MLC-4b
      ◦ Data retention: degrades with device scaling and depends on temperature and P/E cycling; 10-year retention at up to 10% of rated P/E cycles, 1-year retention at 100% of rated P/E cycles
      ◦ Read disturb: degrades with device scaling; 1M reads for SLC NAND, 100k for MLC-2b
      ◦ Write multiplication: block erasure might lead to many additional internal writes for every host write
      ◦ Mitigate Flash limitations with advanced reliability and test technologies: static and dynamic wear leveling to maximize the life of the device; write reduction solutions; increased ECC power; an SSD-specific test and qualification process (CERT, DMT, RDT, ORT, etc.)
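The endurance ratings and write multiplication above combine into a first-order lifetime estimate: usable host writes ≈ capacity × rated P/E cycles × wear-leveling efficiency ÷ write multiplication. The sketch uses that approximation with hypothetical inputs (a 64 GB drive, 20 GB of host writes per day, write multiplication of 3, 90% wear-leveling efficiency). It is not the deck's model and ignores ECC, spare capacity, and retention effects.

```python
def lifetime_days(capacity_gb: float, rated_pe_cycles: float,
                  host_gb_per_day: float, write_multiplication: float,
                  wear_leveling_efficiency: float) -> float:
    """First-order endurance life estimate (hypothetical model).

    Flash write budget = capacity x rated P/E cycles, derated by how evenly
    wear leveling spreads writes. Each host GB costs `write_multiplication`
    GB of internal flash writes.
    """
    if write_multiplication < 1 or not 0 < wear_leveling_efficiency <= 1:
        raise ValueError("need write_multiplication >= 1 and 0 < efficiency <= 1")
    flash_write_budget_gb = capacity_gb * rated_pe_cycles * wear_leveling_efficiency
    host_write_budget_gb = flash_write_budget_gb / write_multiplication
    return host_write_budget_gb / host_gb_per_day

# Hypothetical drive; endurance ratings are the ones listed on slide 14.
for name, pe in [("SLC", 100_000), ("MLC-2b", 10_000), ("MLC-3b", 1_000)]:
    years = lifetime_days(64, pe, 20, 3.0, 0.9) / 365
    print(f"{name}: ~{years:.1f} years")
```

With these inputs the MLC-2b case works out to 9,600 days (about 26 years) and the MLC-3b case to about 2.6 years. A drive worn to its full rating also falls to the 1-year retention point listed above, so endurance and retention limits have to be checked together.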
  • 15. Predictive Life Modeling
      ◦ SSD reliability modeling could potentially be more accurate than that for HDD. However, failure mechanisms are highly interdependent and supplier-specific, which makes things difficult
      ◦ Flash component quality and reliability: superb quality control is required to compensate for high lot-to-lot and part-to-part variability in a high-volume environment; the correlation of component reliability to the system, to the field, and to integration needs to be established
      ◦ Standardization of the most critical tests and methodologies: a common language and definitions need to be established
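One common starting point for predictive life modeling is fitting a Weibull distribution to observed failures, for example P/E cycles to failure for a sample of blocks. The sketch below uses median-rank regression with Benard's approximation, chosen for brevity. It is one standard method among several (maximum-likelihood fitting is often preferred, especially with censored data), and `fit_weibull` is a hypothetical name.

```python
import math
from typing import List, Tuple

def fit_weibull(failure_times: List[float]) -> Tuple[float, float]:
    """Estimate Weibull (beta, eta) by median-rank regression.

    Uses Benard's approximation F_i = (i - 0.3) / (n + 0.4) for the i-th
    ordered failure, then fits ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    by least squares. Assumes a complete (uncensored) sample with at least
    two distinct failure times.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failures")
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope of the Weibull plot
    intercept = y_mean - beta * x_mean     # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta
```

A fitted β above 1 on cycles-to-failure data would point to wear-out; supplier-specific and lot-to-lot variability, as noted above, usually means fitting each population separately.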
  • 16. The SSD future is bright and promising but dependent on several critical areas, including reliability
      ◦ The HDD-to-SSD transition rate will be a strong function of the total cost of ownership (TCO)
      ◦ Reliability plays a critical role in reducing the TCO
      ◦ SSD technology scaling is expected to have a negative impact on reliability
      ◦ SSD reliability efforts should focus on the following major areas: endurance; data retention; read / program disturb; reliability-enhancing technologies (wear leveling, ECC, etc.)
      ◦ SSD test standardization is required: no “apples-to-apples” comparison will be possible otherwise, and TCO is difficult to estimate without standard tests
  • 17. Backup
  • 18. JEDEC JC-64.8 was formed to focus on developing and coordinating SSD standards activity
      ◦ JC-64.8 co-chairs: Alvin Cox, Seagate; Scott Graham, Micron
      ◦ Participating companies: Intel, Microsoft, Micron, Samsung, Toshiba, Hitachi, LSI, SanDisk, Seagate, Dell, HP, Tyco, STEC, Marvell, Nvidia and others
      [Figure: JC-64 organization chart with subcommittees JC-64.1 Electrical, JC-64.2 Mechanical, JC-64.3 Host Controller, and JC-64.8 SSD, and task groups including the Editorial TG, Embedded Memory Storage and Removable Memory Cards Roadmap TG, SaS TG, Enabling TG, EJTG: eMMC, MJTG: MMC, UFS TG, and MMCA]
      ◦ JC-64.8 subcommittee scope: Solid State Drives. Define/propose standards for solid state drives used for embedded or removable memory storage leveraging existing storage infrastructure… Include… quality, reliability, durability methods and procedures that are not included in the interface standards…