Performance Tuning
• This session will be delivered in English only
      But I have colleagues from the local office who speak Chinese and
       can help with questions.
• Target audience is intermediate level SQL Server professionals
      SQL Server experts should feel free to attend other sessions
DAT300-5


Performance tuning for non-
beginners but not yet experts
           Joe Yong
           Senior Program Manager
           Microsoft Corp
• Goals
    Understand key performance influencers
    Review common performance monitoring tools and techniques
    Interpret the data and determine which best practices are
     relevant to your environment
    Review actual customer performance tuning mistakes
• Non-goals
    Make you a SQL Server performance monitoring/tuning expert
    Deep dive on features or tools
       o   But feel free to ask questions
    Provide performance troubleshooting magic bullets
    Flashy demos
• Plenty of content for beginners
• Plenty of content for expert level
• Not everybody can jump from A to Z in one step
• A lot of the content in this space is too cryptic and confusing for
  the average Joe
• This Joe does not believe you need to be an expert to find
  performance problems
       Though you might need help fixing them
         o   But at least you know what needs to be fixed
                And you’ll learn to spot real experts vs. those who read a lot of
                 blogs and repeat what they read
•   Understanding performance
•   Measuring and tracking
•   Interpreting the data
•   Practical solutions
• Understanding performance
    What is performance
      Performance influencers
• Measuring and tracking
• Interpreting the data
• Practical solutions
• Fulfilling workload demands within a period deemed acceptable
        or agreed upon by the user1
                 Guaranteed < 2s business transaction response time
                 130,000 database transactions per second
                 Loading >1 billion rows an hour
                 18GB/s storage subsystem
                 75 seconds RTO for SQL Server in a failover cluster
                 Backup 2TB per hour over the network
                 Restore 1.3TB database in under 30 minutes
                 DBCC DBREINDEX on a 3 billion row table (ok, probably not a good
                  idea)

1   DBAs are just another class of user
• Scalability is the level of performance maintained as workload
  influencers increase
      ERP application
        o   5 users: 2 seconds query response time
        o   700 users: 2.3 seconds query response time
    Generate year end report for
        o   300GB database: 85 seconds
        o   1.5TB database: 30 minutes
• Understanding performance
      What is performance
      Performance influencers
• Measuring and tracking
• Interpreting the data
• Practical solutions
• User workload
       DBAs are a special class of users
• System & infrastructure resources
       Server, storage, interconnects, networks
• Optimization
       Workload and resources sized and configured appropriately for your performance
        requirements at your required level (scale)
• Contention
       A performance blocker that highlights a system’s ability to scale (or not)
• Environment
       Mostly stuff beyond your control like data center, ISP, workload source (e.g. end-
        user PC)
• Understanding performance
• Measuring and tracking
• Interpreting the data
• Practical solutions
• Measures will vary depending on user workload
      Accounting will have very different performance metrics from an
       online trading system, and financial analyzer metrics are nothing
       like either of those
• Different perspective of the same workload will have
  different measures
      Retail POS system
        o   Cashier cares about items being scanned and registered quickly
        o   Store manager cares about sales totals at any time
        o   Loss prevention specialist wants to know quickly whether items registered
            in the POS match the decrease in the automated inventory manager
• Workload: measures will vary depending on the workload type
    Accounting will have very different performance metrics from online
     trading, and financial analyzer metrics are nothing like either of those
• User: different perspective of the same application will have different
  measures
    Library information system
         o   Image recording/scanning clerk cares about fast BLOB inserts
         o   Researcher needs fast searches on large data sets across multiple formats
       Retail POS system
         o   Cashier cares about items being scanned and registered quickly
         o   Store manager cares about sales totals at any time
         o   Loss prevention specialist wants to know quickly if items registered in POS
             equals to decrease in automated inventory manager
• Different application admins/owners track different
  performance metrics
      Not always aligned with what users care about
• DBAs have some basic metrics that are “almost” universally
  helpful
    Resource utilization
    Workload response time
    Contention and waits
• Important: performance metrics taken at any point in time
  without a baseline reference are often not useful
• Establishes “known behavior”
      Resource utilization levels for a given workload and scale
        o   Order entry system with 500 concurrent users consumes x GB RAM, y%
            sustained average CPU utilization – certified by ISV to scale almost
            linearly up to 11,000 concurrent users
      User workload query response times
        o   Monday 8am-11am: 2-5 seconds
        o   Friday 12pm-4pm: 5-12 seconds
        o   Other times: 1-3 seconds
• Documents acceptable performance – your performance SLA
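As a sketch of how simple a baseline can be, cumulative wait statistics can be snapshotted into a table on a schedule; the table name and columns below are illustrative assumptions, not from the session:

```sql
-- Illustrative baseline table; the name and schema are assumptions for this sketch
CREATE TABLE dbo.WaitStatsBaseline
(
    capture_time   datetime     NOT NULL DEFAULT (GETDATE()),
    wait_type      nvarchar(60) NOT NULL,
    waiting_tasks  bigint       NOT NULL,
    wait_time_ms   bigint       NOT NULL,
    signal_wait_ms bigint       NOT NULL
);

-- Snapshot the cumulative waits; schedule this (e.g. via SQL Agent)
-- and diff consecutive snapshots to establish "known behavior"
INSERT INTO dbo.WaitStatsBaseline (wait_type, waiting_tasks, wait_time_ms, signal_wait_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats;
```

Because the DMV values are cumulative since the last restart, only the deltas between snapshots are meaningful for a baseline.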
• Networks never have problems
      We use multiple redundant 10Gb/s switches and average reliability
       is around 99.98%, what network problem?
• Storage is never the problem
      We have two 2Gb/s ports from your server to the SAN, you are only
       using 400MB/s, go back and tune your SQL queries
• End users can “instinctively” tell if performance is bad
      Why is it very slow today? It is usually a lot faster. You must have
       done something wrong during the upgrade.
• End users’ PCs/laptops are always working fine and nothing has
  changed (or ever changes)
• Databases are guilty until proven innocent
• Without documented performance metrics and baselines you
  just can’t win
• Especially if you have just performed a major operation (e.g.
  just upgraded to SQL Server 2008 R2)
• Difficult to convince other parties to investigate their areas (see
  previous slide)
•   Major medical health care provider
•   3 months upgrade planning and testing
•   Successful upgrade over weekend
•   Normal operations Monday morning; 6 phone calls and multiple
    emails starting 1:30pm
     Complaints about very slow queries, timeouts, etc…
     Blames DBAs for botching the upgrade
     Managers and senior managers put on CC
• DBA team reviews database server load and resource utilization
    Server not breaking a sweat
    Network operations center (NOC) says network is healthy
    Storage monitor shows light loads
    DBAs run tests from local clients and verified performance metrics
     very close to baseline
       o   Blocker scripts, long running queries, etc… all show nothing suspicious
• Eliminated database, server, storage – focus on network
• 2.5 hours later, faulty switch at Mexico office discovered
    NOC still says network is ok (apparently local network equipment
     problems don’t count)
    Other local US user problems not related
•   Storage
•   Memory
•   CPU
•   Network
• This is the physical foundation of your database
• 80% of deployments (that I’ve seen or worked on) have sub-optimal
  or incorrectly configured storage
      Including those certified by a storage vendor
• Always test (SQLIO tool) before deployment
• Many counters and waitstats data available to monitor
• Start with the basics
    IOPS
    Throughput
    Latency
    Queuing
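Per-file read/write latency can also be derived inside SQL Server from the virtual file stats DMV; this is a sketch, and note the averages are cumulative since the last restart, not a point-in-time reading:

```sql
-- Average read/write latency per database file, derived from cumulative stalls
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY avg_read_latency_ms DESC;
```

Compare the results against the typical latency targets in the table below (log vs. OLTP vs. DSS data files).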
• IOPS
  Data source: Performance Monitor – average disk reads/writes per second
  o Number of I/O requests per second
  o Type and size of disk can affect this greatly
      E.g. a 2.5” SAS disk can sustain ~180 IOPS; a 3.5” 15k rpm SCSI disk sustains ~130 IOPS
• Throughput
  Data source: Performance Monitor – disk read/write bytes per second
  o Bytes of throughput per second
  o Beware of B (bytes) vs b (bits)
      1Gb/s ~ 125MB/s; 3Gb/s ~ 300MB/s
• Latency
  Data source: Performance Monitor – average disk sec per read/write
  o Latency of I/O operations
  o Typical values1: log 1-5ms (ideally <1ms); data (OLTP) 5-20ms (ideally <10ms); data (DSS) 20-30ms
• Queuing
  Data source: Performance Monitor – logical/physical disk: average disk queue length
  o Number of I/O requests that cannot be serviced immediately
  o Value is measured across all disks for direct attached storage
1   SQL Server Pre-deployment IO Best Practices
• Next in line in the list of common performance bottlenecks
• Difficult to really determine how much is enough given SQL
  Server’s memory management approach
      Measure in reverse; look for pressure
• Further complicated with NUMA servers
      One or more nodes may have severe memory pressure while other
       nodes are idle
• Often result in performance problem symptoms in other areas
• Start with the basics
    Available free memory
    Memory pressure hints
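The same counters are also exposed inside SQL Server through a DMV, which is handy when you cannot attach Performance Monitor to the server; a sketch:

```sql
-- SQL Server's own view of the memory-related Performance Monitor counters
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Free list stalls/sec',
                       N'Target Server Memory (KB)',
                       N'Total Server Memory (KB)',
                       N'Memory Grants Pending');
```

A sustained gap between target and total server memory, or rising free list stalls, are the pressure hints described in the table below.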
• Available free memory
  Data sources: Performance Monitor – Memory\Available MBytes; SQLServer:Buffer Manager\Free pages
  o Tracks available system memory for use
  o Free pages is the total number of pages on the buffer pool free list
• Memory pressure hints
  Data sources: Performance Monitor – SQLServer:Buffer Manager\Free list stalls/sec; SQLServer:Memory Manager\Target Server Memory (KB) and Total Server Memory (KB); DMV – sys.dm_os_wait_stats
  o Tracks potential memory pressure by measuring the number of free page requests that are not granted immediately
  o Track what SQL Server thinks it could use (target) vs. what it is currently able to use (total)
  o Track memory grants pending
• Often recommended in public forums, blogs, events as counters
  to determine if memory is adequate
      Cache hit ratio >90%
        o   Highly workload dependent – if the queries do not touch the same
            data, hit ratio will be low but this is not a clear indicator of where the
            performance problem might be
      Page life expectancy >300
        o   Can be workload dependent and behavior is similar to cache hit ratios
      Memory grants pending <2
        o   Theoretically tracks memory requests pending but this can be caused
            by contention (e.g. blocking), not just lack of memory
• Correlate multiple data points; don’t just measure one
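For example, the page life expectancy counter discussed above can be read from a DMV and then correlated with free list stalls and pending memory grants before concluding anything; a sketch:

```sql
-- Read PLE, but treat it as one data point among several, not a verdict
SELECT cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Buffer Manager%'
  AND counter_name = N'Page life expectancy';
```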
• Sustained high activity may or may not indicate bottlenecks or
  problems
• Hardware faults can sometimes result in high CPU utilization
  and/or context switches
• Start with the basics
    Overall utilization
    CPU pressure hints
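One CPU pressure hint available from inside SQL Server is the share of total wait time that is signal wait (time a thread spent waiting for a CPU after its resource was ready); a sketch, with the threshold offered only as a common rule of thumb:

```sql
-- Sustained signal-wait percentage above roughly 25-30% hints at CPU pressure
SELECT 100.0 * SUM(signal_wait_time_ms) / NULLIF(SUM(wait_time_ms), 0)
           AS signal_wait_pct
FROM sys.dm_os_wait_stats;
```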
• Overall utilization
  Data source: Performance Monitor – Processor\% Processor Time
  o Amount of time the CPUs are actually doing work across the entire system
  o Value is for all CPUs in the system
• CPU pressure hints
  Data source: Performance Monitor – System\Processor Queue Length
  o Total number of threads awaiting CPU resources
  o Value is for all CPUs in the system
• Usually one of the easiest resources to manage for performance
• Start with the basics
    Overall utilization
    Device errors


• Be careful with some commonly recommended counters
      Output queue length
        o   Dependent on operating system implementation (E.g. always zero in
            Vista)
• Overall utilization
  Data source: Performance Monitor – Network Interface\Bytes Total/sec
  o Measures the volume of data sent and received by each network adapter, or cumulative for all NICs in the system
• Device errors
  Data source: Windows Event Log – System log: network device errors
  o Non-critical hardware faults are logged here
  o Critical hardware faults can result in a blue screen
• Understanding performance
• Measuring and tracking
• Interpreting the data
• Practical solutions
• Average disk reads/writes per second
  o Ensure system requirements are less than storage capacity
  o If using a SAN, ensure the SAN cache can flush to disk faster than workload requests fill it
• Disk read/write bytes per second
  o Beware of B (bytes) vs b (bits) for bytes per second
      1Gb/s ~ 125MB/s; 3Gb/s ~ 300MB/s
• Average disk sec per read/write
  o Hardware and configuration can affect values
  o Log: 1-5ms (ideally <1ms) – minimal tolerance for exceeding 5ms
  o Data (OLTP): 5-20ms (ideally <10ms) – tolerance is moderate and dependent on the application
  o Data (DSS): 20-30ms – higher values are not uncommon and may not be a performance issue, but try to keep <50ms
• Logical/Physical disk: Average disk queue length
  o Use the logical value for SAN devices and corroborate with SAN-side data
*Beware of polling intervals
• Problems with storage can result in many different symptoms
  across the system
• General statements like
       “…storage performance problem…”
       “…query timed out…”
  are not helpful in most modern systems – get the specifics
    HBA
    Switch
    SAN
       o   Ports
       o   Cache
       o   Disks
• One of the top 10 largest banks in the US
• Commercial customers’ system upgrade: SQL Server & hardware
• Phase 1 (SQL Server) stable – tens of millions transacted daily
• Phase 2 (new hardware) nearing rollout; expecting 200%
  increase in customer base within 6-12 months then stable
• Second largest possible server in the market (at the time) –
  sized to handle ~10x growth
• Top of the line SAN – sized to handle at least 12x growth
• POC stress test consistently collapses at 3x ~ 4x load
     Blamed SQL Server scalability; suggestions to use Other database
     All other parties claim no-fault
• POC background
    Realistic workload capture/replay
    Server burn-in tests onsite (memory, CPU, peripherals)
    Theoretical maximum throughput & capacity for SAN calculated
          o   Alerted sponsor and vendor that the SAN was significantly under-configured –
              dismissed by the vendor, citing the POC team’s lack of knowledge of the brand new SAN
       Actual SAN capacity tests onsite (IOPS, MB/s, latency)
         o   Basic tests support POC team’s calculations
• Full POC tests failed with multiple indicators pointing to SAN overload
    1.5 days before the SAN team finally conceded they had erred in their data
      collection and interpretation (neglecting to mention the original design error)
    This was with detailed documentation and extensive data
• Re-designed SAN with POC team guidance supported new stress tests up to
  7x workload – load generators exhausted
• Chasing symptoms
    Query timeouts
    CXPacket waits – someone read on a blog that this was bad
    Context switching
    Compilation & re-compilations – apparently this is bad too
    Load generators’ CPUs constantly ~90%
    Application’s history of performance issues – send developers for flogging
     training
• Root cause
    Storage unable to keep up
        o   Overlapping stripes – same HDDs striped twice to create LUN
        o   Overestimated SAN cache capacities – not enough HDDs to flush the cache quickly
• Memory\Available MBytes; SQLServer:Buffer Manager\Free pages and Free list stalls/sec; SQLServer:Memory Manager\Target Server Memory (KB) and Total Server Memory (KB)
  o Appropriate values depend on the type of application and the volatility of query activity
  o Reasonable starting points
      Available MBytes: >50
      Free pages: >640
      Free list stalls: <2
      Target/total memory: dependent on total physical memory on the server and the number/type of active workloads
• SQLServer:Buffer Node\Free pages, Foreign pages, and Remote node page lookups/sec
  o Additional counters to monitor for NUMA-based servers
  o Local node memory usage and cross-node usage depend on application behavior
      SQL Server tries to work efficiently within the local node, but application design and/or behavior can prevent this
• Major retail chain running sales monitoring and inventory
  tracking application
• Regular performance degradation but no discernible pattern
  or root cause
• System review indicates storage IO bottlenecks
    CPU utilization low to moderate
    Overall memory utilization low to moderate
• Upgraded SAN; storage IO bottlenecks removed
• Performance issues persist
• Engaged specialists for detailed system review
    Occasional storage bottlenecks still exist
    1-2 out of 4 NUMA nodes experience memory pressure routinely
       o   Induced by application design and data patterns
             o Manually managed workload/worker threads
             o Long running transactions
             o Vastly different data volume from hundreds of sources – from few
               MB to few GB
[Diagram: SMP vs. NUMA memory architecture]
• Symmetric Multiprocessing (SMP): all logical processors (LP 0-7) share one path to memory; front side bus contention increases with higher CPU counts
• Non-Uniform Memory Access (NUMA): logical processors are grouped into nodes, each with local memory; foreign (remote node) memory access is much slower than local memory access
• Processor\% Processor Time
  o Reasonable starting point: <80%, but highly dependent on the nature of the application and spike values
• System\Processor Queue Length
  o Reasonable starting point: <10 per logical CPU
• Network Interface\Bytes Total/sec
  o Compare values to NIC capacity
  o Again, beware of B vs b
• System log: network device errors
  o Device errors are bad news
• There are many more things you can monitor and analyze
      Queries and query plans
      Filestats
      Waitstats
      Index usage
      Statistics
      Extended events
• These will help you identify the root cause of the problems
       Always use resource utilization data together with waitstats data in the context
        of the specific application
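As a starting point for the waitstats analysis mentioned above, the top waits can be listed directly; the exclusion list below is a small illustrative sample of benign system waits, not an exhaustive one:

```sql
-- Top waits since the last restart, splitting resource wait from signal (CPU) wait
SELECT TOP 10
       wait_type,
       wait_time_ms,
       wait_time_ms - signal_wait_time_ms AS resource_wait_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'SQLTRACE_BUFFER_FLUSH', N'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;
```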
• Understanding performance
• Measuring and tracking
• Interpreting the data
• Practical solutions
• Default reports
    Build your own once you’re familiar with them
    Don’t run them too frequently
• Profiler (or SQL Trace) + Performance Monitor
      Correlate data from these tools
• Performance data warehouse or scheduled DMV snapshots
• Really useful once you get past the basics; for example:
• Top 50 statements by IO
SELECT TOP 50
       (qs.total_logical_reads + qs.total_logical_writes)
           / qs.execution_count AS [Avg IO],
       SUBSTRING(qt.text, qs.statement_start_offset / 2,
                 (CASE WHEN qs.statement_end_offset = -1
                       THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2
                       ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2) AS query_text,
       qt.dbid,
       qt.objectid
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY [Avg IO] DESC;
• Pending I/O requests per database file
SELECT t1.database_id,
       t1.file_id,
       t1.io_stall,
       t2.io_pending_ms_ticks,
       t2.scheduler_address
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS t1
JOIN sys.dm_io_pending_io_requests AS t2
  ON t1.file_handle = t2.io_handle;
• Top tempdb space consumers by session
SELECT t1.session_id,
       (t1.internal_objects_alloc_page_count + t2.task_alloc) AS allocated,
       (t1.internal_objects_dealloc_page_count + t2.task_dealloc) AS deallocated,
       t3.sql_handle
FROM sys.dm_db_session_space_usage AS t1
JOIN (SELECT session_id,
             SUM(internal_objects_alloc_page_count)   AS task_alloc,
             SUM(internal_objects_dealloc_page_count) AS task_dealloc
      FROM sys.dm_db_task_space_usage
      GROUP BY session_id) AS t2
  ON t1.session_id = t2.session_id
JOIN sys.dm_exec_requests AS t3
  ON t1.session_id = t3.session_id
WHERE t1.session_id > 50
  AND t1.database_id = 2   -- tempdb is database_id = 2
ORDER BY allocated DESC;
• Indexes not used since the last restart
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name
FROM sys.indexes AS i
JOIN sys.objects AS o
  ON o.object_id = i.object_id
WHERE o.type = 'U'
  AND i.index_id NOT IN
      (SELECT s.index_id
       FROM sys.dm_db_index_usage_stats AS s
       WHERE s.object_id = i.object_id
         AND s.index_id = i.index_id
         AND s.database_id = <dbid>)
ORDER BY OBJECT_NAME(i.object_id) ASC;
• Hardware vendors have built-in tests for most environments
• More thorough testing can be performed by vendor or certified
  partners
• Global bank with extensive SQL Server and other database
  deployments
• Major SQL Server infrastructure overhaul and consolidation
  preparation
    Largest x64 server in the market – fully loaded
    Latest enterprise SAN – almost fully loaded
    Windows Server 2008 R2 and SQL Server 2008
• HA and DR by design
    Large multi-node failover cluster
    SAN replication for DR
• System design reviewed and approved by specialists from MS
      Secondary review and approval by independent SQL Server experts
• Servers & storage setup and configured by specialists from
  hardware principal and SI partner
      All the latest & greatest from server DIMM to physical HDD
• Never ending problems with performance (right from the start)
    Unable to complete even one full cycle of performance testing
    Intermittent cluster failures, occasional system dumps, several
     bluescreens
    Some failures require full system rebuild – 2 days each time
• Unable to even hand over to the QA team after 2 months
• Chasing symptoms
    Software setup taking a really long time – blamed storage
     performance
    Bluescreens – blamed hardware, Windows
    Intermittent cluster failures – blamed use of latest and greatest of
     everything
    Never before seen such high level of instability/problems – blamed
     everybody (including performance test team for system overload)
• Root cause
      Poorly seated components in server discovered with hardware
       diagnostic tools ~ 2 days work
        o   Strip, rebuild, burn-in, full-test – handover to QA in < 2 weeks
• Performance monitoring and troubleshooting can be done by
  the average Joe
• Baseline data is critical
• Start with the basics
      The “capture everything including the kitchen sink” approach is very
       intrusive and usually not helpful
        o   If you must capture everything (sometimes it’s right), analyze in layers
• Do not draw conclusions based on singular data points or using
  raw data by itself
• Leverage what already exists before creating your own
• What and how are easy questions, why and why not are the
  tough questions
• Troubleshooting performance problems in
       SQL Server 2008
        http://msdn.microsoft.com/en-us/library/dd672789.aspx
       SQL Server 2005
        http://technet.microsoft.com/en-us/library/cc966540.aspx
• SQL Server Pre-deployment IO Best Practices
       http://sqlcat.com/whitepapers/archive/2008/10/03/running-sql-server-2008-in-a-hyper-v-
        environment-best-practices-and-performance-recommendations.aspx
• SQL Server waits and queues
       http://sqlcat.com/whitepapers/archive/2007/11/19/sql-server-2005-waits-and-queues.aspx
• SQLCAT best practices and high watermark customer experiences
       http://sqlcat.com/Default.aspx
Questions?
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should
 not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
                                                                           IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Average Active Sessions - OaktableWorld 2013
 
Gruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in TelcoGruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in Telco
 

Mehr von Rhythm Sun

长连接服务 WebSocket Service
长连接服务 WebSocket Service长连接服务 WebSocket Service
长连接服务 WebSocket ServiceRhythm Sun
 
Trello workflow by @imRhythm
Trello workflow by @imRhythmTrello workflow by @imRhythm
Trello workflow by @imRhythmRhythm Sun
 
Bitcoin and retail
Bitcoin and retailBitcoin and retail
Bitcoin and retailRhythm Sun
 
Garage cafe keynote peak ji_no_video
Garage cafe keynote peak ji_no_videoGarage cafe keynote peak ji_no_video
Garage cafe keynote peak ji_no_videoRhythm Sun
 
Doc 2011101411284862
Doc 2011101411284862Doc 2011101411284862
Doc 2011101411284862Rhythm Sun
 
Doc 2010050608572429
Doc 2010050608572429Doc 2010050608572429
Doc 2010050608572429Rhythm Sun
 
火狐2011 sfd讲稿
火狐2011 sfd讲稿火狐2011 sfd讲稿
火狐2011 sfd讲稿Rhythm Sun
 
Customize snipmate
Customize snipmateCustomize snipmate
Customize snipmateRhythm Sun
 

Mehr von Rhythm Sun (11)

长连接服务 WebSocket Service
长连接服务 WebSocket Service长连接服务 WebSocket Service
长连接服务 WebSocket Service
 
Trello workflow by @imRhythm
Trello workflow by @imRhythmTrello workflow by @imRhythm
Trello workflow by @imRhythm
 
Bitcoin and retail
Bitcoin and retailBitcoin and retail
Bitcoin and retail
 
Garage cafe keynote peak ji_no_video
Garage cafe keynote peak ji_no_videoGarage cafe keynote peak ji_no_video
Garage cafe keynote peak ji_no_video
 
Beginning git
Beginning gitBeginning git
Beginning git
 
Doc 2011101411284862
Doc 2011101411284862Doc 2011101411284862
Doc 2011101411284862
 
Doc 2010050608572429
Doc 2010050608572429Doc 2010050608572429
Doc 2010050608572429
 
Outside
OutsideOutside
Outside
 
火狐2011 sfd讲稿
火狐2011 sfd讲稿火狐2011 sfd讲稿
火狐2011 sfd讲稿
 
Customize snipmate
Customize snipmateCustomize snipmate
Customize snipmate
 
Zsh
ZshZsh
Zsh
 

Kürzlich hochgeladen

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 

Kürzlich hochgeladen (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 

Doc 2011101412020074

      Loading >1 billion rows an hour
      18GB/s storage subsystem
      75 seconds RTO for SQL Server in a failover cluster
      Backup 2TB per hour over the network
      Restore a 1.3TB database in under 30 minutes
      DBCC DBREINDEX on a 3 billion row table (ok, probably not a good idea)
1 DBAs are just another class of user
• Scalability is the level of performance maintained as workload influencers increase
      ERP application
       o   5 users: 2 seconds query response time
       o   700 users: 2.3 seconds query response time
      Generate year-end report for
       o   300GB database: 85 seconds
       o   1.5TB database: 30 minutes
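One way to put numbers like these in context is to compare response-time degradation against load growth. A minimal sketch, using the ERP figures from the example above (the metric and function name are mine, not part of the deck):

```python
def response_time_degradation(baseline_seconds, loaded_seconds):
    """Percent increase in response time between two load levels."""
    return (loaded_seconds - baseline_seconds) / baseline_seconds * 100.0

# ERP example: 2.0s at 5 users vs 2.3s at 700 users
pct = response_time_degradation(2.0, 2.3)
print(f"{pct:.0f}% slower at 140x the user count")  # 15% - scales well
```

By the same metric, the year-end report example (85 seconds to 30 minutes) degrades by roughly 2000%, which is why scale must always be stated alongside a performance number.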
• Understanding performance
      What is performance
      Performance influencers
• Measuring and tracking
• Interpreting the data
• Practical solutions
• User workload
      DBAs are a special class of users
• System & infrastructure resources
      Server, storage, interconnects, networks
• Optimization
      Workload and resources sized and configured appropriately for your performance requirements at your required level (scale)
• Contention
      A performance blocker that highlights a system’s ability to scale (or not)
• Environment
      Mostly stuff beyond your control like data center, ISP, workload source (e.g. end-user PC)
• Understanding performance
• Measuring and tracking
• Interpreting the data
• Practical solutions
• Measures will vary depending on user workload
      Accounting will have very different performance metrics from an online trading system – financial analyzer metrics are nothing like either of those
• Different perspectives of the same workload will have different measures
      Retail POS system
       o   Cashier cares about items being scanned and registered quickly
       o   Store manager cares about sales totals at any time
       o   Loss prevention specialist wants to know quickly if items registered in the POS equal the decrease in the automated inventory manager
• Workload: measures will vary depending on the workload type
      Accounting will have very different performance metrics from online trading – financial analyzer metrics are nothing like either of those
• User: different perspectives of the same application will have different measures
      Library information system
       o   Image recording/scanning clerk cares about fast BLOB inserts
       o   Researcher needs fast searches on large data sets across multiple formats
      Retail POS system
       o   Cashier cares about items being scanned and registered quickly
       o   Store manager cares about sales totals at any time
       o   Loss prevention specialist wants to know quickly if items registered in the POS equal the decrease in the automated inventory manager
• Different application admins/owners track different performance metrics
      Not always aligned with what users care about
• DBAs have some basic metrics that are “almost” universally helpful
      Resource utilization
      Workload response time
      Contention and waits
• Important: performance metrics taken at any point in time without a baseline reference are often not useful
• Establishes “known behavior”
      Resource utilization levels for a given workload and scale
       o   Order entry system with 500 concurrent users consumes x GB RAM, y% sustained average CPU utilization – certified by ISV to scale almost linearly up to 11,000 concurrent users
      User workload query response times
       o   Monday 8am-11am: 2-5 seconds
       o   Friday 12pm-4pm: 5-12 seconds
       o   Other times: 1-3 seconds
• Documents acceptable performance – your performance SLA
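A documented baseline like the one above can be checked mechanically. This is only a sketch: the window names and SLA table are hypothetical, with the ranges taken from the response-time examples on this slide:

```python
# Hypothetical SLA table: acceptable query response times (seconds)
# per time window, as documented in the baseline.
BASELINE = {
    "mon_morning": (2.0, 5.0),    # Monday 8am-11am
    "fri_afternoon": (5.0, 12.0), # Friday 12pm-4pm
    "default": (1.0, 3.0),        # other times
}

def within_baseline(window, observed_seconds):
    """True if an observed response time falls inside the documented SLA."""
    low, high = BASELINE.get(window, BASELINE["default"])
    return low <= observed_seconds <= high

print(within_baseline("fri_afternoon", 11.0))  # True: slow, but expected on a Friday
print(within_baseline("default", 11.0))        # False: outside baseline, investigate
```

The same 11-second query is acceptable in one window and an incident in another, which is exactly why a point-in-time metric without the baseline tells you very little.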
• Networks never have problems
      We use multiple redundant 10Gb/s switches and average reliability is around 99.98%, what network problem?
• Storage is never the problem
      We have two 2Gb/s ports from your server to the SAN, you are only using 400MB/s, go back and tune your SQL queries
• End users can “instinctively” tell if performance is bad
      Why is it very slow today? It is usually a lot faster. You must have done something wrong during the upgrade.
• End users’ PCs/laptops are always working fine and nothing has changed (or ever changes)
• Databases are guilty until proven innocent
• Without documented performance metrics and baselines you just can’t win
• Especially if you have just performed a major operation (e.g. just upgraded to SQL Server 2008 R2)
• Difficult to convince other parties to investigate their areas (see previous slide)
Major medical health care provider
• 3 months of upgrade planning and testing
• Successful upgrade over the weekend
• Normal operations Monday morning; 6 phone calls and multiple emails starting 1:30pm
      Complaints about very slow queries, timeouts, etc.
      Blames DBAs for botching the upgrade
      Managers and senior managers put on CC
• DBA team reviews database server load and resource utilization
      Server not breaking a sweat
      Network operations center (NOC) says network is healthy
      Storage monitor shows light loads
      DBAs run tests from local clients and verify performance metrics are very close to baseline
       o   Blocker scripts, long-running queries, etc. all show nothing suspicious
• Eliminated database, server, storage – focus on network
• 2.5 hours later, faulty switch at Mexico office discovered
      NOC still says network is ok (apparently local network equipment problems don’t count)
      Other local US user problems not related
• Storage
• Memory
• CPU
• Network
• This is the physical foundation of your database
• 80% of deployments (that I’ve seen or worked on) have sub-optimal or incorrectly configured storage
      Including those certified by a storage vendor
• Always test (SQLIO tool) before deployment
• Many counters and waitstats data available to monitor
• Start with the basics
      IOPS
      Throughput
      Latency
      Queuing
• IOPS
      Data source: Performance Monitor – Average disk reads/writes per second
      Considerations
       o   Number of I/O requests per second
       o   Type of disk and size can affect this greatly
       o   E.g. 2.5” SAS can sustain ~180 IOPS, 3.5” 15k rpm SCSI sustains ~130 IOPS
• Throughput
      Data source: Performance Monitor – Disk read/write bytes per second
      Considerations
       o   Bytes throughput per second
       o   Beware of B vs b: 1Gb/s ~ 125MB/s, 3Gb/s ~ 300MB/s
• Latency
      Data source: Performance Monitor – Average disk sec per read/write
      Considerations
       o   Latency of I/O operations; typically1
       o   Log: 1-5ms (ideally <1ms)
       o   Data (OLTP): 5-20ms (ideally <10ms)
       o   Data (DSS): 20-30ms
• Queuing
      Data source: Performance Monitor – Logical/Physical disk: Average disk queue length
      Considerations
       o   Number of I/O requests that cannot be serviced immediately
       o   Value is measured across all disks for direct attached storage
1 SQL Server Pre-deployment IO Best Practices
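Two of the arithmetic traps above (bits vs bytes, and per-disk IOPS limits) are easy to sketch. This back-of-the-envelope version ignores RAID write penalties and cache effects, and uses the ~180 IOPS per-drive figure only because the slide cites it for a 2.5" SAS drive:

```python
import math

def gbps_to_mbytes_per_sec(gigabits_per_sec):
    """Gb/s to MB/s: beware of B vs b - divide the bit rate by 8."""
    return gigabits_per_sec * 1000.0 / 8.0

def disks_for_iops(target_iops, iops_per_disk):
    """Rough spindle count needed to sustain a target IOPS figure
    (no RAID penalty or cache modeled - a sizing floor, not a design)."""
    return math.ceil(target_iops / iops_per_disk)

print(gbps_to_mbytes_per_sec(1.0))  # 125.0 MB/s on a 1Gb/s link
print(disks_for_iops(5000, 180))    # 28 x 2.5" SAS drives at ~180 IOPS each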
• Next in line in the list of common performance bottlenecks
• Difficult to really determine how much is enough given SQL Server’s memory management approach
      Measure in reverse; look for pressure
• Further complicated on NUMA servers
      One or more nodes may have severe memory pressure while other nodes are idle
• Often results in performance problem symptoms in other areas
• Start with the basics
      Available free memory
      Memory pressure hints
• Available free memory
      Data source: Performance Monitor – Memory\Available MBytes, SQLServer:Buffer Manager\Free pages
      Considerations
       o   Tracks available system memory for use
       o   Total number of pages on the free list
• Memory pressure hints
      Data source: Performance Monitor – SQLServer:Buffer Manager\Free list stalls/sec, SQLServer:Memory Manager\Target Server Memory (KB) and SQLServer:Memory Manager\Total Server Memory (KB); DMV – sys.dm_os_wait_stats
      Considerations
       o   Tracks potential memory pressure by measuring the number of free page requests that are not granted immediately
       o   Track what SQL Server thinks it could use vs what it currently is able to use
       o   Track memory grants pending
• Often recommended in public forums, blogs, and events as counters to determine if memory is adequate
      Cache hit ratio >90%
       o   Highly workload dependent – if the queries do not touch the same data, the hit ratio will be low, but this is not a clear indicator of where the performance problem might be
      Page life expectancy >300
       o   Can be workload dependent and behavior is similar to cache hit ratios
      Memory grants pending <2
       o   Theoretically tracks memory requests pending, but this can be caused by contention (e.g. blocking), not just lack of memory
• Correlate multiple data points; don’t just measure one
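The advice to correlate multiple data points can be made concrete with a toy rule: only suspect memory pressure when several independent hints agree. The thresholds below are the commonly quoted figures from this slide, used purely for illustration, not as hard limits:

```python
def memory_pressure_suspected(cache_hit_ratio_pct, page_life_expectancy_s, grants_pending):
    """Require at least two corroborating hints before suspecting
    memory pressure; any single counter can mislead on its own."""
    hints = [
        cache_hit_ratio_pct < 90.0,    # workload dependent
        page_life_expectancy_s < 300,  # workload dependent
        grants_pending >= 2,           # can also be contention, not memory
    ]
    return sum(hints) >= 2

print(memory_pressure_suspected(85.0, 1200, 0))  # False: low hit ratio alone proves nothing
print(memory_pressure_suspected(85.0, 120, 5))   # True: multiple hints agree
```

The exact voting rule matters less than the principle: a verdict based on one counter is a guess, one corroborated by several is a lead worth chasing.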
• Sustained high activity may or may not indicate bottlenecks or problems
• Hardware faults can sometimes result in high CPU utilization and/or context switches
• Start with the basics
      Overall utilization
      CPU pressure hints
• Overall utilization
      Data source: Performance Monitor – Processor:%Processor Time
      Considerations
       o   Amount of time the CPUs are actually doing work across the entire system
       o   Value is for all CPUs in the system
• CPU pressure hints
      Data source: Performance Monitor – System:Processor Queue Length
      Considerations
       o   Total number of threads awaiting CPU resources
       o   Value is for all CPUs in the system
• Usually one of the easiest resources to manage for performance
• Start with the basics
      Overall utilization
      Device errors
• Be careful with some commonly recommended counters
      Output queue length
       o   Dependent on operating system implementation (e.g. always zero in Vista)
• Overall utilization
      Data source: Performance Monitor – Network Interface: Bytes Total/sec
      Considerations
       o   Measures volume of data sent and received by each network adapter, or cumulative for all NICs in the system
• Device errors
      Data source: Windows Event Log – System logs: network device errors
      Considerations
       o   Non-critical hardware faults are logged here
       o   Critical hardware faults can result in a blue-screen
• Understanding performance
• Measuring and tracking
• Interpreting the data
• Practical solutions
• Average disk read/write per second
      Ensure system requirements are less than storage capacity
      If using a SAN, ensure the SAN cache can flush to disk faster than workload requests fill up the SAN cache
• Disk read/write bytes per second
      Beware of B vs b for bytes/second: 1Gb/s ~ 125MB/s, 3Gb/s ~ 300MB/s
• Average disk sec per read/write
      Hardware and configuration can affect values
      Log: 1-5ms (ideally <1ms) – minimal tolerance for exceeding 5ms
      Data (OLTP): 5-20ms (ideally <10ms) – moderate tolerance levels, dependent on application
      Data (DSS): 20-30ms – higher values are not uncommon and may not be a performance issue, but try to keep <50ms
• Logical/Physical disk: Average disk queue length
      Use the logical value for SAN devices and corroborate with SAN data
• Beware of polling intervals
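The latency bands above translate directly into a check against the PerfMon average-disk-sec counters, which report in seconds. The bands are this slide's rough guidance, not hard limits, and the file-type names are mine:

```python
# Upper bounds in milliseconds per file type (illustrative guidance, not hard limits)
LATENCY_BANDS_MS = {"log": 5.0, "oltp_data": 20.0, "dss_data": 30.0}

def latency_within_band(file_type, avg_disk_sec):
    """avg_disk_sec: a PerfMon 'Avg. Disk sec/Read(Write)' sample, in seconds."""
    return avg_disk_sec * 1000.0 <= LATENCY_BANDS_MS[file_type]

print(latency_within_band("log", 0.003))        # True: 3ms log latency is fine
print(latency_within_band("oltp_data", 0.045))  # False: 45ms on OLTP data, investigate
```

Note the unit conversion: a counter value of 0.045 reads as harmless until you realize it means 45 milliseconds per I/O.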
• Problems with storage can result in many different symptoms across the system
• General statements like “…storage performance problem…” or “…query timed out…” are not helpful in most modern systems – get the specifics
      HBA
      Switch
      SAN
       o   Ports
       o   Cache
       o   Disks
• Top 10 largest banks in the US
• Commercial customers system upgrade: SQL Server & hardware
• Phase 1 (SQL Server) stable – tens of millions transacted daily
• Phase 2 (new hardware) nearing rollout; expecting 200% increase in customer base within 6-12 months, then stable
• Second largest possible server in the market (at the time) – sized to handle ~10x growth
• Top of the line SAN – sized to handle at least 12x growth
• POC stress test consistently collapses at 3x ~ 4x load
      Blamed SQL Server scalability; suggestions to use Other database
      All other parties claim no fault
• POC background
      Realistic workload capture/replay
      Server burn-in tests onsite (memory, CPU, peripherals)
      Theoretical maximum throughput & capacity for SAN calculated
       o   Alerted sponsor and vendor that the SAN was significantly under-configured – dismissed by the vendor citing the POC team’s lack of knowledge of the brand new SAN
      Actual SAN capacity tests onsite (IOPS, MB/s, latency)
       o   Basic tests support the POC team’s calculations
• Full POC tests fail with multiple indicators pointing to SAN overload
      1.5 days before the SAN team finally conceded they had erred in their data collection and interpretation (neglected to mention the original design error)
      This was with detailed documentation and extensive data
• Re-designed SAN with POC team guidance supported new stress tests up to 7x workload – load generators exhausted
• Chasing symptoms
      Query timeouts
      CXPACKET waits – someone read on a blog that this was bad
      Context switching
      Compilations & re-compilations – apparently this is bad too
      Load generators’ CPUs constantly ~90%
      Application’s history of performance issues – send developers for flogging training
• Root cause
      Storage unable to keep up
       o   Overlapping stripes – same HDDs striped twice to create LUNs
       o   Overestimated SAN cache capacities – not enough HDDs to flush the cache quickly
• Memory\Available MBytes, SQLServer:Buffer Manager\Free pages, SQLServer:Buffer Manager\Free list stalls/sec, SQLServer:Memory Manager\Target Server Memory (KB) and SQLServer:Memory Manager\Total Server Memory (KB)
      Appropriate values are dependent on the type of application and the volatility of query activity
      Reasonable starting points
       o   Available MBytes: >50
       o   Free pages: >640
       o   Free list stalls: <2
       o   Target/total memory: dependent on total physical memory on the server and the number/type of active workloads
• SQLServer:Buffer Node\Free pages, SQLServer:Buffer Node\Foreign pages, SQLServer:Buffer Node\Remote node page lookups/sec
      Additional counters to monitor for NUMA-based servers
      Local node memory usage and cross-node usage are dependent on application behavior
      SQL Server tries to work efficiently within the local node, but application design and/or behavior can prevent this
• Major retail chain running a sales monitoring and inventory tracking application
• Regular performance degradation but no discernible pattern or root cause
• System review indicates storage IO bottlenecks
      CPU utilization low to moderate
      Overall memory utilization low to moderate
• Upgraded SAN; storage IO bottlenecks removed
• Performance issues persist
• Engaged specialists for a detailed system review
      Occasional storage bottlenecks still exist
      1-2 out of 4 NUMA nodes routinely experience memory pressure
       o   Induced by application design and data patterns
       o   Manually managed workload/worker threads
       o   Long-running transactions
       o   Vastly different data volumes from hundreds of sources – from a few MB to a few GB
[Diagram] Symmetric Multiprocessing (SMP) architecture: all logical processors (LP 0-7) share a single front-side bus to memory; bus contention increases with higher numbers of CPUs.
[Diagram] Non-Uniform Memory Access (NUMA): logical processors are grouped into nodes, each with its own local memory; foreign (remote-node) memory access is far more expensive than local memory access.
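The cost of foreign access in the NUMA picture can be illustrated with a blended-latency calculation. The 100ns/300ns figures below are hypothetical, chosen only to show the shape of the effect; real local-to-foreign ratios vary by hardware:

```python
def blended_latency_ns(local_ns, foreign_ns, foreign_fraction):
    """Average memory latency when some fraction of lookups land on
    a remote NUMA node (all latency figures are hypothetical)."""
    return local_ns * (1.0 - foreign_fraction) + foreign_ns * foreign_fraction

print(blended_latency_ns(100, 300, 0.0))  # 100.0 - an all-local workload
print(blended_latency_ns(100, 300, 0.5))  # 200.0 - half the lookups cross nodes
```

A workload that drifts from mostly-local to half-foreign access can double its effective memory latency without any counter showing the box as "out of memory", which is why the per-node Buffer Node counters matter on NUMA servers.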
• Processor:%Processor Time
      Reasonable starting point: <80%, but highly dependent on the nature of the application and spike values
• System:Processor Queue Length
      Reasonable starting point: <10 per logical CPU
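The two starting points combine into a trivial screening check. The thresholds are the slide's rule-of-thumb suggestions and should be tuned per workload:

```python
def cpu_pressure_suspected(pct_processor_time, processor_queue_length, logical_cpus):
    """Flag possible CPU pressure using rule-of-thumb limits:
    >=80% sustained utilization, or >=10 queued threads per logical CPU."""
    per_cpu_queue = processor_queue_length / logical_cpus
    return pct_processor_time >= 80.0 or per_cpu_queue >= 10.0

print(cpu_pressure_suspected(65.0, 16, 8))  # False: only ~2 queued threads per LP
print(cpu_pressure_suspected(92.0, 16, 8))  # True: sustained high utilization
```

Note that Processor Queue Length is reported for the whole system, so it must be divided by the logical CPU count before comparing against the per-CPU threshold.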
• Network Interface: Bytes Total/sec
      Compare values to NIC capacity
      Again, beware of B vs b
• System logs: network device errors
      Device errors are bad news
• There are many more things you can monitor and analyze
      Queries and query plans
      Filestats
      Waitstats
      Index usage
      Statistics
      Extended events
• These will help you identify the root cause of the problems
      Always use resource utilization data together with waitstats data, in the context of the specific application
• Understanding performance
• Measuring and tracking
• Interpreting the data
• Practical solutions
• Default reports
      Build your own once you’re familiar with them
      Don’t run these too frequently
• Profiler (or SQL Trace) + Performance Monitor
      Correlate data from these tools
• Performance data warehouse or scheduled DMV snapshots
• Really useful once you get past the basics; for example:
• Top 50 statements by IO
SELECT TOP 50
    (qs.total_logical_reads + qs.total_logical_writes) / qs.execution_count AS [Avg IO],
    SUBSTRING(qt.text, qs.statement_start_offset/2,
        (CASE WHEN qs.statement_end_offset = -1
              THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2
              ELSE qs.statement_end_offset
         END - qs.statement_start_offset)/2) AS query_text,
    qt.dbid,
    qt.objectid
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY [Avg IO] DESC
SELECT
    database_id,
    file_id,
    io_stall,
    io_pending_ms_ticks,
    scheduler_address
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS t1,
    sys.dm_io_pending_io_requests AS t2
WHERE t1.file_handle = t2.io_handle
• 49.
-- tempdb internal-object page allocations per session, highest first
SELECT t1.session_id,
       (t1.internal_objects_alloc_page_count + t2.task_alloc) AS allocated,
       (t1.internal_objects_dealloc_page_count + t2.task_dealloc) AS deallocated,
       t3.sql_handle
FROM sys.dm_db_session_space_usage AS t1,
     sys.dm_exec_requests AS t3,
     (SELECT session_id,
             SUM(internal_objects_alloc_page_count) AS task_alloc,
             SUM(internal_objects_dealloc_page_count) AS task_dealloc
      FROM sys.dm_db_task_space_usage
      GROUP BY session_id) AS t2
WHERE t1.session_id = t2.session_id
  AND t1.session_id > 50          -- exclude system sessions
  AND t1.database_id = 2          -- tempdb is database_id = 2
  AND t1.session_id = t3.session_id
ORDER BY allocated DESC
• 50.
-- User-table indexes with no recorded usage since the last restart
SELECT OBJECT_NAME(i.object_id), i.name
FROM sys.indexes AS i,
     sys.objects AS o
WHERE i.index_id NOT IN
      (SELECT s.index_id
       FROM sys.dm_db_index_usage_stats AS s
       WHERE s.object_id = i.object_id
         AND i.index_id = s.index_id
         AND s.database_id = <dbid>)
  AND o.type = 'U'
  AND o.object_id = i.object_id
ORDER BY OBJECT_NAME(i.object_id) ASC
  • 51. • Hardware vendors have built-in tests for most environments • More thorough testing can be performed by vendor or certified partners
  • 52. • Global bank with extensive SQL Server and other database deployments • Major SQL Server infrastructure overhaul and consolidation preparation  Largest x64 server in the market – fully loaded  Latest enterprise SAN – almost fully loaded  Windows Server 2008 R2 and SQL Server 2008 • HA and DR by design  Large multi-node failover cluster  SAN replication for DR
• 53. • System design reviewed and approved by specialists from MS
 Secondary review and approval by independent SQL Server experts
• Servers and storage set up and configured by specialists from the hardware principal and SI partner
 All the latest and greatest, from server DIMMs to physical HDDs
• Never-ending performance problems (right from the start)
 Unable to complete even one full cycle of performance testing
 Intermittent cluster failures, occasional system dumps, several bluescreens
 Some failures required a full system rebuild – 2 days each time
• Unable to even hand over to the QA team after 2 months
• 54. • Chasing symptoms
 Software setup taking a really long time – blamed storage performance
 Bluescreens – blamed hardware and Windows
 Intermittent cluster failures – blamed the use of the latest and greatest of everything
 Never seen such a high level of instability and problems before – blamed everybody (including the performance test team for overloading the system)
• Root cause
 Poorly seated components in the server, discovered with hardware diagnostic tools – about 2 days' work
o Strip, rebuild, burn-in, full test – handed over to QA in < 2 weeks
• 55. • Performance monitoring and troubleshooting can be done by the average Joe
• Baseline data is critical
• Start with the basics
 A "capture everything including the kitchen sink" approach is very intrusive and usually not helpful
o If you must capture everything (sometimes that is the right call), analyze in layers
• Do not draw conclusions from singular data points or from raw data by itself
• Leverage what already exists before creating your own
• What and how are the easy questions; why and why not are the tough ones
• 56. • Troubleshooting performance problems in
 SQL Server 2008: http://msdn.microsoft.com/en-us/library/dd672789.aspx
 SQL Server 2005: http://technet.microsoft.com/en-us/library/cc966540.aspx
• SQL Server Pre-deployment IO Best Practices
 http://sqlcat.com/whitepapers/archive/2008/10/03/running-sql-server-2008-in-a-hyper-v-environment-best-practices-and-performance-recommendations.aspx
• SQL Server waits and queues
 http://sqlcat.com/whitepapers/archive/2007/11/19/sql-server-2005-waits-and-queues.aspx
• SQLCAT best practices and high watermark customer experiences
 http://sqlcat.com/Default.aspx
  • 58. © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.