Databases
For storage People



    Thomas Kejser
  thomas@kejser.org
 http://blog.kejser.org
    @thomaskejser
Agenda

• The Microsoft Database Stack
• Hard problems the database solves
• File layout and I/O pattern
  • Data and Log Files
  • Analysis Services Files
  • TempDb and other system databases


• Installation of SQL
• Q&A
The SQL Server Stack
Product Portfolio
• SQL Server (aka: Core Engine)
• SQL Server Analysis Services (SSAS)
    • Tabular
    • Multi Dimensional
•   SQL Server Service Broker (SSB)
•   SQL Server Integration Services (SSIS)
•   SQL Server Reporting Services (SSRS)
•   SQL Server Data Quality Tools
•   SQL Server Master Data Services
•   SQL Server Parallel Data Warehouse
•   .NET stuff…
•   Various Excel plug-ins

• A “full” stack!
What Type of Workload?

                       Data Touched: Small   Data Touched: Big
Data Returned: Big     Simulation            ETL
Data Returned: Small   OLTP                  BI/DW
A Template OLTP System

• “App” tier
  • .NET applications
  • Web Server Windows License
• Database Tier
  • Core
  • Web/Core Licensing
  • 2 or 4 sockets
A Template Data Warehouse

Tier         Integration Tier   “Enterprise” Warehouse Tier   BI / Presentation / Cubes
Components   SSIS               Core                          SSAS, SSRS, Core
Hardware     Blades             Large machines                Medium Servers
CPU          CPU Intensive      VERY CPU greedy
I/O          low IOPS           VERY I/O greedy (GB/sec)      Can be IOPS greedy
Fast Track Data Warehouses
A Template MPP Warehouse

• Integration: SSIS
• Enterprise Warehouse Tier: Appliance (The “hub”)
• Data Marts (The “spokes”): SSAS, Core
Management Tools you Need to Know

Pre 2012                            2012
Management Studio                   (Management Studio)
(AKA: Enterprise Manager)
Project Data Dude                   Data Tools
Configuration Manager               Configuration Manager
SQL Server Profiler                 XEvent Tracing
Reporting Services Config Manager   Reporting Services Config Manager
sp_configure                        sp_configure / ALTER SERVER
Hard problems databases help you solve
Query Plan Generation

Find all parts bought by Thomas Kejser
Express the problem, get the solution automatically
To do this well, we need Statistics

SQL did it

I did it :-)

THIS is not accurate and it will never be!
… and we Need Indexes

                   B+ Tree
95% of all database problems* are caused by:

A) Poor indexing

B) Wrong Statistics

C) Badly written queries

D) All of the above

* Low estimate, trying to be nice to humanity
And most of the time, there is nothing you can do about that*

… which is where storage comes into the picture

* AKA: “Craplications”, technical term
Two types of bad Queries
• The CPU Bound
  • Have to help rewrite
  • Better storage does not help
  • But DBAs may still believe it is I/O
• The I/O bound
  • Can throw NAND at it
  • I will show you how to diagnose
• DBA people like to talk about this like…

(Diagram: CPU cores, each with an L2 cache, sharing an L3 cache)
Response time = Service Time + Wait Time

• Service Time: Algorithms and Data Structures
• Wait Time: “Bottlenecks”
When Speaking about Service Time

• We normally end up talking about bad
  join plans

• Joins come in three flavours
  • Merge
  • Hash
  • Loop
Merge Join

• Inputs: an m row result and an n row result, both Sorted
• Example keys: 1, 1, 2, 3, 4, …, 43 on one side and 1, 2, 3, 4, …, 43 on the other
• Complexity: O(m + n)
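The merge join slide can be sketched in a few lines of Python. This is only an illustration of the technique, not SQL Server's implementation: the function name `merge_join` and the plain lists of keys are assumptions made for the example, and both inputs must already be sorted.

```python
def merge_join(left, right):
    """Join two lists of keys that are already sorted ascending.

    Both cursors only ever move forward, which is where the
    O(m + n) cost (plus the size of the output) comes from.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            key = left[i]
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == key:
                j_end += 1
            # Every left duplicate matches every right duplicate.
            for _ in range(i_end - i):
                for _ in range(j_end - j):
                    out.append((key, key))
            i, j = i_end, j_end
    return out


# Keys loosely modelled on the slide (hypothetical data).
m_rows = [1, 1, 2, 3, 4, 43]
n_rows = [1, 2, 3, 4, 43]
result = merge_join(m_rows, n_rows)
```

If either input is not sorted, the cursors skip matches, which is why the optimizer has to pay for a sort (or find an index that delivers sorted rows) before it can use this join.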
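Hash Join

• Inputs: an m row result and an n row join table
• The n row join table is turned into an n row hash table; rows of the m row result (keys 1, 43, 13, 3, …, 7) are looked up with Hash(key)
• Complexity: O(m + 2n)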
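A minimal Python sketch of the same idea, assuming rows are `(key, payload)` tuples. The names `hash_join`, `probe_rows` and `build_rows`, and the sample data, are hypothetical. A Python dict stands in for the hash table and is assumed to fit in memory.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows):
    """Join (key, payload) rows using an in-memory hash table."""
    # Build phase: read the n build rows once and hash each of them.
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    # Probe phase: one hash lookup for each of the m probe rows.
    out = []
    for key, payload in probe_rows:
        for match in table.get(key, ()):
            out.append((key, payload, match))
    return out


# Probe keys follow the slide (1, 43, 13, 3, 7); payloads are made up.
probe = [(1, "p1"), (43, "p43"), (13, "p13"), (3, "p3"), (7, "p7")]
build = [(1, "a"), (3, "b"), (7, "c"), (43, "d")]
joined = hash_join(probe, build)
```

Neither input needs to be sorted, but the whole build side has to be held in memory, which is the price paid for the cheap lookups.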
Loop Join

• Inputs: an m row result (keys 1, 43, 13, 3, …, 7) and an n row B-tree
• Each of the m rows costs Log(n) reads in the B-tree
• Complexity: O(m * log(n))
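The loop join can be sketched the same way. Here a sorted Python list searched with `bisect` stands in for the B-tree. That is an assumption made to keep the example short, since both give a logarithmic lookup per outer row. The function name and the data are hypothetical.

```python
from bisect import bisect_left


def loop_join(outer_keys, sorted_inner_keys):
    """For each outer key, seek into the sorted inner side."""
    out = []
    for key in outer_keys:
        # Binary search: about log2(n) comparisons per outer row.
        pos = bisect_left(sorted_inner_keys, key)
        # Collect every inner row that carries the same key.
        while pos < len(sorted_inner_keys) and sorted_inner_keys[pos] == key:
            out.append((key, sorted_inner_keys[pos]))
            pos += 1
    return out


outer = [1, 43, 13, 3, 7]           # keys from the slide
inner_index = [1, 2, 3, 4, 7, 43]   # hypothetical indexed keys, sorted
matches = loop_join(outer, inner_index)
```

The outer side can arrive in any order, which makes this the natural choice when m is small and the inner side already has an index.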
When Hash Joins hurt you

(Chart: Runtime (seconds), 0 to 30, against Hash Memory (MB), 400 down to 0, with a marked “Spill Zone!”)
Join Hints

• A probed, upper table in join (first table in join statement)
• B probed, lower table in join (second table in join statement)

Just the way it is …
Why is it so hard to get joins right?

(Chart: Time as a function of the input sizes m and n, plotted for Loop Join, Merge Join and Hash Join)
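To see why the best join depends on both m and n, the complexities quoted on the three join slides can be plugged into a toy cost model. This is a rough sketch under loud assumptions: every operation costs one unit, constants, memory and I/O are ignored, and merge join is only offered when both inputs are already sorted. The function names are hypothetical.

```python
import math


def join_costs(m, n, inputs_sorted):
    """Unit costs taken from the slide formulas (no constants, no I/O)."""
    costs = {
        "hash": m + 2 * n,
        "loop": m * math.log2(n),
    }
    if inputs_sorted:
        # Merge join is only available without an extra sort step.
        costs["merge"] = m + n
    return costs


def cheapest_join(m, n, inputs_sorted):
    costs = join_costs(m, n, inputs_sorted)
    return min(costs, key=costs.get)


small_outer = cheapest_join(10, 1_000_000, inputs_sorted=False)
big_unsorted = cheapest_join(1_000_000, 1_000_000, inputs_sorted=False)
big_sorted = cheapest_join(1_000_000, 1_000_000, inputs_sorted=True)
```

In this model a handful of outer rows favours the loop join, two large unsorted inputs favour the hash join, and two large sorted inputs favour the merge join. A real optimizer has to estimate m and n from statistics before it can make the same choice, so a bad estimate leads to a bad plan.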
No-one has been
able to get joins consistently right!


           P = NP ?
Getting I/O right…

• Language Processing (Parse/Bind)
• Query Optimization (Plan Generation, View Matching, Statistics, Costing)
• Plan Cache Management
• Statement/Batch Execution
• Query Execution (Query Operators, Memory Grants, Parallelism)
• Storage Engine (Access Methods, Database Page Cache, Locking, Transactions, …)
• SQL-OS (Schedulers, Buffer Pool, Memory Management, Synchronization Primitives, …)
The Storage Engine makes I/O Transparent!

• Rest of engine only sees the API
• The Storage Engine sits between that API and both RAM and Storage
Two Different Philosophies

Primitive            SQL Server                                     Analysis Services
Scheduling           Voluntary Yield, User mode                     Kernel mode, Preemptive
I/O Engine           Dedicated I/O stack                            Windows Buffered I/O
Waiting / Spinning   SQLOS Primitives                               Windows
Memory Management    SQLOS / Storage Engine                         Windows Paging
Serialisation        TDS special purpose                            XML
Network              Fully optimizable, async, affinitized engine   Windows primitives, blocking
SQL Server is different

• Primitives are a different beast than
  Windows
• Scale issues are generally specific to the
  core, not Windows
• Exposes own “belly of the beast”
  profiling
• SQL Team build their own
  primitives, often better than Windows
  core
• Highest throughput app on
  Windows, drives all the scale stuff there
Analysis Services is “just another App”

• Analysis Services relies fully on
  Windows primitives
• You can profile it by looking at how
  Windows behaves
• Upgrades to Windows are more likely to
  help it
• No TPC style benchmarks…
A is for Atomic

(Diagram: three snapshots of the LINEITEM table (ORDER_KEY, PART_KEY, COMMITDATE, QUANTITY) and the ORDER table (ORDER_KEY, CUSTOMER_KEY))
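Atomicity can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for SQL Server, the table is called ORDERS because ORDER is a reserved word, the NOT NULL rule on QUANTITY is invented to force a failure, and `place_order` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERS (ORDER_KEY INTEGER PRIMARY KEY, CUSTOMER_KEY INTEGER)"
)
conn.execute(
    "CREATE TABLE LINEITEM (ORDER_KEY INTEGER, PART_KEY INTEGER, "
    "COMMITDATE TEXT, QUANTITY INTEGER NOT NULL)"
)
conn.commit()


def place_order(order_key, customer_key, part_key, commitdate, quantity):
    # "with conn" commits if the block succeeds and rolls back
    # everything in it if an exception escapes.
    with conn:
        conn.execute(
            "INSERT INTO ORDERS VALUES (?, ?)", (order_key, customer_key)
        )
        conn.execute(
            "INSERT INTO LINEITEM VALUES (?, ?, ?, ?)",
            (order_key, part_key, commitdate, quantity),
        )


place_order(1, 100, 7, "2012-02-28", 5)

try:
    # The second insert fails (QUANTITY is NULL), so the first
    # insert of this order is undone as well.
    place_order(2, 100, 7, "2012-02-28", None)
except sqlite3.IntegrityError:
    pass

order_count = conn.execute("SELECT COUNT(*) FROM ORDERS").fetchone()[0]
lineitem_count = conn.execute("SELECT COUNT(*) FROM LINEITEM").fetchone()[0]
```

After the failed call both tables still hold exactly one row each: the half-finished order 2 never becomes visible.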
C is for Consistency

(Diagram: three snapshots of LINEITEM and ORDER. In the first, LINEITEM has ORDER_KEY = 42 while ORDER has ORDER_KEY != 42. In the second, LINEITEM has COMMITDATE = 2012-02-30. The third shows the full column lists.)
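The orphaned LINEITEM case (ORDER_KEY = 42 with no matching order) is the kind of state a declared constraint rules out. A small sketch with Python's `sqlite3`, under the assumptions that SQLite stands in for SQL Server, that the table is named ORDERS because ORDER is a reserved word, and that `try_insert_lineitem` is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE ORDERS (ORDER_KEY INTEGER PRIMARY KEY, CUSTOMER_KEY INTEGER)"
)
conn.execute(
    "CREATE TABLE LINEITEM ("
    " ORDER_KEY INTEGER NOT NULL REFERENCES ORDERS (ORDER_KEY),"
    " PART_KEY INTEGER, COMMITDATE TEXT, QUANTITY INTEGER)"
)
conn.execute("INSERT INTO ORDERS VALUES (41, 100)")
conn.commit()


def try_insert_lineitem(order_key):
    """Return 'accepted' or 'rejected' for a line item on order_key."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO LINEITEM VALUES (?, 1, '2012-02-28', 1)",
                (order_key,),
            )
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


existing_order = try_insert_lineitem(41)  # order 41 exists
missing_order = try_insert_lineitem(42)   # no order 42
```

The database refuses the second insert on its own, so no application code path can leave a line item pointing at an order that does not exist.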
I is for Isolation

Session 1                           Session 2

SELECT @LastTransaction_ID =
  LastTransaction_ID
FROM ATM
WHERE ATM_ID = 13
                                    SELECT @LastTransaction_ID =
                                      LastTransaction_ID
                                    FROM ATM
                                    WHERE ATM_ID = 13

(@LastTransaction_ID = 42)          (@LastTransaction_ID = 42)

SET @ID = @LastTransaction_ID + 1   SET @ID = @LastTransaction_ID + 1

UPDATE ATM                          UPDATE ATM
SET LastTransaction_ID = @ID        SET LastTransaction_ID = @ID
WHERE ATM_ID = 13                   WHERE ATM_ID = 13
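The interleaving on this slide is the classic lost update: both sessions read 42, so both write 43 and one transaction ID is handed out twice. The Python sketch below replays that schedule by hand and then shows the isolated version. A dict stands in for the ATM table and a `threading.Lock` stands in for the database's locking. Both are assumptions made for illustration, as are the function names.

```python
import threading


def lost_update(start):
    """Replay the slide's schedule: both reads happen before any write."""
    atm = {13: start}          # ATM_ID -> LastTransaction_ID
    session_1_read = atm[13]
    session_2_read = atm[13]
    atm[13] = session_1_read + 1
    atm[13] = session_2_read + 1   # overwrites session 1's update
    return atm[13]


def isolated_update(start, sessions=2):
    """Each read-modify-write runs as one unit under a lock."""
    atm = {13: start}
    lock = threading.Lock()
    issued = []

    def session():
        with lock:
            new_id = atm[13] + 1
            atm[13] = new_id
            issued.append(new_id)

    threads = [threading.Thread(target=session) for _ in range(sessions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return atm[13], sorted(issued)


without_isolation = lost_update(42)
with_isolation = isolated_update(42)
```

Without isolation the counter ends at 43 after two transactions. With isolation it ends at 44 and the two sessions receive distinct IDs.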
D is for Durability

(Diagram: a stream of “Do Transactions” requests, each answered by an “Ack”)
Summary – Databases Help You

•   Do complex operations in optimal time
•   …at high parallelism
•   Optimise I/O pattern
•   Be ACID compliant
•   Store stuff safely…



• NoSQL/Big Data systems trade off one or more of
  these to get more of the others
System Databases

• Server won’t start without:
  • master
  • mssqlsystemresource
• System CAN start, but won’t work well
  • model
  • msdb
• System will start under special
  conditions
  • tempdb
Master and mssqlsystemresource

• Together, contain all system information
• mssqlsystemresource
  • Lives under: MSSQL\Binn
  • Contains all system code
  • Hidden by default
• Master
  • Lives under: MSSQL\DATA


• You should move these to a safe
  location
Disaster: Master or mssqlsystemresource

• You lost:
  • All passwords and server logins
  • All system-wide certificates (you may be
    unable to decrypt!)
  • All system procedures you created
• You are not 100% screwed, but you are
  in for a long night
  • Both can be rebuilt (empty) during server
    start
  • …Or restored from backup
    • if you remembered to take one
  • Need /f and /T3608 to get back up
Database: model

• Every newly created
  database is cloned
  from this
• Loss is not
  catastrophic
  • Copy from healthy
    machine
• Tempdb can’t boot
  without it
• Lives with master
Database tempdb

• Database “swap file”

• Does not survive
  restarts

• No Durability
  guarantees here

• Fast I/O helps
Loss of Tempdb…is…Temporary

• Will rebuild itself after instance restart

• Configuration is stored in master

• Clones from model

• Nearly every installation must change
  defaults

• If tempdb cannot be created, server will
  only start from command line
User Databases and Failure

• A database consists of
  • At least one Transaction Log File
  • The PRIMARY filegroup
  • At least one data file in PRIMARY
• If any of these are lost, the database is
  dead
  • You can in some cases bring a database
    without a transaction log back to life
  • But typically with data loss…
• Lesson: carefully protect all of the
  above
What is in the Files?

PRIMARY (Primary File)      Transaction Log
GAM / SGAM                  Headers
PFS Map                     VLF
Metadata (system objects)   VLF
User Data                   VLF
Data Files

• Regular files in NTFS
• Secured
• Files can Auto Grow as needed
  • Risky
  • File Imbalance
How are Database Files Created?

• ALTER or CREATE DATABASE
• Transaction log file always zeroed out
  • This looks super cool on Fusion-io by the way
• Data files MAY be zeroed out
  • Depends on privileges
  • May use instant file init
Filegroups

• Filegroups (one word) are containers
  of files
• Used to group similar data together
• Oracle people know this concept as
  tablespaces
• Files inside FG are accessed/allocated
  round-robin

(Diagram: the PRIMARY filegroup holding one User Data file and a DATA filegroup holding four User Data files)
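Round-robin allocation across the files of a filegroup can be pictured with a few lines of Python. This is a deliberately simplified sketch: the file names and the function are hypothetical, and it ignores that SQL Server also weighs how much free space each file has (proportional fill).

```python
from itertools import cycle


def allocate_round_robin(file_names, extent_count):
    """Hand out extents 1..extent_count to the files in turn."""
    placement = {name: [] for name in file_names}
    files = cycle(file_names)
    for extent in range(1, extent_count + 1):
        placement[next(files)].append(extent)
    return placement


# Hypothetical filegroup with two equally sized data files.
layout = allocate_round_robin(["data_1.ndf", "data_2.ndf"], 8)
```

With two files the odd extents land in the first file and the even ones in the second, so a sequential scan of the data keeps both files (and the storage behind them) busy at the same time.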
Reclaiming/Moving Space in Files

• DBCC SHRINKFILE

• REBUILD data
DBCC SHRINKFILE

(Diagram: blocks 1 to 8 striped across LUN 1 (1, 3, 5, 7) and LUN 2 (2, 4, 6, 8); LUN 3 and LUN 4 hold none)
How to reclaim space the right way…

(Diagram: blocks 1 to 8 on LUN 1 and LUN 2 are rebuilt, in the same layout, into a New Filegroup on LUN 3 and LUN 4)

ALTER INDEX Foo ON <table>
REBUILD WITH (SORT_IN_TEMPDB = ON)
PFS Contention

• Too few PFS maps can lead to latch
  contention
• Diagnosed in:
  sys.dm_os_waiting_tasks
• Look for PAGELATCH_UP

(Diagram: a File laid out as PFS Map, User Data (8000 pages), PFS Map, User Data (8000 pages))
I/O DBA people worry about

• DBAs typically diagnose issues with
  wait stats
• Issues they look for:
  •   WRITELOG/LOGBUFFER waits
  •   PAGEIOLATCH_<X> waits
  •   BACKUPIO waits
  •   IO_COMPLETION/ASYNC_IO_COMPLETION
Places you need to know about

• Diagnosing resource waits:
  • sys.dm_os_wait_stats
  • Post 2008R2 – can use XEvents (harder)
• More detail in:
  • sys.dm_io_virtual_file_stats(NULL, NULL)
  • Confirm waits here!
• SQL Server errors in log file:

Databases for Storage Engineers

  • 1. Databases For storage People Thomas Kejser thomas@kejser.org http://blog.kejser.org @thomaskejser
  • 2. Agenda • The Microsoft Database Stack • Hard problems the database solves • File layout and I/O pattern • Data and Log Files • Analysis Services Files • TempDb and other system databases • Installation of SQL • Q&A
  • 4. Product Portfolio • SQL Server (aka: Core Engine) • SQL Server Analysis Services (SSAS) • Tabular • Multi Dimensional • SQL Server Service Broker (SSB) • SQL Server Integration Services (SSIS) • SQL Server Reporting Services (SSRS) • SQL Server Data Quality Tools • SQL Server Master Data Services • SQL Server Parallel Data Warehouse • .NET stuff… • Various Excel plug-ins • A “full” stack!
  • 5. What Type of Workload? Big Simulation ETL Data Returned Small OLTP BI/DW Small Big Data Touched
  • 6. A Template OLTP System “App” tier .NET .NET .NET .NET Web Server Windows License Database Tier Web/Core Licensing 2 or 4 sockets Core
  • 7. A Template Data Warehouse SSAS SSIS Core SSIS SSAS SSIS SSIS Core Core SSRS Integration Tier “Enterprise” Warehouse Tier BI / Presentation / Cubes Blades Large machines CPU Intensive VERY CPU greedy Medium Servers low IOPS VERY I/O greedy (GB/sec) Can be IOPS greedy
  • 8. Fast Track Data Warehouses
  • 9. A Template MPP Warehouse SSIS SSAS SSIS SSIS Core SSIS Data Marts (The “spokes”) Enterprise Warehouse Tier Appliance (The “hub”)
  • 10. Management Tools you Need to Know Pre 2012 2012 Management Studio (Management Studio) (AKA: Enterprise Manager) Project Data Dude Data Tools Configuration Manager Configuration Manager SQL Server Profiler Xevent Tracing Reporting Services Config Reporting Services Config Manager Manager Sp_configure Sp_configure / ALTER SERVER
  • 12. Query Plan Generation Find all parts bought by Thomas Kejser
  • 13. Express Problem, Auto get solutions
  • 14. To do this well, we need Statistics SQL Did it I did it  THIS is not accurate and it will never be!
  • 15. … and we Need Indexes B+ Tree
  • 16. 95% of all database problems* are caused by: A) Poor indexing B) Wrong Statistics A) Badly written queries B) All of the above * Low estimate, trying to be nice to humanity
  • 17. And most of the time, there is nothing you can do about that* … which is where storage come into the picture * AKA: “Craplications”, technical term
  • 18. Two types of bad Queries • The CPU Bound • Have to help rewrite C L 2 L • Better storage does not help C L 2 3 • But DBAs may still believe it is I/O CPU • The I/O bound • Can throw NAND at it • I will show you how to diagnose • DBA people like to talk about this like…
  • 19. Response time = Service Time + Wait Time Algorithms “Bottlenecks” and Data Structures
  • 20. When Speaking about Service Time • We normally end up talking about bad join plans • Joins come in three flavours • Merge • Hash • Loop
  • 21. Merge Join m row result n row result 1 1 1 2 2 3 Sorted 3 4 4 Sorted 43 43 Complexity: O(m + n)
  • 22. Hash Join m row result n row join table 1 43 13 3 Hash(1) n row hash table 7 Complexity: O(m + 2n)
  • 23. Loop Join: each row of the m row result seeks into an n row B-tree, at Log(n) reads per seek. Complexity: O(m * log(n))
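The loop join can be sketched as one index seek per outer row. In this illustration a sorted Python list searched with `bisect_left` stands in for the B-tree; that substitution, the name `loop_join`, and the `(key, value)` row layout are assumptions for the example, not the engine's data structures.

```python
from bisect import bisect_left


def loop_join(outer_rows, inner_sorted):
    """Join outer (key, value) rows against an inner list sorted by key."""
    inner_keys = [key for key, _ in inner_sorted]
    out = []
    for key, value in outer_rows:
        # One O(log n) seek per outer row, like descending a B-tree.
        pos = bisect_left(inner_keys, key)
        # Scan forward over any duplicate keys at the seek position.
        while pos < len(inner_keys) and inner_keys[pos] == key:
            out.append((key, value, inner_sorted[pos][1]))
            pos += 1
    return out


result_rows = [(1, "a"), (43, "b"), (13, "c"), (3, "d")]
index = [(1, "x"), (3, "y"), (7, "z"), (43, "w")]
print(loop_join(result_rows, index))
```

The outer input does not need to be sorted, and the cost grows with the number of outer rows rather than with the size of the index.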
  • 24. When Hash Joins hurt you (chart: Runtime (seconds), 0 to 30, against Hash Memory (MB), 400 down to 0, with a “Spill Zone!” marked)
  • 25. Join Hints
      • B probed: lower table in join (second table in join statement)
      • A probed: upper table in join (first table in join statement)
      • Just the way it is …
  • 26. Why is it so hard to get joins right? (chart: Time against n and m, with one curve each for Loop Join, Merge Join and Hash Join)
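To see why the choice is hard, it helps to plug numbers into the complexity formulas from the three join slides. The sketch below does only that: `join_costs` and `cheapest` are hypothetical helper names, the costs are abstract row counts rather than anything the real optimizer computes, and merge join is only considered when the caller says both inputs are already sorted.

```python
import math


def join_costs(m, n, inputs_sorted=False):
    """Abstract cost of each join type, using the formulas from the slides."""
    costs = {
        "hash": m + 2 * n,
        # Clamp the seek cost so that a one-row index still costs one read.
        "loop": m * max(1.0, math.log2(n)),
    }
    if inputs_sorted:
        costs["merge"] = m + n          # only valid on pre-sorted inputs
    return costs


def cheapest(m, n, inputs_sorted=False):
    """Name of the join type with the lowest abstract cost."""
    costs = join_costs(m, n, inputs_sorted)
    return min(costs, key=costs.get)


print(cheapest(10, 1_000_000))                              # few outer rows
print(cheapest(1_000_000, 1_000_000))                       # two big inputs
print(cheapest(1_000_000, 1_000_000, inputs_sorted=True))   # big and sorted
```

The winner flips with m and n, so a row estimate that is off by a few orders of magnitude is enough to pick the wrong join.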
  • 27. No-one has been able to get joins consistently right! P = NP ?
  • 28. Getting I/O right…
      • Language Processing (Parse/Bind, Statement/Batch Execution, Plan Cache Management)
      • Query Optimization (Plan Generation, View Matching, Statistics, Costing)
      • Query Execution (Query Operators, Memory Grants, Parallelism)
      • Storage Engine (Access Methods, Database Page Cache, Locking, Transactions, …)
      • SQL-OS (Schedulers, Buffer Pool, Memory Management, Synchronization Primitives, …)
  • 29. The Storage Engine makes I/O Transparent! The rest of the engine only sees the API (diagram: Storage Engine in front of RAM and Storage)
  • 30. Two Different Philosophies (Primitive: SQL Server vs. Analysis Services)
      • Scheduling: Voluntary Yield, User mode vs. Kernel mode, Preemptive
      • I/O Engine: Dedicated I/O stack vs. Windows Buffered I/O
      • Waiting / Spinning: SQLOS Primitives vs. Windows
      • Memory Management: SQLOS / Storage Engine vs. Windows Paging
      • Serialisation: TDS special purpose vs. XML
      • Network: Fully optimizable, async, affinitized engine vs. Windows primitives, blocking
  • 31. SQL Server is different
      • Primitives are a different beast than Windows
      • Scale issues are generally specific to the core, not Windows
      • Exposes own “belly of the beast” profiling
      • SQL Team build their own primitives, often better than Windows core
      • Highest throughput app on Windows, drives all the scale stuff there
  • 32. Analysis Services is “just another App”
      • Analysis Services relies fully on Windows primitives
      • You can profile it by looking at how Windows behaves
      • Upgrades to Windows are more likely to help it
      • No TPC style benchmarks…
  • 33. A is for Atomic (diagram: three snapshots of tables LINEITEM (ORDER_KEY, PART_KEY, COMMITDATE, QUANTITY) and ORDER (ORDER_KEY, CUSTOMER_KEY))
  • 34. C is for Consistency (diagram: LINEITEM (ORDER_KEY, PART_KEY, COMMITDATE, QUANTITY) and ORDER (ORDER_KEY, CUSTOMER_KEY), with example rows LINEITEM.ORDER_KEY = 42 against ORDER.ORDER_KEY != 42, and COMMITDATE = 2012-02-30)
  • 35. I is for Isolation (two sessions run the same statements at the same time)
      • SELECT @LastTransaction_ID = LastTransaction_ID FROM ATM WHERE ATM_ID = 13
      • (@LastTransaction_ID = 42 in both sessions)
      • SET @ID = @LastTransaction_ID + 1
      • UPDATE ATM SET LastTransaction_ID = @ID WHERE ATM_ID = 13
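The interleaving on this slide is the classic lost update. The sketch below replays it in plain Python with a dict standing in for the ATM table; the function names and the dict are assumptions for illustration, and no real database or locking is involved.

```python
def run_interleaved():
    """Both sessions read before either writes: one update is lost."""
    atm = {13: 42}                  # ATM_ID -> LastTransaction_ID

    a_last = atm[13]                # session A: SELECT (reads 42)
    b_last = atm[13]                # session B: SELECT (also reads 42)
    a_id = a_last + 1               # session A: SET @ID
    b_id = b_last + 1               # session B: SET @ID
    atm[13] = a_id                  # session A: UPDATE
    atm[13] = b_id                  # session B: UPDATE overwrites A's value
    return a_id, b_id, atm[13]


def run_serialized():
    """Each session finishes before the next one starts."""
    atm = {13: 42}
    ids = []
    for _ in range(2):
        last = atm[13]              # SELECT
        new_id = last + 1           # SET @ID
        atm[13] = new_id            # UPDATE
        ids.append(new_id)
    return ids[0], ids[1], atm[13]


print(run_interleaved())
print(run_serialized())
```

In the interleaved run both sessions hand out transaction ID 43; isolation is what makes the database behave like the serialized run instead.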
  • 36. D is for Durability (diagram: a stream of “Do Transactions” requests, each answered with an “Ack”)
  • 37. Summary – Databases Help You
      • Do complex operations in optimal time
      • …at high parallelism
      • Optimise I/O pattern
      • Be ACID compliant
      • Store stuff safely…
      • noSQL/Big Data systems trade off one or more of these to get more of the others
  • 38. System Databases
      • Server won’t start without:
          • master
          • mssqlsystemresource
      • System CAN start, but won’t work well:
          • model
          • msdb
      • System will start under special conditions:
          • tempdb
  • 39. Master and mssqlsystemresource
      • Together, contain all system information
      • mssqlsystemresource
          • Lives under: MSSQL\Binn
          • Contains all system code
          • Hidden by default
      • master
          • Lives under: MSSQL\DATA
          • You should move these to a safe location
  • 40. Disaster: master or mssqlsystemresource
      • You lost:
          • All passwords and server logins
          • All system-wide certificates (You may be unable to decrypt!)
          • All system procedures you created
      • You are not 100% screwed, but you are in for a long night
      • Both can be rebuilt (empty) during server start
      • …Or restored from backup
          • if you remembered to take one
      • Need /f and /T3608 to get back up
  • 41. Database: model
      • Every newly created database is cloned from this
      • Loss is not catastrophic
          • Copy from healthy machine
      • Tempdb can’t boot without it
      • Lives with master
  • 42. Database tempdb
      • Database “swap file”
      • Does not survive restarts
      • No Durability guarantees here
      • Fast I/O helps
  • 43. Loss of Tempdb…is…Temporary
      • Will rebuild itself after instance restart
      • Configuration is stored in master
      • Clones from model
      • Nearly every installation must change defaults
      • If tempdb cannot be created, server will only start from command line
  • 44. User Databases and Failure
      • A database consists of
          • At least one Transaction Log File
          • The PRIMARY filegroup
          • At least one data file in PRIMARY
      • If any of these are lost, the database is dead
      • You can in some cases bring a database without a transaction log back alive
          • But typically with data loss…
      • Lesson: carefully protect all of the above
  • 45. What is in the Files?
      • PRIMARY: Primary File Headers, GAM / SGAM, PFS Map, Metadata (system objects), User Data
      • Transaction Log: VLF, VLF, VLF
  • 46. Data Files
      • Regular files in NTFS
          • Secured
      • Files can Auto Grow as needed
          • Risky
          • File Imbalance
  • 47. How are Database Files Created?
      • ALTER or CREATE DATABASE
      • Transaction log file always zeroed out
          • This looks super cool on FusionIo by the way
      • Data files MAY be zeroed out
          • Depends on privileges
          • May use instant file init
  • 48. Filegroups
      • Filegroups (one word) are containers of files
      • Used to group similar data together
      • Oracle people know this concept as tablespaces
      • Files inside FG are accessed/allocated round-robin
      • (diagram: filegroups PRIMARY and DATA, each holding User Data files)
  • 49. Reclaiming/Moving Space in Files • DBCC SHRINKFILE • REBUILD data
  • 50. DBCC SHRINKFILE (diagram: blocks 1 to 8 laid out across LUN 1, LUN 2, LUN 3 and LUN 4)
  • 51. How to reclaim space the right way…
      • (diagram: blocks 1 to 8 across LUN 1 to LUN 4, rebuilt into a New Filegroup)
      • ALTER INDEX Foo ON <table> REBUILD WITH (SORT_IN_TEMPDB = ON), where <table> stands for the table that owns Foo
  • 52. PFS Contention
      • Too few PFS maps can lead to latch contention
      • Diagnosed in: sys.dm_os_waiting_tasks
      • Look for PAGELATCH_UP
      • (diagram: a File laid out as PFS Map, User Data (8000 pages), PFS Map, User Data (8000 pages))
  • 53. I/O DBA people worry about
      • DBAs typically diagnose issues with wait stats
      • Issues they look for:
          • WRITELOG/LOGBUFFER waits
          • PAGEIOLATCH_<X> waits
          • BACKUPIO waits
          • IO_COMPLETION/ASYNC_IO_COMPLETION
  • 54. Places you need to know about
      • Diagnosing resource waits:
          • sys.dm_os_wait_stats
          • Post 2008R2 – can use Xevents (harder)
      • More detail in:
          • sys.dm_io_virtual_file_stats(NULL, NULL)
          • Confirm waits here!
      • SQL Server errors in log file:

Editor's notes

  1. SQL Server will auto generate statistics for columns unless we ask it not to. This helps, but is never accurate
  2. Highlight spill warning
  3. http://msdn.microsoft.com/en-us/library/ms345408(v=sql.90).aspx
  4. http://msdn.microsoft.com/en-us/library/dd207003.aspx
  5. NET START MSSQL$<instance> /f /T3608 will bring instance up without tempdb
  6. http://msdn.microsoft.com/en-us/library/ms175935(v=sql.105).aspx