VIRTUALIZED DATABASES?



Approach: mechanics of virtualization
"certain big players" will not be mentioned
Talk is general, mostly about hardware issues which are the same for any platform
ME

• Liz van Dijk (@lizztheblizz)

• Working at Sizing Servers Research Lab

• First-timer at FOSDEM!

• Not really a developer, not really a sysadmin, not really a DBA

• I just like knowing how stuff works.
SO... VIRTUALIZATION, HUH.

  • It’s far too broad a term

  • It’s a pretty old concept (about half a century, actually)

  • Its main purposes are abstraction and security

          • Making use of the correct CPU execution mode

          • Managing virtual memory


History!
Broad term, 100 different meanings
Full-system virtualization on the mainframes in the '60s
IBM M44/44X, trap and emulate

Recently:
* x86 did not support full virtualization; trap and emulate did not work because not every sensitive instruction traps
* multicore hardware, single-threaded software: inefficient datacenters

Full virtualization is not the only kind of virtualization
Usually a combination of different methods

Who uses RAID?
Who uses virtual memory?

2 big issues that all solutions try to work around
Focus on these, and the next steps should be more or less logical

Problem 1: a matter of privileges
Kernels assume full control over the hardware
How does the hardware deal with this?

Layer-based security system (onion)
2-bit privilege code stored with each memory segment; the CPU verifies the code, then does or doesn't execute the instruction

x86: 4 layers (rings)
code 00: supervisor mode (ring 0)
code 11: user mode (ring 3)
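
The ring check from the notes can be sketched in a few lines of Python. This is a toy model, not how a real CPU is built: the `execute` helper and the small set of instruction names are assumptions made for illustration; only the 2-bit codes (00 = supervisor, 11 = user) come from the notes.

```python
# Toy model of x86 privilege rings (illustrative only).
SUPERVISOR = 0b00  # ring 0, kernel
USER = 0b11        # ring 3, applications

# Hypothetical selection of instructions that require ring 0.
PRIVILEGED = {"hlt", "lgdt", "mov_cr3"}


class GeneralProtectionFault(Exception):
    """Raised when code tries a privileged instruction from an outer ring."""


def execute(instruction: str, current_ring: int) -> str:
    """Run the instruction if the current ring allows it, else fault."""
    if instruction in PRIVILEGED and current_ring != SUPERVISOR:
        raise GeneralProtectionFault(
            f"{instruction} not allowed in ring {current_ring}"
        )
    return f"executed {instruction}"
```

A kernel that has been pushed out of ring 0 hits exactly this fault path, which is the problem the virtualization methods have to work around.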
X86 VIRTUALIZATION

  • Binary Translation, aka “faking it”

       • Applies ring deprivileging, and translates “bad calls” on the fly

  • “Full” Hardware Virtualization

       • Introduced Ring -1: Hypervisor mode

       • Only intervenes when absolutely necessary

BT: old but awesome, employed by QEMU (Wine is related, but translates API calls rather than instructions)
Less relevant now for full virtualization
Ring deprivileging: look it up!

Intel/AMD caught up and implemented VT-x and AMD-V
Ring -1: hypervisor
Let OSes do whatever they want, but use trap and emulate
Extra round trip, extra overhead

The CPU has more tasks to perform, and they also take longer
Newer CPUs handle this better
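
The trap-and-emulate round trip mentioned in the notes could look roughly like this. It is a minimal sketch under stated assumptions: the class and function names are hypothetical, the guest "program" is just a list of instruction names, and only two privileged instructions are modelled.

```python
# Toy trap-and-emulate loop (illustrative; all names are hypothetical).
PRIVILEGED = {"cli", "sti"}  # disable / enable interrupts


class Trap(Exception):
    def __init__(self, instruction):
        super().__init__(instruction)
        self.instruction = instruction


class VirtualCPU:
    """Per-VM state the hypervisor keeps instead of touching real hardware."""

    def __init__(self):
        self.interrupts_enabled = True


def run_deprivileged(instruction):
    """The guest kernel runs without ring 0 rights, so privileged calls trap."""
    if instruction in PRIVILEGED:
        raise Trap(instruction)
    # Unprivileged instructions would run directly on the CPU.


def hypervisor_run(program, vcpu):
    """Run guest code, emulating each trapped instruction. Returns trap count."""
    traps = 0
    for instruction in program:
        try:
            run_deprivileged(instruction)
        except Trap as trap:
            traps += 1  # the extra round trip through the hypervisor
            vcpu.interrupts_enabled = (trap.instruction == "sti")
    return traps
```

The trap counter is the overhead the notes refer to: every privileged instruction costs a detour through the hypervisor, while everything else runs untouched.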
VIRTUAL MEMORY

[Diagram: the OS sees a contiguous range of virtual memory pages (1-12). Physical memory (Mem) holds segments 0xA-0xH in a different order. The page table is managed by software; Mem, TLB, and CPU are actual hardware.]

    Page Table (managed by software)      TLB (actual hardware)
      1 | 0xD                               1 | 0xD
      2 | 0xC                               5 | 0xH
      3 | 0xF                               2 | 0xC
      4 | 0xA                               etc.
      5 | 0xH
      6 | 0xG
      7 | 0xB
      8 | 0xE
      etc.

Problem 2: virtual memory
Hardware: 4 KB physical segments with physical addresses
Software: pages

Very easy to manage in the OS: all software gets a contiguous block
The page table keeps track of the virtual-to-physical mapping

The TLB caches these mappings and is very fast
It needs to be flushed on every context switch.
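
The page table and TLB from the slide can be modelled as two dictionaries. This is a simplified sketch: the `MMU` class and its counters are assumptions for illustration, and the segment labels (0xA to 0xH) are the names used on the slide rather than real addresses.

```python
# Toy MMU: page table lookup with a TLB in front of it (illustrative only).
# Labels are the ones used on the slide; they are names, not real addresses.
PAGE_TABLE = {1: "0xD", 2: "0xC", 3: "0xF", 4: "0xA",
              5: "0xH", 6: "0xG", 7: "0xB", 8: "0xE"}


class MMU:
    def __init__(self, page_table):
        self.page_table = page_table
        self.tlb = {}       # small cache of recent page -> segment mappings
        self.hits = 0
        self.misses = 0

    def translate(self, page):
        if page in self.tlb:
            self.hits += 1  # fast path: answered by the TLB
            return self.tlb[page]
        self.misses += 1    # slow path: walk the page table in memory
        segment = self.page_table[page]
        self.tlb[page] = segment
        return segment

    def context_switch(self, new_page_table):
        """Another process has different mappings, so cached ones are stale."""
        self.page_table = new_page_table
        self.tlb.clear()
```

After a context switch every first access is a miss again, which is why frequent switching makes memory access noticeably slower.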
SPT VS HAP

[Diagram: VM A (pages 1-5) and VM B (pages 1-4, 12) each keep their own page table, managed by the VM OS. Physical memory (Mem, segments 0xA-0xH), TLB, and CPU are actual hardware.]

Shadow page tables (SPT): the hypervisor manages a “shadow” page table per VM.

              “Read-only” Page Table      “Shadow” Page Table
              (managed by VM OS)          (managed by hypervisor)
    VM A        1 | 0xD                     1 | 0xG
                5 | 0xH                     5 | 0xD
                2 | 0xC                     2 | 0xF
    VM B       12 | 0xB                    12 | 0xE
               10 | 0xE                    10 | 0xB
                9 | 0xA                     9 | 0xC
                etc.

Hardware-assisted paging (HAP): no shadow table; the TLB tags each entry with its VM.

              “Read-only” Page Table      TLB
              (managed by VM OS)          (actual hardware)
    VM A        1 | 0xD                    A1  | 0xD
                5 | 0xH                    A5  | 0xH
                2 | 0xC                    A2  | 0xC
    VM B       12 | 0xB                    B12 | 0xB
               10 | 0xE                    B10 | 0xE
                9 | 0xA                    B9  | 0xA
                etc.                       etc.

2 methods

SPT
Locked page table: access to it generates a trap, and the VMM handles the memory access
Much slower memory access

EPT/RVI/HAP
Make the TLB much bigger, make it smarter, VM-aware
Much more complex to fill up, though: slow initial memory access
A filled TLB is very fast, though.
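
The shadow page table method can be sketched as follows. This is a toy model under stated assumptions: the `ShadowPager` class is hypothetical, and the guest-physical to machine mapping `P2M_A` is not on the slide as such; it is inferred by comparing VM A's “read-only” and “shadow” tables.

```python
# Toy shadow page table (SPT) maintenance (illustrative only).
# Guest page table for VM A as shown on the slide:
# guest page -> guest "physical" segment.
GUEST_TABLE_A = {1: "0xD", 5: "0xH", 2: "0xC"}

# Assumption: guest-physical -> machine mapping kept by the hypervisor,
# inferred by comparing the slide's read-only and shadow tables.
P2M_A = {"0xD": "0xG", "0xH": "0xD", "0xC": "0xF"}


class ShadowPager:
    def __init__(self, guest_table, p2m):
        self.guest_table = dict(guest_table)
        self.p2m = p2m
        self.traps = 0
        # The CPU walks this table, not the guest's own.
        self.shadow = {page: p2m[seg] for page, seg in guest_table.items()}

    def guest_write(self, page, guest_segment):
        """The guest table is locked, so every update traps to the hypervisor."""
        self.traps += 1
        self.guest_table[page] = guest_segment
        self.shadow[page] = self.p2m[guest_segment]
```

With HAP the hardware performs the same two-step lookup itself on a TLB miss, so page table updates no longer trap; the price is a longer walk whenever the TLB has to be filled.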
WHAT DOES THIS TEACH US?

   • All “kernel” activity is a lot more costly:
       • Interrupts
       • System Calls (I/O)
       • Memory page management

So, 3 kinds of action are slower under virtualization:
Interrupts - hardware asking for attention
System calls - software asking for kernel attention
Page management - memory access
IN THE WILD...

• From best to worst case scenario...

   • Bare-metal (Xen, KVM, ESX, Hyper-V)

   • Host-based (VirtualBox, VMware Workstation, etc.)

   • Cloud-based (Amazon, Terremark, etc.)
BARE-METAL OPTIONS

  • Know your my.cnf inside out

  • Use hardware-assisted paging + Large Pages! (InnoDB: large-pages)

  • Make use of paravirtualized HW options

  • Take care of all your caching levels

  • Use DirectIO (innodb_flush_method=O_DIRECT)

Small mistakes in a native environment get bigger in a virtual one
Memory allocations are expensive
Optimize your my.cnf!!!
tools.percona.com is a good starting point
Watch the connection-specific buffers (join buffer, sort buffer, etc.)
Sweet spot = test!!

SWAPPING = EVIL
Check swappiness (vm.swappiness on Linux)

Large Pages

DirectIO
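
Because connection-specific buffers are allocated per connection, it helps to estimate how much memory the server could ask for before it starts swapping. The sketch below is a rough estimate only, not an exact model of MySQL's allocator: the values in `settings` and the `os_reserve_bytes` default are hypothetical, and some buffers can be allocated more than once per query.

```python
# Rough worst-case memory estimate for a MySQL instance (illustrative only).
MIB = 1024 * 1024

# Hypothetical example values; read the real ones from your own my.cnf.
settings = {
    "innodb_buffer_pool_size": 4096 * MIB,
    "key_buffer_size": 64 * MIB,
    "max_connections": 200,
    "sort_buffer_size": 2 * MIB,
    "join_buffer_size": 2 * MIB,
    "read_buffer_size": 1 * MIB,
    "read_rnd_buffer_size": 1 * MIB,
}

GLOBAL_BUFFERS = ("innodb_buffer_pool_size", "key_buffer_size")
PER_CONNECTION_BUFFERS = ("sort_buffer_size", "join_buffer_size",
                          "read_buffer_size", "read_rnd_buffer_size")


def worst_case_bytes(cfg):
    """Global buffers plus every connection using all its buffers at once."""
    global_part = sum(cfg[name] for name in GLOBAL_BUFFERS)
    per_connection = sum(cfg[name] for name in PER_CONNECTION_BUFFERS)
    return global_part + cfg["max_connections"] * per_connection


def fits_in_ram(cfg, vm_ram_bytes, os_reserve_bytes=512 * MIB):
    """True if the estimate still leaves the assumed OS reserve free."""
    return worst_case_bytes(cfg) + os_reserve_bytes <= vm_ram_bytes
```

If `fits_in_ram` is false for the memory assigned to the VM, either the buffers or `max_connections` are candidates for tuning; the real sweet spot still has to be found by testing.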
HARDWARE CHOICES
 • Choosing the right CPUs

       • Intel Xeon 5500/7500 and later types (Nehalem) / all AMD quad-core Opterons (HW-assisted/MMU virtualization)

 • Choosing the right NICs (VMDq)

 • Choosing the right storage system (iSCSI vs FC SAN)

The CPUs listed here support both HW-assisted virtualization and HAP

VMDq: Virtual Machine Device Queues
HOST-BASED

 • All of the above, if possible :)

 • IO becomes the bigger issue on standard client hardware

      • Focus on moving database IO away from the disk you run the host and guest OS on.

      • Consider installing an SSD :)

Keep in mind all of the previous things
IO is a bigger issue
2 OSes + a DB running on the same disk is always a problem
Separate disk, maybe an iSCSI LUN?
Buy an SSD!
CLOUD-BASED

  • No control whatsoever over host-system :(

  • Sometimes unreliable IO

  • Change strategy! Design for easy sharding and replication!

  • Caching caching caching!

  • Consider RDS to reduce operational overhead?

Can't escape the hurt
Unreliable disk IO
CACHING
Sharding/replication to spread the write/read load
Very write-heavy workloads may be more trouble than they're worth
Asynchronous writes? Not very durable
Use RDS to cut back operational cost
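
Designing for sharding and replication mostly means the application has to know where to send each query. A minimal sketch, assuming hash-based sharding and simple read/write splitting; the host names and the `route` helper are made up for illustration and are not tied to any particular cloud provider.

```python
# Toy shard router with read/write splitting (illustrative; hosts are made up).
import hashlib
import itertools

SHARDS = [
    {"primary": "db0-primary", "replicas": ["db0-replica1", "db0-replica2"]},
    {"primary": "db1-primary", "replicas": ["db1-replica1"]},
]


def shard_for(key: str) -> int:
    """Stable hash so the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(SHARDS)


# One round-robin iterator per shard to spread reads over its replicas.
_replica_cycles = [itertools.cycle(shard["replicas"]) for shard in SHARDS]


def route(key: str, write: bool) -> str:
    """Writes go to the shard's primary; reads rotate over its replicas."""
    index = shard_for(key)
    if write:
        return SHARDS[index]["primary"]
    return next(_replica_cycles[index])
```

With asynchronous replication a replica may lag behind its primary, so reads that must see the latest write should still go to the primary.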
THANKS!

Weitere ähnliche Inhalte

Ähnlich wie Virtualized Databases?

HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHackito Ergo Sum
 
Gash Has No Privileges
Gash Has No PrivilegesGash Has No Privileges
Gash Has No PrivilegesDavid Evans
 
Arm device tree and linux device drivers
Arm device tree and linux device driversArm device tree and linux device drivers
Arm device tree and linux device driversHoucheng Lin
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBshimosawa
 
Micro control idsecconf2010
Micro control idsecconf2010Micro control idsecconf2010
Micro control idsecconf2010idsecconf
 
Cumpute to infinity
Cumpute   to infinityCumpute   to infinity
Cumpute to infinityIan Stuart
 
Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)
Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)
Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)Nate Lawson
 
Exploitation and State Machines
Exploitation and State MachinesExploitation and State Machines
Exploitation and State MachinesMichael Scovetta
 
01. introduction to embedded systems
01. introduction to embedded systems01. introduction to embedded systems
01. introduction to embedded systemsayush1313
 
Brief Introduction to Parallella
Brief Introduction to ParallellaBrief Introduction to Parallella
Brief Introduction to ParallellaSomnath Mazumdar
 
了解Cpu
了解Cpu了解Cpu
了解CpuFeng Yu
 
IT Book of Knowledge
IT Book of KnowledgeIT Book of Knowledge
IT Book of KnowledgePhil Primeau
 
Introduction to Linux Exploit Development
Introduction to Linux Exploit DevelopmentIntroduction to Linux Exploit Development
Introduction to Linux Exploit Developmentjohndegruyter
 
Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]
Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]
Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]RootedCON
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshopdatastack
 
Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1 Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1 Luis Grangeia
 

Ähnlich wie Virtualized Databases? (20)

Surge2012
Surge2012Surge2012
Surge2012
 
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
 
Gash Has No Privileges
Gash Has No PrivilegesGash Has No Privileges
Gash Has No Privileges
 
Arm device tree and linux device drivers
Arm device tree and linux device driversArm device tree and linux device drivers
Arm device tree and linux device drivers
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Micro control idsecconf2010
Micro control idsecconf2010Micro control idsecconf2010
Micro control idsecconf2010
 
Cumpute to infinity
Cumpute   to infinityCumpute   to infinity
Cumpute to infinity
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)
Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)
Don't Tell Joanna the Virtualized Rootkit is Dead (Blackhat 2007)
 
Exploitation and State Machines
Exploitation and State MachinesExploitation and State Machines
Exploitation and State Machines
 
01. introduction to embedded systems
01. introduction to embedded systems01. introduction to embedded systems
01. introduction to embedded systems
 
Brief Introduction to Parallella
Brief Introduction to ParallellaBrief Introduction to Parallella
Brief Introduction to Parallella
 
了解Cpu
了解Cpu了解Cpu
了解Cpu
 
IT Book of Knowledge
IT Book of KnowledgeIT Book of Knowledge
IT Book of Knowledge
 
Introduction to Linux Exploit Development
Introduction to Linux Exploit DevelopmentIntroduction to Linux Exploit Development
Introduction to Linux Exploit Development
 
Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]
Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]
Jaime Peñalba - Kernel exploitation. ¿El octavo arte? [rooted2019]
 
Virapix
VirapixVirapix
Virapix
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshop
 
Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1 Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1
 
Common computer myth’s
Common computer myth’sCommon computer myth’s
Common computer myth’s
 

Kürzlich hochgeladen

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Kürzlich hochgeladen (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Virtualized Databases?

  • 1. VIRTUALIZED DATABASES?
      Notes:
      Approach: the mechanics of virtualization.
      "Certain big players" will not be mentioned.
      The talk is general, mostly about hardware issues, which are the same for any platform.
  • 2. ME
      • Liz van Dijk (@lizztheblizz)
      • Working at Sizing Servers Research Lab
      • First-timer at FOSDEM!
      • Not really a developer, not really a sysadmin, not really a DBA
      • I just like knowing how stuff works.
  • 3. SO... VIRTUALIZATION, HUH.
      • It’s far too broad a term
      • It’s a pretty old concept (about half a century, actually)
      • Its main purposes are abstraction and security
          • Making use of the correct CPU execution mode
          • Managing Virtual Memory
      Notes:
      History! A broad term with 100 different meanings. Full-system virtualization ran on the mainframes in the 60s (IBM M44, trap and emulate).
      Recently:
          * x86 did not support full virtualization; trap and emulate did not work
          * multicore hardware, single-threaded software: inefficient datacenters
      Full virtualization is not the only virtualization; solutions combine different methods. Who uses RAID? Who uses Virtual Memory?
      There are 2 big issues that all solutions try to work around. Focus on these, and the next steps should be more or less logical.
      Problem 1: a matter of privileges. Kernels assume full control over the hardware. How does the hardware deal with this?
      With a layer-based security system (an onion): a 2-bit code in the memory address, which the CPU verifies before it does or doesn't execute the instruction.
      x86 has 4 layers: code 00 is supervisor mode, code 11 is user mode.
  • 6. X86 VIRTUALIZATION
      • Binary Translation, aka “faking it”
          • Applies ring deprivileging, and translates “bad calls” on the fly
      • “Full” Hardware Virtualization
          • Introduced Ring -1: Hypervisor mode
          • Only intervenes when absolutely necessary
      Notes:
      BT: old but awesome, employed by QEMU (Wine is often named alongside it, but it translates API calls rather than instructions). Less relevant now for full virtualization. Ring deprivileging: look it up!
      Intel and AMD caught up and implemented VT-x and AMD-V, adding ring -1 for the hypervisor.
      Let the OSes do whatever they want, but use trap and emulate: an extra roundtrip, extra overhead.
      The CPU has more tasks to perform, and they also take longer; a newer CPU handles this better.
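The trap-and-emulate idea from the notes above can be sketched as a toy model. This is only an illustration under stated assumptions: the instruction names, the `Hypervisor` class and the ring numbers are hypothetical, and real hardware (VT-x, AMD-V) handles this with VM exits rather than Python exceptions.

```python
from dataclasses import dataclass, field

# Hypothetical instruction names, chosen only for illustration.
PRIVILEGED = {"load_page_table", "disable_interrupts", "halt"}


class Trap(Exception):
    """Raised by the simulated CPU when deprivileged code issues a privileged instruction."""


@dataclass
class Hypervisor:
    # Log of (vm_name, instruction) pairs the hypervisor had to emulate.
    emulated: list = field(default_factory=list)

    def emulate(self, vm_name, instruction):
        self.emulated.append((vm_name, instruction))


def cpu_execute(instruction, ring):
    """Run one instruction; privileged ones are only allowed in ring 0."""
    if instruction in PRIVILEGED and ring != 0:
        raise Trap(instruction)
    return "executed directly"


def run_guest(vm_name, instructions, hypervisor, guest_ring=1):
    """Run a deprivileged guest and return how many instructions ran directly."""
    direct = 0
    for instruction in instructions:
        try:
            cpu_execute(instruction, guest_ring)
            direct += 1
        except Trap:
            # The extra roundtrip: control passes to the hypervisor and back.
            hypervisor.emulate(vm_name, instruction)
    return direct
```

Every trapped instruction costs a roundtrip through the hypervisor, which is why kernel-heavy workloads pay the most.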
  • 9-12. VIRTUAL MEMORY
      Diagram, built up over four slides:
          Managed by software: the OS, virtual memory pages 1 to 12, and the page table
          Actual hardware: CPU, TLB, and memory frames 0xA to 0xH
          Page table (virtual page | physical frame): 1 | 0xD, 2 | 0xC, 3 | 0xF, 4 | 0xA, 5 | 0xH, 6 | 0xG, 7 | 0xB, 8 | 0xE, etc.
          TLB (cached subset of the page table): 1 | 0xD, 5 | 0xH, 2 | 0xC, etc.
      Notes:
      Problem 2: virtual memory.
      Hardware: 4 KB physical segments with physical addresses. Software: pages.
      Pages are very easy to manage in the OS: all software gets a contiguous block.
      The page table keeps track of the virtual-to-physical mapping.
      The TLB caches these mappings and is very fast, but it needs to be flushed on every context switch.
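A minimal sketch of the page table and TLB interplay described above, assuming 4 KB pages and a tiny TLB that is flushed on every context switch. The `MMU` class, its method names and the eviction policy are hypothetical, chosen only to make the mechanism visible.

```python
PAGE_SIZE = 4096  # 4 KB pages, as in the notes


class MMU:
    def __init__(self, tlb_capacity=4):
        self.page_tables = {}  # process id -> {virtual page: physical frame}
        self.tlb = {}          # virtual page -> physical frame (current process only)
        self.tlb_capacity = tlb_capacity
        self.current = None
        self.hits = 0
        self.misses = 0

    def map(self, pid, vpage, frame):
        """Record a mapping in the page table of process `pid`."""
        self.page_tables.setdefault(pid, {})[vpage] = frame

    def context_switch(self, pid):
        """Switch to another process; the cached mappings no longer apply."""
        self.current = pid
        self.tlb.clear()

    def translate(self, vaddr):
        """Translate a virtual address of the current process to a physical one."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.tlb:
            self.hits += 1
            frame = self.tlb[vpage]
        else:
            self.misses += 1
            # Slow path: look the page up in the page table.
            # An unmapped page raises KeyError, standing in for a page fault.
            frame = self.page_tables[self.current][vpage]
            if len(self.tlb) >= self.tlb_capacity:
                # Evict the oldest cached entry (dicts keep insertion order).
                self.tlb.pop(next(iter(self.tlb)))
            self.tlb[vpage] = frame
        return frame * PAGE_SIZE + offset
```

Usage: map page 1 to frame `0xD` for one process, translate twice, and the second lookup is a TLB hit; after `context_switch` the same lookup misses again.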
  • 13-15. SPT VS HAP
      Diagram, built up over three slides:
          Managed by VM OS: the guest page tables of VM A (pages 1 to 5 ... N) and VM B (pages 1 to 4 ... 12), shown as “read-only”
          Managed by hypervisor: the “shadow” page tables
          Actual hardware: CPU, TLB, and memory frames 0xA to 0xH
          “Read-only” page table, VM A: 1 | 0xD, 5 | 0xH, 2 | 0xC
          “Read-only” page table, VM B: 12 | 0xB, 10 | 0xE, 9 | 0xA
          “Shadow” page table, VM A: 1 | 0xG, 5 | 0xD, 2 | 0xF
          “Shadow” page table, VM B: 12 | 0xE, 10 | 0xB, 9 | 0xC
          TLB with HAP (entries tagged per VM): A1 | 0xD, A5 | 0xH, A2 | 0xC, B12 | 0xB, B10 | 0xE, B9 | 0xA, etc.
      Notes:
      There are 2 methods.
      Shadow page tables (SPT): the guest's page table is locked, access to it generates a trap, and the VMM handles the memory access. Memory access is much slower.
      EPT/RVI/HAP: make the TLB much bigger and smarter (VM-aware). It is much more complex to fill up, though, so initial memory access is slow. A filled TLB is very fast.
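The VM-aware TLB idea can be sketched as a two-stage lookup. This is a simplified model, not how EPT or RVI walk real multi-level tables: the `NestedMMU` class is hypothetical, and the frame labels are borrowed from the slide diagram on the assumption that the "read-only" table holds what the guest believes and the "shadow" table holds the real frame.

```python
class NestedMMU:
    def __init__(self):
        self.guest_tables = {}  # vm -> {guest virtual page: guest "physical" label}
        self.host_tables = {}   # vm -> {guest "physical" label: real host frame}
        self.tlb = {}           # (vm, guest virtual page) -> real host frame
        self.walk_steps = 0     # table lookups performed on TLB misses

    def translate(self, vm, gvpage):
        key = (vm, gvpage)
        if key in self.tlb:
            # Filled TLB: no table walk, and no flush needed between VMs
            # because every entry is tagged with its VM.
            return self.tlb[key]
        # Stage 1: guest page table (managed by the VM's OS).
        guest_physical = self.guest_tables[vm][gvpage]
        self.walk_steps += 1
        # Stage 2: host table (managed by the hypervisor).
        frame = self.host_tables[vm][guest_physical]
        self.walk_steps += 1
        self.tlb[key] = frame
        return frame


def slide_example():
    """Build the mappings shown in the diagram (labels kept as strings)."""
    mmu = NestedMMU()
    mmu.guest_tables = {
        "A": {1: "0xD", 5: "0xH", 2: "0xC"},
        "B": {12: "0xB", 10: "0xE", 9: "0xA"},
    }
    mmu.host_tables = {
        "A": {"0xD": "0xG", "0xH": "0xD", "0xC": "0xF"},
        "B": {"0xB": "0xE", "0xE": "0xB", "0xA": "0xC"},
    }
    return mmu
```

The first access per page costs two lookups here (more on real hardware); repeated accesses are served from the tagged TLB even after switching between VM A and VM B.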
  • 16. WHAT DOES THIS TEACH US?
      • All “kernel” activity is a lot more costly:
          • Interrupts
          • System Calls (I/O)
          • Memory page management
      Notes:
      So, 3 actions are slower under virtualization:
          Interrupts: hardware asking for attention
          System calls: software asking for kernel attention
          Page management: memory access
  • 17. IN THE WILD...
      • From best to worst case scenario...
          • Bare-metal (Xen, KVM, ESX, Hyper-V)
          • Host-based (VirtualBox, VMware Workstation, etc.)
          • Cloud-based (Amazon, Terremark, etc.)
  • 18. BARE-METAL OPTIONS
      • Know your my.cnf inside out
      • Use hardware-assisted paging + Large Pages! (InnoDB: large-pages)
      • Make use of paravirtualized HW options
      • Take care of all your caching levels
      • Use DirectIO (innodb_flush_method=O_DIRECT)
      Notes:
      Small mistakes in a native environment get bigger in a virtual one. Memory allocations are expensive.
      Optimize your my.cnf!!! tools.percona.com is a good starting point.
      Connection-specific buffers (join-buffer, sort-buffer, etc.): sweet spot = test!!
      SWAPPING = EVIL: look at swappiness, Large Pages, and DirectIO.
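As a small aid for the "know your my.cnf" advice, the sketch below checks a config text for the two settings the slide names, `innodb_flush_method=O_DIRECT` and `large-pages`. The function name `check_mycnf` is hypothetical, and Python's `configparser` only approximates the MySQL option-file format (it does not understand `!include` directives, for instance), so treat this as an illustration rather than a validator.

```python
import configparser


def check_mycnf(text):
    """Return a list of findings for the [mysqld] section of a my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # bare options such as "large-pages"
        strict=False,          # tolerate repeated options
        interpolation=None,    # keep "%" characters literal
    )
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]

    # MySQL accepts dashes and underscores interchangeably in option names.
    opts = {name.replace("-", "_"): value for name, value in parser.items("mysqld")}

    findings = []
    if (opts.get("innodb_flush_method") or "").upper() != "O_DIRECT":
        findings.append("innodb_flush_method is not O_DIRECT")

    # A bare "large-pages" line parses to None, which counts as enabled.
    large_pages = opts.get("large_pages", "off")
    if large_pages is not None and large_pages.lower() in ("0", "off", "false"):
        findings.append("large-pages is not enabled")
    return findings
```

Large pages also have to be reserved at the operating-system level before MySQL can use them; the check above only looks at the MySQL side.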
  • 21. HARDWARE CHOICES
      • Choosing the right CPUs
          • Intel 5500/7500 and later types (Nehalem) / all AMD quad-core Opterons (HW-assisted/MMU virtualization)
      • Choosing the right NICs (VMDq)
      • Choosing the right storage system (iSCSI vs FC SAN)
      Notes:
      The CPUs listed here support both HW-assist and HAP.
      VMDq: Virtual Machine Device Queues.
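On a Linux host, one quick way to see whether a CPU offers both features is to look at the flags in `/proc/cpuinfo`: `vmx` (Intel VT-x) or `svm` (AMD-V) for hardware assist, and `ept` (Intel) or `npt` (AMD) for hardware-assisted paging. The helper below is a sketch with a hypothetical name; it takes the file's text as input so it can be tried on any sample.

```python
def virtualization_features(cpuinfo_text):
    """Report hardware-assist and HAP support from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags\t\t: fpu vme ... vmx ept"
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {
        "hw_assist": bool(flags & {"vmx", "svm"}),
        "hap": bool(flags & {"ept", "npt"}),
    }
```

Usage on a real machine would be `virtualization_features(open("/proc/cpuinfo").read())`. Inside a guest these flags are often hidden, so the result describes what the current environment exposes, not necessarily the physical CPU.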
  • 22. HOST-BASED
      • All of the above, if possible :)
      • IO becomes the bigger issue on standard client hardware
      • Focus on moving database IO away from the same disk you run the host- and guest-OS on.
      • Consider installing an SSD :)
      Notes:
      Keep in mind all of the previous things. IO is a bigger issue: 2 OSes + a DB running on the same disk is always a problem.
      Use a separate disk, maybe an iSCSI LUN? Buy an SSD!
  • 23. CLOUD-BASED
      • No control whatsoever over host-system :(
      • Sometimes unreliable IO
      • Change strategy! Design for easy sharding and replication!
      • Caching caching caching!
      • Consider RDS to reduce operational overhead?
      Notes:
      Can't escape the hurt: unreliable disk IO. CACHING.
      Use sharding/replication to spread the write/read load; for very write-heavy workloads it may be more trouble than it's worth.
      Asynchronous writes? Not very durable.
      Use RDS to cut back operational cost.
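"Design for easy sharding and replication" can be sketched as a small router that sends writes to a shard's primary and spreads reads over its replicas. Everything here is an assumption for illustration: the `ShardRouter` class, the host names, and the hash-modulo scheme are not from the talk.

```python
import hashlib
from itertools import count


class ShardRouter:
    def __init__(self, shards):
        # shards: list of {"primary": host, "replicas": [host, ...]}
        self.shards = shards
        self._counter = count()

    def shard_index(self, key):
        """Map a key to a shard deterministically (stable across processes)."""
        digest = hashlib.sha1(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.shards)

    def for_write(self, key):
        """Writes always go to the primary of the key's shard."""
        return self.shards[self.shard_index(key)]["primary"]

    def for_read(self, key):
        """Reads rotate over the shard's replicas, falling back to the primary."""
        shard = self.shards[self.shard_index(key)]
        replicas = shard["replicas"]
        if not replicas:
            return shard["primary"]
        return replicas[next(self._counter) % len(replicas)]


# Hypothetical host names.
EXAMPLE_SHARDS = [
    {"primary": "db0", "replicas": ["db0-r1", "db0-r2"]},
    {"primary": "db1", "replicas": []},
]
```

Plain modulo hashing moves most keys when the number of shards changes, so a real design would likely want a lookup table or consistent hashing. Reads from replicas may also lag behind the primary, which ties in with the durability caveat in the notes.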