The e820 trap of Linux kernel hibernation
Aug, 2015, COSCUP 2015, Taipei
Joey Lee, SUSE Labs Taipei
Agenda
• Fundamental
• Hibernation (suspend to disk)
• e820, EFI memmap
• e820 shift
• Platform vs. Shutdown
• memory size changing
• EFI memmap shift
• setup_data and nosave regions
• EFI runtime services broken after S4
• Challenges
• Q&A
Fundamental
Memory (physical)
[Figure: physical memory as page frames, from pfn = 0 to pfn = max_pfn]
Memory (runtime)
[Figure: runtime memory, from 0 to max_pfn]
Hibernation (suspend to disk)
• Create snapshot image of runtime memory.
• Store snapshot image to swap partition or file.
• Restore snapshot image to memory.
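Hibernation (restore)
[Figure: memory from 0 to max_pfn before and after resume; memory restored]
Memory (physical)
[Figure: physical memory as page frames, from pfn = 0 to pfn = max_pfn]
Memory (BIOS memory map)
[Figure: BIOS memory map at boot, from 0 to max_pfn]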
e820
• Wikipedia:
• e820 is shorthand to refer to the facility by which the
BIOS of x86-based computer systems reports the
memory map to the operating system or boot loader.
• It is accessed via the int 15h call, by setting the AX
register to value E820 in hexadecimal. It reports which
memory address ranges are usable and which are
reserved for use by the BIOS.
e820 entry type
Type    Kernel define   String in dmesg                       Description
Type 1  E820_RAM        usable, System RAM                    Usable (normal) RAM
Type 2  E820_RESERVED   reserved, reserved                    Reserved, unusable
Type 3  E820_ACPI       ACPI data, ACPI Tables                ACPI reclaimable memory
Type 4  E820_NVS*       ACPI NVS, ACPI Non-volatile Storage   ACPI NVS memory, ACPI Non-Volatile-Sleeping Memory (NVS)
Type 5  E820_UNUSABLE   Unusable, Unusable memory             Area containing bad memory
* drivers/acpi/nvs.c::suspend_nvs_*() handle ACPI NVS for S4
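The type strings in the table are what shows up in the "BIOS-e820:" lines of dmesg. A minimal sketch of turning those lines into (start, end, type) records follows; the E820Entry tuple and the parse_e820() helper are illustrative names, not kernel or library APIs, and the regular expression assumes the log format quoted later in these slides.

```python
import re
from typing import List, NamedTuple


class E820Entry(NamedTuple):
    start: int  # first byte of the range
    end: int    # last byte of the range (inclusive, as printed in dmesg)
    kind: str   # type string, e.g. "usable" or "reserved"


# Matches lines such as:
# [    0.000000] BIOS-e820: [mem 0x0000000068f45000-0x0000000069d4ffff] usable
_E820_RE = re.compile(
    r"BIOS-e820: \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]\s+(.+?)\s*$"
)


def parse_e820(dmesg_text: str) -> List[E820Entry]:
    """Collect every BIOS-e820 range found in a dmesg dump."""
    entries = []
    for line in dmesg_text.splitlines():
        match = _E820_RE.search(line)
        if match:
            start, end, kind = match.groups()
            entries.append(E820Entry(int(start, 16), int(end, 16), kind))
    return entries
```

Feeding the output of `dmesg` from two boots through parse_e820() gives two lists that can be compared entry by entry.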
Memory (BIOS memory map)
[Figure: BIOS memory map at boot, from 0 to max_pfn]
Memory (runtime)
[Figure: at boot, the range from 0 to max_pfn is divided into usable, reserved, ACPI data and ACPI NVS regions; the OS runs in the usable regions]
EFI memory map
• EFI spec v2.5
• EFI_BOOT_SERVICES.GetMemoryMap()
• Returns the current memory map.
• 6.2 Memory Allocation Services
• Table 25. Memory Type Usage before
ExitBootServices()
• Table 26. Memory Type Usage after ExitBootServices()
e820 entry type vs. EFI memory region type
E820 type  E820 entry type  EFI memory region type
Type 1     E820_RAM         EFI_LOADER_CODE (type 1)
                            EFI_LOADER_DATA (type 2)
                            EFI_BOOT_SERVICES_CODE (type 3)
                            EFI_BOOT_SERVICES_DATA (type 4)
                            EFI_CONVENTIONAL_MEMORY (type 7)
Type 2     E820_RESERVED    EFI_RESERVED_TYPE (type 0)
                            EFI_RUNTIME_SERVICES_CODE (type 5)
                            EFI_RUNTIME_SERVICES_DATA (type 6)
                            EFI_MEMORY_MAPPED_IO (type 11)
                            EFI_MEMORY_MAPPED_IO_PORT_SPACE (type 12)
                            EFI_PAL_CODE (type 13)
Type 3     E820_ACPI        EFI_ACPI_RECLAIM_MEMORY (type 9)
Type 4     E820_NVS         EFI_ACPI_MEMORY_NVS (type 10)
Type 5     E820_UNUSABLE    EFI_UNUSABLE_MEMORY (type 8)
New*       E820_PMEM        EFI_PERSISTENT_MEMORY (type 14)
* v4.2-rc4
arch/x86/boot/compressed/eboot.c::setup_e820()
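The table can be read as a lookup from EFI memory type number to e820 type. The sketch below encodes exactly the rows above; the dictionary and the efi_to_e820() helper are illustrative, and returning None for an unlisted type is an assumption of this sketch, not a statement about what setup_e820() does.

```python
from typing import Dict, Optional

# EFI memory type number -> e820 entry type, copied from the table above.
EFI_TO_E820: Dict[int, str] = {
    1: "E820_RAM",        # EFI_LOADER_CODE
    2: "E820_RAM",        # EFI_LOADER_DATA
    3: "E820_RAM",        # EFI_BOOT_SERVICES_CODE
    4: "E820_RAM",        # EFI_BOOT_SERVICES_DATA
    7: "E820_RAM",        # EFI_CONVENTIONAL_MEMORY
    0: "E820_RESERVED",   # EFI_RESERVED_TYPE
    5: "E820_RESERVED",   # EFI_RUNTIME_SERVICES_CODE
    6: "E820_RESERVED",   # EFI_RUNTIME_SERVICES_DATA
    11: "E820_RESERVED",  # EFI_MEMORY_MAPPED_IO
    12: "E820_RESERVED",  # EFI_MEMORY_MAPPED_IO_PORT_SPACE
    13: "E820_RESERVED",  # EFI_PAL_CODE
    9: "E820_ACPI",       # EFI_ACPI_RECLAIM_MEMORY
    10: "E820_NVS",       # EFI_ACPI_MEMORY_NVS
    8: "E820_UNUSABLE",   # EFI_UNUSABLE_MEMORY
    14: "E820_PMEM",      # EFI_PERSISTENT_MEMORY (since v4.2-rc4)
}


def efi_to_e820(efi_type: int) -> Optional[str]:
    """Return the e820 type for an EFI memory type, or None if unlisted."""
    return EFI_TO_E820.get(efi_type)
```

Note that boot services code and data (types 3 and 4) become ordinary RAM, while runtime services code and data (types 5 and 6) become reserved.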
e820 shift
e820 shift (1)
[Figure: e820 memory maps printed in Boot 1 and Boot 2]
e820 shift (2)
• Boot:
• [ 0.000000] BIOS-e820: [mem 0x0000000068f45000-0x0000000069d4ffff] usable
• Resume Boot:
• [ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved
• [ 0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff]
• [ 17.410733] PM: Image loading progress: 0%
• [ 17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000
• [ 17.933469] IP: [<ffffffff810a1cf0>] load_image_lzo+0x810/0xe40
• The page fault address is in a usable memory entry in the normal boot,
but in a reserved memory entry in the resume boot.
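The overlap can be checked by hand from the logged values. The sketch below assumes the faulting virtual address lies in the kernel direct mapping with base 0xffff880000000000 (the x86_64 layout of kernels from that period); PAGE_OFFSET and the helper names are illustrative.

```python
PAGE_OFFSET = 0xffff880000000000  # assumption: direct-map base on x86_64 then
PAGE_SIZE = 0x1000                # assumption: 4 KiB pages

boot_usable = (0x68f45000, 0x69d4ffff)      # inclusive range from "Boot"
resume_reserved = (0x69d4f000, 0x69e12fff)  # inclusive range from "Resume Boot"


def contains(rng, addr):
    """True if addr falls inside the inclusive (start, end) range."""
    start, end = rng
    return start <= addr <= end


def overlap_bytes(a, b):
    """Number of bytes shared by two inclusive ranges (0 if disjoint)."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, end - start + 1)


fault_pa = 0xffff880069d4f000 - PAGE_OFFSET
print(hex(fault_pa))                                        # physical address
print(contains(boot_usable, fault_pa))                      # usable at boot?
print(contains(resume_reserved, fault_pa))                  # reserved at resume?
print(overlap_bytes(boot_usable, resume_reserved) // PAGE_SIZE, "page(s)")
```

Under these assumptions the fault address maps to physical 0x69d4f000, and the two ranges share exactly one page: the image wants to restore a page that the resume kernel treats as reserved.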
e820 shift (3)
[Figure: memory maps of Boot and Resume Boot side by side, from 0 to max_pfn, each divided into usable, reserved, ACPI data and ACPI NVS regions; an address that is usable in Boot lies in a reserved region in Resume Boot]
Checking e820 shift:
• Lee, Chun-Yi [PATCH] PM / hibernate: avoid unsafe pages
in e820 reserved regions:
• 84c91b7ae commit in v3.17-rc1
• https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=84c91b7
• Reverted by f82daee49 commit in v4.0
• Waiting for Yinghai Lu's “[PATCH] x86: Kill E820_RESERVED_KERN”
• Lee, Chun-Yi [PATCH] Hibernate: save e820 table to
snapshot header for comparison
• https://lkml.org/lkml/2014/8/11/166
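The idea of keeping the e820 table with the snapshot and comparing it on resume can be sketched as follows. This is only an illustration of the comparison step, not the code of the patch; the tuple layout and e820_diff() are assumptions, and the two tables contain only the entries quoted in these slides.

```python
from typing import List, Tuple

E820Entry = Tuple[int, int, str]  # (start, end inclusive, type string)


def e820_diff(saved: List[E820Entry], current: List[E820Entry]) -> List[E820Entry]:
    """Entries that appear in only one of the two tables, sorted by address."""
    return sorted(set(saved) ^ set(current))


# Table stored with the image at hibernate time (excerpt).
saved_at_hibernate = [(0x68f45000, 0x69d4ffff, "usable")]
# Table reported by firmware in the resume boot (excerpt).
seen_at_resume = [(0x69d4f000, 0x69e12fff, "reserved")]

safe_to_resume = not e820_diff(saved_at_hibernate, seen_at_resume)
print("safe to resume:", safe_to_resume)
```

A resume path using such a check would refuse to load the image when the tables differ, instead of faulting halfway through the restore.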
Platform vs. Shutdown (1)
• Different modes of hibernation:
• cat /sys/power/disk
[platform] shutdown reboot suspend
• Platform mode depends on _S4 support by BIOS:
[ 1.080004] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [_S4_]
(20130725/hwxface-571)
• ACPI spec 6.0:
• Table 7-234 BIOS-Supplied Control Methods for System-Level Functions
• _S4: Package that defines system _S4 state mode.
• 16.3.2 BIOS Initialization of Memory (since ACPI v1.0):
• Note: The memory information returned from the system address map
reporting interfaces should be the same before and after an S4 sleep.
OSPM will invoke E820 interfaces on IA-PC-based legacy systems or the
GetMemoryMap() interface on UEFI-enabled systems
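The bracketed word in /sys/power/disk is the currently selected mode. A small sketch of reading that format follows; parse_power_disk() is an illustrative helper, and it works on the text so that it can be tried without touching sysfs.

```python
from typing import List, Optional, Tuple


def parse_power_disk(text: str) -> Tuple[Optional[str], List[str]]:
    """Split the /sys/power/disk content into (current mode, all modes)."""
    current = None
    modes = []
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            word = word[1:-1]   # strip the brackets marking the active mode
            current = word
        modes.append(word)
    return current, modes


current, modes = parse_power_disk("[platform] shutdown reboot suspend\n")
print(current, modes)
```

On a live system the text would come from `open("/sys/power/disk").read()`; if "platform" is missing from the list, the firmware most likely does not provide _S4.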
Platform vs. Shutdown (2)
• Documentation/power/swsusp.txt in kernel
• Q: What is the difference between "platform" and "shutdown"?
• A: "platform" is actually right thing to do where supported, but
"shutdown" is most reliable (except on ACPI systems).
• Linux Kernel bug #77571:
• https://bugzilla.kernel.org/show_bug.cgi?id=77571
• The same page fault occurs when writing the snapshot image to the page buffer.
• The bug reporter used “shutdown” rather than “platform”.
After switching to “platform”, the reporter could not reproduce the issue.
• It is better to use “platform” when the BIOS supports _S4.
Users should be aware of the risk when using “shutdown”.
Memory size mismatch (1)
• PM: Loading and decompressing image data (495448 pages)...
[ 3.834831] PM: Image mismatch: memory size
[ 3.834851] PM: Read 1981792 kbytes in 0.01 seconds (198179.20 MB/s)
[ 3.836147] PM: Error -1 resuming
[ 3.836162] PM: Failed to load hibernation image, recovering.
• Normally: On node 0 totalpages: 4177255
When the issue happened: On node 0 totalpages: 4177256 <== mismatch
• for_each_online_node(nid)
        phys_pages += node_present_pages(nid);
• kernel/power/snapshot.c::check_header()
    if (!reason && info->num_physpages != get_num_physpages())
        reason = "memory size";
    if (reason) {
        printk(KERN_ERR "PM: Image mismatch: %s\n", reason);
        return -EPERM;
    }
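The numbers in the log can be cross-checked with a little arithmetic. The sketch below assumes 4 KiB pages; check_memory_size() is an illustrative stand-in for the comparison in check_header(), not kernel code.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KiB pages

image_pages = 495448
image_kbytes = image_pages * PAGE_SIZE_KB   # size reported by "PM: Read ..."

pages_at_hibernate = 4177255   # "Normally"
pages_at_resume = 4177256      # "When the issue happened"
page_delta = pages_at_resume - pages_at_hibernate


def check_memory_size(saved_pages: int, current_pages: int):
    """Return the mismatch reason string, or None when the counts agree."""
    if saved_pages != current_pages:
        return "memory size"
    return None


print(image_kbytes, "kbytes")
print(page_delta, "page difference ->",
      check_memory_size(pages_at_hibernate, pages_at_resume))
```

The image size matches the 1981792 kbytes in the log, and a difference of a single page between the two boots is enough to reject the image.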
Memory size mismatch (2)
• Boot
[Figure: memory map of Boot]
Memory size mismatch (3)
• Resume Boot
[Figure: memory map of Resume Boot]
EFI memmap shift
Misidentification of nosave region (1)
[Figure annotations: 1-page region; inside usable memory; not page-aligned; EFI_LOADER_DATA]
setup_data and E820_RESERVED_KERN
• setup_data: a linked list that carries data along with boot_params
to later boot stages.
• Allocated in the EFI stub, reserved via memblock and e820.
• Yinghai Lu [PATCH] x86, boot: clean up setup_data
handling
• https://lkml.org/lkml/2015/2/28/272
• SETUP_E820_EXT, SETUP_EFI, SETUP_DTB,
SETUP_PCI, SETUP_KASLR
• These setup_data chunks are not page-aligned when
allocated. That leaves holes between e820 entries, which the
kernel then registers as 1-page nosave regions. <== random
address per boot!
Misidentification of nosave region (2)
• arch/x86/kernel/e820.c
Registers the hole between two
e820 regions as a 1-page
nosave region
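How a boundary that is not page-aligned turns into a 1-page nosave region can be shown with a simplified model. It is loosely based on the hole-registration loop in arch/x86/kernel/e820.c; the function name, the (addr, size, is_ram) tuple layout and the sample addresses are assumptions made for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def pfn_up(addr: int) -> int:
    """Page frame number of addr, rounded up to the next page boundary."""
    return (addr + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT


def pfn_down(addr: int) -> int:
    """Page frame number of addr, rounded down."""
    return addr >> PAGE_SHIFT


def mark_nosave_regions(entries):
    """entries: (addr, size, is_ram) tuples sorted by addr.

    Returns half-open (start_pfn, end_pfn) nosave ranges: every hole
    between entries and every non-RAM entry.
    """
    nosave = []
    pfn = 0
    for addr, size, is_ram in entries:
        if pfn < pfn_up(addr):              # hole before this entry
            nosave.append((pfn, pfn_up(addr)))
        pfn = pfn_down(addr + size)
        if not is_ram:                      # non-RAM entry is never saved
            nosave.append((pfn_up(addr), pfn))
    return nosave


aligned = [(0x0000, 0x5000, True), (0x5000, 0x4000, True)]
unaligned = [(0x0000, 0x5800, True), (0x5800, 0x3800, True)]
print(mark_nosave_regions(aligned))     # no hole
print(mark_nosave_regions(unaligned))   # one page straddling the boundary
```

In the unaligned case the first entry ends at pfn 5 after rounding down while the second starts at pfn 6 after rounding up, so page 5 is registered as nosave even though both neighbours are RAM. Because the split point depends on where setup_data was allocated, that page moves from boot to boot.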
Kill E820_RESERVED_KERN
• Yinghai Lu [PATCH] x86: Kill E820_RESERVED_KERN
• https://lkml.org/lkml/2015/2/28/274
• Cleans up the setup_data handler and removes E820_RESERVED_KERN from
e820 regions, because setup_data is already protected by memblock.
• Avoids wasting memory and fixes the page alignment problem in e820.
• Linux Kernel bug #96111 Unreliable hibernation on Lenovo X230
• https://bugzilla.kernel.org/show_bug.cgi?id=96111
• 84c91b7ae commit in v3.17-rc1
Reverted by f82daee49 commit in v4.0
• Chen, Yu C [RFC PATCH] PM / hibernate: make sure each resuming
page is in current memory zones
• Waiting for Yinghai Lu's patch to kill E820_RESERVED_KERN
EFI runtime services broken after S4 (1)
[Figure: failure seen on some machines]
EFI runtime services broken after S4 (2)
• Resume Boot:
VA 0xfffffffefd244e60 is in the Runtime Data region after hibernation resume:
[ 0.125865] efi: mem26: [Runtime Data |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb3e5000-0x00000000bb445000) va=[0xfffffffefd1e5000-0xfffffffefd245000) (0MB)
• Boot:
VA 0xfffffffefd244e60 was not mapped to any PA in the hibernating kernel (image kernel):
[ 0.111002] efi: mem24: [Runtime Code |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb385000-0x00000000bb3e5000) va=[0xfffffffefd585000-0xfffffffefd5e5000) (0MB)
[ 0.125883] efi: mem25: [Runtime Data |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb3e5000-0x00000000bb445000) va=[0xfffffffefd3e5000-0xfffffffefd445000) (0MB)
[ 0.140764] efi: mem29: [Boot Data | | | | | |WB|WT|WC|UC] pa=[0x00000000bb7ff000-0x00000000bb800000) va=[0xfffffffefd1ff000-0xfffffffefd200000) (0MB)
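The claim can be verified against the logged ranges. The sketch below takes the faulting VA to be the 16-digit value 0xfffffffefd244e60, which is an assumption consistent with the va=[...) ranges in the log; find_region() and the dictionaries are illustrative.

```python
from typing import Dict, Optional, Tuple

FAULT_VA = 0xfffffffefd244e60  # assumption: full 16-hex-digit address

# name -> half-open [start, end) virtual range, copied from the efi: lines
resume_boot: Dict[str, Tuple[int, int]] = {
    "mem26 Runtime Data": (0xfffffffefd1e5000, 0xfffffffefd245000),
}
boot: Dict[str, Tuple[int, int]] = {
    "mem24 Runtime Code": (0xfffffffefd585000, 0xfffffffefd5e5000),
    "mem25 Runtime Data": (0xfffffffefd3e5000, 0xfffffffefd445000),
    "mem29 Boot Data": (0xfffffffefd1ff000, 0xfffffffefd200000),
}


def find_region(regions: Dict[str, Tuple[int, int]], va: int) -> Optional[str]:
    """Name of the region whose [start, end) range holds va, else None."""
    for name, (start, end) in regions.items():
        if start <= va < end:
            return name
    return None


print(find_region(resume_boot, FAULT_VA))
print(find_region(boot, FAULT_VA))
```

With these ranges the address resolves to Runtime Data in the resume boot and to nothing in the original boot, which is why a pointer valid in one kernel faults in the other.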
Memory mapping of EFI runtime services (1)
• Borislav Petkov [PATCH] EFI: Runtime services virtual mapping
• d2f7cbe7, merged in the v3.14 kernel
• We map the EFI regions needed for runtime services
non-contiguously, with preserved alignment on virtual addresses
starting from -4G down for a total max space of 64G.
• Documentation/x86/x86_64/mm.txt
->trampoline_pgd:
We map EFI runtime services in the aforementioned PGD in the
virtual range of 64Gb (arbitrarily set, can be raised if needed)
0xffffffef00000000 - 0xffffffff00000000
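The "-4G" and "64G" figures follow directly from the two addresses of the range. A short worked check, with variable names chosen for illustration:

```python
GIB = 1 << 30

window_top = 0xffffffff00000000      # where the mapping starts ("-4G")
window_bottom = 0xffffffef00000000   # lowest address of the window

distance_from_top = (1 << 64) - window_top   # bytes below 2^64
window_size = window_top - window_bottom

print(distance_from_top // GIB, "GiB below the top of the address space")
print(window_size // GIB, "GiB window")
```

So the window begins 4 GiB below the top of the 64-bit address space and extends 64 GiB downwards.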
Memory mapping of EFI runtime services (2)
• Virtual memory map of x86_64 runtime services (trampoline_pgd)
[Figure: virtual address space from 0x0000000000000000 to 0xffffffffffffffff. Runtime Code (PA 0x00000000bb385000), Runtime Data (PA 0x00000000bb3e5000), Boot Code and Boot Data are mapped inside the 64 G window from 0xffffffef00000000 to 0xffffffff00000000, which ends 4 G below the top; 1:1 mappings are kept as a workaround]
arch/x86/platform/efi/efi_64.c::efi_map_region()
Memory mapping of EFI runtime services (3)
• In -4G area:
[Figure: Runtime Code, Runtime Data, Boot Code and Boot Data regions, 2M-aligned, placed in the 64 G window between 0xffffffef00000000 and 0xffffffff00000000]
arch/x86/platform/efi/efi_64.c::efi_map_region()
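The top-down placement with preserved alignment can be sketched as below. This is a simplified model loosely following efi_map_region(), not the kernel code; the function name, the starting address argument and the sample region are assumptions for illustration.

```python
PMD_SIZE = 2 << 20          # 2 MiB
PMD_MASK = ~(PMD_SIZE - 1)  # clears the offset within a 2 MiB block


def map_region_top_down(efi_va: int, pa: int, size: int) -> int:
    """Pick a VA below efi_va for a region, keeping pa's offset within 2 MiB.

    Returns the chosen VA, which is also the new allocation cursor.
    """
    efi_va -= size
    pa_offset = pa & (PMD_SIZE - 1)
    if pa_offset == 0:
        return efi_va & PMD_MASK             # aligned PA -> aligned VA
    prev_va = efi_va
    efi_va = (efi_va & PMD_MASK) + pa_offset
    if efi_va > prev_va:                     # would overlap the region above
        efi_va -= PMD_SIZE
    return efi_va


# Runtime Code region from the log: pa=0xbb385000, size 0x60000.
va = map_region_top_down(0xffffffff00000000, 0xbb385000, 0x60000)
print(hex(va), hex(va & (PMD_SIZE - 1)), hex(0xbb385000 & (PMD_SIZE - 1)))
```

Because the cursor only depends on the order and sizes of the regions handed to it, a firmware memory map that differs between boots yields different VAs for the same physical region, which matches the logs above. The logged VAs show the same property as the sketch: each region's offset within its 2 MiB block equals that of its PA.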
Should fix runtime services address after S4
• Lee, Chun-Yi [PATCH] x86_64/efi: Mapping Boot and
Runtime EFI memory regions to different starting virtual
address
• The VAs of EFI runtime services may change
across hibernation, but that is fine as long as the PAs do not
change.
• The EFI page table should be checked in more detail during
hibernation recovery.
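One way to picture such a check is to compare the PA-to-VA mapping of the runtime regions recorded by the two kernels. The sketch below is an illustration only, using the Runtime Data region from the logs; the dictionaries and moved_regions() are hypothetical and do not describe an existing kernel interface.

```python
from typing import Dict, List, Tuple

PhysRange = Tuple[int, int]  # half-open [start, end) physical range

# physical range -> virtual start address, taken from the efi: log lines
image_kernel: Dict[PhysRange, int] = {
    (0xbb3e5000, 0xbb445000): 0xfffffffefd3e5000,  # Runtime Data at Boot
}
resume_kernel: Dict[PhysRange, int] = {
    (0xbb3e5000, 0xbb445000): 0xfffffffefd1e5000,  # Runtime Data at Resume Boot
}


def moved_regions(before: Dict[PhysRange, int],
                  after: Dict[PhysRange, int]) -> List[PhysRange]:
    """Physical ranges present in both maps whose virtual address changed."""
    return sorted(pa for pa in before.keys() & after.keys()
                  if before[pa] != after[pa])


def pa_layout_unchanged(before: Dict[PhysRange, int],
                        after: Dict[PhysRange, int]) -> bool:
    """True when both kernels see the same set of physical ranges."""
    return before.keys() == after.keys()


print(pa_layout_unchanged(image_kernel, resume_kernel))
print([(hex(s), hex(e)) for s, e in moved_regions(image_kernel, resume_kernel)])
```

Here the physical layout is identical and only the VA moved, which is the case the slide calls fine provided the restored kernel ends up using a mapping that matches.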
Challenges
Hibernation's Challenge
• KASLR (Kernel address space layout randomization)
• Exclusive with hibernation
• Intel Rapid Start
• A replacement of kernel hibernation
• May also conflict with KASLR
• NVDIMM
• Do not need hibernation anymore
Q&A
SUSE is Hiring
Please search “SUSE Careers” and http://www.104.com.tw/
SUMMIT 2015
OPENSUSE ASIA
Taipei,R.O.C(Taiwan)
Bring you to the free world
Join us on:
www.opensuse.org

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 

The e820 trap of Linux kernel hibernation

  • 1. The e820 trap of Linux kernel hibernation. Aug 2015, COSCUP 2015, Taipei. Joey Lee, SUSE Labs Taipei
  • 2. Agenda
      • Fundamental
      • Hibernation (suspend to disk)
      • e820, EFI memmap
      • e820 shift
      • Platform vs. Shutdown
      • Memory size changing
      • EFI memmap shift
      • setup_data and nosave regions
      • EFI runtime services broken after S4
      • Challenges
      • Q&A
  • 4. Memory (physical) [diagram: physical memory from pfn = 0 to pfn = max_pfn]
  • 6. Hibernation (suspend to disk)
      • Create a snapshot image of runtime memory.
      • Store the snapshot image to a swap partition or file.
      • Restore the snapshot image to memory.
  • 8. Memory (physical) [diagram: physical memory from pfn = 0 to pfn = max_pfn]
  • 9. Memory (BIOS memory map) [diagram: memory from 0 to max_pfn, shown at boot]
  • 10. e820
      • Wikipedia:
          • e820 is shorthand to refer to the facility by which the BIOS of x86-based computer systems reports the memory map to the operating system or boot loader.
          • It is accessed via the int 15h call, by setting the AX register to value E820 in hexadecimal. It reports which memory address ranges are usable and which are reserved for use by the BIOS.
  • 12. e820 entry type
      Type   | Kernel define | String in dmesg                     | Description
      Type 1 | E820_RAM      | usable, System RAM                  | Usable (normal) RAM
      Type 2 | E820_RESERVED | reserved, reserved                  | Reserved, unusable
      Type 3 | E820_ACPI     | ACPI data, ACPI Tables              | ACPI reclaimable memory
      Type 4 | E820_NVS*     | ACPI NVS, ACPI Non-volatile Storage | ACPI NVS memory, ACPI Non-Volatile-Sleeping Memory (NVS)
      Type 5 | E820_UNUSABLE | Unusable, Unusable memory           | Area containing bad memory
      * drivers/acpi/nvs.c::suspend_nvs_*() handles ACPI NVS for S4
  • 13. Memory (BIOS memory map) [diagram: memory from 0 to max_pfn, shown at boot]
  • 14. Memory (runtime) [diagram: memory from 0 to max_pfn at boot, divided into usable, reserved, ACPI data and ACPI NVS regions, alongside the same layout as used by the OS]
  • 15. EFI memory map
      • EFI spec v2.5
      • EFI_BOOT_SERVICES.GetMemoryMap()
          • Returns the current memory map.
      • 6.2 Memory Allocation Services
          • Table 25. Memory Type Usage before ExitBootServices()
          • Table 26. Memory Type Usage after ExitBootServices()
  • 17. e820 entry type vs. EFI memory region type
      E820 type | E820 entry type | EFI memory region type
      Type 1    | E820_RAM        | EFI_LOADER_CODE (type 1), EFI_LOADER_DATA (type 2), EFI_BOOT_SERVICES_CODE (type 3), EFI_BOOT_SERVICES_DATA (type 4), EFI_CONVENTIONAL_MEMORY (type 7)
      Type 2    | E820_RESERVED   | EFI_RESERVED_TYPE (type 0), EFI_RUNTIME_SERVICES_CODE (type 5), EFI_RUNTIME_SERVICES_DATA (type 6), EFI_MEMORY_MAPPED_IO (type 11), EFI_MEMORY_MAPPED_IO_PORT_SPACE (type 12), EFI_PAL_CODE (type 13)
      Type 3    | E820_ACPI       | EFI_ACPI_RECLAIM_MEMORY (type 9)
      Type 4    | E820_NVS        | EFI_ACPI_MEMORY_NVS (type 10)
      Type 5    | E820_UNUSABLE   | EFI_UNUSABLE_MEMORY (type 8)
      New*      | E820_PMEM       | EFI_PERSISTENT_MEMORY (type 14)
      * v4.2-rc4 arch/x86/boot/compressed/eboot.c::setup_e820()
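The table on slide 17 can be written down as a small lookup. This is a Python sketch of the table only, not the kernel's setup_e820(); the names EFI_TO_E820 and e820_type_for are made up for illustration.

```python
# Illustrative lookup built from the slide's table; this is not kernel code.
# Keys are EFI memory region type numbers, values are e820 entry type names.
EFI_TO_E820 = {
    0: "E820_RESERVED",   # EFI_RESERVED_TYPE
    1: "E820_RAM",        # EFI_LOADER_CODE
    2: "E820_RAM",        # EFI_LOADER_DATA
    3: "E820_RAM",        # EFI_BOOT_SERVICES_CODE
    4: "E820_RAM",        # EFI_BOOT_SERVICES_DATA
    5: "E820_RESERVED",   # EFI_RUNTIME_SERVICES_CODE
    6: "E820_RESERVED",   # EFI_RUNTIME_SERVICES_DATA
    7: "E820_RAM",        # EFI_CONVENTIONAL_MEMORY
    8: "E820_UNUSABLE",   # EFI_UNUSABLE_MEMORY
    9: "E820_ACPI",       # EFI_ACPI_RECLAIM_MEMORY
    10: "E820_NVS",       # EFI_ACPI_MEMORY_NVS
    11: "E820_RESERVED",  # EFI_MEMORY_MAPPED_IO
    12: "E820_RESERVED",  # EFI_MEMORY_MAPPED_IO_PORT_SPACE
    13: "E820_RESERVED",  # EFI_PAL_CODE
    14: "E820_PMEM",      # EFI_PERSISTENT_MEMORY
}


def e820_type_for(efi_type):
    """Return the e820 entry type name for an EFI memory region type number."""
    return EFI_TO_E820[efi_type]
```

Note that boot services code and data (types 3 and 4) end up as ordinary RAM, while runtime services regions (types 5 and 6) stay reserved.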
  • 22. e820 shift (2)
      • Boot:
          [ 0.000000] BIOS-e820: [mem 0x0000000068f45000-0x0000000069d4ffff] usable
      • Resume boot:
          [ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved
          [ 0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff]
          [ 17.410733] PM: Image loading progress: 0%
          [ 17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000
          [ 17.933469] IP: [<ffffffff810a1cf0>] load_image_lzo+0x810/0xe40
      • The page fault address lies in a usable memory entry on the normal boot, but in a reserved memory entry on the resume boot.
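The comparison on slide 22 can be reproduced from the two BIOS-e820 lines. The following Python sketch parses them and looks up the faulting page in each map. The helper names are hypothetical, and the physical address is derived on the assumption that the faulting address ffff880069d4f000 belongs to the kernel's direct mapping based at 0xffff880000000000.

```python
import re

# Matches lines such as "BIOS-e820: [mem 0x...-0x...] usable".
E820_LINE = re.compile(r"BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$")


def parse_e820(dmesg_text):
    """Return a list of (start, end_inclusive, type) tuples from dmesg text."""
    entries = []
    for line in dmesg_text.splitlines():
        match = E820_LINE.search(line.strip())
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3)))
    return entries


def type_at(entries, addr):
    """Return the e820 type covering addr, or None if no entry covers it."""
    for start, end, kind in entries:
        if start <= addr <= end:
            return kind
    return None


# The two log lines quoted on the slide.
BOOT = "[ 0.000000] BIOS-e820: [mem 0x0000000068f45000-0x0000000069d4ffff] usable"
RESUME = "[ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved"

# Assumption: the fault address is a direct-mapping address, so subtracting
# the direct-mapping base gives the physical address.
DIRECT_MAP_BASE = 0xFFFF880000000000
FAULT_PHYS = 0xFFFF880069D4F000 - DIRECT_MAP_BASE

print(type_at(parse_e820(BOOT), FAULT_PHYS))
print(type_at(parse_e820(RESUME), FAULT_PHYS))
```

The same physical page is reported as usable by one boot and as reserved by the other, which is the shift the slide describes.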
  • 23. e820 shift (3) [diagram: memory maps of the boot and the resume boot side by side, from 0 to max_pfn, each divided into usable, reserved, ACPI data and ACPI NVS regions; an address that was usable at boot falls in a reserved region on the resume boot]
  • 24. Checking e820 shift:
      • Lee, Chun-Yi [PATCH] PM / hibernate: avoid unsafe pages in e820 reserved regions
          • Commit 84c91b7ae in v3.17-rc1
          • https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=84c91b7
          • Reverted by commit f82daee49 in v4.0
          • Waiting for Yinghai Lu's "[PATCH] x86: Kill E820_RESERVED_KERN"
      • Lee, Chun-Yi [PATCH] Hibernate: save e820 table to snapshot header for comparison
          • https://lkml.org/lkml/2014/8/11/166
  • 25. Platform vs. Shutdown (1)
      • Different modes of hibernation:
          cat /sys/power/disk
          [platform] shutdown reboot suspend
      • Platform mode depends on _S4 support by the BIOS:
          [ 1.080004] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [_S4_] (20130725/hwxface-571)
      • ACPI spec 6.0:
          • Table 7-234 BIOS-Supplied Control Methods for System-Level Functions
              • _S4: Package that defines system _S4 state mode.
          • 16.3.2 BIOS Initialization of Memory (since ACPI v1.0):
              • Note: The memory information returned from the system address map reporting interfaces should be the same before and after an S4 sleep. OSPM will invoke E820 interfaces on IA-PC-based legacy systems or the GetMemoryMap() interface on UEFI-enabled systems
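The /sys/power/disk output on slide 25 marks the currently selected mode with square brackets. A minimal Python sketch for reading that format follows; parse_disk_modes is a hypothetical helper name, and the sample string is the one shown on the slide.

```python
def parse_disk_modes(text):
    """Split the contents of /sys/power/disk into (selected, available).

    The selected mode is the token wrapped in square brackets.
    """
    selected = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        available.append(token)
    return selected, available


# Sample taken from the slide; on a live system the text would come from
# reading /sys/power/disk instead.
SAMPLE = "[platform] shutdown reboot suspend"
selected, available = parse_disk_modes(SAMPLE)
print(selected, available)
```

A check like this makes it easy to warn when a machine is set to "shutdown" even though "platform" is offered.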
  • 26. Platform vs. Shutdown (2)
      • Documentation/power/swsusp.txt in the kernel
          • Q: What is the difference between "platform" and "shutdown"?
          • A: "platform" is actually right thing to do where supported, but "shutdown" is most reliable (except on ACPI systems).
      • Linux Kernel bug #77571:
          • https://bugzilla.kernel.org/show_bug.cgi?id=77571
          • The same page fault occurs when writing the snapshot image to the page buffer.
          • The bug reporter used “shutdown” rather than “platform”. After switching to “platform”, the reporter could no longer reproduce the issue.
      • It is better to use “platform” when the BIOS supports _S4. Users should be aware that “shutdown” carries a risk.
  • 27. Memory size mismatch (1)
      • Resume log:
          PM: Loading and decompressing image data (495448 pages)...
          [ 3.834831] PM: Image mismatch: memory size
          [ 3.834851] PM: Read 1981792 kbytes in 0.01 seconds (198179.20 MB/s)
          [ 3.836147] PM: Error -1 resuming
          [ 3.836162] PM: Failed to load hibernation image, recovering.
      • Normally:
          On node 0 totalpages: 4177255
        When the issue happened:
          On node 0 totalpages: 4177256 <== mismatch
      • Page count:
          for_each_online_node(nid)
                  phys_pages += node_present_pages(nid);
      • kernel/power/snapshot.c::check_header()
          if (!reason && info->num_physpages != get_num_physpages())
                  reason = "memory size";
          if (reason) {
                  printk(KERN_ERR "PM: Image mismatch: %s\n", reason);
                  return -EPERM;
          }
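The check on slide 27 is strict: a difference of a single page is enough to reject the image. The Python sketch below models that comparison with the numbers from the slide. It is a toy model, not the kernel function; it treats the slide's "totalpages" figures as the per-node page counts being summed and assumes 4 KiB pages.

```python
PAGE_KB = 4  # assumption: 4 KiB pages


def get_num_physpages(present_pages_per_node):
    """Sum the present pages over all online nodes (toy stand-in)."""
    return sum(present_pages_per_node.values())


def check_header(image_num_physpages, present_pages_per_node):
    """Return the mismatch reason, or None if the image matches."""
    if image_num_physpages != get_num_physpages(present_pages_per_node):
        return "memory size"
    return None


# Page counts from the slide: normal boot vs. the boot where resume failed.
reason = check_header(4177255, {0: 4177256})
if reason:
    print("PM: Image mismatch: %s" % reason)

# Cross-check of the log line: 495448 pages of 4 KiB each.
print(495448 * PAGE_KB, "kbytes")
```

The page count of the image and the "Read ... kbytes" figure in the log are consistent with each other, so the image itself was read completely; only the header comparison failed.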
  • 28. Memory size mismatch (2): Boot [figure: memory map of the normal boot]
  • 29. Memory size mismatch (3): Resume boot [figure: memory map of the resume boot]
  • 30. EFI memmap shift
  • 31. Misidentification of nosave region (1) [figure annotations: 1 page, in usable, not aligned, EFI_LOADER_DATA]
  • 32. setup_data and E820_RESERVED_KERN
      • setup_data: a linked list that carries data along with boot_params to a later boot stage.
      • Allocated in the EFI stub, reserved via memblock and e820.
      • Yinghai Lu [PATCH] x86, boot: clean up setup_data handling
          • https://lkml.org/lkml/2015/2/28/272
      • SETUP_E820_EXT, SETUP_EFI, SETUP_DTB, SETUP_PCI, SETUP_KASLR
      • These setup_data chunks are not page-aligned when allocated. That leaves holes between e820 entries, which the kernel then registers as 1-page nosave regions. <== random address per boot!
  • 33. Misidentification of nosave region (2)
      • arch/x86/kernel/e820.c: a hole between two e820 regions is registered as a 1-page nosave region.
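How an unaligned chunk turns into a 1-page nosave region can be shown with page-frame rounding. The Python sketch below is a simplified model loosely following the idea of the nosave marking in arch/x86/kernel/e820.c: the end of an entry is rounded down to a page frame and the start of the next one is rounded up, so a boundary in the middle of a page opens a gap. The function names, the entry layout and the addresses are illustrative assumptions, not kernel code or real data.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def pfn_up(addr):
    """Round a byte address up to the next page frame number."""
    return (addr + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT


def pfn_down(addr):
    """Round a byte address down to a page frame number."""
    return addr >> PAGE_SHIFT


def nosave_regions(entries):
    """Return nosave (start_pfn, end_pfn) ranges for sorted e820 entries.

    entries is a list of (start, size, type) tuples with byte addresses.
    """
    regions = []

    def register(start_pfn, end_pfn):
        # Ignore empty ranges and ranges that were already recorded.
        if start_pfn < end_pfn and (start_pfn, end_pfn) not in regions:
            regions.append((start_pfn, end_pfn))

    pfn = 0
    for start, size, kind in entries:
        # Gap between the previous entry's last whole page and this entry.
        register(pfn, pfn_up(start))
        pfn = pfn_down(start + size)
        # Entries that are neither RAM nor kernel-reserved are nosave too.
        if kind not in ("RAM", "RESERVED_KERN"):
            register(pfn_up(start), pfn)
    return regions


# Hypothetical layout: a 0x100-byte setup_data chunk in the middle of RAM.
UNALIGNED = [(0x0, 0x5800, "RAM"), (0x5800, 0x100, "RESERVED_KERN"), (0x5900, 0x3700, "RAM")]
ALIGNED = [(0x0, 0x5000, "RAM"), (0x5000, 0x1000, "RESERVED_KERN"), (0x6000, 0x3000, "RAM")]

print(nosave_regions(UNALIGNED))
print(nosave_regions(ALIGNED))
```

With the unaligned chunk, page frame 5 is marked nosave even though it is usable memory; with page-aligned boundaries no gap appears. Because the chunk's address varies per boot, so does the spurious nosave page.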
  • 34. Kill E820_RESERVED_KERN
      • Yinghai Lu [PATCH] x86: Kill E820_RESERVED_KERN
          • https://lkml.org/lkml/2015/2/28/274
          • Cleans up the setup_data handler and removes E820_RESERVED_KERN from the e820 regions, because setup_data is already protected by memblock.
          • Avoids wasting memory and fixes the page alignment problem in e820.
      • Linux Kernel bug #96111: Unreliable hibernation on Lenovo X230
          • https://bugzilla.kernel.org/show_bug.cgi?id=96111
          • Commit 84c91b7ae in v3.17-rc1, reverted by commit f82daee49 in v4.0
      • Chen, Yu C [RFC PATCH] PM / hibernate: make sure each resuming page is in current memory zones
          • Waiting for Yinghai Lu's patch to kill E820_RESERVED_KERN
  • 35. EFI runtime services broken after S4 (1): on some machines
  • 36. EFI runtime services broken after S4 (2)
      • Resume boot: VA 0xffffffefd244e60 is in the Runtime Data region after the hibernate resume:
          [ 0.125865] efi: mem26: [Runtime Data |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb3e5000-0x00000000bb445000) va=[0xfffffffefd1e5000-0xfffffffefd245000) (0MB)
      • Boot: VA 0xffffffefd244e60 is not mapped to any PA in the hibernating kernel (image kernel):
          [ 0.111002] efi: mem24: [Runtime Code |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb385000-0x00000000bb3e5000) va=[0xfffffffefd585000-0xfffffffefd5e5000) (0MB)
          [ 0.125883] efi: mem25: [Runtime Data |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb3e5000-0x00000000bb445000) va=[0xfffffffefd3e5000-0xfffffffefd445000) (0MB)
          [ 0.140764] efi: mem29: [Boot Data | | | | | |WB|WT|WC|UC] pa=[0x00000000bb7ff000-0x00000000bb800000) va=[0xfffffffefd1ff000-0xfffffffefd200000) (0MB)
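The claim on slide 36 can be checked against the VA ranges in the logs. The Python sketch below does the range lookup. Two assumptions apply: the address printed on the slide has only 15 hex digits, so it is taken here to be 0xfffffffefd244e60 (one "f" restored), and only the regions quoted on the slide are considered. The helper name is hypothetical.

```python
def region_containing(regions, va):
    """Return the name of the half-open [start, end) region holding va."""
    for name, start, end in regions:
        if start <= va < end:
            return name
    return None


# VA ranges copied from the efi: log lines on the slide.
RESUME_BOOT = [
    ("mem26 Runtime Data", 0xFFFFFFFEFD1E5000, 0xFFFFFFFEFD245000),
]
BOOT = [
    ("mem24 Runtime Code", 0xFFFFFFFEFD585000, 0xFFFFFFFEFD5E5000),
    ("mem25 Runtime Data", 0xFFFFFFFEFD3E5000, 0xFFFFFFFEFD445000),
    ("mem29 Boot Data", 0xFFFFFFFEFD1FF000, 0xFFFFFFFEFD200000),
]

# Assumption: the slide's address is missing one hex digit.
VA = 0xFFFFFFFEFD244E60

print(region_containing(RESUME_BOOT, VA))
print(region_containing(BOOT, VA))
```

Under that reading, the address falls inside Runtime Data as mapped by the resume boot, but in none of the regions listed for the boot that created the image. The Runtime Data physical range is identical in both logs; only its virtual address differs.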
  • 37. Memory mapping of EFI runtime services (1)
      • Borislav Petkov [PATCH] EFI: Runtime services virtual mapping
          • d2f7cbe7, merged since the v3.14 kernel
          • We map the EFI regions needed for runtime services non-contiguously, with preserved alignment on virtual addresses starting from -4G down for a total max space of 64G.
      • Documentation/x86/x86_64/mm.txt
          • ->trampoline_pgd: We map EFI runtime services in the aforementioned PGD in the virtual range of 64Gb (arbitrarily set, can be raised if needed)
            0xffffffef00000000 - 0xffffffff00000000
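The two addresses quoted from mm.txt on slide 37 follow directly from "-4G down for 64G". A short Python check of that arithmetic is given here; the constant names are labels chosen for this sketch.

```python
GIB = 1 << 30

# "-4G" read as an unsigned 64-bit address: 4 GiB below the top of the space.
EFI_VA_START = (1 << 64) - 4 * GIB

# The window extends 64 GiB further down from there.
EFI_VA_END = EFI_VA_START - 64 * GIB

print(hex(EFI_VA_END), "-", hex(EFI_VA_START))
```

Mappings are handed out from the start address downwards, which is why the VAs in the efi: logs sit just below 0xffffffff00000000.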
  • 38. Memory mapping of EFI runtime services (2) [diagram: x86_64 virtual memory map of the runtime services in trampoline_pgd, from 0x0000000000000000 to 0xffffffffffffffff. Runtime Code (PA 0x00000000bb385000), Runtime Data (PA 0x00000000bb3e5000), Boot Code and Boot Data are mapped into the 64 G window between 0xffffffef00000000 and 0xffffffff00000000, 4 G below the top, and are also covered by the 1:1 mapping workaround. Source: arch/x86/platform/efi/efi_64.c::efi_map_region()]
  • 39. Memory mapping of EFI runtime services (3) [diagram: the -4G area, 64 G wide from 0xffffffef00000000 to 0xffffffff00000000, holding Runtime Code, Runtime Data, Boot Code and Boot Data regions, 2M-aligned. Source: arch/x86/platform/efi/efi_64.c::efi_map_region()]
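The "2M-aligned" placement explains why a region's VA depends on everything mapped before it. The Python sketch below is a simplified model of a top-down allocator that keeps each region's offset within a 2 MiB page equal to the offset of its physical address. It is written under the assumption that this is what efi_map_region() does for this step, and it is not the kernel code itself; next_efi_va is a hypothetical name.

```python
PMD_SIZE = 2 * 1024 * 1024                     # 2 MiB
PMD_MASK = ~(PMD_SIZE - 1) & ((1 << 64) - 1)   # clears the low 21 bits


def next_efi_va(efi_va, pa, size):
    """Return the VA for a region of `size` bytes at physical address `pa`.

    efi_va is the lowest VA handed out so far; allocation goes downwards.
    """
    efi_va -= size
    pa_offset = pa & (PMD_SIZE - 1)
    if pa_offset == 0:
        # PA is 2 MiB aligned: align the VA down as well.
        return efi_va & PMD_MASK
    prev_va = efi_va
    # Keep the same offset inside the 2 MiB page as the PA has.
    efi_va = (efi_va & PMD_MASK) + pa_offset
    if efi_va > prev_va:
        efi_va -= PMD_SIZE
    return efi_va


# Values from the boot log on slide 36: mem24 (Runtime Code) was placed at
# 0xfffffffefd585000; mem25 (Runtime Data) is 0x60000 bytes at PA 0xbb3e5000.
va = next_efi_va(0xFFFFFFFEFD585000, 0xBB3E5000, 0x60000)
print(hex(va))
```

Starting from the Runtime Code VA in the boot log, the model arrives at the Runtime Data VA that the same log reports. If an earlier region is mapped differently on the resume boot, the starting point moves and so does every VA after it.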
  • 40. Should fix runtime services address after S4
      • Lee, Chun-Yi [PATCH] x86_64/efi: Mapping Boot and Runtime EFI memory regions to different starting virtual address
      • The VA of EFI runtime services may change across hibernation, but that is fine as long as the PA does not change.
      • The EFI page table should be checked in more detail during hibernation resume.
  • 42. Hibernation's Challenges
      • KASLR (kernel address space layout randomization)
          • Mutually exclusive with hibernation
      • Intel Rapid Start
          • A replacement for kernel hibernation
          • May also conflict with KASLR
      • NVDIMM
          • No need for hibernation anymore
  • 44. SUSE is Hiring. Please search “SUSE Careers” and http://www.104.com.tw/

Editor's notes

  1. Theory Mathematics