6. 6
Hibernation (suspend to disk)
• Create snapshot image of runtime memory.
• Store snapshot image to swap partition or file.
• Restore snapshot image to memory.
10. 10
e820
• Wikipedia:
• e820 is shorthand to refer to the facility by which the
BIOS of x86-based computer systems reports the
memory map to the operating system or boot loader.
• It is accessed via the int 15h call, by setting the AX
register to value E820 in hexadecimal. It reports which
memory address ranges are usable and which are
reserved for use by the BIOS.
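On Linux the reported map shows up in the kernel log as "BIOS-e820:" lines. A minimal sketch for parsing them follows. The line format is assumed from recent kernels, the sample lines are made up for illustration, and `parse_e820` is a hypothetical helper name.

```python
import re

# Assumed log format (recent kernels):
#   BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
E820_LINE = re.compile(
    r"BIOS-e820: \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\] (.+)$"
)


def parse_e820(log_text):
    """Return a list of (start, end_inclusive, type) tuples."""
    entries = []
    for line in log_text.splitlines():
        match = E820_LINE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3).strip()))
    return entries


# Illustrative sample only, not taken from a real machine.
sample = """\
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bb384fff] usable
"""

entries = parse_e820(sample)
for start, end, kind in entries:
    print(f"{start:#018x}-{end:#018x} {kind}")
```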
24. 24
Checking e820 shift:
• Lee, Chun-Yi [PATCH] PM / hibernate: avoid unsafe pages
in e820 reserved regions:
• 84c91b7ae commit in v3.17-rc1
• https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=84c91b7
• Reverted by f82daee49 commit in v4.0
• Waiting for Yinghai Lu, [PATCH] x86: Kill E820_RESERVED_KERN
• Lee, Chun-Yi [PATCH] Hibernate: save e820 table to
snapshot header for comparison
• https://lkml.org/lkml/2014/8/11/166
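The idea of keeping the e820 table in the snapshot header and comparing it at resume can be sketched as below. This is only an illustration of the comparison step, not the patch's code. The tuple layout, the sample maps and the name `e820_diff` are assumptions.

```python
def e820_diff(saved, current):
    """Return (removed, added): entries found in only one of the two maps."""
    saved_set, current_set = set(saved), set(current)
    removed = sorted(saved_set - current_set)
    added = sorted(current_set - saved_set)
    return removed, added


# Hypothetical maps as (start, size, type) tuples; not from a real machine.
saved = [
    (0x0, 0x9FC00, "usable"),
    (0x100000, 0xBB285000, "usable"),
]
current = [
    (0x0, 0x9FC00, "usable"),
    (0x100000, 0xBB284000, "usable"),   # usable range shrank by one page
    (0xBB384000, 0x1000, "reserved"),   # and a new reserved page appeared
]

removed, added = e820_diff(saved, current)
# Resuming is only considered safe here when the two maps are identical.
safe_to_resume = not removed and not added
print("safe to resume:", safe_to_resume)
```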
25. 25
Platform vs. Shutdown (1)
• Different modes of hibernation:
• cat /sys/power/disk
[platform] shutdown reboot suspend
• Platform mode depends on _S4 support by BIOS:
[ 1.080004] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [_S4_]
(20130725/hwxface-571)
• ACPI spec 6.0:
• Table 7-234 BIOS-Supplied Control Methods for System-Level Functions
• _S4: Package that defines system _S4 state mode.
• 16.3.2 BIOS Initialization of Memory (since ACPI v1.0):
• Note: The memory information returned from the system address map
reporting interfaces should be the same before and after an S4 sleep.
OSPM will invoke E820 interfaces on IA-PC-based legacy systems or the
GetMemoryMap() interface on UEFI-enabled systems
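The /sys/power/disk output shown above marks the selected mode with square brackets. A small sketch of reading that format follows. `parse_power_disk` is a hypothetical helper, and the string is the one from the slide rather than a live read of sysfs.

```python
def parse_power_disk(text):
    """Split the /sys/power/disk content into (selected, available modes)."""
    selected = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]      # strip the brackets around the active mode
            selected = token
        available.append(token)
    return selected, available


selected, available = parse_power_disk("[platform] shutdown reboot suspend")
print(selected, available)
```

On a real system the text would come from reading /sys/power/disk; writing a mode name to the same file selects it.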
26. 26
Platform vs. Shutdown (2)
• Documentation/power/swsusp.txt in kernel
• Q: What is the difference between "platform" and "shutdown"?
• A: "platform" is actually right thing to do where supported, but
"shutdown" is most reliable (except on ACPI systems).
• Linux Kernel bug #77571:
• https://bugzilla.kernel.org/show_bug.cgi?id=77571
• The same page fault occurs when writing the snapshot image to the page buffer.
• The bug reporter was using “shutdown”, not “platform”.
After switching to “platform”, the reporter could no longer reproduce the issue.
• Using “platform” is preferable when the BIOS supports _S4.
Users should be aware that “shutdown” carries a risk.
32. 32
setup_data and E820_RESERVED_KERN
• setup_data: a linked list that carries data along with boot_params
to later boot stages.
• Allocated in EFI stub, reserved via memblock and e820.
• Yinghai Lu [PATCH] x86, boot: clean up setup_data
handling
• https://lkml.org/lkml/2015/2/28/272
• SETUP_E820_EXT, SETUP_EFI, SETUP_DTB,
SETUP_PCI, SETUP_KASLR
• These setup_data chunks are not page-aligned when
allocated. That leaves holes between e820 entries, which the
kernel registers as 1-page nosave regions. <== random
address per boot!
33. 33
Misidentification of nosave region (2)
• arch/x86/kernel/e820.c
Registers the hole between two
e820 regions as a
1-page nosave region
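How an unaligned setup_data chunk turns into a 1-page nosave region can be shown with a simplified Python model. It is loosely based on the nosave marking loop in arch/x86/kernel/e820.c from around v4.0, so treat the details as assumptions. The e820 maps below are invented for illustration.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

E820_RAM = 1
E820_RESERVED = 2
E820_RESERVED_KERN = 128


def pfn_up(addr):
    """Page frame number of addr, rounded up to the next page boundary."""
    return (addr + PAGE_SIZE - 1) >> PAGE_SHIFT


def pfn_down(addr):
    """Page frame number of addr, rounded down."""
    return addr >> PAGE_SHIFT


def mark_nosave_regions(e820, limit_pfn):
    """Return the (start_pfn, end_pfn) ranges that would be marked nosave."""
    nosave = []
    pfn = 0
    for addr, size, kind in e820:
        # A gap between the previous entry's end and this entry's start
        if pfn < pfn_up(addr):
            nosave.append((pfn, pfn_up(addr)))
        pfn = pfn_down(addr + size)
        # Anything that is not RAM or kernel-reserved RAM is nosave as well
        if kind not in (E820_RAM, E820_RESERVED_KERN):
            nosave.append((pfn_up(addr), pfn))
        if pfn >= limit_pfn:
            break
    return nosave


# Hypothetical map: a 0x80-byte setup_data chunk at an unaligned address.
unaligned = [
    (0x0, 0x100010, E820_RAM),
    (0x100010, 0x80, E820_RESERVED_KERN),
    (0x100090, 0x200000 - 0x100090, E820_RAM),
]
# Same layout with the chunk padded to a full, aligned page.
aligned = [
    (0x0, 0x100000, E820_RAM),
    (0x100000, 0x1000, E820_RESERVED_KERN),
    (0x101000, 0xFF000, E820_RAM),
]

unaligned_nosave = mark_nosave_regions(unaligned, 0x200)
aligned_nosave = mark_nosave_regions(aligned, 0x200)
print([(hex(s), hex(e)) for s, e in unaligned_nosave])
print(aligned_nosave)
```

In the unaligned case, rounding the entry boundaries to page frames makes page 0x100 look like a hole, so it is reported as a 1-page nosave region even though most of it is ordinary RAM. The aligned case produces no nosave region.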
34. 34
Kill E820_RESERVED_KERN
• Yinghai Lu [PATCH] x86: Kill E820_RESERVED_KERN
• https://lkml.org/lkml/2015/2/28/274
• Cleans up the setup_data handling and removes E820_RESERVED_KERN from the
e820 regions, because setup_data is already protected by memblock.
• Avoids wasting memory and fixes the page alignment problem in e820.
• Linux Kernel bug #96111 Unreliable hibernation on Lenovo X230
• https://bugzilla.kernel.org/show_bug.cgi?id=96111
• 84c91b7ae commit in v3.17-rc1
Reverted by f82daee49 commit in v4.0
• Chen, Yu C [RFC PATCH] PM / hibernate: make sure each resuming
page is in current memory zones
• Waiting for Yinghai Lu's patch to kill E820_RESERVED_KERN
36. 36
EFI runtime services broken after S4 (2)
• Resume boot:
VA 0xfffffffefd244e60 lies in the Runtime Data region mapped during the resume boot:
[ 0.125865] efi: mem26: [Runtime Data |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb3e5000-0x00000000bb445000) va=[0xfffffffefd1e5000-0xfffffffefd245000) (0MB)
• Boot:
VA 0xfffffffefd244e60 is not mapped to any PA in the hibernated kernel (image kernel):
[ 0.111002] efi: mem24: [Runtime Code |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb385000-0x00000000bb3e5000) va=[0xfffffffefd585000-0xfffffffefd5e5000) (0MB)
[ 0.125883] efi: mem25: [Runtime Data |RUN| | | | |WB|WT|WC|UC] pa=[0x00000000bb3e5000-0x00000000bb445000) va=[0xfffffffefd3e5000-0xfffffffefd445000) (0MB)
[ 0.140764] efi: mem29: [Boot Data | | | | | |WB|WT|WC|UC] pa=[0x00000000bb7ff000-0x00000000bb800000) va=[0xfffffffefd1ff000-0xfffffffefd200000) (0MB)
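The two log excerpts can be checked mechanically. The sketch below copies the ranges from the logs and looks up the faulting VA in each map. The VA is taken in its 16-digit form 0xfffffffefd244e60, which is an assumption about the intended address, and `lookup` is a hypothetical helper.

```python
# (name, pa_start, va_start, va_end) with half-open VA ranges, from the logs.
RESUME_BOOT = [
    ("mem26 Runtime Data", 0xBB3E5000, 0xFFFFFFFEFD1E5000, 0xFFFFFFFEFD245000),
]
BOOT = [
    ("mem24 Runtime Code", 0xBB385000, 0xFFFFFFFEFD585000, 0xFFFFFFFEFD5E5000),
    ("mem25 Runtime Data", 0xBB3E5000, 0xFFFFFFFEFD3E5000, 0xFFFFFFFEFD445000),
    ("mem29 Boot Data", 0xBB7FF000, 0xFFFFFFFEFD1FF000, 0xFFFFFFFEFD200000),
]


def lookup(regions, va):
    """Return (region name, physical address) for va, or None if unmapped."""
    for name, pa_start, va_start, va_end in regions:
        if va_start <= va < va_end:
            return name, pa_start + (va - va_start)
    return None


VA = 0xFFFFFFFEFD244E60

resume_hit = lookup(RESUME_BOOT, VA)
boot_hit = lookup(BOOT, VA)
print("resume boot:", resume_hit and (resume_hit[0], hex(resume_hit[1])))
print("image kernel:", boot_hit)
```

The address resolves inside Runtime Data in the resume boot map but falls in none of the listed regions of the image kernel, which matches the page fault described above.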
37. 37
Memory mapping of EFI runtime services (1)
• Borislav Petkov [PATCH] EFI: Runtime services virtual mapping
• Commit d2f7cbe7, merged in the v3.14 kernel
• We map the EFI regions needed for runtime services non-
contiguously, with preserved alignment on virtual addresses
starting from -4G down for a total max space of 64G.
• Documentation/x86/x86_64/mm.txt
->trampoline_pgd:
We map EFI runtime services in the aforementioned PGD in the
virtual range of 64Gb (arbitrarily set, can be raised if needed)
0xffffffef00000000 - 0xffffffff00000000
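The two boundary addresses follow directly from "starting from -4G down for a total max space of 64G". A quick arithmetic check, where the variable names are illustrative and not kernel identifiers:

```python
GIB = 1 << 30
ADDRESS_SPACE = 1 << 64          # 64-bit virtual address space

# "-4G" as an unsigned 64-bit address: 4 GiB below the top
window_top = ADDRESS_SPACE - 4 * GIB
# The window extends 64 GiB further down from there
window_bottom = window_top - 64 * GIB

print(hex(window_bottom), "-", hex(window_top))
```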
38. 38
Memory mapping of EFI runtime services (2)
• Virtual memory map (x86_64) of the runtime services in trampoline_pgd
[Figure: address space from 0x0000000000000000 to 0xffffffffffffffff. Runtime Code (PA 0x00000000bb385000), Runtime Data (PA 0x00000000bb3e5000), Boot Code and Boot Data are shown with 1:1 mappings (workaround) and in the 64 G window between 0xffffffef00000000 and 0xffffffff00000000, 4 G below the top. Source: arch/x86/platform/efi/efi_64.c::efi_map_region()]
39. 39
Memory mapping of EFI runtime services (3)
• In the -4G area:
[Figure: the 64 G window from 0xffffffff00000000 down to 0xffffffef00000000, holding the Runtime Code, Runtime Data, Boot Code and Boot Data regions, 2M-aligned. Source: arch/x86/platform/efi/efi_64.c::efi_map_region()]
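The top-down, 2M-aligned placement can be modelled in a few lines. This is a simplified sketch of how efi_map_region() appears to pick the next virtual address, keeping each region's offset within its 2M page. The function name `next_efi_va` and the exact steps are assumptions, not a copy of the kernel code.

```python
PMD_SIZE = 2 << 20                                   # 2 MiB
PMD_MASK = ~(PMD_SIZE - 1) & 0xFFFFFFFFFFFFFFFF      # 64-bit mask


def next_efi_va(efi_va, pa, size):
    """Return the VA for a region of `size` bytes at `pa`, allocated below efi_va."""
    efi_va -= size
    pa_offset = pa & (PMD_SIZE - 1)
    if pa_offset == 0:
        # 2M-aligned physical address: align the VA down to 2M as well
        return efi_va & PMD_MASK
    prev_va = efi_va
    # Keep the same offset inside the 2M page as the physical address has
    efi_va = (efi_va & PMD_MASK) + pa_offset
    if efi_va > prev_va:
        efi_va -= PMD_SIZE
    return efi_va


# Values from the Boot log: mem24 ends up at 0xfffffffefd585000, and the
# next region is mem25 (Runtime Data) at PA 0xbb3e5000, 0x60000 bytes long.
mem25_va = next_efi_va(0xFFFFFFFEFD585000, 0xBB3E5000, 0x60000)
print(hex(mem25_va))
```

Under this model the computed address equals the mem25 VA in the Boot log, 0xfffffffefd3e5000. It also shows why the VA depends on which regions were mapped before it, so a different mapping order on the resume boot gives a different VA.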
40. 40
Should fix runtime services address after S4
• Lee, Chun-Yi [PATCH] x86_64/efi: Mapping Boot and
Runtime EFI memory regions to different starting virtual
address
• The VA of EFI runtime services may change
across hibernation, but that is fine as long as the PA does not
change.
• The EFI page table should be checked in more detail during
hibernation resume.
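A check of that condition might look like the sketch below. It compares runtime regions from two boots keyed by physical range and reports whether only the VA moved. The Runtime Data values come from the logs on the earlier slide, and `compare_runtime_regions` is a hypothetical helper.

```python
def compare_runtime_regions(before, after):
    """before/after map (pa_start, pa_end) -> va_start; return a status per range."""
    report = {}
    for pa_range, va in before.items():
        if pa_range not in after:
            report[pa_range] = "pa missing"      # physical layout changed
        elif after[pa_range] != va:
            report[pa_range] = "va changed"      # tolerable: PA is the same
        else:
            report[pa_range] = "unchanged"
    return report


# Runtime Data region as logged by the image kernel and by the resume boot.
image_kernel = {(0xBB3E5000, 0xBB445000): 0xFFFFFFFEFD3E5000}
resume_boot = {(0xBB3E5000, 0xBB445000): 0xFFFFFFFEFD1E5000}

report = compare_runtime_regions(image_kernel, resume_boot)
for (pa_start, pa_end), status in report.items():
    print(f"pa=[{pa_start:#x}-{pa_end:#x}) {status}")
```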
42. 42
Hibernation's Challenges
• KASLR (Kernel address space layout randomization)
• Mutually exclusive with hibernation
• Intel Rapid Start
• A replacement for kernel hibernation
• May also conflict with KASLR
• NVDIMM
• Hibernation is no longer needed