Resource: LCE13
Name: LEG - RAS Daemon and ACPI
Date: 11-07-2013
Speaker: Robert Richter and Tomasz Nowicki
Video: https://www.youtube.com/watch?v=Ot2VBF-oerg&feature=c4-overview&list=UUIVqQKxCyQLJS6xvSmfndLA
1. Group photograph at Linaro Connect in Copenhagen
Monday 29 Oct 2012
LCE13, 10th July 2013
Robert Richter <robert.richter@linaro.org>
Tomasz Nowicki <tomasz.nowicki@linaro.org>
RAS Daemon and ACPI
2. www.linaro.org
Agenda
● RAS Daemon
○ problem statement and alternatives
○ perf-based solution, recall the arch
○ reference driver (non ACPI)
○ status, patchset
○ next steps
● APEI and ACPI error injection
○ how does it work?
■ error reporting mechanism
■ error injecting mechanism
○ status
○ NMI equivalent
3. www.linaro.org
RAS - Reliability, Availability and Serviceability
● Reliability: ensure correctness of data; detect, correct (if
possible) and isolate errors
● Availability: disable the malfunctioning component and
continue to operate with reduced resources
● Serviceability: early detect faulty hardware and reduce
time to replace it
4. www.linaro.org
Hardware Error Handling
Hardware error handling is important for higher levels of RAS:
● Error logging
● Error prediction
● Error recovery
We focus on hardware error logging and reporting:
Give data center operators a tool to examine hw errors in the
system for further analysis and actions (identify, disable and
replace hardware components)
5. www.linaro.org
RAS implementation - goals
● Provide a RAS framework in the kernel to collect hardware
errors from various sources and report them to userland for
further processing.
● Reference implementation of a RAS daemon to enable data
center operators to integrate it into their tools
● Use of perf_event_open syscall to access kernel event buffers
● Architecture independent, works on ARM and x86
(Collaborative work with Borislav Petkov from Suse)
● Upstream acceptance
7. www.linaro.org
RAS Components - Kernel
● Persistent perf events
○ running in the background and detached from any process
○ buffers are accessed readonly
○ shared buffers, multiple users for one event
○ can be enabled by the kernel during early boot, no userland
necessary to setup events
○ persistent pmu registering event dynamically in sysfs allows out-
of-the-box event setup with perf tools
● Tracepoints
○ Raw data sample, allows to transfer data structures from kernel to
userland
● Hardware drivers for error detection
○ Vendor specific
○ ACPI/APEI
8. www.linaro.org
RAS Components - Userland
● perf tools and libraries
○ perf syscall to access event buffers
○ Extend event parser for sysfs support of persistent events
○ Move necessary perf functions into liblk
● RAS daemon
○ Reading event buffers for reporting, analysis and actions
○ Report to syslogd
9. www.linaro.org
Persistent event patch set V2
● Posted in June, http://lkml.org/lkml/2013/6/11/394)
● Review by perf maintainers Ingo and PeterZ on lkml complete
● They generally agree with the approach
● A couple of items need to be addressed for upstream
acceptance, esp. to create persistent events on-the-fly:
○ ioctl functions to bind/unbind persistent events and processes
○ make all kind of events persistent, not only system-wide events
○ support all perf_event types
○ providing event information in the mmap header page?
○ see LEG-501
● If we can address all the issues, the patches are *likely* to go
upstream in the next merge window
10. www.linaro.org
Perf tools patches
● Patch set sent that modifies perf tools to use persistent
events (not yet upstream)
● Most important part: update event parser to be able to
describe every event in sysfs, esp. flags (currently limited
to config values of the event):
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
/sys/bus/event_source/devices/persistent/format/persistent:attr5:23
● Persistent events run then out-of-the-box:
# perf top -e persistent/mce_record/
# perf record -e perstent/mce_record/ ...
11. www.linaro.org
Raw data defined as tracepoint
TRACE_EVENT(mce_record,
TP_PROTO(struct mce *m),
TP_ARGS(m),
TP_STRUCT__entry(
__field( u64, mcgcap )
__field( u64, mcgstatus )
__field( u64, status )
__field( u64, addr )
__field( u64, misc )
__field( u64, ip )
__field( u64, tsc )
__field( u64, walltime )
__field( u32, cpu )
__field( u32, cpuid )
__field( u32, apicid )
__field( u32, socketid )
__field( u8, cs )
__field( u8, bank )
__field( u8, cpuvendor )
),
...
12. www.linaro.org
RAS Prototype - Implementation
● Tracepoints collected with persistent events, recorded with
perf-record and processed with perf-script
● Daemon basically doing:
# perf record -e persistent/mce_record/ cat /dev/null | perf script -s rasd.pl
● Perl script for event processing:
# cat rasd.pl
sub process_event
{
my ($event, $attr, $sample, $raw_data) = @_;
pintf("--- mce_record: ----------------n");
print("raw_data: ",
( join '',
map { sprintf("%02x ", $_); } unpack("C*", $event) ), "n");
# more decoding:
...
}
16. www.linaro.org
RAS - Next Steps
● Work on persistent event patch set V3
● Enable RAS framework on ARM
● Add memory controller hardware driver
● Integrate ACPI/APEI
● Split perf tool code into liblk
● Implement RAS daemon
17. www.linaro.org
APEI and ACPI error injection
See: https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/EINJ for more details
GHES reporting mechanism
18. www.linaro.org
APEI and ACPI error injection
See: https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/EINJ for more details
Error injection mechanism
19. www.linaro.org
● APEI status
○ HEST
■ GHES is supported and tested
■ need to implement:
● NMI error notify source type equivalent since NMI is specific for x86
architecture, may require SoC-specific code
● record errors to region specified in BERT table within NMI context
● forward errors from GHES to RAS daemon as end point of ACPI path
○ ERST - not tested
○ BERT
■ not implemented yet
■ initial implementation exists but has not accepted yet
○ EINJ - ACPI error injection
■ Lack of mapping GPIO line (as error trigger source) to the appropriate
AML method (GPIO-signaled ACPI events)
■ Hack driver to enable error injection from user space (AML method is
called directly without hardware interaction)
● Difficulties with testing ACPI
○ ACPI part of UEFI is not available for ARMv7/v8 now
APEI and ACPI error injection
20. www.linaro.org
APEI and ACPI error injection
NMI equivalent, candidates?
● FIQ - Fast Interrupt for ARMv7/ARMv8
○ route to Hypervisor mode but still masked on Hypervisor mode
○ route to Secure Monitor but still masked on Secure Monitor mode
● Non-maskable FIQ (NMFI)
○ available for some ARMv7 implementation
○ still can be masked by interrupt entry
● SError/Abort
○ can not be triggered by external pin like FIQ
● Others?