SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
HKG18-TR10: Best Practices For
Getting Devices
Into LAVA
Steve McIntyre
Agenda
● Introduction
● LAVA
● CI
● Device Requirements for Automation
● Recommendations
Introduction
Over the seven years of automation in Linaro, the LAVA team have identified a
series of automated testing failures which can be traced to decisions made during
hardware design or firmware development.
Here is a summary of our experience. The aim is to provide background
information about why common failures occur, and recommendations on how to
design hardware and firmware to reduce problems in the future.
We describe some device design features as hard requirements to enable
successful automation, and some which are guaranteed to block automation.
History of LAVA
Linaro created the LAVA project in 2010
to automate testing of software using
real hardware
Linaro
Automated
Validation
Architecture
Hardest part of the job has always been
integrating new device types
Issues relate to hardware design and
firmware implementations
● Millions of test jobs
● 100+ very different types of
devices
○ ARM
○ x86
○ emulated
● Varied primary boot methods
○ U-Boot
○ UEFI
○ Fastboot
○ IoT
○ PXE
● 150+ devices in the Linaro lab,
covering more than 40 different
device types
● MultiNode and VLAN support
The CI process
A developer makes a change; that
change triggers a build; that build
triggers a test; that test reports back to
the developer how that change worked.
When looking for failures, beware:
● False negatives: not enough of the
software is fully tested, or if the
testing is not rigorous enough to
spot all problems.
● False positives: the test itself is
unreliable because of the test
software, the test infrastructure or
the test hardware.
Goals for LAVA
LAVA is one part of the CI loop - it provides device automation for testing. To work
well, we have to consider:
● Reliability in the CI loop
○ Reliable software
○ Reliable infrastructure
○ Reliable test devices
● Scalability of the CI loop
○ Lots of devices
○ Lots of tests
○ In parallel
● Cost
○ Minimise expensive hardware
○ Minimise expensive admin intervention
Device Requirements for Automation
● Reliability
● Uniqueness
● Scalability
● Deployment
Reliability
● Test jobs need to run reliably
○ Intermittent faults easily consume months of engineering time
○ False positives reduce engineer trust
● Infrastructure failures need to be identified correctly
○ Distinguish between infrastructure and test failures
● Infrastructure failures need to be minimised
○ 3 or 4 test job failures per week is too many
○ Devices may each run hundreds of test jobs in sequence
○ Do not let failures cascade
● All devices need to reach the minimum standard of reliability
○ It is not possible to automate all devices
○ It is not always possible to test all use cases of supported devices
○ Poor thermal control and stability issues under load will lower reliability
● Continuous Integration strategy remains the same for all devices
○ The problem is the same, regardless of underlying considerations
○ If the CI loop is not reliable, the work is pointless
Uniqueness
● Every unit must expose a distinct identifier across all exported interfaces
unique to the DUT - can be set later by admins OTP or write protected
○ Serial, fastboot, MAC, USB mass storage
● Identifiers must be independent of the software deployed on the device
○ Separate method of modifying identifiers from normal deployment / flashing tools
● Identifiers must be stable across multiple reboots and test jobs
○ Identifiers generated randomly at boot time are never suitable
● USB serial connections
○ If multiple UARTs exposed, all identifiers must be programmed to be unique
● MAC
○ Should be unique already, if modifiable then only by admins
What works on the developer’s desk is not sufficient when connected in a rack with
other similar devices.
Scalability
● Homogenous test platform using heterogeneous devices
○ Scalable infrastructure
● Do not complicate things
○ Try to avoid extra customised hardware
○ Relays, hardware modifications and mezzanine boards all increase complexity
○ More complexity raises failure risk nonlinearly
● Avoid unreliable protocols and connections
○ e.g. WiFi is nota reliable deployment method, especially inside a large lab with lots of
competing signals and devices
● Not demanding enterprise or server grade support
○ But automation cannot scale with unreliable components
○ Server support typically includes automation requirements as a subset: RAS, performance,
efficiency, scalability, reliability, connectivity and uniqueness
○ Automation racks have similar requirements to data centres - must work reliably at scale
Deployment
● Must provide a reliable and automated method to deploy, boot and test new
software
○ e.g. automatic deployment over the network using TFTP
○ Manually writing software to SD is not suitable for automation
● Must always be able to interrupt a device at a level lower than the software
being tested
○ e.g. to test U-Boot, must be able to deploy U-Boot without relying on a currently working
U-Boot.
○ Consider how the device can be recovered automatically
● Automation typically requires controlling the device over serial
○ Not via a touchscreen or other human interface device
○ Not possible to drive firmware and kernel through a graphical interface
○ Parsing graphical menus is difficult and likely to be unreliable
● May not be able to automate lower levels - chain load?
● Consider how deployment tools interact
Recommendations
● Some ideas to consider during device design:
○ Hardware
○ Firmware
○ Overall
● Recommendations, not hard and fast rules!
○ Every device is different
○ Not every device is destined to be automated
○ If you want to automate things, there are some obvious ideas
○ And there are some that only come after hard-won experience
● Production hardware is often unsuitable for automation
○ Typically lacks connectivity or includes features which need workarounds
● Testing will typically work differently to expected customer usage
○ Will stress different interfaces and methods
○ May break all your assumptions
Hardware Design for Automation
● Keep It Simple...
● Serial Ports
○ Either real serial ports connecting to a serial console server, or good USB Serial
● Remote power control
○ Non-removable batteries are problematic
○ Immediate boot when power is applied, and shut down cleanly when removed
● Standard interfaces and form factors
○ Physical connections like RJ45, USB
○ Electrical specifications like power supply and serial voltage
● Onboard Networking
○ Physical ethernet with unique, stable MAC
● Storage reliability
○ Hundreds of test jobs per device may reduce the useful life of the device
● Protect essential components
● Don't over-optimise for board cost!
Firmware Design for Automation
● Adopt existing solutions where these exist
○ Makes it easier to work with the device
○ Do not write your own bootloader
● Make your firmware robust against mistakes
○ Do not depend on settings or data written by users
○ Make it easy to recover
● Serial support
● Ensure connectivity is available in the firmware
○ Firmware and bootloader drivers for networking and storage
● Consider chain loading
● Unique and stable identifiers for all exposed interfaces
○ e.g. stable MAC address
● Unique and stable meanings for settings
○ For both hardware and software
Overall Design for Automation
● Design for Testing
● Consider reliability from the beginning
● Avoid NIH - try to avoid custom tools
○ Push support for your devices upstream into existing tools
● BMC support?
○ Has a cost, but can make a huge difference for some types of device
● Consider how users are going to use your hardware
○ Might be very different than your expected use cases!
More information about LAVA
https://validation.linaro.org/static/docs/v2/
Thank You
#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

Weitere ähnliche Inhalte

Mehr von Linaro

Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramLinaro
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNLinaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...Linaro
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...Linaro
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionLinaro
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersLinaro
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightLinaro
 

Mehr von Linaro (20)

Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 Servers
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
 

Kürzlich hochgeladen

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Kürzlich hochgeladen (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

HKG18-TR10 - Best practices for getting devices in LAVA

  • 1. HKG18-TR10: Best Practices For Getting Devices Into LAVA Steve McIntyre
  • 2. Agenda ● Introduction ● LAVA ● CI ● Device Requirements for Automation ● Recommendations
  • 3. Introduction Over the seven years of automation in Linaro, the LAVA team have identified a series of automated testing failures which can be traced to decisions made during hardware design or firmware development. Here is a summary of our experience. The aim is to provide background information about why common failures occur, and recommendations on how to design hardware and firmware to reduce problems in the future. We describe some device design features as hard requirements to enable successful automation, and some which are guaranteed to block automation.
  • 4. History of LAVA Linaro created the LAVA project in 2010 to automate testing of software using real hardware Linaro Automated Validation Architecture Hardest part of the job has always been integrating new device types Issues relate to hardware design and firmware implementations ● Millions of test jobs ● 100+ very different types of devices ○ ARM ○ x86 ○ emulated ● Varied primary boot methods ○ U-Boot ○ UEFI ○ Fastboot ○ IoT ○ PXE ● 150+ devices in the Linaro lab, covering more than 40 different device types ● MultiNode and VLAN support
  • 5. The CI process A developer makes a change; that change triggers a build; that build triggers a test; that test reports back to the developer how that change worked. When looking for failures, beware: ● False negatives: not enough of the software is fully tested, or if the testing is not rigorous enough to spot all problems. ● False positives: the test itself is unreliable because of the test software, the test infrastructure or the test hardware.
  • 6. Goals for LAVA LAVA is one part of the CI loop - it provides device automation for testing. To work well, we have to consider: ● Reliability in the CI loop ○ Reliable software ○ Reliable infrastructure ○ Reliable test devices ● Scalability of the CI loop ○ Lots of devices ○ Lots of tests ○ In parallel ● Cost ○ Minimise expensive hardware ○ Minimise expensive admin intervention
  • 7. Device Requirements for Automation ● Reliability ● Uniqueness ● Scalability ● Deployment
  • 8. Reliability ● Test jobs need to run reliably ○ Intermittent faults easily consume months of engineering time ○ False positives reduce engineer trust ● Infrastructure failures need to be identified correctly ○ Distinguish between infrastructure and test failures ● Infrastructure failures need to be minimised ○ 3 or 4 test job failures per week is too many ○ Devices may each run hundreds of test jobs in sequence ○ Do not let failures cascade ● All devices need to reach the minimum standard of reliability ○ It is not possible to automate all devices ○ It is not always possible to test all use cases of supported devices ○ Poor thermal control and stability issues under load will lower reliability ● Continuous Integration strategy remains the same for all devices ○ The problem is the same, regardless of underlying considerations ○ If the CI loop is not reliable, the work is pointless
  • 9. Uniqueness ● Every unit must expose a distinct identifier across all exported interfaces unique to the DUT - can be set later by admins OTP or write protected ○ Serial, fastboot, MAC, USB mass storage ● Identifiers must be independent of the software deployed on the device ○ Separate method of modifying identifiers from normal deployment / flashing tools ● Identifiers must be stable across multiple reboots and test jobs ○ Identifiers generated randomly at boot time are never suitable ● USB serial connections ○ If multiple UARTs exposed, all identifiers must be programmed to be unique ● MAC ○ Should be unique already, if modifiable then only by admins What works on the developer’s desk is not sufficient when connected in a rack with other similar devices.
  • 10. Scalability ● Homogenous test platform using heterogeneous devices ○ Scalable infrastructure ● Do not complicate things ○ Try to avoid extra customised hardware ○ Relays, hardware modifications and mezzanine boards all increase complexity ○ More complexity raises failure risk nonlinearly ● Avoid unreliable protocols and connections ○ e.g. WiFi is nota reliable deployment method, especially inside a large lab with lots of competing signals and devices ● Not demanding enterprise or server grade support ○ But automation cannot scale with unreliable components ○ Server support typically includes automation requirements as a subset: RAS, performance, efficiency, scalability, reliability, connectivity and uniqueness ○ Automation racks have similar requirements to data centres - must work reliably at scale
  • 11. Deployment ● Must provide a reliable and automated method to deploy, boot and test new software ○ e.g. automatic deployment over the network using TFTP ○ Manually writing software to SD is not suitable for automation ● Must always be able to interrupt a device at a level lower than the software being tested ○ e.g. to test U-Boot, must be able to deploy U-Boot without relying on a currently working U-Boot. ○ Consider how the device can be recovered automatically ● Automation typically requires controlling the device over serial ○ Not via a touchscreen or other human interface device ○ Not possible to drive firmware and kernel through a graphical interface ○ Parsing graphical menus is difficult and likely to be unreliable ● May not be able to automate lower levels - chain load? ● Consider how deployment tools interact
  • 12. Recommendations ● Some ideas to consider during device design: ○ Hardware ○ Firmware ○ Overall ● Recommendations, not hard and fast rules! ○ Every device is different ○ Not every device is destined to be automated ○ If you want to automate things, there are some obvious ideas ○ And there are some that only come after hard-won experience ● Production hardware is often unsuitable for automation ○ Typically lacks connectivity or includes features which need workarounds ● Testing will typically work differently to expected customer usage ○ Will stress different interfaces and methods ○ May break all your assumptions
  • 13. Hardware Design for Automation ● Keep It Simple... ● Serial Ports ○ Either real serial ports connecting to a serial console server, or good USB Serial ● Remote power control ○ Non-removable batteries are problematic ○ Immediate boot when power is applied, and shut down cleanly when removed ● Standard interfaces and form factors ○ Physical connections like RJ45, USB ○ Electrical specifications like power supply and serial voltage ● Onboard Networking ○ Physical ethernet with unique, stable MAC ● Storage reliability ○ Hundreds of test jobs per device may reduce the useful life of the device ● Protect essential components ● Don't over-optimise for board cost!
  • 14. Firmware Design for Automation ● Adopt existing solutions where these exist ○ Makes it easier to work with the device ○ Do not write your own bootloader ● Make your firmware robust against mistakes ○ Do not depend on settings or data written by users ○ Make it easy to recover ● Serial support ● Ensure connectivity is available in the firmware ○ Firmware and bootloader drivers for networking and storage ● Consider chain loading ● Unique and stable identifiers for all exposed interfaces ○ e.g. stable MAC address ● Unique and stable meanings for settings ○ For both hardware and software
  • 15. Overall Design for Automation ● Design for Testing ● Consider reliability from the beginning ● Avoid NIH - try to avoid custom tools ○ Push support for your devices upstream into existing tools ● BMC support? ○ Has a cost, but can make a huge difference for some types of device ● Consider how users are going to use your hardware ○ Might be very different than your expected use cases!
  • 16. More information about LAVA https://validation.linaro.org/static/docs/v2/
  • 17. Thank You #HKG18 HKG18 keynotes and videos on: connect.linaro.org For further information: www.linaro.org