HKG15-111: LAVA Dispatcher Refactoring
---------------------------------------------------
Speaker: Neil Williams, Rémi Duraffort
Date: February 9, 2015
---------------------------------------------------
★ Session Summary ★
The beloved LAVA dispatcher is currently undergoing a transformation to become a lean, mean, use-case-supporting machine. Whilst development is not yet complete, the LAVA team would like to give a status update on what has been completed, what is in progress and what is next. Feel free to join the team for a discussion after a brief presentation and help us define the future of the LAVA dispatcher!
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250772
Video: https://www.youtube.com/watch?v=KOpVhAuHvfQ
Etherpad: http://pad.linaro.org/p/hkg15-111
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
2. Benefits of the new dispatcher
● Error identification & cleanup
● Retry, diagnostic and repeat actions
● Pipeline of idempotent actions
● All data returned in the result bundle
● Clearer log structure
● Faster, smaller & more modular
● Wider range of support and capabilities
3. Scope of the refactoring
Changing
● Ground-up rewrite
○ new actions
○ new strategies
● Job definitions
● Result bundle format
Keeping
● Test definition compatibility
● Current use cases
● Packaging
Not planned
● automatic migration from JSON to YAML
Removed
● Binary formats (hwpack)
● old conventions
○ linaro-* actions
4. New mechanisms of deployment
● Choosing media
○ USB media
○ SATA
○ NFS
○ ramdisk
● Kexec
● Third party images
● Installer testing
● GRUB & UEFI support (testing subject to hardware availability)
● SD carries LAVA bootloader
6. Comparison of old and new
Old
● JSON input
● hwpacks
● master image on SD
● test rootfs on SD
● no alternative media
● linaro-media-create
● sync
● rip up images using tar
New
● YAML input
● 3rd party images
● NFS masters
● Gold standard images
● tighter I/O control
● new media support
● full control within tests
● re-use deployments
● overlays + extensions
7. Comparison of old and new - 2
Old
● Persistent SQL connection
● Worker database (unused)
● Salt config
● Slave scheduler per worker
● Reduced bundle data
● UUID URLs
New
● XML-RPC
● YAML files
● Data per job
● Single master scheduler
● Complete pipeline data
● ID based on job+pipeline
● Fail early & diagnose
● Easier log viewing
● Expanded unit tests
8. New design principles
● Pipeline of idempotent actions
[Diagram: Deploy → Boot → Test → Test → Gather]
○ Deploy: Download rootfs..., Extract, Insert test data
○ Boot: Connect to serial, Connect to U-Boot, Wait for shell prompt, Auto login
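The pipeline of idempotent actions on this slide can be sketched in Python as an ordered list of small steps, where running a step a second time leaves the same state as running it once. This is a minimal sketch under stated assumptions: the `Action`, `Pipeline` and `set_key` names and the shared dict context are hypothetical and are not the LAVA dispatcher's own classes. Only the Deploy and Boot steps are modelled; Test and Gather are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: actions share state through a plain dict. Every name below is
# hypothetical and illustrative; none of it is the LAVA dispatcher API.
Context = Dict[str, object]


@dataclass
class Action:
    """A single pipeline step. `run` should be idempotent: applying it
    twice leaves the context in the same state as applying it once."""
    name: str
    run: Callable[[Context], None]


@dataclass
class Pipeline:
    actions: List[Action] = field(default_factory=list)

    def run(self, context: Context) -> List[str]:
        """Run every action in order; return the names of the steps run."""
        executed: List[str] = []
        for action in self.actions:
            action.run(context)
            executed.append(action.name)
        return executed


def set_key(key: str, value: object) -> Callable[[Context], None]:
    """Build a step that stores one value; plain assignment is idempotent."""
    def step(context: Context) -> None:
        context[key] = value
    return step


# Deploy and Boot steps from the slide, in order (values are placeholders).
deploy_and_boot = Pipeline([
    Action("download-rootfs", set_key("rootfs", "rootfs.tar.gz")),
    Action("extract", set_key("extracted", True)),
    Action("insert-test-data", set_key("overlay", "tests")),
    Action("connect-serial", set_key("serial", "connected")),
    Action("connect-u-boot", set_key("bootloader", "u-boot")),
    Action("wait-for-shell-prompt", set_key("prompt_seen", True)),
    Action("auto-login", set_key("logged_in", True)),
])

context: Context = {}
steps = deploy_and_boot.run(context)
snapshot = dict(context)
deploy_and_boot.run(context)  # running the whole pipeline again changes nothing
```

Because each step here only assigns values, the second run leaves `context` equal to `snapshot`. That property is what makes it safe to re-run a step, or a whole stage, after a failure.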
9. New design principles
[Diagram: Deploy → Boot → Test, with Boot and Test repeated 8 times]
○ Deploy (internal pipeline): Download rootfs... (retry 3 times, if error), Extract, Insert test data
● Actions may have internal pipelines
● Repeat single actions or blocks & diagnose on failure
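Retrying a single action ("retry 3 times, if error") and repeating a block of actions ("repeat 8 times") can both be expressed as wrappers around ordinary steps. A minimal sketch, assuming a step is any callable that takes a shared context dict and raises on error; `retry`, `repeat`, `StepFailed`, `flaky_download` and the other helpers are hypothetical names, not LAVA's API.

```python
from typing import Callable, Dict, List

# Assumption: a step is any callable that takes a shared context dict and
# raises an exception on error. All names are hypothetical, not LAVA's API.
Context = Dict[str, object]
Step = Callable[[Context], None]


class StepFailed(Exception):
    """Raised when a retried step still fails after every attempt."""


def retry(step: Step, attempts: int,
          diagnose: Callable[[Exception], str]) -> Step:
    """Wrap `step` so it is retried up to `attempts` times on error."""
    def wrapped(context: Context) -> None:
        diagnoses: List[str] = []
        for _ in range(attempts):
            try:
                step(context)
                return
            except Exception as exc:
                # Keep a diagnosis of each failure, then try again.
                diagnoses.append(diagnose(exc))
        raise StepFailed("; ".join(diagnoses))
    return wrapped


def repeat(block: List[Step], times: int) -> Step:
    """Wrap a block of steps so the whole block runs `times` times in order."""
    def wrapped(context: Context) -> None:
        for _ in range(times):
            for step in block:
                step(context)
    return wrapped


def flaky_download(failures: int) -> Step:
    """Hypothetical download step that raises `failures` times, then works."""
    calls = {"count": 0}

    def download(context: Context) -> None:
        calls["count"] += 1
        if calls["count"] <= failures:
            raise OSError("connection reset")
        context["rootfs"] = "rootfs.tar.gz"
    return download


def boot_once(context: Context) -> None:
    context["boots"] = int(context.get("boots", 0)) + 1


def run_tests(context: Context) -> None:
    context["test_runs"] = int(context.get("test_runs", 0)) + 1


# Deploy retries the download up to 3 times; Boot + Test repeat 8 times.
job: List[Step] = [
    retry(flaky_download(2), attempts=3, diagnose=lambda e: f"download: {e}"),
    repeat([boot_once, run_tests], times=8),
]

context: Context = {}
for step in job:
    step(context)
```

Keeping retry and repeat as wrappers means any action, or any block of actions, can gain that behaviour without changing the action itself, and a final failure carries the diagnosis gathered from every attempt.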
11. Next stages - dispatcher
● Multinode
○ port to pipeline with all new actions
● VM Groups
○ port to pipeline with all new actions
○ remove constraints on VM command line
● Android
○ port to pipeline with all new actions
○ Use complete images (no SELinux tar requirements)
○ Obtain gold standard images
○ Drop all lava-android test support on dispatchers - KVM only.
12. Next stages - server
● Device configuration in PostgreSQL
○ Templates to support overrides
○ No configuration files for devices - templates in code & SQL data
● Server integration
○ submission of pipeline results & storing new metadata
○ folding and profile support in log views
● Scheduler support
○ New mechanisms to select devices and strategies
○ New validation checks prior to submission
○ Single instance scheduler
● Lock down PostgreSQL access
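The slide does not say how device-type templates and per-device overrides are combined. One simple possibility is a recursive dictionary merge, where values stored per device replace matching keys in the template. This is a sketch under that assumption only: `apply_overrides`, `DEVICE_TYPE_TEMPLATE` and all keys and values below are hypothetical.

```python
import copy
from typing import Any, Dict

# Assumption: a device-type template and per-device overrides are nested
# dicts. Key names and values are hypothetical, for illustration only.
Config = Dict[str, Any]


def _merge_into(base: Config, overrides: Config) -> None:
    """Recursively copy `overrides` into `base`, merging nested dicts."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge_into(base[key], value)
        else:
            base[key] = value


def apply_overrides(template: Config, overrides: Config) -> Config:
    """Return a new config built from `template` with `overrides` applied.
    Both inputs are deep-copied, so neither is modified."""
    merged = copy.deepcopy(template)
    _merge_into(merged, copy.deepcopy(overrides))
    return merged


# Template for a device type, kept in code...
DEVICE_TYPE_TEMPLATE: Config = {
    "connection": {"serial_port": "/dev/ttyUSB0", "baud": 115200},
    "bootloader": {"type": "u-boot", "prompt": "U-Boot"},
}

# ...and a per-device override, as it might be stored as SQL data.
device_overrides: Config = {"connection": {"serial_port": "/dev/ttyUSB3"}}

device_config = apply_overrides(DEVICE_TYPE_TEMPLATE, device_overrides)
```

Only the overridden serial port changes; the baud rate and bootloader settings still come from the template, and the template itself is left untouched.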
14. When? (no dates but not soon)
● Developer access only
○ Need all the server side changes before allowing submissions
○ Details of migration are not defined at this stage
■ aim to retain old methods alongside new
■ all Multinode jobs must use either old or new
● Convert devices to support both
○ Test porting all configuration & test on staging
○ Multinode scheduling requires all devices use the same methods.
○ Current devices used in new ways with new constraints.
● Test to ensure compatibility in test definitions
○ No support for converting existing JSON to new YAML
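The rule that all Multinode jobs must use either the old or the new methods is the kind of check that could run before submission (slide 12 lists new validation checks prior to submission). A minimal sketch, assuming each sub-job carries a hypothetical `dispatcher` field set to "old" or "new"; the field, the exception and the function name are illustrative and are not LAVA's job schema.

```python
from typing import Dict, List

# Assumption: each Multinode sub-job records which dispatcher it targets,
# "old" or "new". Field names are hypothetical, not LAVA's job schema.
SubJob = Dict[str, str]


class MixedMultinodeGroup(ValueError):
    """Raised when a Multinode group mixes old and new dispatcher jobs."""


def check_multinode_group(group: List[SubJob]) -> str:
    """Return the dispatcher shared by every job in `group`; raise before
    submission if the group is empty or mixes old and new."""
    if not group:
        raise ValueError("Multinode group is empty")
    kinds = {job["dispatcher"] for job in group}
    if len(kinds) != 1:
        raise MixedMultinodeGroup(f"mixed dispatchers: {sorted(kinds)}")
    return kinds.pop()


same = [{"role": "server", "dispatcher": "new"},
        {"role": "client", "dispatcher": "new"}]
mixed = [{"role": "server", "dispatcher": "new"},
         {"role": "client", "dispatcher": "old"}]
```

Calling `check_multinode_group(same)` returns "new", while `check_multinode_group(mixed)` raises `MixedMultinodeGroup`, so a mixed group would be rejected before any device is reserved.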