HAB Software Woes
John Graham-Cumming
September 2012

Or “My capsule didn’t crash but my software did”
Background
        > 30 years of
         programming
         experience

        One HAB flight
         ◦ GAGA-1
http://blog.jgc.org/2011/04/gaga-1-flight.html
https://github.com/jgrahamc/gaga
Where’s your flight’s
complexity?
   Example: GAGA-1
    ◦ One balloon, parachute, polystyrene box
    ◦ Many metres of cord attached with knots
    ◦ An off-the-shelf camera

    ◦ 2,836 lines of code
    ◦ Common to see defect rates of 2 to 4 per
      KLOC
    ◦ So GAGA-1 likely has 5 to 10 errors in it
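The estimate above is plain arithmetic. A minimal sketch, using only the figures quoted on the slide (the function name is made up for illustration):

```python
# Rough defect estimate for GAGA-1 using the figures from the slide.
LINES_OF_CODE = 2836
DEFECTS_PER_KLOC = (2, 4)  # commonly quoted range, per the slide


def expected_defects(lines, rate_per_kloc):
    """Expected number of defects for a given defect rate per 1,000 lines."""
    return lines / 1000 * rate_per_kloc


low, high = (expected_defects(LINES_OF_CODE, r) for r in DEFECTS_PER_KLOC)
print(f"Expected defects: {low:.1f} to {high:.1f}")
```

That works out to roughly 6 to 11, in line with the "5 to 10" on the slide.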
Real Stuff Seen on HAB
flights
 Complete computer crash
 Altitude going negative
 Latitude and longitude garbled
 Cutdown triggered in back of car
 Long periods of no transmission
 Not setting the GPS up before launch
 Not turning the camera on
 Running out of camera disk space
 Altitude jumping around rhythmically
The Curse and Joy of
Determinism
   Computers do what you tell them to
    ◦ Precisely what you tell them to
    ◦ Not what you think you told them to do
   A Curse
    ◦ Will do things you don’t expect
    ◦ Will process bogus input without
      complaint
   The Joy
    ◦ Easy to test that it does what’s expected
HAB Is A Harsh Environment
 Cold
 Vibration
 Stuff breaks in flight


 Software needs to be able to cope with
  failing hardware
 Very important to think about failure
  modes
 YOUR CODE IS ON ITS OWN OUT
  THERE
Deadly Sins
 The “It works!” Fallacy
 The Last Minute Change
 Being Far Too Clever
 Overlooking Odd Behaviour
 Copying Other People’s Code
 Assuming Finding A Bug Solves The
  Problem
The “It works!” Fallacy
   If you’re an inexperienced (and
    sometimes experienced)
    programmer…
    ◦ You hack some code together
    ◦ It works once
    ◦ You assume it will always work

   Only solution to this is
    ◦ Testing
    ◦ Paranoia
The Last Minute Change
 Never, ever change anything in code
  at the last minute no matter how
  simple.
 Example: HABE 1
    ◦ Complete camera failure
    ◦ Maximum integer size in uBASIC on
      CHDK is 999,999
    ◦ Last minute change of integer from
      600,000 to 1,000,000 caused total failure
Being Far Too Clever
       Example: GAGA-1
        ◦ Entered the wrong value of 2 * pi in code
          to do GPS position conversion from
          radians to degrees

        ◦ Caught before flight because I verified the
          location of my own back garden

        ◦ Note to self: 2 * pi != 6.2818.


https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/gps.cpp#L113
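A sketch of how a hand-typed constant can be caught by checking a known position. This is not the GAGA-1 code: the conversion function and the test latitude are assumptions for illustration, and only the value 6.2818 comes from the slide.

```python
import math

HAND_TYPED_TWO_PI = 6.2818  # the mistyped constant from the slide


def rad_to_deg(radians, two_pi=2 * math.pi):
    """Convert radians to degrees; two_pi is a parameter only to show the bug."""
    return radians * 360.0 / two_pi


# Hypothetical test position: latitude 51.5 degrees north, in radians.
lat_rad = math.radians(51.5)

good = rad_to_deg(lat_rad)
bad = rad_to_deg(lat_rad, HAND_TYPED_TWO_PI)
error_km = abs(bad - good) * 111.0  # roughly 111 km per degree of latitude
print(f"good={good:.4f} bad={bad:.4f} error is about {error_km:.1f} km")
```

At this latitude the bad constant shifts the position by something of the order of a kilometre, which is easy to spot against a location you know and easy to miss otherwise. Using the library constant (`math.pi` here) avoids typing the digits at all.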
Overlooking Odd Behaviour
       Example: GAGA-1
        ◦ In tests RTTY output was fine some of the
          time, garbled at other times
        ◦ Turned out to be interrupts from the GPS
          messing up the RTTY timing
        ◦ Solution: disable GPS serial interface while
          sending RTTY string

     ALWAYS BE HONEST WITH
      YOURSELF ABOUT YOUR CODE
     EXPECT THE SPANISH INQUISITION!

https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/tsip.cpp#L229
Copying Other People’s Code
     Don’t do this, you have no idea what
      you are copying or who they copied it
      from
     Better practice is to look at other
      people’s code and…
        ◦   Write your own version
        ◦   That you understand
        ◦   That you are able to test
        ◦   Example: GAGA-1
              Read lots of people’s RTTY code, wrote my
               own
https://github.com/jgrahamc/gaga/blob/master/gaga-
APRS Tracker using copied
     code




   If the altitude in metres contained an 8 or a 9 the altitude reported would
   be wrong

http://sharon.esrac.ele.tue.nl/users/pe1rxq/aprstracker/aprstracker.html
Assuming Finding The Bug
Solves The Problem
 Just because you’ve found A bug
  doesn’t mean it was THE bug
 Lots of research in computer science
  shows bugs tend to cluster
 Example: CLOUD1, CLOUD2
    ◦ Three bugs in printing latitude, longitude
      and altitude
    ◦ One fixed on CLOUD1, …
“The One Thing I Didn’t Test”




 http://ukhas.org.uk/guides:common_coding_errors_payload_testing
Common problems with uC
 Lack of floating point support
 Small integers
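A sketch of the small-integer problem, simulated in Python. It assumes a 16-bit signed `int`, as found on many 8-bit microcontrollers (for example AVR-based Arduino boards), and avoids floating point by computing metres to feet as `m * 33 / 10`. The helper names are made up for illustration.

```python
def to_int16(value):
    """Wrap an integer the way a 16-bit signed int would (two's complement)."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def m_to_f_int16(metres):
    """Metres to feet as (m * 33) / 10, with every intermediate held in 16 bits."""
    product = to_int16(metres * 33)
    return to_int16(int(product / 10))  # C-style division truncates toward zero


def m_to_f_int32(metres):
    """Same formula with a wide enough intermediate."""
    return metres * 33 // 10


for m in (900, 1000):
    print(m, m_to_f_int16(m), m_to_f_int32(m))
```

With 16-bit arithmetic the intermediate product overflows just below 1,000 m and the result goes negative, even though the final answer would have fitted. Widening the intermediate (a `long` in C) removes the problem.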
You might never be a
great programmer…

… but you can be a
paranoid tester!
Good Things To Do
 No infinite loops
 Self-Checking
 Unexpected Error Handling
 Handle Exceptions
 Simulation
 Simplify, Simplify, Simplify
 Unit Test
 Write Log Files
No Infinite Loops
 Never sit in a loop waiting forever
 Example: ATLAS 3
while (1) {
  // Make sure data is available to read
  if (Serial.available()) {
    b = Serial.read();

    if (bytePos == 8) {
      navmode = b;
      return true;
    }

    bytePos++;
  }
  // Timeout if no valid response in 3 seconds
  if (millis() - startTime > 3000) {
    navmode = 0;
    return false;
  }
}
} // closes the enclosing function (header not shown on the slide)
             https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L
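The same pattern sketched in Python so it can be tested on a desk. `read_byte` and `clock` are hypothetical injectable stand-ins for the serial port and `millis()`; they are not part of the ATLAS 3 code.

```python
import time


def read_byte_at(read_byte, position, timeout_s=3.0, clock=time.monotonic):
    """Return the byte at `position` in the reply, or None on timeout.

    `read_byte` is a hypothetical callable that returns the next byte as an
    int, or None when nothing is waiting (a stand-in for Serial.available()
    plus Serial.read()).
    """
    start = clock()
    seen = 0
    while True:
        b = read_byte()
        if b is not None:
            if seen == position:
                return b
            seen += 1
        # Give up rather than wait forever for a device that never answers.
        if clock() - start > timeout_s:
            return None
```

Passing the clock in as a parameter means the timeout path can be exercised in a test without actually waiting three seconds.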
Self-Checking
  -- Now enter a self-check of the manual mode settings

  log( "Self-check started" )

  assert_prop( 49, -32764, "Not in manual mode" )
  assert_prop( 5,     0, "AF Assist Beam should be Off" )
  assert_prop( 6,     0, "Focus Mode should be Normal" )
  assert_prop( 8,     0, "AiAF Mode should be On" )
  assert_prop( 21,     0, "Auto Rotate should be Off" )
  assert_prop( 29,     0, "Bracket Mode should be None" )
  assert_prop( 57,     0, "Picture Mode should be Superfine" )
  assert_prop( 66,     0, "Date Stamp should be Off" )
  assert_prop( 95,     0, "Digital Zoom should be None" )
  assert_prop( 102,     0, "Drive Mode should be Single" )
  assert_prop( 133,     0, "Manual Focus Mode should be Off" )
  assert_prop( 143,     2, "Flash Mode should be Off" )
  assert_prop( 149, 100, "ISO Mode should be 100" )
  assert_prop( 218,     0, "Picture Size should be L" )
  assert_prop( 268,     0, "White Balance Mode should be Auto" )
  assert_gt( get_time("Y"), 2009, "Unexpected year" )
  assert_gt( get_time("h"), 6, "Hour appears too early" )
  assert_lt( get_time("h"), 20, "Hour appears too late" )
  assert_gt( get_vbatt(), 3000, "Batteries seem low" )
  assert_gt( get_jpg_count(), ns, "Insufficient card space" )
https://github.com/jgrahamc/gaga/blob/master/gaga-1/camera/gaga-1.lua#L96
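A comparable pre-flight self-check sketched in Python. The reading names and limits are hypothetical and only loosely mirror the Lua checks above. This version runs every check and logs each failure instead of stopping at the first, which is a design choice of the sketch rather than a description of the GAGA-1 camera script.

```python
def run_self_check(readings, checks, log=print):
    """Run every check, log each failure, and return True only if all pass."""
    ok = True
    for name, predicate, message in checks:
        value = readings.get(name)
        if value is None or not predicate(value):
            log(f"SELF-CHECK FAILED: {message} (got {value!r})")
            ok = False
    return ok


# Hypothetical readings and limits, loosely mirroring the checks above.
CHECKS = [
    ("year", lambda v: v > 2009, "Unexpected year"),
    ("hour", lambda v: 6 < v < 20, "Hour outside launch window"),
    ("battery_mv", lambda v: v > 3000, "Batteries seem low"),
]
```

A missing reading counts as a failure, so a sensor that returns nothing cannot pass the check by accident.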
Self-Checking
      Example: ATLAS 3
      Makes sure uBlox GPS will work at
       high altitude; fixes it if not
    if ((count % 10) == 0) {
      digitalWrite(6, LOW);
      checkNAV();
      delay(1000);
      if (navmode != 6) {
        setupGPS();
        delay(1000);
      }
      checkNAV();
      delay(1000);
      digitalWrite(6, HIGH);
    }


https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L3
Unexpected Error Handling
    def temperature():
      t = at.cmd( 'AT#TEMPMON=1' )

      # Command returns something like:
      #
      # #TEMPMEAS: 0,28
      #
      # OK
      #
      # So split on whitespace first to isolate the temperature field 0,28
      # and then split on comma to get the temperature

      w = t.split()
      if len(w) < 2:
          logger.log( "Temperature read returned %s" % t )
          return -1000

      m = w[1].split(',')
      if len(m) != 2:
          logger.log( "Temperature read returned %s" % t )
          return -1000
      else:
          return int(m[1])


https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/util.py
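A sketch of the same parsing pulled out into a pure function so that bad replies can be fed to it directly. The function name is made up; the `-1000` error value and the reply format come from the code above. It also guards the `int()` conversion, on the assumption that a garbled reply could contain a non-numeric field.

```python
def parse_temperature(reply, error_value=-1000):
    """Extract the temperature from a '#TEMPMEAS: 0,28' style reply.

    Returns error_value for anything that does not match the expected shape,
    including a non-numeric temperature field.
    """
    words = reply.split()
    if len(words) < 2:
        return error_value
    fields = words[1].split(',')
    if len(fields) != 2:
        return error_value
    try:
        return int(fields[1])
    except ValueError:
        return error_value


print(parse_temperature("#TEMPMEAS: 0,28\r\n\r\nOK\r\n"))
```

Because the parser no longer talks to the modem, each odd reply (empty, `ERROR`, truncated) becomes a one-line test.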
Handle Exceptions
     If your language can generate
      exceptions then you’d better handle
      them!
     Example: GAGA-1
       ◦ Recovery computer used Python
       ◦ Exception could have killed it
       ◦ Global exception handler
    except:
        logger.log( "Caught exception in main loop: %s" % sys.exc_info()[1] )

       Bonus: What’s wrong with that code?
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/gaga-1.py#L144
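One possible hardening of a global handler, sketched for a modern Python 3 (not necessarily the interpreter on the flight hardware, and not necessarily the answer the bonus question is after). It catches `Exception` instead of using a bare `except`, and it protects the logging call itself. The `step`, `log` and `max_iterations` parameters are hypothetical, added so the loop can be tested.

```python
import sys


def run_forever(step, log, max_iterations=None):
    """Call step() repeatedly; log and swallow any ordinary exception."""
    count = 0
    while max_iterations is None or count < max_iterations:
        count += 1
        try:
            step()
        except Exception:  # leaves KeyboardInterrupt and SystemExit alone
            try:
                log("Caught exception in main loop: %s" % sys.exc_info()[1])
            except Exception:
                pass  # a failing logger must not take the loop down
```

Whether swallowing every error is right depends on the payload; the point of the sketch is only that the handler should not be able to kill the loop it is protecting.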
Simulation
 Simulate a flight
 Example: UKHAS wiki has example of
  using a PC as a fake GPS
http://www.ukhas.org.uk/guides:common_coding_errors_payload_testing

   Example: GAGA-1
    ◦ To test the embedded Telit module wrote
      modules that faked the entire Telit Python
      interface.
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/GPS.py
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/MDM.py
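A minimal sketch of the idea: a fake GPS object that replays a scripted flight so the flight logic can run on a desk. The class, the method name `get_position` and the `should_report` logic are all hypothetical; this is not the Telit interface or the GAGA-1 fakes linked above.

```python
class FakeGPS:
    """Stand-in for a GPS module that replays a scripted flight.

    The method name get_position is hypothetical, not the Telit API.
    """

    def __init__(self, track):
        self._track = list(track)
        self._index = 0

    def get_position(self):
        """Return the next (lat, lon, alt_m) fix; repeat the last one at the end."""
        fix = self._track[min(self._index, len(self._track) - 1)]
        self._index += 1
        return fix


def should_report(gps, below_m=1000):
    """Hypothetical flight logic under test: report only at low altitude."""
    _, _, alt = gps.get_position()
    return alt < below_m


# Scripted flight: on the ground, at altitude, then descending.
track = [(51.5, -0.1, 100), (51.6, -0.1, 30000), (51.7, 0.0, 500)]
gps = FakeGPS(track)
print([should_report(gps) for _ in range(4)])
```

A scripted track can also include the awkward cases: no fix, negative altitude, or a position that jumps.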
Simplify, Simplify, Simplify
 Make your code as simple as possible
 Never have ‘duplicated’ or ‘copy and
  paste’ code
 Break it up into small functions that
  you understand
 Make sure you understand the
  limitations of the functions you call
Unit Test
 Break your program up into small,
  separate functions
 Write tests that call that function and
  make sure it does what you expect.
 Lots of ways to do this
    ◦ Use something like cpptest
    ◦ ArduinoUnit
    ◦ Write your own test program
Unit Test Example
 In the bad APRS program
 Turn metres to feet code into a
  separate function: int m_to_f(int m)
    assertEquals(m_to_f(1000),3300)
    assertEquals(m_to_f(2000),6600)
    assertEquals(m_to_f(3000),9900)
    assertEquals(m_to_f(4000),13200)
    assertEquals(m_to_f(5000),16500)
    assertEquals(m_to_f(6000),19800)
    assertEquals(m_to_f(7000),23100)
    assertEquals(m_to_f(8000),26400)
    assertEquals(m_to_f(9000),29700)
    assertEquals(m_to_f(10000),33000)
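The same checks written with Python's standard `unittest` module. The body of `m_to_f` is an assumption: it uses the x3.3 approximation implied by the expected values on the slide, not the code from the APRS tracker.

```python
import unittest


def m_to_f(m):
    """Metres to feet using the slide's x3.3 approximation, integers only."""
    return m * 33 // 10


class MetresToFeetTest(unittest.TestCase):
    # Expected values copied from the slide; 8000 and 9000 cover the
    # "altitude contains an 8 or a 9" case.
    EXPECTED = {
        1000: 3300, 2000: 6600, 3000: 9900, 4000: 13200, 5000: 16500,
        6000: 19800, 7000: 23100, 8000: 26400, 9000: 29700, 10000: 33000,
    }

    def test_table(self):
        for metres, feet in self.EXPECTED.items():
            self.assertEqual(m_to_f(metres), feet)
```

Saved as a file, this can be run with `python -m unittest <file>`.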
Write Log Files
 Write detailed log files to non-volatile
  memory for post flight debugging
 Data sent via RTTY or APRS is limited
 Log exceptions and errors in detail
 Make sure you have a timestamp
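A sketch of a timestamped, append-only log line in Python. The function name and the file layout are assumptions; the idea is to flush each line to storage so that a crash or power loss costs at most the line being written.

```python
import os
import time


def log_line(path, message, clock=time.time):
    """Append one timestamped line and push it to storage before returning."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(clock()))
    with open(path, "a") as f:
        f.write("%s %s\n" % (stamp, message))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to write through to the device
```

Reopening the file for every line is slow but simple; on a payload that logs a few lines per second that trade-off is usually acceptable.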
Perform system testing
   Test your entire system before flight
    ◦ Put your tracker in the garden
    ◦ Get a GPS lock
    ◦ Listen to the RTTY on your radio
    ◦ Look at the decoded RTTY on your
      computer
    ◦ Test uploaded data on the tracker*


    ◦ *I didn’t do that step; on the day, people
      had to fix the tracker for me.
