This is the slide deck for the technical briefing given by Kevin Moran, Mario Linares-Vásquez, and Denys Poshyvanyk at ICSE'17. The presentation covers: 1) the basics of GUI testing and its importance to mobile testing; 2) the current state of the art and practice for testing mobile applications; 3) current challenges facing researchers and practitioners aiming to design and build automated testing approaches; and 4) a new direction for automated mobile testing research, issued as guidance and a challenge for the SE community.
ICSE'17 Tech Briefing - Automated GUI Testing of Android Apps: From Research to Practice
1. AUTOMATED GUI TESTING
OF ANDROID APPS:
FROM RESEARCH TO
PRACTICE
Kevin Moran
Mario Linares-Vásquez
Denys Poshyvanyk
2. Mario Linares-Vásquez
Assistant Professor
Universidad de los Andes
Bogotá, Colombia
m.linaresv@uniandes.edu.co
http://sistemas.uniandes.edu.co/~mlinaresv
Kevin Moran
Ph.D. candidate
College of William and Mary
Williamsburg, VA, USA
kpmoran@cs.wm.edu
http://www.cs.wm.edu/~kpmoran
Denys Poshyvanyk
Associate Professor
College of William and Mary
Williamsburg, VA, USA
denys@cs.wm.edu
http://www.cs.wm.edu/~denys
3. PART 1: CURRENT STATE OF RESEARCH & PRACTICE
PART 2: MOBILE TESTING CHALLENGES
PART 0: BACKGROUND AND CORE CONCEPTS
PART 3: PRELIMINARY SOLUTIONS & RESEARCH VISION
7. The Importance of GUI Testing
Unit Testing Performance Testing
Regression Testing
Integration Testing
Compatibility Testing
• Several different types of testing are important
for ensuring software quality:
• For mobile, GUI-based testing subsumes many other types
of testing
• GUI testing is typically expensive, and test scripts are
difficult to maintain
• There is a clear opportunity for automation to improve
development workflows
13. GUI Testing: Example
Detecting and Localizing Internationalization Presentation Failures in Web Applications. Abdulmajeed Alameer, Sonal Mahajan, William G.J. Halfond. In
Proceedings of the 9th IEEE International Conference on Software Testing, Verification, and Validation (ICST). April 2016.
85. Pros and Cons
Automation
Frameworks
✓ Easy reproduction
✓ High level syntax
✓ Black box testing
- Learning curve
- User-defined oracles
- Expensive maintenance
Record &
Replay ✓ Easy reproduction
- Expensive collection and
maintenance
- Coupled to locations
AIG: Random
Based
✓ Fast execution
✓ Good at finding crashes
- Invalid events
- Lack of expressiveness
97. Monkey
A or B?
Breadth-First (BF) / Depth-First (DF)
Random (Uniform) Random (A-priori distr.)
Other options (online decision)
Systematic Exploration
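The exploration strategies above can be sketched in a few lines. The Python below is a toy illustration, not a real ripper: the GUI model, screen names, and event names are hypothetical, and an actual tool would drive a device through an automation API rather than a dictionary.

```python
import random

# Hypothetical GUI model: screen -> {event: next screen}.
GUI_MODEL = {
    "main":     {"tap_login": "login", "tap_settings": "settings"},
    "login":    {"tap_submit": "main", "tap_back": "main"},
    "settings": {"tap_toggle": "settings", "tap_back": "main"},
}

def dfs_exploration(model, start="main"):
    """Systematic depth-first exploration: fire every (screen, event) pair once."""
    sequence, visited, stack = [], set(), [start]
    while stack:
        screen = stack.pop()
        for event, target in model[screen].items():
            if (screen, event) not in visited:
                visited.add((screen, event))
                sequence.append(event)
                stack.append(target)
    return sequence

def random_exploration(model, steps, start="main", weights=None, seed=0):
    """Random exploration; 'weights' encodes an a-priori event distribution
    (uniform when omitted)."""
    rng = random.Random(seed)
    screen, sequence = start, []
    for _ in range(steps):
        events = list(model[screen])
        w = [weights.get(e, 1) for e in events] if weights else None
        event = rng.choices(events, weights=w)[0]
        sequence.append(event)
        screen = model[screen][event]
    return sequence
```

Systematic traversal covers each transition exactly once, while the weighted random variant illustrates "Random (a-priori distr.)" from the slide: biasing common events without guaranteeing coverage.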
98. Tools: Google Robo Test
https://firebase.google.com/docs/test-lab/robo-ux-test
99. Pros and Cons (cont'd)
AIG:
Systematic
✓ Achieves reasonable coverage
- May miss crashes
- Can be time consuming
- Typically cannot exercise complex features
101. Model-Based Testing (MBT)
Model → Monkey → UI events → AUT/SUT
- Manually generated
- Automatically generated (source code)
- Ripped at runtime (upfront)
- Ripped at runtime (interactive)
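However obtained (manual, code-derived, or ripped), the model can drive test-case generation. A minimal sketch, assuming a hypothetical ripped model represented as a transition map; every enumerated event path up to a length bound becomes one test case.

```python
from collections import deque

# Hypothetical GUI model ripped at runtime: screen -> {event: next screen}.
MODEL = {
    "home": {"open_form": "form", "open_list": "list"},
    "form": {"submit": "home", "cancel": "home"},
    "list": {"back": "home"},
}

def generate_test_cases(model, start, max_length):
    """Breadth-first enumeration of all event sequences up to max_length;
    each sequence is one model-based test case."""
    cases, queue = [], deque([(start, [])])
    while queue:
        screen, path = queue.popleft()
        if path:
            cases.append(path)
        if len(path) < max_length:
            for event, target in model[screen].items():
                queue.append((target, path + [event]))
    return cases
```

The state-explosion con on the next slide shows up directly here: the number of enumerated paths grows exponentially with `max_length`, which is why practical MBT tools prune or sample rather than enumerate exhaustively.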
102. Pros and Cons (cont'd)
AIG: Model
Based
✓ Event sequences
✓ Automatic exploration
- Some invalid sequences
- State explosion
- Incomplete models
103. Other Types of AIG Approaches
• Recently, new approaches have been introduced for AIG:
• Search-Based Approaches1
• Symbolic/Concolic Execution2
• NLP for Text Input Generation - @ICSE17
1Ke Mao, Mark Harman, and Yue Jia. 2016. Sapienz: multi-objective automated testing for Android applications. In Proceedings of
the 25th International Symposium on Software Testing and Analysis (ISSTA 2016)
2Nariman Mirzaei, Joshua Garcia, Hamid Bagheri, Alireza Sadeghi, and Sam Malek. 2016. Reducing combinatorics in GUI testing
of android applications. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16)
110. Bug Reporting Services
• Features of Bug Reporting
Services:
• Video Recording
• App Analytics
• Automated Crash Reporting
112. Pros and Cons
Bug Reporting
Services
✓ Allows for more details
about field failures
✓ App Analytics can help
with UI/UX design
- Can be expensive
- Requires integrating library
- Typically do not report GUI traces
127. Pros and Cons
Crowdsourcing
Services
✓ Low effort required
from developers
✓ Expert tester might
uncover unexpected
bugs
- Can be expensive
- May not fit within Agile
workflows
- Quality of Reports can
vary
128. Cloud Testing & Device Streaming (Device Farms)
Third Party Service
137. Pros and Cons
Device
Streaming
✓ Allows remote users
access to controlled
devices
✓ Allows for collection of
detailed user information
- Can be difficult to configure
- Relies on strong network
connection
- Cannot simulate mobile-specific contexts like sensors
144. Accidental Challenges in Mobile GUI-Testing
Accidental Challenges
Application Data and Cold Starts
Bugs in Framework Utilities
Bugs and Limitations in the SDK Tools
Challenges Scaling Concurrent Virtual Devices
163. Essential Challenges: Test Oracles
Potential Solution: Use a hash of the current screen hierarchy
adb shell /system/bin/uiautomator dump /sdcard/ui_dump.xml
Drawbacks: May miss some fine-grained information about the GUI
(e.g., colors, images)
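The hash-based oracle can be sketched as follows. This is an illustrative Python fingerprint over a uiautomator-style XML dump; the function and sample attributes are assumptions for illustration, not an existing API. Hashing only structure and resource ids also reproduces the drawback noted above: color and image changes go unnoticed.

```python
import hashlib
import xml.etree.ElementTree as ET

def hierarchy_fingerprint(xml_dump):
    """Hash the structural part of a screen-hierarchy dump: widget classes,
    resource ids, and nesting depth, ignoring volatile text and coordinates."""
    root = ET.fromstring(xml_dump)
    parts = [
        f"{depth}:{node.get('class', '')}:{node.get('resource-id', '')}"
        for depth, node in _walk(root, 0)
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

def _walk(node, depth):
    """Yield (depth, node) pairs in document order."""
    yield depth, node
    for child in node:
        yield from _walk(child, depth + 1)
```

Two dumps of the same screen with different labels produce the same fingerprint, while a structural change (a different widget class) produces a different one, which is exactly the GUI-state equivalence check a hash oracle needs.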
168. Fully automated mobile testing: Are we there yet?
APP
- Oracles
- Models
- Scripts
- Strategies
- Fuzzers
- Rippers
- MBT
- Automation APIs
169. Key Challenges
1. Open problems without solutions (e.g., oracle generation)
2. Facets of mobile testing not investigated yet (e.g.,
mobile-specific fault models)
3. Specific “characteristics” in mobile testing that
drastically impact the testing process (e.g.,
fragmentation)
174. Key Challenges: Lack of fault models
Fault model (a.k.a., fault profile):
Catalog describing recurrent (or uncommon) bugs
associated with software developed for a particular
platform or domain
Fault model → potential locations/features in source code to be tested
176. Key Challenges: History awareness in test cases
Current automated techniques do not support system/acceptance testing because they are not history aware.
- Fuzzers are totally random
- Rippers follow heuristics for GUI traversal
- Test scripts are hard-coded
178. Key Challenges: Evolution/maintenance of test scripts
Location-coupled scripts
Need different versions for each device and are highly
sensitive to changes in the GUI
Device/Location-agnostic scripts
Are resilient to device fragmentation but need to be updated
when the GUI changes in terms of added/deleted components
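The difference can be illustrated with a toy example. The widget lists below are hypothetical stand-ins for two versions of a screen in which a button moves after a redesign: the coordinate-based tap breaks, while the id-based lookup survives (until the id itself is added or deleted).

```python
# Hypothetical UI snapshots: widget id plus screen bounds (left, top, right, bottom).
UI_V1 = [{"id": "btn_send", "bounds": (100, 200, 300, 260)}]
UI_V2 = [{"id": "btn_send", "bounds": (100, 400, 300, 460)}]  # moved after redesign

def tap_at(ui, x, y):
    """Location-coupled action: succeeds only if some widget covers (x, y)."""
    for w in ui:
        left, top, right, bottom = w["bounds"]
        if left <= x <= right and top <= y <= bottom:
            return w["id"]
    return None  # the hard-coded coordinates no longer hit anything

def tap_id(ui, widget_id):
    """Location-agnostic action: look the widget up by id, derive the tap point."""
    for w in ui:
        if w["id"] == widget_id:
            cx = (w["bounds"][0] + w["bounds"][2]) // 2
            cy = (w["bounds"][1] + w["bounds"][3]) // 2
            return widget_id, (cx, cy)
    return None
```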
180. Key Challenges: Oracle generation
Manually Codified Oracles (MCO)
Usually implemented as assertions or expected
exceptions
Exceptions-As-Oracle (EAO)
Crashes and errors reported during the execution of a
test case
GUI-State-as-Oracle (GAO)
Expected transitions between windows, and GUI
screenshots
182. Key Challenges
Challenge Solution
Fragmentation
Fault models
History-awareness
Multi-Goal testing
Test scripts evolution
Oracles generation
Crowd/cloud based testing
Research is needed
MBT is a promise
Open problem
Open problem
Open problem
183. The CEL Testing Principles
A “fully” automated solution for testing
mobile apps should continually generate
test cases, underlying models, oracles, and
actionable reports of detected errors,
without human intervention.
184. The CEL Testing Principles
Automated testing of mobile apps should
help developers increase software quality
within the following constraints: (i)
restricted time/budget for testing, (ii)
needs for diverse types of testing, and (iii)
pressure from users for continuous
delivery
185. The CEL Testing Principles
Continuous:
- Continuous multi-goal testing under different environmental
conditions
- Any change to the source code or environment should
automatically trigger a testing iteration on the current
version of the app
- Test cases executed during the iteration should cover only
the impact set of the changes that triggered the iteration
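The impact-set constraint above can be sketched as a simple intersection between a change set and per-test coverage data. The coverage map, test names, and file names are hypothetical.

```python
# Hypothetical coverage map from a previous testing iteration:
# test case -> set of source files it exercises.
COVERAGE = {
    "test_login":    {"LoginActivity.java", "AuthService.java"},
    "test_settings": {"SettingsActivity.java"},
    "test_checkout": {"CartActivity.java", "AuthService.java"},
}

def select_tests(coverage, changed_files):
    """Select only the test cases whose coverage intersects the change's impact set."""
    changed = set(changed_files)
    return sorted(t for t, files in coverage.items() if files & changed)
```

A real implementation would compute the impact set with static or dynamic change-impact analysis rather than a file-level map, but the selection step reduces to this intersection.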
186. The CEL Testing Principles
Evolutionary:
- App source code and testing artifacts (i.e., the models, the
test cases, and the oracles) should not evolve independently
of one another
- The testing artifacts should adapt automatically to changes
in (i) the app, (ii) the usage patterns, and (iii) the available
devices/OSes
187. The CEL Testing Principles
Large-scale:
- CEL testing should be supported on infrastructures for
parallel execution of test cases on physical or virtual devices.
- The large-scale engine should be accessible in the context of
both cloud and on-premise hardware
- This engine should enable execution of test cases that
simulate real conditions in the wild.
188. CEL Testing: Proposed Architecture
[Architecture diagram] Recoverable components:
- Inputs: developers, users, user reviews, APIs, source code
- Changes Monitoring Subsystem: source code changes monitor, APIs evolution monitor, on-device usages monitor, markets monitor
- Analyzers: impact analyzer, execution traces + logs analyzer, API changes analyzer, user reviews + ratings analyzer
- Testing Artifacts Generation: multi-model generator, multi-model repository (domain, GUI, usage, contextual, and fault models), models generator, models repository, test cases generator, artifacts repository (test cases + oracles)
- Large Execution Engine: containers manager, test cases runner, reports generator, reports repository
189. Multi-Model
- GUI model: windows, transitions, components
- Usage model: single use cases and combinations; (un)common usages
- Domain model: entities, data types, values, enumerations
- Context model: external events, sensors, connectivity
- Fault model: (un)common programming errors
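One possible encoding of the multi-model as a single composite artifact, sketched as a Python dataclass. The field contents follow the five sub-models listed above; the class and its method are hypothetical, not part of any proposed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MultiModel:
    """Composite testing artifact combining the five sub-models."""
    gui: dict = field(default_factory=dict)      # windows, transitions, components
    usage: dict = field(default_factory=dict)    # event -> observed frequency
    domain: dict = field(default_factory=dict)   # entity/field -> data types, values
    context: list = field(default_factory=list)  # external events, sensors, connectivity
    faults: list = field(default_factory=list)   # (un)common programming errors

    def event_weight(self, event):
        """Bias test generation toward (or away from) common usages."""
        return self.usage.get(event, 1)
```

Keeping the sub-models in one artifact is what lets a generator combine them, e.g. weighting GUI-model transitions by usage frequency while targeting locations flagged by the fault model.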
195. Discussion Questions
• Potential solutions to challenges we covered?
• Other future research directions?
• How does the rise of native web-programming
frameworks for mobile apps impact testing?