ACM Distinguished Program: Cooperative Testing and Analysis: Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done

Human-Tool, Tool-Tool, and Human-Human
Cooperations to Get the Job Done

Tao Xie

North Carolina State University
Raleigh, NC, USA

IBM's Deep Blue defeated chess champion
Garry Kasparov in 1997

IBM Watson defeated top human Jeopardy!
players in 2011

Category U.S. CITIES: “Its largest airport was named for a World War II
hero; its second largest, for a World War II battle”
Responses of Rutter and Jennings: “What is Chicago?”
Response of Watson: "What is Toronto?????"

"Completely Automated
Public Turing test to tell
Computers and Humans
Apart"

iPad

Movie: Minority Report

CNN News

http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011
2010 Dagstuhl Seminar 10111
Practical Software Testing: Tool Automation and Human Factors

Human Factors

• Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing
  ▪ Instrument code to explore feasible paths
• Example tool: Pex from Microsoft Research (for .NET programs)

Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random testing. In Proc. PLDI 2005
Koushik Sen, Darko Marinov, and Gul Agha. CUTE: a concolic unit testing engine for C. In Proc. ESEC/FSE 2005
Nikolai Tillmann and Jonathan de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008

Dynamic symbolic execution loop: Choose next path → Solve → Execute & Monitor

Code to generate inputs for:

    void CoverMe(int[] a)
    {
        if (a == null) return;
        if (a.Length > 0)
            if (a[0] == 1234567890)
                throw new Exception("bug");
    }

Constraints to solve | Data | Observed constraints
(none) | null | a==null
a!=null | {} | a!=null && !(a.Length>0)
a!=null && a.Length>0 | {0} | a!=null && a.Length>0 && a[0]!=1234567890
a!=null && a.Length>0 && a[0]==1234567890 | {123…} | a!=null && a.Length>0 && a[0]==1234567890

(Negated condition: the last conjunct of each constraint to solve.)

Path tree: a==null (F/T) → a.Length>0 (F/T) → a[0]==123… (F/T)

Done: There is no path left.
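
The four inputs from the walkthrough above can be replayed as ordinary tests. The following is a minimal sketch, not Pex itself: the inputs are written by hand instead of being solved for, and the names `CoverMeDemo` and `ReachesBug` are hypothetical helpers added for illustration.

```csharp
using System;

public static class CoverMeDemo
{
    // Method under test, as shown on the slide.
    public static void CoverMe(int[] a)
    {
        if (a == null) return;
        if (a.Length > 0)
            if (a[0] == 1234567890)
                throw new Exception("bug");
    }

    // Hypothetical helper: runs one input and reports whether the "bug" path was reached.
    public static bool ReachesBug(int[] a)
    {
        try { CoverMe(a); return false; }
        catch (Exception e) when (e.Message == "bug") { return true; }
    }

    public static void Main()
    {
        // One hand-written input per explored path, in the order of the walkthrough.
        int[][] inputs = { null, new int[0], new[] { 0 }, new[] { 1234567890 } };
        foreach (int[] a in inputs)
        {
            string shown = a == null ? "null" : "{" + string.Join(",", a) + "}";
            Console.WriteLine(shown + " -> bug reached: " + ReachesBug(a));
        }
    }
}
```

Only the last input satisfies all three branch conditions, which is why random inputs are unlikely to hit it while constraint solving reaches it in four runs.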

@NCSU ASE

• Method sequences
  ▪ MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]
• Environments, e.g., db, file systems, network, …
  ▪ DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]
  ▪ CloudApp Testing [Zhang et al. IEEE Soft 12]
• Loops
  ▪ Fitnex [Xie et al. DSN 09]
• Code evolution
  ▪ eXpress [Taneja et al. ISSTA 11]

Download counts (20 months, Feb. 2008 - Oct. 2009)
Academic: 17,366
Devlabs: 13,022
Total: 30,388

http://research.microsoft.com/projects/pex/

http://pexase.codeplex.com/
Publications: http://research.microsoft.com/en-us/projects/pex/community.aspx#publications

Running Symbolic PathFinder ...
…
=====================================
================= results
no errors detected
=====================================
================= statistics
elapsed time: 0:00:02
states:
end=2
search:
new=4, visited=0, backtracked=4,
maxDepth=3, constraints=0
choice generators: thread=1, data=2
…
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884

• Example: Dynamic Symbolic Execution/Concolic Testing
• Challenge: path explosion

Total block coverage achieved is 50%, lowest coverage 16%.

• object-creation problems (OCP): 65%
• external-method call problems (EMCP): 27%

[Thummalapenta et al. OOPSLA 11]

• A graph example from QuickGraph library
• Includes two classes
  ▪ Graph
  ▪ DFSAlgorithm
• Graph
  ▪ AddVertex
  ▪ AddEdge: requires both vertices to be in graph

00: class Graph : IVEListGraph { …
03:   public void AddVertex (IVertex v) {
04:     vertices.Add(v); } // B1
06:   public Edge AddEdge (IVertex v1, IVertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e); } }

//DFS:DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (IVertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29: } } }

[Thummalapenta et al. OOPSLA 11]

• Test target: Cover true branch (B4) of Line 24
• Desired object state: graph should include at least one edge
• Target sequence:

    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    Vertex v2 = new Vertex(1);
    ag.AddVertex(v1);
    ag.AddVertex(v2);
    ag.AddEdge(v1, v2);
    DFSAlgorithm algo = new DFSAlgorithm(ag);
    algo.Compute(v1);

00: class Graph : IVEListGraph { …
03:   public void AddVertex (IVertex v) {
04:     vertices.Add(v); } // B1
06:   public Edge AddEdge (IVertex v1, IVertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e); } }

//DFS:DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (IVertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29: } } }

• Example: Dynamic Symbolic Execution/Concolic (Pex)
• Challenge: path explosion

Total block coverage achieved is 50%, lowest coverage 16%.

• object-creation problems (OCP): 65%
• external-method call problems (EMCP): 27%

• Example 1:
  ▪ File.Exists has data dependencies on program input
  ▪ Subsequent branch at Line 1 uses the return value of File.Exists
• Example 2:
  ▪ Path.GetFullPath has data dependencies on program input
  ▪ Path.GetFullPath throws exceptions
• Example 3: String.Format does not cause any problem

Tackle object-creation problems with Factory Methods
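
A factory method can encode the method sequence that a tool fails to find on its own. The sketch below is a hedged illustration for the graph example earlier in the deck: `Vertex`, `Edge`, `Graph`, `EdgeCount`, and `GraphFactory.CreateGraphWithEdge` are simplified, hypothetical stand-ins (not the QuickGraph classes), and no Pex-specific attributes are used.

```csharp
using System;
using System.Collections.Generic;

// Simplified, hypothetical stand-ins for the graph types on the slides.
public class Vertex { public int Id; public Vertex(int id) { Id = id; } }

public class Edge
{
    public Vertex Source, Target;
    public Edge(Vertex s, Vertex t) { Source = s; Target = t; }
}

public class Graph
{
    private readonly List<Vertex> vertices = new List<Vertex>();
    private readonly List<Edge> edges = new List<Edge>();

    public void AddVertex(Vertex v) { vertices.Add(v); }

    // As on the slide: both vertices must already be in the graph.
    public Edge AddEdge(Vertex v1, Vertex v2)
    {
        if (!vertices.Contains(v1)) throw new ArgumentException("v1 not in graph");
        if (!vertices.Contains(v2)) throw new ArgumentException("v2 not in graph");
        Edge e = new Edge(v1, v2);
        edges.Add(e);
        return e;
    }

    public int EdgeCount { get { return edges.Count; } }
}

public static class GraphFactory
{
    // Factory method: builds the desired object state (a graph with at least one edge)
    // from primitive arguments that a test generator can choose easily.
    public static Graph CreateGraphWithEdge(int id1, int id2)
    {
        Graph g = new Graph();
        Vertex v1 = new Vertex(id1);
        Vertex v2 = new Vertex(id2);
        g.AddVertex(v1);
        g.AddVertex(v2);
        g.AddEdge(v1, v2);
        return g;
    }
}

public static class FactoryDemo
{
    public static void Main()
    {
        Graph g = GraphFactory.CreateGraphWithEdge(0, 1);
        Console.WriteLine("edges: " + g.EdgeCount);
    }
}
```

The design choice is to expose only primitive parameters, so the tool has to solve for integers rather than search for a valid call sequence.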

Tackle external-method call problems with Mock Methods or
Method Instrumentation
Mocking System.IO.File.ReadAllText
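
One way to picture a mock method for `System.IO.File.ReadAllText` is plain delegate injection. This is a sketch under stated assumptions, not the Pex tooling's own mocking or instrumentation mechanism: `ConfigReader`, `CreateDefault`, and `IsDebugEnabled` are hypothetical names invented for the example.

```csharp
using System;

public class ConfigReader
{
    // The file read is injected so that a test can replace it (hypothetical design).
    private readonly Func<string, string> readAllText;

    public ConfigReader(Func<string, string> readAllText)
    {
        this.readAllText = readAllText;
    }

    // Production wiring: delegate to the real System.IO.File.ReadAllText.
    public static ConfigReader CreateDefault()
    {
        return new ConfigReader(System.IO.File.ReadAllText);
    }

    // The branch below depends on the return value of the external call.
    public bool IsDebugEnabled(string path)
    {
        string text = readAllText(path);
        return text.Contains("debug=true");
    }
}

public static class MockDemo
{
    public static void Main()
    {
        // Mock methods: return whatever content the test wants, without touching the file system.
        var enabled = new ConfigReader(path => "debug=true");
        var disabled = new ConfigReader(path => "debug=false");
        Console.WriteLine(enabled.IsDebugEnabled("any.cfg"));
        Console.WriteLine(disabled.IsDebugEnabled("any.cfg"));
    }
}
```

With the external call replaced, both outcomes of the dependent branch become reachable from test inputs alone.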

Tools Typically Don’t Communicate Challenges Faced by Them to Enable Cooperation between Tools and Users

Example: the Symbolic PathFinder output above, which reports only "no errors detected" plus run statistics.

• Machine is better at task set A
  ▪ Mechanical, tedious, repetitive tasks, …
  ▪ Ex. solving constraints along a long path
• Human is better at task set B
  ▪ Intelligence, human intent, abstraction, domain knowledge, …
  ▪ Ex. local reasoning after a loop, recognizing naming semantics

= A ∪ B

• Human-Assisted Computing
  ▪ Driver: tool; Helper: human
  ▪ Ex. Covana [Xiao et al. ICSE 2011]
• Human-Centric Computing
  ▪ Driver: human; Helper: tool
  ▪ Ex. Coding duels @Pex for Fun

Interfaces are important. Contents are important too!

• Motivation
  ▪ Tools are often not powerful enough
  ▪ Human is good at some aspects that tools are not
• What difficulties does the tool face?
• How to communicate info to the user to get help?

Iterations to form Feedback Loop

• How does the user help the tool based on the info?

external-method call problems (EMCP)

object-creation problems (OCP)

• Existing solution
  ▪ identify all executed external-method calls
  ▪ report all object types of program inputs and fields
• Limitations
  ▪ the number of reported problems is often high
  ▪ some identified problems are irrelevant for achieving higher structural coverage

Reported EMCPs: 44
Reported OCPs: 18
vs.
Real EMCPs: 0
Real OCPs: 5

[Xiao et al. ICSE 11]

• Goal: Precisely identify problems faced by tools when achieving structural coverage
• Insight: Partially-covered statements have data dependencies on real problem candidates

Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011

[Figure: Covana overview. Labels: Program, Generated Test Inputs, Forward Symbolic Execution, Runtime Events, Runtime Coverage Information, Problem Candidate Identification, Problem Candidates, Data Dependence Analysis, Identified Problems]

• External-method calls whose arguments have data dependencies on program inputs

Data Dependencies

Symbolic Expression: return(File.Exists) == true

Element of EMCP Candidate: return(File.Exists)

Branch Statement Line 1 has data dependency on File.Exists at Line 1

• Partially-covered branch statements have data dependencies on EMCP candidates for return values
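
The pruning idea can be sketched as a simple filter: keep a candidate only if some partially-covered branch depends on it. This is a toy model under stated assumptions, not Covana's implementation. `BranchInfo`, `CandidateFilter.RealProblems`, and the "Line 9" branch are hypothetical, and the dependencies are given as data instead of being computed by symbolic execution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a branch: its coverage status and the symbols its condition depends on.
public class BranchInfo
{
    public string Location;
    public bool PartiallyCovered;
    public HashSet<string> DependsOn;
}

public static class CandidateFilter
{
    // Keep only the candidates that at least one partially-covered branch depends on.
    public static List<string> RealProblems(IEnumerable<string> candidates,
                                            IEnumerable<BranchInfo> branches)
    {
        var needed = new HashSet<string>(
            branches.Where(b => b.PartiallyCovered).SelectMany(b => b.DependsOn));
        return candidates.Where(c => needed.Contains(c)).ToList();
    }

    public static void Main()
    {
        var candidates = new[] { "return(File.Exists)", "return(String.Format)" };
        var branches = new[]
        {
            // Branch at Line 1 is only partially covered and depends on File.Exists.
            new BranchInfo { Location = "Line 1", PartiallyCovered = true,
                             DependsOn = new HashSet<string> { "return(File.Exists)" } },
            // Hypothetical fully covered branch: its dependency is pruned.
            new BranchInfo { Location = "Line 9", PartiallyCovered = false,
                             DependsOn = new HashSet<string> { "return(String.Format)" } },
        };
        foreach (string c in RealProblems(candidates, branches))
            Console.WriteLine(c);
    }
}
```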

• Subjects:
  ▪ xUnit: unit testing framework for .NET (223 classes and interfaces with 11.4 KLOC)
  ▪ QuickGraph: C# graph library (165 classes and interfaces with 8.3 KLOC)
• Evaluation setup:
  ▪ Apply Pex to generate tests for program under test
  ▪ Feed the program and generated tests to Covana
  ▪ Compare existing solution and Covana

• RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs?
• RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?

Covana identifies
• 43 EMCPs with only 1 false positive and 2 false negatives
• 155 OCPs with 20 false positives and 30 false negatives

Covana prunes
• 97% (1567 of 1610) EMCP candidates with 1 false positive and 2 false negatives
• 66% (296 of 451) OCP candidates with 20 false positives and 30 false negatives

[Xiao et al. ICSE 2011]
• Task: What needs to be automated?
  ▪ Test-input generation
• What difficulties does the tool face?
  ▪ Doesn’t know which methods to instrument and explore
  ▪ Doesn’t know how to generate effective method sequences
• How to communicate info to the user to get her help?
  ▪ Report encountered problems
• How does the user help the tool based on the info?
  ▪ Instruct which external methods to instrument/write mock objects
  ▪ Write factory methods for generating objects
• Iterations to form feedback loop?
  ▪ Yes, till the user is happy with coverage or impatient

• Driver: tool; Helper: human
  ▪ Ex. Covana [Xiao et al. ICSE 2011]
• Driver: human; Helper: tool
  ▪ Ex. Coding duels @Pex for Fun

Interfaces are important. Contents are important too!

www.pexforfun.com

1,083,640 clicked 'Ask Pex!'

The contributed concept of Coding Duel games as the major game type of Pex for Fun since Summer 2010

Goal: behavior of Secret Impl == behavior of Player Impl

Secret Implementation

    class Secret {
        public static int Puzzle(int x) {
            if (x <= 0) return 1;
            return x * Puzzle(x-1);
        }
    }

Player Implementation

    class Player {
        public static int Puzzle(int x) {
            return x;
        }
    }

    class Test {
        public static void Driver(int x) {
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                throw new Exception("Mismatch");
        }
    }
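
To see what feedback a player would get for the implementations above, the duel can be run over a small input range. This sketch swaps in brute-force enumeration for Pex's dynamic symbolic execution; `Duel.FindMismatches` and the chosen range are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class Secret
{
    public static int Puzzle(int x)
    {
        if (x <= 0) return 1;
        return x * Puzzle(x - 1);
    }
}

public static class Player
{
    // The player's first attempt from the slide.
    public static int Puzzle(int x) { return x; }
}

public static class Duel
{
    // Swapped-in technique: enumerate a small range instead of solving path constraints.
    public static List<int> FindMismatches(int from, int to)
    {
        var mismatches = new List<int>();
        for (int x = from; x <= to; x++)
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                mismatches.Add(x);
        return mismatches;
    }

    public static void Main()
    {
        // Each mismatch plays the role of a failed test shown to the player.
        foreach (int x in FindMismatches(-2, 5))
            Console.WriteLine("x=" + x + " secret=" + Secret.Puzzle(x) + " player=" + Player.Puzzle(x));
    }
}
```

For the range -2 to 5, the two implementations agree only on x=1 and x=2, so the player has to generalize from the remaining mismatches to the factorial behavior.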

• Coding duels at http://www.pexforfun.com/
  ▪ Task for Human: write behavior-equiv code
• Human → Tool
  ▪ Does my new code behave differently? How exactly?
• Human ← Tool
  ▪ Could you fix your code to handle failed/passed tests?
• Iterations to form feedback loop?
  ▪ Yes, till tool generates no failed tests/player is impatient

• Coding duels at http://www.pexforfun.com/
  ▪ Brain exercising/learning while having fun
  ▪ Fun: iterative, adaptive/personalized, w/ win criterion
  ▪ Abstraction/generalization, debugging, problem solving

Brain exercising

Especially valuable in Massive Open Online Courses (MOOCs)

Internet
• Everyone can contribute
  ▪ Coding duels
  ▪ Duel solutions

Puzzle Games Made from Difficult Constraints or Object-Creation Problems

Internet

Ning Chen and Sunghun Kim. Puzzle-based Automatic Testing: bringing humans into the loop by solving puzzles. In Proc. ASE 2012
Supported by MSR SEIF Award

http://www.cs.washington.edu/verigames/

StackMine [Han et al. ICSE 12]

[Figure: StackMine workflow. Labels: Internet, Trace collection, Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug filing, Bug update, Bug Database]

Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012

“We believe that the MSRA tool is highly valuable and
much more efficient for mass trace (100+ traces) analysis.
For 1000 traces, we believe the tool saves us 4-6 weeks of
time to create new signatures, which is quite a significant
productivity boost.”
- from Development Manager in Windows

Highly effective new issue discovery on
Windows mini-hang

Continuous impact on future Windows versions

• Static analysis + dynamic analysis
  ▪ Static checking + Test generation
  ▪ …
• Dynamic analysis + static analysis
  ▪ Fix generation + fix validation
  ▪ …
• Static analysis + static analysis
  ▪ …
• Dynamic analysis + dynamic analysis
  ▪ …

Example: Xiaoyin Wang, Lu Zhang, Tao Xie, Yingfei Xiong, and Hong Mei. Automating Presentation Changes in Dynamic Web Applications via Collaborative Hybrid Analysis. In Proc. FSE 2012

  ▪ Covana

  ▪ Pex for Fun

• Human-Human Cooperation
  ▪ StackMine

• Wonderful current/former students @NCSU ASE
• Collaborators, especially those from Microsoft Research Redmond/Asia, Peking University
• Colleagues who gave feedback and inspired me

NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award

Questions?

https://sites.google.com/site/asergrp/
