ACM Distinguished Program: Cooperative Testing and Analysis: Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done
1. Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done
Tao Xie
North Carolina State University
Raleigh, NC, USA
3. IBM's Deep Blue defeated chess champion Garry Kasparov in 1997
IBM Watson defeated top human Jeopardy! players in 2011
4. Category U.S. CITIES: “Its largest airport was named for a World War II hero; its second largest, for a World War II battle”
Responses of Rutter and Jennings: “What is Chicago?”
Response of Watson: “What is Toronto?????”
9. Human Factors
http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011
2010 Dagstuhl Seminar 10111: Practical Software Testing: Tool Automation and Human Factors
10. Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing
Instrument code to explore feasible paths
Example tool: Pex from Microsoft Research (for .NET programs)
Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random testing. In Proc. PLDI 2005
Koushik Sen, Darko Marinov, and Gul Agha. CUTE: a concolic unit testing engine for C. In Proc. ESEC/FSE 2005
Nikolai Tillmann and Jonathan de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008
11. Dynamic symbolic execution loop: Choose next path → Solve → Execute & Monitor
Code to generate inputs for:
void CoverMe(int[] a)
{
    if (a == null) return;
    if (a.Length > 0)
        if (a[0] == 1234567890)
            throw new Exception("bug");
}
Constraints to solve | Data | Observed constraints
(none) | null | a==null
a!=null | {} | a!=null && !(a.Length>0)
a!=null && a.Length>0 | {0} | a!=null && a.Length>0 && a[0]!=1234567890
a!=null && a.Length>0 && a[0]==1234567890 | {123…} | a!=null && a.Length>0 && a[0]==1234567890
Negated condition: each row's constraints to solve negate the last condition of the previous row's observed constraints.
Done: There is no path left.
[Figure: execution tree of CoverMe with the branches a==null, a.Length>0 and a[0]==123…, each split into T/F]
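The loop above (choose a path, solve its constraints, execute and monitor) can be sketched in a few lines of C#. This is a toy sketch, not Pex: the enum `Cond`, the record `Branch`, the hand-instrumented `CoverMe`, and the `Solve` function are hypothetical, and `Solve` only understands the three conditions of this example instead of calling a real constraint solver.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// The three branch conditions of CoverMe (hypothetical encoding for this sketch).
public enum Cond { ANull, LengthPositive, FirstIsMagic }

// One observed branch decision: which condition was evaluated and whether it held.
public record Branch(Cond Cond, bool Taken);

public static class MiniDse
{
    const int Magic = 1234567890;

    // CoverMe, instrumented by hand so that each branch decision is logged into `trace`.
    static void CoverMe(int[]? a, List<Branch> trace)
    {
        trace.Add(new Branch(Cond.ANull, a == null));
        if (a == null) return;
        trace.Add(new Branch(Cond.LengthPositive, a.Length > 0));
        if (a.Length > 0)
        {
            trace.Add(new Branch(Cond.FirstIsMagic, a[0] == Magic));
            if (a[0] == Magic) throw new Exception("bug");
        }
    }

    // Toy solver: builds an input satisfying a feasible conjunction of branch outcomes.
    // A real engine would hand symbolic constraints to a constraint solver instead.
    static int[]? Solve(List<Branch> pc)
    {
        bool Holds(Cond c) => pc.Any(b => b.Cond == c && b.Taken);
        if (Holds(Cond.ANull)) return null;
        if (!Holds(Cond.LengthPositive)) return Array.Empty<int>();
        return new[] { Holds(Cond.FirstIsMagic) ? Magic : 0 };
    }

    // Execute & monitor, then negate each branch of the observed path (keeping its
    // prefix), solve for a new input, and repeat until no unexplored path remains.
    public static List<(int[]? Input, bool Threw)> Explore()
    {
        var runs = new List<(int[]? Input, bool Threw)>();
        var seen = new HashSet<string>();   // path prefixes already observed or queued
        var work = new Stack<int[]?>();
        work.Push(null);                    // first input: the default value null
        while (work.Count > 0)
        {
            int[]? input = work.Pop();
            var trace = new List<Branch>();
            bool threw = false;
            try { CoverMe(input, trace); }
            catch (Exception) { threw = true; }
            runs.Add((input, threw));

            for (int i = 1; i <= trace.Count; i++)
                seen.Add(string.Join(";", trace.Take(i)));
            for (int i = 0; i < trace.Count; i++)
            {
                var pc = trace.Take(i).Append(trace[i] with { Taken = !trace[i].Taken }).ToList();
                if (seen.Add(string.Join(";", pc)))
                    work.Push(Solve(pc));
            }
        }
        return runs;
    }
}
```

Calling `MiniDse.Explore()` should produce the four inputs from the table (null, {}, {0}, {1234567890}), with only the last one raising the exception. Remembering every observed path prefix is what lets the loop stop once there is no path left.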
12. @NCSU ASE
Method sequences
MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09],
Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10],
Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]
Environments e.g., db, file systems, network, …
DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]
CloudApp Testing [Zhang et al. IEEE Soft 12]
Loops
Fitnex [Xie et al. DSN 09]
Code evolution
eXpress [Taneja et al. ISSTA 11]
20. Example 1: File.Exists has data dependencies on program input; the subsequent branch at Line 1 uses the return value of File.Exists.
Example 2: Path.GetFullPath has data dependencies on program input; Path.GetFullPath throws exceptions.
Example 3: String.Format does not cause any problem.
22. Tackle external-method call problems with Mock Methods or Method Instrumentation
Mocking System.IO.File.ReadAllText
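One way to make `System.IO.File.ReadAllText` controllable is to hide it behind an interface and substitute a mock during test generation. The names `IFileSystem`, `RealFileSystem`, `MockFileSystem` and `ConfigReader.IsDebugMode` are hypothetical; the sketch shows interface-based mocking, whereas method instrumentation (the other option on the slide) would intercept the call without changing the code under test.

```csharp
using System;
using System.IO;

// Abstraction over the external method so a test (or test generator) can replace it.
public interface IFileSystem
{
    string ReadAllText(string path);
}

// Production implementation: delegates to the real System.IO.File.ReadAllText.
public sealed class RealFileSystem : IFileSystem
{
    public string ReadAllText(string path) => File.ReadAllText(path);
}

// Mock implementation: returns whatever content the test chooses, without touching the disk.
public sealed class MockFileSystem : IFileSystem
{
    private readonly string _content;
    public MockFileSystem(string content) => _content = content;
    public string ReadAllText(string path) => _content;
}

// Hypothetical code under test: one of its branches depends on the file's content.
public static class ConfigReader
{
    public static bool IsDebugMode(IFileSystem fs, string path)
    {
        string text = fs.ReadAllText(path);
        if (text.Trim() == "debug")
            return true;   // hard for a tool to reach when the real file system is used
        return false;
    }
}
```

With the mock, a generated test can cover both outcomes simply by choosing the returned text, for example `new MockFileSystem("debug")` versus `new MockFileSystem("release")`.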
23. Tools Typically Don’t Communicate Challenges Faced by Them to Enable Cooperation between Tools and Users
Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
…
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
24. Machine is better at task set A
Mechanical, tedious, repetitive tasks, …
Ex. solving constraints along a long path
Human is better at task set B
Intelligence, human intent, abstraction, domain knowledge, …
Ex. local reasoning after a loop, recognizing naming semantics
= A ∪ B
25. Human-Assisted Computing
Driver: tool Helper: human
Ex. Covana [Xiao et al. ICSE 2011]
Human-Centric Computing
Driver: human Helper: tool
Ex. Coding duels @Pex for Fun
Interfaces are important. Contents are important too!
26. Motivation
Tools are often not powerful enough
Humans are good at some aspects where tools are not
What difficulties does the tool face?
How to communicate info to the user to get help?
Iterations to form Feedback Loop
How does the user help the tool based on the info?
29. Existing solution
identify all executed external-method calls
report all object types of program inputs and fields
Limitations
the number of reported problems is often high
some identified problems are irrelevant for achieving higher structural coverage
31. [Xiao et al. ICSE 11]
Goal: Precisely identify problems faced by tools when achieving structural coverage
Insight: Partially-covered branch statements have data dependencies on real problem candidates
Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011
32. [Figure: Covana overview; components: Program, Generated Test Inputs, Forward Symbolic Execution, Runtime Events, Problem Candidate Identification, Problem Candidates, Runtime Coverage Information, Data Dependence Analysis, Identified Problems]
33. Data Dependencies
External-method calls whose arguments have data dependencies on program inputs
34. Symbolic expression: return(File.Exists) == true
Element of EMCP candidate: return(File.Exists)
Branch statement at Line 1 has a data dependency on File.Exists at Line 1
Partially-covered branch statements have data dependencies on EMCP candidates for return values
35. Subjects:
xUnit: unit testing framework for .NET
▪ 223 classes and interfaces with 11.4 KLOC
QuickGraph: C# graph library
▪ 165 classes and interfaces with 8.3 KLOC
Evaluation setup:
Apply Pex to generate tests for the program under test
Feed the program and generated tests to Covana
Compare the existing solution and Covana
36. RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs?
RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?
37. Covana identifies
• 43 EMCPs with only 1 false positive and 2 false negatives
• 155 OCPs with 20 false positives and 30 false negatives
38. Covana prunes
• 97% (1,567 of 1,610) of EMCP candidates with 1 false positive and 2 false negatives
• 66% (296 of 451) of OCP candidates with 20 false positives and 30 false negatives
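The pruning and identification numbers are consistent with each other: the candidates left after pruning match the counts of identified problems. A quick check, using a small helper `PruningStats` introduced only for this illustration:

```csharp
using System;

public static class PruningStats
{
    // Returns the percentage of candidates pruned and how many candidates remain reported.
    public static (double PrunedPercent, int Remaining) Summarize(int pruned, int total) =>
        (100.0 * pruned / total, total - pruned);
}
```

`Summarize(1567, 1610)` gives about 97.3% pruned with 43 candidates remaining (the 43 EMCPs), and `Summarize(296, 451)` gives about 65.6% pruned with 155 remaining (the 155 OCPs).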
39. [Xiao et al. ICSE 2011]
Task: What needs to be automated?
Test-input generation
What difficulties does the tool face?
Doesn’t know which methods to instrument and explore
Doesn’t know how to generate effective method sequences
How to communicate info to the user to get her help?
Report encountered problems
How does the user help the tool based on the info?
Instruct which external methods to instrument/write mock objects
Write factory methods for generating objects
Iterations to form feedback loop?
Yes, until the user is satisfied with the coverage or runs out of patience
41. www.pexforfun.com
'Ask Pex!' clicked 1,083,640 times
Contributed the concept of Coding Duel games, the major game type of Pex for Fun since Summer 2010
42. Secret Impl == Player Impl (in behavior)
Secret Implementation
class Secret {
    public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
    }
}
Player Implementation
class Player {
    public static int Puzzle(int x) {
        return x;
    }
}
class Test {
    public static void Driver(int x) {
        if (Secret.Puzzle(x) != Player.Puzzle(x))
            throw new Exception("Mismatch");
    }
}
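The duel on the slide can be replayed with a brute-force check. This is only a stand-in for Pex, which derives distinguishing inputs by dynamic symbolic execution over `Test.Driver`; the `DuelChecker` class and the tried input range are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Secret and Player implementations from the slide.
static class Secret
{
    public static int Puzzle(int x)
    {
        if (x <= 0) return 1;
        return x * Puzzle(x - 1);
    }
}

static class Player
{
    public static int Puzzle(int x) => x;
}

static class DuelChecker
{
    // Tries every x in [lo, hi] and collects the inputs where Secret and Player disagree,
    // i.e. the inputs for which Test.Driver would throw "Mismatch".
    public static List<int> FindMismatches(int lo, int hi)
    {
        var mismatches = new List<int>();
        for (int x = lo; x <= hi; x++)
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                mismatches.Add(x);
        return mismatches;
    }
}
```

For `FindMismatches(-3, 5)` the player's `return x;` agrees with the secret factorial only at x = 1 and x = 2, so every other input in that range is reported as a failed test the player has to fix.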
43. Coding duels at http://www.pexforfun.com/
Task for the human: write behavior-equivalent code
class Player {
    public static int Puzzle(int x) {
        return x;
    }
}
Human → Tool: Does my new code behave differently? How exactly?
Tool → Human: Could you fix your code to handle failed/passed tests?
Iterations to form feedback loop?
Yes, until the tool generates no failed tests or the player runs out of patience
44. Coding duels at http://www.pexforfun.com/
Brain exercising/learning while having fun
Fun: iterative, adaptive/personalized, w/ win criterion
Abstraction/generalization, debugging, problem solving
Brain exercising
47. Everyone can contribute
Coding duels
Duel solutions
Internet
class Secret {
    public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
    }
}
48. Puzzle Games Made from Difficult Constraints or Object-Creation Problems
Internet
Ning Chen and Sunghun Kim. Puzzle-based Automatic Testing: bringing humans into the loop by solving puzzles. In Proc. ASE 2012
Supported by MSR SEIF Award
50. StackMine [Han et al. ICSE 12]
[Figure: StackMine workflow; components: Internet, Trace collection, Trace Storage, Trace analysis, Problematic Pattern Repository, Pattern Matching, Bug filing, Bug update, Bug Database]
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012
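StackMine's own contribution is mining costly call-stack patterns from millions of traces; the sketch below only illustrates the pattern-matching step of the workflow, routing each incoming trace either to an existing bug (bug update) or to a pool of unmatched traces for further analysis and bug filing. Frame names, bug IDs, and the contiguous-subsequence matching rule are assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class TriageSketch
{
    // True if `pattern` occurs as a contiguous run of frames inside `trace`.
    static bool ContainsRun(IReadOnlyList<string> trace, IReadOnlyList<string> pattern)
    {
        for (int start = 0; start + pattern.Count <= trace.Count; start++)
        {
            bool match = true;
            for (int k = 0; k < pattern.Count && match; k++)
                match = trace[start + k] == pattern[k];
            if (match) return true;
        }
        return false;
    }

    // Routes each trace to the first known pattern it contains ("bug update"); traces
    // matching no known pattern are collected for new-pattern analysis and bug filing.
    public static (Dictionary<string, int> Updates, List<int> Unmatched) Triage(
        IReadOnlyList<IReadOnlyList<string>> traces,
        IReadOnlyDictionary<string, IReadOnlyList<string>> knownPatterns)
    {
        var updates = new Dictionary<string, int>();   // bug ID -> number of new matching traces
        var unmatched = new List<int>();                // indices of traces with no known pattern
        for (int i = 0; i < traces.Count; i++)
        {
            var trace = traces[i];
            string? bugId = knownPatterns.FirstOrDefault(p => ContainsRun(trace, p.Value)).Key;
            if (bugId is null) { unmatched.Add(i); continue; }
            updates.TryGetValue(bugId, out int count);
            updates[bugId] = count + 1;
        }
        return (updates, unmatched);
    }
}
```

In a real pipeline the unmatched traces are where new patterns come from; once a new pattern is added to the repository, later traces with the same pattern turn into bug updates instead of new reports.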
51. “We believe that the MSRA tool is highly valuable and
much more efficient for mass trace (100+ traces) analysis.
For 1000 traces, we believe the tool saves us 4-6 weeks of
time to create new signatures, which is quite a significant
productivity boost.”
- from Development Manager in Windows
Highly effective new issue discovery on
Windows mini-hang
Continuous impact on future Windows versions
52. Static analysis + dynamic analysis
Static checking + Test generation
…
Dynamic analysis + static analysis
Fix generation + fix validation
…
Static analysis + static analysis
…
Dynamic analysis + dynamic analysis
…
Example: Xiaoyin Wang, Lu Zhang, Tao Xie, Yingfei Xiong, and Hong Mei. Automating Presentation Changes in Dynamic Web Applications via Collaborative Hybrid Analysis. In Proc. FSE 2012
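A toy illustration of the "static checking + test generation" pairing: a coarse static checker reports possible divide-by-zero sites, and generated tests keep only the warnings they can actually reproduce. The warning list is hard-coded here instead of coming from a real checker, the small input set is a brute-force stand-in for a test generator, and this is not the technique of the cited FSE 2012 paper.

```csharp
using System;
using System.Collections.Generic;

public static class StaticPlusDynamic
{
    public static int Ratio(int a, int b) => a / b;
    public static int SafeRatio(int a, int b) => b == 0 ? 0 : a / b;

    // Stand-in for a static checker's output: both methods divide by a parameter,
    // so a checker that ignores the guard in SafeRatio reports both.
    public static readonly Dictionary<string, Func<int, int, int>> Warnings = new()
    {
        ["Ratio"] = Ratio,
        ["SafeRatio"] = SafeRatio,
    };

    // Test-generation stand-in: try a few inputs per warning; a warning is confirmed
    // only if some generated input actually triggers a DivideByZeroException.
    public static List<string> ConfirmWarnings()
    {
        int[] candidates = { -1, 0, 1, 7 };
        var confirmed = new List<string>();
        foreach (var (name, method) in Warnings)
        {
            bool reproduced = false;
            foreach (int a in candidates)
                foreach (int b in candidates)
                {
                    try { method(a, b); }
                    catch (DivideByZeroException) { reproduced = true; }
                }
            if (reproduced) confirmed.Add(name);
        }
        return confirmed;
    }
}
```

Only `Ratio` is confirmed; the `SafeRatio` warning is a false positive that the dynamic side filters out, which is the kind of benefit the static-plus-dynamic combination aims for.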
53. Human-Assisted Computing
Covana
Human-Centric Computing
Pex for Fun
Human-Human Cooperation
StackMine
54. Wonderful current/former students@NCSU ASE
Collaborators, especially those from Microsoft Research Redmond/Asia, Peking University
Colleagues who gave feedback and inspired me
NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award
Two main types of problems: (1) the external-method call problem (EMCP), where tools cannot deal with method calls to external libraries; (2) the object-creation problem (OCP), where tools fail to generate method-call sequences to produce desirable object states.
External-method calls whose arguments have data dependencies on program inputs (e.g., NOT method calls that print constant strings or put a thread to sleep for some time)
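Filtering external-method calls down to EMCP candidates amounts to a forward data-dependence (taint) pass over the executed statements. The sketch assumes a hypothetical, flattened trace of `Assign` and `ExtCall` statements whose operands are named by variables; Covana itself works on symbolic expressions collected by forward symbolic execution, so this is only an approximation of the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened execution trace: assignments and external-method calls,
// with operands named by variables (anything else, e.g. a literal, is input-independent).
public abstract record Stmt(int Line);
public record Assign(int Line, string Target, string[] Sources) : Stmt(Line);
public record ExtCall(int Line, string Method, string[] Args) : Stmt(Line);

public static class EmcpCandidates
{
    // Forward pass: track which variables depend on program inputs and report the
    // external calls that receive at least one input-dependent argument.
    public static List<ExtCall> Identify(IEnumerable<Stmt> trace, IEnumerable<string> inputs)
    {
        var dependent = new HashSet<string>(inputs);
        var candidates = new List<ExtCall>();
        foreach (var stmt in trace)
        {
            switch (stmt)
            {
                case Assign a:
                    if (a.Sources.Any(s => dependent.Contains(s))) dependent.Add(a.Target);
                    else dependent.Remove(a.Target);   // overwritten with input-independent data
                    break;
                case ExtCall c:
                    if (c.Args.Any(s => dependent.Contains(s))) candidates.Add(c);
                    break;
            }
        }
        return candidates;
    }
}
```

With `path` as the program input, `File.Exists(path)` and a `String.Format` call on a value derived from `path` become candidates, while `Console.WriteLine` with a constant string and `Thread.Sleep(100)` are excluded. `String.Format` is still a candidate even though it causes no problem (Example 3 on slide 20), which is why a further pruning step based on partially-covered branches is needed.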
Computing data dependencies of branch statements containing not-covered branches (partially-covered branch statements) on problem candidates: for each collected symbolic expression sym found in the predicates of a branch statement b, Covana extracts the elements elem of problem candidates from sym. From elem, Covana extracts the corresponding problem candidates P and considers that b has a data dependency on P. Using the collected structural coverage, Covana then computes the data dependencies of partially-covered branch statements on problem candidates.
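The pruning step can be sketched as follows: keep only the candidates whose elements appear in the predicate of some partially-covered branch statement. The `BranchStmt` record, element strings such as `return(File.Exists)`, and candidate labels such as `File.Exists@1` are hypothetical representations chosen for the sketch, not Covana's internal data structures.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical summary of one branch statement after test generation. Elements holds the
// problem-candidate elements found in its symbolic predicate, e.g. "return(File.Exists)".
public record BranchStmt(int Line, bool TrueCovered, bool FalseCovered, string[] Elements)
{
    // Partially covered: exactly one of the two outcomes was exercised by the generated tests.
    public bool PartiallyCovered => TrueCovered != FalseCovered;
}

public static class CovanaSketch
{
    // Keeps only the candidates that some partially-covered branch statement depends on;
    // elementToCandidate maps each symbolic element to the candidate that introduced it.
    public static SortedSet<string> IdentifyProblems(
        IEnumerable<BranchStmt> branches,
        IReadOnlyDictionary<string, string> elementToCandidate)
    {
        var problems = new SortedSet<string>();
        foreach (var b in branches.Where(br => br.PartiallyCovered))
            foreach (var elem in b.Elements)
                if (elementToCandidate.TryGetValue(elem, out var candidate))
                    problems.Add(candidate);
        return problems;
    }
}
```

A branch with one uncovered outcome whose predicate mentions `return(File.Exists)` keeps the File.Exists candidate, while a fully covered branch over `return(String.Format)` contributes nothing, so that candidate is pruned.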