SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
Research Methods
     in Computer Science
(Serge Demeyer — University of Antwerp)

              AnSyMo
Antwerp Systems and software Modelling
       http://ansymo.ua.ac.be/




     Universiteit Antwerpen
Helicopter View




                                         (Ph.D.)
                                        Research




       How to perform research ?                    How to write research ?
      (and get “empirical” results)                (and get papers accepted)

          How many of you have
        done / will do a case-study ?




1. Research Methods                                                            2
Zürich Kunsthaus   Antwerp Middelheim
1. Research Methods
Introduction
  • Origins of Computer Science
  • Research Philosophy
Research Methods
  • 1. Feasibility study
  • 2. Pilot Case
  • 3. Comparative study
  • 4. Observational Study [a.k.a. Etnography]
  • 5. Literature survey
  • 6. Formal Model
  • 7. Simulation
Conclusion
  • Studying a Case
    vs. Performing a Case Study
    + Proposition
    + Unit of Analysis
    + Threats to Validity



1. Research Methods                              4
What is (Ph.d.) Research ?
        http://gizmodo.com/5613794/what-is-exactly-a-doctorate




       Human             Elementary
      Knowledge            School               High School      Bachelor




                                   Ph.D.                              Ph.D.
            Master             (early stages)                      (finished)




1. Research Methods                                                             5
Computer Science


     All science is either physics or stamp collecting (E. Rutherford)
                         We study artifacts produced by humans




            Computer science is no more about computers than
               astronomy is about telescopes. (E. Dijkstra)


                  Computer science
                                                      Computer engineering


                           Informatics
                                           Software Engineering




1. Research Methods                                                          6
Science vs. Engineering


                       Science                          Engineering



                      Physics
                                                     Civil Engineering
                                          ???
                           Chemistry   Computer             Electronics
                                        Science
             Biology                      ???
                                       Software Chemistry and Materials
                          Mathematics Engineering
                                          ???
               Geography                                Electro-Mechanical
                                                           Engineering




1. Research Methods                                                          7
Mathematical Origins
Turing Machines
  • Halting problem                       (inductive) Reasoning
                                            • logical argumentation
Algorithmic Complexity                          + formal models,
  • P = ? NP                                       theorem proving, …
                                                + axioms & lemma’s
Compilers                                       + foo, bar type of examples
  • Chomsky hierarchy                       • “deep” and generic universal
                                              knowledge
Databases
  • Relational model



   Gödel theorem: consistency of the system is not provable in the system.

                      A complete and consistent set of axioms
                       for all of mathematics is impossible



1. Research Methods                                                           8
Engineering Origins
Computer Engineering                     Empirical Approach
  • Moore’s law: “the number of           • Tom De Marco: “you cannot
    transistors on a chip will double       control what you cannot
    about every two years”                  measure”
       + Self-fulfilling prophesy              + quantify
  • Hardware technology                        + mathematical model
       + RISC vs. CISC                    • Pareto principle
       + MPSoC                                 + 80 % - 20 % rule
  • Compiler optimization                        (80% of the effects come
       + peephole optimization                   from 20% of the causes)
       + branch prediction



                     As good as your next observation.
      Premise: The sun has risen in the east every morning up until now.
      Conclusion: The sun will also rise in the east tomorrow. … Or Not ?




1. Research Methods                                                         9
Influence of Society
                           Lives are at stake
                           (e.g., automatic pilot,
                           nuclear power plants)

                                         Huge amounts of money
                                         are at stake
                                         (e.g., Ariane V crash,
                                         Denver Airport Baggage)



                      Software became Ubiquitous
                        … its not a hobby anymore


                                                      Corporate success or failure
                                                      is at stake (e.g., telephone
                                                      billing, VTM launching 2nd
                                                      channel)




1. Research Methods                                                                  10
Interdisciplinary Nature

   “Hard”
                        Science                Engineering
  Sciences




                                   Computer
                                    Science




                                                              Action
                                                             Research
   “Soft”
                      Economics                Sociology
  Sciences
                                  Psychology

1. Research Methods                                                     11
The Oak Forest
Robert Zünd - 1882
Franz and Luciano
Franz Gertsch - 1973
Objective             Subjective

   • Plato’s cave




   • Scientific Paradigm (Kuhn)
       + Dominant paradigm / Competing paradigms / Paradigm shift
            ➡ Normal science vs. Revolutionary science



1. Research Methods                                                 14
Dominant view on Research Methods
Physics                                   Medicine
(“The” Scientific method)                 (Double-blind treatment)
  • form hypothesis about a                 • form hypothesis about a
    phenomenon                                treatment
  • design experiment                       • select experimental and control
  • collect data                              groups that are comparable
  • compare data to hypothesis                except for the treatment
  • accept or reject hypothesis             • collect data
       + … publish (in Nature)              • commit statistics on the data
  • get someone else to repeat              • treatment      difference
    experiment (replication)                  (statistically significant)

                      Cannot answer the “big” questions
                      … in timely fashion
                      •smoking is unhealthy
                      •climate change
                      •darwin theory vs. intelligent design
                      •…
                      •agile methods


1. Research Methods                                                         15
Experiment principles
  source: C. Wohlin, P. Runeson, M. Höst, M. Ohlsson, B. Regnell, and A. Wesslén.
  Experimentation in Software Engineering - An Introduction. Kluwer Academic Publishers,
  2000                                                              “Bo
                                Experiment objective               • To ring
                                                                       o m to r
                                                                   res           ea
  THEORY                                                               ear uch f d” s
                                                                           ch    ocu yn
                                                                              pro s o drom
                                                                                 ced n p     e
                                                                                    ure rop
                                       cause-effect                                         er
                                          construct
                       Cause                                     Effect
                      construct                                construct




  OBSERVATION
                                         treatment-
                                          outcome
                                          construct
                      Treatment                                Outcome

             Independent variable                        Dependent variable
                                  Experiment operation
1. Research Methods                                                                              16
!"""!"#$%&'()*+%$,(-&./%$(01(23456"

Research Methods in Computer Science
Different Sources                                                           Static analysis

   • Marvin V. Zelkowitz and Dolores R.                                   Lesso ns learned

     Wallace, "Experimental Models for                                        Legacy data

     Validating Technology", IEEE                                        Literat ure search


     Computer, May 1998.                                                        Field st u dy




                                                  Validation method
                                                                                 Assertio n

                                                                                Case st u dy
   • Easterbrook, S. M., Singer, J., Storey,
                                                                       Project mo nit orin g
     M, and Damian, D. Selecting Empirical                                      Simulatio n                                            1995 (152 papers)
     Methods for Software Engineering                                    Dynamic analysis
                                                                                                                                       1990 (217 papers)
                                                                                                                                       1985 (243 papers)
     Research. Appears in F. Shull and J.                                        Syn t hetic
     Singer (eds) "Guide to Advanced                                            Replicated
     Empirical Software Engineering",                                 No experimen tatio n
                                                                                                                                                                           "

     Springer, 2007.                           0371(81"#$%&'"()*+,-$.&,/"0&+1"/23+$%&'/"*.,"1&-14&-1+,5"&6"$.*6-,78"
                                                                                                0   5   10     15       20        25         30        35        40

   • Gordona Dodif-Crnkovic, “Scientific            "
                                                                                                              Percen tage o f papers

     Methods in Computer Science”
                                                lection method that conforms to any one of the 12             validate the claims in the paper. For completeness we
   • Andreas Höfer, Walter F. Tichy, Status     given data collection methods.
                                                   Our 12 methods are not the only ways to classify
                                                                                                              added the following two classifications:

     of Empirical Research in Software          data collection, although we believe they are the most         1. Not applicable. Some papers did not address some
                                                comprehensive. For example, Victor Basili6 calls an               new technology, so the concept of data collection does
     Engineering, Empirical Software            experiment in vivo when it is run at a development loca-          not apply. For example, a paper summarizing a recent
                                                tion and in vitro when it is run in an isolated, controlled
     Engineering Issues, p. 10-19,
                                                                                                                  conference or workshop wouldn’t be applicable.
                                                setting. According to Basili, a project may involve one        2. No experiment. Some papers describing a new

     Springer, 2007.                            team of developers or multiple teams, and an experi-
                                                ment may involve one project or multiple projects. This
                                                                                                                  technology contained no experimental validations.

                                                variability permits eight different experiment classifi-          In our survey, we were interested in the data col-
                                                cations. On the other hand, Barbara Kitchenham7 con-          lection methods employed by the authors of the papers
                                                siders nine classifications of experiments divided into        in order to determine our classification scheme’s com-
                                                three general categories: a quantitative experiment to        prehensiveness. We tried to distinguish between data
                                                identify measurable benefits of using a method or tool,        used as a demonstration of concept (which may
                                                a qualitative experiment to assess the features provided      involve some measurements as a “proof of concept,”
                                                by a method or tool, and a benchmarking experiment            but not a full validation of the method) and a true
                                                to determine performance.                                     attempt at validation of their results.
                                                                                                                 As in the study by Walter Tichy,8 we considered a
1. Research Methods                             MODEL VALIDATION
                                                     To test whether the classification presented here
                                                                                                                                                             17
                                                                                                              demonstration of technology via example as part of
                                                                                                              the analytical phase. The paper had to go beyond that "
Case studies - Spectrum

 case studies are widely used in computer science                      7. Simulation
   “studying a case” vs. “doing a case study”
                                                                       • what if ?
                                                           6. Formal Model
                                                           • underlying concepts ?
                                             5. Literature survey
                                             • what is known/unknown ?
                                 4. Observational Study
                                 • What is “it” ?
                      3. Comparative study
                      • is it better ?
         2. Pilot Case, Demonstrator
         • is it appropriate ?
 1. Feasibility study
 • is it possible ?
                                                    Source: Personal experience
                                                    (Guidelines for Master Thesis Research –
                                                    University of Antwerp)

1. Research Methods                                                                            18
The sixteenth of september
            Rene Margritte
Feasibility Study
Here is a new idea, is it possible ?
           ➡ Metaphor: Christopher Columbus and western route to India

   • Is it possible to solve a specific kind of problem … effectively ?
        + computer science perspective (P = NP, Turing test, …)
        + engineering perspective (build efficiently; fast — small)
        + economic perspective (cost effective; profitable)
   • Is the technique new / novel / innovative ?
        + compare against alternatives
             ➡ See literature survey; comparative study

   • Proof by construction
       + build a prototype
       + often by applying on a “CASE”

   • Conclusions
       + primarily qualitative; "lessons learned"
       + quantitative
         - economic perspective: cost - benefit
         - engineering perspective: speed - memory footprint

1. Research Methods                                                       20
The Prophet
Pablo Gargallo
Pilot Case (a.k.a. Demonstrator)
Here is an idea that has proven valuable; does it work for us ?
           ➡ Metaphor: Portugal (Amerigo Vespucci) explores western route

   • proven valuable
       + accepted merits (e.g. “lessons learned” from feasibility study)
       + there is some (implicit) theory explaining why the idea has merit
   • does it work for us
       + context is very important

   • Demonstrated on a simple yet representative “CASE”
       + “Pilot case” ≠ “Pilot Study”

   • Proof by construction
       + build a prototype
       + apply on a “case”

   • Conclusions
       + primarily qualitative; "lessons learned"
       + quantitative; preferably with predefined criteria
           ➡ compare to context before applying the idea !!

1. Research Methods                                                          22
Walking man
  Standing Figure
– Alberto Giacometti
Comparative Study
Here are two techniques, which one is better ?
  • for a given purpose !
       + (Not necessarily absolute ranking)
  • Where are the differences ? What are the tradeoffs ?

   • Criteria check-list
       + predefined
          - should not favor one technique
       + qualitative and quantitative
          - qualitative: how to remain unbiased ?
          - quantitative: represent what you want to know ?
       + Criteria check-list should be complete and reusable !
             ➡ If done well, most important contribution (replication !)
             ➡ See literature survey

   • Score criteria check-list
       + Often by applying the technique on a “CASE”

   • Compare
       + typically in the form of a table

1. Research Methods                                                        24
Observational Study [Ethnography]
Understand phenomena through observations
         ➡ Metaphor: Diane Fossey “Gorillas in the Mist”

   • systematic collection of data derived from direct observation of the
     everyday life
       + phenomena is best understood in the fullest possible context
           ➡ observation & participation
           ➡ interviews & questionnaires

   • Observing a series of cases “CASE”
       + observation vs. participation ?

   • example: Action Research
         + Action research is carried out by people who usually recognize a problem or limitation in
           their workplace situation and, together, devise a plan to counteract the problem,
           implement the plan, observe what happens, reflect on these outcomes, revise the plan,
           implement it, reflect, revise and so on.


   • Conclusions
       + primarily qualitative: classifications/observations/…


1. Research Methods                                                                                    26
Torben Giehler   Paul Klee
Matterhorn         Niesen
Literature Survey
What is known ? What questions are still open ?

   • source: B. A. Kitchenham, “Procedures for Performing Systematic
     Reviews”, Keele University Technical Report EBSE-2007-01, 2007

Systematic
  • “comprehensive”
           ➡ precise research question is prerequisite
      + defined search strategy (rigor, completeness, replication)
      + clearly defined scope
        - criteria for inclusion and exclusion
      + specify information to be obtained
        - the “CASES” are the selected papers

   • outcome is organized

                 classification   taxonomy       conceptual model

                      table         tree            frequency



1. Research Methods                                                    28
Literature survey - example
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 0, NO. Survey of Program Comprehension through Dynamic Analysis
                         Cornelissen et al. - An Systematic 0, JANUARY 2000                                  5



                                     !"#$%&'()#)*%"$+                                      ;!#1$'#0!#"$+1'-
                                                                                                                                                              Source
              <#"$+1'-0*#,.
                                            $)$"$!10!#"$+1'
                                                                                                                                                              Bas Cornelissen, Andy Zaidman, Arie van
            #'1'9!)"09')&'-                                    7)$"$!10-'1'+"$,)           #'*'#')+'0+4'+3$)5            :$)!10-'1'+"$,)
           =&1>0?@@0A0=&)'0BCD
                                              -'1'+"$,)                                                                                                       Deursen, Leon Moonen, Rainer Koschke. A
                                                                                                                                                              Systematic Survey of Program
                                                                                           <#"$+1'-0*#,.                                                      Comprehension through Dynamic Analysis
                                                                                           ,"4'#09')&'-
          !"#$%&'()'&'%#$*+                                                                                                                                   IEEE Transactions on Software Engineering
                                                                                                                                                              (TSE): 35(5): 684-702, 2009.
                 <""#$%&"'                    !""#$%&"'                      7)$"$!1                            !""#$%&"'
               *#!.'2,#3                    5')'#!1$/!"$,)                 !""#$%&"'-                         $(')"$*$+!"$,)

                                                                                                              !##"$,-#'($.'+#$/$%0#$*+

                        !"#$%&',(("-+.)+%
                                                                                                          89'#9$'20,*
                              !""#$%&"'                        -&..!#$/!"$,)0,*                          +4!#!+"'#$/'(
                             !--$5).')"                          -$.$1!#02,#3                               !#"$+1'-

          !"#$%&'(%10"0%#'"$20#$*+                            Cornelissen et al. - An Systematic Survey of Program Comprehension through Dynamic Analysis
                                                                 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 0, NO. 0, JANUARY 2000                                                                                   22

                                               E'+,..')(!"$,)-                          $)"'#6#'"!"$,)
                                                                                                                           3+#'"4"'#0#$*+
                                                                  &**

Fig. 1.     Overview of the systematic survey process.
                                                                   )*                                         (#
                                                                                                                                                                                              (&
                                                                         !"
                                                                                                                                           #% #(
                                                                   !*                                                                            ##
vague, as it does not specify which properties are analyzed. To allow the definition to serve in
                                                                              "#
multiple problem domains, the exact properties under analysis are left open.                                                                          ') '!
                                                                   "*
                                                       $%                 $%                                                                                  $%                                   '* $%
   While the definition of dynamic analysis is rather abstract, we can elaborate upon the benefits                                                                   $# $# $" $'
                                                        &% &%
and limitations of using dynamic analysis in program comprehension contexts. The benefits that
                                             $*                                                   &'                   &&                                                                                  &$ && &*
                                                                                                          %                                                                      &*
we consider are:                                                                                                               (   (   #                                              #   #                           !

   •   The preciseness with regard to the actual*behavior of the software system, for example, in


                                                                            46>6 10
                                                                              1=3 -

                                                                                          ?
                                                                             ;- :
                                                                            5-3 234




                                                                                          ?

                                                                               > A/ 8
                                                                             0- /


                                                                           8- 2-/




                                                                                                                                                                                            1/ -
                                                                             62,8 </




                                                                                                                                                                                        31 7432
                                                                                          2


                                                                               /7 >




                                                                                                                                                                                                  5; 3




                                                                                                                                                                                        C@ 32?




                                                                                                                                                                                                     F>
                                                                             692 34




                                                                         +,/ <-6>
                                                                             5,46- >B




                                                                           D7 +>B
                                                                               4- 0 <




                                                                                          0
                                                                                         ,;


                                                                           @3 2,;/




                                                                              72, /

                                                                                     ,10




                                                                                                                                                                                          4,C -
                                                                                                                                                                                           2-0 -




                                                                                                                                                                                        ,1< /:1
                                                                             C- >




                                                                                                                                                                                                   ,34
                                                                               /4,; /
                                                                                    2;9




                                                                                       C
                                                                                    +,:




       the context of object-oriented software software with its late binding mechanism.
                                                                                      .




                                                                                   2+-




                                                                                     3;




                                                                                        -




                                                                                                                                                                                                 78
                                                                                   .-




                                                                                        ;
                                                                                        ;



                                                                                       :




                                                                                                                                                                                                   1
                                                                                    2,1




                                                                                                                                                                                      @2- 636,+
                                                                                                                                                                                                 4,1
                                                                                  76-




                                                                                     6<


                                                                       C@ /636
                                                                                       2
                                                                                 62 3




                                                                                  /6,




                                                                                                                                                                                    E1 7/62
                                                                                 -3




                                                                                23;
                                                                                      ,
                                                                                 1-




                                                                                                                                                                                    97 1=1:
                                                                                 <7




                                                                                    <
                                                                                7C
                                                                                + ,-




                                                                                -2?
                                                                                 67




                                                                                                                                                                                               ,1
                                                                                   6




                                                                                                                                                                                             :1
                                                                                93




                                                                                                                                                                                             32,
                                                                              > A3
                                                                               66>




                                                                                                                                                                                            6,
                                                                           2>=/
                                                                          + ,/




                                                                                                                                                                                       C3
       The fact that a goal-oriented strategy can be used, which entails the definition of an




                                                                                                                                                                                       :.
                                                                         /,0




                                                                          9-
                                                                        @2:




   •
                                                                        < ,/




                                                                        C7




                                                                                                                                                                                     D7


                                                                                                                                                                                     ;:
                                                                       <-




 1. Research Methods only the parts of interest of the software system are analyzed.                                                                                                                                           29
                                                                     ;:




                                                                                                                                                                                  71
    execution scenario such that
Klee
              Bergbahn




Vojin Bakic
       Bull
Formal Model
How can we understand/explain the world ?
  • make a mathematical abstraction of a certain problem
      + analytical model, stochastic model, logical model, re-write
        system, ...
      + often explained using a “CASE”
  • prove some important characteristics
      + based on inductive reasoning, axioms & lemma’s, …

Motivate
 • which factors are irrelevant (excluded) and which are not (included) ?
 • which properties are worthwhile (proven) ?
          ➡ See literature survey

                                                 Problem
                       Problem
                                                Properties

                                                    ?
                                               Mathematical
                      Abstraction
                                                Properties

1. Research Methods                                                         31
Hodler
Eiger, Mönch and Jungfrau in the Morning Sun




                                             Seurat
                                       Eiffel Tower
Simulation
What would happen if … ?

   • study circumstances of phenomena in detail
       + simulated because real world too expensive; too slow or impossible
   • make prognoses about what can happen in certain situations
       + test using real observations, typically obtained via a “CASE”

Motivate
 • which circumstances are irrelevant (excluded) and which are not
    (included) ?
 • which properties are worthwhile (to be observed/predicted) ?
           ➡ See literature survey

Examples
  • distributed systems (grid); network protocols
       + too expensive or too slow to test in real life
  • embedded systems — simulating hardware platforms
       + impossible to observe real clock-speed / memory footprint / …
            ➡ Heisenberg uncertainty principle



1. Research Methods                                                       33
Case studies - Revisited                                       7. Simulation: test
                                                              prognoses with real
 case studies are widely used in computer science           observations obtained
   “studying a case” vs. “doing a case study”                        via a “CASE”
                                                     6. Formal Model
                                                     often explained using a “CASE”

                                            5. Literature survey
                                            “CASES” = selected papers
                                  4. Observational Study
                                  Observing a series of “CASES”
                      3. Comparative study
                      Score criteria check-list; often by applying on a “CASE”
          2. Pilot Case, Demonstrator
          Demonstrated on a simple yet representative “CASE”
  1. Feasibility study
  Proof by construction; often by applying on a “CASE”




1. Research Methods                                                              34
Case Study Research
Introduction
  • Origins of Computer Science
  • Research Philosophy
Research Methods
  • 1. Feasibility study
  • 2. Pilot Case
  • 3. Comparative study
  • 4. Observational Study [a.k.a. Etnography]
  • 5. Literature survey                 Sources
  • 6. Formal Model                      • Robert K. Yin. Case Study Research:
  • 7. Simulation                          Design and Methods. 3rd Edition. SAGE
                                           Publications. California, 2009.
Conclusion                               • Bent Flyvbjerg, "Five
  • Studying a Case                        Misunderstandings About Case Study
     vs. Performing a Case Study           Research." Qualitative Inquiry, vol. 12,
    + Proposition                          no. 2, April 2006, pp. 219-245.
                                         • Runeson, P. and Höst, M. 2009.
    + Unit of Analysis                     Guidelines for conducting and reporting
    + Threats to Validity                  case study research in software
                                               engineering. Empirical Softw. Eng. 14,
                                               2 (Apr. 2009), 131-164.


1. Research Methods                                                                     35
Spectrum of cases
 created for explanation
 • foo, bar examples                        Toy-example
 • simple model;
   illustrates differences

                                                                              Martin S. Feather , Stephen Fickas ,
                      accepted teaching vehicle                               Anthony Finkelstein , Axel Van
                      • “textbook example”                 Exemplar           Lamsweerde, Requirements and
                      • simple but illustrates                                Specification Exemplars, Automated
                                                                              Software Engineering, v.4 n.4, p.
                        relevant issues                                       419-438, October 1997


                                          real-life example




                                                                                                                     Case study
Runeson, P. and Höst, M. 2009.
Guidelines for conducting and reporting
case study research in software
                                          • industrial system,            Case
engineering. Empirical Softw. Eng. 14,      open-source system
2 (Apr. 2009), 131-164.                   • context is difficult to grasp

      Mining Software Repositories Challenge.           competition (tool oriented)
      [Yearly workshop where research tools compete
      against one another on a common predefined        • approved by community                 Community case
      case.]                                            • comparing

  Susan Elliott Sim, Steve Easterbrook, and Richard C. Holt. Using             benchmark
  Benchmarking to Advance Research: A Challenge to Software
  Engineering, Proceedings of the Twenty-fifth International      Benchmark
                                                                               • approved by community
  Conference on Software Engineering, Portland, Oregon, pp.                    • known context
  74-83, 3-10 May, 2003.                                                       • “planted” issues

1. Research Methods                                                                                                      36
Case study — definition
A case study is an empirical inquiry that investigates a
contemporary phenomenon within its real-life context, especially
when the boundaries between the phenomenon and context are not
clearly evident
[Robert K. Yin. Case Study Research: Design and Methods; p. 13]

   • empirical inquiry: yes, it is empirical research
   • contemporary: (close to) real-time observations
       + incl. interviews
   • boundaries between the phenomenon and context not clear
       + as opposed to “experiment”




                                                       Context
      Treatment                Outcome
                                         Phenomenon

                  Experiment                    Case Study


1. Research Methods                                                37
Case Study — Counter evidence




                                           Context

                         Phenomenon




   - many more variables than data points
   - multiple sources of evidence; triangulation
   - theoretical propositions guide data collection
     (try to confirm or refute propositions with well-selected cases)

                                                          Case studies also look
                                                          for counter evidence




1. Research Methods                                                                38
Misunderstanding 2: Generalization
One cannot generalize on the basis of an individual case; therefore
the case study cannot contribute to scientific development.
               ➡ [Bent Flyvbjerg, "Five Misunderstandings About Case Study Research."]


   • Understanding
       + The power of examples
       + Formal generalization is overvalued
         - dominant research views of physics and medicine

   • Counterexamples
       + one black swan falsifies “all swans are white”
         - case studies generate deep understanding; what appears to be
           white often turns out to be black

   • sampling logic vs. replication logic
       + sampling logic: operational enumeration of entire universe
         - use statistics: generalize from “randomly selected” observations
       + replication logic: careful selection of boundary values
         - use logic reasoning: presence of absence of property has effect



1. Research Methods                                                                      39
Sampling Logic vs. Replication Logic




                                              Boundary value




                                           Selection of (boundary) value
      Random selection                       understand differences
        generalize for entire population          • propositions
                                                  • units of analysis



1. Research Methods                                                        40
Research questions for Case Studies
Existence:                 Exploratory            Relationship               Explanatory
  • Does X exist?                                   • Are X and Y related?
                                                    • Do occurrences of X correlate with
Description & Classification                           occurrences of Y?
  • What is X like?
  • What are its properties?                      Causality
  • How can it be categorized?                      • What causes X?
  • How can we measure it?                          • What effect does X have on Y?
  • What are its components?                        • Does X cause Y?
                                                    • Does X prevent Y?
Descriptive-Comparative
  • How does X differ from Y?                     Causality-Comparative
                                                    • Does X cause more Y than does Z?
Frequency and Distribution                          • Is X better at preventing Y than is Z?
  • How often does X occur?                         • Does X cause more Y than does Z
  • What is an average amount of X?                   under one condition but not others?

Descriptive-Process                               Design
  • How does X normally work?                       • What is an effective way to achieve
  • By what process does X happen?                     X?
  • What are the steps as X evolves?                • How can we improve X?


   Source: Empirical Research Methods in Requirements Engineering.
   Tutorial given at RE'07, New Delhi, India, Oct 2007.


1. Research Methods                                                                            41
Proposition (a.k.a. Purpose)




                                                                 Where to expect boundaries ?
                                                                  Thorough preparation is necessary !
                                                                      You need an explicit theory.
        Boundary value




                              Exploratory                                                             Confirmatory

                                                                             Confirmatory case studies are used to test existing
    Exploratory case studies are used as initial                             theories. The latter are especially important for
    investigations of some phenomena to derive new                           refuting theories: a detailed case study of a real
    hypotheses and build theories.(*)                                        situation in which a theory fails may be more
                                                                             convincing than failed experiments in the lab.(*)

    (*) Steve Easterbrook, Janice Singer, Margaret-Anne Storey, and Daniela Damian. Selecting empirical methods for soft- ware engineering
         research. In Forrest Shull, Janice Singer, and Dag I. K. Sjoberg, editors, Guide to Advanced Empirical Software Engineering, pages 285—311.
         Springer London, 2008.



1. Research Methods                                                                                                                                    42
Units of Analysis
What phenomena to analyze
 • depends on research questions
 • affects data collection & interpretation
 • affects generalizability
                                   Example: Clone Detection, Bug Prediction
                                   • the tool/algorithm
Possibilities
                                         Does it work ?
  • individual developer           • the individual developer
  • a team                               How/why does he produce bugs/clones ?
  • a decision                     • about the culture/process in the team
  • a process                            How does the team prevent bugs/clones ?
                                         How successful is this prevention ?
  • a programming language
                                   • about the programming language
  • a tool                               How vulnerable is the programming
                                         language towards clones / bugs ?
Design in advance                        (COBOL vs. AspectJ)
  • avoid “easy” units of analysis
      + cases restricted to Java because parser
        - Is the language really an issue for your research question ?
      + report size of the system (KLOC, # Classes, # Bug reports)
        - Is team composition not more important ?



1. Research Methods                                                                43
Threats to Validity (Experiments)
                                       Experiment objective

            THEORY
                                              cause-
                                               effect
                                             construct
                            Cause                               Effect
                           construct                          construct
                                                 4


             OBSERVATION           3                                  3
                                             treatment-
                                              outcome
                                              construct
                           Treatment                          Outcome

                                             1       2
                      Independent variable                Dependent variable

                                       Experiment operation
   1.   Conclusion validity
   2.   Internal validity
   3.   Construct validity
   4.   External validity


1. Research Methods                                                            44
Threats to validity (Case Studies)
   • Source: Runeson, P. and Höst, M. 2009. Guidelines for conducting and
     reporting case study research in software engineering.

1. Construct validity
  • Do the operational measures reflect what the researcher had in mind ?
2. Internal validity
  • Are there any other factors that may affect the results ?
            ➡ Mainly when investigating causality !
3. External validity
  • To what extent can the findings be generalized ?
            ➡ Precise research question & units of analysis required
4. Reliability
  • To what extent is the data and the analysis dependent on the
     researcher (the instruments, …)



Other categories have been proposed as well
  • credibility, transferability, dependability, confirmability



1. Research Methods                                                         45
Threats to validity — Examples (1/2)
1. Construct validity
  • Do the operational measures reflect what the researcher had in mind ?
  • Time recorded vs. time spent
  • Execution time, memory consumption, …
      + noise of operating system, sampling method
  • Human-assigned classifiers (bug severity, …)
      + risk for “default” values
  • Participants in interviews have pressure to answer positively

2. Internal validity
  • Are there any other factors that may affect the results ?
  • Were phenomena observed under special conditions
       + in the lab, close to a deadline, company risked bankruptcy, …
       + major turnover in team, contributors changed (open-source), …
  • Similar observations repeated over time (learning effects)




1. Research Methods                                                         46
Threats to validity — Examples (2/2)
3. External validity
  • To what extent can the findings be generalized ?
  • Does it apply to other languages ? other sizes ? other domains ?
  • Background & education of participants
  • Simplicity & scale of the team
      + small teams & flexible roles vs. large organizations & fixed roles

4. Reliability
  • To what extent is the data and the analysis dependent on the
    researcher (the instruments, …)
  • How did you cope with bugs in the tool, the instrument ?
  • Classification: if others were to classify, would they obtain the same ?
  • How did you search for evidence in mailing archives, bug reports, …




1. Research Methods                                                            47
Threats to validity = Risk Management
No experimental design can be “perfect”
… but you can limit the chance of deriving false conclusions

   • manage the risk of false conclusions as much as possible
      + likelihood
      + impact

   • state clearly what and how you alleviated the risk (replication !)
       + construct validity
          - precise metric definitions
          - GQM paradigm
       + internal & external validity
          - report the context consciously
       + Reliability
          - bugs in tools: testing, usage of well-known libraries, …
          - classification: develop guidelines & others repeat classification
          - search for evidence (mailing archives, bug reports, …):
             have an explicit search procedure




1. Research Methods                                                             48
1. Research Methods
Introduction
  • Origins of Computer Science
  • Research Philosophy
Research Methods
  • 1. Feasibility study
  • 2. Pilot Case
  • 3. Comparative study
  • 4. Observational Study [a.k.a. Etnography]
  • 5. Literature survey
  • 6. Formal Model
  • 7. Simulation
Conclusion
  • Studying a Case
    vs. Performing a Case Study
    + Proposition
    + Unit of Analysis
    + Threats to Validity



1. Research Methods                              49
Studying a case vs. Performing a case study
1. Questions
  • most likely “How” and “Why”; also sometimes “What”

2. Propositions (a.k.a. Purpose)




                                                                               –––––––––Low hanging fruit–––––––––
  • explanatory: where to look for evidence
  • exploratory: rationale and direction
       + example: Christopher Columbus asks for sponsorship
         - Why three ships (not one, not five) ?
         - Why going westward (not south ?)
  • role of “Theories”
       + possible explanations (how, why) for certain phenomena
            ➡ Obtained through literature survey

3. Unit(s) of analysis
  • What is the case ?




                                                                  Threats to
4. Logic linking data to propositions




                                                                   validity
+ 5. Criteria for interpreting findings
  • Chain of evidence from multiple sources
  • When does data confirm proposition ? When does it refute ?

1. Research Methods                                                                                 50

Weitere ähnliche Inhalte

Andere mochten auch

Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...
Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...
Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...ICSM 2011
 
Industry - Estimating software maintenance effort from use cases an indu...
Industry - Estimating software maintenance effort from use cases an      indu...Industry - Estimating software maintenance effort from use cases an      indu...
Industry - Estimating software maintenance effort from use cases an indu...ICSM 2011
 
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...ICSM 2011
 
Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...
Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...
Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...ICSM 2011
 
ERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskICSM 2011
 
Metrics - Using Source Code Metrics to Predict Change-Prone Java Interfaces
Metrics - Using Source Code Metrics to Predict Change-Prone Java InterfacesMetrics - Using Source Code Metrics to Predict Change-Prone Java Interfaces
Metrics - Using Source Code Metrics to Predict Change-Prone Java InterfacesICSM 2011
 
ICSM'01 Most Influential Paper - Rainer Koschke
ICSM'01 Most Influential Paper - Rainer KoschkeICSM'01 Most Influential Paper - Rainer Koschke
ICSM'01 Most Influential Paper - Rainer KoschkeICSM 2011
 
ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild ICSM 2011
 
Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects ICSM 2011
 
Industry - Evolution and migration - Incremental and Iterative Reengineering ...
Industry - Evolution and migration - Incremental and Iterative Reengineering ...Industry - Evolution and migration - Incremental and Iterative Reengineering ...
Industry - Evolution and migration - Incremental and Iterative Reengineering ...ICSM 2011
 
Natural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming ConventionsNatural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming ConventionsICSM 2011
 
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...ICSM 2011
 
Components - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API LimitationsComponents - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API LimitationsICSM 2011
 
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...ICSM 2011
 
Metrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliarMetrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliarICSM 2011
 
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...ICSM 2011
 
Lionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 KeynoteLionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 KeynoteICSM 2011
 
Traceability - Structural Conformance Checking with Design Tests: An Evaluati...
Traceability - Structural Conformance Checking with Design Tests: An Evaluati...Traceability - Structural Conformance Checking with Design Tests: An Evaluati...
Traceability - Structural Conformance Checking with Design Tests: An Evaluati...ICSM 2011
 
ERA - A Comparison of Stemmers on Source Code Identifiers for Software Search
ERA - A Comparison of Stemmers on Source Code Identifiers for Software SearchERA - A Comparison of Stemmers on Source Code Identifiers for Software Search
ERA - A Comparison of Stemmers on Source Code Identifiers for Software SearchICSM 2011
 
Reliability and Quality - Predicting post-release defects using pre-release f...
Reliability and Quality - Predicting post-release defects using pre-release f...Reliability and Quality - Predicting post-release defects using pre-release f...
Reliability and Quality - Predicting post-release defects using pre-release f...ICSM 2011
 

Andere mochten auch (20)

Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...
Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...
Migration and Refactoring - Identifying Overly Strong Conditions in Refactori...
 
Industry - Estimating software maintenance effort from use cases an indu...
Industry - Estimating software maintenance effort from use cases an      indu...Industry - Estimating software maintenance effort from use cases an      indu...
Industry - Estimating software maintenance effort from use cases an indu...
 
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
Components - Crossing the Boundaries while Analyzing Heterogeneous Component-...
 
Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...
Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...
Faults and Regression testing - Localizing Failure-Inducing Program Edits Bas...
 
ERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to TaskERA - Clustering and Recommending Collections of Code Relevant to Task
ERA - Clustering and Recommending Collections of Code Relevant to Task
 
Metrics - Using Source Code Metrics to Predict Change-Prone Java Interfaces
Metrics - Using Source Code Metrics to Predict Change-Prone Java InterfacesMetrics - Using Source Code Metrics to Predict Change-Prone Java Interfaces
Metrics - Using Source Code Metrics to Predict Change-Prone Java Interfaces
 
ICSM'01 Most Influential Paper - Rainer Koschke
ICSM'01 Most Influential Paper - Rainer KoschkeICSM'01 Most Influential Paper - Rainer Koschke
ICSM'01 Most Influential Paper - Rainer Koschke
 
ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild ERA - Measuring Maintainability of Spreadsheets in the Wild
ERA - Measuring Maintainability of Spreadsheets in the Wild
 
Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects Industry - Testing & Quality Assurance in Data Migration Projects
Industry - Testing & Quality Assurance in Data Migration Projects
 
Industry - Evolution and migration - Incremental and Iterative Reengineering ...
Industry - Evolution and migration - Incremental and Iterative Reengineering ...Industry - Evolution and migration - Incremental and Iterative Reengineering ...
Industry - Evolution and migration - Incremental and Iterative Reengineering ...
 
Natural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming ConventionsNatural Language Analysis - Mining Java Class Naming Conventions
Natural Language Analysis - Mining Java Class Naming Conventions
 
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vo...
 
Components - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API LimitationsComponents - Graph Based Detection of Library API Limitations
Components - Graph Based Detection of Library API Limitations
 
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
Tutorial 2 - Practical Combinatorial (t-way) Methods for Detecting Complex Fa...
 
Metrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliarMetrics - You can't control the unfamiliar
Metrics - You can't control the unfamiliar
 
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...Industry -  Relating Developers' Concepts and Artefact Vocabulary in a Financ...
Industry - Relating Developers' Concepts and Artefact Vocabulary in a Financ...
 
Lionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 KeynoteLionel Briand ICSM 2011 Keynote
Lionel Briand ICSM 2011 Keynote
 
Traceability - Structural Conformance Checking with Design Tests: An Evaluati...
Traceability - Structural Conformance Checking with Design Tests: An Evaluati...Traceability - Structural Conformance Checking with Design Tests: An Evaluati...
Traceability - Structural Conformance Checking with Design Tests: An Evaluati...
 
ERA - A Comparison of Stemmers on Source Code Identifiers for Software Search
ERA - A Comparison of Stemmers on Source Code Identifiers for Software SearchERA - A Comparison of Stemmers on Source Code Identifiers for Software Search
ERA - A Comparison of Stemmers on Source Code Identifiers for Software Search
 
Reliability and Quality - Predicting post-release defects using pre-release f...
Reliability and Quality - Predicting post-release defects using pre-release f...Reliability and Quality - Predicting post-release defects using pre-release f...
Reliability and Quality - Predicting post-release defects using pre-release f...
 

Ähnlich wie Tutorial 3 - Research methods - Part 1

Scientific software engineering methods and their validity
Scientific software engineering methods and their validityScientific software engineering methods and their validity
Scientific software engineering methods and their validityDaniel Mendez
 
Grid07 7 Gagliardi
Grid07 7 GagliardiGrid07 7 Gagliardi
Grid07 7 Gagliardiimec.archive
 
Empirical Software Engineering - What is it and why do we need it?
Empirical Software Engineering - What is it and why do we need it?Empirical Software Engineering - What is it and why do we need it?
Empirical Software Engineering - What is it and why do we need it?Daniel Mendez
 
science _engineering__and_the_liberal_arts
science _engineering__and_the_liberal_artsscience _engineering__and_the_liberal_arts
science _engineering__and_the_liberal_artsRajendran
 
Informatics is a natural science
Informatics is a natural scienceInformatics is a natural science
Informatics is a natural scienceFrank van Harmelen
 
Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learningmahutte
 
2005: Natural Computing - Concepts and Applications
2005: Natural Computing - Concepts and Applications2005: Natural Computing - Concepts and Applications
2005: Natural Computing - Concepts and ApplicationsLeandro de Castro
 
The Epistemology of Software Engineering
The Epistemology of Software EngineeringThe Epistemology of Software Engineering
The Epistemology of Software Engineeringnathanmarz
 
Research in Computer Science and Engineering
Research in Computer Science and EngineeringResearch in Computer Science and Engineering
Research in Computer Science and EngineeringOdiaPua1
 
Introduction to Artificial Intelligences
Introduction to Artificial IntelligencesIntroduction to Artificial Intelligences
Introduction to Artificial IntelligencesMeenakshi Paul
 
computer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial Intelligencecomputer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial IntelligenceKhanKhaja1
 
Is Computer Science Science?
Is Computer Science Science?Is Computer Science Science?
Is Computer Science Science?Daniel Cukier
 
Foundations of Intelligence Agents
Foundations of Intelligence AgentsFoundations of Intelligence Agents
Foundations of Intelligence Agentsmahutte
 

Ähnlich wie Tutorial 3 - Research methods - Part 1 (20)

Figuring out Computer Science
Figuring out Computer ScienceFiguring out Computer Science
Figuring out Computer Science
 
Scientific software engineering methods and their validity
Scientific software engineering methods and their validityScientific software engineering methods and their validity
Scientific software engineering methods and their validity
 
Grid07 7 Gagliardi
Grid07 7 GagliardiGrid07 7 Gagliardi
Grid07 7 Gagliardi
 
Empirical Software Engineering - What is it and why do we need it?
Empirical Software Engineering - What is it and why do we need it?Empirical Software Engineering - What is it and why do we need it?
Empirical Software Engineering - What is it and why do we need it?
 
science _engineering__and_the_liberal_arts
science _engineering__and_the_liberal_artsscience _engineering__and_the_liberal_arts
science _engineering__and_the_liberal_arts
 
Informatics is a natural science
Informatics is a natural scienceInformatics is a natural science
Informatics is a natural science
 
Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learning
 
2005: Natural Computing - Concepts and Applications
2005: Natural Computing - Concepts and Applications2005: Natural Computing - Concepts and Applications
2005: Natural Computing - Concepts and Applications
 
The Epistemology of Software Engineering
The Epistemology of Software EngineeringThe Epistemology of Software Engineering
The Epistemology of Software Engineering
 
4AI_N1.pdf
4AI_N1.pdf4AI_N1.pdf
4AI_N1.pdf
 
Research in Computer Science and Engineering
Research in Computer Science and EngineeringResearch in Computer Science and Engineering
Research in Computer Science and Engineering
 
ppt_ids-data science.pdf
ppt_ids-data science.pdfppt_ids-data science.pdf
ppt_ids-data science.pdf
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
Introduction to Artificial Intelligences
Introduction to Artificial IntelligencesIntroduction to Artificial Intelligences
Introduction to Artificial Intelligences
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
computer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial Intelligencecomputer science engineering spe ialized in artificial Intelligence
computer science engineering spe ialized in artificial Intelligence
 
ai.ppt
ai.pptai.ppt
ai.ppt
 
Is Computer Science Science?
Is Computer Science Science?Is Computer Science Science?
Is Computer Science Science?
 
Foundations of Intelligence Agents
Foundations of Intelligence AgentsFoundations of Intelligence Agents
Foundations of Intelligence Agents
 

Kürzlich hochgeladen

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Kürzlich hochgeladen (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Tutorial 3 - Research methods - Part 1

  • 1. Research Methods in Computer Science (Serge Demeyer — University of Antwerp) AnSyMo Antwerp Systems and software Modelling http://ansymo.ua.ac.be/ Universiteit Antwerpen
  • 2. Helicopter View (Ph.D.) Research How to perform research ? How to write research ? (and get “empirical” results) (and get papers accepted) How many of you have done / will do a case-study ? 1. Research Methods 2
  • 3. Zürich Kunsthaus Antwerp Middelheim
  • 4. 1. Research Methods Introduction • Origins of Computer Science • Research Philosophy Research Methods • 1. Feasibility study • 2. Pilot Case • 3. Comparative study • 4. Observational Study [a.k.a. Etnography] • 5. Literature survey • 6. Formal Model • 7. Simulation Conclusion • Studying a Case vs. Performing a Case Study + Proposition + Unit of Analysis + Threats to Validity 1. Research Methods 4
  • 5. What is (Ph.d.) Research ? http://gizmodo.com/5613794/what-is-exactly-a-doctorate Human Elementary Knowledge School High School Bachelor Ph.D. Ph.D. Master (early stages) (finished) 1. Research Methods 5
  • 6. Computer Science All science is either physics or stamp collecting (E. Rutherford) We study artifacts produced by humans Computer science is no more about computers than astronomy is about telescopes. (E. Dijkstra) Computer science Computer engineering Informatics Software Engineering 1. Research Methods 6
  • 7. Science vs. Engineering Science Engineering Physics Civil Engineering ??? Chemistry Computer Electronics Science Biology ??? Software Chemistry and Materials Mathematics Engineering ??? Geography Electro-Mechanical Engineering 1. Research Methods 7
  • 8. Mathematical Origins Turing Machines • Halting problem (inductive) Reasoning • logical argumentation Algorithmic Complexity + formal models, • P = ? NP theorem proving, … + axioms & lemma’s Compilers + foo, bar type of examples • Chomsky hierarchy • “deep” and generic universal knowledge Databases • Relational model Gödel theorem: consistency of the system is not provable in the system. A complete and consistent set of axioms for all of mathematics is impossible 1. Research Methods 8
  • 9. Engineering Origins Computer Engineering Empirical Approach • Moore’s law: “the number of • Tom De Marco: “you cannot transistors on a chip will double control what you cannot about every two years” measure” + Self-fulfilling prophesy + quantify • Hardware technology + mathematical model + RISC vs. CISC • Pareto principle + MPSoC + 80 % - 20 % rule • Compiler optimization (80% of the effects come + peephole optimization from 20% of the causes) + branch prediction As good as your next observation. Premise: The sun has risen in the east every morning up until now. Conclusion: The sun will also rise in the east tomorrow. … Or Not ? 1. Research Methods 9
  • 10. Influence of Society Lives are at stake (e.g., automatic pilot, nuclear power plants) Huge amounts of money are at stake (e.g., Ariane V crash, Denver Airport Baggage) Software became Ubiquitous … its not a hobby anymore Corporate success or failure is at stake (e.g., telephone billing, VTM launching 2nd channel) 1. Research Methods 10
  • 11. Interdisciplinary Nature “Hard” Science Engineering Sciences Computer Science Action Research “Soft” Economics Sociology Sciences Psychology 1. Research Methods 11
  • 12. The Oak Forest Robert Zünd - 1882
  • 13. Franz and Luciano Franz Gertsch - 1973
  • 14. Objective Subjective • Plato’s cave • Scientific Paradigm (Kuhn) + Dominant paradigm / Competing paradigms / Paradigm shift ➡ Normal science vs. Revolutionary science 1. Research Methods 14
  • 15. Dominant view on Research Methods Physics Medicine (“The” Scientific method) (Double-blind treatment) • form hypothesis about a • form hypothesis about a phenomenon treatment • design experiment • select experimental and control • collect data groups that are comparable • compare data to hypothesis except for the treatment • accept or reject hypothesis • collect data + … publish (in Nature) • commit statistics on the data • get someone else to repeat • treatment difference experiment (replication) (statistically significant) Cannot answer the “big” questions … in timely fashion •smoking is unhealthy •climate change •darwin theory vs. intelligent design •… •agile methods 1. Research Methods 15
  • 16. Experiment principles source: C. Wohlin, P. Runeson, M. Höst, M. Ohlsson, B. Regnell, and A. Wesslén. Experimentation in Software Engineering - An Introduction. Kluwer Academic Publishers, 2000 “Bo Experiment objective • To ring o m to r res ea THEORY ear uch f d” s ch ocu yn pro s o drom ced n p e ure rop cause-effect er construct Cause Effect construct construct OBSERVATION treatment- outcome construct Treatment Outcome Independent variable Dependent variable Experiment operation 1. Research Methods 16
  • 17. !"""!"#$%&'()*+%$,(-&./%$(01(23456" Research Methods in Computer Science Different Sources Static analysis • Marvin V. Zelkowitz and Dolores R. Lesso ns learned Wallace, "Experimental Models for Legacy data Validating Technology", IEEE Literat ure search Computer, May 1998. Field st u dy Validation method Assertio n Case st u dy • Easterbrook, S. M., Singer, J., Storey, Project mo nit orin g M, and Damian, D. Selecting Empirical Simulatio n 1995 (152 papers) Methods for Software Engineering Dynamic analysis 1990 (217 papers) 1985 (243 papers) Research. Appears in F. Shull and J. Syn t hetic Singer (eds) "Guide to Advanced Replicated Empirical Software Engineering", No experimen tatio n " Springer, 2007. 0371(81"#$%&'"()*+,-$.&,/"0&+1"/23+$%&'/"*.,"1&-14&-1+,5"&6"$.*6-,78" 0 5 10 15 20 25 30 35 40 • Gordona Dodif-Crnkovic, “Scientific " Percen tage o f papers Methods in Computer Science” lection method that conforms to any one of the 12 validate the claims in the paper. For completeness we • Andreas Höfer, Walter F. Tichy, Status given data collection methods. Our 12 methods are not the only ways to classify added the following two classifications: of Empirical Research in Software data collection, although we believe they are the most 1. Not applicable. Some papers did not address some comprehensive. For example, Victor Basili6 calls an new technology, so the concept of data collection does Engineering, Empirical Software experiment in vivo when it is run at a development loca- not apply. For example, a paper summarizing a recent tion and in vitro when it is run in an isolated, controlled Engineering Issues, p. 10-19, conference or workshop wouldn’t be applicable. setting. According to Basili, a project may involve one 2. No experiment. Some papers describing a new Springer, 2007. team of developers or multiple teams, and an experi- ment may involve one project or multiple projects. This technology contained no experimental validations. variability permits eight different experiment classifi- In our survey, we were interested in the data col- cations. On the other hand, Barbara Kitchenham7 con- lection methods employed by the authors of the papers siders nine classifications of experiments divided into in order to determine our classification scheme’s com- three general categories: a quantitative experiment to prehensiveness. We tried to distinguish between data identify measurable benefits of using a method or tool, used as a demonstration of concept (which may a qualitative experiment to assess the features provided involve some measurements as a “proof of concept,” by a method or tool, and a benchmarking experiment but not a full validation of the method) and a true to determine performance. attempt at validation of their results. As in the study by Walter Tichy,8 we considered a 1. Research Methods MODEL VALIDATION To test whether the classification presented here 17 demonstration of technology via example as part of the analytical phase. The paper had to go beyond that "
  • 18. Case studies - Spectrum case studies are widely used in computer science 7. Simulation “studying a case” vs. “doing a case study” • what if ? 6. Formal Model • underlying concepts ? 5. Literature survey • what is known/unknown ? 4. Observational Study • What is “it” ? 3. Comparative study • is it better ? 2. Pilot Case, Demonstrator • is it appropriate ? 1. Feasibility study • is it possible ? Source: Personal experience (Guidelines for Master Thesis Research – University of Antwerp) 1. Research Methods 18
  • 19. The sixteenth of september Rene Margritte
  • 20. Feasibility Study Here is a new idea, is it possible ? ➡ Metaphor: Christopher Columbus and western route to India • Is it possible to solve a specific kind of problem … effectively ? + computer science perspective (P = NP, Turing test, …) + engineering perspective (build efficiently; fast — small) + economic perspective (cost effective; profitable) • Is the technique new / novel / innovative ? + compare against alternatives ➡ See literature survey; comparative study • Proof by construction + build a prototype + often by applying on a “CASE” • Conclusions + primarily qualitative; "lessons learned" + quantitative - economic perspective: cost - benefit - engineering perspective: speed - memory footprint 1. Research Methods 20
  • 22. Pilot Case (a.k.a. Demonstrator) Here is an idea that has proven valuable; does it work for us ? ➡ Metaphor: Portugal (Amerigo Vespucci) explores western route • proven valuable + accepted merits (e.g. “lessons learned” from feasibility study) + there is some (implicit) theory explaining why the idea has merit • does it work for us + context is very important • Demonstrated on a simple yet representative “CASE” + “Pilot case” ≠ “Pilot Study” • Proof by construction + build a prototype + apply on a “case” • Conclusions + primarily qualitative; "lessons learned" + quantitative; preferably with predefined criteria ➡ compare to context before applying the idea !! 1. Research Methods 22
  • 23. Walking man Standing Figure – Alberto Giacometti
  • 24. Comparative Study Here are two techniques, which one is better ? • for a given purpose ! + (Not necessarily absolute ranking) • Where are the differences ? What are the tradeoffs ? • Criteria check-list + predefined - should not favor one technique + qualitative and quantitative - qualitative: how to remain unbiased ? - quantitative: represent what you want to know ? + Criteria check-list should be complete and reusable ! ➡ If done well, most important contribution (replication !) ➡ See literature survey • Score criteria check-list + Often by applying the technique on a “CASE” • Compare + typically in the form of a table 1. Research Methods 24
  • 25.
  • 26. Observational Study [Ethnography] Understand phenomena through observations ➡ Metaphor: Diane Fossey “Gorillas in the Mist” • systematic collection of data derived from direct observation of the everyday life + phenomena is best understood in the fullest possible context ➡ observation & participation ➡ interviews & questionnaires • Observing a series of cases “CASE” + observation vs. participation ? • example: Action Research + Action research is carried out by people who usually recognize a problem or limitation in their workplace situation and, together, devise a plan to counteract the problem, implement the plan, observe what happens, reflect on these outcomes, revise the plan, implement it, reflect, revise and so on. • Conclusions + primarily qualitative: classifications/observations/… 1. Research Methods 26
  • 27. Torben Giehler Paul Klee Matterhorn Niesen
  • 28. Literature Survey What is known ? What questions are still open ? • source: B. A. Kitchenham, “Procedures for Performing Systematic Reviews”, Keele University Technical Report EBSE-2007-01, 2007 Systematic • “comprehensive” ➡ precise research question is prerequisite + defined search strategy (rigor, completeness, replication) + clearly defined scope - criteria for inclusion and exclusion + specify information to be obtained - the “CASES” are the selected papers • outcome is organized classification taxonomy conceptual model table tree frequency 1. Research Methods 28
  • 29. Literature survey - example IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 0, NO. Survey of Program Comprehension through Dynamic Analysis Cornelissen et al. - An Systematic 0, JANUARY 2000 5 !"#$%&'()#)*%"$+ ;!#1$'#0!#"$+1'- Source <#"$+1'-0*#,. $)$"$!10!#"$+1' Bas Cornelissen, Andy Zaidman, Arie van #'1'9!)"09')&'- 7)$"$!10-'1'+"$,) #'*'#')+'0+4'+3$)5 :$)!10-'1'+"$,) =&1>0?@@0A0=&)'0BCD -'1'+"$,) Deursen, Leon Moonen, Rainer Koschke. A Systematic Survey of Program <#"$+1'-0*#,. Comprehension through Dynamic Analysis ,"4'#09')&'- !"#$%&'()'&'%#$*+ IEEE Transactions on Software Engineering (TSE): 35(5): 684-702, 2009. <""#$%&"' !""#$%&"' 7)$"$!1 !""#$%&"' *#!.'2,#3 5')'#!1$/!"$,) !""#$%&"'- $(')"$*$+!"$,) !##"$,-#'($.'+#$/$%0#$*+ !"#$%&',(("-+.)+% 89'#9$'20,* !""#$%&"' -&..!#$/!"$,)0,* +4!#!+"'#$/'( !--$5).')" -$.$1!#02,#3 !#"$+1'- !"#$%&'(%10"0%#'"$20#$*+ Cornelissen et al. - An Systematic Survey of Program Comprehension through Dynamic Analysis IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 0, NO. 0, JANUARY 2000 22 E'+,..')(!"$,)- $)"'#6#'"!"$,) 3+#'"4"'#0#$*+ &** Fig. 1. Overview of the systematic survey process. )* (# (& !" #% #( !* ## vague, as it does not specify which properties are analyzed. To allow the definition to serve in "# multiple problem domains, the exact properties under analysis are left open. ') '! "* $% $% $% '* $% While the definition of dynamic analysis is rather abstract, we can elaborate upon the benefits $# $# $" $' &% &% and limitations of using dynamic analysis in program comprehension contexts. The benefits that $* &' && &$ && &* % &* we consider are: ( ( # # # ! • The preciseness with regard to the actual*behavior of the software system, for example, in 46>6 10 1=3 - ? ;- : 5-3 234 ? > A/ 8 0- / 8- 2-/ 1/ - 62,8 </ 31 7432 2 /7 > 5; 3 C@ 32? F> 692 34 +,/ <-6> 5,46- >B D7 +>B 4- 0 < 0 ,; @3 2,;/ 72, / ,10 4,C - 2-0 - ,1< /:1 C- > ,34 /4,; / 2;9 C +,: the context of object-oriented software software with its late binding mechanism. . 2+- 3; - 78 .- ; ; : 1 2,1 @2- 636,+ 4,1 76- 6< C@ /636 2 62 3 /6, E1 7/62 -3 23; , 1- 97 1=1: <7 < 7C + ,- -2? 67 ,1 6 :1 93 32, > A3 66> 6, 2>=/ + ,/ C3 The fact that a goal-oriented strategy can be used, which entails the definition of an :. /,0 9- @2: • < ,/ C7 D7 ;: <- 1. Research Methods only the parts of interest of the software system are analyzed. 29 ;: 71 execution scenario such that
  • 30. Klee Bergbahn Vojin Bakic Bull
  • 31. Formal Model How can we understand/explain the world ? • make a mathematical abstraction of a certain problem + analytical model, stochastic model, logical model, re-write system, ... + often explained using a “CASE” • prove some important characteristics + based on inductive reasoning, axioms & lemma’s, … Motivate • which factors are irrelevant (excluded) and which are not (included) ? • which properties are worthwhile (proven) ? ➡ See literature survey Problem Problem Properties ? Mathematical Abstraction Properties 1. Research Methods 31
  • 32. Hodler Eiger, Mönch and Jungfrau in the Morning Sun Seurat Eiffel Tower
  • 33. Simulation What would happen if … ? • study circumstances of phenomena in detail + simulated because real world too expensive; too slow or impossible • make prognoses about what can happen in certain situations + test using real observations, typically obtained via a “CASE” Motivate • which circumstances are irrelevant (excluded) and which are not (included) ? • which properties are worthwhile (to be observed/predicted) ? ➡ See literature survey Examples • distributed systems (grid); network protocols + too expensive or too slow to test in real life • embedded systems — simulating hardware platforms + impossible to observe real clock-speed / memory footprint / … ➡ Heisenberg uncertainty principle 1. Research Methods 33
  • 34. Case studies - Revisited 7. Simulation: test prognoses with real case studies are widely used in computer science observations obtained “studying a case” vs. “doing a case study” via a “CASE” 6. Formal Model often explained using a “CASE” 5. Literature survey “CASES” = selected papers 4. Observational Study Observing a series of “CASES” 3. Comparative study Score criteria check-list; often by applying on a “CASE” 2. Pilot Case, Demonstrator Demonstrated on a simple yet representative “CASE” 1. Feasibility study Proof by construction; often by applying on a “CASE” 1. Research Methods 34
  • 35. Case Study Research Introduction • Origins of Computer Science • Research Philosophy Research Methods • 1. Feasibility study • 2. Pilot Case • 3. Comparative study • 4. Observational Study [a.k.a. Etnography] • 5. Literature survey Sources • 6. Formal Model • Robert K. Yin. Case Study Research: • 7. Simulation Design and Methods. 3rd Edition. SAGE Publications. California, 2009. Conclusion • Bent Flyvbjerg, "Five • Studying a Case Misunderstandings About Case Study vs. Performing a Case Study Research." Qualitative Inquiry, vol. 12, + Proposition no. 2, April 2006, pp. 219-245. • Runeson, P. and Höst, M. 2009. + Unit of Analysis Guidelines for conducting and reporting + Threats to Validity case study research in software engineering. Empirical Softw. Eng. 14, 2 (Apr. 2009), 131-164. 1. Research Methods 35
  • 36. Spectrum of cases created for explanation • foo, bar examples Toy-example • simple model; illustrates differences Martin S. Feather , Stephen Fickas , accepted teaching vehicle Anthony Finkelstein , Axel Van • “textbook example” Exemplar Lamsweerde, Requirements and • simple but illustrates Specification Exemplars, Automated Software Engineering, v.4 n.4, p. relevant issues 419-438, October 1997 real-life example Case study Runeson, P. and Höst, M. 2009. Guidelines for conducting and reporting case study research in software • industrial system, Case engineering. Empirical Softw. Eng. 14, open-source system 2 (Apr. 2009), 131-164. • context is difficult to grasp Mining Software Repositories Challenge. competition (tool oriented) [Yearly workshop where research tools compete against one another on a common predefined • approved by community Community case case.] • comparing Susan Elliott Sim, Steve Easterbrook, and Richard C. Holt. Using benchmark Benchmarking to Advance Research: A Challenge to Software Engineering, Proceedings of the Twenty-fifth International Benchmark • approved by community Conference on Software Engineering, Portland, Oregon, pp. • known context 74-83, 3-10 May, 2003. • “planted” issues 1. Research Methods 36
  • 37. Case study — definition A case study is an empirical inquiry that investigates a contemporary phenomenon within its real-life context, especially when the boundaries between the phenomenon and context are not clearly evident [Robert K. Yin. Case Study Research: Design and Methods; p. 13] • empirical inquiry: yes, it is empirical research • contemporary: (close to) real-time observations + incl. interviews • boundaries between the phenomenon and context not clear + as opposed to “experiment” Context Treatment Outcome Phenomenon Experiment Case Study 1. Research Methods 37
  • 38. Case Study — Counter evidence Context Phenomenon - many more variables than data points - multiple sources of evidence; triangulation - theoretical propositions guide data collection (try to confirm or refute propositions with well-selected cases) Case studies also look for counter evidence 1. Research Methods 38
  • 39. Misunderstanding 2: Generalization One cannot generalize on the basis of an individual case; therefore the case study cannot contribute to scientific development. ➡ [Bent Flyvbjerg, "Five Misunderstandings About Case Study Research."] • Understanding + The power of examples + Formal generalization is overvalued - dominant research views of physics and medicine • Counterexamples + one black swan falsifies “all swans are white” - case studies generate deep understanding; what appears to be white often turns out to be black • sampling logic vs. replication logic + sampling logic: operational enumeration of entire universe - use statistics: generalize from “randomly selected” observations + replication logic: careful selection of boundary values - use logic reasoning: presence of absence of property has effect 1. Research Methods 39
  • 40. Sampling Logic vs. Replication Logic Boundary value Selection of (boundary) value Random selection understand differences generalize for entire population • propositions • units of analysis 1. Research Methods 40
  • 41. Research questions for Case Studies Existence: Exploratory Relationship Explanatory • Does X exist? • Are X and Y related? • Do occurrences of X correlate with Description & Classification occurrences of Y? • What is X like? • What are its properties? Causality • How can it be categorized? • What causes X? • How can we measure it? • What effect does X have on Y? • What are its components? • Does X cause Y? • Does X prevent Y? Descriptive-Comparative • How does X differ from Y? Causality-Comparative • Does X cause more Y than does Z? Frequency and Distribution • Is X better at preventing Y than is Z? • How often does X occur? • Does X cause more Y than does Z • What is an average amount of X? under one condition but not others? Descriptive-Process Design • How does X normally work? • What is an effective way to achieve • By what process does X happen? X? • What are the steps as X evolves? • How can we improve X? Source: Empirical Research Methods in Requirements Engineering. Tutorial given at RE'07, New Delhi, India, Oct 2007. 1. Research Methods 41
  • 42. Proposition (a.k.a. Purpose) Where to expect boundaries ? Thorough preparation is necessary ! You need an explicit theory. Boundary value Exploratory Confirmatory Confirmatory case studies are used to test existing Exploratory case studies are used as initial theories. The latter are especially important for investigations of some phenomena to derive new refuting theories: a detailed case study of a real hypotheses and build theories.(*) situation in which a theory fails may be more convincing than failed experiments in the lab.(*) (*) Steve Easterbrook, Janice Singer, Margaret-Anne Storey, and Daniela Damian. Selecting empirical methods for soft- ware engineering research. In Forrest Shull, Janice Singer, and Dag I. K. Sjoberg, editors, Guide to Advanced Empirical Software Engineering, pages 285—311. Springer London, 2008. 1. Research Methods 42
  • 43. Units of Analysis What phenomena to analyze • depends on research questions • affects data collection & interpretation • affects generalizability Example: Clone Detection, Bug Prediction • the tool/algorithm Possibilities Does it work ? • individual developer • the individual developer • a team How/why does he produce bugs/clones ? • a decision • about the culture/process in the team • a process How does the team prevent bugs/clones ? How successful is this prevention ? • a programming language • about the programming language • a tool How vulnerable is the programming language towards clones / bugs ? Design in advance (COBOL vs. AspectJ) • avoid “easy” units of analysis + cases restricted to Java because parser - Is the language really an issue for your research question ? + report size of the system (KLOC, # Classes, # Bug reports) - Is team composition not more important ? 1. Research Methods 43
  • 44. Threats to Validity (Experiments) Experiment objective THEORY cause- effect construct Cause Effect construct construct 4 OBSERVATION 3 3 treatment- outcome construct Treatment Outcome 1 2 Independent variable Dependent variable Experiment operation 1. Conclusion validity 2. Internal validity 3. Construct validity 4. External validity 1. Research Methods 44
  • 45. Threats to validity (Case Studies) • Source: Runeson, P. and Höst, M. 2009. Guidelines for conducting and reporting case study research in software engineering. 1. Construct validity • Do the operational measures reflect what the researcher had in mind ? 2. Internal validity • Are there any other factors that may affect the results ? ➡ Mainly when investigating causality ! 3. External validity • To what extent can the findings be generalized ? ➡ Precise research question & units of analysis required 4. Reliability • To what extent is the data and the analysis dependent on the researcher (the instruments, …) Other categories have been proposed as well • credibility, transferability, dependability, confirmability 1. Research Methods 45
  • 46. Threats to validity — Examples (1/2) 1. Construct validity • Do the operational measures reflect what the researcher had in mind ? • Time recorded vs. time spent • Execution time, memory consumption, … + noise of operating system, sampling method • Human-assigned classifiers (bug severity, …) + risk for “default” values • Participants in interviews have pressure to answer positively 2. Internal validity • Are there any other factors that may affect the results ? • Were phenomena observed under special conditions + in the lab, close to a deadline, company risked bankruptcy, … + major turnover in team, contributors changed (open-source), … • Similar observations repeated over time (learning effects) 1. Research Methods 46
  • 47. Threats to validity — Examples (2/2) 3. External validity • To what extent can the findings be generalized ? • Does it apply to other languages ? other sizes ? other domains ? • Background & education of participants • Simplicity & scale of the team + small teams & flexible roles vs. large organizations & fixed roles 4. Reliability • To what extent is the data and the analysis dependent on the researcher (the instruments, …) • How did you cope with bugs in the tool, the instrument ? • Classification: if others were to classify, would they obtain the same ? • How did you search for evidence in mailing archives, bug reports, … 1. Research Methods 47
  • 48. Threats to validity = Risk Management No experimental design can be “perfect” … but you can limit the chance of deriving false conclusions • manage the risk of false conclusions as much as possible + likelihood + impact • state clearly what and how you alleviated the risk (replication !) + construct validity - precise metric definitions - GQM paradigm + internal & external validity - report the context consciously + Reliability - bugs in tools: testing, usage of well-known libraries, … - classification: develop guidelines & others repeat classification - search for evidence (mailing archives, bug reports, …): have an explicit search procedure 1. Research Methods 48
  • 49. 1. Research Methods Introduction • Origins of Computer Science • Research Philosophy Research Methods • 1. Feasibility study • 2. Pilot Case • 3. Comparative study • 4. Observational Study [a.k.a. Etnography] • 5. Literature survey • 6. Formal Model • 7. Simulation Conclusion • Studying a Case vs. Performing a Case Study + Proposition + Unit of Analysis + Threats to Validity 1. Research Methods 49
  • 50. Studying a case vs. Performing a case study 1. Questions • most likely “How” and “Why”; also sometimes “What” 2. Propositions (a.k.a. Purpose) –––––––––Low hanging fruit––––––––– • explanatory: where to look for evidence • exploratory: rationale and direction + example: Christopher Columbus asks for sponsorship - Why three ships (not one, not five) ? - Why going westward (not south ?) • role of “Theories” + possible explanations (how, why) for certain phenomena ➡ Obtained through literature survey 3. Unit(s) of analysis • What is the case ? Threats to 4. Logic linking data to propositions validity + 5. Criteria for interpreting findings • Chain of evidence from multiple sources • When does data confirm proposition ? When does it refute ? 1. Research Methods 50