An eternal question of timing
Author: Andrey Karpov

Date: 30.03.2011

It seemed that the long forum debates about how to measure an algorithm's running time, which functions to use, and what precision to expect were over. Unfortunately, we have to return to this question once again. Today we will discuss how to measure the speed of a parallel algorithm.

I want to say right away that I will not give you a concrete recipe. I have faced the issue of measuring the speed of parallel algorithms only recently, so I am not an expert in this question. This post is therefore more of a research note. I would appreciate it if you shared your opinions and recommendations with me. I think together we will manage the problem and work out an optimal solution.

The task is to measure the running time of a fragment of user code. Earlier, I would have used the following class to solve it:

class Timing {
public:
    void StartTiming();
    void StopTiming();
    double GetUserSeconds() const {
        // FILETIME values are expressed in 100-nanosecond ticks.
        return double(m_userTime) / 10000000.0;
    }
private:
    __int64 GetUserTime() const;
    __int64 m_userTime;
};



__int64 Timing::GetUserTime() const {
    FILETIME creationTime;
    FILETIME exitTime;
    FILETIME kernelTime;
    FILETIME userTime;
    GetThreadTimes(GetCurrentThread(),
                   &creationTime, &exitTime,
                   &kernelTime, &userTime);
    __int64 curTime;
    curTime = userTime.dwHighDateTime;
    curTime <<= 32;
    curTime += userTime.dwLowDateTime;
    return curTime;
}



void Timing::StartTiming() {
    m_userTime = GetUserTime();
}

void Timing::StopTiming() {
    __int64 curUserTime = GetUserTime();
    m_userTime = curUserTime - m_userTime;
}

This class is based on the GetThreadTimes function, which allows you to separate the running time of user code from the running time of system functions. The class is intended to estimate the running time of a thread in user mode, so we use only the returned lpUserTime parameter.
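
As a side note, GetThreadTimes also reports the thread's kernel-mode time through lpKernelTime. The following is a minimal sketch of my own (not part of the original class) that reads both values and converts the FILETIME structures through ULARGE_INTEGER instead of manual shifting; the result is the same count of 100-nanosecond ticks:

#include <windows.h>

// Sketch: read both the user-mode and the kernel-mode time of the current thread.
// Both values are counts of 100-nanosecond ticks, just like in Timing::GetUserTime().
bool GetThreadUserAndKernelTime(__int64 &userTicks, __int64 &kernelTicks)
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    if (!GetThreadTimes(GetCurrentThread(),
                        &creationTime, &exitTime,
                        &kernelTime, &userTime))
        return false;

    ULARGE_INTEGER user, kernel;
    user.LowPart    = userTime.dwLowDateTime;
    user.HighPart   = userTime.dwHighDateTime;
    kernel.LowPart  = kernelTime.dwLowDateTime;
    kernel.HighPart = kernelTime.dwHighDateTime;

    userTicks   = (__int64)user.QuadPart;    // time spent in user mode
    kernelTicks = (__int64)kernel.QuadPart;  // time spent in kernel mode
    return true;
}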

Now consider a code sample that calculates some number. We will use the Timing class to measure its running time.

void UseTiming1()
{
    Timing t;
    t.StartTiming();
    unsigned sum = 0;

    for (int i = 0; i < 1000000; i++)
    {
        char str[1000];
        for (int j = 0; j < 999; j++)
            str[j] = char(((i + j) % 254) + 1);
        str[999] = 0;
        for (char c = 'a'; c <= 'z'; c++)
            if (strchr(str, c) != NULL)
                sum += 1;
    }

    t.StopTiming();
    printf("sum = %u\n", sum);
    printf("%.3G seconds.\n", t.GetUserSeconds());
}

Presented in this form, the timing mechanism behaves as expected and gives, say, 7 seconds on my machine. The result is correct even on a multi-core machine, since it does not matter which cores are used while the algorithm is running (see Figure 1).




Figure 1 - Work of one thread on a multi-core computer

Now imagine that we want to use the capabilities of multi-core processors in our program and estimate the benefit we get from parallelizing the algorithm with the OpenMP technology. Let's parallelize our code by adding one line:

#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 1000000; i++)
{
    char str[1000];
    for (int j = 0; j < 999; j++)
        str[j] = char(((i + j) % 254) + 1);
    str[999] = 0;
    for (char c = 'a'; c <= 'z'; c++)
        if (strchr(str, c) != NULL)
            sum += 1;
}

The program now prints a running time of 1.6 seconds. Since we use a 4-core computer, I feel like saying: "Hurray! We've got a 4-fold speed-up, and the timing confirms it."

But in fact it is not so good: we are not measuring the running time of the algorithm. We are measuring the running time of the main thread instead. In this case the measurement seems reliable because the main thread was working for the same time as the secondary threads. Let's carry out a simple experiment and explicitly specify 10 threads instead of 4:

#pragma omp parallel for reduction(+:sum) num_threads(10)

Logic says that this code should run for about the same time as the code parallelized into 4 threads. We have a four-core processor, so a larger number of threads can only cause a slowdown. Instead, we will see a result of about 0.7 seconds.

This is an expected result, although we wanted to get something quite different. We created 10 threads, and each of them did about one tenth of the work: the roughly 7 seconds of total computation are now spread over 10 threads, so each thread, including the main one, accumulates about 7 / 10 = 0.7 seconds of user time. And it is precisely the main thread's running time that the Timing class measures. As you can see, this method cannot be used to measure the speed of programs with parallel code fragments. Let's make it clearer by presenting it graphically in Figure 2.




Figure 2 - This is how work of 10 threads might look on a four-core computer

Of course, we could use the time() function, but its resolution is low and it does not allow you to separate the running time of user code from that of system code. Other processes may also influence the result and significantly distort the timing.
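
Just for comparison, a time()-based measurement would look roughly like the sketch below (my own illustration, not code from the project); it measures wall-clock time with a resolution of whole seconds:

#include <time.h>
#include <stdio.h>

void MeasureWithTime()
{
    time_t start = time(NULL);
    // ... run the algorithm here ...
    time_t finish = time(NULL);
    // time() counts whole seconds of wall-clock time, so short runs round to
    // 0 or 1 second, and the activity of other processes is included as well.
    printf("%.0f seconds (wall clock).\n", difftime(finish, start));
}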
A favorite timing function of many developers is QueryPerformanceCounter. Let's measure the speed using this function. In its simple form, the timing class looks like this:

class Timing2 {
public:
    void StartTiming();
    void StopTiming();
    double GetUserSeconds() const {
        return value;
    }
private:
    double value;
    LARGE_INTEGER time1;
};

void Timing2::StartTiming()
{
    QueryPerformanceCounter(&time1);
}

void Timing2::StopTiming()
{
    LARGE_INTEGER performance_frequency, time2;
    QueryPerformanceFrequency(&performance_frequency);
    QueryPerformanceCounter(&time2);
    value = (double)(time2.QuadPart - time1.QuadPart);
    value /= performance_frequency.QuadPart;
}

Unfortunately, on a multi-core computer we cannot time things that simply anymore. :) Let's read the description of this function in MSDN:
On a multiprocessor computer, it should not matter which processor is called. However, you can get
different results on different processors due to bugs in the basic input/output system (BIOS) or the
hardware abstraction layer (HAL). To specify processor affinity for a thread, use the
SetThreadAffinityMask function.

Let's improve the code and tie the main thread to one core:

class Timing2 {
public:
    void StartTiming();
    void StopTiming();
    double GetUserSeconds() const {
        return value;
    }
private:
    DWORD_PTR oldmask;
    double value;
    LARGE_INTEGER time1;
};

void Timing2::StartTiming()
{
    volatile int warmingUp = 1;
    #pragma omp parallel for
    for (int i = 1; i < 10000000; i++)
    {
        #pragma omp atomic
        warmingUp *= i;
    }

    oldmask = SetThreadAffinityMask(::GetCurrentThread(), 1);

    QueryPerformanceCounter(&time1);
}

void Timing2::StopTiming()
{
    LARGE_INTEGER performance_frequency, time2;
    QueryPerformanceFrequency(&performance_frequency);
    QueryPerformanceCounter(&time2);
    SetThreadAffinityMask(::GetCurrentThread(), oldmask);

    value = (double)(time2.QuadPart - time1.QuadPart);
    value /= performance_frequency.QuadPart;
}

Readers might ask why we need this strange loop that does nothing. Contemporary processors reduce their frequency at low load. This loop raises the processor's speed to the maximum beforehand and therefore slightly increases the precision of the measurement. Additionally, we are warming up all the available cores.

The timing method shown here has the same drawback: we cannot separate the running time of user code from that of system code. If other tasks are running on the core at the same time, the result may also be rather inaccurate. But it seems to me that, unlike GetThreadTimes, this method can still be applied to a parallel algorithm.
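
As a usage illustration, both timers can be wrapped around the same parallel loop. The sketch below follows the original sample; the wrapper function and its threadCount parameter are mine:

#include <stdio.h>
#include <string.h>

void UseTiming2(int threadCount)
{
    Timing  t1;   // GetThreadTimes-based timer
    Timing2 t2;   // QueryPerformanceCounter-based timer
    t2.StartTiming();   // also performs the warm-up and sets the affinity
    t1.StartTiming();

    unsigned sum = 0;
    #pragma omp parallel for reduction(+:sum) num_threads(threadCount)
    for (int i = 0; i < 1000000; i++)
    {
        char str[1000];
        for (int j = 0; j < 999; j++)
            str[j] = char(((i + j) % 254) + 1);
        str[999] = 0;
        for (char c = 'a'; c <= 'z'; c++)
            if (strchr(str, c) != NULL)
                sum += 1;
    }

    t1.StopTiming();
    t2.StopTiming();
    printf("sum = %u\n", sum);
    printf("GetThreadTimes:          %.3G seconds.\n", t1.GetUserSeconds());
    printf("QueryPerformanceCounter: %.3G seconds.\n", t2.GetUserSeconds());
}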

Let's measure the results of the Timing and Timing2 classes at various numbers of threads. For this purpose, the OpenMP directive num_threads(N) is used. Let's arrange the data in the table shown in Figure 3.




Figure 3 - Algorithm's running time in seconds measured with functions GetThreadTimes and
QueryPerformanceCounter on a four-core machine

As you can see, as long as the number of threads does not exceed the number of cores, the GetThreadTimes function gives a result similar to that of QueryPerformanceCounter, which may make you think the measurement is correct. But as soon as there are more threads, you cannot rely on its result.
Unfortunately, the program prints varying values from launch to launch. I do not know how to make the measurement more accurate and correct. So I am waiting for your feedback and your methods of correctly timing parallel algorithms.

You may download the program text here (a project for Visual Studio 2005).
