SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Downloaden Sie, um offline zu lesen
Difference of code analysis approaches in
compilers and specialized tools
Author: Andrey Karpov

Date: 01.11.2010

Compilers and third-party static code analyzers have one common task: to detect dangerous code
                        party
fragments. However, there is a great difference in the types of analysis performed by each kind of these
tools. I will try to show you the differences between these two approaches (and explain their source) by
the example of the Intel C++ compiler and PVS
                           l                 PVS-Studio analyzer.

This time, it is the Notepad++ 5.8.2 project that we chose for the test
                                                                   test.




Notepad++
At first a couple of words about the project we have chosen Notepad++ is an open-source and free
                                                         chosen.
source code editor that supports many languages and appears a substitute for the standard Notepad. It
works in the Microsoft Windows environment and is released under the GPL license What I liked about
                                                                                    license.
this project is that it is written in C++ and has a small size - just 73000 lines of code. But what is the most
important, this is a rather accurate project - it is compiled by presence of the /W4 switch in the project
                                                                                                       project's
settings and /WX switch that makes analyzers treat each warning as an er      error.


Static analysis by compiler
Now let's study the analysis procedure from the viewpoints of a compiler and a separate specialized
         s
tool. The compiler is always inclined to generating warnings after processing only very small local code
fragments. This preference is a consequence of very strict performance requirements imposed on the
compiler. It is no coincidence that there exist tools of distributed project build. The time needed to
compile medium and large projects is a significant factor influencing the choice of development
methodology. So if developers can get a 5% performance gain out of the compiler, they will do it  it.

Such optimization makes the compiler solider and actually such steps as preprocessing, building AST and
code generation are not so distinct. For instance, I may say relying on some indirect signs that Visual C++
uses different preprocessor algorithms when compiling projects and generating preprocessed "*.i" files.
The compiler also does not need (it is even harmful for it) to store the whole AST. Once the code for
some particular nodes is generated and they are no more needed, they get destroyed right away. During
the compilation process, AST may never exist in the full form. There is simply no need for that - we parse
a small code fragment, generate the code and go further. This saves memory and cache and therefore
increases speed.
The result of this approach is "locality" of warnings. The compiler consciously saves on various
structures that could help it detect higher-level errors. Let's see in practice what local warnings Intel C++
will generate for the Notepad++ project. Let me remind you that the Notepad++ project is built with the
Visual C++ compiler without any warnings with the /W4 switch enabled. But the Intel C++ compiler
certainly has a different set of warnings and I also set a specific switch /W5 [Intel C++]. Moreover, I
would like to have a look at what the Intel C++ compiler calls "remark".

Let's see what kinds of messages we get from Intel C++. Here it found four similar errors where the
CharUpper function is being handled (SEE NOTE AT THE END). Note the "locality" of the diagnosis - the
compiler found just a very dangerous type conversion. Let's study the corresponding code fragment:

wchar_t *destStr = new wchar_t[len+1];

...

for (int j = 0 ; j < nbChar ; j++)

{

    if (Case == UPPERCASE)

      destStr[j] =

         (wchar_t)::CharUpperW((LPWSTR)destStr[j]);

    else

      destStr[j] =

         (wchar_t)::CharLowerW((LPWSTR)destStr[j]);

}

Here we see strange type conversions. The Intel C++ compiler warns us: "#810: conversion from
"LPWSTR={WCHAR={__wchar_t} *}" to "__wchar_t" may lose significant bits". Let's look at the
CharUpper function's prototype.

LPTSTR WINAPI CharUpper(

    __inout     LPTSTR lpsz

);

The function handles a string and not separate characters at all. But here a character is cast to a pointer
and some memory area is modified by this pointer. How horrible.

Well, actually this is the only horrible issue detected by Intel C++. All the rest are much more boring and
are rather inaccurate code than error-prone code. But let's study some other warnings too.

The compiler generated a lot of #1125 warnings:

"#1125: function "Window::init(HINSTANCE, HWND)" is hidden by "TabBarPlus::init" -- virtual function
override intended?"
These are not errors but just poor naming of functions. We are interested in this message for a different
reason: although it seems to involve several classes for the check, the compiler does not keep special
data - it must store diverse information about base classes anyway, that is why this diagnosis is
implemented.

The next sample. The message "#186: pointless comparison of unsigned integer with zero" is generated
for the meaningless comparisons:

static LRESULT CALLBACK hookProcMouse(

    UINT nCode, WPARAM wParam, LPARAM lParam)

{

    if(nCode < 0)

    {

        ...

        return 0;

    }

...

}

The "nCode < 0" condition is always false. It is a good example of good local diagnosis. You may easily
find an error this way.

Let's consider the last warning by Intel C++ and get finished with it. I think you have understood the
concept of "locality".

void ScintillaKeyMap::showCurrentSettings() {

    int i = ::SendDlgItemMessage(...);

    ...

    for (size_t i = 0 ; i < nrKeys ; i++)

    {

        ...

    }

}

Again we have no error here. It is just poor naming of variables. The "i" variable has the "int" type at
first. Then a new "i" variable of the "size_t" type is defined in the "for()" operator and is being used for
different purposes. At the moment when "size_t i" is defined, the compiler knows that there already
exists a variable with the same name and generates the warning. Again, it did not require the compiler
to store any additional data - it must remember anyway that the "int i" variable is available until the end
of the function's body.
Third-party static code analyzers
Now let's consider specialized static code analyzers. They do not have such severe speed restrictions
since they are launched ten times less frequently than compilers. The speed of their work might get tens
of times slower than code compilation but it is not crucial: for instance, the programmer may work with
the compiler at day and launch a static code analyzer at night to get a report about suspicious fragments
on the morning. It is quite a reasonable approach.

While paying with slow-down for their work, static code analyzers can store the whole code tree,
traverse it several times and store a lot of additional information. It lets them find "spreaded" and high-
level errors.

Let's see what the PVS-Studio static analyzer can find in Notepad++. Note that I am using a pilot version
that is not available for download yet. We will present the new free general-purpose rule set in 1-2
months within the scope of PVS-Studio 4.00.

Surely, the PVS-Studio analyzer finds errors that may be referred to "local" like in case of Intel C++. This
is the first sample:

bool _isPointXValid;

bool _isPointYValid;

bool isPointValid() {

    return _isPointXValid && _isPointXValid;

};

The PVS-Studio analyzer informs us: "V501: There are identical sub-expressions to the left and to the
right of the '&&' operator: _isPointXValid && _isPointXValid".

I think the error is clear to you and we will not dwell upon it. The diagnosis is "local" because it is
enough to analyze one expression to perform the check.

Here is one more local error causing incomplete clearing of the _iContMap array:

#define CONT_MAP_MAX 50

int _iContMap[CONT_MAP_MAX];

...

DockingManager::DockingManager()

{

    ...

    memset(_iContMap, -1, CONT_MAP_MAX);

    ...

}
Here we have the warning "V512: A call of the memset function will lead to a buffer overflow or
underflow". This is the correct code:

memset(_iContMap, -1, CONT_MAP_MAX * sizeof(int));

And now let's go over to more interesting issues. This is the code where we must analyze two branches
simultaneously to see that there is something wrong:

void TabBarPlus::drawItem(

    DRAWITEMSTRUCT *pDrawItemStruct)

{

    ...

    if (!_isVertical)

     Flags |= DT_BOTTOM;

    else

     Flags |= DT_BOTTOM;

    ...

}

PVS-Studio generates the message "V523: The 'then' statement is equivalent to the 'else' statement". If
we review the code nearby, we may conclude that the author intended to write this text:

if (!_isVertical)

    Flags |= DT_VCENTER;

else

    Flags |= DT_BOTTOM;

And now get brave to meet a trial represented by the following code fragment:

void KeyWordsStyleDialog::updateDlg()

{

    ...

    Style & w1Style =

     _pUserLang->_styleArray.getStyler(STYLE_WORD1_INDEX);

    styleUpdate(w1Style, _pFgColour[0], _pBgColour[0],

     IDC_KEYWORD1_FONT_COMBO, IDC_KEYWORD1_FONTSIZE_COMBO,

     IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK,

     IDC_KEYWORD1_UNDERLINE_CHECK);
Style & w2Style =

      _pUserLang->_styleArray.getStyler(STYLE_WORD2_INDEX);

    styleUpdate(w2Style, _pFgColour[1], _pBgColour[1],

      IDC_KEYWORD2_FONT_COMBO, IDC_KEYWORD2_FONTSIZE_COMBO,

      IDC_KEYWORD2_BOLD_CHECK, IDC_KEYWORD2_ITALIC_CHECK,

      IDC_KEYWORD2_UNDERLINE_CHECK);



    Style & w3Style =

      _pUserLang->_styleArray.getStyler(STYLE_WORD3_INDEX);

    styleUpdate(w3Style, _pFgColour[2], _pBgColour[2],

      IDC_KEYWORD3_FONT_COMBO, IDC_KEYWORD3_FONTSIZE_COMBO,

      IDC_KEYWORD3_BOLD_CHECK, IDC_KEYWORD3_BOLD_CHECK,

      IDC_KEYWORD3_UNDERLINE_CHECK);



    Style & w4Style =

      _pUserLang->_styleArray.getStyler(STYLE_WORD4_INDEX);

    styleUpdate(w4Style, _pFgColour[3], _pBgColour[3],

      IDC_KEYWORD4_FONT_COMBO, IDC_KEYWORD4_FONTSIZE_COMBO,

      IDC_KEYWORD4_BOLD_CHECK, IDC_KEYWORD4_ITALIC_CHECK,

      IDC_KEYWORD4_UNDERLINE_CHECK);

    ...

}

I can say that I am proud of our analyzer PVS-Studio that managed to find an error here. I think you have
hardly noticed it or just have skipped the whole fragment to see the explanation. Code review is almost
helpless before this code. But the static analyzer is patient and pedantic: "V525: The code containing the
collection of similar blocks. Check items '7', '7', '6', '7' in lines 576, 580, 584, 588".

I will abridge the text to point out the most interesting fragment:

styleUpdate(...

    IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK,

    ...);
styleUpdate(...

   IDC_KEYWORD2_BOLD_CHECK, IDC_KEYWORD2_ITALIC_CHECK,

   ...);

styleUpdate(...

   IDC_KEYWORD3_BOLD_CHECK, !!! IDC_KEYWORD3_BOLD_CHECK !!!,

   ...);

styleUpdate(...

   IDC_KEYWORD4_BOLD_CHECK, IDC_KEYWORD4_ITALIC_CHECK,

   ...);

This code was most likely written by the Copy-Paste method. As a result, it is
IDC_KEYWORD3_BOLD_CHECK which is used instead of IDC_KEYWORD3_ITALIC_CHECK. The warning
looks a bit strange reporting about numbers '7', '7', '6', '7'. Unfortunately, it cannot generate a clearer
message. These numbers arise from macros like these:

#define IDC_KEYWORD1_ITALIC_CHECK (IDC_KEYWORD1 + 7)

#define IDC_KEYWORD3_BOLD_CHECK (IDC_KEYWORD3 + 6)

The last cited sample is especially significant because it demonstrates that the PVS-Studio analyzer
processed a whole large code fragment simultaneously, detected repetitive structures in it and managed
to suspect something wrong relying on heuristic method. This is a very significant difference in the levels
of information processing performed by compilers and static analyzers.


Some figures
Let's touch upon one more consequence of "local" analysis performed by compilers and more global
analysis of specialized tools. In case of "local analysis", it is difficult to make it clear if some issue is really
dangerous or not. As a result, there are ten times more false alarms. Let me explain this by example.

When we analyzed the Notepad++ project, PVS-Studio generated only 10 warnings. 4 messages out of
them indicated real errors. The result is modest, but general-purpose analysis in PVS-Studio is only
beginning to develop. It will become one of the best in time.

When analyzing the Notepad++ project with the Intel C++ compiler, it generated 439 warnings and 3139
remarks. I do not know how many of them point to real errors, but I found strength to review some part
of these warnings and saw only 4 real issues related to CharUpper (see the above description).

3578 messages are too many for a close investigation of each of them. It turns out that the compiler
offers me to consider each 20-th line in the program (73000 / 3578 = 20). Well, come on, it's not serious.
When you are dealing with a general-purpose analyzer, you must cut off as much unnecessary stuff as
possible.

Those who tried the Viva64 rule set (included into PVS-Studio) may notice that it produces the same
huge amount of false alarms. But we have a different case there: we must detect all the suspicious type
conversions. It is more important not to miss an error than not to produce a false alarm. Besides, the
tool's settings provide a flexible filtering of false alarms.


UPDATE: Note
It turned out that I had written a wrong thing here. There is no error in the sample with CharUpperW
but nobody corrected me. I noticed it myself when I decided to implement a similar rule in PVS-Studio.

The point is that CharUpperW can handle both strings and individual characters. If the high-order part of
a pointer is zero, the pointer is considered a character and not pointer any more. Of course, the WIN API
interface in this place disappointed me by its poorness, but the code in Notepad++ is correct.

By the way, it turns out now that Intel C++ has not found any errors at all.

Weitere ähnliche Inhalte

Andere mochten auch

Debugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programsDebugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programsPVS-Studio
 
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ libraryInterview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ libraryPVS-Studio
 
VivaMP - a tool for OpenMP
VivaMP - a tool for OpenMPVivaMP - a tool for OpenMP
VivaMP - a tool for OpenMPPVS-Studio
 
PVS-Studio, a solution for developers of modern resource-intensive applications
PVS-Studio, a solution for developers of modern resource-intensive applicationsPVS-Studio, a solution for developers of modern resource-intensive applications
PVS-Studio, a solution for developers of modern resource-intensive applicationsPVS-Studio
 
PVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio
 
Building of systems of automatic C/C++ code logging
Building of systems of automatic C/C++ code loggingBuilding of systems of automatic C/C++ code logging
Building of systems of automatic C/C++ code loggingPVS-Studio
 
Parallel programs to multi-processor computers!
Parallel programs to multi-processor computers!Parallel programs to multi-processor computers!
Parallel programs to multi-processor computers!PVS-Studio
 
AMD64 (EM64T) architecture
AMD64 (EM64T) architectureAMD64 (EM64T) architecture
AMD64 (EM64T) architecturePVS-Studio
 
Installation of PC-Lint and its using in Visual Studio 2005
Installation of PC-Lint and its using in Visual Studio 2005Installation of PC-Lint and its using in Visual Studio 2005
Installation of PC-Lint and its using in Visual Studio 2005PVS-Studio
 
Knee-deep in C++ s... code
Knee-deep in C++ s... codeKnee-deep in C++ s... code
Knee-deep in C++ s... codePVS-Studio
 
Some examples of the 64-bit code errors
Some examples of the 64-bit code errorsSome examples of the 64-bit code errors
Some examples of the 64-bit code errorsPVS-Studio
 
Lesson 13. Pattern 5. Address arithmetic
Lesson 13. Pattern 5. Address arithmeticLesson 13. Pattern 5. Address arithmetic
Lesson 13. Pattern 5. Address arithmeticPVS-Studio
 
Optimization in the world of 64-bit errors
Optimization  in the world of 64-bit errorsOptimization  in the world of 64-bit errors
Optimization in the world of 64-bit errorsPVS-Studio
 
Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)
Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)
Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)PVS-Studio
 
Viva64: What Is It, and Who Is It for?
Viva64: What Is It, and Who Is It for?Viva64: What Is It, and Who Is It for?
Viva64: What Is It, and Who Is It for?PVS-Studio
 
Static code analysis for verification of the 64-bit applications
Static code analysis for verification of the 64-bit applicationsStatic code analysis for verification of the 64-bit applications
Static code analysis for verification of the 64-bit applicationsPVS-Studio
 
Problems of testing 64-bit applications
Problems of testing 64-bit applicationsProblems of testing 64-bit applications
Problems of testing 64-bit applicationsPVS-Studio
 

Andere mochten auch (17)

Debugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programsDebugging and optimization of multi-thread OpenMP-programs
Debugging and optimization of multi-thread OpenMP-programs
 
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ libraryInterview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library
 
VivaMP - a tool for OpenMP
VivaMP - a tool for OpenMPVivaMP - a tool for OpenMP
VivaMP - a tool for OpenMP
 
PVS-Studio, a solution for developers of modern resource-intensive applications
PVS-Studio, a solution for developers of modern resource-intensive applicationsPVS-Studio, a solution for developers of modern resource-intensive applications
PVS-Studio, a solution for developers of modern resource-intensive applications
 
PVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio vs Chromium
PVS-Studio vs Chromium
 
Building of systems of automatic C/C++ code logging
Building of systems of automatic C/C++ code loggingBuilding of systems of automatic C/C++ code logging
Building of systems of automatic C/C++ code logging
 
Parallel programs to multi-processor computers!
Parallel programs to multi-processor computers!Parallel programs to multi-processor computers!
Parallel programs to multi-processor computers!
 
AMD64 (EM64T) architecture
AMD64 (EM64T) architectureAMD64 (EM64T) architecture
AMD64 (EM64T) architecture
 
Installation of PC-Lint and its using in Visual Studio 2005
Installation of PC-Lint and its using in Visual Studio 2005Installation of PC-Lint and its using in Visual Studio 2005
Installation of PC-Lint and its using in Visual Studio 2005
 
Knee-deep in C++ s... code
Knee-deep in C++ s... codeKnee-deep in C++ s... code
Knee-deep in C++ s... code
 
Some examples of the 64-bit code errors
Some examples of the 64-bit code errorsSome examples of the 64-bit code errors
Some examples of the 64-bit code errors
 
Lesson 13. Pattern 5. Address arithmetic
Lesson 13. Pattern 5. Address arithmeticLesson 13. Pattern 5. Address arithmetic
Lesson 13. Pattern 5. Address arithmetic
 
Optimization in the world of 64-bit errors
Optimization  in the world of 64-bit errorsOptimization  in the world of 64-bit errors
Optimization in the world of 64-bit errors
 
Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)
Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)
Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)
 
Viva64: What Is It, and Who Is It for?
Viva64: What Is It, and Who Is It for?Viva64: What Is It, and Who Is It for?
Viva64: What Is It, and Who Is It for?
 
Static code analysis for verification of the 64-bit applications
Static code analysis for verification of the 64-bit applicationsStatic code analysis for verification of the 64-bit applications
Static code analysis for verification of the 64-bit applications
 
Problems of testing 64-bit applications
Problems of testing 64-bit applicationsProblems of testing 64-bit applications
Problems of testing 64-bit applications
 

Kürzlich hochgeladen

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Kürzlich hochgeladen (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Difference of code analysis approaches in compilers and specialized tools

  • 1. Difference of code analysis approaches in compilers and specialized tools Author: Andrey Karpov Date: 01.11.2010 Compilers and third-party static code analyzers have one common task: to detect dangerous code party fragments. However, there is a great difference in the types of analysis performed by each kind of these tools. I will try to show you the differences between these two approaches (and explain their source) by the example of the Intel C++ compiler and PVS l PVS-Studio analyzer. This time, it is the Notepad++ 5.8.2 project that we chose for the test test. Notepad++ At first a couple of words about the project we have chosen Notepad++ is an open-source and free chosen. source code editor that supports many languages and appears a substitute for the standard Notepad. It works in the Microsoft Windows environment and is released under the GPL license What I liked about license. this project is that it is written in C++ and has a small size - just 73000 lines of code. But what is the most important, this is a rather accurate project - it is compiled by presence of the /W4 switch in the project project's settings and /WX switch that makes analyzers treat each warning as an er error. Static analysis by compiler Now let's study the analysis procedure from the viewpoints of a compiler and a separate specialized s tool. The compiler is always inclined to generating warnings after processing only very small local code fragments. This preference is a consequence of very strict performance requirements imposed on the compiler. It is no coincidence that there exist tools of distributed project build. The time needed to compile medium and large projects is a significant factor influencing the choice of development methodology. So if developers can get a 5% performance gain out of the compiler, they will do it it. Such optimization makes the compiler solider and actually such steps as preprocessing, building AST and code generation are not so distinct. For instance, I may say relying on some indirect signs that Visual C++ uses different preprocessor algorithms when compiling projects and generating preprocessed "*.i" files. The compiler also does not need (it is even harmful for it) to store the whole AST. Once the code for some particular nodes is generated and they are no more needed, they get destroyed right away. During the compilation process, AST may never exist in the full form. There is simply no need for that - we parse a small code fragment, generate the code and go further. This saves memory and cache and therefore increases speed.
  • 2. The result of this approach is "locality" of warnings. The compiler consciously saves on various structures that could help it detect higher-level errors. Let's see in practice what local warnings Intel C++ will generate for the Notepad++ project. Let me remind you that the Notepad++ project is built with the Visual C++ compiler without any warnings with the /W4 switch enabled. But the Intel C++ compiler certainly has a different set of warnings and I also set a specific switch /W5 [Intel C++]. Moreover, I would like to have a look at what the Intel C++ compiler calls "remark". Let's see what kinds of messages we get from Intel C++. Here it found four similar errors where the CharUpper function is being handled (SEE NOTE AT THE END). Note the "locality" of the diagnosis - the compiler found just a very dangerous type conversion. Let's study the corresponding code fragment: wchar_t *destStr = new wchar_t[len+1]; ... for (int j = 0 ; j < nbChar ; j++) { if (Case == UPPERCASE) destStr[j] = (wchar_t)::CharUpperW((LPWSTR)destStr[j]); else destStr[j] = (wchar_t)::CharLowerW((LPWSTR)destStr[j]); } Here we see strange type conversions. The Intel C++ compiler warns us: "#810: conversion from "LPWSTR={WCHAR={__wchar_t} *}" to "__wchar_t" may lose significant bits". Let's look at the CharUpper function's prototype. LPTSTR WINAPI CharUpper( __inout LPTSTR lpsz ); The function handles a string and not separate characters at all. But here a character is cast to a pointer and some memory area is modified by this pointer. How horrible. Well, actually this is the only horrible issue detected by Intel C++. All the rest are much more boring and are rather inaccurate code than error-prone code. But let's study some other warnings too. The compiler generated a lot of #1125 warnings: "#1125: function "Window::init(HINSTANCE, HWND)" is hidden by "TabBarPlus::init" -- virtual function override intended?"
  • 3. These are not errors but just poor naming of functions. We are interested in this message for a different reason: although it seems to involve several classes for the check, the compiler does not keep special data - it must store diverse information about base classes anyway, that is why this diagnosis is implemented. The next sample. The message "#186: pointless comparison of unsigned integer with zero" is generated for the meaningless comparisons: static LRESULT CALLBACK hookProcMouse( UINT nCode, WPARAM wParam, LPARAM lParam) { if(nCode < 0) { ... return 0; } ... } The "nCode < 0" condition is always false. It is a good example of good local diagnosis. You may easily find an error this way. Let's consider the last warning by Intel C++ and get finished with it. I think you have understood the concept of "locality". void ScintillaKeyMap::showCurrentSettings() { int i = ::SendDlgItemMessage(...); ... for (size_t i = 0 ; i < nrKeys ; i++) { ... } } Again we have no error here. It is just poor naming of variables. The "i" variable has the "int" type at first. Then a new "i" variable of the "size_t" type is defined in the "for()" operator and is being used for different purposes. At the moment when "size_t i" is defined, the compiler knows that there already exists a variable with the same name and generates the warning. Again, it did not require the compiler to store any additional data - it must remember anyway that the "int i" variable is available until the end of the function's body.
  • 4. Third-party static code analyzers Now let's consider specialized static code analyzers. They do not have such severe speed restrictions since they are launched ten times less frequently than compilers. The speed of their work might get tens of times slower than code compilation but it is not crucial: for instance, the programmer may work with the compiler at day and launch a static code analyzer at night to get a report about suspicious fragments on the morning. It is quite a reasonable approach. While paying with slow-down for their work, static code analyzers can store the whole code tree, traverse it several times and store a lot of additional information. It lets them find "spreaded" and high- level errors. Let's see what the PVS-Studio static analyzer can find in Notepad++. Note that I am using a pilot version that is not available for download yet. We will present the new free general-purpose rule set in 1-2 months within the scope of PVS-Studio 4.00. Surely, the PVS-Studio analyzer finds errors that may be referred to "local" like in case of Intel C++. This is the first sample: bool _isPointXValid; bool _isPointYValid; bool isPointValid() { return _isPointXValid && _isPointXValid; }; The PVS-Studio analyzer informs us: "V501: There are identical sub-expressions to the left and to the right of the '&&' operator: _isPointXValid && _isPointXValid". I think the error is clear to you and we will not dwell upon it. The diagnosis is "local" because it is enough to analyze one expression to perform the check. Here is one more local error causing incomplete clearing of the _iContMap array: #define CONT_MAP_MAX 50 int _iContMap[CONT_MAP_MAX]; ... DockingManager::DockingManager() { ... memset(_iContMap, -1, CONT_MAP_MAX); ... }
  • 5. Here we have the warning "V512: A call of the memset function will lead to a buffer overflow or underflow". This is the correct code: memset(_iContMap, -1, CONT_MAP_MAX * sizeof(int)); And now let's go over to more interesting issues. This is the code where we must analyze two branches simultaneously to see that there is something wrong: void TabBarPlus::drawItem( DRAWITEMSTRUCT *pDrawItemStruct) { ... if (!_isVertical) Flags |= DT_BOTTOM; else Flags |= DT_BOTTOM; ... } PVS-Studio generates the message "V523: The 'then' statement is equivalent to the 'else' statement". If we review the code nearby, we may conclude that the author intended to write this text: if (!_isVertical) Flags |= DT_VCENTER; else Flags |= DT_BOTTOM; And now get brave to meet a trial represented by the following code fragment: void KeyWordsStyleDialog::updateDlg() { ... Style & w1Style = _pUserLang->_styleArray.getStyler(STYLE_WORD1_INDEX); styleUpdate(w1Style, _pFgColour[0], _pBgColour[0], IDC_KEYWORD1_FONT_COMBO, IDC_KEYWORD1_FONTSIZE_COMBO, IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK, IDC_KEYWORD1_UNDERLINE_CHECK);
  • 6. Style & w2Style = _pUserLang->_styleArray.getStyler(STYLE_WORD2_INDEX); styleUpdate(w2Style, _pFgColour[1], _pBgColour[1], IDC_KEYWORD2_FONT_COMBO, IDC_KEYWORD2_FONTSIZE_COMBO, IDC_KEYWORD2_BOLD_CHECK, IDC_KEYWORD2_ITALIC_CHECK, IDC_KEYWORD2_UNDERLINE_CHECK); Style & w3Style = _pUserLang->_styleArray.getStyler(STYLE_WORD3_INDEX); styleUpdate(w3Style, _pFgColour[2], _pBgColour[2], IDC_KEYWORD3_FONT_COMBO, IDC_KEYWORD3_FONTSIZE_COMBO, IDC_KEYWORD3_BOLD_CHECK, IDC_KEYWORD3_BOLD_CHECK, IDC_KEYWORD3_UNDERLINE_CHECK); Style & w4Style = _pUserLang->_styleArray.getStyler(STYLE_WORD4_INDEX); styleUpdate(w4Style, _pFgColour[3], _pBgColour[3], IDC_KEYWORD4_FONT_COMBO, IDC_KEYWORD4_FONTSIZE_COMBO, IDC_KEYWORD4_BOLD_CHECK, IDC_KEYWORD4_ITALIC_CHECK, IDC_KEYWORD4_UNDERLINE_CHECK); ... } I can say that I am proud of our analyzer PVS-Studio that managed to find an error here. I think you have hardly noticed it or just have skipped the whole fragment to see the explanation. Code review is almost helpless before this code. But the static analyzer is patient and pedantic: "V525: The code containing the collection of similar blocks. Check items '7', '7', '6', '7' in lines 576, 580, 584, 588". I will abridge the text to point out the most interesting fragment: styleUpdate(... IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK, ...);
  • 7. styleUpdate(... IDC_KEYWORD2_BOLD_CHECK, IDC_KEYWORD2_ITALIC_CHECK, ...); styleUpdate(... IDC_KEYWORD3_BOLD_CHECK, !!! IDC_KEYWORD3_BOLD_CHECK !!!, ...); styleUpdate(... IDC_KEYWORD4_BOLD_CHECK, IDC_KEYWORD4_ITALIC_CHECK, ...); This code was most likely written by the Copy-Paste method. As a result, it is IDC_KEYWORD3_BOLD_CHECK which is used instead of IDC_KEYWORD3_ITALIC_CHECK. The warning looks a bit strange reporting about numbers '7', '7', '6', '7'. Unfortunately, it cannot generate a clearer message. These numbers arise from macros like these: #define IDC_KEYWORD1_ITALIC_CHECK (IDC_KEYWORD1 + 7) #define IDC_KEYWORD3_BOLD_CHECK (IDC_KEYWORD3 + 6) The last cited sample is especially significant because it demonstrates that the PVS-Studio analyzer processed a whole large code fragment simultaneously, detected repetitive structures in it and managed to suspect something wrong relying on heuristic method. This is a very significant difference in the levels of information processing performed by compilers and static analyzers. Some figures Let's touch upon one more consequence of "local" analysis performed by compilers and more global analysis of specialized tools. In case of "local analysis", it is difficult to make it clear if some issue is really dangerous or not. As a result, there are ten times more false alarms. Let me explain this by example. When we analyzed the Notepad++ project, PVS-Studio generated only 10 warnings. 4 messages out of them indicated real errors. The result is modest, but general-purpose analysis in PVS-Studio is only beginning to develop. It will become one of the best in time. When analyzing the Notepad++ project with the Intel C++ compiler, it generated 439 warnings and 3139 remarks. I do not know how many of them point to real errors, but I found strength to review some part of these warnings and saw only 4 real issues related to CharUpper (see the above description). 3578 messages are too many for a close investigation of each of them. It turns out that the compiler offers me to consider each 20-th line in the program (73000 / 3578 = 20). Well, come on, it's not serious. When you are dealing with a general-purpose analyzer, you must cut off as much unnecessary stuff as possible. Those who tried the Viva64 rule set (included into PVS-Studio) may notice that it produces the same huge amount of false alarms. But we have a different case there: we must detect all the suspicious type
  • 8. conversions. It is more important not to miss an error than not to produce a false alarm. Besides, the tool's settings provide a flexible filtering of false alarms. UPDATE: Note It turned out that I had written a wrong thing here. There is no error in the sample with CharUpperW but nobody corrected me. I noticed it myself when I decided to implement a similar rule in PVS-Studio. The point is that CharUpperW can handle both strings and individual characters. If the high-order part of a pointer is zero, the pointer is considered a character and not pointer any more. Of course, the WIN API interface in this place disappointed me by its poorness, but the code in Notepad++ is correct. By the way, it turns out now that Intel C++ has not found any errors at all.