Reference Code: TA001575ADT
Publication Date: May 2009
Author: Chandranshu Singh and Michael Azoff




   TECHNOLOGY AUDIT


   TotalView 8.7, ReplayEngine 2.0
   TotalView Technologies


BUTLER GROUP VIEW


ABSTRACT

               TotalView Technologies provides ReplayEngine 2.0, a reverse debugging tool that records and
               deterministically replays the execution history of the programs to which it is attached. The tool simplifies
               troubleshooting of application errors and enables developers to work back from failures to their causes,
               thereby enhancing developer productivity and speeding the resolution of previously hard-to-find bugs.
               TotalView ReplayEngine (RE) works within the TotalView 8.7 environment, which provides comprehensive
               debugging capabilities for distributed multi-threaded and multi-process programs. TotalView also provides
               memory debugging capabilities through its separately licensed TotalView debugger. The solution is well
               suited to distributed parallel development environments. TotalView 8.7 and ReplayEngine 2.0 are aimed at
               specific industry verticals engaged in the development of distributed, multi-core applications, and also at
               High Performance Computing (HPC) environments. Butler Group believes that the solution should be
               considered by all organisations developing C/C++ and FORTRAN applications on the Linux platform.


KEY FINDINGS

               - By recording execution history, ReplayEngine 2.0 solves the time-consuming problem of determining
                 failure causes.
               - TotalView 8.7’s distributed architecture is well suited to popular HPC clusters.
               - Recording overheads may be unsuitable for certain real-time or near real-time requirements.
               - ReplayEngine depends on, and works within, the TotalView 8.7 environment.
               - TotalView 8.7 enables debugging of parallel applications and supports various MPI implementations.
               - ReplayEngine 2.0 enables analysis of programs with shared memory or multiple threads.
               - TotalView does not cover the Microsoft Windows platform.
               - Graphical visualisation features are provided for data structures and execution paths.




LOOK AHEAD

               TotalView Technologies plans for future versions of ReplayEngine to support backward continue operations,
               to provide a graphical timeline of recorded history, and to separate the process of recording execution
               history from the process of replaying and examining that history.



TotalView Technologies – TotalView 8.7, ReplayEngine 2.0                                                             Published 05/2009

© Butler Group. This Technology Audit is a licensed product and is not to be photocopied                                         Page 1


FUNCTIONALITY

               Developer productivity is an issue for all organisations engaged in software development, irrespective of their
               scale or nature. Developers usually spend more time debugging and trying to understand application behaviour
               than writing code, because debugging typically involves analysing many variables whose interactions are hard
               to predict, which makes thorough analysis within tight schedules very difficult. Developers therefore depend
               on source-code debuggers (products that provide visibility into the execution state of programs) to work
               their way back from failures and system crashes, analysing system state dump (core) files and attempting to
               replicate failure conditions in test environments to find the root cause of failure. This may take anywhere
               from hours to months, or may never succeed, depending on the complexity of the problem. Management may
               blame this on IT departments’ incompetence, while the real reason may well be the non-deterministic nature
               of some kinds of problems that arise in production environments. Non-deterministic failures can be very
               difficult to replicate in testing: some classes of error leave no helpful traces in logs, and core files
               may provide an incomplete picture of the failure.


Product Analysis

               TotalView Technologies’ new offering, ReplayEngine, addresses this problem by enabling users to record a
               program’s execution and deterministically replay it, thereby simplifying troubleshooting. This embodies the
               relatively new concept of reverse debugging, in which the flow of an application can be stepped backward in
               time as well as forward, a far stronger approach to debugging than forward stepping alone.

               With the help of TotalView ReplayEngine, developers can trace execution backwards from an application
               crash and reach the root cause of failure much more easily. Application programmers therefore no longer
               need to spend hours trying to recreate problem conditions using only the controlled forward execution
               offered by conventional debugging products. ReplayEngine’s controlled backward execution lets developers
               reach anomalies and error conditions that occurred long before the actual failure.

               The troubleshooting process using a traditional forward-stepping debugger is cyclic in nature. At each
               stage the developer tries to run the program to a specific point in order to answer a question that will
               help move to the next stage in understanding the error and fixing the problem. Once the program is in the
               desired state, the developer can examine where the program is and what values its variables hold at that
               point. Sometimes that is all that is needed, but more often a developer needs to inspect an earlier state
               of the program. This cycle of running to a point of interest, inspecting clues, and having to run again to a
               different point of interest may need to be repeated many times to complete the full analysis of a
               defective program.

               The reverse debugging paradigm enabled by products like ReplayEngine changes the entire model.
               Developers can run the program once and perform their entire analysis within a single session. They can
               work their way backwards, line-by-line if necessary, from the failure towards the root cause. This allows the
               application developer to focus more on the problem in the source code and less on a “problem reproducer”
               framework. The developer can examine points in the execution history as many times as needed to develop a
               solid understanding of both how the failure happened and how it can be avoided.






               ReplayEngine functions inside the TotalView debugger environment as a plug-in product, as shown in
               Figure 1. By leveraging the capabilities of the TotalView debugger, ReplayEngine can handle distributed
               multi-process and multi-threaded programs. The product records program input, such as network and file
               I/O, records information read from shared memory, and captures thread behaviour such as thread creation
               and context switching among threads. In Replay mode, operation is similar to ‘rewind’ on a media player:
               users can review any part of the recorded program execution, set breakpoints and watchpoints, and jump to
               any part of the recorded history. In Butler Group’s opinion, ReplayEngine provides value by drastically
               reducing the time spent on debugging activities.

               Butler Group is impressed with the handling of distributed and/or multi-threaded programs in the TotalView
               debugger. The debugger supports various Message Passing Interface (MPI) implementations such as MPICH
               and MPICH2, Open MPI, MVAPICH and MVAPICH2, Intel MPI, HP MPI, and LAM. For multi-process
               debugging, TotalView allows users to control anything from a single process, or an arbitrary group of
               processes, through to an entire parallel job. For multi-threaded debugging, TotalView allows independent
               control of individual threads and thread groups, as well as operations that affect entire processes.
               Users can switch their view of variables and data between the processes to which the debugger is
               attached, drilling down to the lowest level of granularity in any thread or process.

               The testing and debugging of multi-threaded programs is a well-recognised area of project risk in parallel
               program development. Issues related to defects in synchronisation and concurrency control that may occur in
               the actual execution environment can be difficult or impossible to replicate in a smaller-scale development
               and testing environment without the kinds of features provided by TotalView and ReplayEngine. For instance,
               a deadlock condition may cause the application to become unresponsive, but the same condition may evade
               detection when run under a traditional debugger. TotalView helps programmers investigating deadlocks and
               race conditions by displaying the messages being exchanged in MPI programs in graphical form. Cycles in
               these message graphs can indicate a message-passing deadlock.
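               The kind of cycle a developer looks for in such a graph can also be found mechanically with a depth-first
               search over a "waits-for" graph. The sketch below is illustrative only and is not TotalView code: it
               assumes each blocked MPI rank is known to be waiting on messages from specific peer ranks, and the
               function name find_wait_cycle and the dictionary representation are hypothetical.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, List[int]]) -> Optional[List[int]]:
    """Return one cycle in a waits-for graph (rank -> ranks it waits on), or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {rank: WHITE for rank in waits_for}
    path: List[int] = []

    def visit(rank: int) -> Optional[List[int]]:
        colour[rank] = GREY
        path.append(rank)
        for peer in waits_for.get(rank, []):
            state = colour.get(peer, WHITE)
            if state == GREY:
                # peer is already on the current search path: a wait cycle exists
                return path[path.index(peer):] + [peer]
            if state == WHITE:
                found = visit(peer)
                if found:
                    return found
        path.pop()
        colour[rank] = BLACK
        return None

    for rank in waits_for:
        if colour[rank] == WHITE:
            cycle = visit(rank)
            if cycle:
                return cycle
    return None


print(find_wait_cycle({0: [1], 1: [2], 2: [0]}))  # [0, 1, 2, 0]
print(find_wait_cycle({0: [1], 1: []}))           # None
```

               A returned cycle such as [0, 1, 2, 0] means rank 0 waits on rank 1, which waits on rank 2, which in turn
               waits on rank 0, so none of them can proceed.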

               When program behaviour depends sensitively on the timing of either inputs to the program or the
               execution of separate calculations, iterative analysis can become very difficult. TotalView gives
               the developer the ability to control both inputs and the detailed execution sequence of the program.
               ReplayEngine gives the developer the ability, once any given sequence is recorded, to review that sequence
               in as much detail as is necessary to understand how it leads to the failure.

               TotalView enables debugging of multi-process and multi-threaded programs by taking control of each new
               local or remote process or thread as it is created. The product allows users to see the value of a variable in
               each process or thread simultaneously, and places the executing processes and threads into groups so that
               operations like start, stop, step, and examine can be carried out on groups of processes and threads. Users
               can switch between processes and threads by selecting the process or thread name in the Root window.

               One innovative characteristic of ReplayEngine, when compared with other ways of recording program
               execution history, is that it is compatible with a wide range of real-world application architectures. Many
               advanced performance and correctness tools require that the program be built in special ways, that the
               program source code be annotated or processed by some tool that is not part of the normal toolchain, or that
               the program be launched directly from within the correctness tool. TotalView and ReplayEngine place very
               few hurdles of that nature between the developer and the process of using the tool.






               The application can be built with or without debug symbols (though debug symbols allow for source-code
               rather than assembly-code display), using the production compiler; can be statically or dynamically linked with
               any libraries; and can be launched either from within TotalView or launched and “attached to” separately with
               TotalView and ReplayEngine. If the user attaches to an already running program with ReplayEngine, the
               recording of execution history starts at the point where ReplayEngine is attached (rather than at the
               beginning of the program’s actual execution).

               Overall, Butler Group is of the opinion that the TotalView debugger, along with the features provided by
               ReplayEngine, enables developers to take better control, and gain a better understanding, of their programs.
               The product is responsive to changing programming patterns, and provides strong out-of-the-box support for
               debugging distributed multi-threaded, multi-process programs. The TotalView debugger allows programs to be
               edited and changed on the fly for investigation without recompiling, provides comprehensive memory
               debugging features, and, with the introduction of ReplayEngine, has filled a backward-traceability gap
               that existed in debugging products. The product also supports a wide variety of Linux platforms, but
               currently lacks Windows support.


Product Operation

               TotalView 8.7 is a source-code debugger that allows users to trace program failures back to the root cause,
               analyse source code, and tune the performance of multi-process or multi-threaded programs running on
               various platforms. TotalView provides a graphical user interface (GUI) that enables dynamic source-code
               analysis and memory debugging for C/C++ and FORTRAN programs. The solution enables diagnosis and resolution of
               complex problems like deadlocks, memory leaks, and race conditions.

               TotalView is a distributed parallel application which has a main process (tvmain) that creates and controls
               lightweight agent processes known as TotalView debugger servers (tvdsvr). During product operation, the
               main process runs on any one system, and agent processes run on other nodes in the cluster. Users interact
               through a graphical user interface with the main process. The lightweight agent processes are controlled by
               the main process and perform operations to control and interrogate the user processes being debugged on
               the respective nodes. ReplayEngine is integrated into both the main process and the agent processes.

               TotalView’s operations are similar to those of other debuggers, so developers can start using the product with
               a minimal learning curve. Programs need to be compiled with the ‘-g’ switch so that they can be analysed in terms
               of the program source code. If this is not done the debugger can still be used to control and examine the
               program; however, the display of what the program is doing will be at the assembly-language level rather than
               the source-code level. A program can be started under TotalView through a New Program wizard or by
               supplying the program name as a command-line argument. Alternatively the user can start the program
               separately and use the “attach to” operation to gain control of the running application with the debugger.
               TotalView Root and Process windows appear when execution starts. The Root window provides an overview
               of all processes and threads being inspected by the product; listed entries can be selected to provide in-depth
               information about the process or thread.

               The Process window provides detailed information about function and system calls through a stack trace
               pane; when an entry in the stack trace is selected, detailed information about identifiers and register
               values is displayed in the adjoining stack frame pane. The middle pane displays the source code for the
               function being investigated and allows the user to set breakpoints. The bottom pane allows users to switch
               between three views: Action Points, Processes, and Threads.





               A breakpoint in TotalView is known as an Action Point because the product allows users to associate actions
               with breakpoints. An action can print values, evaluate expressions, or insert a code snippet, without the
               need for recompilation and with or without stopping code execution.
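               The idea of an action that runs at a breakpoint without necessarily halting the program can be illustrated
               with Python’s own tracing hook. This is an analogy under stated assumptions, not how TotalView implements
               Action Points for compiled C/C++ and FORTRAN code: the helper run_with_action_point is hypothetical and
               uses the standard sys.settrace function to evaluate an expression each time a chosen line is reached.

```python
import sys
from typing import Any, Callable, Dict


def run_with_action_point(func: Callable[..., Any], line_offset: int,
                          action: Callable[[Dict[str, Any]], None],
                          *args: Any) -> Any:
    """Call func(*args), running `action` each time the chosen line is reached.

    line_offset counts lines from the `def` line of func. Execution is never
    halted, like an action point that only prints or evaluates.
    """
    code = func.__code__
    target = code.co_firstlineno + line_offset

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                       # do not trace other functions
        if event == "line" and frame.f_lineno == target:
            action(dict(frame.f_locals))      # locals as seen before the line runs
        return tracer

    sys.settrace(tracer)
    try:
        return func(*args)
    finally:
        sys.settrace(None)                    # always remove the hook


def accumulate(values):
    total = 0
    for v in values:
        total += v                            # target line: offset 3 from `def`
    return total


snapshots = []
result = run_with_action_point(
    accumulate, 3, lambda loc: snapshots.append((loc["v"], loc["total"])), [1, 2, 3])
print(result, snapshots)  # 6 [(1, 0), (2, 1), (3, 3)]
```

               The program runs to completion and returns its normal result, while the action records (v, total) at every
               visit to the target line, which is the "with or without stopping" behaviour described above.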


               Figure 1: ReplayEngine Architecture Diagram (Source: TotalView)



               The product also enables examination of complex user-defined data structures such as arrays, linked lists,
               and structures, and allows users to drill down to the lowest level of granularity and examine data elements’
               values; this feature is known as diving. Users can also restrict the display to a range of array indices
               and filter data based on user-defined conditions. Another intuitive feature of the solution is the
               Expressions Window, which allows users to see the values of variables even if they are not present in
               the current routine being investigated; users can add variables and expressions (such as x[i+1]) to the
               expression list at any time.

               Watchpoints provide another way to examine data elements. Users can define watchpoints for specific
               variables, and the watchpoint stops code execution once the value of the variable changes, irrespective of the
               instruction that caused the value to change.
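               The stop-on-change rule can be mimicked with a Python property setter, where every write, whichever function
               performs it, passes through a single check. This is a toy model: debuggers typically rely on hardware debug
               registers, or on single-stepping, rather than on modifying the program, and the WatchedCell class is a
               hypothetical name used only for illustration.

```python
from typing import Any, Callable, List, Tuple


class WatchedCell:
    """Toy stand-in for a watched variable (hypothetical, not a TotalView API).

    Every write goes through the `value` setter, whichever function performs
    it, and the watch callback fires only when the stored value changes.
    """

    def __init__(self, value: Any, on_change: Callable[[Any, Any], None]) -> None:
        self._value = value
        self._on_change = on_change

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new_value: Any) -> None:
        old = self._value
        self._value = new_value
        if new_value != old:
            # A real debugger would halt the program at this point.
            self._on_change(old, new_value)


events: List[Tuple[Any, Any]] = []
counter = WatchedCell(0, lambda old, new: events.append((old, new)))


def increment() -> None:
    counter.value = counter.value + 1


def reset() -> None:
    counter.value = 0


increment()
increment()
counter.value = 2   # same value: nothing changes, so the watch does not fire
reset()
print(events)       # [(0, 1), (1, 2), (2, 0)]
```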






               TotalView provides users with a scriptable Command Line Interface (CLI), which provides access to all
               the fundamental (and quite a few advanced) debugging operations. The CLI is an extension of the Tool
               Command Language (Tcl). Users can enter Tcl statements in a CLI window for manipulating the program
               being debugged, and use commands added to Tcl by TotalView to debug the program. The CLI also allows
               users to create their own commands and use looping constructs.

               The new addition to the TotalView product line is ReplayEngine. ReplayEngine allows users to record the
               execution history of programs, and then replay it, thereby enabling backward debugging and eliminating the
               need to recreate the failure conditions for root-cause analysis. ReplayEngine operates in two modes:
               Record, in which it saves program state, and Replay, in which users can view the state of the program at
               any previously executed statement. Full information for all variables is available for the entire extent
               of execution history that is recorded and that data may be inspected and explored using the data display
               features described above. Since history is immutable, certain features such as changing a variable’s value,
               calling functions that alter memory, and running threads asynchronously are unavailable during Replay mode.

               The ReplayEngine product bar provides Prev, Unstep, Caller, BackTo, and Live commands. ‘Prev’ displays
               the program state that existed when the previous statement was executed; the command skips over any
               function call made in that statement. ‘Unstep’ is similar to Prev, except that if a function call was made in
               the previous statement, it moves control to the last statement of that called subroutine. ‘Caller’ displays
               the state that existed before the current routine was called. ‘BackTo’ displays the program state for a
               statement selected by the user, provided the selected statement executed before the currently displayed
               line. The ‘Live’ command switches from Replay mode back to Record mode and returns control to the
               statement that was about to execute when Replay mode was entered.
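               The Prev, BackTo, and Live commands can be pictured as a cursor moving over a recorded history. The sketch
               below is a deliberately simplified model under stated assumptions: it stores a snapshot of variable values
               for every executed line, whereas ReplayEngine, as described above, records inputs, shared-memory reads, and
               thread behaviour and replays execution deterministically. The History and Step classes are hypothetical,
               BackTo is assumed to pick the most recent earlier execution of the chosen line, and Unstep and Caller are
               omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    line: int                  # source line that was executed
    variables: Dict[str, int]  # snapshot of variables after that line ran


@dataclass
class History:
    """Toy execution history with Prev / BackTo / Live navigation."""
    steps: List[Step] = field(default_factory=list)
    cursor: int = -1           # index of the step currently displayed

    @property
    def replaying(self) -> bool:
        # Replay mode means looking at any step other than the newest one.
        return self.cursor != len(self.steps) - 1

    def record(self, line: int, variables: Dict[str, int]) -> None:
        if self.replaying:
            raise RuntimeError("history is read-only in Replay mode; call live() first")
        self.steps.append(Step(line, dict(variables)))  # copy, so history never changes
        self.cursor = len(self.steps) - 1

    def current(self) -> Step:
        return self.steps[self.cursor]

    def prev(self) -> Step:
        # Move back one recorded statement (stays put at the start of history).
        self.cursor = max(self.cursor - 1, 0)
        return self.current()

    def back_to(self, line: int) -> Step:
        # Jump to the most recent earlier execution of `line`.
        for i in range(self.cursor - 1, -1, -1):
            if self.steps[i].line == line:
                self.cursor = i
                return self.current()
        raise ValueError(f"line {line} did not execute before the displayed step")

    def live(self) -> Step:
        # Return to the newest recorded point; recording may resume.
        self.cursor = len(self.steps) - 1
        return self.current()


h = History()
for line, value in [(10, 1), (11, 2), (10, 3), (12, 4)]:
    h.record(line, {"x": value})

print(h.prev())            # Step(line=10, variables={'x': 3})
print(h.back_to(10))       # Step(line=10, variables={'x': 1})
print(h.live().variables)  # {'x': 4}
```

               record() refuses to append while the cursor is behind the newest step, reflecting the point above that
               recorded history is immutable during Replay mode.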

               TotalView also supports debugging of memory-related issues. The memory debugger module of the solution
               can stop program execution when the program allocates or frees heap memory incorrectly. Tracking such
               incorrect allocation and de-allocation helps identify the statements in the code that are causing
               problems. The product also reports dangling pointers and memory leaks in the program. Memory leaks are
               blocks that have been allocated by the program but are no longer referenced, so they can never be freed.
               Identifying memory leaks is a crucial step towards problem resolution, as a significant leak may cause the
               program to run out of memory some time after it starts and therefore crash. The TotalView memory
               debugger can write bit patterns into allocated and de-allocated blocks, which helps to identify whether the
               program is using memory that has not yet been initialised, or is referencing a location that has already
               been de-allocated. The product can also hold on to de-allocated memory and check whether it is still being
               used by the program. In addition, the product can detect overruns of allocated blocks. Block overrun is a
               common, yet potentially dangerous, problem. TotalView detects overruns by allocating blocks adjacent to each
               block allocated by the program, initialising them to a set bit pattern, and checking for overwrites.
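               The overrun check described above (surround each block with guard bytes set to a known pattern, then look
               for overwrites) can be simulated in a few lines. This is a toy model over a Python bytearray, not
               TotalView’s memory debugger: the GuardedHeap class, the 0xAB and 0xDD patterns, and the 8-byte guard size
               are all assumptions chosen for illustration.

```python
from typing import Dict, List

GUARD = 0xAB     # assumed fill pattern for guard regions
FREED = 0xDD     # assumed pattern painted over freed blocks
GUARD_SIZE = 8   # assumed guard size, in bytes, on each side of a block


class GuardedHeap:
    """Toy heap in a bytearray: every user block is bracketed by guard bytes."""

    def __init__(self) -> None:
        self.arena = bytearray()
        self.blocks: Dict[int, int] = {}  # block start offset -> size

    def malloc(self, size: int) -> int:
        self.arena += bytes([GUARD]) * GUARD_SIZE  # leading guard
        start = len(self.arena)
        self.arena += bytes(size)                  # user block (zeroed)
        self.arena += bytes([GUARD]) * GUARD_SIZE  # trailing guard
        self.blocks[start] = size
        return start

    def free(self, start: int) -> None:
        # Paint the block but keep it reserved ("hold on" to freed memory), so a
        # later read of FREED bytes hints at a use-after-free.
        size = self.blocks[start]
        self.arena[start:start + size] = bytes([FREED]) * size

    def overflowed(self) -> List[int]:
        """Start offsets of blocks whose guard bytes no longer hold the pattern."""
        bad = []
        for start, size in self.blocks.items():
            guards = (self.arena[start - GUARD_SIZE:start]
                      + self.arena[start + size:start + size + GUARD_SIZE])
            if any(byte != GUARD for byte in guards):
                bad.append(start)
        return bad


heap = GuardedHeap()
a = heap.malloc(4)
b = heap.malloc(4)
heap.arena[a:a + 5] = b"12345"  # off-by-one: 5 bytes written into a 4-byte block
print(heap.overflowed())         # [8]
```

               The off-by-one write corrupts the first trailing guard byte of the block at offset 8, so only that block is
               reported; free() paints a block with the FREED pattern but never reuses it.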

               Areas where Butler Group believes TotalView can enhance its offering are in reducing the
               footprint/processing overhead, so that it can be used in real-time or near real-time performance-critical
               applications, and also in expanding its OS platform support to include Microsoft Windows. Combining
               ReplayEngine with its traditional forward debugging tool in one package would also be a useful possibility.


Product Emphasis

               TotalView Technologies is among the select few debugging tool vendors that have brought reverse debugging
               tools to market. Butler Group is impressed with the capabilities of the TotalView debugger, and is of the opinion that
               TotalView is equally suited to address debugging and troubleshooting challenges faced by all market
               segments. Furthermore, the product is particularly suited for distributed, parallel-processing High
               Performance Computing (HPC) environments, given its architecture.





DEPLOYMENT

               TotalView ReplayEngine is a separately licensed feature of the TotalView debugger. Both TotalView and
               ReplayEngine can be deployed by users themselves. Deployment involves downloading and setting up the
               software, and then obtaining licences to use the product. The product is usually deployed on a network-
               mounted file system accessible by any node in the cluster. Average installation time is usually measured in
               minutes or hours as it involves download and installation at the end-user level. The TotalView product is a
               closely integrated development suite and does not support modular deployment.

               No additional resources are required for end users to make use of the product after deployment. Training for
               TotalView products is provided online, on site, and at major industry conferences. However, TotalView debugger
               and ReplayEngine operations are intuitive and self-explanatory for end users with experience of GUI
               debuggers.

               Technical support over telephone and e-mail for TotalView products is available upon purchase of a
               maintenance contract which also includes access to new product versions and updates.

               The products are available on RedHat Enterprise and Fedora Linux versions, Novell SuSE Enterprise and
               Desktop Linux versions, and Ubuntu Linux running on x86 32- and 64-bit processor-based systems.
               ReplayEngine is an optional component for TotalView debugger, and therefore depends on TotalView
               debugger. The product is not dependent on any other products from TotalView or third parties.

               TotalView, but not ReplayEngine, is available for Mac OS X and for a variety of UNIX operating systems such
               as Solaris, AIX, HP-UX, IRIX and the Linux variants used on the IBM BlueGene, Cray XT series, SGI Altix
               (Itanium2 and x86-64), and SiCortex supercomputers.

               TotalView debugger and ReplayEngine are stand-alone debugging tools that fit into existing software
               development practices, aid in development, and can be used with a wide variety of development
               environments. If developers work using command line tools such as ‘cc’ and ‘make’ then they have the
               required skills to use ReplayEngine and TotalView. The same is true if they use Integrated Development
               Environments (IDEs) such as the Sun Studio for Linux or Eclipse CDT. ReplayEngine and TotalView rely on
               standard information placed into the executable by the compilation process and are agnostic to the user’s
               preference of tools to control the edit and build processes. TotalView Technologies has made an
               experimental Eclipse plug-in available for TotalView as a free download. A separate TotalView Workbench is
               available as a point for integrating third-party development and debugging tools. The products work with
               source code written in C, C++, and FORTRAN languages, and corresponding executables.



PRODUCT STRATEGY

               TotalView targets verticals with complex computing requirements; the vertical segments specifically targeted
               by the company include Oil and Gas, Independent Software Vendors, Financial Services, Digital Content
               Creation, Computer Aided Engineering, Computational Fluid Dynamics, and Aerospace Engineering. The
               company believes that TotalView debugger and ReplayEngine are suitable for development organisations of
               all sizes and should be adopted by all to aid in their development process.






               ReplayEngine is a time-saving tool that helps programmers develop and troubleshoot programs quickly.
               TotalView provides a method to calculate the product’s Return on Investment (ROI) per developer: the annual
               per-developer cost of ownership of the debugger is subtracted from the annual per-developer saving in
               developer time, and the result is divided by that same cost of ownership. The fraction of developer time
               saved varies from developer to developer and from problem to problem, so it is best to arrive at a range
               estimate. TotalView suggests that, assuming an average developer salary, a 10% reduction in problem
               resolution time amounts to an annual ROI of 3.2. By the same estimate, a 50% reduction in problem
               resolution time yields an ROI of 20. The company states that these are conservative estimates and that
               actual returns could be higher. Other intangible benefits include resolution of long-pending
               problems, greater predictability in development schedules, and resolution of bugs that are costly to isolate.
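               The two quoted figures can be checked against the formula. Assuming (an assumption of this sketch, not a
               stated part of TotalView’s method) that the annual saving equals the fraction of time saved multiplied by
               a developer’s annual cost, an ROI of 3.2 at a 10% reduction implies that a developer costs about 42 times
               the debugger’s annual per-developer cost of ownership; the same formula then gives an ROI of 20 at a 50%
               reduction, so the two estimates are mutually consistent. The function names below are hypothetical.

```python
def roi(annual_saving: float, annual_cost: float) -> float:
    """ROI = (saving in developer time - cost of ownership) / cost of ownership."""
    return (annual_saving - annual_cost) / annual_cost


def implied_cost_ratio(quoted_roi: float, fraction_saved: float) -> float:
    """Developer cost / debugger cost implied by one quoted ROI figure.

    Assumes (our assumption) saving = fraction_saved * developer cost, so
    quoted_roi = fraction_saved * ratio - 1.
    """
    return (quoted_roi + 1) / fraction_saved


ratio = implied_cost_ratio(3.2, 0.10)  # developer cost is about 42x the tool cost
roi_at_50 = roi(0.50 * ratio, 1.0)     # tool cost normalised to 1.0
print(round(ratio, 6), round(roi_at_50, 6))  # 42.0 20.0
```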

               TotalView debugger and ReplayEngine are sold through the company’s established channel which comprises
               direct sales in the USA, and through resellers in other geographies. TotalView’s business partners include
               Intel, RedHat, SGI, IBM, AMD, Sun, HP, Novell, and James Rivers among others. The company has
               established technical partnerships with Absoft, ClusterCorp, PGI, GCC, and PBS Gridworks among others.

               TotalView debugger and ReplayEngine are licensed on a perpetual variable-capacity licensing model
               whereby a number of licences (called tokens by TotalView) can be shared among a team of developers. A
               given number of tokens provide end users with the ability to debug the same number of processes
               simultaneously. These could be used by one or more members of the development team simultaneously,
               such that the total number of processes being debugged is the same as the number of tokens purchased.
               TotalView debugger and ReplayEngine are separately licensed products.
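               The token model amounts to a shared counter: a process can be brought under the debugger only while the
               team’s combined number of debugged processes stays within the number of tokens purchased. The sketch below
               models that rule; the TokenPool class, its methods, and the developer names are hypothetical and do not
               represent TotalView’s actual licence manager.

```python
import threading
from typing import Dict


class TokenPool:
    """Hypothetical model of a shared licence pool: one token per debugged process."""

    def __init__(self, tokens: int) -> None:
        self.total = tokens
        self._in_use: Dict[str, int] = {}  # developer -> processes being debugged
        self._lock = threading.Lock()      # developers may attach concurrently

    def attach(self, developer: str, processes: int) -> bool:
        """Reserve tokens for `processes` more processes; False if the pool is too small."""
        with self._lock:
            if sum(self._in_use.values()) + processes > self.total:
                return False
            self._in_use[developer] = self._in_use.get(developer, 0) + processes
            return True

    def detach(self, developer: str, processes: int) -> None:
        """Return tokens when a developer stops debugging some processes."""
        with self._lock:
            remaining = self._in_use.get(developer, 0) - processes
            if remaining > 0:
                self._in_use[developer] = remaining
            else:
                self._in_use.pop(developer, None)

    def available(self) -> int:
        with self._lock:
            return self.total - sum(self._in_use.values())


pool = TokenPool(32)
print(pool.attach("alice", 16), pool.attach("bob", 8))  # True True
print(pool.attach("carol", 16), pool.available())       # False 8
pool.detach("alice", 16)
print(pool.attach("carol", 16), pool.available())       # True 8
```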

               TotalView Technologies plans to release a new version of ReplayEngine every six months. The company wants to
               expand the scope of the product so that it is suitable for a greater set of High Performance Computing
               (HPC) cluster environments. Future versions of ReplayEngine are planned to support backward continue
               operations, to provide a graphical timeline of recorded history, and to separate the process of
               recording execution history from the process of replaying and examining that history.

               The market for debugging tools can be roughly divided into two segments: traditional application
               development on single-process machines, and the development of multi-threaded applications on single-core
               machines and on truly parallel multi-core machines, the latter being dominated by the High Performance
               Computing (HPC) segment. The former market is well served today, in that interactive debugging tools are
               commonly available. The challenge today is that many of the tools provided by hardware and OS vendors and
               by Independent Software Vendors (ISVs) fail to support the needs of developers retooling their applications
               for multiple cores. Even applications that bear little resemblance to traditional HPC applications now need
               to be explicitly parallel to take advantage of multi-core processors, and pressure is building for existing
               applications to make the transition. This is causing a surge in demand for tools that can debug
               multi-threaded and multi-process applications.

               Debugging programs in HPC/supercomputing environments requires advanced technical capabilities, as the
               applications are inherently multi-process, parallel, and distributed. However, as multi-core processors
               enter the mainstream, what was a niche market could grow to become the dominant market paradigm, borrowing
               many of the techniques common in HPC. Butler Group believes that this transition will only succeed with the
               benefit of automation to support such complex programming environments: the mass market has neither the
               skills nor the (business) time to master multi-core with manual methods. This potentially lucrative market
               is therefore ideal for TotalView 8.7 and ReplayEngine 2.0, over and above their use in purely HPC
               environments. In Butler Group’s opinion, TotalView Technologies is well placed to capitalise on the
               opportunities provided by this growing market.






COMPANY PROFILE

               TotalView Technologies is a provider of troubleshooting and analysis tools for source code in C/C++ and
               FORTRAN. The company is headquartered in Natick, Massachusetts, USA, and has offices in Mississippi and
               Toronto. TotalView’s sales, distribution, and support networks are housed in these offices, supported by
               the company’s distributor channel in Europe, South America, Asia, and the Middle East. The company is
               publicly listed in Norway. TotalView’s customers include Applied Research Associates, OpenGeoSolutions,
               CINECA, STFC Daresbury Laboratory, Weston Geophysical, SIMULIA, Ultra Electronics, and Stanford
               University, among others. The company has around 1,200 customers overall, and TotalView tools are used
               at the department level within these organisations.



SUMMARY

               TotalView 8.7 and ReplayEngine 2.0, taken together, provide comprehensive dynamic source-debugging
               capabilities suitable for all market segments, including HPC environments and organisations of all sizes.
               The products are designed to significantly enhance developer productivity in client organisations. Reverse
               debugging, made possible by recent technological advances, represents a paradigm shift in established
               debugging practices, and TotalView is one of only a few vendors that currently provide this capability.
               Given the trend towards multi-core computing, there will be a growing need to support the debugging of
               parallel programs, and this is, in Butler Group’s opinion, where TotalView has a particular market
               opportunity.


               Table 1: Contact Details

                   TotalView Technologies
                   24 Prime Park Way
                   Natick, MA 01760
                   USA
                   Tel: +1 (508) 652 7700
                   E-mail: info@totalviewtech.com
                   www.totalviewtech.com

               Source: TotalView




Headquarters
Shirethorn House, 37/43 Prospect Street,
Kingston upon Hull, HU2 8PX, UK
Tel: +44 (0)1482 586149
Fax: +44 (0)1482 323577

Butler Direct Pty Ltd.
Level 46, Citigroup Building, 2 Park Street,
Sydney, NSW, 2000, Australia
Tel: +61 (02) 8705 6960
Fax: +61 (02) 8705 6961

Butler Group
245 Fifth Avenue, 4th Floor,
New York, NY 10016, USA
Tel: +1 212 652 5302
Fax: +1 212 202 4684

For more information on Butler Group’s Subscription Services please contact one of the local offices above.

Important Notice
This report contains data and information up-to-date and correct to the best of our knowledge at the time of
preparation. The data and information comes from a variety of sources outside our direct control, therefore
Butler Direct Limited cannot give any guarantees relating to the content of this report. Ultimate responsibility
for all interpretations of, and use of, data, information and commentary in this report remains with you. Butler
Direct Limited will not be liable for any interpretations or decisions made by you.



TotalView ReplayEngine Tech Audit

  • 1. Reference Code: TA001575ADT Publication Date: May 2009 Author: Chandranshu Singh and Michael Azoff TECHNOLOGY AUDIT TotalView 8.7, ReplayEngine 2.0 TotalView Technologies BUTLER GROUP VIEW ABSTRACT TotalView Technologies provides ReplayEngine 2.0, a reverse debugging tool that records and deterministically replays execution history of programs to which it is attached. The tool simplifies troubleshooting of application errors, and enables developers to work back from failures to their causes, thereby enhancing developer productivity and facilitating quick resolution of hitherto hard-to-find bugs. TotalView ReplayEngine (RE) works within the TotalView 8.7 environment which provides comprehensive debugging capabilities for distributed multi-threaded and multi-process programs. TotalView also provides memory debugging capabilities through its separately licensed TotalView debugger. The solution is well suited to distributed parallel development environments. TotalView 8.7 and ReplayEngine 2.0 are aimed at specific industry verticals engaged in development of distributed, multi-core applications, and also at the High Performance Computing (HPC) environments. Butler Group believes that the solution should be considered by all organisations developing C/C++ and FORTRAN on the Linux platform. KEY FINDINGS By recording execution history ReplayEngine TotalView 8.7 enables debugging of parallel 2.0 solves the time-consuming problem of applications, and supports various MPI determining failure causes. architectures. TotalView 8.7’s distributed architecture is ReplayEngine 2.0 enables analysis of well suited for popular HPC clusters. programs with shared memory or multiple threads. Recording overheads may be unsuitable for TotalView does not cover the Microsoft certain real-time/near-time requirements. Windows platform. ReplayEngine depends on and works within Graphical visualisation features are provided the TotalView 8.7 environment. 
for data structures and execution paths. LOOK AHEAD TotalView Tech plans for future versions of ReplayEngine to support backwards continue operations, to provide a graphical time line of recorded history, and to separate the process of recording execution history and the process of replaying and examining that history. TotalView Technologies – TotalView 8.7, ReplayEngine 2.0 Published 05/2009 © Butler Group. This Technology Audit is a licensed product and is not to be photocopied Page 1
  • 2. TECHNOLOGY AUDIT FUNCTIONALITY Developer productivity is an issue for all organisations engaged in software development, irrespective of their scale or nature. Developer time spent on debugging and understanding application behaviour is usually greater than that spent on writing code, as debugging typically involves analysing many variables in unpredictable ways, which renders analysis in limited schedules very difficult. Developers therefore depend on source-code debuggers – products that provide visibility into the execution state of programs – to try to work their way back from failures and system crashes by analysing system state dump (core) files and attempting to replicate failure conditions in test environments to arrive at the root cause of failure. This may take anywhere between hours and months, or may even never happen, depending on the complexity of the problem. Management may blame this on IT departments’ incompetence, while the real reason may well be the non-deterministic nature of some kinds of problems that can arise in production environments. Non- deterministic failures can be very difficult to replicate in testing. Some classes of error leave no helpful traces in logs and core files may provide an incomplete picture of the failure. Product Analysis TotalView Technologies’ new offering, ReplayEngine, overcomes the aforementioned problem by enabling users to record program state in execution and deterministically replay it, thereby simplifying troubleshooting activities. This is a new concept of reverse debugging, where the flow of an application can be moved backward in time, as well as forward stepped, providing a far stronger approach to debugging than purely forward stepping. Developers, with the help of TotalView ReplayEngine, can now trace execution steps backwards from application crashes and arrive at the root cause of failure much more easily. 
Application programmers therefore no longer need to spend hours trying to recreate problem conditions with the help only of controlled forward execution using conventional debugging products. ReplayEngine enables developers to arrive at anomaly and error conditions that occurred long before actual failure by providing controlled backward execution capabilities. The troubleshooting process using a traditional forward-stepping debugger is cyclic in nature. During each stage the developer is trying to run the program to a specific point in order to answer some question that will help reach the next stage in understanding the error and fixing the problem. Once the developers have the program in the desired state they can examine where the program is and what the value of various programming language variables are at that point. Sometimes that is all that is needed, but more often a developer needs to inspect an earlier state in the program. This cycle of running to a point of interest, inspecting clues, and having to run again to a different point of interest may need to be repeated many times to complete the full analysis of a defective program. The reverse debugging paradigm enabled by products like ReplayEngine changes the entire model. Developers can run the program once and perform their entire analysis within a single session. They can work their way backwards, line-by-line if necessary, from the failure towards the root cause. This allows the application developer to focus more on the problem in the source code and less on a “problem reproducer” framework. The developer can examine points in the execution history as many times as needed to develop a solid understanding of both how the failure happened and how it can be avoided. TotalView Technologies – TotalView 8.7, ReplayEngine 2.0 Published 05/2009 © Butler Group. This Technology Audit is a licensed product and is not to be photocopied Page 2
  • 3. TECHNOLOGY AUDIT ReplayEngine functions inside the TotalView debugger environment as a plug-in product, as shown below in figure 1. By leveraging the capabilities of TotalView debugger, ReplayEngine can handle distributed multi- process and multi-threaded programs. The product records program input, such as network and file I/O, records information read from shared memory, and captures thread behaviour such as thread creation and context switching among threads. In replay mode, operations are similar to that of ‘rewind’ in a media player; it lets users review any part of the recorded program execution, set breakpoints/watchpoints, and jump to any part of the recorded history. In Butler Group’s opinion ReplayEngine provides value by drastically reducing the time spent on debugging activities. Butler Group is impressed with the handling of distributed and/or multi-threaded programs in the TotalView debugger. The debugger supports various Message Passing Interface (MPI) implementations such as MPICH and MPICH2, Open MPI, MVAPICH and MVAPICH2, Intel MPI, HP MPI, and LAM. For multi-process debugging TotalView allows users to control anything from a single process, or an arbitrary group of processes, through to an entire parallel job. For multi-threaded debugging TotalView allows independent control of each thread, and thread group, or operations that affect entire processes. Users can switch their view of variables and data between the processes to which the debugger is attached to drill down to the lowest level of granularity in any thread or process. The testing and debugging of multi-threaded programs is a well-recognised area of project risk in parallel program development. Issues related to defects in synchronisation and concurrency control that may occur in the actual execution environment can be difficult or impossible to replicate in a smaller-scale development and testing environment without the kinds of features provided by TotalView and ReplayEngine. 
For instance, a deadlock condition may cause the application to become unresponsive, but the same condition may evade detection when run using a traditional debugger. TotalView gives programmers investigating deadlocks and race conditions an easier time by displaying information about messages being exchanged in MPI programs in a graphical form. Cycles in these graphs can be an indication of message passing related to deadlocks. When the product behaviour depends sensitively on the timing of either inputs to the program or the execution of separate calculations, it can become very difficult to perform iterative analysis. TotalView gives the developer the ability to control both inputs and the detailed execution sequence of the program. ReplayEngine gives the developer the ability, once any given sequence is recorded, to review that sequence in as much detail as is necessary to understand how it leads to the failure. TotalView enables debugging of multi-process and multi-threaded programs by taking control of the new remote or local process or thread as it is created. The product allows users to see the value of a variable in each process or thread simultaneously, and places the executing processes and threads into groups so that operations like start, stop, step, and examine can be carried out on groups of processes and threads. Users can switch between processes and threads by selecting the process or thread name in the Root window. One innovative characteristic of ReplayEngine, when compared with other ways of recording program execution history, is that it is compatible with a wide range of real-world application architectures. Many advanced performance and correctness tools require that the program be built in special ways, that the program source code be annotated or processed by some tool that is not part of the normal toolchain, or that the program be launched directly from within the correctness tool. 
TotalView and ReplayEngine place very few hurdles of that nature between the developer and the process of using the tool. TotalView Technologies – TotalView 8.7, ReplayEngine 2.0 Published 05/2009 © Butler Group. This Technology Audit is a licensed product and is not to be photocopied Page 3
  • 4. TECHNOLOGY AUDIT The application can be built with or without debug symbols (though debug symbols allow for source-code rather than assembly-code display), using the production compiler; can be statically or dynamically linked with any libraries; and can be launched either from within TotalView or launched and “attached to” separately with TotalView and ReplayEngine. If the user attaches to an already running product with ReplayEngine the recording of execution history starts at the point in time where ReplayEngine is attached (rather than the beginning of the program’s actual execution). Overall Butler Group is of the opinion that TotalView debugger along with features provided by ReplayEngine enables developers to take better control, and gain a better understanding, of their programs. The product is alive to the changing patterns of programming, and provides strong out-of-the-box features to support debugging of distributed multi-threaded, multi-process programs. TotalView debugger allows for editing and changing programs on the fly for investigation without needing to recompile, provides comprehensive memory debugging features, and, with the introduction of ReplayEngine, has covered a backward traceability feature gap that existed in debugging products. The product also supports a wide variety of Linux implementation platforms, but currently lacks Windows support. Product Operation TotalView 8.7 is a source-code debugger that allows users to trace program failures back to the root cause, analyse source code, and tune the performance of multi-process or multi-threaded programs running on various platforms. TotalView provides a GUI interface which enables dynamic source-code analysis, and memory debugging for C/C++ and FORTRAN programs. The solution enables diagnosis and resolution of complex problems like deadlocks, memory leaks, and race conditions. 
TotalView is a distributed parallel application which has a main process (tvmain) that creates and controls lightweight agent processes known as TotalView debugger servers (tvdsvr). During product operation, the main process runs on any one system, and agent processes run on other nodes in the cluster. Users interact through a graphical user interface with the main process. The lightweight agent processes are controlled by the main process and perform operations to control and interrogate the user processes being debugged on the respective nodes. ReplayEngine is integrated into both the main process and the agent processes. TotalView’s operations are similar to other debuggers, hence developers can start using the product with a minimal learning curve. Programs need to be compiled with ‘-g’ switch so that they can be analysed in terms of the program source code. If this is not done the debugger can still be used to control and examine the program; however, the display of what the program is doing will be at the assembly-language level rather than the source-code level. A program can be started under TotalView through a New Program wizard or by supplying the program name as a command-line argument. Alternatively the user can start the program separately and use the “attach to” operation to gain control of the running application with the debugger. TotalView Root and Process windows appear when execution starts. The Root window provides an overview of all processes and threads being inspected by the product; listed entries can be selected to provide in-depth information about the process or thread. The Process window provides detailed information about function and system calls through a stack trace pane; when an entry in the stack trace is selected corresponding detailed information about identifiers and register values are populated in the adjoining stack frame. 
The middle pane provides the source code for the function being investigated and allows the user to set breakpoints. The bottom pane allows users to switch between three views: Action Points, Processes, and Threads.

TotalView Technologies – TotalView 8.7, ReplayEngine 2.0 Published 05/2009 © Butler Group. This Technology Audit is a licensed product and is not to be photocopied Page 4
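The division of labour between the main process and its agent servers can be pictured with a small in-memory model. This is a sketch only. `AgentServer`, `MainProcess` and the "halt"/"go" commands are hypothetical names. The network transport and OS-level process control used by the real tvmain and tvdsvr processes are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class AgentServer:
    """Hypothetical stand-in for a per-node debugger server (tvdsvr)."""
    node: str
    pids: list[int]
    states: dict[int, str] = field(init=False)

    def __post_init__(self) -> None:
        # Every user process on this node starts out running.
        self.states = {pid: "running" for pid in self.pids}

    def handle(self, command: str) -> dict[int, str]:
        # Apply a control command to every local user process and report back.
        new_state = {"halt": "stopped", "go": "running"}.get(command)
        if new_state is None:
            raise ValueError(f"unknown command: {command}")
        for pid in self.states:
            self.states[pid] = new_state
        return dict(self.states)


class MainProcess:
    """Hypothetical stand-in for the single controlling process (tvmain)."""

    def __init__(self, agents: list[AgentServer]) -> None:
        self.agents = {agent.node: agent for agent in agents}

    def broadcast(self, command: str) -> dict[str, dict[int, str]]:
        # The GUI talks only to this process; it forwards each command to
        # every agent and merges the replies into one cluster-wide view.
        return {node: agent.handle(command) for node, agent in self.agents.items()}
```

For example, `MainProcess([AgentServer("node0", [101, 102]), AgentServer("node1", [201])]).broadcast("halt")` returns the stopped state of all three processes, grouped by node. This is what lets a single window present a cluster-wide view.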
A breakpoint in TotalView is known as an Action Point because the product allows users to associate actions with breakpoints. This allows users to print values, evaluate expressions, or insert a code snippet without recompilation, with or without stopping code execution.

Figure 1: ReplayEngine Architecture Diagram (Source: TotalView)

The product also enables examination of complex user-defined data structures such as arrays, linked lists, and structures. Users can drill down to the lowest level of granularity and examine the values of individual data elements; this feature is known as diving. Users can also specify a range of array indices and filter data according to user-defined conditions. Another intuitive feature of the solution is the Expressions Window, which shows the values of variables even when they are not present in the routine currently being investigated. Users can add variables and expressions (such as x[i+1]) to the expression list at any time.

Watchpoints provide another way to examine data elements. Users can define a watchpoint for a specific variable, and the watchpoint stops code execution as soon as the variable's value changes, irrespective of which instruction caused the change.
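TotalView's action points and watchpoints work on compiled programs from outside the process, and that cannot be reproduced in a few lines. As a loose analogy only, Python's standard `sys.settrace` hook can observe a running function line by line. That is enough to sketch both ideas:

- an "action" that runs at a chosen line without stopping or editing the program;
- a software watchpoint that reports which line changed a variable.

The helper names are hypothetical, and line positions are counted from the function's `def` line. The watchpoint compares values after every line, so it suits only simple (immutable) values. Native debuggers usually rely on processor support for watchpoints instead.

```python
import sys
from types import FrameType
from typing import Any, Callable

_UNSET = object()  # marks "variable not assigned yet"


def _traced_call(func: Callable[[], Any],
                 on_event: Callable[[FrameType, str], None]) -> Any:
    """Call func() with a line-level trace hook limited to func's own frame."""
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None            # ignore every other function
        on_event(frame, event)     # a "line" event fires before the line runs
        return tracer

    previous = sys.gettrace()      # restore any existing tracer afterwards
    sys.settrace(tracer)
    try:
        return func()
    finally:
        sys.settrace(previous)


def run_with_action_point(func: Callable[[], Any], offset: int,
                          action: Callable[[FrameType], None]) -> Any:
    """Run func(), calling action(frame) each time the line `offset` lines
    below its def is about to execute; execution is never stopped."""
    target = func.__code__.co_firstlineno + offset

    def on_event(frame: FrameType, event: str) -> None:
        if event == "line" and frame.f_lineno == target:
            action(frame)

    return _traced_call(func, on_event)


def run_with_watchpoint(func: Callable[[], Any], name: str):
    """Run func(); return (result, changes) where each change to local `name`
    is recorded as (line offset that made it, old value, new value)."""
    first = func.__code__.co_firstlineno
    changes = []
    prev_line, prev_value = None, _UNSET

    def on_event(frame: FrameType, event: str) -> None:
        nonlocal prev_line, prev_value
        if event not in ("line", "return"):
            return
        value = frame.f_locals.get(name, _UNSET)
        if value != prev_value:
            # The change was made by the line that ran just before this event.
            old = None if prev_value is _UNSET else prev_value
            changes.append((prev_line, old, value))
            prev_value = value
        prev_line = frame.f_lineno - first

    result = _traced_call(func, on_event)
    return result, changes
```

In a function whose body is `total = 0`, then `for i in range(3):`, then `total += i`, then `return total`, an action point at offset 3 sees `i` equal to 0, 1 and 2. A watchpoint on `total` attributes each change to the line that actually made it.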
TotalView provides users with a scriptable Command Line Interface (CLI), which gives access to all the fundamental (and quite a few advanced) debugging operations. The CLI is an extension of the Tool Command Language (Tcl). Users can enter Tcl statements in a CLI window to manipulate the program being debugged, and use the commands that TotalView adds to Tcl to debug it. The CLI also allows users to create their own commands and to use looping constructs.

The new addition to the TotalView product line is ReplayEngine. ReplayEngine records the execution history of programs and then replays it. This enables backward debugging and removes the need to recreate failure conditions for root-cause analysis. ReplayEngine operates in two modes:

- Record, in which it saves the program's execution history;
- Replay, in which users can view the state of the program as it was when any previously executed statement ran.

Full information on all variables is available for the entire span of recorded execution history. It can be inspected and explored using the data display features described above. Since history is immutable, some features are unavailable in Replay mode, such as changing a variable's value, calling functions that alter memory, and running threads asynchronously.

The ReplayEngine toolbar provides Prev, Unstep, Caller, BackTo, and Live commands. ‘Prev’ displays the program state that existed when the previous statement was executed; the command skips over any function call made in that statement. ‘Unstep’ is similar to Prev, except that if a function call was made in the previous statement, it moves control to the last statement of that subroutine. ‘Caller’ displays the state that existed before the current routine was called. ‘BackTo’ displays the program state for a statement selected by the user, provided the selected statement executed before the currently displayed line.
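The replay commands above can be modelled as movements over an immutable, recorded list of statements. This is a sketch under stated assumptions:

- `Step` and `ReplaySession` are hypothetical names;
- the history is written out by hand rather than recorded;
- each step carries only a call depth, a label and a variable snapshot;
- ReplayEngine's actual recording mechanism is not described in this report.

A `live()` method, corresponding to the Live command, returns to the newest point. Walking back past the first recorded step raises an error, mirroring the fact that history attached mid-run starts at the point of attachment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One recorded statement: call depth, source label, variable snapshot."""
    depth: int
    label: str
    variables: dict


class ReplaySession:
    """Hypothetical model of navigating an immutable execution history."""

    def __init__(self, history: list[Step]) -> None:
        if not history:
            raise ValueError("nothing has been recorded")
        self.history = tuple(history)     # recorded history is never modified
        self.pos = len(self.history) - 1  # start at the newest ("live") point

    @property
    def current(self) -> Step:
        return self.history[self.pos]

    def _back_to_first(self, predicate: Callable[[Step], bool]) -> Step:
        # Walk backwards to the most recent earlier step matching predicate.
        for i in range(self.pos - 1, -1, -1):
            if predicate(self.history[i]):
                self.pos = i
                return self.current
        raise LookupError("no matching step in recorded history")

    def prev(self) -> Step:
        # Previous statement at this call depth or shallower: skips callees.
        depth = self.current.depth
        return self._back_to_first(lambda s: s.depth <= depth)

    def unstep(self) -> Step:
        # Previous recorded statement, even if it lies inside a callee.
        return self._back_to_first(lambda s: True)

    def caller(self) -> Step:
        # The statement in the calling routine that made the current call.
        depth = self.current.depth
        return self._back_to_first(lambda s: s.depth < depth)

    def back_to(self, label: str) -> Step:
        # Most recent earlier execution of the selected statement.
        return self._back_to_first(lambda s: s.label == label)

    def live(self) -> Step:
        # Return to the newest point, where recording would resume.
        self.pos = len(self.history) - 1
        return self.current
```

Consider a history in which `main` calls `f`. From the statement after the call, `unstep()` lands on `f`'s last statement, while `prev()` lands on the call statement itself.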
The ‘Live’ command switches from Replay mode back to Record mode, and returns control to the statement that was about to be executed when the mode was switched to Replay.

TotalView also supports debugging of memory-related issues. The memory debugger module can stop program execution when heap memory is allocated or freed illegitimately. Keeping track of illegitimate memory allocation and de-allocation by the program helps identify the statements in the code that are causing problems. The product also reports dangling pointers and memory leaks. Memory leaks are blocks that have been allocated by the program but are no longer referenced by it, and so can never be freed. Identifying memory leaks is a crucial step towards problem resolution, as a significant leak may cause the program to run out of memory some time after it starts and therefore crash.

The TotalView memory debugger can write bit patterns into allocated and de-allocated blocks. This helps to identify whether the program is using memory that has not yet been initialised, or is referencing a location that has already been de-allocated. The product can also hold on to de-allocated memory and check whether the program still uses it. In addition, the product can detect memory bounds overflows, a common yet potentially dangerous problem. It does so by allocating blocks adjacent to each block allocated by the program, initialising them to a set bit pattern, and checking them for overwrites.

Areas where Butler Group believes TotalView can enhance its offering are:

- reducing its footprint and processing overhead, so that it can be used with real-time or near-real-time performance-critical applications;
- expanding its OS platform support to include Microsoft Windows.
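The bit-pattern and adjacent-block (guard) techniques described above can be illustrated with a toy heap held in a Python `bytearray`. Everything here is hypothetical: the fill values, the 8-byte guard size and the `GuardedHeap` interface are illustrative choices, not TotalView's. The leak check takes the set of still-referenced blocks as an argument. A real leak detector has to discover them itself, typically by scanning memory for pointers.

```python
ALLOC_FILL = 0xA1  # hypothetical pattern painted into new blocks
FREE_FILL = 0xF1   # hypothetical pattern painted into freed blocks
GUARD_FILL = 0x6B  # hypothetical pattern for the adjacent guard regions
GUARD_SIZE = 8     # hypothetical guard size in bytes


class GuardedHeap:
    """Toy heap: each block is stored as [guard][user bytes][guard]."""

    def __init__(self) -> None:
        self.blocks: dict[int, tuple[bytearray, int]] = {}  # handle -> (buffer, size)
        self.freed: set[int] = set()  # freed blocks are held, never reused
        self._next = 1

    def malloc(self, size: int) -> int:
        buf = bytearray([GUARD_FILL] * GUARD_SIZE + [ALLOC_FILL] * size
                        + [GUARD_FILL] * GUARD_SIZE)
        handle, self._next = self._next, self._next + 1
        self.blocks[handle] = (buf, size)
        return handle

    def write(self, handle: int, offset: int, data: bytes) -> None:
        # Unchecked store, like writing through a C pointer: it may spill
        # into a guard or land in a block that was already freed.
        buf, _ = self.blocks[handle]
        start = GUARD_SIZE + offset
        if start < 0 or start + len(data) > len(buf):
            raise IndexError("write falls outside the simulated region")
        buf[start:start + len(data)] = data

    def free(self, handle: int) -> None:
        if handle not in self.blocks or handle in self.freed:
            raise RuntimeError(f"illegal free of block {handle}")  # e.g. double free
        buf, size = self.blocks[handle]
        buf[GUARD_SIZE:GUARD_SIZE + size] = bytes([FREE_FILL]) * size
        self.freed.add(handle)

    def looks_uninitialised(self, handle: int, offset: int) -> bool:
        # Heuristic: a byte still holding the allocation pattern was probably
        # never written (a program could legitimately store that value).
        buf, _ = self.blocks[handle]
        return buf[GUARD_SIZE + offset] == ALLOC_FILL

    def check(self) -> list[tuple[int, str]]:
        problems = []
        for handle, (buf, size) in self.blocks.items():
            body = buf[GUARD_SIZE:GUARD_SIZE + size]
            guards = buf[:GUARD_SIZE] + buf[GUARD_SIZE + size:]
            if any(b != GUARD_FILL for b in guards):
                problems.append((handle, "bounds overflow"))
            if handle in self.freed and any(b != FREE_FILL for b in body):
                problems.append((handle, "write after free"))
        return problems

    def leaks(self, referenced: set[int]) -> list[int]:
        # Still allocated, yet nothing in the program refers to it any more.
        return sorted(h for h in self.blocks
                      if h not in self.freed and h not in referenced)
```

Writing five bytes into a four-byte block overwrites the trailing guard, and `check()` reports it. A store through a handle that was already freed shows up as a write after free because the freed pattern has changed.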
Combining ReplayEngine and the traditional forward debugger in a single package would also be useful.

Product Emphasis

TotalView Technologies is one of a select few debugging tool vendors to have brought reverse debugging tools to market. Butler Group is impressed with the capabilities of the TotalView debugger, and is of the opinion that TotalView is equally suited to addressing the debugging and troubleshooting challenges of all market segments. Furthermore, given its architecture, the product is particularly suited to distributed, parallel-processing High Performance Computing (HPC) environments.
DEPLOYMENT

TotalView ReplayEngine is a separately licensed feature of the TotalView debugger. Both TotalView and ReplayEngine can be deployed by users themselves. Deployment involves downloading and setting up the software, and then obtaining licences to use the product. The product is usually deployed on a network-mounted file system accessible from every node in the cluster. Installation typically takes minutes to hours, as it involves download and installation at the end-user level. The TotalView product is a closely integrated development suite and does not support modular deployment. No additional resources are required for end users to make use of the product after deployment.

Training for TotalView products is provided online, on site, and at major industry conferences. However, TotalView debugger and ReplayEngine operations are intuitive and self-explanatory for end users with experience of GUI debuggers. Technical support by telephone and e-mail is available for TotalView products with the purchase of a maintenance contract. The contract also includes access to new product versions and updates.

The products are available on x86 32- and 64-bit processor-based systems running:

- Red Hat Enterprise Linux and Fedora Linux versions;
- Novell SuSE Enterprise and Desktop Linux versions;
- Ubuntu Linux.

ReplayEngine is an optional component of the TotalView debugger, and therefore depends on it; the product does not depend on any other products from TotalView or third parties. TotalView, but not ReplayEngine, is also available for:

- Mac OS X;
- UNIX operating systems such as Solaris, AIX, HP-UX, and IRIX;
- the Linux variants used on IBM BlueGene, Cray XT series, SGI Altix (Itanium2 and x86-64), and SiCortex supercomputers.
TotalView debugger and ReplayEngine are stand-alone debugging tools that fit into existing software development practices and can be used with a wide variety of development environments. Developers who work with command-line tools such as ‘cc’ and ‘make’ already have the skills required to use ReplayEngine and TotalView. The same is true of those who use Integrated Development Environments (IDEs) such as Sun Studio for Linux or Eclipse CDT. ReplayEngine and TotalView rely on standard information placed into the executable by the compilation process. They are agnostic to the user's choice of tools for the edit and build processes. TotalView Technologies has made an experimental Eclipse plug-in for TotalView available as a free download. A separate TotalView Workbench serves as a point of integration for third-party development and debugging tools. The products work with source code written in C, C++, and FORTRAN, and with the corresponding executables.

PRODUCT STRATEGY

TotalView targets verticals with complex computing requirements. The segments specifically targeted by the company include:

- Oil and Gas;
- Independent Software Vendors;
- Financial Services;
- Digital Content Creation;
- Computer Aided Engineering;
- Computational Fluid Dynamics;
- Aerospace Engineering.

The company believes that TotalView debugger and ReplayEngine are suitable for development organisations of all sizes and should be adopted by all to aid their development process.
ReplayEngine is a time-saving tool that helps programmers develop and troubleshoot programs quickly. TotalView provides a method for calculating the product's Return on Investment (ROI) per developer. The ROI is the annual saving in developer time minus the annual cost of ownership of the debugger per developer, divided by that same annual cost of ownership. The fraction of developer time saved varies between developers and between problems, so it is best to arrive at a range estimate. TotalView suggests that a 10% reduction in problem resolution time yields an ROI of 3.2 per year, assuming an average developer salary. By the same estimate, a 50% reduction yields an ROI of 20. The company claims that these are conservative estimates and that actual returns could be higher. Other, intangible benefits include resolution of long-pending problems, greater predictability in development schedules, and resolution of bugs that are costly to isolate.

TotalView debugger and ReplayEngine are sold through the company's established channel, which comprises direct sales in the USA and resellers in other geographies. TotalView's business partners include Intel, Red Hat, SGI, IBM, AMD, Sun, HP, Novell, and James Rivers, among others. The company has established technical partnerships with Absoft, ClusterCorp, PGI, GCC, and PBS Gridworks, among others.

TotalView debugger and ReplayEngine are licensed on a perpetual, variable-capacity model whereby a number of licences (called tokens by TotalView) can be shared among a team of developers. A given number of tokens allows end users to debug the same number of processes simultaneously. The tokens can be used by one or more members of the development team at the same time, provided the total number of processes being debugged does not exceed the number of tokens purchased.
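The ROI method described above amounts to ROI = (S - C) / C. Here S is the yearly saving in developer time and C is the yearly cost of ownership, both per developer.

The two quoted figures agree with each other if savings grow linearly with the reduction in resolution time:

- an ROI of 3.2 at 10% means S = 4.2 C;
- at a full reduction that is 42 C;
- a 50% reduction therefore gives S = 21 C and an ROI of 20.

The sketch below checks that arithmetic. The linear-scaling assumption is only an illustration and is not a figure TotalView states. The sketch also models the token licence as a hypothetical shared pool in which each debugged process holds one token. `TokenPool` and its methods are illustrative names, not part of TotalView's licensing software.

```python
def roi(savings: float, cost: float) -> float:
    """ROI per developer per year: (savings - cost of ownership) / cost of ownership."""
    return (savings - cost) / cost


def implied_full_savings(roi_value: float, reduction: float) -> float:
    """Savings/cost ratio at a 100% reduction implied by one quoted ROI.

    Assumes (hypothetically) that savings scale linearly with the reduction.
    """
    return (roi_value + 1.0) / reduction


class TokenPool:
    """Hypothetical shared licence pool: every debugged process holds one token."""

    def __init__(self, tokens: int) -> None:
        self.tokens = tokens
        self.in_use: dict[str, int] = {}  # developer -> processes being debugged

    def available(self) -> int:
        return self.tokens - sum(self.in_use.values())

    def start_session(self, developer: str, processes: int) -> bool:
        # Grant the session only if a token is free for every process.
        if processes > self.available():
            return False
        self.in_use[developer] = self.in_use.get(developer, 0) + processes
        return True

    def end_session(self, developer: str) -> None:
        # Return all of this developer's tokens to the pool.
        self.in_use.pop(developer, None)
```

With the quoted figures, `implied_full_savings(3.2, 0.10)` gives about 42, and `roi(0.5 * 42, 1.0)` gives 20, matching the 50% estimate. With an eight-token pool, one developer debugging six processes leaves room for another to debug at most two.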
TotalView debugger and ReplayEngine are separately licensed products.

TotalView Technologies plans to release a new version of ReplayEngine every six months. It wants to broaden the product's scope so that it suits a wider range of High Performance Computing (HPC) cluster environments. The company plans for future versions of ReplayEngine to:

- support backwards continue operations;
- provide a graphical timeline of recorded history;
- separate the recording of execution history from the replaying and examination of that history.

The market for debugging tools can be roughly divided into two parts. The first is traditional application development for single-process machines. The second is development of multi-threaded applications for single-core machines and for truly parallel multi-core machines, the latter being dominated by the HPC segment. The former market is well served today, in that interactive debugging tools are commonly available. The market challenge today is that many of the tools supplied by hardware and OS vendors and by Independent Software Vendors (ISVs) fail to support the needs of developers retooling their applications for multiple cores. Even applications that hardly resemble traditional HPC applications now need to be explicitly parallel to take advantage of multi-core processors, and pressure is building for existing applications to make the transition. This is causing a surge in demand for tools that can debug multi-threaded and multi-process applications. Debugging programs in HPC/supercomputing environments requires superior technical capabilities, as applications there are inherently multi-process, parallel, and distributed.
However, with the entry of multi-core into the mainstream, what was a niche market has prospects of growing into the dominant market paradigm, borrowing many of the techniques common in HPC. Butler Group believes that this transition will only be successful with the benefit of automation to support such complex programming environments: the mass market has neither the skills nor the (business) time to master multi-core with manual methods. This potentially lucrative market is therefore ideal for TotalView debugger 8.7 and ReplayEngine 2.0, over and above their use in purely HPC environments. In Butler Group's opinion, TotalView Technologies is well placed to capitalise on the opportunities provided by this growing market.
COMPANY PROFILE

TotalView Technologies is a provider of troubleshooting and analysis tools for source code in C/C++ and FORTRAN. The company is headquartered in Natick, Massachusetts, USA, and has offices in Mississippi and Toronto. TotalView's sales, distribution, and support networks are housed in these offices, supported by the company's distributor channel in Europe, South America, Asia, and the Middle East. The company is publicly listed in Norway. TotalView's customers include Applied Research Associates, OpenGeoSolutions, CINECA, STFC Daresbury Laboratory, Weston Geophysical, SIMULIA, Ultra Electronics, and Stanford University, among others. The company has around 1,200 customers overall, and TotalView tools are used at the department level in these organisations.

SUMMARY

TotalView 8.7 and ReplayEngine 2.0 together provide comprehensive dynamic source-debugging capabilities suitable for all market segments, including HPC environments, and for organisations of all sizes. The products are geared to greatly enhance developer productivity in client organisations. Reverse debugging represents a paradigm shift in established debugging practices, brought about by recent technological advances; TotalView is one of a select few vendors providing these capabilities at present. Given the trend towards multi-core computing, there will be a growing need to support debugging of parallel programs. In Butler Group's opinion, this is where TotalView has a particular market opportunity.

Table 1: Contact Details

TotalView Technologies
24 Prime Park Way
Natick, MA 01760
USA
Tel: +1 (508) 652 7700
E-mail: info@totalviewtech.com
www.totalviewtech.com

Source: TotalView

Headquarters
Shirethorn House
37/43 Prospect Street
Kingston upon Hull
HU2 8PX, UK
Tel: +44 (0)1482 586149
Fax: +44 (0)1482 323577

Butler Direct Pty Ltd.
Level 46, Citigroup Building
2 Park Street, Sydney
NSW, 2000
Australia
Tel: +61 (02) 8705 6960
Fax: +61 (02) 8705 6961

Butler Group
245 Fifth Avenue, 4th Floor
New York, NY 10016
USA
Tel: +1 212 652 5302
Fax: +1 212 202 4684

For more information on Butler Group's Subscription Services please contact one of the local offices above.

Important Notice
This report contains data and information up-to-date and correct to the best of our knowledge at the time of preparation. The data and information comes from a variety of sources outside our direct control, therefore Butler Direct Limited cannot give any guarantees relating to the content of this report. Ultimate responsibility for all interpretations of, and use of, data, information and commentary in this report remains with you. Butler Direct Limited will not be liable for any interpretations or decisions made by you.