Work Unit Analysis Tool

The Workunit Analyser examines the entire workunit to produce advice that both novices and experienced ECL developers should find useful. The Workunit Analyser is a post-execution analyser that identifies potential issues and assists users in writing better ECL.



  1. 2019 HPCC Systems® Community Day: Challenge Yourself – Challenge the Status Quo. Workunit Analysis Tool, Tech Review. Shamser Ahmed, shamser.ahmed@lexisnexisrisk.com
  2. Overview • Why analyze workunits? • Analyzing workunits manually • Introducing the Workunit Analysis Tool • Demonstration • Challenges • Concluding remarks • Questions & Suggestions
  3. Why analyze workunits?
  4. Why analyze workunits? Examine the graph to • Determine if the job is as efficient as possible • The graph may not be optimal • Issues: redundant/duplicate activities, inefficient sorting, inefficient joins, too many sub-graphs, skew-related issues, etc. • Human guidance may be necessary • Reveal errors in ECL • Is the platform doing what you expect? • Platform-related issues • Why is my job running slower than before?
  5. Why analyze workunits? Examine graph metrics to identify issues with • Skews • Spills • External services • Less than optimal operations (join, sort, distribute, etc.) • Does the actual time taken match the expected time?
  6. Why analyze workunits? • To make sure the platform is doing what you expected it to do • To have the information necessary to optimize the ECL code and identify issues. An ECL-related project should not be considered complete until a thorough graph analysis has been completed.
  7. Analyzing workunits manually
  8. Analyzing a workunit: a walk-through
  9. Analyzing a workunit: a walk-through
  10. Analyzing a workunit: a walk-through
  11. Analyzing a workunit: a walk-through
  12. Analyzing a workunit: a walk-through
  13. Analyzing a workunit: a walk-through
  14. Analyzing a workunit: a walk-through
  15. Analyzing a workunit: a walk-through
  16. Analyzing a workunit: a walk-through
  17. Analyzing a workunit: a walk-through
  18. Analyzing a workunit: a walk-through
  19. So, do we routinely analyze workunits? • Always? • Sometimes? • Rarely?
  20. So, do we routinely analyze workunits? • Probably not enough • Probably not in sufficient depth • Why? • Difficult to fully understand large graphs • Difficult to digest the large number of metrics • Difficult to interpret the metrics • Not having the time
  21. Introducing the Workunit Analysis Tool
  22. Introducing the Workunit Analysis Tool • Analyzes the workunit to provide information useful for • Improving performance • Diagnosing issues
  23. How it works?
      Rules: Distribute skew rule • IO Disk read skew rule • IO Disk write skew rule • Spill skew rule • Spilling in few nodes rule • Keyed join rule • Lookup join rule • Sequential slow rule • Slow external call
      Process: Graph → Split into activities → Match rules → Issues → Calc cost → Report highest cost issues
      Example issues:
      Activity | Issue | Cost
      a3 | Distribute skew worse than input dataset | 3000
      A5 | Heavily skewed IO | 2000
  24. How cost is calculated? • Cost is actual time taken − theoretical ideal time
      Example: 400-way Thor. An activity's metrics show elapsed time:
      Slowest node | Average node | Activity
      45 minutes | 10 minutes | 45 minutes
      Theoretical ideal ≈ average node's elapsed time, i.e. 10 minutes
      Cost = max − ideal, i.e. 45 − 10 = 35 minutes
  25. Demonstration
  26. Workunit Analysis Tool demo
  27. Workunit Analysis Tool demo
  28. Workunit Analysis Tool (command line) demo
  29. Workunit Analysis Tool (command line) demo
  30. Challenges
  31. Challenges
  32. Challenges
  33. Challenges
  34. Concluding remarks
  35. How it should be used • It is a tool for the developer. It does not decide if something is wrong or right: developers should interpret the information and decide what changes (if any) are needed. • It will not catch every problem. There will always be cases that have not been considered or implemented. Workunits of concern should be analyzed manually.
  36. Features Planned • Improve cost calculation • More rules • Skews: global sort, spilling skews (some nodes spilling, others not), all on one node, unbalanced join and other excessive skews • Issues caused by sequential operation • Slow joins • Ratio of disk IO time to size read out of line • Index read/keyed join & large number of rejected rows • Large amount of time in functions & SOAP calls • Long time waiting for queues • Proportion of time spent spilling to other work • Live analysis: analyze a workunit whilst it’s executing • ROXIE support
  37. Concluding remarks • Automatically analyzes the workunit after a job completes • Analyzes the entire workunit in seconds • Thoroughly analyzes the workunit: • Every graph & subgraph • Every metric • Every time • Now, every workunit may be analyzed every time it executes • Caveat: • Work in progress • Doesn’t eliminate manual analysis
  38. Questions? Shamser Ahmed, Senior Consulting SW Engineer, shamser.ahmed@lexisnexisrisk.com
  39. View this presentation on YouTube: https://www.youtube.com/watch?v=5F9WW89yDZw&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=3&t=0s (5:33:00)
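
The cost formula on slide 24 (cost is the slowest node's elapsed time minus the theoretical ideal, where the ideal is approximated by the average node's elapsed time) can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide, not the tool's actual implementation; the `ActivityTiming` class and the `skew_cost_minutes` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class ActivityTiming:
    """Elapsed time of one activity across the cluster, in minutes.

    Hypothetical structure for illustration; the real tool reads these
    values from the workunit's graph metrics.
    """
    slowest_node_minutes: float
    average_node_minutes: float


def skew_cost_minutes(timing: ActivityTiming) -> float:
    """Cost = actual time taken - theoretical ideal time (slide 24).

    The theoretical ideal is approximated by the average node's elapsed
    time; the actual time is the slowest node's elapsed time.
    """
    ideal = timing.average_node_minutes
    actual = timing.slowest_node_minutes
    return actual - ideal


# The example from slide 24: slowest node 45 minutes, average node 10 minutes.
example = ActivityTiming(slowest_node_minutes=45, average_node_minutes=10)
cost_minutes = skew_cost_minutes(example)
cost_seconds = cost_minutes * 60

print(f"cost: {cost_minutes:g} minutes ({cost_seconds:g} seconds)")
```

With the slide's numbers this gives 35 minutes, which is the 2,100 seconds quoted in the speaker notes.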

Editor's notes



  • In the presentation, I will be covering the following areas:
  • So, why would you want to analyze a workunit (WU)?

    I'd suggest that you'd examine the graph to...

    The graph may not be optimal (it is built from compile-time information)
    The code generator does not "know" about data until execution completes.
    Hints may be needed to guide the code generator
    A different action may be better suited
    Highlight inefficiencies in ECL code
    Too many small sub-graphs affecting performance
    Inappropriate joins: would a keyed join or a lookup join be better?
    Or spills at unexpected times

    The platform is not infallible
    The code generator could do a better job.
    The engines can always be optimised further
    … The team is constantly improving them
    ---------------------
    Analyzing WU may highlight issues in the design, data or architecture
    Hey, why is my job running slower?
    Regression in platform
    Or bug introduced in ECL
    Or has the data changed?

  • In addition to examining the graph, the graph metrics should be examined

    The metrics will highlight

    Skews causing the cluster to be used inefficiently: some nodes are idle whilst others are very busy

    Spills affect performance. They are usually necessary, but it may be possible to eliminate some of them

    External services (SOAP calls) becoming a bottleneck

    Would a lookup join or a keyed join be better? Would it assist in achieving a better distribution?

    How long do we expect that "work" to take? Does it match with the actual time taken?
  • Important to understand:
    how workunits are analyzed now
    Appreciate what the analysis tool does and how it works
  • Start by having a look at a real-world workunit and conducting some analysis.

    This WU executed on a 400-way Thor. As you can see, it took over 1 hour 17 minutes. That is quite a significant amount of resources, so it is definitely worth seeing if it's possible to reduce the total cluster time.
  • With a large Workunit on a busy system, it takes some time to gather and display the graph.
  • Eventually, the entire graph is shown.

    Many graphs, subgraphs and activities here.
    Too many to examine every one, so we'll focus on the activities having the biggest impact
  • I clicked the sub-graphs icon to get the timings related to the subgraphs and then clicked "TimeElapsed" to sort by timings
  • The list is sorted in reverse elapsed time order – subgraph with highest elapsed time shown first

    Clicking on that one to drill down
  • So, here we have the subgraph with the slowest execution time....

    We are going to examine the activities to see where the time is going
  • I've clicked Spill Read and see
    1) the maximum execution time is around 24 seconds
    2) the other metrics are not particularly interesting
  • Now examine Project Disk Read: max local execute time is 9 seconds

    But the skew is 400%. That would be significant, but the data is subsequently hash distributed
  • Quick process again, reducing the skew

    Finally, examining Local Join
  • 14 minutes doesn't sound significant, but 14 minutes × a 400-way cluster is worth reducing if possible

    Massive skew in local execute time: 3500%! It seems one node has a large number of spills, which needs examining

    Consider the previous hash distributes to see if skew may be reduced...


    So, we can carry on looking at elapsed time in other parts of the activities...

    More to do: examine a different metric
  • It’s not over. There are many more metrics to examine

    This is to give a taste of analysing a WU manually. I'll end the demo here, but in the real world the analysis would continue for far more subgraphs and metrics
  • So, that brings us to the question of "in the real world" do we ...

    I think the answer for most would be "less than we'd like"
  • A large, complex workunit takes significant cluster time
    Some graphs are very large and browsers struggle to render them quickly enough
    You'd be forgiven for not fully understanding all the metrics
    Expected value
    Best case for hardware, network bandwidth
    Need: a general feel for what the values should be
    Time consuming:
    Examine key metrics, for key graphs
    But a small graph may be important...

    So it would be great to have more assistance in analyzing WUs, which brings us to the Workunit Analysis Tool...
  • … The Workunit Analysis Tool is designed to assist the user in analysing workunits

     <read slide>

    Now:
    Automatic and routine
    More thorough
  • Suppose, heavily skewed data means …

    So, cost in this case would be 2,100 seconds.


    The cost calculation is not perfect: e.g. skews in upstream activities, complex relationships

  • I would like to show a demo of it working on a small piece of test ECL.

    Here's a short piece of ECL (that does nothing useful) designed to test the Analysis tool

    It outputs the first 100 users and first 100 urls, for no reason whatsoever.

    Workunit Analysis Tool is built into the workflow...
  • These are screenshots of a job I executed earlier...

    Within a fraction of a second after the WU completes, the potential issues are shown in the messages section...
  • When you may use the command line: we are going to have a quick look at the command-line version of the Analysis tool

    We'd not normally need to use the command line, but I'm examining the real-world workunit that we were looking at earlier

    To see what issues it detects
  • Bang

    A fraction of a second later

    The analysis is complete...

    You can see it has detected a couple of dozen issues. We found a couple of potential issues in our 5-10 minutes of manual analysis; two dozen issues were detected in less than a second.

    The list is sorted in reverse cost order, with the highest cost shown first...

    In the real world, I'd now examine the reported activities in more detail and see if we can do something about the areas of concern
  • It is a developer tool: a tool to assist developers


  • ...Your feedback and suggestions are invaluable
  •  <read slide>
    Analysis stored with workunit

    analysis is routine
    rather than when a problem arises or when the developer has time

    (3rd point): Potential for a more thorough analysis




  • Thank you for listening (and participating).

    We have time for any questions.

    Which workunits do you analyze?
    Every workunit?
    Only ones that take a long time?
    When jobs take longer than normal?
    How do you analyze workunits?
    Do you focus on particular parts of the graph?
    Particular metrics?
    Skews
    Elapsed time
    Data sizes
    That concludes the presentation. Feel free to contact me with questions, feedback and suggestions.

    Thank you very much for your attention.
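
As a recap of the process outlined on slide 23 (split the graph into activities, match rules, calculate a cost for each issue, report the highest-cost issues first), here is a minimal Python sketch. It is not the tool's real code: the `Activity` and `Issue` classes, the two rule functions, the `analyse` function and the timing values are all hypothetical, and the timings were chosen only so that the costs match the 3000 and 2000 shown in the slide's example table (the unit is assumed to be seconds).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Activity:
    """Hypothetical, simplified view of one graph activity's metrics."""
    activity_id: str
    kind: str            # e.g. "distribute" or "diskread" (assumed labels)
    max_seconds: float   # elapsed time on the slowest node
    avg_seconds: float   # elapsed time on the average node


@dataclass
class Issue:
    activity_id: str
    message: str
    cost_seconds: float


# A rule inspects one activity and returns an Issue if it matches, else None.
Rule = Callable[[Activity], Optional[Issue]]


def distribute_skew_rule(activity: Activity) -> Optional[Issue]:
    """Illustrative rule: flag a distribute whose slowest node lags the average."""
    if activity.kind != "distribute":
        return None
    cost = activity.max_seconds - activity.avg_seconds
    if cost <= 0:
        return None
    return Issue(activity.activity_id, "Distribute skew", cost)


def disk_read_skew_rule(activity: Activity) -> Optional[Issue]:
    """Illustrative rule: flag a disk read whose slowest node lags the average."""
    if activity.kind != "diskread":
        return None
    cost = activity.max_seconds - activity.avg_seconds
    if cost <= 0:
        return None
    return Issue(activity.activity_id, "Heavily skewed IO", cost)


def analyse(activities: List[Activity], rules: List[Rule], top_n: int = 10) -> List[Issue]:
    """Match every rule against every activity, then report highest cost first."""
    issues: List[Issue] = []
    for activity in activities:
        for rule in rules:
            issue = rule(activity)
            if issue is not None:
                issues.append(issue)
    issues.sort(key=lambda issue: issue.cost_seconds, reverse=True)
    return issues[:top_n]


# Made-up metrics for three activities; a7 is perfectly balanced.
activities = [
    Activity("a5", "diskread", max_seconds=2500, avg_seconds=500),
    Activity("a3", "distribute", max_seconds=3600, avg_seconds=600),
    Activity("a7", "distribute", max_seconds=100, avg_seconds=100),
]
rules = [distribute_skew_rule, disk_read_skew_rule]

report = analyse(activities, rules)
for issue in report:
    print(issue.activity_id, issue.message, issue.cost_seconds)
```

Keeping each rule as a small independent function mirrors the slide's list of separate rules and makes it easy to add more; how the real tool structures its rules is not stated in the presentation.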
