Batch jobs are the lifeblood of thousands of businesses, many of which run millions of batch jobs every year. Managing such volumes has become a serious burden: frequent errors demand substantial resources to validate and isolate. Even then, batch jobs still run into unexpected outages, and Service Level Agreement (SLA) violations threaten the proper operation of the business.
This webinar demonstrates how ignio™, the world’s leading cognitive system, has been helping customers tackle this complex problem. We share real-world examples of how ignio™ is implemented and highlight the lessons learned from these implementations.
Attend and learn:
Why batch job management issues are affecting business operations
How cognitive systems like ignio™ solve the complex issues of batch job management
How implementing ignio™ resolves customer problems, illustrated with real-world examples
Insights from practitioners on how this is implemented and the lessons learned from these implementations
Speakers:
Dr. Maitreya Natu, Digitate
Dr. Thomas Reuner, HfS Research
Victor Thu, Digitate
14. Confidential 14 A Tata Consultancy Services Venture
A Tata Consultancy Services Venture
Trusted by Fortune 500 Enterprises
Intelligently Managing Over 600,000 Infrastructure Resources
25+ Patents (pending)
400+ Employees Globally
Four Types of Cognitive Activities
(Chart axes: Impact vs. Activity Complexity)
• Procedural: Automate simple tasks
• Investigative: Perform complex activities without explicit instructions
• Analytical: Drive proactive continuous optimizations
• Planning: Strategize and plan for the future
Pioneers “Cognitive Automation”
Our First Product
Adaptive cruise control: Autonomous operations
Navigator: Investigate and guide
Self-learned enterprise context (insights & patterns about interconnected business applications and their infrastructures)
Machine learning & AI
Model-driven software engineering
Pre-built knowledge (about IT infrastructure technologies)
A layer of intelligence for enterprise technology and operations
Batch Processing | Problem Areas
High complexity and noise
• High heterogeneity
• Most of the time is wasted in eliminating noise
• High dependence on tacit knowledge
Difficulties in assessing impact of change
• Forced to react to business or technology changes
• Instability and high business impact
Surprises
• “Unexpected” outages, delays, and SLA violations
• Inability to prioritize actions
• Insufficient time to take corrective actions
Batch Processing | Key Obstacles
Scale and Complexity
• 100K+ jobs spread across many business units
• Complex inter-dependencies across jobs, processes, data feeds, vendor feeds, files, and hosts
• Complex relationships among workload, resources, and performance
Changing environment
• Jobs and dependencies change every day
• Changing compute and storage infrastructure allocation
Diversity
• Different applications, business units, and business processes
• Different schedulers – Autosys, ControlM, TWS, OpCon
• Different platforms – mainframes and distributed systems
• Different environments – prod, non-prod, dev, QA, UAT
Batch Processing | Problem Areas
• Intelligent Command Center: Improve transparency and eliminate noise
• Proactive Resilience: Generate proactive notifications to predict and prevent
• Agile Transformation: What-If and If-What analysis
Case Study: How a Customer Uses ignio™ for Batch Job Management
Context
• Proactive management of batch jobs of a leading bank in the UK
Scope and Scale
• 52 business units
• 2500 business processes
• ~23,000 batch jobs per day
• ~100,000 job-job dependencies
• 2 batch schedulers – OpCon and ControlM
Blueprint Construction
360-degree view: Graph model relating business units to business processes to batch jobs
Nodes represent business units, business processes (streams), and jobs
Edges represent precedence and containment relationships
Nodes and edges are associated with static and dynamic attributes (e.g., job start time, run time, end time, …)
Entities, relationships, and attributes are mined from batch schedulers, batch run logs, job definitions, SLA definitions, and other data sources
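As an illustration only, the graph model described on this slide could be sketched as follows. The class names, the attribute `avg_run_minutes`, and the sample business unit and jobs are hypothetical; this is not ignio's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                                   # "business_unit", "process", or "job"
    attrs: dict = field(default_factory=dict)   # static and dynamic attributes

@dataclass
class Blueprint:
    nodes: dict = field(default_factory=dict)
    contains: dict = field(default_factory=dict)   # parent -> set of children
    precedes: dict = field(default_factory=dict)   # job -> set of successor jobs

    def add_node(self, name, kind, **attrs):
        self.nodes[name] = Node(name, kind, attrs)

    def add_containment(self, parent, child):
        self.contains.setdefault(parent, set()).add(child)

    def add_precedence(self, before, after):
        self.precedes.setdefault(before, set()).add(after)

    def jobs_under(self, name):
        """Return all jobs contained, directly or indirectly, in a node."""
        found, stack = set(), [name]
        while stack:
            current = stack.pop()
            for child in self.contains.get(current, ()):
                if self.nodes[child].kind == "job":
                    found.add(child)
                stack.append(child)
        return found

# Hypothetical sample data
bp = Blueprint()
bp.add_node("Payments", "business_unit")
bp.add_node("EOD settlement", "process")
bp.add_node("load_feed", "job", avg_run_minutes=12)
bp.add_node("reconcile", "job", avg_run_minutes=30)
bp.add_containment("Payments", "EOD settlement")
bp.add_containment("EOD settlement", "load_feed")
bp.add_containment("EOD settlement", "reconcile")
bp.add_precedence("load_feed", "reconcile")
print(sorted(bp.jobs_under("Payments")))
```

Keeping containment and precedence as separate edge sets makes it easy to ask either "which jobs belong to this business unit?" or "what runs after this job?" without mixing the two relationships.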
Normal Behavior Characterization
Profile vitals and issues
• Changes
• Trends
• Outliers
• Temporal patterns
Profile dependencies
• Influencers
• Influencees
• Cuts across technologies and business units
SmartTriggers
Suppress false alerts
• Dynamic thresholds for run time, start time, and end time
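One simple way to picture a dynamic threshold is to derive the alert bound from a job's own history instead of a fixed limit. The sketch below assumes a mean plus k standard deviations rule; the slides do not say which statistic is actually used, and the function names and sample run times are hypothetical.

```python
from statistics import mean, stdev

def dynamic_threshold(history, k=3.0):
    """Upper alert bound: mean + k * sample standard deviation of past values."""
    return mean(history) + k * stdev(history)

def should_alert(history, observed, k=3.0):
    """Alert only when the observed value exceeds the job's own learned bound."""
    return observed > dynamic_threshold(history, k)

# Hypothetical run times (minutes) of one job over its last eight runs
run_minutes = [30, 32, 29, 31, 33, 30, 28, 31]
print(should_alert(run_minutes, 34))   # within normal variation
print(should_alert(run_minutes, 45))   # well outside normal variation
```

The same function could be applied to start times and end times by expressing them as minutes after the scheduled time.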
Assess and Manage Risks
• Computes probability of failure by analyzing past failures and SLA violations
• Computes impact by analyzing dependencies
• Reports jobs with a high failure risk
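A rough sketch of this kind of risk assessment, under stated assumptions: failure probability is estimated as the fraction of past runs that failed or violated an SLA, and impact is the number of downstream jobs reachable through precedence edges. Both measures, the `min_probability` cutoff, and the sample data are assumptions made for illustration.

```python
def downstream(job, successors):
    """All jobs reachable from `job` by following precedence edges."""
    seen, stack = set(), [job]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def risk_report(history, successors, min_probability=0.1):
    """history maps job -> list of booleans (True = failed or violated its SLA)."""
    report = []
    for job, outcomes in history.items():
        probability = sum(outcomes) / len(outcomes)
        impact = len(downstream(job, successors))
        if probability >= min_probability:
            report.append((job, probability, impact))
    # Highest impact first, then highest probability
    return sorted(report, key=lambda r: (r[2], r[1]), reverse=True)

# Hypothetical sample data
successors = {"extract": ["transform"], "transform": ["load", "report"], "load": ["report"]}
history = {
    "extract": [False, False, True, False],
    "transform": [False, False, False, False],
    "load": [True, False, True, False],
}
for job, probability, impact in risk_report(history, successors):
    print(job, probability, impact)
```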
Predict a Future Batch
• Status of scheduled jobs: running, delayed, failed jobs
• Inter-stream and inter-BU dependencies
• Anomalies and SLA violations
• Critical paths and critical jobs
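To make "critical paths" concrete: in a dependency graph, the critical path is the longest chain of dependent jobs by total run time. The sketch below assumes known run times per job and a hypothetical four-job batch; it uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

def critical_path(run_minutes, predecessors):
    """Return (total_minutes, path) for the longest chain through the job DAG."""
    # Include every job, even those with no dependencies at all
    graph = {job: predecessors.get(job, ()) for job in run_minutes}
    finish, previous = {}, {}
    for job in TopologicalSorter(graph).static_order():
        preds = graph[job]
        start = max((finish[p] for p in preds), default=0)
        # Remember which predecessor finished last; it determines this job's start
        previous[job] = max(preds, key=lambda p: finish[p], default=None)
        finish[job] = start + run_minutes[job]
    last = max(finish, key=finish.get)
    total, path = finish[last], []
    while last is not None:
        path.append(last)
        last = previous[last]
    return total, path[::-1]

# Hypothetical sample data
run_minutes = {"extract": 10, "transform": 20, "audit": 5, "load": 15}
predecessors = {"transform": ["extract"], "audit": ["extract"], "load": ["transform", "audit"]}
print(critical_path(run_minutes, predecessors))
```

Jobs on the returned path are the critical jobs: delaying any of them delays the end of the batch.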
Generate Proactive Notifications
• Derive historical trends and patterns to predict future behavior
• Predict likely SLA violations
• Predict time-to-saturation of resources
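As a minimal sketch of a time-to-saturation estimate, assume resource usage grows roughly linearly: fit a least-squares line to daily usage samples and extrapolate to the capacity limit. The linear model, the function name, and the sample numbers are assumptions; a production system would likely also account for seasonality.

```python
def days_to_saturation(usage, capacity):
    """Fit a least-squares line to daily usage and extrapolate to capacity.

    Returns days after the last sample, or None if usage is not growing.
    Requires at least two samples.
    """
    n = len(usage)
    x_mean = (n - 1) / 2
    y_mean = sum(usage) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(usage))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))

# Hypothetical disk usage in percent over five days, with a 70% limit
print(days_to_saturation([50, 52, 54, 56, 58], 70))
```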
What-If Analysis
Derive the impact of change
• Business change
  • Change in workload
• Operations change
  • Addition/Deletion of jobs and dependencies
  • Change in schedule
  • Change in runtime
• Infrastructure change
  • Change in provisioned CPU/MIPS
  • Change in number of worker processes
• Predict execution behavior of jobs and business processes
• Predict potential SLA violations
• Identify critical jobs and paths to act upon to prevent SLA violations
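A what-if question such as "what happens if this job's runtime grows?" can be pictured as re-running a simple schedule simulation with the changed input and comparing SLA outcomes. The sketch below covers only the runtime-change case, assumes unlimited parallelism, and uses hypothetical job names and SLA limits.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def finish_times(run_minutes, predecessors):
    """Earliest finish time of each job, in minutes after batch start."""
    graph = {job: predecessors.get(job, ()) for job in run_minutes}
    finish = {}
    for job in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[job]), default=0)
        finish[job] = start + run_minutes[job]
    return finish

def sla_violations(run_minutes, predecessors, sla_minutes):
    """Jobs whose predicted finish time exceeds their SLA limit."""
    finish = finish_times(run_minutes, predecessors)
    return sorted(job for job, limit in sla_minutes.items() if finish[job] > limit)

def what_if(run_minutes, predecessors, sla_minutes, runtime_changes):
    """Jobs that would newly violate their SLA if the given runtimes changed."""
    before = set(sla_violations(run_minutes, predecessors, sla_minutes))
    changed = {**run_minutes, **runtime_changes}
    after = set(sla_violations(changed, predecessors, sla_minutes))
    return sorted(after - before)

# Hypothetical sample data
run_minutes = {"extract": 10, "transform": 20, "load": 15}
predecessors = {"transform": ["extract"], "load": ["transform"]}
sla_minutes = {"load": 50}
print(what_if(run_minutes, predecessors, sla_minutes, {"transform": 30}))
```

Schedule changes or added dependencies could be explored the same way by editing `predecessors` before the second simulation.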
If-What Analysis
Derive the plan for optimizing
• Batch execution time
• SLA adherence
• Number of required CPUs/MIPS
• Peak MIPS usage
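If-what analysis runs the question in reverse: given a target, find the plan that meets it. As a deliberately simplified sketch, the code below searches for the smallest number of worker processes that finishes a set of jobs within a deadline. It assumes independent jobs (no dependencies) and a greedy longest-job-first assignment, which gives an estimate rather than a guaranteed optimum; the names and numbers are hypothetical.

```python
import heapq

def makespan(run_minutes, workers):
    """Greedy estimate: longest jobs first, each to the least-loaded worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for minutes in sorted(run_minutes, reverse=True):
        # Pop the least-loaded worker and give it the next job
        heapq.heappush(loads, heapq.heappop(loads) + minutes)
    return max(loads)

def min_workers(run_minutes, deadline_minutes):
    """Smallest worker count whose estimated makespan meets the deadline."""
    for workers in range(1, len(run_minutes) + 1):
        if makespan(run_minutes, workers) <= deadline_minutes:
            return workers
    return None  # even one worker per job cannot meet the deadline

# Hypothetical job run times in minutes
jobs = [30, 20, 20, 10, 10]
print(min_workers(jobs, 50))
print(min_workers(jobs, 30))
```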
Lessons Learned
• Start with a comprehensive topology construction
• Self-learn and adapt to system changes automatically
• Use a knowledge-centric process to translate analytical observations into recommendations
• Model behavior in detail for accurate predictions