Site Reliability Engineering (SRE) Practices Online Course
Join our live, instructor-led virtual sessions and master Site Reliability Engineering (SRE) practices from anywhere in the world.
Upcoming Virtual Training Schedules
Join from anywhere in the world with our live instructor-led sessions
| Code | Start Date | End Date | Duration | Fee |
|---|---|---|---|---|
| SRE-05 |  |  | Mon - Fri (5 Days) | USD 850 |
Here's What You'll Learn
Each module tackles real challenges you face in your role
SRE foundations and service targets
Observability with Prometheus and Grafana
Incident response and postmortems
Automation and closed-loop remediation
Capacity planning and load resilience
Reliability governance and reporting
SRE roadmap and executive communication
Market-specific guidance for South Africa
A country-aware view of the pressures, proof points, and practical tools that shape how this course applies locally.
Tools and platforms relevant to this field
Six field-relevant examples that may be featured in training where they support the confirmed scope. Exact coverage depends on participant needs and delivery format.
- Grafana (Grafana Labs): Used to build reliability dashboards that combine metrics from multiple systems so teams can spot latency, error-rate, and saturation issues early.
- Prometheus (The Prometheus Authors): Used for time-series metrics collection and alerting to support SLI monitoring and error-budget tracking.
- Kubernetes (The Cloud Native Computing Foundation): Used to run and scale services while giving SRE teams a common platform for rollout control, health checks, and automated recovery patterns.
- PagerDuty (PagerDuty, Inc.): Used to route incidents to the right responders, manage on-call rotations, and reduce time to acknowledgement during outages.
- Jira Service Management (Atlassian): Used to track incidents, problem records, and follow-up actions so post-incident improvements are assigned and completed.
- Splunk (Splunk LLC): Used to correlate logs and operational events when teams need faster root-cause analysis across distributed services.
Where this course runs
Site Reliability Engineering (SRE) Practices Training is delivered in the cities below — pick the one that fits your schedule.