Site Reliability Engineering (SRE) Practices Online Course
Join our virtual, live instructor-led sessions and master Site Reliability Engineering (SRE) practices from anywhere in the world.
Upcoming Virtual Training Schedules
Join from anywhere in the world with our live instructor-led sessions
| Code | Start Date | End Date | Duration | Fee |
|---|---|---|---|---|
| SRE-05 | | | Mon - Fri (5 days) | USD 850 |
Here's What You'll Learn
Each module tackles real challenges you face in your role
SRE foundations and service targets
Observability with Prometheus and Grafana
Incident response and postmortems
Automation and closed-loop remediation
Capacity planning and load resilience
Reliability governance and reporting
SRE roadmap and executive communication
Market-specific guidance for Kenya
A country-aware view of the pressures, proof points, and practical tools that shape how this course applies locally.
Tools and platforms relevant to this field
Six field-relevant examples that may be featured in training where they support the confirmed scope. Exact coverage depends on participant needs and delivery format.
- Datadog (Datadog): used for infrastructure and application observability, including metrics, logs, traces, and alerting to support SLO tracking and incident detection.
- Grafana (Grafana Labs): used to build reliability dashboards that help teams monitor service health, latency, and error rates in one place.
- Prometheus (The Linux Foundation): used to collect time-series metrics for service monitoring and alerting in SRE workflows.
- PagerDuty (PagerDuty): used for on-call alerting, incident escalation, and coordinated response during service outages.
- Jira Service Management (Atlassian): used to manage incident tickets, post-incident actions, and service workflows across support and engineering teams.
- Splunk (Splunk): used for centralized log analysis, incident investigation, and operational troubleshooting.
Where this course runs
Site Reliability Engineering (SRE) Practices Training is delivered in the cities below. Pick the one that fits your schedule.