3336 Reliability Engineer (Expert)

General Details

Location: Menlyn Park, Pretoria / Tshwane

Advertised By: Agency

Company Name: Executive Placements

Job Type: Full-Time

Description

ROLE & RESPONSIBILITIES:

1. Infrastructure & Cloud (35%)
Design, build, and maintain scalable, secure, and cost-efficient cloud infrastructure on AWS
Manage and evolve Kubernetes clusters including upgrades, capacity planning, and cluster health
Build and maintain infrastructure-as-code modules for repeatable, auditable deployments
Drive cloud cost optimization - identify waste, right-size resources, implement savings plans
Ensure infrastructure meets non-functional requirements: performance, scalability, availability, disaster recovery
2. CI/CD & Automation (25%)
Build, operate, and continuously improve CI/CD pipelines for fast, safe, and reliable delivery
Automate repetitive operational tasks and reduce toil through tooling and runbooks
Maintain and improve deployment automation - zero-touch deployments are the goal
Drive adoption of best practices across development teams
Own deployment runbooks and ensure they are up to date and tested
3. Security Implementation (20%)
Implement and maintain security scanning in CI/CD pipelines (SAST, DAST, container image scanning)
Harden container and cloud infrastructure security (network policies, IAM, secrets, encryption)
Translate security audit findings into concrete technical actions and execute them
Drive vulnerability remediation - track, prioritize, and fix security issues with urgency
Ensure compliance with our IT security standards and policies
4. Monitoring, Reliability & Incident Response (15%)
Implement and own monitoring, logging, and alerting for proactive issue detection
Build dashboards that give real-time visibility into system health and performance
Lead incident response for infrastructure-related issues - diagnose fast, fix fast
Conduct post-incident reviews and drive corrective actions to prevent recurrence
Continuously improve system reliability, uptime, and mean time to recovery (MTTR)
5. Technical Optimization & Lifecycle Management (5%)
Drive Technical Lifecycle Management (TLM) - plan and execute upgrades and migrations
Identify and implement technical optimizations across the stack
Contribute to technical strategy and roadmap for platform engineering
Actively use and promote AI4DevOps tools and practices where they add real value
WHAT DOES SUCCESS LOOK LIKE?
Infrastructure: Reliable, scalable, cost-optimized - no surprises
CI/CD: Fast, safe pipelines - developers ship with confidence
Security: Vulnerabilities found early, fixed fast - no excuses
Incidents: Quick response, thorough root cause, things get better over time
Automation: If you did it twice manually, the third time it's automated
Delivery: You ship improvements continuously - not just pl
https://www.executiveplacements.com/Jobs/A/3336-Reliability-Engineer-Expert-1278598-Job-Search-04-08-2026-04-08-52-AM.asp?sid=gumtree
