3336 Reliability Engineer (Expert)

General Details
Advertised By: Agency
Company Name: Executive Placements
Job Type: Full-Time
Description
ROLE & RESPONSIBILITIES:
1. Infrastructure & Cloud (35%)
  • Design, build, and maintain scalable, secure, and cost-efficient cloud infrastructure on AWS
  • Manage and evolve Kubernetes clusters including upgrades, capacity planning, and cluster health
  • Build and maintain infrastructure-as-code modules for repeatable, auditable deployments
  • Drive cloud cost optimization - identify waste, right-size resources, implement savings plans
  • Ensure infrastructure meets non-functional requirements: performance, scalability, availability, disaster recovery
2. CI/CD & Automation (25%)
  • Build, operate, and continuously improve CI/CD pipelines for fast, safe, and reliable delivery
  • Automate repetitive operational tasks and reduce toil through tooling and runbooks
  • Maintain and improve deployment automation - zero-touch deployments are the goal
  • Drive adoption of best practices across development teams
  • Own deployment runbooks and ensure they are up to date and tested
3. Security Implementation (20%)
  • Implement and maintain security scanning in CI/CD pipelines (SAST, DAST, container image scanning)
  • Harden container and cloud infrastructure security (network policies, IAM, secrets, encryption)
  • Translate security audit findings into concrete technical actions and execute them
  • Drive vulnerability remediation - track, prioritize, and fix security issues with urgency
  • Ensure compliance with our IT security standards and policies
4. Monitoring, Reliability & Incident Response (15%)
  • Implement and own monitoring, logging, and alerting for proactive issue detection
  • Build dashboards that give real-time visibility into system health and performance
  • Lead incident response for infrastructure-related issues - diagnose fast, fix fast
  • Conduct post-incident reviews and drive corrective actions to prevent recurrence
  • Continuously improve system reliability, uptime, and mean time to recovery (MTTR)
5. Technical Optimization & Lifecycle Management (5%)
  • Drive Technical Lifecycle Management (TLM) - plan and execute upgrades and migrations
  • Identify and implement technical optimizations across the stack
  • Contribute to technical strategy and roadmap for platform engineering
  • Actively use and promote AI4DevOps tools and practices where they add real value
WHAT DOES SUCCESS LOOK LIKE?
  • Infrastructure: Reliable, scalable, cost-optimized - no surprises
  • CI/CD: Fast, safe pipelines - developers ship with confidence
  • Security: Vulnerabilities found early, fixed fast - no excuses
  • Incidents: Quick response, thorough root cause, things get better over time
  • Automation: If you did it twice manually, the third time it's automated
  • Delivery: You ship improvements continuously - not just pl
Ad ID: 1354123461