- Ownership
- Architecture
- Reliability
- Observability
- Security
- Compliance
- Communication
- Documentation
- Autonomy
- Transparency
- Collaboration
- Simplicity
- Troubleshooting
- Scalability
- Resilience
Overview
Our client is looking for a Senior DevOps Engineer to help lead a major infrastructure transformation from on-premises roots to a cloud-native, Kubernetes-first architecture on AWS. This role owns the cloud platform layer end to end, spanning infrastructure, CI/CD, observability, security, and developer enablement in a live production financial environment where uptime, auditability, and security are critical. The successful candidate will join a small, high-autonomy engineering team, shape platform engineering conventions and tooling choices from the ground up, and simplify complex systems for the engineering teams that rely on them.
Responsibilities
- Design and maintain Terraform and Terragrunt modules for multi-account AWS environments
- Manage EKS clusters, Karpenter node provisioning, networking, and IAM
- Drive infrastructure toward immutable, declarative patterns
- Build and operate the observability stack across metrics, logs, traces, dashboards, and alerting
- Define SLOs, support SLO-based alerting, and track DORA metrics
- Drive incident response and contribute to HA/DR architecture for a regulated financial platform
- Own CI/CD pipelines using GitLab runners, ArgoCD, and GitOps patterns
- Build golden pipelines and self-service tooling that reduce friction between code merge and production
- Implement policy-as-code, container scanning, supply chain security, and secrets management
- Partner with security teams on audit readiness and compliance controls
- Architect and deploy production EKS clusters with Karpenter for intelligent, cost-efficient node scaling
- Build a Terragrunt-driven multi-account AWS landing zone with SCPs, Control Tower, and least-privilege IAM
- Design GitOps deployment pipelines using ArgoCD with ApplicationSets for multi-environment promotion
- Stand up a full observability stack using metrics, logs, traces, and alerting
- Evaluate and implement a service mesh such as Istio or Linkerd for mTLS, traffic management, and canary deployments
- Introduce supply chain security practices including image signing with Cosign, SBOM generation, and admission policies
- Build an internal developer platform with self-service namespaces, templated pipelines, and environment provisioning
- Contribute to architecture decision records, blameless post-mortems, internal documentation, and broader platform engineering standards
- Help shape conventions, tooling choices, golden paths, and internal engineering culture from the ground up
Experience
- 3+ years of production Kubernetes experience
- Strong hands-on experience with EKS; GKE or AKS experience is also relevant
- Deep understanding of Kubernetes networking, RBAC, autoscaling, and production troubleshooting
- Proven Terraform experience at scale, including module authoring, remote state management, and provider version pinning
- Experience with Terragrunt and multi-account cloud patterns
- Deep AWS experience across EKS, IAM, VPC architecture, Organizations, SCPs, and a broad range of AWS services
- Experience designing secure, fast, developer-trusted CI/CD pipelines using GitHub Actions, GitLab CI, or equivalent
- Strong security-first delivery experience, including least-privilege design, container scanning, secrets management, and shift-left practices
- Strong communication skills, including the ability to explain architectural decisions clearly and write operational runbooks
- GitOps experience with ArgoCD or Flux, including ApplicationSets and progressive delivery (for example, with Argo Rollouts)
- Service mesh knowledge (Istio or Linkerd), including mTLS, traffic shaping, and blue-green or canary deployment patterns
- Deep observability experience across OpenTelemetry, Prometheus at scale, Grafana Loki, and SLO-based alerting
- Experience with policy-as-code and admission control frameworks such as OPA, Gatekeeper, or Kyverno
- Working familiarity with managed databases, especially Aurora PostgreSQL, including connection pooling, replication, and performance monitoring
- Familiarity with supporting systems backed by Aurora and Redis
- Platform engineering experience across Backstage, internal tooling, golden paths, and developer self-service portals
- FinOps awareness, including cost tagging, Karpenter consolidation policies, Savings Plans, and Cost Anomaly Detection
Qualifications
- AWS certifications related to cloud architecture, security, or DevOps are advantageous
Tools & Technologies
- AWS
- EKS
- Kubernetes
- Terraform
- Terragrunt
- Docker
- GitLab
- GitHub
- ArgoCD
- GitOps
- ApplicationSets
- Karpenter
- IAM
- VPC
- Organizations
- SCPs
- Control Tower
- Prometheus
- Grafana
- OpenTelemetry
- Loki
- Elastic
- CloudWatch
- OPA
- Gatekeeper
- Kyverno
- Trivy
- SOPS
- GuardDuty
- Security Hub
- Secrets Manager
- Istio
- Linkerd
- Backstage
- Cosign
- SBOM
- Aurora
- PostgreSQL
- Redis
- Python
- Bash
- Go
- HCL
- OIDC
- RBAC
- SLOs
- DORA
- Flux
Location
Cape Town
Expected Salary
120 000 ZAR p/m
Work Policy
Hybrid
Team
Engineering
Industry
Software Development
Interview Process
- Initial Screening: Conducted by Oneo.
- 30-Minute Call: Focuses on company culture and includes a technical dive.
- Technical Interview: Assesses technical skills.
- Panel Interview: A deeper dive into technical and practical knowledge.
- Team Introduction: Meet the local team.
