Lead our platform reliability initiatives by driving CI/CD modernization, implementing SRE practices, and building comprehensive observability stacks for enterprise systems.
Key Responsibilities
Modernize CI/CD pipelines and practices
Implement SRE principles and define SLIs/SLOs
Build and maintain observability stacks
Design incident response playbooks
Optimize system reliability and performance
Mentor team members on SRE best practices
Requirements
7+ years of DevOps/SRE experience
Expert-level Kubernetes knowledge
AWS/GCP/Azure cloud expertise
Strong experience with observability tools (Prometheus, Grafana, Datadog)
Infrastructure as Code (Terraform, CloudFormation)
Incident response and on-call experience
What We Offer
Competitive salary and equity
Remote work flexibility
Professional development budget
Health insurance
25 days paid vacation
Latest hardware and tools