About the role
Job Title: Lead Site Reliability Engineer (SRE) – Observability
Location: Remote Options
We are looking for a Lead SRE to design, scale, and operate the large-scale observability systems that keep our global services online and performant. You will join an autonomous team of software engineers focused on solving complex data infrastructure challenges.
Key Responsibilities
- Scale Prometheus metrics infrastructure to handle 100+ million active series.
- Operate large Elasticsearch clusters holding 2,000+ TB of data.
- Grow high-throughput Kafka data pipelines processing hundreds of thousands of events per second.
- Build custom alerting workflows and self-service APIs for internal engineering teams.
- Provision cloud and private infrastructure using Terraform.
Requirements
- 5+ years operating mid-to-large distributed systems on Linux VMs or bare-metal machines.
- 2+ years developing in Go, Python, Ruby, Scala, or Bash.
- Hands-on experience with Prometheus/Thanos/Cortex, Kafka, the ELK stack, Ansible, or Consul.
- Comfortable diving into unfamiliar codebases and participating in an on-call rotation.
Keywords: Observability, Monitoring, SRE, Site Reliability Engineering, DevOps, Elasticsearch, ELK, Prometheus, Kafka, Terraform, Linux, Bare Metal