About the role
Senior / Staff Site Reliability Engineer | £136k–£180k + equity | Remote Europe or London
We're partnering with a fast-growing developer infrastructure startup on a senior SRE hire at a pivotal moment in their growth.
The platform runs AI agents and background workflows in production at massive scale, handling hundreds of millions of executions per month on infrastructure they run themselves. The team is ~13 people. No engineering managers. Engineers own large parts of the system and work directly with the founders.
The core challenge right now is scale. Execution volume is growing faster than the team can build, which means the next hires are walking into genuine distributed systems problems — not a greenfield rebuild or a dashboard feature.
What you'll be working on
- Owning observability across the platform: OpenTelemetry, metrics, logs, traces, and making them genuinely useful at 3am
- Designing and operating distributed systems primitives under real production load — queues, schedulers, checkpoints, backpressure
- Architecting and tuning auto-scaling infrastructure that runs untrusted customer code at high throughput
- Hardening multi-tenant sandbox isolation, secrets handling, network policy, and supply chain security
- Owning Terraform and IaC as a first principle across a cloud-native footprint
- Running on-call practice: SLOs, runbooks, blameless postmortems, paging hygiene
What they're looking for
- Strong observability background: production experience with OpenTelemetry, Prometheus, or equivalent
- Distributed systems experience: you've designed or operated systems with non-trivial failure modes
- Strong in TypeScript and/or Go: the codebase is TypeScript-heavy, with Go emerging as a second language
- Self-managed Kubernetes in production, not just managed control planes
- Performance and scaling instincts: you've chased real bottlenecks across app, database, and infra layers
- Terraform as a first principle, run at meaningful scale
- Security mindset — multi-tenant isolation, least privilege, threat modelling
- Postgres and Redis under load, AWS strongly preferred
The process
Screening call, hiring manager conversation, a technical stage with roughly a 10% pass rate, then a final with the wider team. The bar is high, but if you find that motivating rather than off-putting, that's probably a good sign.