Site Reliability Engineer

Epidemic Sound AB

Summary

Epidemic Sound is seeking a Site Reliability Engineer to join their central platform team in Stockholm. This role focuses on building and maintaining a reliable, scalable, and secure platform for engineering teams. Responsibilities include managing GKE clusters, CI/CD processes, networking, and observability, while promoting self-service capabilities for product teams. The company values innovation and collaboration, offering a dynamic work environment across multiple global locations.

The job in brief

Working hours

Full-time


Benefits

  • Opportunity to work in a global and innovative environment.
  • Collaborative culture that values diversity and inclusion.
  • Chance to influence the sound of streaming and content globally.


Apply by: 2650-08-06
Published: 2026-06-27

Description

Join our global force of 400+ innovators, blending the latest in tech with the greatest in soundtracking, from our Stockholm HQ to offices in London, New York, Los Angeles, Berlin, Paris, Oslo, and Seoul. We're an industry leader with a startup mentality. We take what we do seriously, but we don't take ourselves too seriously. Creating and collaborating to transform the sound of streaming, content, and culture. Come join us, and let the world feel your work.

As a Site Reliability Engineer at Epidemic Sound, you will be a core member of the central platform team that builds and operates the platform the rest of Engineering ships on - keeping it reliable, scalable, and secure is what this team exists to do. This is infrastructure-flavoured software engineering: you will write the code that defines and automates the platform, and treat it as a product whose customers are the rest of Engineering. The goal is to make the reliable way the easy way - self-service paths that let product teams build and ship safely without waiting for anyone.

Your key responsibilities include
  • Build and operate the platform our services run on - GKE clusters, the controllers that extend them, and the Terraform that defines our cloud.
  • Own the path from commit to production - CI/CD, GitOps, and the progressive-delivery patterns that turn a merge into a safe release.
  • Strengthen the networking and routing layer - traffic management on top of the VPC, firewalls, and network policies that keep it safe and predictable.
  • Govern access and guardrails - IAM across every layer, policy-as-code, and break-glass paths - so teams move fast within safe defaults rather than waiting on tickets.
  • Grow reliability and observability - alert hygiene, runbooks, SLOs, and the metrics and tracing that show how the platform behaves in production.
  • Enable product teams and raise the bar - make production readiness the default, and drive healthy adoption of the standards and docs you would rather share than gatekeep.


Requirements
  • Kubernetes fundamentals: a solid grasp of controllers, core components, and CNI and networking - depth in the domain matters more than any single tool (GKE a plus).
  • Infrastructure as code and delivery: Terraform, Helm or Kustomize, CI/CD and GitOps (ArgoCD), and the traffic-management and progressive-delivery mechanisms that move releases out safely.
  • Networking and access: routing fundamentals, the VPC, firewall, and network-policy primitives beneath it, and IAM and access management at different levels.
  • Operational depth: monitoring fundamentals (a clear view of when to reach for metrics versus tracing, and experience with an open-source observability stack), strong troubleshooting across distributed systems, and solid Unix/Linux.
  • Agentic development mindset: you use AI agents actively in your own work, knowing where they add leverage and where human judgement is non-negotiable.
  • Collaboration and judgement: you do your best work on large, cross-cutting projects, communicate openly, and stay opinionated but open to discussion - reaching for the right tool over your own creation.


It would also be music to our ears if you have
  • Familiarity with GCP and an observability stack with Prometheus, Thanos, and Grafana.
  • Experience running containerised platforms at scale.
  • Service mesh experience with Cilium eBPF, Linkerd, or Istio.
  • Familiarity with platform building blocks like cert-manager, external-secrets, or external-dns.

Equal opportunity employer
We believe that bringing people together from different backgrounds, experiences and perspectives makes for a healthy workplace, a more successful business and a better world. We value diversity and encourage everyone to come and soundtrack the world with us.

Application
Ready to make the world feel your work? Please apply, in English.

This position was advertised on the Compilation Source (Sweden) service on 2026-06-27 and was published by Compilation Source (Sweden).
