Avaron AB

Senior DevOps/SRE Engineer

Published 2026-02-17

Apply by 2026-02-20

About the Job

About the Company

Avaron AB is a growing consultancy focused on technology, finance, and business support. We match your expertise with some of the most interesting assignments on the market and offer a platform where your professional development is a priority.

About the Assignment

We are looking for a Senior DevOps/SRE Engineer to take end-to-end technical ownership of cloud infrastructure, observability, reliability engineering, and cloud cost optimization across AWS and GCP. You will be accountable for designing, delivering, and operating solutions with measurable outcomes, helping the organization move from reactive incident handling to systematic reliability engineering. The assignment also includes on-call duty and occasional travel.

Job Description

Design, implement, and continuously improve end-to-end observability across infrastructure, platform services, applications, and business metrics

Define an observability architecture and data model for metrics, logs, and traces, including standards for naming, tracing, and structured logging

Build operational dashboards for infrastructure, platform services, and business-critical KPIs

Establish alerting with clear severity levels, noise reduction practices, and automated routing

Define and implement SLI/SLO/SLA practices, including error budgets for critical flows

Lead systematic cloud cost optimization (FinOps) across AWS and GCP, creating visibility and an actionable optimization plan across compute, storage, databases, and network

Introduce shift-left cost controls such as CI/CD cost checks, ownership rules, budget limits, and recurring cost reviews

Establish incident management practices, RCA templates, and follow-up tracking to drive systemic improvements

Set up reliability governance, including gradual rollout practices and automated rollback mechanisms

Create and maintain a risk register covering systemic risks and technical debt, with a prioritized remediation roadmap

Optimize Kubernetes platforms (EKS/GKE) to improve stability and resource utilization

Participate in on-call rotation and support production operations and major incident response

Requirements

5+ years of DevOps / SRE / Cloud Platform experience

At least 3 years in a Staff/Principal or Tech Lead role

Experience operating large-scale distributed systems in production

Deep expertise in both AWS and GCP, including cross-cloud architecture design

Strong experience with Terraform, Pulumi, or CDK

Proven experience designing and implementing observability from scratch

Hands-on experience with Prometheus, Grafana, Loki, Elastic, and Kibana

Deep understanding of Kubernetes internals (e.g., Scheduler, Controllers, etcd, CNI, CRI) and managing large-scale production clusters

Proficiency in Java, Python, or Go

Nice to have

Google SRE background or equivalent in-depth SRE practice

Experience with chaos engineering

Documented FinOps success cases

Knowledge of eBPF and performance profiling

Open-source contributions

Experience designing multi-cloud disaster recovery (active-active or active-passive)

Professional working proficiency in Mandarin (Chinese)

Application

Selections are made on an ongoing basis, so we recommend that you apply as soon as possible.
