Site Reliability Engineer

Observability Scaleup
Remote - United States 🇺🇸

Site Reliability Engineer (FedRAMP Cloud)

US East Coast · Remote/Hybrid

A high-scale observability and data infrastructure company is hiring a Site Reliability Engineer to join its Cloud Infrastructure team, focused on Enterprise FedRAMP environments.

The platform processes tens of terabytes of data daily and serves engineering teams operating mission-critical systems.

What You’ll Do

  • Operate and scale high-volume cloud infrastructure
  • Support FedRAMP High/Moderate cloud environments
  • Handle deployments, on-call rotations, and incident management
  • Build internal tools to expand platform capabilities
  • Collaborate with R&D to improve system reliability and stability
  • Contribute to and influence infrastructure roadmap decisions

Tech Stack

  • Kubernetes, kOps
  • AWS
  • Kafka
  • Prometheus, Thanos, Grafana
  • Argo CD
  • Istio
  • Git
  • Infrastructure as Code (Terraform, Crossplane)
  • Golang

Requirements

  • 5+ years as a DevOps Engineer or SRE in production environments
  • Strong Kubernetes operational experience
  • 2+ years working with FedRAMP (High or Moderate), including vulnerability management, scanning, patching, and compliance reporting
  • Strong experience with Golang in production environments
  • Experience with monitoring and observability tooling
  • Experience operating infrastructure in AWS or other major cloud providers
  • Infrastructure as Code experience
  • Solid networking knowledge (HTTP, gRPC, SSL/TLS, networking layers)

Nice to Have

  • Experience operating large-scale data pipelines
  • Familiarity with Apache Kafka

Apply

Send your resume to sre.observability@golang.cafe
