Site Reliability Engineer (FedRAMP Cloud)

US East Coast · Remote/Hybrid

A high-scale observability and data infrastructure company is hiring a Site Reliability Engineer to join its Cloud Infrastructure team, focusing on enterprise FedRAMP environments.

The platform processes tens of terabytes of data daily and serves engineering teams operating mission-critical systems.

What You’ll Do

Operate and scale high-volume cloud infrastructure
Support FedRAMP High/Moderate cloud environments
Handle deployments, on-call rotations, and incident management
Build internal tools to expand platform capabilities
Collaborate with R&D to improve system reliability and stability
Contribute to and influence infrastructure roadmap decisions

Tech Stack

Kubernetes, kOps
AWS
Kafka
Prometheus, Thanos, Grafana
Argo CD
Istio
Git
Infrastructure as Code (Terraform, Crossplane)
Golang

Requirements

5+ years as a DevOps Engineer or SRE in production environments
Strong Kubernetes operational experience
2+ years working with FedRAMP (High or Moderate), including vulnerability management, scanning, patching, and compliance reporting
Strong experience with Golang in production environments
Experience with monitoring and observability tooling
Experience operating infrastructure in AWS or other major cloud providers
Infrastructure as Code experience
Solid networking knowledge (HTTP, gRPC, SSL/TLS, and the layers of the network stack)

Nice to Have

Experience operating large-scale data pipelines
Familiarity with Apache Kafka

Apply

Send your resume to sre.observability@golang.cafe
