Site Reliability Engineer (FedRAMP Cloud)
US East Coast · Remote/Hybrid
A high-scale observability and data infrastructure company is hiring a Site Reliability Engineer to join its Cloud Infrastructure team, focused on Enterprise FedRAMP environments.
The platform processes tens of terabytes of data daily and serves engineering teams operating mission-critical systems.
What You’ll Do
- Operate and scale high-volume cloud infrastructure
- Support FedRAMP High/Moderate cloud environments
- Handle deployments, on-call rotations, and incident management
- Build internal tools to expand platform capabilities
- Collaborate with R&D to improve system reliability and stability
- Contribute to and influence infrastructure roadmap decisions
Tech Stack
- Kubernetes, kOps
- AWS
- Kafka
- Prometheus, Thanos, Grafana
- Argo CD
- Istio
- Git
- Infrastructure as Code (Terraform, Crossplane)
- Golang
Requirements
- 5+ years as a DevOps Engineer or SRE in production environments
- Strong Kubernetes operational experience
- 2+ years working with FedRAMP (High or Moderate), including vulnerability management, scanning, patching, and compliance reporting
- Strong experience with Golang in production environments
- Experience with monitoring and observability tooling
- Experience operating infrastructure in AWS or other major cloud providers
- Infrastructure as Code experience
- Solid networking knowledge (HTTP, gRPC, TLS/SSL, and the layers of the network stack)
Nice to Have
- Experience operating large-scale data pipelines
- Familiarity with Apache Kafka
Apply
Send your resume to sre.observability@golang.cafe