Gtmhub is the world’s most beautiful and intuitive Objectives and Key Results (OKRs) management and employee experience solution. We build enterprise-scale software with a consumer-grade experience.
We help organizations amplify revenue growth by aligning every employee with their corporate purpose using the OKR method. We are big believers in the power of employee experience to drive productivity, so our product includes best-practice employee success features.
At heart, we are product people who love data so much that we built the only solution that integrates more than 150 data connectors to allow for true automation of progress and productivity management.
The term site reliability engineering is credited to Benjamin Treynor Sloss, Vice President of Engineering at Google. He said site reliability engineering is “what happens when a software engineer is tasked with what used to be called operations.”
To us, a Site Reliability Engineer (SRE) is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services.
SREs design and implement automation with software to replace human labor. They want systems that are automatic, not just automated—such that their services are able to run and repair themselves.
Engage in and improve the entire lifecycle of services—from inception and design, through to deployment, operation, and refinement/system tuning
Support services before they go live through activities like system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Identify performance bottlenecks and troubleshoot performance issues
Scale systems sustainably through mechanisms like automation, and evolve systems by advocating for changes that improve reliability and velocity
Practice sustainable incident response and postmortems
Experience with algorithms, data structures, complexity analysis, and software design
Ability to work across teams (business and technical) to continuously analyze system performance in production, troubleshoot consumer-reported issues, and proactively identify areas requiring optimization
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
A systematic problem-solving approach, together with effective communication skills, a sense of ownership, self-direction, and drive
Ability to debug and optimize code and to automate routine tasks
Practical experience in supporting application reliability practices for consumer-facing web and mobile experiences
Our tech stack includes (but is not limited to):
Kubernetes, Docker, Golang, Java, GAP, ELK, OpenTracing, Python, OpenShift, Terraform, Ansible
We started in Sofia in 2015 with a mission to ship a world-class data management and analytics engine that allows companies to automatically track and visualize KPIs in real time and create custom insights to inform goal setting, performance management, and long-term strategic decision-making. Today we operate across offices in Sofia, London, Berlin, and San Francisco.
Apply today if our mission inspires you! Join us in developing yourself and others as our Site Reliability Engineer.