Site Reliability Engineer at Gtmhub - Sofia, Bulgaria

Salary €30k – €35k • Company Website https://gtmhub.com/

Job Description

Gtmhub is the world’s most beautiful and intuitive Objectives and Key Results (OKRs) management and employee experience solution. We build enterprise-scale software with a consumer-grade experience.

We help organizations amplify revenue growth by aligning every employee with their corporate purpose using the OKRs method. We are big believers in the power of employee experience to drive productivity, so our product includes best-practice employee success features.

At heart, we are product people who love data so much that we built the only solution that integrates more than 150 data connectors to allow for true automation of progress and productivity management.

The Role

The term site reliability engineering is credited to Benjamin Treynor Sloss, Vice President of Engineering at Google. He said site reliability engineering is “what happens when a software engineer is tasked with what used to be called operations.”

To us, a Site Reliability Engineer (SRE) is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services.

SREs design and implement automation with software to replace human labor. They want systems that are automatic, not just automated, so that their services can run and repair themselves.

Responsibilities

Engage in and improve the entire lifecycle of services—from inception and design, through to deployment, operation, and refinement/system tuning

Support services before they go live through activities like system design consulting, developing software platforms and frameworks, capacity planning and launch reviews

Maintain services once they are live by measuring and monitoring availability, latency, and overall system health

Identify performance bottlenecks and troubleshoot performance issues

Scale systems sustainably through mechanisms like automation, and evolve systems by advocating for changes that improve reliability and velocity

Practice sustainable incident response and postmortems

Basic Qualifications

Experience with algorithms, data structures, complexity analysis, and software design

Ability to work across teams (business and technical) to continuously analyze system performance in production, troubleshoot consumer-reported issues, and proactively identify areas requiring optimization

Preferred Qualifications

Expertise in designing, analyzing and troubleshooting large-scale distributed systems

A systematic problem-solving approach, coupled with effective communication skills, a sense of ownership, self-direction, and drive

Ability to debug and optimize code and to automate routine tasks

Practical experience in supporting application reliability practices for consumer-facing web and mobile experiences

The Stack

Our tech stack includes (but is not limited to):

Kubernetes, Docker, Golang, Java, GAP, ELK, OpenTracing, Python, OpenShift, Terraform, Ansible

We started in Sofia in 2015 with a mission to ship a world-class data management and analytics engine that allows companies to automatically track and visualize KPIs in real time and create custom insights to inform goal setting, performance management, and long-term strategic decision-making. Today we operate across offices in Sofia, London, Berlin, and San Francisco.

Apply today if our mission inspires you! Join us in developing yourself and others as our Site Reliability Engineer.