Site Reliability Engineer at PubNative - Berlin, Germany
Salary: €40k – €65k
PubNative is a mobile publisher platform that serves native ads via a scalable and flexible API for mobile apps and web. Our publisher-first approach focuses on the specific needs of each publisher across all verticals. Our ad serving technology is used by developers and publishers around the world.
Our system consists of a myriad of high-load Golang-based APIs, iOS SDKs, a Ruby/Rails 5 dashboard, Scala and Spark data and ML pipelines, and a Druid OLAP system, all running on a Mesos and Kubernetes cluster.
We’re always on call to keep our networks up and running, ensuring our users have the best and fastest experience possible. We follow the “Infrastructure as Code” model and use immutable deployment strategies.
We are looking for a Site Reliability Engineer (m/f) to help us build and operate infrastructure platforms, and provide technical consultancy to engineering teams on how to build reliable, scalable and efficient services.
Your Responsibilities:
- You help us build a hybrid, poly-cloud-provider environment
- You help to design, develop and operate monitoring and tracking platforms
- You drive scalability and operability of supported systems/infrastructure
- You participate in the on-call rotation and are on call for the services you build and support
- You work with other teams, providing consultation on systems architecture for new and existing production systems
- You write code to automate tasks and support SLAs for production systems, and you support other engineering teams on reliability, scalability and efficiency topics
- You manage OS images/templates via Packer and provision infrastructure via Terraform
- You support CI/CD and build new pipelines
- You engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
- You support services before they go live through activities such as system design consulting
- You maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Our Requirements:
- 3+ years of experience in a Site Reliability or full-stack developer role
- Experience with public cloud providers (AWS, Google Cloud, Digital Ocean, etc.) and Infrastructure as Code (Terraform)
- Strong programming skills and familiarity with modern programming languages: Go, Ruby, Python, Shell, etc.
- Knowledge of managing Docker containers and microservices via Kubernetes
- Experience building and monitoring systems and metric collection pipelines
- Track record of building automation and solving multi-datacenter/multi-cloud infrastructure problems
- Knowledge of algorithms, data structures, complexity analysis, software design and reverse engineering
- Interest in designing, analyzing and troubleshooting large-scale distributed systems
- Experience working with source control (Git)
- Experience with continuous integration platforms such as TeamCity, Jenkins, CircleCI, etc.
- Understanding of Agile and DevOps practices such as CI/CD, automated testing, etc.