Golang Site Reliability Jobs


Hand-Picked Go Jobs • Apply directly to companies • Clear salary ranges

Browse 50+ Golang Site Reliability Jobs in February 2021 at companies like People Connect (formerly The Control Group), Seldon and Form3, with salaries from $100,000 to $500,000, working as a Site Reliability Engineer or Site Reliability Engineer (Platform).


Sponsored Jobs
Senior Software Engineer (Go) (Sponsored)
People Connect (formerly The Control Group)
San Diego CA, USA / Partially Remote
$80,000 to $130,000 a year
February 2021

Job Description

Senior Software Engineer (Go)

This position is currently remote due to the pandemic; the role is based in San Diego, CA, USA.

PeopleConnect is hiring for our People Search Division (aka The Control Group, TCG) as our business is GROWING!!! We are looking for a talented, collaborative Senior Software Engineer who is excited to learn and grow Go/Golang skills by moving to a Go environment on an award-winning team. Competitive salary plus quarterly bonus. Would you like to be part of a pioneering tech community in a highly successful company? Does the idea of keeping up with and learning the newest technologies with other brilliant techies sound exciting? If so, then read on!

People Search (aka TCG) is an award-winning web development company with over 15 million customers nationwide. Our cutting-edge technology connects, informs and protects people, both online and off. Our websites are consistently ranked among the top 500 highest-traffic sites in the US. Our products have been featured on the Discovery Channel, Mashable, Vice, Entrepreneur and Business Insider, and even made a cameo in a Disney animated comedy! A pioneer of new ideas, we’re constantly looking to develop and deploy innovative strategies and solutions. Our people and culture are second to none: we’re innovative, creative, collaborative and talented. We work hard, play hard, and together we work magic!

Our new state-of-the-art San Diego office has stunning views of beautiful downtown, Petco Park and the Harbor. Our dog-friendly office is packed with snacks and crazy-good perks (like free massages, kombucha on tap, free catered lunches, ping pong, video games, offsite team events and more)! We offer a highly competitive salary + bonus package, 100% company-paid health insurance (Medical, Dental, Vision), UNLIMITED vacation, Paid Sick Leave, Paid Holidays, Student Loan Repayment Program, 529 Education Savings Plan, Training/Education Reimbursement, free Gym Membership, Paid Parking and 401k Plan with Company Match.

You will already have extensive experience building applications in a service-oriented or microservice architecture. You are excited to learn and grow Go/Golang skills by moving to a Go environment. You relish complex technical challenges yet prioritize simplicity in your solutions. You understand the business requirements behind the software you build. You are passionate about learning and stay current with new technologies. You play well with others yet can operate independently as needed.

Responsibilities include (but are not limited to):

  • Build and maintain an ecosystem of high-volume services and APIs.
  • Scale and optimize services and databases for performance.
  • Own features from technical design through maintenance.
  • Build features, investigate and fix bugs, and write routine to complex tests.
  • Break down complex tasks/requests into sub-tasks, make consistently good decisions, and operate independently between regular or periodic check-ins.
  • Consistently use software engineering best practices.
  • Lead day-to-day tasks and priorities and accurately estimate time to complete tasks, resulting in high quality and high productivity for one or more product teams.
  • Independently lead tasks to completion, gathering requirements from stakeholders. May be responsible for driving initiatives to completion.
  • Provide training and mentorship to other Software Engineers on the team. May do the same for other teams.
  • Other duties as required.

Requirements:

  • Bachelor’s degree (or higher) in Computer Science or relevant field (or equivalent).
  • At least 3-5 years of directly related software development experience; 5-7+ years preferred.
  • Advanced-expert programming skills using one or more backend languages such as Go/Golang, Python, C++, C#, Node.js or Ruby. Go/Golang preferred and highly desired.
  • Advanced-expert experience with relational databases (preferably PostgreSQL) and a deep understanding of database performance optimization.
  • Advanced-expert experience using Linux.
  • Advanced-expert understanding of the underlying architecture and infrastructure that runs the team’s projects.
  • Extensive experience with cloud computing.
  • Solid experience with git.
  • Experience using Docker in production.
  • Some experience with Kubernetes and Terraform is highly desirable.
  • Intermediate-advanced ability to assess/improve performance and increase observability within the team’s projects.
  • Strong ability to develop unique, outside-the-box ideas.
  • Strong troubleshooting and problem-solving abilities.
  • Strong attention to detail.
  • Excellent communication skills; highly collaborative within the team, with other teams and cross-functionally.
  • Ability and willingness to lead projects and mentor other growing software engineers.
  • Able to work with teams as well as independently with minimal supervision.
  • Exceptional work ethic, driven, self-motivated, highly accountable with strong initiative and passion.
  • Excited to learn new things and share knowledge and best practices with others.

Note for Principal Agencies: principal agents should not forward resumes to The Control Group (TCG). TCG will not be responsible for any fees arising from the use of resumes submitted by agencies without a prior written and signed agreement and an authorized job order for this position in place.


Perks & Benefits

100% paid health insurance for employees; 70% for dependents. 401k with 4% company matching. Unlimited paid vacation, 10 paid holidays, 80 hours paid sick leave. Amazing, talented, collaborative team! Leading-edge technology; innovation is our jam! Free massages, free gym membership and much more! Check out our website at peopleconnect.us for more info.

Software Engineer (Sponsored)
Seldon
London, United Kingdom
£60,000 to £90,000 a year
October 2020

Job Description

Seldon is looking for a Software Engineer to join our team. We are focused on making it easy for machine learning models to be deployed and managed at scale in production. We provide Cloud Native products that run on top of Kubernetes and are open-core with several successful open source projects including Seldon Core, Alibi:Explain and Alibi:Detect. We also contribute to open source projects under the Kubeflow umbrella including KFServing.

About the role

Design and build scalable machine learning solutions on top of the open source and enterprise Seldon products. Work on bringing the Explainable AI and ML Monitoring available in the Alibi projects into the enterprise products for general use.

Essential skills

  • A degree or higher-level academic background in a scientific or engineering subject.
  • Familiarity with Linux-based development.
  • At least 2 years of experience in industry or academia showing completed projects.

Core skills (the role will be focused on these skills, so we would expect existing experience or a demonstrable desire to learn them):

  • Experience with Golang and Python.
  • Experience with Kubernetes and the ecosystem of Cloud Native tools.
  • Experience using machine learning tools in production.

Bonus skills (any of these will be of great interest to us):

  • A broad understanding of data science and machine learning.
  • Understanding of explainable AI or machine learning monitoring in production.
  • Familiarity with Kubeflow, MLflow or SageMaker.
  • Familiarity with Python tools for data science.

About our tech stack

Some of our high-profile technical projects:

  • We are core authors and maintainers of Seldon Core, the most popular open source model serving solution in the Cloud Native (Kubernetes) ecosystem.
  • We built and maintain the black-box model explainability tool Alibi.
  • We are co-founders of the KFServing project, and collaborate with Microsoft, Google, IBM, etc. on extending the project.
  • We are core contributors to the Kubeflow project and meet on several workstreams with Google, Microsoft, Red Hat, etc. on a weekly basis.
  • We are part of the SIG-MLOps Kubernetes open source working group, where we contribute through examples and prototypes around ML serving.
  • We run the largest TensorFlow meetup in London.
  • And much more 🚀

Some of the technologies we use day to day:

  • Go is our primary language for all things backend infrastructure, including our Kubernetes Operator and our new Golang Microservice Orchestrator.
  • Python is our primary language for machine learning, and powers our most popular Seldon Core Microservices wrapper, as well as our explainability toolbox Alibi.
  • We leverage the Elastic Stack to provide full data provenance on inputs and outputs for thousands of models in production clusters.
  • Metrics from our models are collected using Prometheus, with custom Grafana integrations for visualisation and monitoring.
  • Our primary service mesh backend leverages the Envoy Proxy, fully integrated with Istio, but also with an option for Ambassador.
  • We leverage gRPC and protobufs to standardise our schemas and reach unprecedented processing speeds through complex inference graphs.
  • We use React.js for all our enterprise user products and interfaces.
  • Kubernetes and Docker schedule and run all of our core cloud native technology stack.

Benefits

  • Share options to align you with the long-term success of the company.
  • An exciting phase of fast-paced start-up challenges with an ambitious team and unlimited potential for professional growth.
  • Access to discounted lunches, gyms, shopping and cinema tickets.
  • Healthcare benefits.
  • Cycle to Work Scheme.

Logistics

Our interview process is normally a phone interview, a coding task, and 2-3 hours of final interview (carried out virtually). We promise not to ask you any brain teasers or trick questions. We might design a system together on a whiteboard, the same way we often work together, but we won’t make you write code on one. Our recruitment process has an average length of 3 weeks.


Senior Software Engineer (Go) (Sponsored)
Form3
100% remote (UK/EU only)
€60,000 to €95,000 a year
November 2020

Job Description

THE TEAM

Our awesome Software Engineering team is 100% remote and consists of talented Senior Software Engineers who collaborate across 15 European countries. Our software engineers work in small, highly agile, self-managed teams. They share a common interest in engineering best practices and understand that quality is everyone’s responsibility. Their philosophy is to favour open-source collaborative development, leveraging open-source tools and communities while always making sure to share their know-how back upstream. Put simply, they are cloud-native enthusiasts and DevOps advocates.

THE ROLE

At Form3 you will have the opportunity to design, develop and deploy backend cloud-native services within a powerful state-of-the-art microservices architecture. The work is cutting edge, constantly changing and focused on building highly available, low latency, scalable solutions.

Play an active role in introducing new technologies and ways of working to stay ahead of the competition, without ever compromising on quality. Contribute and collaborate with other engineers on technical and architectural decisions. Enjoy end-to-end ownership from concept to deployment, including building and operating infrastructure, toolset and deployment pipelines. Develop your skills, work on cool projects with the latest tech, all whilst working with a talented, diverse and friendly group of people.

OUR STACK

Infrastructure: AWS, GCP, Kubernetes

Platform: CockroachDB, Elasticsearch, PostgreSQL, Vault, Consul, Linkerd, NATS

Tools: Terraform, GitHub, Prometheus, Pact.io

Code: Go, containerised microservices, CQRS, open-source

Ways of working: TDD/BDD, Pair Programming, 100% remote, DevSecOps

WE’RE LOOKING FOR ENGINEERS WITH

Experience in designing and building complex distributed systems

Familiarity with cloud and containerisation technologies, test automation tools and CI/CD pipelines

Interest in owning projects end-to-end and supporting them as they go live in production

Appreciation of clean code and software engineering best practices

A passion for learning and an interest in Go (previous experience isn’t required), along with a “right tool for the job” mentality

Exceptional communication skills and an enjoyment of sharing knowledge and collaborating with others

BENEFITS

30 days annual leave plus bank holidays

Remote first environment

Flexible working arrangements

Training tools such as Udemy and educational reimbursements

Full details are available on our careers page

ABOUT US

We are an award-winning cloud-native payment technology provider for financially regulated institutions. Launched in 2016, we’ve doubled in size year on year as we continue to redefine what a truly instant payment experience means.

We celebrate diversity, promote entrepreneurialism and are committed to giving everyone a say in shaping our business. Here you will grow as a person and accomplish incredible things. A career at Form3 is empowering, inspiring and fun. Join us and help shape the future of payments.

EQUAL OPPORTUNITIES

At Form3 we embrace equal opportunity and are committed to building a diverse team of exceptional individuals. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, or disability status and it is our strong belief that the more inclusive we are as a business, the better our work will be.


Software Engineer (Sponsored)
Tilia Pay
Remote, USA
$120,000 to $150,000 a year
January 2021

Job Description

Remote within USA (CA, CO, FL, GA, MA, NH, TX, VA, WA only)

Our mission is to build new economies by enabling our partners to compensate their content creators for the digital goods and services they produce. Here on the Ecom engineering team, we accomplish this by building a growing set of financial capabilities on top of our regulatory licenses. Some of these capabilities include processing payments and payouts, verifying user identities, detecting fraud and enforcing sanctions. Additionally, these systems have an expanding set of tools around them to be used by our partners and customers.

Basically, we write code that lets users sell digital hats in video games and get paid real money.

This position is for a Software Engineer on the Ecom team. The primary responsibility is to design and build the APIs that facilitate our capabilities. This is a fast-paced team and we are responsible for the full life cycle of our code. We break large systems down into component parts to be worked on concurrently, which requires that we be in lockstep with each other. This means we highly value dependability and communication. We are iterative in nature, both in our code and in our own processes. We build cool stuff, we weigh risk/reward, and when we make mistakes, we respond quickly, together, and without blame. This is a team in the truest sense.

You will:

  • Take features through their entire lifecycle - design, implementation, test, documentation, deployment, production monitoring, outage response, and usage analysis
  • Design the API spec and implement it, to enable core business capabilities around payments, payouts, identity verification, fraud detection, sanction enforcement, and tooling
  • Communicate not just with the team, but also directly with our partners and vendors
  • Participate in our culture of continuous improvement to make both the tech and the team even better
  • Learn about and contribute to financial technology

You need:

  • Experience with Golang
  • Experience with SQL
  • Experience with UNIX/Linux
  • Broad exposure to common web technologies
  • Proficiency in scripting languages
  • The ability to work independently and collaboratively in a remote environment
  • Excellent written and verbal communication skills
  • 6 years of experience in web software engineering
  • Bachelor’s degree in a technical field or equivalent experience

What we use and teach:

  • Golang, MySQL, Python
  • Docker, Drone, Jenkins, Terraform
  • Automated testing, Continuous Integration and Deployment
  • Microservices, Lambda Functions, SNS/SQS

Site Reliability Engineer
Gtmhub
Sofia, Bulgaria
€30,000 to €35,000 a year
July 2019

Job Description

Gtmhub is the world’s most beautiful and intuitive Objectives and Key Results (OKRs) management and employee experience solution. We build enterprise-scale software with a consumer-grade experience.

We help organizations amplify revenue growth by aligning every employee with their corporate purpose using the OKRs method. We are big believers in the power of employee experience to drive productivity, so our product facilitates best practice employee success features.

At heart, we are product people who love data so much that we built the only solution that integrates more than 150 data connectors to allow for true automation of progress and productivity management.

The Role

The term site reliability engineering is credited to Benjamin Treynor Sloss, Vice President of Engineering at Google. He said site reliability engineering is “what happens when a software engineer is tasked with what used to be called operations.”

To us, a Site Reliability Engineer (SRE) is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services.

SREs design and implement automation with software to replace human labor. They want systems that are automatic, not just automated—such that their services are able to run and repair themselves.

Responsibilities

Engage in and improve the entire lifecycle of services—from inception and design, through to deployment, operation, and refinement/system tuning

Support services before they go live through activities like system design consulting, developing software platforms and frameworks, capacity planning and launch reviews

Maintain services once they are live by measuring and monitoring availability, latency, and overall system health

Identify performance bottlenecks and troubleshoot performance issues

Scale systems sustainably through mechanisms like automation, and evolve systems by advocating for changes that improve reliability and velocity

Practice sustainable incident response and postmortems

Basic Qualifications

Experience with algorithms, data structures, complexity analysis, and software design

Ability to work across teams (business and technical) to continuously analyze system performance in production, troubleshoot consumer-reported issues, and proactively identify areas requiring optimization

Preferred Qualifications

Expertise in designing, analyzing and troubleshooting large-scale distributed systems

A systematic problem-solving approach, accompanying effective communication skills, a sense of ownership, self-direction, and drive

Ability to debug and optimize code and to automate routine tasks

Practical experience in supporting application reliability practices for consumer-facing web and mobile experiences

The Stack

Our tech stack includes (but is not limited to):

Kubernetes, Docker, Golang, Java, GAP, ELK, OpenTracing, Python, OpenShift, Terraform, Ansible

We started in Sofia in 2015 with a mission to ship a world-class data management and analytics engine which allows companies to automatically track and visualize KPIs in real-time and create custom insights to inform goal setting, performance management, and long-term strategic decision making. Today we operate across offices in Sofia, London, Berlin, and San Francisco.

Apply today if our mission inspires you! Join us in developing yourself and others as our Site Reliability Engineer.


Site Reliability Engineer (Platform)
Monzo
London, UK / Remote (EU)
£59,000 to £116,000 a year
September 2020

Job Description

At Monzo we’re aiming to build the best current account in the world. We are always keen to hear from capable, creative engineers who want to help us accomplish that goal 🚀

We’re currently looking for Site Reliability Engineers (SREs) to join our Platform team.

We’re looking for SREs who are software engineers at heart - you’re as comfortable writing software to solve problems as you are operating AWS or Kubernetes. If you’re a software engineer who has some good cloud infrastructure experience already, or you’re eager to get really familiar with systems, tooling and libraries, this could be the role for you.

As a team, we’re responsible for designing, building, and operating the services we consume from AWS, along with the software we run on top like Kubernetes, Cassandra, Prometheus, and Kafka. We’re also responsible for operating our three physical data centres, our network, and being on-call for the things we own and run.

To achieve this, we’re organised into three squads within the Platform Group: Infrastructure Platform, Storage Platform, and Backend Platform. Each squad is responsible for solving a specific set of problems for our customers and our engineers. We’re looking for engineers who are interested in joining our Infrastructure Platform or Storage Platform squads right now, but there are opportunities to move between them as you gain experience with our platform.

We’ve posted a good overview of our platform on our blog if you’d like to learn more.

We’re investing a lot of up-front effort in building a scalable, secure, and extensible architecture for our millions of customers. Come and help us build a state-of-the-art microservices platform and build the kind of bank you want to use.

Our engineers have a variety of different backgrounds

We have several non-graduates; only some of us studied Computer Science; some of us have worked in huge companies; some have only ever worked in startups; others are former consultants. As long as you enjoy learning new things, we’d love to talk to you. We do not ask for formal qualifications or degree requirements for any of our engineering roles.

We are actively creating an equitable environment for all of our engineers to thrive

Diversity and inclusion are a priority for us and we are making sure we have lots of support for all of our people to grow at Monzo. We provide a sponsorship framework in Engineering for women and people of colour; all of our leaders are trained on privilege awareness and we are creating partnerships with organisations dedicated to supporting underrepresented groups. You can read more in our 2020 Diversity and Inclusion report.

Monzo works in project-based sprints in small, interdisciplinary teams

We have around 150 engineers out of roughly 1,400 people in total - and we have big ambitions. There are many interesting challenges ahead, and we’re happy for people to move between teams or to specialise, whatever you prefer. As an engineer here you’d be able to work directly with anyone across the company, and we run regular knowledge-sharing sessions so you’ll learn heaps about everything from how banks work to effective communication.

We encourage an open and transparent working environment

You can get involved in any aspect of the business you are interested in and, following Stripe’s example, all emails in the company are visible in an email archive. We contribute to open source software as much as possible. We’ve also made our product roadmap public and give sneak peeks of features in our community forum. Our technology blog is a good place to learn even more about what we do!

At Monzo you will get to work with a lot of exciting new technology.

We rely heavily on the following tools and technologies:

You should apply if:

Our open roles are for mid-level to senior Site Reliability Engineers at present. Apply if:

  • the work we’re doing sounds exciting!
  • you’re a software engineer at heart and you’re comfortable writing software to solve problems
  • you’re interested in distributed systems and writing resilient, scalable software
  • you have strong experience working on the backend of a technology product
  • you’re familiar with some of our Platform technologies, or specialise in just one part
  • you want to help build, scale and operate a platform to support a product that you (and everyone you know) use every day
  • you’re keen to learn more about new technologies and the arcane inner workings of the financial industry
  • you’re comfortable working in a team that deals with ambiguity

Logistics

Salary ranges from £59,000 to £116,000, plus stock options and other benefits.

We can help you relocate to London & we can sponsor visas.

This role can be based in our London office, but we’re open to distributed working (as long as you can spend around 20% of your time in London).

We have payroll set up in four countries: the UK, Ireland, France, and Spain. Right now, we can only hire people who work from those countries and we’ll keep this updated with new ones as we expand and are able to hire from more places 🌎

We’re almost always hiring engineers, so there’s no closing date for this job.

We offer flexible working hours and trust you to work enough hours to do your job well, at times that suit you and your team.

Diversity and inclusion is a priority for us – if we want to solve problems for people around the world, our team has to represent our customers. So we need to attract the best talent and create an environment that supports and includes them. You can read more about diversity and inclusion on our blog.

If you prefer to work part-time, we’ll make this happen whenever we can - whether this is to help you meet other commitments or strike a great work-life balance.

Our interview process is normally a phone interview, a coding task and a call to discuss it, and 2-3 hours of onsite interviews that can also be conducted via Hangouts. We promise not to ask you any brain teasers or trick questions. We might design a system together on a whiteboard, the same way we often work together, but we won’t make you write code on one.

Equal Opportunity Statement

At Monzo, we embrace diversity in all of its forms and foster an inclusive environment for all people to do the best work of their lives with us. This is integral to our mission of making money work for everyone.

We’re an equal opportunity employer. All applicants will be considered for employment without attention to ethnicity, religion, sexual orientation, gender identity, family or parental status, national origin, veteran, neurodiversity status or disability status.


Perks & Benefits

https://monzo.com/careers/#benefits

Site Reliability Engineer
PubNative
Berlin, Germany
€40,000 to €65,000 a year
October 2018

Job Description

PubNative is a mobile publisher platform that serves native ads via a scalable and flexible API for mobile apps and web. Our publisher-first approach focuses on the specific needs of each publisher across all verticals. Our ad serving technology is used by developers and publishers around the world.

Our system consists of a myriad of high load Golang-based APIs, iOS SDKs, Ruby/Rails 5 dashboard, Scala and Spark data- and ML pipelines, Druid OLAP system, running on a Mesos and Kubernetes cluster.

We’re always on call to keep our networks up and running, ensuring our users have the best and fastest experience possible. We follow the “Infrastructure as Code” model and immutable deployment strategies.

We are looking for a Site Reliability Engineer (m/f) to help us build and operate infrastructure platforms, and provide technical consultancy to engineering teams on how to build reliable, scalable and efficient services.

Our Responsibilities:

  • You help us build a hybrid, poly-cloud-provider environment
  • You help design, develop and operate monitoring and tracking platforms
  • You drive the scalability and operability of supported systems and infrastructure
  • You participate in the on-call rotation and are on call for the services you build and support
  • You work with other teams, providing consultation on systems architecture for new and existing production systems
  • You write code to automate tasks and support SLAs for production systems, and you support other engineering teams on reliability, scalability and efficiency topics
  • You manage OS images/templates via Packer and provision infrastructure via Terraform
  • You support CI/CD and build new pipelines
  • You engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
  • You support services before they go live through activities such as system design consulting
  • You maintain services once they are live by measuring and monitoring availability, latency, and overall system health

Our Requirements:

  • 3+ years of experience in a Site Reliability or full-stack developer role
  • Experience with public cloud providers (AWS, Google Cloud, DigitalOcean, etc.) and Infrastructure as Code (Terraform)
  • Strong programming skills and familiarity with modern programming languages: Go, Ruby, Python, Shell, etc.
  • Knowledge of managing Docker containers and microservices via Kubernetes
  • Experience building and monitoring systems and metric collection pipelines
  • Track record of building automation and solving multi-datacenter/multi-cloud infrastructure problems
  • Knowledge of algorithms, data structures, complexity analysis, software design and reverse engineering
  • Interest in designing, analyzing and troubleshooting large-scale distributed systems
  • Experience working with source control (Git)
  • Experience with continuous integration platforms such as TeamCity, Jenkins, CircleCI, etc.
  • Understanding of Agile and DevOps practices such as CI/CD, automated testing, etc.


Senior Engineer Tools & Platforms SRE
DigitalOcean
New York / Cambridge / Palo Alto, USA / Remote
$155,000 to $190,000 a year
July 2019

Job Description

Do you ever wonder what happens inside the cloud?

Based in New York, DigitalOcean is a dynamic, high-growth technology company that serves a robust and passionate community of developers, teams, and businesses around the world. We believe that today’s entrepreneurs are changing the world through software. Our mission is to empower these entrepreneurs by bringing modern app development within reach for any developer, anywhere in the world.

We want people who are passionate about building the systems, culture, and processes that will improve the resiliency, reliability, scaling, and performance for cloud services.

We are looking for an experienced Site Reliability Engineer to work closely with our product engineering and infrastructure teams. Reporting to the Director of Platform Systems, the Site Reliability Engineer will be performing a mix of hands-on development, coaching, and collaborating with other teams and stakeholders to help bring DigitalOcean’s engineering systems and culture up to the next level.

This is a key opportunity to make a significant impact on DigitalOcean’s engineering and operational systems and to influence future product designs and requirements. This role is essential to meeting the high expectations our customers have of DigitalOcean as we continue to grow and expand.

What You’ll Be Doing:

  • Performing hands-on technical work to directly improve the reliability, resiliency, and scaling of our key platform systems
  • Working with stakeholders to develop and implement reliability and performance metrics
  • Facilitating DigitalOcean’s culture of learning by providing insight and recommendations for improvement
  • Coaching teams and individuals on reliability best practices and solutions
  • Working with other SREs and engineering leaders to define the architectures and practices that should be adopted in order to deliver on our engineering and operational goals
  • Establishing best practices for development, architecture, deployment, and operations
  • Working with peer SREs to improve services and processes (including architecture reviews, incident response, monitoring) in a cross-functional manner throughout the engineering organization

What We’ll Expect From You:

  • Distinguished track record as SRE (or similar role) with hands-on experience implementing reliability, process, and scaling solutions
  • History of fostering positive relationships with stakeholders and a track record of successful collaboration and coaching
  • Clear communication skills (both written and verbal) to document processes and architectures
  • Experience implementing disaster recovery best practices
  • Developing robust solutions that facilitate streamlined resolution of customer inquiries through use of technologies for automation, deflection, and issue management
  • Adept in Ruby and Go with a broad understanding of the full technology stack for a modern infrastructure
  • Advocate of effective development environments with the use of CI/CD tooling and configuration management technologies such as Chef or Ansible

Why You’ll Like Working for DigitalOcean:

  • We have amazing people. We can promise you will work with some of the smartest and most interesting people in the industry. We work hard but we always have fun doing it. We care deeply about each other and take our “no jerks” rule very seriously.
  • We value development. We are a high-performance organization that is always challenging ourselves to continuously grow. That means we maintain a growth mindset in everything we do and invest deeply in employee development. You’ll need to be great to get hired here and we promise you’ll get even better.
  • We care about you. We offer competitive health, dental, and vision benefits for employees and their dependents, a monthly gym reimbursement to support your physical health, and a monthly commute allowance to make your trips to and from work easier.
  • We invest in your future. We offer competitive compensation and a 401k plan with up to a 4% employer match. We also provide all employees with Kindles and reimbursement for relevant conferences, training, and education.
  • We want you to love where you work. We have great office spaces located in the heart of SoHo NYC and Cambridge and offer daily catered lunches to keep your hunger at bay. We’re also very remote-friendly—we use Slack to communicate across the company—and all remote employees have the opportunity to onboard in-office and take an all-expenses paid trip to our annual company offsite, Shark Week, to get quality in-person time with the team at least once a year. We also allow employees to customize their workstations to meet their needs—whether remote or in office.
  • We value diversity and inclusivity. We are an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Site Reliability Engineer
Rebellion Defense
Washington, DC / Chicago, Illinois, USA
$100,000 to $200,000 a year
November 2020

Job Description

We are looking for a Site Reliability Engineer (SRE). As an SRE, you will be responsible for the reliability and operation of our production environments. SREs help teams across the company maintain software at scale, and help them design and develop software for scale. SREs are expected to engage with the product teams to make the delivery of our software as seamless as possible.

This position is based out of our Washington, D.C. or Chicago, Illinois offices. An active clearance, or the ability to obtain a TS/SCI clearance, will be required.

We look for a track record of the following:

  • Working alongside high-energy engineering teams to drive adoption of best practices that improve the scalability and reliability of deployed software
  • Defining architectures and building services at scale on public infrastructure such as AWS and Azure
  • Designing, implementing, deploying, and operating high-scale production services
  • Facilitating the definition and implementation of SLIs and SLOs
  • Understanding how to carefully spend error budget to handle regular deployment of large changes to production
  • Deep experience with Linux operating systems and systems engineering
  • Comfort delivering critical software in Go and Python
  • Willingness to debug problems across the stack
  • Comfort working on underspecified problems and the ability to rapidly learn and iterate on solutions
  • Experience building the wrong system enough times to avoid the common pitfalls, whether building something personally or advising others

You might be a good fit if you have:

  • 5+ years of relevant SRE experience in the tech industry
  • Demonstrable knowledge of TCP/IP, HTTP and web application security, and experience supporting web application architecture
  • Experience working with a variety of storage systems, application architectures, compute infrastructure and network management systems
  • Experience designing, implementing, deploying, and operating high-scale production services
  • Experience defining architectures and building services at scale on public infrastructure such as AWS and Azure
  • Proven knowledge of at least one higher-level language (e.g. Python or Golang)
  • The ability and desire to build and learn new systems with new technologies

Rebellion is a well-capitalized technology start-up that is passionate about defining and delivering modern, life-changing software products to the US Department of Defense (DoD), the UK Ministry of Defence (MoD), and their allies. At Rebellion we believe in operating what we own: we deliver all of our products as managed services, which allows our product teams to maintain operational ownership across all deployments. Expect talented, motivated, intense, and interesting co-workers.

Compensation includes meaningful equity ownership, competitive salaries, full medical coverage, disability and life insurance, and transit reimbursement.

An Equal Opportunity Employer/Veterans/Disabled. Rebellion Defense is an equal opportunity employer and makes employment decisions on the basis of merit and business needs. Rebellion Defense does not discriminate against applicants on the basis of race, color, religion, sex, sexual orientation, gender, gender identity, national origin, veteran status, disability, or any other protected characteristic in accordance with federal, state, and local law.


Site Reliability Engineer
Castor EDC
Amsterdam, The Netherlands
€60,000 to €80,000 a year
February 2020

Job Description

Our true purpose at Castor

Castor is one of the leading platforms for data collection in medical research. We believe standardizing and reusing datasets is key to overcoming the healthcare challenges of the future.

How we operate

Our main Electronic Data Capture (EDC) application runs on a proven stack consisting of Ubuntu, Nginx, PHP and MySQL. For our cloud installations, we orchestrate these setups by using Terraform combined with Ansible for the server configuration management.

Due to the nature of processing medical data, we have clients in different regions across the globe, often with specific regulatory constraints around where and how their research data is stored. To meet these customer demands we combine both traditional as well as cloud-based hosting solutions.

Most of our clients prefer to run in Azure, but we’re using Google Cloud Platform for things like Kubernetes hosting of greenfield projects, blob storage for scalable file uploads and its Key Management Service (KMS) to further secure our data.

For our metrics we’ve begun standardizing on Prometheus, and we’re moving towards Loki for log aggregation. We use PagerDuty for alerting, communicate via Slack and host our code on GitHub.

Why we’re growing our team

With our recent expansion have come new challenges, both in how we organize ourselves and in how we manage and scale our infrastructure in the future.

To further these efforts we have formed a Platform team of SREs and software engineers, which we are now looking to grow by adding another SRE.

Additionally, due to the sensitive nature of medical data, Castor is certified for both ISO 9001 (quality) and ISO/IEC 27001 (information security). We also have to adhere to a number of other regulations, including Good Clinical Practice (GCP) guidelines.

Our goal is to unite these requirements with emerging SRE practices around infrastructure as code and other principles to create a well-designed and documented system, while still remaining flexible to change.

How you will contribute

Our absolute commitment to patient data security and privacy informs our selection of certified datacenter and cloud providers. To achieve real impact in medical research, Castor needs to operate securely around the world.

Historically, our production platform has run on top of managed hosting services. This model doesn’t scale well for our global, international footprint, which is why we are currently expanding our in-house knowledge and transitioning to Infrastructure-as-a-Service providers.

As a Site Reliability Engineer, you’ll have the ability to shape our operations and continuously deliver a working product. Working very closely with the development teams, you’ll collaborate in supporting and structuring our efforts around automation, observability and security. With your help we plan to scale the Castor platform to the next level.

Some things we worked on recently

Whilst there are many operational challenges as we continue to grow and scale at Castor, our Platform team has made great improvements to a variety of our systems already. To give you some examples of what we achieved last month:

  • Migrated our DNS to AWS Route53
  • Set up automatic documentation pipelines using MkDocs
  • Moved our CI/CD pipelines from Jenkins to CircleCI
  • Built a key-service on AWS Lambda to store disk encryption keys off-site for an otherwise region-local setup

Your background

You have helped run web-facing services under production workloads and have experienced the challenges that come with maintaining and scaling these systems. Making and owning decisions about systems architecture together with your team is something you enjoy and feel comfortable with.

Qualities we’re looking for include:

  • A good grasp of how *NIX systems operate
  • The ability to evaluate and implement best practices for IT operations
  • A working knowledge of both cloud-native and traditional systems architecture and the trade-offs between them
  • Experience with a configuration management framework such as Ansible, Chef, Puppet or SaltStack
  • The ability and desire to work with a wide range of open source technologies
  • A strong privacy and security mindset
  • Experience with some aspects of Observability and distributed systems: from monitoring, logging and metrics instrumentation to resiliency to failure
  • A good understanding of how relational databases operate
  • Experience with at least one programming or scripting language, preferably Python or Go(lang)
  • Knowledge that a list of skills and requirements doesn’t mean you have to tick every single box to apply ;)

How we say thank you

At Castor we truly live our core values, believing we can achieve anything with a healthy and happy team. With this in mind, we offer the following benefits:

  • Our own ‘Castor Burrow’ - brand new offices by Amsterdam Amstelstation
  • A competitive salary plus an annual company bonus plan
  • Employee Stock Option Programme incentive
  • 30 days annual leave plus 6 public holidays
  • An individual training and professional development budget
  • Flexible working with the opportunity to work from home 1 day per week
  • Meditation room with daily yoga, mindfulness and company subscription to Calm
  • Lunch and healthy snacks in the office every day
  • A new Mac or Dell laptop

Site Reliability Engineer
Goldman Sachs
London, United Kingdom
£40,000 to £100,000 a year
November 2018

Job Description

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for the availability and reliability of our firm’s most critical platform services, and ensures they meet the requirements of our internal and external users. We look for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems, which can evolve and adapt to changes in our fast-paced, global business environment.

Skills & Requirements

  • Proficiency in one or more of the following: Go, Python, C, C++, Java, Perl, Ruby or shell scripting
  • Experience with algorithms, data structures and software design
  • Experience with UNIX operating systems internals and / or networking
  • Experience with distributed systems design, maintenance, and troubleshooting
  • Hands-on experience with debugging and optimizing code, as well as automation
  • Strong interpersonal skills, drive, and ownership
  • Coding beyond simple scripts
  • Solving novel problems from first principles

ABOUT GOLDMAN SACHS

The Goldman Sachs Group, Inc. is a leading global investment banking, securities and investment management firm that provides a wide range of financial services to a substantial and diversified client base that includes corporations, financial institutions, governments and individuals. Founded in 1869, the firm is headquartered in New York and maintains offices in all major financial centers around the world.


Senior Site Reliability Engineer, CORE
Netflix
Los Gatos, California, USA
$250,000 to $500,000 a year
January 2020

Job Description

At Netflix, we strive to bring joy to people across the world through amazing stories. As we grow internationally, we are continually enhancing our cloud-based infrastructure to improve our performance, scalability, and reliability.

The SRE team’s goal is to ensure customer joy by successfully managing risk and minimizing impact across Netflix. We do this through cross-functional engagement with other engineering teams, managing issues when they happen, as well as promoting reliability and resilience practices throughout the organization.

Outcomes

  • Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks
  • Increase our reliability through establishing guidance and methods of improvement
  • Form and maintain relationships with internal and external partners
  • Develop deeper insights and analysis into the quality of experience for our customers

We Value

  • Curiosity about how complex sociotechnical systems successfully operate at scale when failure is inevitable
  • People who see influence as their preferred tool for cultivating relationships
  • Collaboration and continuous improvement
  • A desire to learn and readiness to teach
  • Iteration as the path forward

Our Work

  • Drive incidents to resolution by coordinating with multiple engineering teams
  • Identify sources of instability in large-scale distributed systems and drive operational excellence
  • Analyze complex systems from a reliability and resilience perspective
  • Engage with product teams to diagnose operational surprises and carry forward improvements
  • Improve reliability and drive down the burden of toil with tooling and automation

Nice to Have

  • Experience with global, continuous delivery methods
  • Development with Python, Go, Java, or JavaScript/Node.js
  • Involvement with incident management and response
  • Knowledge of cloud platforms like AWS and microservices architecture
  • Deep network analysis
  • Linux systems engineering capability



Senior Software Engineer or Site Reliability Engineer
Micro
London, United Kingdom
£65,000 to £80,000 a year
October 2019

Job Description

We’re looking for a senior software engineer or site reliability engineer with experience in Go, microservices, distributed systems and cloud-native technology to come help build a global services platform for developers.

Cloud-native development has become massively complex in a world filled with Docker, Kubernetes, Envoy, Istio and much more. We want to abstract away all of this complexity and build a global platform for developers to build and share services.

You should have experience building distributed systems in Go and have battled with cloud-native technologies. You should have a disdain for the way software is built today and want to play a role in changing how we build software in the future.


Site Reliability / Go Software Engineer
collectAI
Berlin, Germany
€50,000 to €70,000 a year
October 2018

Job Description

collectAI provides receivables management, covering the end-to-end process from e-invoicing and dunning to debt collection. Focusing on digital communication channels, automation and machine learning gives our solution an edge over traditional approaches. We communicate with customers via their preferred channels, at their favored time and enable them to pay easily. Companies benefit due to higher customer retention rate, reduced costs and improved repayment rates.

collectAI was founded in 2016 and is part of Germany’s largest e-commerce retailer, the Otto Group. Our international team currently consists of 30 professionals, mostly working in our Hamburg office. We are looking for a (Senior) Golang Software Engineer / Site Reliability Engineer to join our Berlin-based engineering team: you will be creating, improving and operating micro-services written in Go, as well as contributing to tools and systems that enable other teams to deploy services quickly and operate them reliably.

Our architecture is currently based on micro-services written in JavaScript, Python and Go. We use NATS for event streaming and utilize AWS’ RDS in our persistence layer. Services are deployed in Kubernetes and monitored with Prometheus. We build our frontends mostly with React.

Basic Qualifications

  • Strong problem-solving skills
  • Good understanding of computer science fundamentals
  • Passion for clean, simple and robust code
  • Solid knowledge of Go

Preferred Qualifications

  • Exposure to Docker, Kubernetes and Prometheus
  • Experience with JavaScript
  • Knowledge of micro-service principles and best practices

Benefits

  • Regularly visit our headquarters in Hamburg’s beautiful HafenCity
  • Shape our Berlin-based team as one of its first members
  • Well-funded and part of Germany’s largest online retailer, the Otto Group
  • Option to partially work remotely
  • Budget for conferences, books, trainings, etc.
  • Free choice of hardware and software


Golang Software Engineer
World Open Network
Menlo Park, USA
$100,000 to $130,000 a year
July 2019

Job Description

Please provide your personal blog and Github address in your notice of interest.

We are an exciting start-up company founded by proven leaders with repeated success in the technology space. Our newest company is developing a cryptocurrency platform based on an open-source, third-generation blockchain that we’re creating. Our goal is to set a new standard in security and protection for our end users and community.

We’re looking for a Golang Software Engineer who combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems for the crypto marketplace. Reporting directly to the VP of Engineering, the Golang Software Engineer makes sure that WON’s internal and externally visible services are reliable, maintain rock-solid uptime to meet our users’ needs and improve quickly, while also being responsible for capacity, performance and scalability.

Responsibilities

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.

Qualifications

Minimum qualifications:

  • 2+ years of recent server-side experience in Golang
  • Knowledge of web technologies including HTML, CSS, JavaScript (jQuery or AngularJS) and WebSockets is an advantage
  • Experience working with MySQL
  • Experience with Redis, MongoDB or other NoSQL solutions
  • Understanding of how to build and consume REST APIs
  • Ability to build modular and scalable code
  • A sense of humor and thirst for knowledge

Preferred qualifications:

  • Interest in designing, analyzing and troubleshooting large-scale distributed systems
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • Ability to debug and optimize code and automate routine tasks


Benefits

  • Competitive salary
  • Awesome bonus
  • 20 days annual leave
  • 8 days personal leave
  • 100% medical, dental and vision insurance
  • Life insurance
  • 401(k) and FSA
  • Free shuttles between Caltrain Menlo Park and the office
  • On-site gym, accessible 24/7
  • Located on the corner of Marsh Road and 101, by the Dumbarton Bridge exit
  • Loads more!


Reliability Engineer (Software)
Agricool
Paris, France
€30,000 to €80,000 a year
October 2018

Job Description

MISSION: Your mission will be to ensure that Agricool’s cooltainers operate as designed to deliver best-in-class taste and yield. Your responsibilities will include:

  • You will handle software and hardware outages of production systems in a timely manner, document those incidents and communicate about them transparently
  • Fix issues at their root origin, help design and architect systems to minimize risk in production environments
  • Continuously improve the monitoring & alerting aspects of systems and services from an operations perspective
  • Apply state of the art security practices to ensure cooltainers systems and services are protected against intrusion and malevolence
  • Help build necessary tools & infrastructure to grow the operations and support team to handle very large fleets of cooltainers
  • Work closely with the product teams to improve the reliability, resilience and security of the cooltainers systems
  • Define and implement with the support and operations teams processes to interact with the cooltainers systems and provide remote assistance to on-site teams

REQUIREMENTS:

  • Problem-solving mindset, appetite for diagnosing various classes of complex problems
  • Very good knowledge of the Linux kernel, including the network stack
  • Good knowledge of at least one scripting language (shell, Ruby, Python, etc.)
  • Experience with at least one IT automation platform (Chef, Puppet, Ansible, etc.)
  • Software polyglot: ability to read and understand advanced Java, C++, Golang, TypeScript, etc.
  • Ability to jump between various technologies; open-minded and a believer in the “right tool for the job”
  • Hardware and micro-electronics culture


Senior Site Reliability Engineer
Tendermint
San Francisco, USA / Berlin, Germany / Toronto
$100,000 to $150,000 a year
October 2018

Job Description

We’re looking for someone who has:

  • At least 5 years of software engineering experience with open source contributions
  • Written structured, high-quality programs and scripts for automation
  • Significant experience writing Golang, or the ability and desire to become proficient in new languages
  • Experience developing, releasing, and maintaining production software and infrastructure tools like the Elastic stack, the InfluxDB stack, Datadog, PagerDuty, or VictorOps
  • Built solutions with a broad set of technologies in and around cloud solutions (AWS EC2, ECS, Route53, DynamoDB, RDS, Lambda, Docker, Google Container Engine, Kubernetes or Docker Swarm)
  • Implemented continuous deployment before (Jenkins, CircleCI, Travis, Ansible, Chef, Puppet)
  • Experience with SDLC tools (Git, GitHub, Atlassian Stash/Bitbucket, GitLab, JIRA)
  • Experience with QA/SIT tools (Selenium)
  • Experience in Linux system administration, including package management, network management, and security management
  • Familiarity with open source P2P networking protocols
  • Experience working in an agile development environment
  • The ability to take ownership and see initiatives through
  • Exceptional communication skills
  • Experience working with distributed teams

What your primary responsibilities will be:

  • Help scale software systems with automation, in an effort to improve reliability, velocity, and simplicity
  • Create, maintain, and improve the tooling for continuous integration and continuous delivery
  • Build and maintain tooling for deploying, monitoring, and maintaining clusters of Tendermint nodes on our testnets and mainnets
  • Build and maintain tooling to help shorten feedback cycles within teams and projects
  • Plan, build, and maintain public-facing services in association with business goals
  • Build tools to measure and monitor availability, latency and overall system health

Apply: https://goo.gl/forms/jpdRI1wD8pdfoqKl2


Senior Backend Engineer (Go/Rust)
Netlify
Remote (Americas, EMEA)
€70,000 to €90,000 a year
February 2021

Job Description

Company Overview

At Netlify, we’re building a platform to empower digital designers and developers to build better, more elaborate web projects than ever before. We’re aiming to change the landscape of modern web development.

We recently raised $53M in Series C funding to bring forward the next generation of tooling for a more accessible web. This round was led by EQT Ventures with participation from existing investors Andreessen Horowitz and Kleiner Perkins, bringing Netlify’s total funding to date to $97M. Other past investors include Bloomberg Beta, Designer Fund, and Tank Hill Ventures, as well as the founders of Figma, GitHub, Slack and Yelp.

Netlify is a diverse group of incredible talent from all over the world. We’re ~44% women or non-binary, and we represent about half as many nationalities as we have team members.

About the Opportunity:

At Netlify, we’re building a system that supports millions of customer sites, processing over a petabyte of data. Over 10% of Internet users visit at least one site hosted by Netlify every 30 days. We truly empower our engineers through an autonomous pod-based model that allows our teams to own various stages of the customer journey. We’ve been remote-first since our inception and are globally distributed across North America, Europe, and Africa. We’re biased towards asynchronous planning and communication, meaning fewer meetings and more execution. We take documentation seriously and place our values of transparency, empowerment, and commitment at the forefront of everything we do. We’re driven by passion and we make sure that everyone on the team knows their value, feels ownership over their work, and can quickly see the impact of their efforts. Beyond just hiring smart, empathetic team members, we foster a culture where there are no dumb questions and our team can get access to the resources they need to continue to learn. As a remote-first company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Netlify is the type of company where you can balance great work with great life.

As a Backend Engineer at Netlify, you’ll work with a smart set of team members who are very motivated to keep learning and continuing to grow each other in a supportive way. We have a blameless culture where we solve problems as a team and everyone works together towards a common goal. There are different backend-oriented teams that your interests and experience could lead you into.

With our Observability team, your mission is to help our customers monitor and troubleshoot their apps, and evaluate their health and performance when exposed to real user traffic. You’ll be working with large amounts of streaming data, using a variety of technologies to process and store this data, providing our customers valuable information about their apps. If you’re excited about working with Go, Kafka, and ClickHouse, among other technologies, you’ll have plenty of opportunities to do so with our team. The team ships changes several times a day, so you’ll quickly see the impact of your work.

With our Runtime & Integrated Apps team, your mission is to design and implement fault-tolerant distributed systems and create the supporting features that they leverage. You’ll be working across a variety of technologies to solve problems around the massive traffic that we receive on the platform, so if you’re excited about working in complex Go or Rust code, you’ll have plenty of opportunities to do so with our team! The platform is at the core of Netlify, where you’ll be developing software that powers the lifecycle of a web request, enables developers to interact with the edge, and strives for better availability and higher throughput. This includes functionality like Edge Lambda invocation, caching & invalidation, request rules, pre-rendering, and logging aggregation. The platform is built on top of 6 different cloud providers and is truly global, supporting constant traffic from all over the world. We move quickly and adjust to changing priorities and conditions, and you’ll be able to help us focus on key priorities and pragmatic solutions.

What You’ll Bring:

  • A breadth of experience in compiled programming languages. Our main language is Go, but our projects span multiple languages. We believe in picking the right language for the right problem.
  • An extensive history of delivering product features & deploying services with a high level of comfort iterating on a system while it is constantly serving traffic. Our system is always on with demanding availability and throughput challenges.
  • A good sense of how to work with web & CDN technologies, with some experience around systems performance and analysis and previous exposure to HTTP, DNS, and TLS.
  • Familiarity with databases such as MongoDB and SQL databases, and a high level of comfort working with data pipelines built with Kafka, ZooKeeper and Consul
  • Curiosity and openness to learning new technologies and best practices
  • Passion for working in a collaborative environment, where you enjoy working with a diverse group of people with different expertise working across distributed locations around the world

Within 1 month, you’ll:

  • Learn about the business and dive into the inner workings of our platform.
  • Have one-on-ones and pairing sessions with some of the people you’ll be working closely with, and get to know your engineering peers across our product umbrella.
  • Do a deep dive into the code base and learn more about Go, Rust, and Ruby.
  • Tackle your first ticket by committing changes & helping perform code reviews with the team.

Within 3 months, you’ll:

  • Establish strong async communication rhythms with your peers and leaders, practicing transparency and visibility in your progress against areas of focus
  • Join the on-call rotation and help the team pay down technical debt and improve reliability
  • Gain a more robust understanding of the needs of the product and become more comfortable with diagnosing problems
  • Deliver on your first project and help teams iterate on meaningful customer outcomes
  • Solicit feedback from your peers, including other engineers and teammates in your product team, and support your team through thoughtful feedback

Within 6 months, you’ll:

  • Elevate the work of the team and become a subject matter expert in an area that interests you
  • Contribute to building reliable microservices that are deployed into our Kubernetes cluster
  • Make a significant impact on our team by designing an extensive, scalable solution to accommodate our rapidly growing user base
  • Develop automated abuse prevention tooling and build cutting-edge features to empower developers
  • Fortify relationships with cross-functional team members as well as broaden your connections across the organization

Example projects you’ll dive into:

  • Refactoring the way that we serve content. This involves complex interactions between multiple services under constant load, with the goal of distributing more knowledge onto the edge
  • Innovating on our functions product, adding more capabilities, better observability, and handling questions of how to scale the offering (we have 1 million+ functions deployed now)
  • Increasing our developer velocity by partnering with other teams to improve how we update our edge software, without incurring any customer impacts
  • Expanding on our analytics product. This involves dealing with high-cardinality data that is constantly streaming into the system via Kafka, and finding an efficient way to store and search that data to drive customer insights.

Within 12 months, you’ll:

  • Have significant ownership of, and make extensive contributions to, a large-scale system that delivers insights about traffic, function invocations, and other edge visibility issues.
  • Have fully revamped and iterated on the way our edge logic works and how it resolves content.
  • Play a significant role in implementing globally distributed, latency-sensitive, high-throughput services.
  • Extensively collaborate with engineering leadership to level up the team and continually improve the scalability and observability of the platform.
  • Start to coach and mentor other team members within Netlify’s engineering teams.

At Netlify, we are a growing company that is constantly evolving so this timeline is intended to show you an example of what you can expect from the role. Keep in mind we’re always iterating, learning, and growing, thus expect these guidelines to continue to evolve as we expand. We’re excited for you to join us on the journey!

About Netlify

Of everything we’ve ever built at Netlify, we are most proud of our team.

We believe that empowered, engaged colleagues do their best work. We’ll give you the tools you need to succeed, and we’ll look to you for suggestions to improve not just your daily job but every aspect of building a company. Whether you work from our main office in San Francisco or remotely, we’ll be working together a lot: pairing, collaborating, debating, and learning. We want you to succeed! About 60% of the company works remotely across the globe; the rest are at our HQ in San Francisco.

To learn a bit more about our team and who we are, make sure to visit our about page.

Applying

Not sure you meet 100% of our qualifications? Please apply anyway!

When applying please include: A resume or short listing of your job history & skills. (A link to a LinkedIn profile would be fine). A cover letter explaining why you would enjoy working in this role and why you’d like to work at Netlify would be great, though not required & will not impact your application. When we receive your application we’ll get back to you about the next steps.

Netlify is an Equal Opportunity Employer. We are devoted to building a team of people with diverse backgrounds and lifestyles. We believe that the unique contributions of all Netlifolks are the driver of our success. We are all responsible for bringing on people from all walks of life. Driving equality empowers our team, enables us to innovate, and helps us maintain a more inclusive environment. We don’t discriminate against employees or applicants based on gender identity or expression, sexual orientation, religion, age, race, military/veteran status, citizenship, pregnancy status, or any other differences. If we can do anything to provide a better interview experience, e.g. accommodating a disability, please let us know.

Please note, the salary listed is just an example of our range, and it will vary based on multiple factors.


DevOps Engineer
nextmv
Remote (Europe, USA) / New York / Philadelphia
$100,000 to $140,000 a year
January 2021

Job Description

nextmv (YC W20) is changing how companies automate and optimize their operations. We provide developers with the building blocks to create and test decision models, quickly. From logistics to healthcare to finance, every company can benefit from decision engineering using optimization and simulation. We’re looking for incredibly motivated people to help!

In a little over a year we have made substantial progress. We’re already landing enterprise clients. We’ve raised over $11 million from leading VC firms including Y Combinator, Firstmark Capital, Dynamo Ventures, and 2048 VC. And we’re just getting started.

We are looking for a DevOps Engineer II who is familiar with cloud platforms and container technology and who loves automation. As the first dedicated hire supporting cloud infrastructure, internal tooling and automation, you will have an impact on how we operate all our systems and services. In this role you will help build and maintain cloud infrastructure for our tools and products, as well as assist with customer deployments, ensuring we follow best practices and industry standards. You’ll directly contribute to the success of our new hosted product by serving a hybrid DevOps / SRE function. This role will participate in our on-call rotation.

Requirements

  • 3+ years as a software engineer, DevOps engineer, cloud engineer, site reliability engineer or systems administrator
  • Demonstrable experience administering AWS, especially VPCs, Lambda, RDS, S3 and IAM Roles & Policies
  • Experience with Infrastructure as Code (IaC) using Terraform
  • Excellent understanding of Docker & container technologies
  • Hands-on experience with configuration management tools such as Ansible
  • Demonstrable understanding of modern software development practices including pair programming, peer reviews, Git-based workflows, continuous integration and delivery, and automated testing
  • Comfortable with Bash and Python
  • Familiarity with monitoring tools and services (Datadog)

Not required, but a plus:

  • Experience with Go or another statically typed and compiled language
  • Experience with serverless systems
  • Hands-on experience with Kubernetes
  • Experience with software package management (RPM, APT, npm, Maven, Nexus, Artifactory, etc.)
  • Ability to evaluate the benefits of using in-house vs off-the-shelf solutions
  • Software development experience
  • Familiarity with on-call / incident response practices
  • 2+ years of remote work experience

These are some of your traits:

  • The idea of working in a fast-paced startup environment excites you
  • You thrive on automating everything and adding structure to processes and procedures
  • Working together as a team to accomplish goals is more important than working alone
  • You are eager to support our customers when they have DevOps or cloud engineering questions, and to research technologies to find solutions
  • You value simplicity over complexity
  • You embrace challenging technical work
  • You thrive on discovering and documenting simple, pragmatic solutions
  • You’re not afraid to speak up when you have a point of view, but can “disagree and commit” once a final decision is reached
  • You just read this whole list and got more excited than concerned

How we work

We are remote first

We value amazing work and a strong work-life balance. The majority of our collaboration happens on Slack and Zoom. We get together quarterly for team offsites so we can get some face time (COVID pending).

Salary Transparency

We believe that financial transparency creates trust, and that teams with a high level of trust are able to execute more effectively. We view salary transparency as a way to challenge a rampant problem in our industry: the wage gap. The base salary for any two employees in the same role is the same. Performance in that role is the differentiator, not upfront negotiation.

Benefits

This is a salaried role. In addition, nextmv offers:

  • Health Care Plan (Medical, Dental & Vision)
  • Minimum Vacation Policy - (3 weeks minimum)
  • Stock Option Plan
  • 401k
  • Home Office Stipend
  • Parental Leave

This role (and all roles at nextmv) is remote. That being said, all employees should be able to travel to company retreats quarterly (when COVID settles down).

About nextmv

nextmv helps companies automate and optimize even the most complicated operational decisions. The nextmv platform allows any developer to quickly build, test, and deploy models that automate routing, assignment, matching and scheduling.

Our Values

Our values are aspirational and affect everything we do. At nextmv, we hope to instill core attributes and practices into our daily lives. We will work toward these goals together, and help each other along the way.

Community
We act as a group of skilled contributors with diverse backgrounds and a common mission.
We listen to each other to actively instill empathy in ourselves.
We introspect about our actions and their impacts.

Candor
We share information, from company strategy to small insights and feedback.
We collaboratively review our decisions and code using the same process.
We own our mistakes and admit our vulnerabilities.

Focus
We are ambitious and value achievement over status.
We are innately driven to innovate and improve the world.
We apply our time and skills effectively to challenging problems.

Balance
We separate our work from our self-worth to view and improve it objectively.
We don’t overwork, and take regular time away to encourage creativity.
We take care of ourselves so we can give our best to our team.

Also, we love animals.


Backend Go Engineer
Torus Labs Pte Ltd
Singapore
S$60,000 to S$108,000 a year
October 2019

Job Description

What are you planning to do next? Why not be a part of Torus?

Responsibilities:

  • Programming in the web stack
  • Design and implement system and network infrastructure
  • Tuning, capacity planning and load demand forecasting of systems
  • Automation and enhancement of existing tools for cloud systems
  • Coordinate on product releases and deployments
  • Contribute to research around decentralized solutions within blockchain technologies
  • Build APIs focused on usability and ease of integration

Requirements:

  • 3+ years of experience in a relevant role (Software Engineering, DevOps, Site Reliability Engineering, Systems Administration)
  • You are familiar with JavaScript / Go
  • Knowledge of cryptography / blockchain

Qualification:

  • Demonstrated software engineering experience from previous internship, work experience, coding competitions, or publications
  • Degree in Computer Science or a related field

Back-End Engineer
Office for National Statistics
Newport, Wales / Fareham, Hampshire / London, United Kingdom
£29,017 to £41,149 a year
October 2019

Job Description

Working pattern - Flexible working, Full-time (Job share / Part-time options)
Salary - £29,017 - £41,149
Package / Benefits - please follow the apply link for further details
APPLICATION DEADLINE - 5th November 2019

As a Back-end Software Engineer, you will be a key part of the API and Data team within the Digital Publishing division of the Office for National Statistics. The successful developer will share responsibility for the ONS Website, Developer sites, Dashboards and CMS. You’ll thrive using agile methods and enjoy working openly, collaboratively and as part of a multidisciplinary team of front-end engineers, back-end engineers, site reliability engineers, interaction designers, user researchers, a service manager, a product owner and a performance analyst.

Tech Stack

Our current back-end technology stack includes Go, Java, Python, Apache Kafka, MongoDB and Neo4j. You will be part of a team with a range of skills and programming languages, so we don’t expect you to know all of these.

For full information on the role, and to progress your application, please click APPLY to be taken to the CivilServiceJobs website.

For an informal conversation about the role, please contact the advertising recruiter, Darren Weeks on 01633 651628 or [email protected]


Software Engineer - Infrastructure Tooling
Segment
San Francisco / Vancouver / New York, USA / Remote
$115,000 to $230,000 a year
August 2019

Job Description

Who We Are

We’re a small team of experienced engineers with diverse technical backgrounds. We’re passionate about driving our coworkers’ success and building the next generation of software tooling. If you want to work on distributed systems infrastructure and development practices or you have an entrepreneurial spirit and want to make something that your peers use every day, we’d love for you to join us. Tooling handles many different areas, so we’re building a diverse team with a wide range of expertise.

What We Do

  • We build shared infrastructure and tools to make engineering more productive, reliable, and cost effective.
  • We maintain several Segment Open Source projects.
  • We work in Go, Terraform and a bit of Node.js.
  • Read more about Segment’s infrastructure and how we use distributed logging and secure secrets. Or, read our code: conf, ksuid, cwlogs, go-prompt, ecs-logs, chamber.
  • We manage the tooling and process around development environments, testing, CI, and deployment.
  • Read more on our blog about how we use CI and Make.

Who we are looking for:

  • You care about simple, practical, reliable, and secure software implementation and the kinds of process needed to produce it.
  • You can research a messy, complicated problem and design an approach that makes working in that area easy and consistent.
  • You empathize with the rest of your company, listen to them, and take pride in supporting their work.

Projects we’re working on:

  • Per-Engineer Dev Environments
  • Logging Pipeline Development
  • AWS Rate Limit Monitoring
  • Application Deployment Improvements
  • Self-Hosted CI
  • Incident Management Automation
  • Large Scale JSON Stream Data Manipulation Tools
  • Standardized Metrics and Alerting Infrastructure
  • Consistent Runbooks and Documentation

Requirements

  • At least 3 years of experience as a software engineer, DevOps engineer, or site reliability engineer.
  • You have experience with AWS, Docker, Go, Node.js, or Terraform.
  • You are motivated to support your coworkers and make them productive.
  • You are a self-directed problem solver.

Bonus

  • Building tooling for distributed systems development.
  • Working on or with a variety of engineering teams.
  • Leading teams or projects.

Software Engineer (Go) - Account Team
BlueLabs Software
Remote
€55,000 to €75,000 a year
July 2019

Job Description

A few months ago we started out with the vision of building a next generation sports betting platform focused on performance, reliability, modularity and automation. We believe that our experience paired with today’s technologies, great talent and the agility of a startup environment will enable us to deliver a best-in-class product that meets the demands of the market of tomorrow.

Our Account Team is now on the lookout for an experienced Software Engineer who wants to join our distributed team and help us execute our vision.

The Team

The Account Team is responsible for the development and daily operations of the core services powering business-critical functions such as player account management and wallets. Other focus areas include, but are not limited to: responsible gaming, integration with third-party payment providers, and player acquisition and retention programs with a focus on personalisation and automation.

The services owned by the team will be used simultaneously by thousands of users around the globe and are expected to handle hundreds of thousands of daily transactions in a timely manner.

Raw performance isn’t everything. The team must also ensure that the platform can be easily adapted to comply with the different and ever-changing regulatory demands our industry faces all over the world. The ultimate goal is to ensure a fair and safe sports betting experience for all our players.

Remote Work

We are hiring for talent, not for a specific location. You will find that members of our team are distributed all over Europe. Being a distributed team enables us to hire only the best, without being restricted to the talent pool available at a specific geographic location. However, to facilitate team communication and collaboration we currently require you to be located in a European time zone (between UTC-1 and UTC+3). You must also be able to travel to other European locations a few times a year for on-site meetings and workshops.

Compensation

The budgeted compensation range for this role is €55k-75k annually, depending on your background and experience. As an independent contractor you will be responsible for paying any taxes or applicable fees in your country of residence (unless you are based in Malta, in which case you will be employed). In addition to that, we offer a number of perks to each of our team members as we truly believe in a healthy work-life balance and continuous learning.

Requirements

  • BS degree in Computer Science or similar technical field
  • 2+ years of professional software development experience using Go
  • Interest in or previous experience with Elixir will be considered an asset
  • Experience building large-scale distributed systems, communicating asynchronously via message passing using RabbitMQ or Kafka
  • Deep understanding of DDD, CQRS, microservices architecture, and SQL/NoSQL data stores
  • Interest in test automation, cloud and containerization technologies, code instrumentation and CI/CD pipelines
  • Interest and ability to keep yourself up to date and learn new languages, frameworks and technologies as required
  • Interest in taking full ownership of your services and managing them in a production environment including the troubleshooting of live incidents
  • Ability to work autonomously in a fully distributed team
  • Good communication skills in verbal and written English


Site Reliability Engineer
Dollar Shave Club
Los Angeles, CA, USA
$120,000 to $150,000 a year
May 2019

Job Description

For our fundamental philosophy please see our Medium article on the subject.

  • Work with and contribute to k8s-native infrastructure services to improve the speed and stability of software delivery.
  • Write libraries to deliver “free” additions to our common software.
    • For example, monitoring and logging built-ins, RPC wrapping and stats display within running binaries.
  • Maintain and contribute to shared infrastructure services.
    • For example, Kafka, k8s clusters, service discovery and internal load balancing.
  • Write documentation, tutorials and blog posts (both public and internal).
  • Develop OSS to help define DSC’s technical brand to the open source community.
    • All systems should be designed with open source in mind (within reason).
  • Contribute to DSC’s OSS products (See: https://github.com/dollarshaveclub/psst for an example of SRE developed OSS at DSC)

Perks & Benefits

  • Relocation assistance may be available
  • Weekly free lunches
  • Free DSC grooming products
  • Dog-friendly office
  • In-office haircuts, massage, car washes