
Golang Site Reliability Jobs Paying 150,000 USD a Year


Hand-Picked Golang jobs • Apply directly to companies • Clear salary ranges

Browse 33 Golang Site Reliability Jobs (2 new this week) in March 2025 at companies like Monzo, Digital Ocean and TextNow paying at least 150,000 USD per year working as a Site Reliability Engineer (Platform), Senior Engineer Tools & Platforms SRE and Senior Site Reliability Engineer.







10 of 33 Site Reliability Jobs paying at least 150,000 USD per year • Sort by Date
Site Reliability Engineer (Platform)
Monzo
London, UK / Remote (EU)
£59,000 to £116,000 a year
September 2020
5 Applicants This Week
More Than 6 Months Old

Job Description

At Monzo we’re aiming to build the best current account in the world. We are always keen to hear from capable, creative engineers who want to help us accomplish that goal 🚀

We’re currently looking for Site Reliability Engineers (SREs) to join our Platform team.

We’re looking for SREs who are software engineers at heart - you’re as comfortable writing software to solve problems as you are operating AWS or Kubernetes. If you’re a software engineer who has some good cloud infrastructure experience already, or you’re eager to get really familiar with systems, tooling and libraries, this could be the role for you.

As a team, we’re responsible for designing, building, and operating the services we consume from AWS, along with the software we run on top like Kubernetes, Cassandra, Prometheus, and Kafka. We’re also responsible for operating our three physical data centres, our network, and being on-call for the things we own and run.

To achieve this, we’re organised into three squads within the Platform Group: Infrastructure Platform, Storage Platform, and Backend Platform. Each squad is responsible for solving a specific set of problems for our customers and our engineers. We’re looking for engineers who are interested in joining our Infrastructure Platform or Storage Platform squads right now, but there are opportunities to move between them as you gain experience with our platform.

We've posted a good overview of our platform on our blog if you’d like to learn more.

We're investing a lot of up-front effort in building a scalable, secure, and extensible architecture for our millions of customers. Come and help us build a state-of-the-art microservices platform and build the kind of bank you want to use.

Our engineers have a variety of different backgrounds

We have several non-graduates; only some of us studied Computer Science; some of us have worked in huge companies; some have only ever worked in startups; others are former consultants. As long as you enjoy learning new things, we’d love to talk to you. We do not ask for formal qualifications or degree requirements for any of our engineering roles.

We are actively creating an equitable environment for all of our engineers to thrive

Diversity and inclusion are a priority for us and we are making sure we have lots of support for all of our people to grow at Monzo. We provide a sponsorship framework in Engineering for women and people of colour; all of our leaders are trained on privilege awareness and we are creating partnerships with organisations dedicated to supporting underrepresented groups. You can read more in our 2020 Diversity and Inclusion report.

Monzo works in project-based sprints in small, interdisciplinary teams

We have around 150 engineers out of roughly 1,400 people in total - and we have big ambitions. There are many interesting challenges ahead, and we're happy for people to move between teams or to specialise, whatever you prefer. As an engineer here you'd be able to work directly with anyone across the company, and we run regular knowledge-sharing sessions so you’ll learn heaps about everything from how banks work to effective communication.

We encourage an open and transparent working environment

You can get involved in any aspect of the business you are interested in and, following Stripe’s example, all emails in the company are visible in an email archive. We contribute to open source software as much as possible. We’ve also made our product roadmap public and give sneak peeks of features in our community forum. Our technology blog is a good place to learn even more about what we do!

At Monzo you will get to work with a lot of exciting new technology.

We rely heavily on the following tools and technologies:

You should apply if:

Our open roles are for mid-level to senior Site Reliability Engineers at present. Apply if:

  • the work we’re doing sounds exciting!
  • you’re a software engineer at heart and you’re comfortable writing software to solve problems
  • you’re interested in distributed systems and writing resilient, scalable software
  • you have strong experience working on the backend of a technology product
  • you’re familiar with some of our Platform technologies, or specialise in just one part
  • you want to help build, scale and operate a platform to support a product that you (and everyone you know) use every day
  • you’re keen to learn more about new technologies and the arcane inner workings of the financial industry
  • you’re comfortable working in a team that deals with ambiguity

Logistics

Salary ranges between £59,000 - £116,000 plus stock options and other benefits.

We can help you relocate to London & we can sponsor visas.

This role can be based in our London office, but we're open to distributed working (as long as you can spend around 20% of your time in London).

We have payroll set up in four countries: the UK, Ireland, France, and Spain. Right now, we can only hire people who work from those countries and we’ll keep this updated with new ones as we expand and are able to hire from more places 🌎

We're almost always hiring engineers, so there's no closing date for this job.

We offer flexible working hours and trust you to work enough hours to do your job well, at times that suit you and your team.

Diversity and inclusion is a priority for us – if we want to solve problems for people around the world, our team has to represent our customers. So we need to attract the best talent and create an environment that supports and includes them. You can read more about diversity and inclusion on our blog.

If you prefer to work part-time, we'll make this happen whenever we can - whether this is to help you meet other commitments or strike a great work-life balance.

Our interview process is normally a phone interview, a coding task and a call to discuss it, and 2-3 hours of onsite interviews that can also be conducted via Hangouts. We promise not to ask you any brain teasers or trick questions. We might design a system together on a whiteboard, the same way we often work together, but we won’t make you write code on one.

Equal Opportunity Statement

At Monzo, we're embracing diversity in all of its forms and fostering an inclusive environment for all people to do the best work of their lives with us. This is integral to our mission of making money work for everyone.

We're an equal opportunity employer. All applicants will be considered for employment without regard to ethnicity, religion, sexual orientation, gender identity, family or parental status, national origin, veteran status, neurodiversity status or disability status.


Perks & Benefits

https://monzo.com/careers/#benefits

Senior Engineer Tools & Platforms SRE
Digital Ocean
New York / Cambridge / Palo Alto, United States / Remote
$155,000 to $190,000 a year
July 2019
2 Applicants This Week
More Than 6 Months Old

Job Description

Do you ever wonder what happens inside the cloud?

Based in New York, DigitalOcean is a dynamic, high-growth technology company that serves a robust and passionate community of developers, teams, and businesses around the world. We believe that today’s entrepreneurs are changing the world through software. Our mission is to empower these entrepreneurs by bringing modern app development within reach for any developer, anywhere in the world.

We want people who are passionate about building the systems, culture, and processes that will improve the resiliency, reliability, scaling, and performance for cloud services.

We are looking for an experienced Site Reliability Engineer to work closely with our product engineering and infrastructure teams. Reporting to the Director of Platform Systems, the Site Reliability Engineer will be performing a mix of hands-on development, coaching, and collaborating with other teams and stakeholders to help bring DigitalOcean’s engineering systems and culture up to the next level.

This is a key opportunity to make a significant impact in DigitalOcean’s engineering and operational systems and influence future product designs and requirements. This role is essential to accelerate the improvement of the high expectations our customers have of DigitalOcean as we continue to grow and expand.

What You’ll Be Doing:

  • Performing hands-on technical work to directly improve the reliability, resiliency, and scaling of our key platform systems
  • Working with stakeholders to develop and implement reliability and performance metrics
  • Facilitating DigitalOcean’s culture of learning by providing insight and recommendations for improvement
  • Coaching teams and individuals on reliability best practices and solutions
  • Working with other SREs and engineering leaders to define the architectures and practices that should be adopted in order to deliver on our engineering and operational goals
  • Establishing best practices for development, architecture, deployment, and operations
  • Working with peer SREs to improve services and processes (including architecture reviews, incident response, monitoring) in a cross-functional manner throughout the engineering organization

What We’ll Expect From You:

  • Distinguished track record as SRE (or similar role) with hands-on experience implementing reliability, process, and scaling solutions
  • History of fostering positive relationships with stakeholders and a track record of successful collaboration and coaching
  • Clear communication skills (both written and verbal) to document processes and architectures
  • Experience implementing disaster recovery best practices
  • Developing robust solutions that facilitate streamlined resolution of customer inquiries through use of technologies for automation, deflection, and issue management
  • Adept in Ruby and Go with a broad understanding of the full technology stack for a modern infrastructure
  • Advocate of effective development environments with the use of CI/CD tooling and configuration management technologies such as Chef or Ansible

Why You’ll Like Working for DigitalOcean:

  • We have amazing people. We can promise you will work with some of the smartest and most interesting people in the industry. We work hard but we always have fun doing it. We care deeply about each other and take our “no jerks” rule very seriously.
  • We value development. We are a high-performance organization that is always challenging ourselves to continuously grow. That means we maintain a growth mindset in everything we do and invest deeply in employee development. You’ll need to be great to get hired here and we promise you’ll get even better.
  • We care about you. We offer competitive health, dental, and vision benefits for employees and their dependents, a monthly gym reimbursement to support your physical health, and a monthly commute allowance to make your trips to and from work easier.
  • We invest in your future. We offer competitive compensation and a 401k plan with up to a 4% employer match. We also provide all employees with Kindles and reimbursement for relevant conferences, training, and education.
  • We want you to love where you work. We have great office spaces located in the heart of SoHo NYC and Cambridge and offer daily catered lunches to keep your hunger at bay. We’re also very remote-friendly—we use Slack to communicate across the company—and all remote employees have the opportunity to onboard in-office and take an all-expenses paid trip to our annual company offsite, Shark Week, to get quality in-person time with the team at least once a year. We also allow employees to customize their workstations to meet their needs—whether remote or in office.
  • We value diversity and inclusivity. We are an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Senior Site Reliability Engineer
TextNow
Remote (United States)
$150,000 to $230,000 a year
October 2021
1 Applicant This Week
More Than 6 Months Old
This job posting is no longer available

Job Description

TextNow is based around a simple idea: Communication belongs to everyone. We work hard to help people stay connected by offering a solution that makes phone service free. At TextNow, we work together to solve complex and interesting problems that have a positive impact on our customers' lives.

Join us in our mission to help people stay connected with technology that is free (or as close to free as possible).

TextNow is looking for motivated Site Reliability Engineers (SREs) to own infrastructure, monitoring, logging, CI/CD, reliability and everything in between!

What You’ll Do:

  • Be responsible for maintaining and scaling production services and servers for complex and high throughput.
  • Improve scalability, service reliability, capacity, and performance.
  • Write automation code for provisioning and operating infrastructure at scale.
  • Build tools for internal use to support software engineering best practices.
  • You are not an operator; you’re an experienced software engineer focused on operations.
  • Work with development teams to make sure the applications fit nicely within the infrastructure and that scalability, reliability, and security are designed and implemented from the start.
  • Participate in on-call rotation, being responsible for uptime and support.
  • Roll up the sleeves to troubleshoot incidents, formulate and test your hypotheses, and narrow down possibilities to find the root cause.

Who You Are:

  • Creator of cool stuff with experience deploying web apps and distributed, service-oriented architectures.
  • Brilliant Collaborator with 8+ years of professional experience in an operationally focused role, preferably in DevOps or SRE, with a B.S., M.S., or PhD. in Computer Science (or equivalent).
  • Someone who takes action and ownership with proven ability to use automation tools.
  • Respectfully candid with the ability to motivate people to act and work on behalf of our customer.
  • A bold risk-taker and self-starter who loves to solve challenging problems.
  • Resourceful and scrappy with the ability to be strategic, roll up your sleeves and execute.

Other:

  • Strong knowledge of Linux and open source software
  • Understanding of modern web architecture (HTTPS, REST) and technology stacks
  • 2+ years of experience with programming/scripting languages (Bash, Go, Python, Ruby, etc.)
  • Experience with deployment automation using Ansible, Puppet, and Terraform
  • Experience supporting various databases such as MariaDB, Redis, and various NoSQL engines
  • Experience deploying containers using Docker and Kubernetes
  • Experience working in the Amazon public cloud (AWS)
  • Experience supporting mobile applications (Android and iOS)
  • Experience in the telecommunications industry


Benefits:

  • Strong work-life blend
  • Flexible work arrangements (WFH, remote)
  • Employee stock options
  • Unlimited vacation
  • Competitive pay and benefits
  • Parental leave
  • Benefits for both physical and mental well-being

Diversity and Inclusion:

At TextNow, our mission is built around inclusion and offering a service for EVERYONE, in an industry that traditionally only caters to the few who have the means to afford it. We believe that diversity of thought and inclusion of others promotes a greater feeling of belonging and higher levels of engagement. We know that if we work together, we can do amazing things, and that our differences are what make our product and company great.

For TextNow Candidates:

The People and Culture team is available to support you through the hiring process by providing reasonable accommodations to help enable a barrier-free interview experience. If you need assistance applying for a role due to a disability or special need, please let us know by completing this form. Once it is received, our Equity, Diversity and Inclusion Specialist will reach out to you and assist with any accommodations that you may require.


Site Reliability Engineer
Rebellion Defense
Washington, DC / Chicago, Illinois, United States
$100,000 to $200,000 a year
November 2020
2 Applicants This Week
More Than 6 Months Old

Job Description

We are looking for a Site Reliability Engineer (SRE). As an SRE, you will be tasked with the reliability and operation of our production environments. SREs are tasked with ensuring teams within the company receive help maintaining software at scale, as well as help designing and developing software for scale. SREs are expected to engage with the product teams to ensure the delivery of our software is as seamless as possible.

This position is based out of our Washington, D.C. or Chicago, Illinois office locations. An active clearance or ability to obtain TS/SCI clearance will be required.

We look for a track record of the following:

  • Coming alongside high energy engineering teams to enable the adoption of best practices to enable the scalability and reliability of deployed software,
  • Defined architecture and built services at scale on public infrastructure such as AWS and Azure,
  • Experience designing, implementing, deploying, and operating high scale production services,
  • Experience facilitating the definition and implementation of SLIs and SLOs,
  • Understanding how to carefully spend error budget to handle regular deployment of large changes to production,
  • Deep experience in Linux operating systems, and systems engineering,
  • Comfort delivering critical software in Go and Python,
  • Willingness to debug problems across the stack,
  • Comfort working on underspecified problems and the capability to rapidly learn and iterate on solutions,
  • Experience building the wrong system enough times to avoid the common pitfalls, whether building something personally or advising others.

You might be a good fit if you have:

  • 5+ years of relevant SRE experience in the tech industry,
  • demonstrable knowledge of TCP/IP, HTTP, and web application security, and experience supporting web application architecture,
  • experience working with a variety of storage systems, application architectures, compute infrastructure and network management systems,
  • experience designing, implementing, deploying, and operating high-scale production services,
  • experience defining architecture and building services at scale on public infrastructure such as AWS and Azure,
  • proven knowledge of at least one higher-level language (e.g. Python or Golang),
  • the ability and desire to build and learn new systems with new technologies.

Rebellion is a well-capitalized technology start-up firm that is passionate about defining and delivering modern, life-changing software products to the US Department of Defense (DoD), the UK Ministry of Defence (MoD), and their allies. At Rebellion we believe in operating what we own: we deliver all of our products as managed services, which allows our product teams to maintain operational ownership across all deployments. Expect talented, motivated, intense, and interesting co-workers.

Compensation includes meaningful equity ownership, competitive salaries, full medical coverage, disability and life insurance, and transit reimbursement.

An Equal Opportunity Employer/Veterans/Disabled. Rebellion Defense is an equal opportunity employer and makes employment decisions on the basis of merit and business needs. Rebellion Defense does not discriminate against applicants on the basis of race, color, religion, sex, sexual orientation, gender, gender identity, national origin, veteran status, disability, or any other protected characteristic in accordance with federal, state, and local law.


Senior Site Reliability Engineer, CORE
Netflix
Los Gatos, California, United States
$250,000 to $500,000 a year
May 2020
1 Applicant This Week
More Than 6 Months Old

Job Description

At Netflix, we strive to bring joy to people across the world through amazing stories. As we grow internationally, we are continually enhancing our cloud-based infrastructure to improve our performance, scalability, and reliability.

The SRE team's goal is to ensure customer joy by successfully managing risk and minimizing impact across Netflix. We do this through cross-functional engagement with other engineering teams, managing issues when they happen, as well as promoting reliability and resilience practices throughout the organization.

Outcomes

  • Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks
  • Increase our reliability through establishing guidance and methods of improvement
  • Form and maintain relationships with internal and external partners
  • Develop deeper insights and analysis into the quality of experience for our customers

We Value

  • Curiosity about how complex sociotechnical systems successfully operate at scale when failure is inevitable
  • People who see influence as their preferred tool for cultivating relationships
  • Collaboration and continuous improvement
  • A desire to learn and readiness to teach
  • Iteration as the path forward

Our Work

  • Drive incidents to resolution by coordinating with multiple engineering teams
  • Identify sources of instability in large-scale distributed systems and drive operational excellence
  • Analyze complex systems from a reliability and resilience perspective
  • Engage with product teams to diagnose operational surprises and carry forward improvements
  • Improve reliability and drive down the burden of toil with tooling and automation

Nice to Have

  • Experience with global, continuous delivery methods
  • Development with Python, Go, Java, or JavaScript/Node.js
  • Involvement with incident management and response
  • Knowledge of cloud platforms like AWS and microservices architecture
  • Deep network analysis
  • Linux systems engineering capability

Things that show how we think


Senior Site Reliability Engineer
Tendermint
San Francisco, United States / Berlin, Germany / Toronto
$100,000 to $150,000 a year
October 2018
2 Applicants This Week
More Than 6 Months Old
This job posting is no longer available

Job Description

We're looking for someone who has:

  • At least 5 years of software engineering experience with open source contributions.
  • Written structured, high-quality programs and scripts for automation.
  • Significant experience writing Golang or the ability and desire to become proficient in new languages.
  • Experience developing, releasing, and maintaining production software and infrastructure tools like Elastic stack, InfluxDB stack, DataDog, PagerDuty, or VictorOps.
  • Built solutions with a broad set of technologies in and around cloud solutions (AWS EC2, ECS, Route53, DynamoDB, RDS, Lambda, Docker, Google Container Engine, Kubernetes or Docker Swarm).
  • Implemented continuous deployment before (Jenkins, CircleCI, Travis, Ansible, Chef, Puppet).
  • Experience with SDLC tools (Git, GitHub, Atlassian Stash/Bitbucket, GitLab, JIRA).
  • Experience with QA/SIT tools (Selenium).
  • Experience in Linux system administration including package management, network management, and security management.
  • Familiarity with open source P2P networking protocols.
  • Experience working in an agile development environment.
  • The ability to take ownership and see initiatives through.
  • Exceptional communication skills.
  • Experience working with distributed teams.

What your primary responsibilities will be:

  • Help scale software systems with automation, in an effort to improve reliability, velocity, and simplicity.
  • Create, maintain, and improve the tooling for continuous integration and continuous delivery.
  • Build and maintain tooling for deploying, monitoring, and maintaining clusters of Tendermint nodes on our testnets and mainnets.
  • Build and maintain tooling to help shorten feedback cycles within teams and projects.
  • Plan, build, and maintain public facing services in association with business goals.
  • Build tools to measure and monitor availability, latency and overall system health.

Apply: https://goo.gl/forms/jpdRI1wD8pdfoqKl2


Staff Golang Engineer
Rialtic
USA, Remote (EST, CST, MST)
$200,000 to $250,000 a year
September 2024
14 Applicants This Week
More Than 6 Months Old

Job Description

*Please note that we can only consider candidates in the US within EST, CST, MST time zones.

About Rialtic

Rialtic is an enterprise software platform empowering health insurers and healthcare providers to run their most critical business functions. Founded in 2020 and backed by leading investors including Oak HC/FT, F-Prime Capital, Health Velocity Capital and Noro-Moseley Partners, Rialtic's best-in-class payment accuracy product brings programs in-house and helps health insurance companies gain total control over processes that have been managed by disparate and misaligned vendors. Currently working with leading healthcare insurers and providers, we are tackling a $1 trillion problem to reduce costs, increase efficiency and improve quality of care. For more information, please visit www.rialtic.io.

The Role

We seek a motivated and curious Staff Engineer with extensive background experience in cloud-native distributed systems who hates manual processes and feels compelled to build tools to automate them away. As a key contributor to our core healthcare claims processing platform team and senior member of the technical staff, you will play a vital role in building solutions to improve workflows across multiple engineering teams, supporting client evaluations and implementations, live system support, site reliability, system testing and monitoring, and logging/alerting integrations. This position requires a customer-first, quality-oriented mindset. We are a data-driven organization, so instrumentation and measurement are how we determine the success or failure of our engineering efforts.

We tackle challenges that are common to healthcare companies and healthcare data, but we do it using a modern, cloud-native stack. Our core processing platform and related services are written in Go, while our clinical and financial analytics components that run inside the platform are written in Python. This is a back-end, systems-focused role: we won’t ask you to write JavaScript (but being able to read it never hurts, and we have many APIs and interfaces between us, our clients, and our own systems). Our ability to parse, validate, process, write code against, and manage enormous volumes of data while performing complex analyses quickly and accurately is critical to our success.

If that sounds like a fun challenge, then you should apply for this position!

You will

During any given week in this role, you might:

  • Develop core platform features using Golang, Python, PostgreSQL, Kafka, and various cloud (AWS) services, with a particular focus on developer experience, tools, and testing;
  • Apply your experience with distributed systems to our architecture and services, drawing on your hard-won knowledge of the places where whole new classes of fun and exciting bugs lurk;
  • Collaborate with your engineering peers and build productive relationships with members of the Go-to-Market, Product Management, Clinical Content, and other teams that need our expertise to translate their requirements into coherent technical solutions;
  • Partner with our cloud/SRE team to understand the performance characteristics and storage needs for our Kubernetes clusters and the pods and containers that run there, which requires continual tuning as we dynamically scale throughout the day to meet client usage patterns and data flows while meeting sub-second SLA performance requirements;
  • Assist our infosec team in reviewing the findings of automated and manual security testing and audits, including both HITRUST and SOC 2 Type II, and work with the engineering team to implement and refactor code and services in a secure fashion;
  • Influence the whole Engineering organization to adopt best practices in software development and testing, helping us all develop high-quality, scalable, testable, and maintainable code;
  • Participate with internal and external stakeholders to understand the business logic and other requirements (such as refresh latency) for our Web-based payment integrity solution, client data warehouse exports, and one-time/ad-hoc analysis needs;
  • Write and help maintain specifications, documentation, diagrams, test plans, and other artifacts that represent the current and planned future state of our systems;
  • Serve as a peer reviewer for a colleague’s code, participate in an engineering architecture specification review, work with the product management team to refine a set of requirements or break a story down into concrete tasks for implementation; or
  • Mentor less-experienced developers as they grow in their own mastery of these topics and more.

Our systems and services tech stack includes (but is not limited to) Golang, Python, SQL, shell scripts, AWS EC2, Athena, Aurora / PostgreSQL, Kafka / MSK, Kubernetes, SQLite, Airflow, Spark, and more!

Staff Site Reliability Engineer
smlXL
New York City, United States
$170,000 to $250,000 a year
May 2023
15 Applicants This Week
More Than 6 Months Old

Job Description

About the job

smlXL is a 'stealth' start-up building an Information retrieval service with Consumer and Enterprise applications. Our first focus is providing a far richer understanding of the semantics of blockchain activity, making data and information accessible and useful to all.

We aren't ready to talk broadly about what we are working on, but we might be a good place for you if:

You are highly technical; you care about your craft; you are constantly learning; you are used to working on bare-metal servers and running your own stack; you are fascinated by Information Systems, Semiotics, and Blockchain data; you get excited by turning black boxes transparent; and you love working on things that add a ton of value to consumers and prosumers alike; or you are into the EVM, decompilers, databases, and distributed systems.

About You

  • Experience keeping production systems running smoothly, experienced with working on private cloud/colo/bare-metal environments
  • Experience building software and systems to manage platform infrastructure and applications
  • Experience with and/or a desire to go deeper into blockchain technology and crypto protocols
  • HashiCorp or Nomad experience is a plus
  • You care about polish and adding value to our users but not perfectionism for perfectionism’s sake
  • You love working collaboratively with different disciplines and learning from others
  • You are an expert who stays curious with a beginner’s mindset
  • You are a thoughtful communicator and collaborator and work to gain consensus with your peers and stakeholders, but you’re not afraid to speak up
  • You want to win, but prefer to win as a team
  • You are proactive
  • You are thoughtful and open about your priorities, goals, and aspirations so we can help you achieve them
  • You have specific passions outside of work
  • We believe that on average it will take 5+ years of experience in an engineering role to get to the level we want, but don’t let that stop you

Benefits and Support

  • Comprehensive health benefits (Medical, Dental, Vision, Life)
  • Flexible working hours, flexible WFH policy and unlimited time off with approval
  • Gender-neutral parental leave program for primary and secondary caregivers
  • Competitive salary and equity compensation with 401K retirement plan options
  • Physical, Mental, and Financial Well-being applications are provided at little to no cost, including fertility benefits, fitness classes, mental health, physical therapy, and healthcare apps (One Medical)
  • We encourage, support, and make time for our team members to invest in side projects and community projects

Software Engineer - Infrastructure Tooling
Segment
San Francisco / Vancouver / New York, United States / Remote
$115,000 to $230,000 a year
August 2019
1 Applicant This Week
More Than 6 Months Old
This job posting is no longer available

Job Description

Who We Are

We’re a small team of experienced engineers with diverse technical backgrounds. We’re passionate about driving our coworkers’ success and building the next generation of software tooling. If you want to work on distributed systems infrastructure and development practices or you have an entrepreneurial spirit and want to make something that your peers use every day, we’d love for you to join us. Tooling handles many different areas, so we’re building a diverse team with a wide range of expertise.

What We Do

  • We build shared infrastructure and tools to make engineering more productive, reliable, and cost effective.
  • We maintain several Segment Open Source projects.
  • We work in Go, Terraform and a bit of Node.js.
  • Read more about Segment’s infrastructure and how we use: distributed logging and secure secrets. Or, read our code: conf, ksuid, cwlogs, go-prompt, ecs-logs, chamber.
  • We manage the tooling and process around development environments, testing, CI, and deployment.
  • Read more on our blog about how we use: CI and Make.

Who we are looking for:

  • You care about simple, practical, reliable, and secure software implementation and the kinds of process needed to produce it.
  • You can research a messy, complicated problem and design an approach that makes working in that area easy and consistent.
  • You empathize with the rest of your company, listen to them, and take pride in supporting their work.

Projects we’re working on:

  • Per-Engineer Dev Environments
  • Logging Pipeline Development
  • AWS Rate Limit Monitoring
  • Application Deployment Improvements
  • Self-Hosted CI
  • Incident Management Automation
  • Large Scale JSON Stream Data Manipulation Tools
  • Standardized Metrics and Alerting Infrastructure
  • Consistent Runbooks and Documentation

Requirements

  • Minimum of 3 years’ experience as a software engineer, devops engineer, or site reliability engineer.
  • You have experience with AWS, Docker, Go, Node.js, or Terraform.
  • You are motivated to support your coworkers and make them productive.
  • You are a self-directed problem solver.

Bonus

  • Building tooling for distributed systems development.
  • Working on or with a variety of engineering teams.
  • Leading teams or projects.

Site Reliability Engineer
Dollar Shave Club
Los Angeles, CA, United States
$120,000 to $150,000 a year
May 2019
5 Applicants This Week
More Than 6 Months Old

Job Description

For our fundamental philosophy please see our Medium article on the subject.

  • Work with and contribute to k8s-native infrastructure services to speed up and stabilize software delivery.
  • Write libraries to deliver “free” additions to our common software.
    • For example, monitoring and logging built-ins, RPC wrapping and stats display within running binaries.
  • Maintain and contribute to shared infrastructure services.
    • For example, Kafka, k8s clusters, service discovery and internal load balancing.
  • Write documentation, tutorials and blog posts (both public and internal).
  • Develop OSS to help define DSC’s technical brand to the open source community
    • All systems should be designed with open source in mind (within reason)
  • Contribute to DSC’s OSS products (See: https://github.com/dollarshaveclub/psst for an example of SRE developed OSS at DSC)

Perks & Benefits

  • Relocation assistance may be available
  • Weekly free lunches
  • Free DSC grooming products
  • Dog-friendly office
  • In-office haircuts, massage, car washes

