Golang Site Reliability Jobs in United States Paying 150,000 USD a Year
Browse 22 Golang Site Reliability Jobs (1 new this month) in United States 🇺🇸 in December 2024 at companies like DigitalOcean, TextNow and Rebellion Defense paying at least 150,000 USD per year working as a Senior Engineer Tools & Platforms SRE, Senior Site Reliability Engineer and Site Reliability Engineer.
Based in New York, DigitalOcean is a dynamic, high-growth technology company that serves a robust and passionate community of developers, teams, and businesses around the world. We believe that today’s entrepreneurs are changing the world through software. Our mission is to empower these entrepreneurs by bringing modern app development within reach for any developer, anywhere in the world.
We want people who are passionate about building the systems, culture, and processes that will improve the resiliency, reliability, scaling, and performance for cloud services.
We are looking for an experienced Site Reliability Engineer to work closely with our product engineering and infrastructure teams. Reporting to the Director of Platform Systems, the Site Reliability Engineer will be performing a mix of hands-on development, coaching, and collaborating with other teams and stakeholders to help bring DigitalOcean’s engineering systems and culture up to the next level.
This is a key opportunity to make a significant impact on DigitalOcean’s engineering and operational systems and to influence future product designs and requirements. This role is essential to meeting the high expectations our customers have of DigitalOcean as we continue to grow and expand.
What You’ll Be Doing:
Performing hands-on technical work to directly improve the reliability, resiliency, and scaling of our key platform systems
Working with stakeholders to develop and implement reliability and performance metrics
Facilitating DigitalOcean’s culture of learning by providing insight and recommendations for improvement
Coaching teams and individuals on reliability best practices and solutions
Working with other SREs and engineering leaders to define the architectures and practices that should be adopted in order to deliver on our engineering and operational goals
Establishing best practices for development, architecture, deployment, and operations
Working with peer SREs to improve services and processes (including architecture reviews, incident response, monitoring) in a cross-functional manner throughout the engineering organization
What We’ll Expect From You:
Distinguished track record as SRE (or similar role) with hands-on experience implementing reliability, process, and scaling solutions
History of fostering positive relationships with stakeholders and a track record of successful collaboration and coaching
Clear communication skills (both written and verbal) to document processes and architectures
Experience implementing disaster recovery best practices
Experience developing robust solutions that streamline the resolution of customer inquiries through automation, deflection, and issue management technologies
Adept in Ruby and Go with a broad understanding of the full technology stack for a modern infrastructure
Advocate of effective development environments with the use of CI/CD tooling and configuration management technologies such as Chef or Ansible
Why You’ll Like Working for DigitalOcean:
We have amazing people. We can promise you will work with some of the smartest and most interesting people in the industry. We work hard but we always have fun doing it. We care deeply about each other and take our “no jerks” rule very seriously.
We value development. We are a high-performance organization that is always challenging ourselves to continuously grow. That means we maintain a growth mindset in everything we do and invest deeply in employee development. You’ll need to be great to get hired here and we promise you’ll get even better.
We care about you. We offer competitive health, dental, and vision benefits for employees and their dependents, a monthly gym reimbursement to support your physical health, and a monthly commute allowance to make your trips to and from work easier.
We invest in your future. We offer competitive compensation and a 401k plan with up to a 4% employer match. We also provide all employees with Kindles and reimbursement for relevant conferences, training, and education.
We want you to love where you work. We have great office spaces located in the heart of SoHo NYC and Cambridge and offer daily catered lunches to keep your hunger at bay. We’re also very remote-friendly—we use Slack to communicate across the company—and all remote employees have the opportunity to onboard in-office and take an all-expenses paid trip to our annual company offsite, Shark Week, to get quality in-person time with the team at least once a year. We also allow employees to customize their workstations to meet their needs—whether remote or in office.
We value diversity and inclusivity. We are an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
TextNow is based around a simple idea: Communication belongs to everyone. We work hard to help people stay connected by offering a solution that makes phone service free. At TextNow, we work together to solve complex and interesting problems that have a positive impact on our customers' lives.
Join us in our mission to help people stay connected with technology that is free (or as close to free as possible).
TextNow is looking for motivated Site Reliability Engineers (SREs) to own infrastructure, monitoring, logging, CI/CD, reliability and everything in between!
What You’ll Do:
Be responsible for maintaining and scaling complex, high-throughput production services and servers.
Improve scalability, service reliability, capacity, and performance.
Write automation code for provisioning and operating infrastructure at scale.
Build tools for internal use to support software engineering best practices.
You are not an operator; you’re an experienced software engineer focused on operations.
Work with development teams to make sure applications fit well within the infrastructure and that scalability, reliability, and security are designed and implemented from the start.
Participate in on-call rotation, being responsible for uptime and support.
Roll up your sleeves to troubleshoot incidents, formulate and test hypotheses, and narrow down possibilities to find the root cause.
Who You Are:
Creator of cool stuff with experience deploying web apps and distributed, service-oriented architectures.
Brilliant collaborator with 8+ years of professional experience in an operationally focused role, preferably in DevOps or SRE, with a B.S., M.S., or Ph.D. in Computer Science (or equivalent).
Someone who takes action and ownership with proven ability to use automation tools.
Respectfully candid with the ability to motivate people to act and work on behalf of our customer.
A bold risk-taker and self-starter who loves to solve challenging problems.
Resourceful and scrappy with the ability to be strategic, roll up your sleeves and execute.
Other:
Strong knowledge of Linux and open source software
Understanding of modern web architecture (HTTPS, REST) and technology stacks
2+ years of experience with programming/scripting languages (Bash, Go, Python, Ruby, etc.)
Experience with deployment automation using Ansible, Puppet, and Terraform
Experience supporting various databases such as MariaDB, Redis, and various NoSQL engines
Experience deploying containers using Docker and Kubernetes
Experience working in the Amazon public cloud (AWS)
Experience supporting mobile applications (Android and iOS)
Experience in the telecommunications industry
Benefits:
· Strong work life blend
· Flexible work arrangements (WFH, remote)
· Employee Stock Options
· Unlimited vacation
· Competitive pay and benefits
· Parental leave
· Benefits for both physical and mental well-being
Diversity and Inclusion:
At TextNow, our mission is built around inclusion and offering a service for EVERYONE, in an industry that traditionally only caters to the few who have the means to afford it. We believe that diversity of thought and inclusion of others promotes a greater feeling of belonging and higher levels of engagement. We know that if we work together, we can do amazing things, and that our differences are what make our product and company great.
For TextNow Candidates:
The People and Culture team is available to support you through the hiring process by providing reasonable accommodations to help enable a barrier-free interview experience. If you need assistance applying for a role due to a disability or special need, please let us know by completing this form. Once it is received, our Equity, Diversity and Inclusion Specialist will reach out to you and assist with any accommodations you may require.
Site Reliability Engineer Rebellion Defense Washington, DC / Chicago, Illinois, United States $100,000 to $200,000 a year
November 2020
Job Description
We are looking for a Site Reliability Engineer (SRE). As an SRE, you will be responsible for the reliability and operation of our production environments. SREs help teams across the company maintain software at scale, as well as design and develop software for scale. SREs are expected to engage with the product teams to make the delivery of our software as seamless as possible.
This position is based out of our Washington, D.C. or Chicago, Illinois office. An active TS/SCI clearance, or the ability to obtain one, will be required.
We look for a track record of the following:
Working alongside high-energy engineering teams to drive the adoption of best practices that enable the scalability and reliability of deployed software,
Defined architecture and built services at scale on public infrastructure such as AWS and Azure,
Experience designing, implementing, deploying, and operating high scale production services,
Experience facilitating the definition and implementation of SLIs and SLOs,
Understanding how to carefully spend error budget to handle regular deployment of large changes to production,
Deep experience in Linux operating systems, and systems engineering,
Comfort delivering critical software in Go and Python,
Willingness to debug problems across the stack,
Comfort working on underspecified problems and the ability to rapidly learn and iterate on solutions,
Experience building the wrong system enough times to avoid the common pitfalls, whether building something personally or advising others.
You might be a good fit if you have:
5+ years of relevant SRE experience in the tech industry,
demonstrable knowledge of TCP/IP, HTTP, web application security and experience supporting web application architecture,
experience working with a variety of storage systems, application architectures, compute infrastructure and network management systems,
experience designing, implementing, deploying, and operating high scale production service,
defined architecture and built services at scale on public infrastructure such as AWS and Azure,
proven knowledge of at least one higher-level language (e.g., Python or Golang),
the ability and desire to build and learn new systems with new technologies.
Rebellion is a well-capitalized technology start-up firm that is passionate about defining and delivering modern, life-changing software products to the US Department of Defense (DoD), the UK Ministry of Defence (MoD), and their allies. At Rebellion, we believe in operating what we own: we deliver all of our products as managed services, which allows our product teams to maintain operational ownership across all deployments. Expect talented, motivated, intense, and interesting co-workers.
Compensation includes meaningful equity ownership, competitive salaries, full medical coverage, disability and life insurance, and transit reimbursement.
An Equal Opportunity Employer/Veterans/Disabled.
Rebellion Defense is an equal opportunity employer and makes employment decisions on the basis of merit and business needs. Rebellion Defense does not discriminate against applicants on the basis of race, color, religion, sex, sexual orientation, gender, gender identity, national origin, veteran status, disability, or any other protected characteristic in accordance with federal, state, and local law.
At Netflix, we strive to bring joy to people across the world through amazing stories. As we grow internationally, we are continually enhancing our cloud-based infrastructure to improve our performance, scalability, and reliability.
The SRE team's goal is to ensure customer joy by successfully managing risk and minimizing impact across Netflix. We do this through cross-functional engagement with other engineering teams, managing issues when they happen, as well as promoting reliability and resilience practices throughout the organization.
Outcomes
Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks
Increase our reliability through establishing guidance and methods of improvement
Form and maintain relationships with internal and external partners
Develop deeper insights and analysis into the quality of experience for our customers
We Value
Curiosity about how complex sociotechnical systems successfully operate at scale when failure is inevitable
People who see influence as their preferred tool for cultivating relationships
Collaboration and continuous improvement
A desire to learn and readiness to teach
Iteration as the path forward
Our Work
Drive incidents to resolution by coordinating with multiple engineering teams
Identify sources of instability in large-scale distributed systems and drive operational excellence
Analyze complex systems from a reliability and resilience perspective
Engage with product teams to diagnose operational surprises and carry forward improvements
Improve reliability and drive down the burden of toil with tooling and automation
Nice to Have
Experience with global, continuous delivery methods
Development with Python, Go, Java, or JavaScript/Node.js
Involvement with incident management and response
Knowledge of cloud platforms like AWS and microservices architecture
Senior Site Reliability Engineer Tendermint San Francisco, United States / Berlin, Germany / Toronto $100,000 to $150,000 a year
October 2018
This job posting is no longer available
Job Description
We're looking for someone who has:
- At least 5 years of software engineering experience with open source contributions.
- Written structured, high-quality programs and scripts for automation.
- Significant experience writing Golang or the ability and desire to become proficient in new languages.
- Experience developing, releasing, and maintaining production software and infrastructure tools like Elastic stack, InfluxDB stack, DataDog, PagerDuty, or VictorOps.
- Built solutions with a broad set of technologies in and around cloud solutions (AWS EC2, ECS, Route53, DynamoDB, RDS, Lambda, Docker, Google Container Engine, Kubernetes or Docker Swarm).
- Implemented continuous deployment before (Jenkins, CircleCI, Travis, Ansible, Chef, Puppet).
- Experience with SDLC tools (Git, GitHub, Atlassian Stash/Bitbucket, GitLab, JIRA).
- Experience with QA/SIT tools (Selenium).
- Experience in Linux System administration including package management, network management, and security management.
- Familiarity with open source P2P networking protocols.
- Experience working in an agile development environment.
- The ability to take ownership and see initiatives through.
- Exceptional communication skills.
- Experience working with distributed teams.
What your primary responsibilities will be:
- Help scale software systems with automation, in an effort to improve reliability, velocity, and simplicity.
- Create, maintain, and improve the tooling for continuous integration and continuous delivery.
- Build and maintain tooling for deploying, monitoring, and maintaining clusters of Tendermint nodes on our testnets and mainnets.
- Build and maintain tooling to help shorten feedback cycles within teams and projects.
- Plan, build, and maintain public facing services in association with business goals.
- Build tools to measure and monitor availability, latency and overall system health.
Staff Site Reliability Engineer smlXL New York City, United States $170,000 to $250,000 a year
May 2023
Job Description
About the job
smlXL is a 'stealth' start-up building an information retrieval service with consumer and enterprise applications. Our first focus is providing a far richer understanding of the semantics of blockchain activity, making data and information accessible and useful to all.
We aren't ready to talk broadly about what we are working on, but we might be a good place for you if:
You are highly technical; you care about your craft; you are constantly learning; you are used to working on bare-metal servers and running your own stack; you are fascinated by information systems, semiotics, and blockchain data; you get excited by making black boxes transparent; and you love working on things that add a ton of value to consumers and prosumers alike; or you are into the EVM, decompilers, databases, and distributed systems.
About You
Experience keeping production systems running smoothly, experienced with working on private cloud/colo/bare-metal environments
Experience building software and systems to manage platform infrastructure and applications
Experience with and/or a desire to go deeper into blockchain technology and crypto protocols
Experience with HashiCorp tools such as Nomad is a plus
You care about polish and adding value to our users but not perfectionism for perfectionism’s sake
You love working collaboratively with different disciplines and learning from others
You are an expert who stays curious with a beginner’s mindset
You are a thoughtful communicator and collaborator and work to gain consensus with your peers and stakeholders, but you’re not afraid to speak up
You want to win, but prefer to win as a team
You are proactive
You are thoughtful and open about your priorities, goals, and aspirations so we can help you achieve them
You have specific passions outside of work
We believe that on average it will take 5+ years of experience in an engineering role to get to the level we want, but don’t let that stop you
Benefits and Support
Comprehensive health benefits (Medical, Dental, Vision, Life)
Flexible working hours, flexible WFH policy and unlimited time off with approval
Gender-neutral parental leave program for primary and secondary caregivers
Competitive salary and equity compensation with 401K retirement plan options
Physical, Mental, and Financial Well-being applications are provided at little to no cost, including fertility benefits, fitness classes, mental health, physical therapy, and healthcare apps (One Medical)
We encourage, support, and make time for our team members to invest in side projects and community projects
Software Engineer - Infrastructure Tooling Segment San Francisco / Vancouver / New York, United States / Remote $115,000 to $230,000 a year
August 2019
This job posting is no longer available
Job Description
Who We Are
We’re a small team of experienced engineers with diverse technical backgrounds. We’re passionate about driving our coworkers’ success and building the next generation of software tooling. If you want to work on distributed systems infrastructure and development practices or you have an entrepreneurial spirit and want to make something that your peers use every day, we’d love for you to join us.
Tooling handles many different areas, so we’re building a diverse team with a wide range of expertise.
What We Do
- We build shared infrastructure and tools to make engineering more productive, reliable, and cost effective.
- We maintain several Segment Open Source projects.
- We work in Go, Terraform and a bit of Node.js.
- Read more about Segment’s infrastructure and how we use: distributed logging and secure secrets. Or, read our code: conf, ksuid, cwlogs, go-prompt, ecs-logs, chamber.
- We manage the tooling and process around development environments, testing, CI, and deployment.
- Read more on our blog about how we use: CI and Make.
Who we are looking for:
You care about simple, practical, reliable, and secure software implementation and the kinds of process needed to produce it.
You can research a messy, complicated problem and design an approach that makes working in that area easy and consistent.
You empathize with the rest of your company, listen to them, and take pride in supporting their work.
Projects we’re working on:
Per-Engineer Dev Environments
Logging Pipeline Development
AWS Rate Limit Monitoring
Application Deployment Improvements
Self-Hosted CI
Incident Management Automation
Large Scale JSON Stream Data Manipulation Tools
Standardized Metrics and Alerting Infrastructure
Consistent Runbooks and Documentation
Requirements
A minimum of 3 years of experience as a software engineer, DevOps engineer, or site reliability engineer.
You have experience with AWS, Docker, Go, Node.js, or Terraform.
You are motivated to support your coworkers and make them productive.
You are a self-directed problem solver.
Bonus
Building tooling for distributed systems development.
Working on or with a variety of engineering teams.
Senior DevOps Engineer DroneDeploy San Francisco / Los Angeles / Portland, United States / Remote $130,000 to $180,000 a year
October 2018
This job posting is no longer available
Job Description
DroneDeploy is the leading cloud software platform for commercial drones, making the power of aerial data accessible and productive for everyone. Trusted by businesses and individuals in over 140 countries worldwide, we are transforming the way drone users collect, manage and digest impactful data in a variety of industries, including agriculture, real estate, mining and construction. Simple by design and easy to use, DroneDeploy builds revolutionary software compatible with any drone. If you’re excited about drones and want to help us create a simple and seamless experience for drone users across the world, we’d love to hear from you!
The Challenge
The DevOps team is tasked with ensuring the reliability and security of our exponentially scaling platform, while serving as a force multiplier for the rest of the engineering organization. Other teams rely upon our expert guidance to design a product that earns the trust of our users, without slowing down the pace of development. We believe that automation and developer empowerment are the key to creating systems that are reliable and secure by default, while minimizing cycle times. We use a collection of SaaS, open source, and proprietary technologies; whichever provides the right solution and seamless integrations for that piece of the puzzle. Some of the key technologies we leverage include Docker (for code packaging and deployment), Kubernetes (for container orchestration), Ansible (for lightweight config management), and Terraform (to control our cloud infrastructure).
The Role
In this position you will be expected to:
- Have a mind for simplifying unnecessary complexity.
- Empathize with the people who use the systems you build.
- Excel at critical thinking and adapt to new situations.
- Anticipate future problems, without over-engineering the present.
- Share your expertise with others, but never stop learning new things.
We are looking for someone with:
- A depth of knowledge in at least one domain.
- A minimum of 2 years’ experience managing complex systems using software.
- Experience writing and maintaining software applications in languages such as Golang, Python, Ruby, Java, C#, JavaScript, C, C++, etc. (not just scripts; side projects are OK).
- Availability to work on-site in our San Francisco office, or to work remotely on Pacific Standard Time hours.
- Familiarity with configuration management systems (e.g. Ansible, Puppet, Chef, Salt, Terraform, CloudFormation).
- Experience solving difficult problems with a scripting language (e.g. Bash, Ruby, Python) in a Linux environment.
Bonus points:
- Experience with container technology (Docker/cgroups/LXC/etc) and container orchestration (Kubernetes/Mesos/CloudFoundry/etc).
- Experience with major cloud providers (AWS, GCP, Azure, etc).
Life at DroneDeploy
We’re a team of Star Wars-loving, hot sauce-eating tech enthusiasts with inspirational talents. Everyone is empowered to explore and implement new ideas and improvements. We enjoy our collaborative office environment and encourage each other to push boundaries. We host weekly Friday night BBQs on our rooftop deck, offer great salaries, generous equity, 100% employee health coverage, unlimited vacation and delicious catered meals, among other perks.
Software Engineer Algorithmia Seattle / San Francisco, United States / Vancouver, Canada / Remote $100,000 to $150,000 a year
August 2018
This job posting is no longer available
Job Description
Software Engineer (Production & Deployment)
Seattle, Vancouver, NYC, or Remote
Empower large enterprises to run AI/ML at scale, leveraging the best in modern distributed systems and automation technology
Join a truly remote-friendly company - work anywhere in the US or Canada including your sofa, the beach, or our Seattle waterfront office
Experience rapid growth in the first AI startup to be funded by Google
Algorithmia automates, optimizes, and accelerates every step of the journey to deploying AI/ML at scale. We allow anyone to run models on massively parallel infrastructure in minutes instead of months, in our cloud or your datacenter, all completely managed for maximum performance at minimum cost. Already trusted by over 60k developers and major enterprise customers, Algorithmia makes scalable Machine Learning fast, simple, and cost-effective for everyone.
Undergoing enormous customer growth, we’re rapidly scaling our Customer Operations team to meet demand. We’re looking for talented Software Engineers to join a passionate, distributed group that's driving the design, deployment, and optimization of Algorithmia with our Enterprise customers. This unique role is a broad mix of automation, DevOps, infrastructure engineering, and software development - offering an unparalleled opportunity to learn, grow, and impact the most important financial institutions, intelligence agencies, and private companies in the country.
As a Software Engineer on the Customer Operations team at Algorithmia, you will:
Deploy Algorithmia Enterprise into Fortune 500 and Government environments
Design, build, and maintain the automation and infrastructure needed to deliver Algorithmia effectively, and to help us achieve even greater scale
Work cross-team to ensure Algorithmia supports unique customer environments, and to design solutions to meet specific customer needs
Eventually automate your role out of existence - then join us in doing something even more amazing
Handle the highest-tier of engineering support for AI/ML leaders
Have a real career plan, with mentorship and fast-track opportunities to promotion, technical leadership, people management, or wherever your interests may be
Work from anywhere in the USA or Canada. We have teams in Seattle, NYC, Vancouver BC, Nova Scotia - or go 100% remote from home (Snuggie, bunny slippers, and all - no judgement!)
And we might make the perfect match if you:
Want to work with modern cloud technologies and large scale distributed systems
Have experience with multiple languages (Java, Scala, Go, Python, Bash, etc.), deployment tools (Docker, Kubernetes, Ansible, Terraform, etc.), and cloud providers (AWS, Azure, GCP, OpenStack, etc.)
Are passionate about automation, and believe nothing should ever be done manually twice
Enjoy working with customers to deliver solutions that meet business needs, empower engineers (and data scientists!), and solve real-world problems
Feel most comfortable in hybrid roles that blur the line between Developer, Site Reliability Engineer, Deployment Engineer, Solutions Architect, and Consultant
Bonus points for a love of data science, any kind of AI/ML experience, interesting public code, or the implementation of something cool on our AI marketplace (hint: free trial!)
As a Software Engineer at Algorithmia you’ll join a passionate team that’s changing the way everyone uses AI and ML. You’ll solve real problems, make an impact, and work in a flexible environment that encourages you to follow your own interests as well. You’ll be welcomed into an intelligent, quirky, and diverse group and gain access to fantastic perks beyond just salary, equity, and insurance benefits - all from the comfort of your own sofa (or our dog-friendly office).
If this sounds like you, apply now, or learn more at algorithmia.com
Algorithmia is an equal opportunity employer and we value diversity at our core. We will never discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status and encourage everyone to apply.