Staff Site Reliability Engineer

Company: Moloco, Inc.

Location: Seattle

Closing Date: 23/10/2024

Salary: £150 - £200 Per Annum

Hours: Full Time

Type: Permanent

Job Requirements / Description

About the Role

Moloco is a machine learning company that operates at massive scale (we ingest 10 petabytes of training data per day), serves blazingly fast models (predictions return in 10 milliseconds or less), and is a profitable unicorn (we are valued at $2 billion and have been profitable for the last 17+ quarters).

We are looking for an exceptional Senior Site Reliability Engineer to help us build state-of-the-art ML model serving infrastructure for our mobile advertising platform. You will be part of an engineering team that manages the infrastructure serving deep neural network machine learning (ML) models to clients, runs the CI/CD infrastructure that deploys infrastructure updates in real time, and develops infrastructure tools and platforms that improve the productivity of engineering teams.

We are looking for someone with a passion for solving infrastructure problems with software engineering skills, a desire to grow and learn new technologies, a love of working in collaborative teams, and a commitment to customer service.

What you'll do

Partner with engineering teams to drive company-wide infrastructure adoption and standard methodologies
Contribute to technical direction and decisions across the organization by conducting / leading research with other technical leaders in the organization
Own traditional SRE/operational support areas such as tooling and automation, monitoring, workflow management, maintaining and improving data pipelines, CI/CD, etc.
Actively participate in and contribute to code reviews and technical design documents to identify performance and reliability bottlenecks.
Partner with and support other engineering teams with operational guidance and expertise on various project initiatives.
Participate in capacity planning and scaling
Ensure that Moloco is delivered in a highly performant manner that can handle viral traffic spikes.
Collaborate with others in SRE and SWE to leverage tools, processes and techniques to improve service reliability.
Reduce business risk in areas such as infrastructure and configuration management, provisioning, capacity modeling and planning, and incident handling, mitigation, root cause analysis, and post-mortems.
Identify common patterns in the challenges of operating services in production, and work with others in SRE and SWE to design and implement reusable solutions and/or other cross-functional work that reduces the complexity, difficulty, cost, and risk of operating the business.

What you’ll need to succeed

Hands-on experience working with GCP or other cloud platforms (e.g. AWS, Azure)
Practical, proven knowledge of a high-level language (e.g. Go, Python)
Experience working with infrastructure-related software (e.g. Kubernetes, Helm, Terraform, etc.)
Experience developing infrastructure, configuration and deployment scripting and automation for large scale / high complexity services in a microservices environment
At least 5 years of experience in large-scale software development
Passion for operational excellence and the ability to thrive in an environment where you provide extremely high levels of customer support
Tenacious problem solver who takes ownership of issues from end-to-end to full resolution
