Staff Site Reliability Engineer

Company:  Tbwa Chiat/Day Inc
Location: Mission
Closing Date: 16/10/2024
Salary: £100 - £125 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

Recognized by Forbes as one of the fastest-growing private companies in the United States, Palmetto believes that choosing to source clean energy from renewable resources like solar power should be a right, not a privilege. As such, we connect homeowners with renewable energy options such as solar power and energy storage systems. Through our marketplace business model, we empower solar sales professionals and solar installation companies with access to our proprietary design platform, financing, customer management system, logistics, and project management. Our #1 focus is a phenomenal experience for our customers and partners, evidenced in our industry-leading Net Promoter Score.

Location

This position will be remote in the US.

Summary of Role

We are seeking a Staff Site Reliability Engineer (SRE) to drive our infrastructure reliability, performance, and security. As the most senior SRE/DevOps contributor in our organization, you will take full ownership of our systems, ensuring they are performant, scalable, secure, observable, and available. You’ll collaborate with multiple engineering teams to deploy new applications, while maintaining and enhancing our existing CI/CD pipelines and cloud infrastructure.

This senior-level role calls for deep technical expertise, leadership on complex software projects, and a strategic approach to development. You will lead the design and implementation of critical software components, mentor less experienced engineers, and shape the technical direction of the team. Your impact will extend across the infrastructure as you align our systems with organizational objectives and business goals. You will work with technologies such as GCP, AWS, and MongoDB, and orchestrate the deployment and management of applications built with the MERN stack, Python (e.g., FastAPI), and Ruby on Rails.

Strategic & Tactical

  • Infrastructure Ownership: Take full responsibility for the design, implementation, and operation of cloud infrastructure (GCP and AWS), ensuring the system is resilient, secure, and optimized.
  • Monitoring & Observability: Build and maintain robust observability systems. Define and monitor SLAs, SLOs, and other key service health metrics, ensuring teams have the tools they need to meet operational goals. Manage and optimize cloud costs to keep them in check.
  • Incident Response and Support: Champion the incident response process, provide guidance to teams on post-mortems, and foster a blameless culture that emphasizes learning. Provide supplementary on-call support to engineering teams and help maintain a stable, customer-facing platform.
  • Automation and Efficiency: Automate repetitive tasks across deployment pipelines, monitoring, alerting, and scaling to reduce manual effort and improve reliability.
  • Cross-functional Collaboration: Collaborate with engineering, IT, and security teams to meet our reliability, compliance, and operational goals. Serve as a subject matter expert for cloud technologies, CI/CD processes, and DevOps best practices.

Qualifications:

  • 5+ years of experience managing infrastructure on GCP, AWS, or Azure.
  • Proficiency with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or CDK.
  • Expertise in Linux systems administration and scripting (e.g., Bash, Python).
  • In-depth knowledge of DNS, TLS/SSL, CDNs, load balancing, and network security.

Monitoring & Observability:

  • Experience with an observability stack (e.g., Datadog, Prometheus, Grafana).
  • Knowledge of log aggregation and monitoring best practices.
  • Ability to assist teams with defining and monitoring SLOs and service health metrics.

DevOps & Automation:

  • Thorough knowledge of CI/CD systems (GitHub Actions, Jenkins, or similar).
  • Deep expertise with containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Familiarity with build automation and package management tools such as npm and Yarn.

Nice to Have:

  • Experience supporting containerized microservices in a high-traffic environment.
  • Familiarity with modern authentication and authorization patterns (e.g., OAuth, SAML, Auth0).
  • Experience with data-intensive applications, data engineering workflows, and MLOps practices.
  • Passion for renewable energy and sustainable technology.

Why You’ll Love This Role:

  • Impact: Be the go-to person for all infrastructure and DevOps-related responsibilities, with the opportunity to make significant improvements across systems.
  • Ownership: As the sole SRE, you’ll have a high degree of autonomy and the chance to lead strategic initiatives around infrastructure and reliability.
  • Growth: Work at the intersection of cloud infrastructure, security, automation, and software development, enhancing your expertise in each area.
  • Mission: Join a company at the forefront of renewable energy and be part of a team driving innovation and sustainability.

Palmetto embraces diversity and is an Equal Employment Opportunity employer. Employment is decided on the basis of qualifications, merit, and business need. We do not discriminate based upon race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or any other status protected under federal, state, or local law.

Similar Jobs

  • Site Reliability Engineer (Mission)
  • Site Reliability Engineer (Mission)
  • Manager, Site Reliability Engineering (Mission)
  • Security Engineer (Mission)
  • PowerBI Engineer (Mission)