Site Reliability Engineer

Company:  NVIDIA
Location: Santa Clara
Closing Date: 07/11/2024
Salary: 160,000 USD - 247,250 USD Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and outstanding people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are looking for a Staff Site Reliability Engineer to join our team. You should have experience supporting and working with teams across the company to improve the usability, reliability, and performance of enterprise applications.

What You'll Be Doing

  1. Design, develop, and evolve the Site Reliability Engineering practice.
  2. Deploy and support tools from a system engineering perspective and troubleshoot issues in depth.
  3. Help the SRE teams define technology and business strategies that deliver iterative enhancements to the tools and processes that improve availability, observability, and scalability.
  4. Recognize, validate, and publish emerging technologies and architectures that align with business objectives.
  5. Lead and build a proven foundation for the infrastructure and application lifecycle, covering installation, monitoring, observability, and user experience.
  6. Build tooling to lower the barrier to entry for engineering teams to plug in and enjoy the benefits of reliability.
  7. Document institutional knowledge.
  8. Build software to help operations and support teams.

What We Need To See

  1. Bachelor’s and/or Master’s degree in computer science or a related field of study (or equivalent experience).
  2. 8+ years of demonstrable experience deploying and supporting applications in a cloud environment.
  3. Experience with Confluence, Jira, and Service Desk is a plus.
  4. Excellent Windows and Linux system skills.
  5. Good understanding of security components such as SSL, load balancers, and firewalls.
  6. Extensive experience supporting applications in high-availability environments.
  7. Scripting skills to automate repetitive and basic tasks.
  8. Experience in documenting processes and procedures.
  9. Strong interpersonal skills with the ability to understand and explain technical issues to a non-technical audience.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family.

The base salary range is 160,000 USD - 247,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
