HPC Cluster Engineer Santa Clara, CA

Company: Tbwa Chiat/Day Inc
Location: Santa Clara
Closing Date: 06/11/2024
Salary: £150 - £200 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

Are you ready to make your mark at the forefront of technological innovation? As an HPC Cluster Engineer, you'll play a pivotal role in shaping the future of AI, deep learning, and machine learning initiatives. Join us and leverage Nvidia's cutting-edge GPU technology to drive groundbreaking discoveries and revolutionize industries.

Sustainable Talent is thrilled to partner with Nvidia, a global powerhouse with over 25 years of trailblazing advancements in computer graphics, gaming, and accelerated computing.

This is a W-2 full-time contract based in Santa Clara, CA, with a hybrid work option. We offer competitive pay based on factors such as experience, education, and location, and provide full benefits, PTO, and an amazing company culture!

What you'll be doing:

  • You'll lead the charge in optimizing our InfiniBand network and managing Lustre and GPFS storage solutions, ensuring seamless performance for our cutting-edge initiatives.
  • Your expertise in the SLURM job scheduler will be instrumental in orchestrating the smooth operation of our clusters, from scheduling tasks to managing resources efficiently.
  • As a Linux sysadmin guru, you'll be responsible for maintaining the stability and security of our systems, leveraging your deep understanding of Linux environments.
  • Harnessing the power of Ansible, you'll automate routine tasks and streamline operations, freeing up time for innovation and optimization.
  • Advanced Python and Bash scripting will drive automation efforts and enable dynamic solutions to complex challenges.

What We Need to See:

  • Demonstrated experience with SLURM, coupled with a solid understanding of InfiniBand networks and Lustre/GPFS storage systems, is essential.
  • A proven track record in Linux system administration, ensuring robustness and security in our computing environment.
  • Proficiency in Ansible is a must-have, enabling you to automate tasks and workflows efficiently.
  • Strong scripting abilities in Python and Bash are critical for developing custom solutions and optimizing cluster performance.

Ways to Stand Out From the Crowd:

  • Showcase your knowledge of best practices in HPC cluster operations, automation, and upgrades, setting you apart as a seasoned professional in the field.

Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.
