Software Engineer, Systems ML - HPC Specialist

Company: Meta
Location: Santa Fe
Closing Date: 08/11/2024
Salary: £100 - £125 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

Meta Software Engineer, Systems ML - HPC Specialist

Location: Santa Fe, New Mexico

Meta is seeking an AI Software Engineer to join our Research & Development teams. The ideal candidate will have industry experience working on AI infrastructure-related topics. The position will involve applying these skills to solve crucial and exciting problems on the web. As an HPC specialist, responsibilities may include:

  1. Authoring components such as cuBLAS, cuDNN, AITemplate, and FlashAttention, and developing runtimes such as a disaggregated LLM runtime.
  2. Optimizing programs to reduce accelerators' idle time.
  3. Developing tools for debugging (cuda-gdb) and profiling, and utilizing accelerated computing hardware (such as PEs/SFU in MTIA or the Transformer Engine in H100).
  4. Designing, debugging, and accelerating AI workloads from single-node to multi-node distributed systems.
  5. Influencing the next generation of Silicon architectures based on evolving AI workload needs.

Responsibilities include:

  1. Applying relevant AI and machine learning techniques to build and optimize intelligent systems that improve Meta's products and experiences.
  2. Developing custom/novel architectures, defining use cases, and developing methodology and benchmarks to evaluate different approaches.
  3. Applying in-depth knowledge of how the machine learning system interacts with surrounding systems.
  4. Assisting in goal setting related to project impact, AI system design, and ML excellence.

Minimum Qualifications:

  1. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
  2. 2+ years of experience in HPC and parallel computing.
  3. Proficiency in GPU programming using CUDA and familiarity with CUDA libraries (cuBLAS, cuDNN, etc.).
  4. Proven track record of leading successful HPC projects.
  5. Proven technical expertise in HPC architectures and technologies.

Preferred Qualifications:

  1. PhD in Computer Science, Computer Engineering, or relevant technical field.
  2. Experience developing AI algorithms or AI-System infrastructure in C/C++ or Python.
  3. Experience developing AI compilers (e.g., TorchInductor in PyTorch 2.0).

Public Compensation:

$70.67/hour to $208,000/year + bonus + equity + benefits

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law. Meta participates in the E-Verify program in certain locations, as required by law.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at
