Meta Software Engineer, Systems ML - HPC Specialist
Location: Santa Fe, New Mexico
Meta is seeking an AI Software Engineer to join our Research & Development teams. The ideal candidate will have industry experience working on AI infrastructure. The position will involve applying these skills to solve crucial and exciting problems on the web. As an HPC specialist, your responsibilities may include:
- Authoring components such as cuBLAS, cuDNN, AITemplate, and FlashAttention, and developing runtimes such as a disaggregated LLM runtime.
- Optimizing programs to reduce accelerators' idle time.
- Developing tools for debugging (e.g., cuda-gdb) and profiling on accelerated computing hardware (such as the PEs/SFU in MTIA or the Transformer Engine in H100).
- Designing, debugging, and accelerating AI workloads from single-node to multi-node distributed systems.
- Influencing the next generation of silicon architectures based on evolving AI workload needs.
Responsibilities include:
- Applying relevant AI and machine learning techniques to build and optimize intelligent systems that improve Meta's products and experiences.
- Developing custom/novel architectures, defining use cases, and developing methodology and benchmarks to evaluate different approaches.
- Applying in-depth knowledge of how the machine learning system interacts with surrounding systems.
- Assisting in goal setting related to project impact, AI system design, and ML excellence.
Minimum Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
- 2+ years of experience in HPC and parallel computing.
- Proficiency in GPU programming using CUDA and familiarity with CUDA libraries (cuBLAS, cuDNN, etc.).
- Proven track record of leading successful HPC projects.
- Proven technical expertise in HPC architectures and technologies.
Preferred Qualifications:
- PhD in Computer Science, Computer Engineering, or relevant technical field.
- Experience developing AI algorithms or AI-System infrastructure in C/C++ or Python.
- Experience developing AI compilers (e.g., TorchInductor in PyTorch 2.0).
Public Compensation:
$70.67/hour to $208,000/year + bonus + equity + benefits
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law. Meta participates in the E-Verify program in certain locations, as required by law.
Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at