Machine Learning Engineer - Model Training Infrastructure

Company: ByteDance
Location: Seattle
Closing Date: 19/10/2024
Salary: £200 - £250 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

Responsibilities:

  • Responsible for the design and implementation of a global-scale machine learning training system for feeds, ads and search ranking models.
  • Responsible for the design and implementation of the orchestration layer for machine learning offline/online training processes.
  • Responsible for improving the usability and flexibility of the training APIs.
  • Responsible for profiling and optimizing both training and validation frameworks to ensure efficient use of resources.
  • Responsible for creating, managing, and optimizing data pipelines to ensure data availability for training.

Qualifications:

  • Proficient in C/C++/CUDA/Python, with solid programming skills.
  • Familiar with deep learning frameworks (TensorFlow/PyTorch).
  • Experience in developing and deploying large-scale systems.
  • Ability to work independently and complete projects from beginning to end in a timely manner.
  • Good communication and teamwork skills, with the ability to explain technical concepts clearly to teammates.
  • Experience in improving core machine learning infrastructure (TensorFlow, PyTorch, and JAX).

Preferred Qualifications:

  • Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
  • Experience with big data frameworks (e.g., Spark/Hadoop/Flink) and with resource management and task scheduling for large-scale distributed systems.
  • Strong background in one of the following fields: Hardware-Software Co-Design, High Performance Computing, ML Hardware Acceleration (e.g., GPU/RDMA) or ML for Systems.
