Company: CentML
Location: San Francisco
Closing Date: 16/10/2024
Salary: £200 - £250 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description
Overview:
We are seeking highly motivated, skilled systems engineers to help develop the CentML platform, which offers cost-effective infrastructure for serving and training large-scale machine learning models. As a systems engineer, you will play a crucial role in building a unified solution that brings our in-house technologies, such as the Hidet compiler, DeepView, and other ML optimizations, into a single, cohesive platform. Your expertise will drive the platform's scalability, performance, and reliability, enabling our customers to seamlessly access and use our comprehensive suite of ML services.
Responsibilities:
- Take part in the design and development of the CentML platform.
- Design and build solutions for scheduling large-scale ML training and inference workloads on GPU clusters across multiple cloud service providers (CSPs).
- Work with our product teams to define use cases, and develop methodologies and benchmarks to evaluate different approaches.
Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience. A graduate degree with research experience is a plus.
- Experience building large-scale systems from scratch. Prior experience with container-based deployment systems such as Kubernetes is a big plus.
- Strong coding skills in at least one of Python or C++.
- Solid fundamentals in core computer science and computer engineering topics: algorithms and data structures, operating systems, computer architecture, etc.