Senior Infrastructure Engineer

Company: Hillbot
Location: San Diego
Closing Date: 29/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description

About Us:

Hillbot is a start-up headquartered in San Diego, co-founded by leading scientists in artificial intelligence. Our mission is to pioneer the future of robotics by merging cutting-edge Generative AI with advanced robotics technologies. We strive to develop comprehensive robot foundation models that will revolutionize the field and set new industry standards. We are seeking a highly skilled Senior Infrastructure Engineer who is passionate about infrastructure, data, and machine learning, and ready to take on the challenge of building from the ground up.


Key Responsibilities:

Design and Development

  • Collaborate closely with researchers and engineers to design and implement scalable, reliable, and efficient infrastructure solutions for processing and analyzing large volumes of multimedia data.
  • Set up and maintain software architecture to support the scaling of data processing and training of transformer-based models.
  • Implement and manage cloud-based infrastructure using platforms like AWS, Google Cloud, or Azure.
  • Build and maintain containerization and orchestration systems using Docker and Kubernetes to ensure reproducible software practices.
  • Deploy distributed services on public networks, utilizing reverse proxies to securely expose them.

Software and System Optimization

  • Monitor and troubleshoot software and hardware issues on large GPU clusters, including GPU connectivity issues, storage system outages, stalls, Kubernetes-related errors, and networking degradations or outages.
  • Resolve software and hardware issues quickly where possible, and otherwise communicate effectively with the relevant vendors.
  • Improve existing storage systems and set up new scalable storage solutions for data processing and model training.
  • Take a quantitative and rigorous approach to measuring and improving code, pipeline, cost, and developer efficiency.

Implementation and Development Support

  • Partner with software engineers to enhance and support developer operations.
  • Contribute to SDKs and APIs used internally.
  • Educate team members and document best practices for coding, testing, and deployment operations.
  • Build and rebuild containers effortlessly, ensuring reproducible software practices using Docker, Kubernetes, and other containerization technologies.

Continuous Learning

  • Stay updated with the latest advancements in infrastructure technologies, machine learning tools, and software solutions relevant to our implementations.
  • Identify opportunities to improve software efficiency and usability, driving initiatives to implement these enhancements.

Leadership

  • Mentor and guide junior engineers, fostering a culture of continuous learning and improvement.
  • Lead projects and initiatives, ensuring timely and successful delivery of solutions.


Required Qualifications:

  • Education: Bachelor's degree in Computer Science, Electrical Engineering, Mathematics, or a related field.
  • Experience:
    • 5+ years of relevant work/research experience in infrastructure engineering, particularly supporting machine learning and data science workloads.
    • Proven experience designing large-scale data processing and storage systems for model training, and analyzing their performance bottlenecks.
    • Expertise with cloud platforms (AWS, Google Cloud, Azure) and containerization technologies (Docker, Kubernetes).
    • Knowledge of GPU infrastructure (monitoring, GPU/RoCE/networking management commands, Ansible) and experience with hardware acceleration technologies (GPUs, TPUs).
    • Understanding of networking concepts, including routing, firewalls, certificates, and using reverse proxies to securely expose distributed services.
    • Strong programming skills in Python and at least one of C or C++ (both are a plus).
    • Demonstrated proficiency with software development best practices (e.g., test-driven development) and version control systems (Git).
  • Skills:
    • Strong analytical and problem-solving abilities.
    • Excellent communication and teamwork skills.
    • Ability to work in a fast-paced, dynamic environment and adapt to changing priorities.


Preferred Qualifications:

  • Solid understanding of distributed, high-performance SQL and NoSQL databases and experience with data management technologies for real-time data analytics (e.g., cloud-native databases, HTAP solutions, Apache Arrow).
  • Familiarity with frameworks such as TensorFlow, PyTorch, and Keras, and with deployment libraries such as GStreamer, ONNX, TorchScript, and TensorRT.
  • Experience and enthusiasm for mentoring junior engineers.


What We Offer:

  • Opportunity to build and shape the infrastructure stack from the ground up in a rapidly growing company.
  • Impactful role in driving innovative infrastructure strategies critical to the growth and success of Hillbot.ai.
  • Collaborative and inclusive work environment that values creativity, initiative, and professional growth.
  • Competitive salary and benefits package.
  • Visa and immigration support.
  • Unlimited PTO.
  • Employer 401k match.


How to Apply:

If you are passionate about infrastructure, data, and machine learning and ready to take on the challenge of building from the ground up, we want to hear from you! Please send your resume and a cover letter detailing your relevant experience and why you are the perfect fit for this role to .
