HPC Software Engineer (Hybrid)

Company:  XCEL Engineering Inc
Location: Oak Ridge
Closing Date: 05/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
COMPANY OVERVIEW
XCEL Engineering, Inc. is an award-winning small business that provides trusted information technology, engineering, consulting and project management solutions and services to federal agencies and organizations. Originally founded in 1971 by professional engineers at the University of Tennessee, XCEL was acquired in 2003 by U.S. Army and Navy veterans and in 2023 became a MartinFed company.
XCEL Engineering is a part of IT Lab Partners (ITLP) which was created to support a leading research facility in the East Tennessee region in recruiting the best and the brightest technical talent. Consider joining our impressive team today!
JOB OVERVIEW
Xcel Engineering is seeking qualified applicants for an HPC Software Engineering role. Our HPC engineering team facilitates the mission of ORNL through HPC systems engineering, integration, and support for the research community. By providing design, deployment, optimization, monitoring, and tooling support across multiple clustered infrastructures, we facilitate Lab-wide R&D projects. Our HPC clusters range in scope from just a handful of nodes to greater than fifty thousand cores.
ESSENTIAL FUNCTIONS
  • Scientific Software and Application Management:
    • Understand scientific software users' requirements: work closely with researchers to understand their computational needs and translate them into efficient HPC applications. Analyze application performance to identify bottlenecks and develop strategies to improve scalability and efficiency on HPC systems. This may involve profiling code, analyzing communication patterns, and tuning system parameters.
    • Install and manage scientific software: deploy and maintain a wide range of scientific applications, libraries, and development tools on HPC systems to support research activities.
    • Develop custom tools and scripts: develop tools to automate common tasks, improve systems management, and facilitate sophisticated computational workflows. Develop, maintain, and install software for HPC and data intensive architectures, including Graphic Processing Units (GPUs), parallel systems, and other computing environments.
  • User support and collaboration:
    • Provide software technical support: collaborate with HPC support and scientists on technical issues related to scientific software problems. Following industry standards, implement HPC software with novel programming and optimization techniques. Provide solutions and technical recommendations for code optimization, resource utilization, and system tuning.
    • Collaborate on research projects: work closely with researchers to understand their computational requirements and assist in developing efficient computational strategies, code optimization, and parallelization. This includes working with a highly diverse and multidisciplinary team (such as mathematicians, physicists, computer scientists, and engineers) in the research, development, integration, testing, and deployment of research software, data platforms, and machine learning systems for large-scale data analysis.
    • Research information dissemination: support research staff in disseminating results in peer-reviewed journals, technical reports, relevant conferences, and open-source software project repos.
  • Research and development:
    • Stay informed about the latest research in HPC and AI.
    • Develop and recommend ideas for new programs, products, and features by staying abreast of new technology developments and trends.
  • Partnerships and collaboration:
    • As applicable or possible, establish and maintain partnerships and collaborations with industry, other groups at the lab, and HPC networks to share knowledge and best practices.
  • Deliver the mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service. Promote diversity, equity, inclusion, and accessibility by encouraging a respectful workplace - in how we treat one another, work together, and measure success.
BASIC QUALIFICATIONS
  • A BS in computer science, computer engineering, information systems, or a related field of study and five (5) to seven (7) years of proven and aligned experience is required. An overall combination of equivalent experience may be considered.
  • United States Citizen with the ability to obtain and maintain a DOE Security Clearance.
  • Three (3) or more years of demonstrated abilities in the following areas:
    • High Performance Computing (HPC) environments and HPC scheduling software.
    • Software development, including version control using Git, with open-source tools and software.
    • Python and data analysis modules such as Pandas, NumPy, and Dask.
    • Developing software in C/C++, Fortran or other programming languages.
DESIRED QUALIFICATIONS
  • In-depth understanding of HPC architectures and their optimization techniques.
  • Experience in the following areas:
    • Optimizing and parallelizing software products for HPC using MPI or other open-source tools.
    • HPC debugging tools such as DDT, GDB or Valgrind.
    • AI toolkits such as PyTorch, RAPIDSAI, TensorFlow, or Keras.
    • Statistical analysis software such as Python or R.
    • Building and running containerized applications in an HPC environment.
    • Cluster deployment tools such as Warewulf, PXEboot, and/or Bright.
    • Managing systems.
    • Working in a government, scientific, or other highly technical environment.
  • Knowledge of multiple operating systems including Linux.
  • Exposure to microservices concepts and understanding of container environments including Podman, Docker, and Kubernetes.
  • Proven ability to balance sophisticated research and security requirements.
PHYSICAL REQUIREMENTS & ENVIRONMENTAL CONDITIONS
  • Inside office environment.
  • Working on a computer for long periods of time.
  • May involve long periods of sitting at a desk.
  • The work environment is fast-paced and sometimes involves extreme deadline pressures.

OTHER DUTIES
This job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities and activities may change at any time with or without notice.
Xcel Engineering is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, gender, sexual orientation, gender identity, gender expression, transgender, pregnancy, marital status, national origin, ancestry, citizenship status, age, disability, protected Veteran Status, genetics or any other characteristics protected by applicable federal, state or local law.
If you are a qualified individual with a disability or disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access Xcel Engineering's current openings as a result of your disability. You can request reasonable accommodations by calling 855.212.1810. Thank you for your interest in Xcel Engineering.
All positions at Xcel Engineering, Inc. are contingent upon passing both a background check and drug screening prior to a start date and are subject to random drug screenings during the employment period. In addition, Xcel Engineering is an E-Verify employer.