Principal Systems Engineer – Reliability

Company: Global Jupiter
Location: Atlanta
Closing Date: 18/10/2024
Salary: £125 - £150 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

Cox Communications is seeking a skilled Principal Systems Engineer to join our team and lead reliability improvement efforts across our enterprise, with a particular focus on network infrastructure and systems. As a Principal Systems Engineer – Reliability, you will be instrumental in defining and coordinating reliability enhancement initiatives across our network. Drawing on your expertise in systems engineering and reliability analysis, you will drive strategic initiatives to minimize downtime, optimize performance, and enhance the overall reliability of our systems, significantly improving the customer experience.

Key Responsibilities:
• Reliability Strategy: Develop and execute a comprehensive reliability strategy for our enterprise systems, including network infrastructure, server platforms, and software applications.
• Enterprise Coordination: Collaborate with cross-functional teams to prioritize and implement reliability improvement efforts across the organization, ensuring alignment with business objectives and industry standards.
• Root Cause Analysis: Define scalable processes for root cause analysis of system failures and performance degradation, identifying underlying issues and implementing corrective actions to prevent recurrence.
• Risk Management: Assess potential risks to system reliability, such as hardware failures, software bugs, and configuration errors. Develop and implement risk mitigation strategies to enhance system resilience.
• Performance Monitoring: Establish robust monitoring systems to track system performance metrics and reliability indicators. Analyze data to identify trends, anticipate potential issues, and proactively address reliability challenges.
• Continuous Improvement: Drive a culture of continuous improvement by identifying opportunities for process optimization, automation, and efficiency gains in reliability enhancement initiatives.
• Vendor Management: Collaborate with vendors and suppliers to ensure the reliability of system components and software. Evaluate vendor performance and provide feedback to drive product improvements.
• Documentation and Reporting: Maintain comprehensive documentation of reliability improvement efforts, including procedures, policies, and incident reports. Prepare regular reports and presentations for senior leadership, highlighting progress and key performance metrics.

Minimum Qualifications:
• Bachelor’s degree in a related discipline and 10 years’ experience in a related field. The right candidate could also have a different combination, such as a master’s degree and 8 years’ experience; a Ph.D. and 5 years’ experience in a related field; or 22 years’ experience in a related field.
• Proven track record of defining and coordinating enterprise-wide reliability improvement initiatives, with a focus on system and network infrastructure.
• Deep understanding of system technologies, including network architecture, server platforms, and software applications.
• Strong analytical and problem-solving skills, with the ability to conduct root cause analysis and drive effective solutions.
• Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams and senior executives.
• Experience with reliability engineering tools and methodologies, such as FMEA, RCM, and fault tree analysis, is preferred.
• Relevant certifications such as CISSP, ITIL, or Six Sigma Black Belt are a plus.
