Site Reliability Engineer - Infra and DevOps

Company: Western Digital
Location: Milpitas
Closing Date: 19/10/2024
Salary: £150 - £200 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

Job Description

As a leading provider of storage solutions, Western Digital relies increasingly on software and software development workflows. As a Secure Development Factory (SDF) Site Reliability Engineer - DevOps, you will be at the heart of Western Digital's engineering process, delivering the software development tools and infrastructure that empower engineering teams to develop and deliver high-quality products quickly. You will play a pivotal role in ensuring the reliability, scalability, and performance of our IT infrastructure and DevOps tools. You will lead by example and collaborate closely with Engineering teams to align our efforts with customer requirements. Your technical expertise, adaptability, and commitment to excellence will help our stakeholders reduce time to market without sacrificing security, development velocity, stability, code quality, or code health.

The ideal candidate will have a passion for technology, a relentless focus on the customer experience, and an ability to multitask, assimilate data, make decisions, and prioritize complex work while paying attention to the details. Communication with internal customers, vendors, and co-workers in a clear and professional manner is an absolute must. This position is open to candidates located in the PST time zone.

ESSENTIAL DUTIES AND RESPONSIBILITIES

  • Observability and Monitoring: Design, implement, and continuously improve monitoring and observability solutions to ensure effective and real-time visibility into system performance.
  • Best Practices: Advocate for and implement best practices in SRE, DevOps, and automation, with a focus on enhancing platform stability and performance.
  • Automation: Lead automation efforts to streamline processes, reduce manual tasks, and improve operational efficiency.
  • Architecting and Designing: Contribute to the architecture and design of systems and applications, aligning them with reliability and scalability goals.
  • Technical Accountability: Provide technical ownership in the SRE team, fostering a collaborative and growth-oriented environment.
  • Ownership: Take ownership of system reliability, meet Service Level Objectives (SLOs), and ensure customer satisfaction.
  • Collaboration: Work closely with Engineering teams to understand customer requirements and collaborate on solutions.
  • Adaptability: Stay updated with emerging technologies and adapt quickly to evolving requirements and challenges.
  • Upskilling: Continuously upskill in newer technologies and share knowledge within the team.
  • Team Player: Collaborate effectively with team members and contribute to a positive team culture.
  • Professional Behaviour: Demonstrate professionalism, integrity, and a commitment to the highest ethical standards.
  • Documentation: Maintain thorough and well-organized documentation of systems and processes.

Qualifications:

REQUIRED

  • Candidates MUST POSSESS a B.S. in Computer Science, Information Technology, Electrical Engineering, or Mechanical Engineering, with 6 to 10 years of hands-on experience in DevOps tools and SRE practices.
  • MUST POSSESS administration experience with DevOps tools such as Artifactory, Jenkins, Git, Black Duck, SAST/DAST tools, etc.
  • MUST POSSESS a very good understanding of infrastructure at the server, VMware, storage, and networking levels.
  • Exceptional analytical, problem-solving, and troubleshooting skills to manage complex process and technology issues.
  • Extensive experience in Ansible automation (Research, Write, Maintain, and Optimize roles/playbooks/modules).
  • Expertise in shell scripting and Python, and in other configuration management tools such as Terraform.
  • Experience developing and customizing CI/CD pipelines and onboarding applications with varying requirements.
  • Experience in monitoring enhancements and metrics dashboarding using tools such as Icinga, Splunk, Prometheus, and Grafana.
  • Experience with containerization technologies such as Docker and Kubernetes is good to have.
  • Automation-first mindset.
  • Focus on embedding security practices into systems.
  • Working experience with HAProxy, load balancers, LDAP/SSO integration, and security endpoint configurations.

SKILLS

  • Knowledge of cloud computing platforms (e.g., AWS, Azure, GCP) is a plus.
  • Excellent communication and collaboration skills.

Additional Information

Equal Employment Opportunity
Western Digital is committed to providing equal opportunities to all applicants and employees and will not discriminate against any applicant or employee based on their race, color, ancestry, religion, sex, gender, age, national origin, sexual orientation, medical condition, marital status, physical or mental disability, or other legally protected characteristics. We also prohibit harassment of any individual based on any of the characteristics listed above.

Western Digital thrives on the power and potential of diversity. As a global company, we believe the most effective way to embrace the diversity of our customers and communities is to mirror it from within. We are committed to an inclusive environment where every individual can thrive through a sense of belonging, respect, and contribution.

Based on our experience, we anticipate that the application deadline will be 01/09/2025 (3 months from posting), although we reserve the right to close the application process sooner if we hire an applicant for this position before the application deadline.
