Principal Site Reliability Engineer, Sovereign Cloud Operations

Company:  Oracle
Location: Reston
Closing Date: 15/10/2024
Salary: £125 - £150 Per Annum
Hours: Full Time
Type: Standard
Job Requirements / Description

We are looking for a Principal Site Reliability Engineer (SRE) with 10-plus years of industry experience to join our Sovereign Cloud Operations team. This role is responsible for ensuring the reliability and availability of our sovereign cloud production systems and for driving automation and tooling enhancements for our operators. To that end, you will work closely with the Oracle Cloud Infrastructure service team and our Operability Improvement organization to implement and maintain a high level of system hygiene and to identify and address potential issues that affect the experience of our cloud customers.

We are looking for a candidate who is passionate about operations and willing to take ownership of our systems' performance. You should be comfortable working in a fast-paced environment and able to quickly identify and address issues. You must be a strong collaborator who builds solid partnerships across the business to deliver outcomes for our customers. Experience with cloud infrastructure architecture and interaction is a must to be successful in this role.

Primary Responsibilities:

  • Serve as a technical leader for OCI cloud services across the operations teams servicing sovereign realms.
  • Deep dive into complex customer issues and assist customer support, sovereign cloud operators, and customer account managers in resolving them.
  • Decompose operational issues impacting sovereign cloud operators’ efficiency and help facilitate solutions.
  • Collaborate with the Operability Improvement organization to drive tooling and automation to improve change safety and reduce operator toil.
  • Deliver rapid ad hoc solutions (e.g., scripting/coding) that offer near-term operational improvements as a stop-gap measure while long-term solutions are developed.
  • Establish yourself as a technical leader and operational champion for the sovereign cloud operator. Passion and love for operations, as an engineering discipline, are essential to success in this role.

Qualifications:

  • U.S. Citizenship Required.
  • Bachelor’s degree or higher in Computer Science or a related field.
  • 10+ years of SRE/DevOps experience (operations-focused).
  • Experience operating services in one of the major clouds, such as AWS, OCI, or Azure.
  • Knowledge of, or experience in, working with government clients to deliver IT services.
  • Strong knowledge of cloud infrastructure, distributed systems, and network architecture.
  • Proven track record of supporting large, complex, scalable systems/applications in an agile environment.
  • Knowledge of change management, continuous integration, and deployment best practices.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
  • Proficiency in scripting or programming languages like Python, Go, or Bash.
  • Experience with automation and configuration management tools like Terraform, Ansible, or Chef.
  • Familiarity with monitoring and alerting tools such as Prometheus or Grafana.
  • Ability to adapt to a fast-paced, dynamic environment and to manage multiple tasks and priorities effectively.

Career Level - IC5
