Job Summary:
Our Performance and Reliability teams lead efforts to improve, optimize, and ensure the availability of applications across the Disney organization and business units. They take a consultative approach to Reliability Engineering: supporting, educating, mentoring, and delivering automation that fosters performance and resiliency best practices.
The Senior Site Reliability Engineer is a key member of our Performance and Reliability embedded teams. We focus on planning, scoping, solution architecting, software design, and implementation based on functional and performance capability requirements. We leverage cloud-native, commercial, and open-source tools and frameworks to solve complex business needs. These solutions touch a wide range of functional areas. This role will collaborate with Software Engineers, Product Owners, and others across teams and business areas to influence solutions and platforms across the organization.
Responsibilities:
- Builds solutions to problems of sizable scope and complexity and successfully deploys them to customers.
- Champions Infrastructure as Code (IaC); provides thought leadership; establishes enterprise-level infrastructure patterns.
- Builds and enhances Continuous Integration and Delivery (CI/CD) pipelines.
- Regularly reviews existing systems, policies, and practices, identifying solutions that improve service delivery efficiency and enhance the current environment.
- Mentors less experienced engineers. Collaborates with product engineering leaders to find innovative solutions for moderately complex problems.
- Writes code that establishes and enhances frameworks, typically for software programs and systems that have little or no precedent.
- Reviews code for design, testability, and usability.
- Develops specifications for assigned components, projects or fixes.
- Builds solutions that scale and perform.
- Participates in project proposals, architecture, and design. Contributes to the architecture, design, and implementation of assigned projects and may lead the effort.
- Oversees technical maintenance. Performs troubleshooting for systems that tend to be large and highly complex.
- Performs design, development, documentation, and/or testing.
- Applies experience to resolve a variety of complex issues.
Basic Qualifications:
- 7+ years of experience in the Reliability Engineering field.
- Well-versed in Reliability Engineering principles, patterns, and best practices.
- Ability to understand the business domain from both a technical and product viewpoint.
- 5+ years of experience working with AWS cloud infrastructure and resources.
- 5+ years of experience designing and implementing automation tools.
- 5+ years of experience running and monitoring large-scale distributed systems.
- Proficient in Python and/or another programming language.
- Well-versed in modern infrastructure services and concepts such as containerization, distributed systems, and microservices.
- Well-versed in Software Engineering principles and patterns.
- Experience working with globally distributed teams.
- Experience as a coach and mentor within a business environment.
- Experience working within an Agile environment.
Preferred Qualifications:
- Bachelor's degree in Computer Science or a related field, or equivalent training or work experience.