Company: Divisions Maintenance Group
Location: Cincinnati
Closing Date: 20/10/2024
Salary: £100 - £125 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description
As an Infra Reliability Engineer 2, you will play a key role in ensuring the availability and performance of our infrastructure and applications. You will also be responsible for incident response and triage, ensuring a swift and effective response to security incidents and operational disruptions.
Key Responsibilities:
- Kubernetes and Container Orchestration: Maintain and optimize our Kubernetes-based infrastructure and Docker containers to ensure high availability and scalability.
- Cloud Infrastructure: Work with AWS services to design, implement, and manage scalable and resilient cloud infrastructure.
- CI/CD Pipelines: Manage and enhance CI/CD pipelines using tools like ArgoCD, Argo Workflows, and Helm for efficient software delivery.
- Scripting and Automation: Develop and maintain automation scripts using Python, Shell, or other scripting languages to streamline operational tasks.
- Incident Response and Triage: Conduct initial triage of security incidents and operational disruptions to assess severity, gather information, and prioritize actions. Lead incident response efforts, including detection, containment, eradication, recovery, and post-incident analysis, to ensure the security and integrity of our systems.
- Configuration Management: Utilize configuration management tools such as Ansible and Chef to ensure consistent and reliable system configurations.
- Observability: Implement and maintain observability solutions using Prometheus, Grafana, Datadog, or similar tools to monitor and troubleshoot system performance.
- Collaboration: Collaborate closely with development teams to identify and resolve issues, improve application performance, and enhance system reliability.
- Documentation: Create and maintain comprehensive documentation for processes, configurations, and best practices.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
- Minimum 5 years of experience in a Site Reliability Engineering or DevOps role.
- Strong experience with Kubernetes and Docker containerization.
- Proficiency in AWS services and cloud infrastructure management.
- Hands-on experience with CI/CD platforms like ArgoCD, Argo Workflows, Spinnaker, and Helm.
- Proficiency in scripting languages such as Python and Shell.
- Knowledge of configuration management tools, including Ansible and Chef.
- Familiarity with observability platforms like Prometheus, Grafana, and Datadog.
- Excellent troubleshooting skills and the ability to work collaboratively with cross-functional teams.
- Experience in incident response and triage, with the ability to assess, contain, and remediate security incidents.
- Strong communication skills and the ability to document processes effectively.
Good To Have:
- Development experience in C# or Java.
- Working knowledge of PostgreSQL, MongoDB, or MySQL.
- Familiarity with SecOps tools such as a WAF (Web Application Firewall), Trusted Advisor, or similar.
- Experience with an Internal Developer Platform such as Backstage.
- Experience in JavaScript.