Company: JobRialto
Location: West Lake Hills
Closing Date: 03/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Job Summary
The Site Reliability Engineer (SRE) leads production support and is responsible for the readiness, availability, and resiliency of critical applications, infrastructure, and batch jobs. This role focuses on cloud computing (AWS and Azure), enterprise tools (Jenkins, Docker, Kubernetes, etc.), and implementing practices in resiliency engineering, automation, observability, and chaos testing. The SRE will work within a centralized support services team, managing complex distributed systems and enhancing reliability across all levels of the infrastructure.
Key Responsibilities
• Oversee production support, ensuring readiness, availability, and resiliency of critical systems and applications.
• Lead implementation of practices related to resiliency engineering, automation, observability, and chaos testing.
• Collaborate with cross-functional teams to address issues related to hardware, software, network, applications, and cloud service providers.
• Provide cloud and platform engineering support for production environments, participating in an on-call rotation.
• Solve application issues in Unix/Linux environments using J2EE, WebSphere, Tomcat, and SQL.
• Utilize observability tools like Prometheus, Grafana, ELK, Datadog, and Splunk to monitor applications and infrastructure.
• Ensure scalability and resiliency of complex distributed systems.
• Implement infrastructure as code using tools like Terraform, Chef, and Ansible.
• Automate day-to-day activities using Python and Ansible.
• Perform chaos testing to build system resilience.
• Manage and monitor SSL certificates and handle security and patching for on-prem servers.
Required Qualifications
• Bachelor's degree in a technology-related field (e.g., Engineering, Computer Science).
• 5-8+ years of hands-on experience deploying or supporting multi-tiered distributed systems.
• Hands-on experience with public cloud environments (AWS and Azure).
• Experience with container orchestration (Kubernetes).
• Experience with batch processing tools (Control-M, Informatica).
• Strong understanding of cloud computing and DevOps concepts, including CI/CD pipelines.
• Experience with monitoring and observability tools (Prometheus, Datadog, Grafana, Splunk).
• Experience in Unix/Linux environments with J2EE, WebSphere, Tomcat, and SQL.
• Familiarity with ITIL processes like incident and change management.
• Hands-on experience with infrastructure as code tools (Terraform, Ansible, Chef).
• Proven experience in chaos testing and building resilient systems.
• Strong skills in scripting languages (Python, Bash, Korn shell).
Preferred Qualifications
• Experience in cloud development and migration.
• Proficiency in managing large datasets using query languages and visualization tools.
• Strong understanding of API testing tools (SoapUI, Postman).
• Experience with Agile methodology and handling on-prem server fleets.
• Experience with web development (Django, JavaScript).
Certifications
• AWS or Azure cloud certifications (preferred but not required).
Education: Bachelor's Degree