Site Reliability Engineering (SRE)

Company: TechDigital Corporation

Location: Orlando

Closing Date: 04/11/2024

Hours: Full Time

Type: Permanent


Job Requirements / Description

Responsibilities: Lead the design, implementation, and management of complex systems architecture that emphasizes reliability, scalability, and performance. Collaborate closely with engineering teams to set and uphold service-level objectives (SLOs) and work on continuous improvements to achieve these goals. Mentor and guide junior members of the SRE/PRE team, fostering their technical growth and professional development. Solve intricate technical challenges across the entire technology stack, from hardware and infrastructure to applications and databases. Develop and implement robust automation solutions for deployment, configuration management, and infrastructure provisioning. Play a pivotal role in capacity planning, performance tuning, and optimizing systems for seamless scalability. Drive the establishment of comprehensive monitoring, alerting, and logging strategies to ensure prompt identification and resolution of issues. Participate in on-call rotations and respond promptly to incidents, taking ownership of resolution and post-incident analysis. Continuously advance best practices and processes, promoting a culture of reliability and operational excellence. Collaborate with stakeholders to ensure alignment between development and operations, contributing to product evolution and enhancements.

Qualifications: Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience). 7+ years of experience in an SRE, PRE, or similar role, with a proven track record of driving system reliability and performance. Proficiency in programming languages such as Python, Go, or similar for automation and tool development. Expertise in cloud platforms (e.g., AWS, GCP, Azure) and container technologies (e.g., Kubernetes, Docker). Deep understanding of networking, operating systems, and distributed systems architecture. Experience with infrastructure as code tools (e.g., Terraform, Ansible) for provisioning and configuration management. Strong grasp of observability tools and practices (e.g., Prometheus, Grafana, ELK stack). Exceptional troubleshooting skills and the ability to diagnose complex technical issues. Outstanding communication skills to collaborate effectively with diverse teams. Proactive mindset and a focus on delivering exceptional customer experiences. Optional: Relevant certifications such as Certified Kubernetes Administrator, AWS DevOps Professional, or similar.

(1.) To ensure customer engagement or satisfaction and referenceability.
(2.) To plan for Program and Delivery Management and ensure that the agreed deliverables in terms of margin are met.
(3.) To anchor process improvement or compliance (human error reporting) and other organizational initiatives (automation, Lean IT implementation).
(4.) To guide, manage, develop, and engage the team, thereby ensuring employee retention.
(5.) To ensure upskilling or creation of resources through internal academies or trainings and growth rotation.
