Site Reliability Engineer

Company:  Themesoft Inc.
Location: Dallas
Closing Date: 07/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description

SRE Engineer / Dallas, TX / FTE / Hybrid Role


Job Description:


The Site Reliability Engineer is a fundamental piece of the Site Reliability Engineering team. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment.

The role

  • Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.
  • Automate tasks and parts of the system that are performed manually or would otherwise benefit from automation.
  • Troubleshoot OS, networking, and database issues in cloud-based and on-premises environments; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
  • Coordinate with product owners and business representatives to define Service Level Objectives and error budgets for key functionalities of the projects.
  • Participate in design reviews of software/components with build teams to ensure that they are built right.
  • Review products prior to production deployments to validate compliance with Service Level Objectives.
  • Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability.
  • Work closely with software engineers and QA to ensure the system meets non-functional requirements such as performance, security, and availability.
  • Document system knowledge as acquired over time, create runbooks and ensure critical system information is readily available to those who need it.
  • Maintain and monitor deployment of the servers, Docker containers, databases, and general backend infrastructure.
  • Participate in production feedback sessions and problem management calls to identify opportunities for product improvement.



What you’ll bring

  • Bachelor’s degree in Computer Science or a related field, or an equivalent combination of education and experience
  • 5+ years of experience in a full-stack application support or SRE role
  • Experience in JavaScript, TypeScript, and web development technologies
  • Proficient in scripting languages such as PowerShell and/or Python
  • Experience troubleshooting complex incidents in applications built on the AWS stack
  • Experience in conducting design reviews of software components and leading performance, capacity and chaos experiments.
  • Extensive experience with observability platforms (Datadog) is required. Experience with built-in browser-side diagnostic tools is expected.
  • Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), is a plus.
  • Hands-on experience with AWS public cloud is a must; project implementation experience on public cloud is a plus.
  • Ability and willingness to adapt to new application stacks and new technology concepts as the business evolves over time
  • Excellent communication skills, both verbal and written
  • Ability to collaborate with local and remote teams in different time zones
  • Ability to present/lead technical discussions with product, cloud COE, security and other support teams



Regards,

Purnima Pobbathy

Technical Recruiter

Themesoft Inc.
