Site Reliability Engineer - Azure

Company:  Motion Recruitment
Location: Dallas
Closing Date: 09/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description

A large financial and tax company located in Dallas, TX is looking to bring on a Site Reliability Engineer to their team on a full-time, direct hire basis. This position is responsible for developing and utilizing tools to monitor key metrics of their data systems, tracking the reliability and recoverability of their systems, and reporting outcomes of regular testing and monitoring activities.
Additionally, this position is responsible for enabling the high availability of their systems by understanding their failover methods, implementing changes to those methods and systems to minimize downtime, and facilitating failover procedure testing. The role is also responsible for recommending better reliability patterns to the developers for their existing and newly created systems.
The ideal candidate will have 2+ years of experience, love processes, and have proven experience with monitoring tools (Datadog, Splunk, New Relic, Prometheus, Grafana, Nagios, etc.). Any experience with or exposure to modern DevOps tooling (Azure, Kubernetes, Terraform) is a plus.
This is a long-term contract position that will run 6+ months with a high likelihood to extend. The role will be onsite 1-2 days per week. The pay rate will be in the range of $60-70/hr.
Required Skills & Experience

  • Utilize existing tools to create telemetry streams from each system that DevOps maintains.
  • Track trends of key metrics to build a repeatable snapshot of the current state of all systems within DevOps and predict failures.
  • Correlate data from disparate systems to determine the underlying causes of issues that may be occurring in seemingly unrelated parts of the enterprise.
  • Monitor existing logging and monitoring systems and reduce unnecessary logging or improperly tuned monitor probes.
  • Develop a suite of dashboards and tools that enable the SRE to track all incoming metrics and surface the most pressing issues
  • Continually improve these dashboards to make their information more useful in real time as well as for after-the-fact analysis
  • Generate "Post Mortem" reports for unplanned outages or system failures
  • Prepare "Scope of Impact" reports for upcoming planned outages or system changes
  • Work with the other members of DevOps and the Infrastructure team to ensure that underlying resources are ready for failover and to help plan for future growth
  • Maintain failover documentation and S.O.P.s.
  • Perform regularly scheduled failover testing in conjunction with the rest of the DevOps team, Infrastructure, and our Business teams.
  • Continually seek to improve our failover procedures.
Desired Skills & Experience
  • A bachelor's degree in Computer Science, Data Science, Computer Information Systems, or a related field is preferred, but commensurate experience is acceptable in lieu of such a degree.
  • A basic understanding of computer programming and experience working with code, databases, and operating systems is needed.
  • At least two years of experience working with data systems
  • Ability to interact with various groups within the business to inform them of the basic details of upcoming changes or to communicate the current state of system failures or outages.
  • Ability to interact with other developers and management to help define, implement, and enforce patterns for proper metric telemetry from systems, proper logging, and resilient failover patterns.
  • Should always be seeking to improve our system telemetry, uptime, and recoverability, and must therefore be aware of the different technologies available in the industry and help determine whether they are a fit for our environment.
The Offer
  • Full-time; pay will range from $60-70/hr depending on experience
You will receive the following benefits:
  • Medical Insurance - Four medical plans to choose from for you and your family
  • Dental & Orthodontia Benefits
  • Vision Benefits
  • Health Savings Account (HSA)
  • Health and Dependent Care Flexible Spending Accounts
  • Voluntary Life Insurance, Long-Term & Short-Term Disability Insurance
  • Hospital Indemnity Insurance
  • 401(k), including match, with pre-tax and post-tax options
  • Paid Sick Time Leave
  • Legal and Identity Protection Plans
  • Pre-tax Commuter Benefit
  • 529 College Saver Plan

Motion Recruitment Partners (MRP) is an Equal Opportunity Employer, including Veterans/Disability/Women. All applicants must be currently authorized to work on a full-time basis in the country for which they are applying, and no sponsorship is currently available. Employment is subject to the successful completion of a pre-employment screening. Accommodation will be provided in all parts of the hiring process as required under MRP's Employment Accommodation policy. Applicants need to make their needs known in advance.
Posted by: Jay Aguiar
Specialization: Site Reliability Engineer