Site Reliability Engineering (SRE)

Company: Applab Systems Inc
Location: Princeton
Closing Date: 19/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Replacement position - Full-time / Permanent Hire

Location: O'Fallon, Missouri (Onsite)

Skillset:

Must have skills:

Application/Production L2 Support experience is a must

Unix front end troubleshooting, Oracle SQL & Java

Monitoring Tools - Splunk / Dynatrace

DevOps Tools

Job description:

• Incident Resolution - Review and resolve Incidents arising from:

o Operation Command Center Alerts

o Alerts from Enterprise Monitoring Operations (EM Operations)

o OMNIBUS and Splunk Alerts

• Change Implementation - Deploy application-related artifacts to the production environments within the approved release window

• Report deployment issues and coordinate with the Development Teams to fix them

• Work Orders - Resolve Work Orders in the form of business/functional queries, ad hoc testing, verification, validation, etc., from the Regional product team and customer support teams

• Traffic Routing – Perform traffic routing in support of infrastructure maintenance

• Perform detailed Root Cause Analysis for high-severity Incidents, fix the underlying cause, and take the necessary preventive actions

• Support UAT by the Product team and the Regional customer support team

• Configure application/artifacts and support new customer onboarding to the platform

• Test newly onboarded customers' file processing and report delivery

• Raise new change tickets and arrange for approvals, including CAB approvals

• Review and approve change tickets

• Create Confluence pages, with resolution steps, for newly analyzed Work Orders and new types of Incidents

• Work with customers on ad-hoc queries

• Work with the Development / Testing teams on defect analysis (with Production-simulated data)

• Build automation scripts that reduce the number of Incidents and/or improve the processes followed

• Support customers in completing the Post Incident Report (PIR) when a high-impact Incident affecting customers occurs

• Participate in or initiate War Room calls for issues that impact application availability or customers

• Willingness to work in shifts (morning and afternoon) and provide weekend support
