Sr. Systems Operations Engineer

Company: The Trade Desk
Location: New York
Closing Date: 07/11/2024
Salary: £150 - £200 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

Who We Are

At The Trade Desk, we recognize that a seamless customer experience is driven by operational excellence. To keep improving the reliability of our platform, we are establishing a global Systems Operations team. This team's core mission will be to vigilantly monitor The Trade Desk platform services, refine our incident response methodologies, and guarantee a robust and highly available customer experience. If you're passionate about system reliability, process improvement, and making an essential customer impact, we invite you to play a critical role in this next evolution of our on-call experience.

What You'll Do

  • Act as a technical expert and advisor to more junior Associate Systems Operations Engineers
  • At an escalated tier, monitor the state and stability of platform services via telemetry and alerts; triage issues and escalate to engineering teams as needed
    • Work collaboratively with development teams to facilitate issue remediation
    • Manage remediation task workflow
  • Proactively update and improve Systems Operations documentation and runbooks
  • Increase the effectiveness of the incident response process by defining and measuring relevant metrics
  • Provide periodic weekend coverage as required

Who We Are Looking For

  • Bachelor’s Degree from a four-year university or relevant substitute experience
  • 6+ years of relevant work experience in Technical and/or Application Support, with strong knowledge of services support and troubleshooting

The Systems Operations Engineer will either possess or be excited to learn the following skills:

Technical Proficiency:

  • Understanding of large-scale distributed system architectures (e.g., databases, web services, application services).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, Nagios).
  • Ability to configure and fine-tune alerts.
  • Proficiency in, or ability to learn, programming languages including C# and SQL.

Incident Management and Troubleshooting:

  • Ability to prioritize and manage incidents based on severity, with a focus on customer impact.
  • Ability to remain calm under pressure and quickly diagnose issues.
  • Understanding of system logs, metrics, and telemetry.

Communication Skills:

  • Ability to communicate effectively with stakeholders during an incident.
  • Clear and concise documentation skills.
  • Ability to maintain and update troubleshooting guides (TSGs) and operational documentation.
  • Ability to explain complex technical issues and platform outages to non-technical stakeholders.

Automation & Scripting:

  • Ability to automate repetitive tasks.
  • Proficiency in scripting languages (e.g., Python, Bash) is a plus.