Site Reliability Engineer

Company:  Compunnel Inc.
Location: Roanoke
Closing Date: 19/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description

Location: Westlake, TX or Merrimack, NH

Skills:

  • Datadog
  • Kubernetes
  • AWS (EKS) and Azure (AKS); AWS preferred
  • On-call experience running incidents
  • Development background: Ansible, Python, Node.js, JavaScript, Jenkins, Groovy


The Expertise and Skills We’re Looking For

  • Bachelor’s degree or higher in a technology-related field (e.g. Engineering, Computer Science, etc.) required
  • 5-8+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale
  • Hands-on experience with Public Cloud environments, preferably AWS and Azure. Certifications a plus
  • Hands-on experience with container orchestration, preferably with Kubernetes
  • Working experience with batch processing using tools such as Control-M and Informatica
  • Ability to solve application issues on Unix/Linux with J2EE, WebSphere, Tomcat and SQL
  • Exposure to basic OS-level scripting languages such as Korn shell, Bash, or JScript
  • Familiarity with ITIL processes such as Incident Management and Change/Problem Management
  • Balancing delivery with ad hoc workloads and re-evaluating priorities
  • Solid understanding of Cloud Computing and DevOps concepts including CI/CD pipelines
  • Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
  • Use Datadog, Catchpoint, Splunk, and Grafana for application observability and monitoring of applications and infrastructure
  • Experience with instrumentation, plus systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale
  • Proven experience in maintaining the scalability and resiliency of complex environments
  • Proven experience in implementing advanced observability practices and techniques at scale
  • Provide enterprise Cloud and Platform Engineering support for production environments, and participate in an on-call rotation to provide solutions
  • Experience in cloud development (AWS and Azure) and migration; experience building and operating highly resilient platforms in public cloud environments
  • Ability to triage, complete root cause analysis, and be decisive under pressure
  • Experience managing and interpreting large datasets using query languages and visualization tools
  • Strong communication skills, with the ability to reach both technical and non-technical audiences
  • Ability to learn new software, methods, and practices and bring them to our developers
  • Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner and build and maintain effective relationships
  • Proven experience performing chaos testing to build confidence in the system's capability to withstand turbulent conditions in production
  • Strong understanding of API testing tools (SoapUI, Postman)
  • Understanding of Agile Methodology
  • Experience managing systems using infrastructure-as-code tools (IAM, ARM, Terraform, Chef)
  • Manage a large fleet of on-prem servers (including security and patching oversight)
  • Manage hundreds of SSL certificates for all applications in scope
  • Use Ansible and Python to automate day-to-day activities; web development with Django and JavaScript
