Site Reliability Engineer - HRNR - HYBRID

Company: Dechen Consulting
Location: Dearborn
Closing Date: 25/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Dechen Consulting Group (DCG) is a rapidly expanding, innovative IT Professional Services and Management Consulting company with a track record of more than twenty-five years in delivering skilled professionals to our clients across diverse sectors.
We are currently seeking a Full Stack / Site Reliability Engineer (Consultant/Expert) for a W2 contract opportunity in Dearborn, MI. This role has the potential to extend over multiple years, with the chance to transition to a direct hire position with our client. We provide healthcare, vacation, relocation assistance, and visa sponsorship/transfer. This is a W2 position, not C2C. THIRD PARTIES NEED NOT APPLY. This role offers excellent prospects for career progression!
Position Description:
We are seeking a talented Full Stack / Site Reliability Engineer to play a key role in developing a comprehensive Internal Developer Platform (IDP) that includes CI/CD pipelines, managed infrastructure, observability, and a developer portal. The primary focus of this role will be on ensuring the stability and scalability of the Internal Developer Platform that hosts the cloud applications that power our customer's connected vehicle experiences. The secondary focus of this role will be to facilitate the enablement of our product teams developing and supporting these cloud applications.
Responsibilities:
  • Bring a strong background in software development and systems administration, along with excellent problem-solving and communication skills
  • Run a production environment by monitoring availability and taking a holistic view of system health
  • Develop, improve, and operate the deployment and orchestration of a complex distributed system
  • Improve the reliability, quality, and time-to-market of our suite of software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Provide primary operational and engineering support for multiple large, distributed software applications
  • Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation
  • Collaborate with development teams to design, build, and operate scalable and resilient software systems
  • Automate build, deployment, monitoring, and incident response processes
  • Perform root cause analysis of production incidents and implement preventive measures
  • Participate in an on-call rotation for incident response and support
  • Ensure compliance with security and regulatory standards
  • Conduct performance analysis and optimization of the system

Skills Required:
  • Understanding of gRPC and RESTful APIs and of microservices platforms
Experience Required:
  • 5-6 years of experience with Go (Golang), Java, J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes (K8s) in the maintenance and development of multi-tier applications.
  • 4-5 years of experience with APM and other monitoring tools such as Grafana Cloud, Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, Datadog, and PagerDuty.
  • Strong experience working with product and development teams to establish error budgets by identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), and effectively driving the use of the budget to ensure maximum domain availability/uptime.
Experience Preferred:
  • Regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization
  • Proactively identify stability risks and work with engineering leadership to establish appropriate mitigation plans
  • Solve complex architecture, design, and business problems by simplifying, optimizing, and removing bottlenecks
  • Architect, design, and develop automation to reduce toil and improve the recoverability, availability, latency, and scalability of supported applications, with an understanding of MTTD (Mean Time to Detection) and MTTR (Mean Time to Resolution)
  • Maintain a knowledge repository that includes standard operating procedures, release checklists, and runbooks for incident recovery
Additional Information:
  • ***POSITION IS HYBRID***
  • Nice to have: Google Cloud Platform Engineer