Principal II, Observability Engineer (SRE)

Company: Herbalife

Location: Torrance

Closing Date: 07/11/2024

Hours: Full Time

Type: Permanent

Job Requirements / Description

Category: Global Technology Services

Position Type: Regular Full-Time

External ID: 14765

Location: Torrance, CA, United States

Date Posted: Oct 1, 2024

Hiring Range: 157,600.00 to 173,100.00 USD Annually

Overview

THE ROLE:

The Observability Principal II Engineer will work a hybrid schedule, with the requirement to be onsite at our Torrance location as needed. This role is responsible for leading the design, implementation, and optimization of observability solutions across the organization, ensuring end-to-end transparency into application performance, system health, and user experience. The Observability Principal II Engineer will focus on monitoring, alerting, and logging frameworks, ensuring that teams have the tools and data they need to identify and resolve issues quickly and efficiently.

This role will drive the adoption of industry-leading observability platforms like Dynatrace, Splunk, and Prometheus, providing real-time insights into system behavior across hybrid and multi-cloud environments. The Observability Principal II Engineer will work closely with development, operations, and security teams to establish monitoring strategies that optimize performance, reliability, and customer experience.

DETAILED RESPONSIBILITIES/DUTIES:

Design and implement observability frameworks to provide full transparency into the performance and reliability of systems, applications, and infrastructure.
Lead the deployment and optimization of monitoring and observability tools, including Dynatrace, Splunk, Prometheus, Grafana, and other relevant technologies.
Collaborate with development and operations teams to build comprehensive monitoring and alerting systems that ensure real-time detection of issues.
Develop and maintain dashboards and reporting systems to monitor system health, performance metrics, and key indicators.
Ensure integration of observability solutions with CI/CD pipelines to provide feedback and insights throughout the deployment process.
Manage and refine alerting strategies to minimize false positives while ensuring rapid response to real incidents.
Perform root cause analysis using observability data to improve system resilience and prevent recurring issues.
Continuously evaluate and improve logging, tracing, and metric collection methodologies to ensure accurate data for diagnostics and optimization.
Drive the implementation of SLOs (Service Level Objectives) and SLIs (Service Level Indicators) to ensure the availability and performance of critical systems.
Provide guidance and mentorship to engineering teams on standard methodologies for observability and monitoring.
Collaborate with security teams to ensure that observability data meets compliance and security standards, enabling fast detection of anomalies or threats.

Qualifications

SKILLS AND BACKGROUND REQUIRED TO BE SUCCESSFUL:

Validated experience in designing and implementing observability solutions using tools like Dynatrace, Splunk, Prometheus, Grafana, or ELK Stack.
Deep understanding of monitoring, logging, and tracing practices in hybrid and multi-cloud environments (Azure, AWS, GCP).
Expertise in creating and optimizing dashboards, alerts, and reports for monitoring performance and system health.
Experience with log management and analysis tools, such as Splunk or ElasticSearch, for real-time data analysis and troubleshooting.
Proven understanding of distributed tracing methodologies (e.g., OpenTelemetry, Jaeger, Zipkin) to diagnose performance bottlenecks and improve system reliability.
Knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible to automate the deployment of monitoring and observability solutions.
Proficiency in scripting and automation using Python, Bash, or Go for monitoring and alerting infrastructure.
Strong understanding of SLOs and SLIs to ensure reliability and performance objectives are met.
Ability to work in Agile and DevOps environments, ensuring seamless integration of observability into development workflows.

Experience:

8+ years of experience in IT, with a focus on monitoring, observability, or performance engineering.
Extensive experience with observability tools like Dynatrace and Splunk, including setup, customization, and optimization for large-scale environments.
Proficiency in building and maintaining complex dashboards, alerts, and automated monitoring systems in cloud-native and hybrid environments.
Hands-on experience with logging, metrics, and tracing frameworks, ensuring the end-to-end observability of systems.
Strong understanding of cloud infrastructure, including AWS, Azure, and GCP, and how to implement observability across cloud platforms.
Experience with monitoring containerized applications using tools like Prometheus and Kubernetes, ensuring performance at scale.
Proven ability to perform root cause analysis and performance tuning using observability data.

Certificates / Training Preferred:

Certifications in relevant observability tools such as Dynatrace Certified Associate, Splunk Core Certified Power User, or Prometheus certifications.
Cloud certifications like AWS Certified Solutions Architect, Azure Solutions Architect Expert, or Google Cloud Professional Cloud Architect.

Education:

Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.

At Herbalife, we value doing what’s right. We are proud to be an equal opportunity employer, making decisions without regard to race, color, religion, sex, sexual orientation, gender identity, marital status, national origin, age, veteran status, disability, or any other protected characteristic. We value diversity, strive for inclusivity, and believe the differences among our teammates are a key contributor to Herbalife’s ongoing success.

Herbalife offers a variety of benefits to eligible employees in the U.S. (limited to the 50 States and the District of Columbia), which include Group Health Programs, other Voluntary Benefit Programs, and Paid Time Off. Group Health Programs include Medical, Dental, Vision, Health Savings Account (HSA), Flexible Spending Accounts (FSA), Basic Life/AD&D, Short-Term and Long-Term Disability, and an Employee Assistance Program (EAP). Other Voluntary Benefit Programs include a 401(k) plan, Wellness Incentive Program, Employee Stock Purchase Plan (ESPP), Supplemental Life/Critical Illness/Hospitalization/Accident Insurance, and Pet Insurance. Paid time off includes Company-observed U.S. Holidays, Floating Holidays, Vacation, Sick Time, a Volunteer Program, Paid Maternity and Paternity Leave, Bereavement Leave, Personal Leave, and time off for voting.

If reasonable accommodation is needed to participate in the job application or interview process, to perform essential job functions, and/or to receive other benefits and privileges of employment, please email your request.
