Company:
NTT DATA, Inc.
Location: Charlotte
Closing Date: 27/10/2024
Salary: £100 - £125 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description
NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.
We are currently seeking a Senior / Lead Hadoop Engineer to join our team in Charlotte , North Carolina (US-NC) , United States (US) .
Job Duties and Responsibilities:
In this role, you will:
- Lead complex technology initiatives towards Hadoop platform stability, automation initiatives.
- Platform Incident and Problem management
- Be part of 24/7 platform support team to perform platform incident management and triaging team.
- Assess the impact of the platform issue and prioritize the triage by partnering with platform tenants, platform development and Hadoop vendor teams.
- Facilitate and perform root cause analysis of platform incidents towards determining permanent resolution involving Tenants and Hadoop vendor.
- Communicate with Stakeholders, leadership team on the triaging progress status.
- To troubleshoot tenant reported failures by deep diving onto client logs to identify the root cause.
- Use Autosys as scheduler to schedule, execute platform routine activities.
- Hadoop Cluster Monitoring
- Monitor Hadoop cluster health and performance.
- Automate new monitoring opportunities leveraging available monitoring and alerting tools such as Prometheus, Thousand Eyes, Splunk etc.
- Perform troubleshooting on the actionable alerts and tuning to ensure optimal cluster performance.
- Hadoop Cluster Configuration and tuning
- Periodically tune and configure Hadoop ecosystem components such as HDFS, YARN, Hive, HBase, Spark, HPOS etc.
- Schedule the Change request and track the approval process for implementation.
- Platform Backup and Recovery
- Ensure Hadoop platform is resilient with data backup and recovery strategies in place.
- Perform emergency or scheduled failover of platform to Disaster recovery site and fall back.
- Upgrades and Patch Management
- Plan and execute upgrades of Hadoop ecosystem components and patches.
- Ensure compatibility and smooth integration of new features and enhancements.
- Work with Operating System administrators and patching automation build team to schedule and execute patching.
- Perform Cluster health validation post patching and communicate with stakeholders.
- Identify automation and process improvement opportunities related to patching to implement it.
- Documentation and Reporting
- Maintain comprehensive documentation of Hadoop cluster configurations & troubleshooting processes, and procedures.
- Develop and Schedule to Generate reports on cluster usage, performance metrics, and capacity utilization.
- Collaboration and Support
- Ability to work both independently and in collaboration with Platform development and Platform tenant teams to troubleshoot and fix Hadoop platform issues, maintain production stability to enterprise-wide applications.
- Collaborate and consult with key technical experts, senior technology team, and Hadoop vendor to resolve complex technical issues and achieve goals.
Basic Qualifications:
- 8+ Years Proven Experience in technology incident and problem management role.
- 4+ Years Proven Experience in using monitoring tools like Prometheus, Splunk, SiteScope & Thousand Eyes etc.
- 6+ Years Proven experience as a Unix Scripting.
- 2+ Years Proven experience in Hadoop administration or in a similar role.
Preferred Qualifications:
- Strong knowledge of Hadoop ecosystem and its components (Hive, HBase, Spark, Hue).
- Proven Experience is using Autosys or similar job scheduler.
- Excellent problem-solving and troubleshooting skills.
- Excellent communication and people skills for collaborating with cross-functional teams.
Share this job
NTT DATA, Inc.
Useful Links