Job Description
About the Team:
At Data Science Platform (DScP), we are revolutionizing the field of Data Engineering and Data Science. We are a dynamic and innovative team, driven by a passion for self-service data platforms, Data Mesh, and AI/ML technology, and a commitment to delivering exceptional platforms and services to our internal business partners at Visa. Join us in shaping the future of Generative AI for Data Solutions and making a significant impact on the enterprise.
As a member of the team, you will have the opportunity to directly contribute to and deliver solutions that are intuitive, user-centric, and accessible for Visa’s Data Mesh and beyond.
Essential Functions:
- Hands-on Programming
- Build analytics tools that utilize the data automation pipeline to provide actionable insights.
- Build, schedule, and manage DAGs in Apache Airflow (see the sketch after this list).
- Monitor data processing tasks in Airflow, the data pipeline monitoring tool.
- Perform quality control of data assets to ensure that data loaded across different stages of the pipeline is reconciled.
- Analyze and debug data issues, and clearly communicate clarifications and recommendations to stakeholders.
- Work on emerging technologies, building highly scalable, secure, reliable, and fault-tolerant distributed systems.
- Participate in design and code review sessions as appropriate.
- Provide third-level support to business users.
- Present technical solutions, capabilities, considerations, and features in business terms. Effectively communicate status, issues, and risks in a precise and timely manner.
- Independently understand data ecosystem, data, security, data privacy, and retention requirements needed to support business and product features.
- Interpret pipeline designs with minimal guidance.
- Write and implement code, applying coding patterns, guidelines, styles, and best practices, and adhering to all security requirements.
- Conduct unit testing under general guidelines to confirm the functional capability of code with minimal oversight, and participate in user acceptance testing in collaboration with the customer.
- Independently merge data into distributed systems, products, or tools for further processing, and help peers as needed.
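For illustration, here is a minimal sketch of the kind of Airflow DAG and reconciliation check this role involves, assuming Airflow 2.4+; the DAG name, schedule, and reconciliation logic are hypothetical, not part of an actual Visa pipeline:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def reconcile_counts():
    """Compare row counts between pipeline stages; fail the task on mismatch."""
    # Hypothetical values; a real check would query the staging and target stores.
    staging_count = 1_000_000
    target_count = 1_000_000
    if staging_count != target_count:
        raise ValueError(
            f"Reconciliation failed: staging={staging_count}, target={target_count}"
        )


with DAG(
    dag_id="daily_data_reconciliation",  # hypothetical DAG name
    schedule="@daily",                   # Airflow 2.4+; older versions use schedule_interval
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(
        task_id="reconcile_counts",
        python_callable=reconcile_counts,
    )
```

Raising an exception from the task callable marks the task as failed in the Airflow UI, which is what makes a reconciliation check like this visible during pipeline monitoring.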
This is a hybrid position. Hybrid employees alternate between remote and office work. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.
Qualifications
Basic Qualifications
- 2 or more years of work experience with a Bachelor’s Degree or an Advanced Degree (e.g., Master’s, MBA, JD, MD, or PhD)
Preferred Qualifications
- 3 or more years of work experience with a Bachelor’s Degree, or more than 2 years of work experience with an Advanced Degree (e.g., Master’s, MBA, JD, or MD), in Computer Science, Information Technology, Engineering, or Computer Engineering.
- 3+ years of programming/scripting experience in PySpark, Spark, Scala, Unix, and Python (must-haves).
- 2+ years of experience in one or more of the following: Kafka, Spark Streaming, Airflow, Control-M, Presto.
- 2+ years of experience with one or more SQL or NoSQL databases.
- Experience in Hadoop, HDFS, and Hive tuning, bucketing, and partitioning (a PySpark sketch follows this list).
- Java experience is a plus.
- Expert-level experience with Jenkins and GitHub/Bitbucket.
- Experience with data lakes, Data Mesh, and data hub implementation is desired.
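As a rough illustration of the Hive tuning skills listed above, here is a minimal PySpark sketch that writes a partitioned, bucketed Hive table; the source path, table name, and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-bucketing-example")
    .enableHiveSupport()  # required to persist tables to the Hive metastore
    .getOrCreate()
)

events = spark.read.parquet("/data/raw/events")  # hypothetical source path

(
    events.write
    .mode("overwrite")
    .partitionBy("event_date")   # enables partition pruning on the date column
    .bucketBy(64, "user_id")     # pre-hashes rows on the join key
    .sortBy("user_id")
    .saveAsTable("analytics.events_bucketed")  # hypothetical Hive table name
)
```

Partitioning prunes whole directories at read time, while bucketing co-locates rows that share a key so joins and aggregations on that key can avoid a full shuffle.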