Company: Darwin Resources
Location: Dallas
Closing Date: 04/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Job Title: Data Engineer Architect (PySpark)
Location: Dallas, TX
Job Type: Full-Time
Department: Data Engineering
We are seeking a talented Data Engineer Architect with expertise in PySpark.
Job Summary:
As a Data Engineer Architect, you will play a pivotal role in designing and implementing scalable data pipelines and architecture that facilitate data ingestion, processing, and analysis. Your expertise in PySpark will be essential in building efficient data solutions that support our analytics and machine learning initiatives.
Key Responsibilities:
- Design and implement robust data architectures using PySpark that support ETL processes, data warehousing, and analytics platforms.
- Build, optimize, and maintain data pipelines for large-scale data processing, ensuring data quality, reliability, and performance.
- Work closely with data scientists, analysts, and other stakeholders to understand data requirements and translate them into technical specifications.
- Identify and implement best practices for data processing and storage, including performance tuning and resource optimization.
- Evaluate and recommend data engineering tools and technologies, keeping abreast of industry trends and advancements.
- Create and maintain comprehensive documentation for data architecture, pipelines, and processes.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in data engineering or related roles, with a strong focus on big data technologies.
- Proven expertise in PySpark and experience with the Apache Spark framework.
- Proficiency in data modeling, ETL processes, and data warehousing concepts.
- Experience with cloud platforms (AWS, Azure, GCP) and their associated data services.
- Strong programming skills in Python; familiarity with other languages such as Scala or Java is a plus.
- Knowledge of SQL and experience with relational and NoSQL databases.
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- Strong communication skills, with the ability to articulate technical concepts to non-technical stakeholders.
Preferred Skills:
- Experience with containerization technologies (Docker, Kubernetes).
- Familiarity with machine learning frameworks and libraries.
- Understanding of data governance, security, and compliance best practices.
If you are passionate about data engineering and are looking to make a significant impact within a forward-thinking organization, we would love to hear from you!