Data Engineer

Company: Software Technology Inc.

Location: Dallas

Closing Date: 23/10/2024

Hours: Full Time

Type: Permanent

Job Requirements / Description

We encourage candidates who are able to work on a W2 basis to apply for this position.

Overview: Their team supports data movement for several key processes and workflows: cost processing, S3 data, network traffic (VPC flow logs), and CloudTrail data (API calls captured as records).

• Their pipelines collect this data, enrich it with human information, and load it into a unified data store in ClickHouse for reporting and visualization purposes.

• Current project needs are split between net-new pipeline development and the optimization and maintenance of existing pipelines.

• Pipelines are built in Scala and PySpark, and some are being moved to Lambdas (Python-backed) where possible. Net-new work might involve Lambda development.

• The Spark clusters handle all of the different data types, including various structured and unstructured data sets.

• Data pipelines run on EMR infrastructure; candidates should understand EMR from the perspective of data distribution, scalability, and performance.

• The majority of pipelines are real-time streaming; cost processing is predominantly batch.

• Candidates should have strong experience not only in building pipelines but also in suggesting performance enhancements for them.

• All code is integrated into their CI/CD pipeline, orchestrated by Jenkins.

• Monitoring is through CloudWatch, with some Ganglia (nice to have).

Must Have:

• Scala

• PySpark

• Data pipeline engineering and optimization

• AWS (specifically Lambdas and EMR)

• SQL

Nice to Have:

• ClickHouse database experience

• Ganglia
