Data Engineer

Company:  Software Technology Inc.
Location: Dallas
Closing Date: 23/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description

We encourage candidates who are able to work on a W2 basis to apply for this position.


Overview: The team supports data movement from several key processes/workflows: cost processing, S3 data, network traffic (VPC flow logs), and CloudTrail data (API calls captured as records).

• Their pipelines collect this data, enrich it with human information, and load it into a unified data store in ClickHouse for reporting and visualization purposes.

• Current project needs are split between net-new pipeline development and optimization and maintenance of existing pipelines.

• Pipelines are built in Scala and PySpark; the team is moving some over to Lambdas (Python-backed) where they can. Net-new work might involve Lambda development.

• Spark clusters handle all the different types of data, including various structured and unstructured data sets.

• Data pipelines run on EMR infrastructure; candidates should understand EMR from the perspective of data distribution, scalability, and performance.

• The majority of pipelines are real-time streaming; cost processing is predominantly batch.

• Candidates should have strong experience not only in building pipelines but also in suggesting performance enhancements for them.

• All code is integrated into their CI/CD pipeline, orchestrated by Jenkins.

• Monitoring is through CloudWatch, with some Ganglia (nice to have).

Must Have:

• Scala

• PySpark

• Data pipeline engineering and optimization

• AWS (specifically Lambda and EMR)

• SQL

Nice to Have:

• ClickHouse database experience

• Ganglia
