Senior Data Engineer - Acquisition & Infrastructure

Company:  EON Systems, Inc.
Location: Little Ferry
Closing Date: 03/11/2024
Salary: £100 - £125 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

This role

As a data engineer, you will be responsible for the acquisition, processing, and handling of large amounts of complex neuroscientific data. You will build and maintain an end-to-end, cloud-based data pipeline, from data capture through to delivering processed data to our ML models. You will collaborate closely with the human/animal brain data acquisition and AI engineering teams, building the interface between data acquisition and our machine learning models.

Representative projects

  • Download neuro datasets from 10+ repositories, format and preprocess them, and store them in an infrastructure accessible for training pipelines.
  • Build creative validation and quality assurance steps into this pipeline that allow SMEs to judge dataset quality, and later automate this process. Visualize key metrics in dashboards. One potential example: run our smallest neuro foundation model on each dataset, rank the datasets by reconstruction loss, and flag any dataset that was used to train the model and will therefore have an artificially low loss.
  • Work with ML engineers to build an API to feed (tokenized) brain data to training runs.
  • Download or scrape metadata from the above repositories, extract additional metadata from fields like Description, and impute missing metadata via LLMs.
  • Proactively work to determine what other projects would provide value to the ML team and the company.
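
For illustration only, the validation example above (rank datasets by reconstruction loss and flag those the model was trained on) might be sketched as follows. The names `DatasetScore` and `rank_by_loss`, the input shapes, and the choice to sort lowest loss first are assumptions made for this sketch, not a description of our actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class DatasetScore:
    name: str
    reconstruction_loss: float
    used_in_training: bool  # True means the low loss may be artificial


def rank_by_loss(losses, training_set_names):
    """Rank datasets by mean reconstruction loss, lowest first.

    losses: dict mapping dataset name -> mean reconstruction loss
            (hypothetical input; how the loss is computed is out of scope).
    training_set_names: set of dataset names the model was trained on.
    """
    scores = [
        DatasetScore(name, loss, name in training_set_names)
        for name, loss in losses.items()
    ]
    return sorted(scores, key=lambda s: s.reconstruction_loss)


if __name__ == "__main__":
    ranked = rank_by_loss({"a": 0.9, "b": 0.1, "c": 0.4}, {"b"})
    for s in ranked:
        flag = " (seen in training)" if s.used_in_training else ""
        print(f"{s.name}: {s.reconstruction_loss:.2f}{flag}")
```

Keeping the training flag next to the loss lets a reviewer or dashboard discount low losses that only reflect memorization.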

Responsibilities

  • Manage the acquisition of petabytes of online datasets of different types and modalities.
  • Assess and process unstructured and noisy datasets that require intensive cleanup and organization.
  • Build a cloud-based data pipeline to streamline massive amounts of data for our ML model applications.
  • Host and maintain our large cloud-based datasets, ensuring scalability, accessibility, and end-to-end functionality at all levels.
  • Collaborate closely with our Machine Learning (ML) team to facilitate and optimize data pipeline projects.
  • Document the data pipeline with clear and comprehensive guides, facilitating easy access and understanding for the ML team and other stakeholders.

Requirements

  • Strong demonstrated experience in handling and preprocessing messy, unstructured datasets, ideally within scientific research environments.
  • Demonstrated experience in building software around cloud-based data pipeline infrastructures.
  • Demonstrated experience in building large data infrastructure for ML applications.
  • Proficiency in cloud computing platforms: AWS at a minimum, and ideally others.
  • Good understanding of machine learning concepts and how data preprocessing affects ML model performance.
  • Strong background and experience in implementing data validation and cleaning techniques.
  • Experience in managing complex projects with a focus on timely delivery of technical solutions.
  • Excellent communication skills for effective collaboration with technical and non-technical teams.

Nice-to-haves (we’ll prioritize your application if you have the skills below)

  • Experience with the following: Kafka, Hadoop, EMR, GCP, Glue, Spark, CloudStack, HDFS, Databricks, SageMaker, etc.
  • Experience with database management, ETL processes, and SQL/NoSQL databases.
  • Thoughtfulness about policy and epistemics related to the rapidly changing future of technology.

This role may not be the best fit for you if…

  • You have predominantly developed data pipelines for business contexts, where data typically needs less serial and experimental processing than complex scientific datasets do.
  • Your experience does not include hands-on work with design choices around dataset acquisition.
  • You lack familiarity with fundamental scientific computing techniques, for instance, normalizing by z-score or resampling.
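
As a rough indication of the level meant by the last point, z-score normalization and resampling could look like the sketch below. It uses only the Python standard library. The function names, and the use of linear interpolation for resampling, are choices made for this illustration rather than a statement of our tooling.

```python
from statistics import fmean, pstdev


def zscore(signal):
    """Normalize to zero mean and unit (population) standard deviation."""
    mu = fmean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        # A constant signal has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]


def resample_linear(signal, src_hz, dst_hz):
    """Resample by linear interpolation between neighbouring samples.

    Note: no anti-aliasing filter is applied, so downsampling real
    recordings this way can alias; this is only a sketch.
    """
    if len(signal) < 2:
        return list(signal)
    duration = (len(signal) - 1) / src_hz   # seconds covered by the samples
    n_out = int(duration * dst_hz) + 1
    out = []
    for i in range(n_out):
        pos = i / dst_hz * src_hz           # position in source-sample units
        lo = min(int(pos), len(signal) - 2)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


if __name__ == "__main__":
    print(zscore([1.0, 2.0, 3.0]))
    print(resample_linear([0.0, 1.0, 2.0, 3.0], src_hz=2, dst_hz=4))
```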

Salary

Competitive salaries, including equity, apply.
