Data Engineer

Day Zero Diagnostics (DZD) is an exciting startup revolutionizing infectious disease diagnostics by leveraging cutting-edge sample prep technologies, whole genome sequencing, and machine learning. We are building the next generation of in vitro diagnostics (IVDs) that can perform comprehensive bacterial species ID and antimicrobial resistance and susceptibility (AMR/S) profiling within 8 hours of sample receipt, without the need for culture.

Our first application is sepsis. Sepsis is responsible for about a third of hospital deaths and costs US hospitals about $24B annually. The current culture-based approach to pathogen ID and antimicrobial susceptibility testing (AST) takes 2-5 days and has a 40% failure rate, so patients with sepsis are treated with broad-spectrum antibiotics. This leads to significant toxicity, higher rates of organ injury, and increased risk of C. difficile infection, and it contributes to the growth of antibiotic resistance globally.

By providing an accurate and comprehensive diagnosis within the first cycle of treatment, we enable patients to receive appropriate antibiotic therapy for systemic infections such as sepsis, reducing hospital treatment durations and costs while improving patient outcomes.

At DZD, we are passionate about our mission of modernizing infectious disease diagnosis and treatment. Employees gain experience in a multidisciplinary, fast-paced startup and have ample opportunities to acquire new skills, engage with emerging technologies, work closely with our accomplished team, and communicate their results, all while working in a supportive and energetic environment.

SUMMARY

The Data Engineer will join our Data Science team, which is responsible for training machine learning models to predict antimicrobial resistance from genomic sequencing data. The Data Engineer will primarily work on MicrohmDB®, our database of microbial sequencing data and resistance profiles that powers our ML models. The team will rely on this person to extend and modernize the database and the ETL pipelines that use bioinformatics tools and custom algorithms to transform raw data for R&D. This person will also work closely with our computational biology team to develop genomic pipelines and with our software engineering team to deploy code to cloud infrastructure. Being able to work both collaboratively and independently is key.

PRIMARY RESPONSIBILITIES

Develop, improve, and maintain code for key data processing pipelines

Build a robust and scalable data ecosystem for one of our company’s most important data assets

Drive best practices around database management, data pipeline design, and workflow automation

Work closely with the Data Science team and with outside collaborators

Maintain close communication with the team regarding progress

QUALIFICATIONS

Bachelor’s degree in computer science, data science, computational biology, or a related quantitative field and 1+ years of experience in software and/or data engineering working in a cloud environment, OR a Master’s degree with no prior experience required

Fluency in Python, SQL, and Linux

Strong database design and management fundamentals

Dedication to coding best practices

Familiarity with biological sequencing data analysis is helpful but not required

Enthusiasm for learning about and solving problems in a new field

Highly motivated and independent, with the ability to thrive in a dynamic team environment

Strong oral and written communication skills