Zephyr is building an innovative AI platform to change the way we treat cancer, diabetes, and other chronic diseases. By aggregating massive data sets and harnessing advanced technologies and AI to increase our understanding of biology, Zephyr discovers insights that will transform how new therapies are developed and how we treat patients. We will use that knowledge to devise interventions that enable people to live longer and healthier lives. Working in close partnership with industry-leading institutions across academia, biopharma, and care delivery, Zephyr is advancing our understanding of how to characterize and treat chronic diseases. With an initial focus on cancer and diabetes, Zephyr is working to revolutionize drug development, reform clinical trials, and change healthcare to impact patient lives. Zephyr is based in Tysons Corner, VA, and currently operates as a remote-first organization.




We seek a Machine Learning Research Scientist to join our growing multidisciplinary team and lead projects to collect, standardize, and analyze electronic health record (EHR) data and to implement state-of-the-art natural language processing (NLP) algorithms, including large language models (LLMs). Working collaboratively with the research and engineering teams, the ideal candidate will apply deep technical knowledge of machine learning and NLP to design, develop, and implement novel analyses that improve patient outcomes and address critical issues in the care of patients with cancer and diabetes. This role requires communicating computational results and outcomes to scientists in both quantitative and non-quantitative disciplines and to external collaborators. Join us!


We are a team of scientists and engineers dedicated to creating transformative computational tools for biology and focused on delivering innovative precision medicine solutions to patients with chronic diseases. Our Data Science team builds predictive models to better understand diabetes and cancer and to drive product development based on those insights. Our team is flat and self-organizing. We move quickly, but we do good work. We are excited to share with you our passion for applying innovative AI techniques to save patients' lives, and we look forward to your perspective helping to shape our team's goals.


  • Conduct scholarly research and create new approaches for modeling of electronic health record data.

  • Develop and implement algorithms and models to analyze large-scale healthcare data sets, including unstructured notes, electronic health records, claims data, and other patient-level data.

  • Perform applied research implementing state-of-the-art NLP solutions for tasks such as embedding-based information retrieval, named entity recognition (NER), relation extraction, topic analysis, and summarization.

  • Work closely with engineers and data scientists to translate models and algorithms into production applications.

  • Communicate clearly and effectively in verbal, visual, and written form to stakeholders with varying levels of technical knowledge.

  • Write analytical and technical documentation and supporting research materials.

Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.


  • A PhD, completed or in progress, in machine learning, AI, computer science, statistics, applied mathematics, data science, or a related field, or equivalent practical experience.

  • Proven track record of achieving significant results as demonstrated by publications at leading workshops, journals or conferences.

  • Strong background in natural language processing (NLP), including various architectures (autoencoders, Transformers, etc.).

  • Advanced expertise in Python and any of the common ML frameworks (PyTorch, TensorFlow, JAX).

  • Familiarity with cloud-based infrastructure (preferably AWS, but Azure or GCP is also acceptable).

  • Strong analytical and problem-solving skills to evaluate information and data, make decisions, anticipate obstacles, and develop plans to resolve them.

  • Experience communicating research to peer and public audiences.

You will be a step ahead if you have:

  • Experience in manipulating and utilizing large, longitudinal data sources, such as Electronic Health Records, Medical Claims, and other patient-level data sets.

  • Experience with delivering machine learning models to production (including logging, monitoring metrics, etc.).

  • Familiarity with the pharmaceutical or biotech industry and with meeting regulatory requirements.



This is a full-time, exempt position reporting to the EVP of Science and Technology. This position is remote and requires the ability to work cross-functionally within the organization.


We offer competitive compensation as well as a comprehensive benefits package including:


  • 100% company-paid medical, dental, and vision insurance

  • Generous paid time off

  • Paid holidays

  • 401(k) program 

  • Voluntary life and disability plans

  • Employee assistance program (EAP)

  • Opportunities for advancement



We are an equal opportunity employer


Zephyr AI provides equal employment opportunities (EEO) to all applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. Zephyr AI complies with applicable state and local laws governing non-discrimination in employment in every location in which the company operates. This policy applies to all terms and conditions of employment, including, but not limited to, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.