Head of Data Engineering (Clinical)
Deep 6 AI
What You'll Do
- Lead the continued development and enhancement of our clinical data ingestion and comprehension pipeline.
- Drive the adoption of core engineering principles, including observability, scalability, and end-to-end control.
- Support advanced AI/ML research by exposing raw, canonicalized, and comprehended data in analytics platforms.
- Establish and maintain key performance metrics to track the effectiveness of data engineering initiatives.
- Foster collaboration with internal and external stakeholders to gather feedback and drive continuous improvement.
- Enhance Deep 6 AI's reputation as a leader in AI-driven clinical trial acceleration through thought leadership and industry recognition.
About You
- Strong player-coach mentality, with an ability to balance the hands-on needs of leading two groups: data platform (data engineering) and search/enrichment (ML engineering).
- Proven track record of leading successful large-scale clinical data initiatives within a product-focused environment.
- Strong background in applied data engineering and data pipelines, with hands-on experience developing and deploying production-ready models.
- Understanding of the software development life cycle (SDLC) and data product development.
- Experience working with healthcare data, especially HL7 and FHIR.
- Deep understanding of streaming data ingestion and ETL processes.
- Conceptual understanding of ML techniques, particularly NLP (e.g., NER, BERT).
- Conceptual understanding of AI techniques such as large language models (LLMs), small language models (SLMs), and other state-of-the-art approaches.
- Experience with database technologies, especially Elasticsearch, PostgreSQL, Amazon Aurora, and DynamoDB.
- Demonstrated passion for staying up to date with the latest data engineering and pipeline trends, along with a track record of driving innovation.
Preferred Qualifications
- Cloud Services: Experience with cloud-based data processing and storage services (AWS).
- Infrastructure as Code: Proficiency with infrastructure as code tools (CDK).
Technologies We Use
While specific expertise in our tech stack is beneficial, we value adaptability and a willingness to learn. Our current stack includes:
- AWS Cloud Services (e.g., EC2, ECS, RDS, Aurora, DynamoDB, Lambda)
- Java (Kotlin), Python, TypeScript
- Kubernetes, Docker
- FHIR Servers (e.g., HAPI FHIR, Health Samurai Aidbox)
- Elasticsearch and Elastic Cloud
- CI/CD: GitHub Actions
- Monitoring: OpenTelemetry, AWS X-Ray, Amazon CloudWatch, Datadog, Pendo