Research Manager, Data
- Acquire and deliver massive, high-quality datasets for our large training runs.
- Work with internal and external data partners to develop and implement best practices and data pipelines for ingesting, annotating, and incorporating high-quality datasets into model training and evaluation.
- Improve our data infrastructure (e.g., management, versioning) by collaborating with software engineers and security engineers.
- Collaborate with modeling and product teams to evaluate the impact of the data on our models and continuously improve the data quality.
- Hire engineers for your team and provide them with career growth guidance, coaching, and training.
- Work across teams to understand and manage project priorities and product deliverables, evaluate trade-offs, and drive technical initiatives from execution to landing.
You may be a good fit if you have:
- 5+ years of experience in managing unstructured and/or human-annotated data (e.g., collecting or assessing sample quality)
- Experience owning data initiatives such as data cleaning, data validation, data augmentation, and image or video processing
- Proficiency in Python
- Experience with ML frameworks such as PyTorch and TensorFlow
- 2+ years of people management experience
- MS or PhD in Computer Science or a related field
- Experience with large-scale or RLHF-based dataset creation