MLOps Engineer
Reppls
For our client, we are seeking a Senior MLOps Engineer with extensive experience implementing inference systems and deploying models to production. You should have proven experience managing large-scale ML infrastructure for LLMs, TTS, STT, Stable Diffusion, and other GPU-intensive models in production, with expertise in cost-efficient, high-performance serving stacks in a Kubernetes-based AWS environment.
Responsibilities:
- Architect, deploy, and maintain scalable ML infrastructure on AWS EKS using Terraform and Helm.
- Own end-to-end model deployment pipelines for LLMs, diffusion models (LDM/Stable Diffusion), and other generative/AI models requiring high GPU throughput.
- Implement and optimize high-load ML inference systems for image generation and voice (TTS/STT) models, with a focus on performance.
- Design auto-scaling, cost-effective serving systems using Triton Inference Server, vLLM, Ray Serve, or similar frameworks (a minimal Ray Serve sketch follows this list).
- Build and maintain CI/CD pipelines for the ML model lifecycle (training → validation → deployment).
- Optimize GPU resource utilization and implement orchestration with frameworks such as KServe or Kubeflow, or with custom workloads on EKS.
- Deploy and manage FluxCD (or ArgoCD) for GitOps-based deployment.
- Implement monitoring, logging, and alerting for model health and infrastructure performance (Prometheus, Grafana, Loki).
- Collaborate with infrastructure teams, ML Engineers, and Software Engineers to ensure smooth integration and feedback loops.
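For illustration only, here is a minimal sketch of the kind of auto-scaling, GPU-backed serving deployment described above, assuming Ray Serve; the model, class name, and request schema are hypothetical placeholders rather than the client's actual stack.

```python
# Minimal sketch of a GPU-backed, auto-scaling Ray Serve deployment.
# Assumptions: ray[serve] and transformers are installed, one GPU is available,
# and "gpt2" stands in for whatever LLM/diffusion model is actually served.
from ray import serve
from transformers import pipeline


@serve.deployment(
    ray_actor_options={"num_gpus": 1},                          # one GPU per replica
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},  # scale with request load
)
class TextGenerator:
    def __init__(self):
        # Placeholder model; a real deployment would load the production model here.
        self.pipe = pipeline("text-generation", model="gpt2", device=0)

    async def __call__(self, request):
        payload = await request.json()
        return self.pipe(payload["prompt"], max_new_tokens=64)[0]["generated_text"]


app = TextGenerator.bind()
# serve.run(app)  # exposes the deployment over HTTP on the Ray cluster
```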
Requirements:
- 4-5 years of experience implementing inference systems and deploying ML models to production.
- 2-3 years of experience with model serving frameworks such as Triton, vLLM, or Ray Serve (see the vLLM sketch after this list).
- 3-4 years of experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling.
- Proven experience with optimized ML inference at scale, particularly for image generation and voice models.
- Strong expertise in Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm).
- 4-5 years of experience with Python and the ML model lifecycle in production environments.
- Demonstrated ability to collaborate with infrastructure teams and cross-functional engineering teams.
- Fluent English.
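As a point of reference for the serving-framework requirement above, this is a minimal sketch of batched LLM inference, assuming vLLM's offline API; the model id and sampling values are illustrative placeholders.

```python
# Minimal sketch of high-throughput batched inference, assuming vLLM.
# The model id is a placeholder; any causal LM that fits the GPU would do.
from vllm import LLM, SamplingParams

prompts = [
    "Explain continuous batching in one sentence.",
    "Why does paged attention reduce GPU memory fragmentation?",
]
sampling = SamplingParams(temperature=0.7, max_tokens=64)

# vLLM batches concurrent requests and manages the KV cache internally,
# which is what makes high-throughput single-GPU LLM serving practical.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```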
Nice to have:
- Experience with model quantization, distillation, and optimization techniques.
- Familiarity with ML model registries such as MLflow or DVC (see the registry sketch after this list).
- Exposure to Kafka or event-driven data pipelines.
- Experience with performance profiling and GPU optimization techniques (CUDA).
- Contributions to open-source MLOps tools or frameworks.
- Experience with A/B testing frameworks for ML model deployments.
- Knowledge of cost optimization strategies for GPU-intensive workloads.
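For the model-registry point above, a minimal sketch assuming MLflow; the classifier, registry name, and version are hypothetical placeholders, not part of this role's actual pipeline.

```python
# Minimal sketch of registering and loading a model version, assuming MLflow.
# The classifier and registry name ("demo-classifier") are placeholders only.
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Log the trained model and register it under a named registry entry,
    # so a deployment pipeline can later pull a pinned version.
    mlflow.sklearn.log_model(
        model, artifact_path="model", registered_model_name="demo-classifier"
    )

# Later, e.g. in a CD job: load a specific registered version for serving.
loaded = mlflow.pyfunc.load_model("models:/demo-classifier/1")
print(loaded.predict(X[:2]))
```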