
MLOps Engineer

Reppls

Poland
Posted on Sep 24, 2025

For our client, we are seeking a Senior MLOps Engineer with extensive experience implementing inference systems and deploying models to production. You should have a proven track record of managing large-scale ML infrastructure for LLMs, TTS, STT, Stable Diffusion, and other GPU-intensive models in production, including cost-efficient, high-performance serving stacks in a Kubernetes-based AWS environment.

Responsibilities:
  • Architect, deploy, and maintain scalable ML infrastructure on AWS EKS using Terraform and Helm.
  • Own end-to-end model deployment pipelines for LLMs, diffusion models (LDM/Stable Diffusion), and other generative/AI models requiring high GPU throughput.
  • Implement and optimize high-load ML inference systems for image generation and voice models (TTS/STT).
  • Design auto-scaling, cost-effective serving systems using Triton Inference Server, vLLM, Ray Serve, or similar frameworks.
  • Build and maintain CI/CD pipelines for the ML model lifecycle (training → validation → deployment).
  • Optimize GPU resource utilization and implement orchestration with frameworks like KServe, Kubeflow, or custom workloads on EKS.
  • Deploy and manage FluxCD (or ArgoCD) for GitOps-based deployment.
  • Implement monitoring, logging, and alerting for model health and infrastructure performance (Prometheus, Grafana, Loki).
  • Collaborate with infrastructure teams, ML Engineers, and Software Engineers to ensure smooth integration and feedback loops.
Required Experience:
  • 4-5 years of experience implementing inference systems and deploying ML models to production.
  • 2-3 years of experience with model serving frameworks like Triton, vLLM, Ray Serve, or similar.
  • 3-4 years of experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling.
  • Proven experience optimizing ML inference at scale, particularly for image generation and voice models.
  • Strong expertise in Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm).
  • 4-5 years of experience with Python and the ML model lifecycle in production environments.
  • Demonstrated ability to collaborate with infrastructure teams and cross-functional engineering teams.
  • Fluent English.
Nice to Have:
  • Experience with model quantization, distillation, and optimization techniques.
  • Familiarity with ML model registries like MLflow or DVC.
  • Exposure to Kafka or event-driven data pipelines.
  • Experience with performance profiling and GPU optimization techniques (CUDA).
  • Contributions to open-source MLOps tools or frameworks.
  • Experience with A/B testing frameworks for ML model deployments.
  • Knowledge of cost optimization strategies for GPU-intensive workloads.