Senior AI Backend Engineer - AWS / Node.js / TypeScript (Hybrid)
Serve First
Location: Hybrid from Milton Keynes (3 days/month in office)
Team: Engineers across UK, US, India, Philippines
Reports to: CTO (US)
Works closely with: Head of Engineering (India)
Why Serve First
We're a scrappy, well-funded (£4.5m seed closed) AI startup turning raw customer feedback into real-time insight for businesses that care about CX.
Our 2025 roadmap is ambitious:
- Break apart our Node.js monolith into microservices
- Double our AI-driven workflows
- Harden infra for 100× traffic
What You'll Do
- Break up the monolith: Define service boundaries and lead the transition to a microservices architecture. Implement REST + SQS communication, containerized on ECS Fargate. Design services that scale, not snowball.
- Own AI workflow integration: Build and orchestrate AI workflows using OpenAI/Claude APIs today and frameworks like LangChain or similar. Design composable, multi-model pipelines (prompt orchestration, vector DBs, caching, telemetry). Lead the shift toward Bedrock/RAG-native infrastructure.
- Build and scale AI infra: Stand up inference/workflow infra on AWS (Bedrock, SageMaker, or containerized flows). Make AI systems observable, secure, and cost-efficient. You won't be training models from scratch, but you'll architect the systems to support them if/when we need to.
- Architect the AI platform: We're not just wrapping GPT. We're building infrastructure for experimentation, scale, and optional self-hosting. Define orchestration vs. inference boundaries, expose tracing and prompt history, and keep iteration sane.
- Champion testing and correctness: Enforce robust testing strategies (unit, integration, load). Design testable systems with clear mocks, interface contracts, and fast CI.
- Estimate, scope, deliver: Break complex features into milestones, identify hidden risks, and communicate trade-offs clearly. You'll shape specs as much as you implement them.
- Make it observable: Build first-class telemetry: structured logs, metrics, traces, alerts. Make LLM behavior debuggable and traceable, from token usage to prompt mutation.
- Think security first: Handle sensitive customer data with care: PII handling, IAM design, secrets management, rate limiting, GDPR readiness.
- Ship backend code: Work in Node.js/TypeScript with MongoDB (Mongoose), Redis, and job schedulers (cron/EventBridge).
- Keep infra flexible: AWS-first today, with modular Terraform for future cloud (GCP) support.
- Mentor & raise the bar: Lead reviews, mentor engineers, and reinforce best practices without slowing velocity. Know when to lean on AI tools and when not to.
What You'll Bring
- 8+ years backend engineering; deep experience building distributed systems on AWS with Node.js/TypeScript
- Strong system design skills, especially event-driven/autoscaling architectures
- Production LLM workflow experience (OpenAI, Claude, etc.): you know context windows, token limits, caching, and cost trade-offs
- Workflow orchestration: experience with LangChain, agentic frameworks, or equivalent workflow automation stacks
- Infra-aware mindset: you've deployed and scaled AI workflows/inference via Bedrock, SageMaker, or containers
- MongoDB & Redis tuning/debugging chops
- Terraform & Docker fluency
- Testing mindset with CI/CD experience
- Clear async communicator (writing, docs, code)
- Security & compliance awareness (GDPR/SOC2 basics)
- Proven ability to scope, build, and deliver complex backend features
Nice to Have
- Background in CX, survey, or analytics SaaS
- Bedrock, LangChain, or RAG-native infra exposure
- LLMOps experience (prompt versioning, feedback loops, telemetry)
- GCP infra experience / portability
- React familiarity or empathy for frontend engineers
- Incident response and blameless postmortems
Benefits
- Competitive salary (band shared at offer stage)
- Standard UK pension
- 20 days holiday + public holidays
- Generous hardware/kit budget
- High autonomy, massive scope
- Personal and professional development budget