System Software Engineer (Hybrid)

Nimblemind.ai

Software Engineering

Albuquerque, NM, USA

Posted on Jun 12, 2026

About Hoonify

Hoonify delivers secure, sovereign AI infrastructure designed for the next generation of inference workloads. Powered by TurbOS®, our platform enables organizations and NeoCloud/data center operators to transform CPU/GPU infrastructure into production-ready AI environments—supporting local LLMs, agentic copilots, RAG, and embeddings. We empower teams with robust model lifecycle management, multi-tenant controls, usage metering, and fully auditable operations.

The Role

We are seeking a System Software Engineer to help build, deploy, and operate the multi-cloud computational platform and model-serving infrastructure that underpin our AI/ML developer platform. This role focuses on the implementation, automation, and day-to-day operation of production systems, working under the technical direction of senior engineers and within the platform's established architectural patterns.

The successful candidate will deliver well-engineered, well-tested infrastructure changes and deepen their expertise in Kubernetes, GPU-backed workloads, observability, and continuous delivery in a production environment.

This role enables meaningful growth in cloud infrastructure, distributed systems, and ML serving. You will work directly with senior engineers on real production systems, receive code and design review on your work, and have a clear path to expand scope and ownership as your experience deepens.

Core Responsibilities

  • Implement and maintain Kubernetes workloads and supporting resources, including manifests, Helm charts, controllers, and configuration for networking, ingress, and storage, following established platform patterns.
  • Deploy and operate model-serving workloads on GPU and accelerator node pools, including configuring autoscaling policies, resource requests and limits, and tenant-specific deployment configurations.
  • Support model training and simulation workloads on distributed GPU systems.
  • Build and maintain instrumentation on Prometheus, Grafana, and OpenTelemetry, including authoring dashboards, alerting rules, and trace and metric instrumentation for new services.
  • Implement and improve CI/CD pipelines, including build, test, and deployment automation, and contribute to progressive delivery practices already in use on the platform.
  • Develop and maintain infrastructure-as-code modules and automation scripts in support of repeatable, auditable infrastructure changes across cloud environments.
  • Support response to production incidents, execute documented runbooks, and contribute to postmortems and follow-up remediation work.
  • Investigate and resolve issues across the stack, including container, node, network, and accelerator-level problems, escalating appropriately when scope exceeds the role.
  • Write clear documentation, including runbooks, internal references, and design notes for the changes you ship.
  • Participate in code and design reviews, both as author and reviewer, and incorporate feedback from senior engineers into your work.

Required Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, or Information Technology, plus three (3) years of relevant work experience, or an equivalent combination of education and relevant experience.
  • Professional experience in cloud infrastructure, DevOps, site reliability, or backend engineering roles involving production system operation.
  • Working knowledge of Kubernetes in a production context, including writing and debugging manifests, understanding core resource types, and operating production workloads.
  • Hands-on experience with at least one major cloud provider (e.g. AWS, GCP, or OpenStack), including its compute, networking, and identity primitives.
  • Experience instrumenting services and consuming observability data, including writing Prometheus queries, building Grafana dashboards, or working with distributed traces.
  • Familiarity with CI/CD systems and the basic mechanics of automated build, test, and deployment pipelines.
  • Experience with configuration management and infrastructure-as-code tools (e.g. Ansible, Puppet, or Helm).
  • Proficiency in at least one programming or scripting language used for infrastructure work (Python, Go, Rust, or Bash).
  • Comfort working in a Linux environment and with standard developer tooling, including Git-based workflows.

Preferred Qualifications

  • Exposure to GPU or accelerator workloads in any production or research context.
  • Experience working in a multi-cloud environment, or strong interest in developing it.
  • Exposure to model-serving runtimes such as vLLM or NVIDIA Triton.
  • Hands-on experience compiling, packaging, and configuring Linux software for integration into a target system.
  • Experience with HPC batch schedulers and MPI-based workloads.

Why Join Hoonify

You'll have a direct line to leadership and genuine influence over the company's growth trajectory. This is a rare opportunity to build a cutting-edge multi-cloud computational platform at a company doing meaningful work in AI — with the autonomy and resources to make it your own.

Hoonify is an equal opportunity employer. We welcome applicants from all backgrounds and are committed to building a diverse and inclusive team. Candidates must be eligible to obtain and maintain a US government security clearance.