AI Data Ingestion Platform Engineer

Sigma360

Software Engineering, Data Science
New York, NY, USA · Remote
Posted on Jan 21, 2026

About Sigma360

Sigma360 is an MIT-incubated, venture-backed (Series B), AI-driven global data and analytics company that helps clients manage risk. We convert the world’s messy data into actionable insights for financial institutions, corporates, and governments, powering workflows like name screening, investigations, and risk research.

We’re a collaborative team that values ownership, clarity, and practical problem solving.

Why this role matters

We’re building an Automated Ingestion System (AIS) that turns ingestion into a repeatable capability.

Instead of engineering a one-off pipeline for every new data source, AIS will let us:

  • Register a new source (domain / endpoint / feed)
  • Map it into our existing ingestion templates
  • Run a one-time backfill
  • Keep it updated automatically on a schedule
  • Monitor freshness, volume, and failures in one place
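For a flavor of the idea, here is a rough sketch of what registering a source could look like (every name below is illustrative, not an existing internal schema): once a source is declared, AIS handles the backfill, scheduling, and monitoring around it.

    from dataclasses import dataclass, field

    @dataclass
    class SourceRegistration:
        """One row in a hypothetical AIS source registry."""
        source_id: str               # stable identifier for the source
        endpoint: str                # domain, API endpoint, or feed URL
        ingestion_mode: str          # "files" | "api" | "html" | "js_rendered"
        template: str                # ingestion template to map fields into
        schedule_cron: str           # cadence for continuous updates
        backfill_done: bool = False  # flipped after the one-time backfill
        tags: list[str] = field(default_factory=list)

    # Registering a new feed becomes a small declaration instead of a new pipeline:
    sanctions_feed = SourceRegistration(
        source_id="example-sanctions-list",
        endpoint="https://example.gov/sanctions.csv",
        ingestion_mode="files",
        template="watchlist_v2",
        schedule_cron="0 6 * * *",  # daily at 06:00
    )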

This is a greenfield, high-ownership project. Your job is to build the system that makes new integrations fast, consistent, and maintainable—not to build individual integrations yourself.

What you’ll do

You will own AIS end-to-end and ship it into production in our Databricks environment.

  • Build the “control plane” for ingestion: a source registry that tracks each source’s configuration, template mapping, schedule, health, and run history
  • Build the onboarding workflow that turns a new source into a working pipeline using our ingestion templates, including one-time backfill and continuous updates
  • Implement an AI-assisted “source understanding” flow that can look at a domain or endpoint and propose an ingestion plan:
    • identify what’s available and what’s valuable
    • choose an ingestion approach (files, API, HTML pages, JS-rendered pages)
    • propose how to map fields into our templates
  • Use agentic AI to accelerate extraction and parser creation:
    • generate and iterate on site-specific extraction rules/parsers when needed
    • fall back to deterministic parsers for high-volume patterns
    • create “known pattern” modules so new sources get easier over time
  • Build the operational backbone that makes AIS reliable at scale: scheduling, retries, rate limiting, incremental updates, change detection, and clear failure reporting
  • Build visibility that non-authors can trust: dashboards and alerts for freshness, volume, failures, and “what changed” when a source breaks
  • Establish the human-in-the-loop workflow so AIS can run through thousands of sources with oversight (a rough sketch follows this list):
    • confidence scoring and safe defaults
    • review queues for uncertain mappings or risky changes
    • guardrails that prevent silent bad data from landing in production tables
  • Partner with data engineering and AI teams to align AIS outputs with our schemas, downstream systems, and product needs
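To make the human-in-the-loop idea concrete, here is a loose sketch of the confidence gate described above (the thresholds, names, and queue are invented for illustration, not an existing system):

    # Hypothetical thresholds; real values would be tuned per source type.
    CONFIDENCE_AUTO_ACCEPT = 0.90  # above this, a proposed mapping ships with safe defaults
    CONFIDENCE_REVIEW = 0.60       # between the thresholds, a human reviews first

    def route_proposed_mapping(source_id: str, confidence: float, review_queue: list) -> str:
        """Decide whether an AI-proposed field mapping lands, waits for review, or is rejected."""
        if confidence >= CONFIDENCE_AUTO_ACCEPT:
            return "accept"                 # high confidence: flows through to production tables
        if confidence >= CONFIDENCE_REVIEW:
            review_queue.append(source_id)  # uncertain mapping: park it for human review
            return "review"
        return "reject"                     # guardrail: low-confidence data never lands silently

    queue: list[str] = []
    assert route_proposed_mapping("src-001", 0.95, queue) == "accept"
    assert route_proposed_mapping("src-002", 0.72, queue) == "review"
    assert route_proposed_mapping("src-003", 0.30, queue) == "reject"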

Tech stack

  • Databricks for development, orchestration, and scheduled workflows
  • Python + PySpark + pandas for pipelines and tooling
  • Delta tables for lakehouse storage and system metadata
  • Data shipped downstream to Postgres and Neo4j, supporting a Golang backend
  • AWS for underlying cloud infrastructure

What we’re looking for

This role is ideal for a builder who likes greenfield internal platforms: equal parts product thinking and engineering execution.

Required

  • 5+ years of professional experience in data engineering, backend engineering, or automation-heavy engineering roles
  • Strong Python; comfortable designing maintainable systems (not just notebooks)
  • Experience with data ingestion patterns (APIs, file feeds, semi-structured data, web content) and normalization/QA
  • Familiarity with web data realities (HTML, basic JS rendering concepts, rate limits, site changes)
  • Comfort owning a system in production: observability, reliability, incident response, and iteration
  • Strong written communication and ability to work autonomously in a remote environment
  • Ability to overlap at least 4 hours with NYC business hours (9am–5pm ET)
  • Bachelor’s degree (Computer Science, Engineering, Data Science, or related field), or equivalent practical experience

Nice to have

  • Databricks experience (Jobs/Workflows, Delta, production operations)
  • PySpark / distributed processing depth
  • Prior experience building an internal platform or developer tool used by other engineers
  • Experience with agentic workflows, LLM prompting/tooling, or AI-assisted automation (or strong interest and ability to ramp quickly)

What success looks like (first 6–12 months)

  • AIS is live and reliably onboarding new sources through templates (one-time backfill + continuous ingestion) with clear health and freshness visibility
  • Source onboarding becomes meaningfully faster and more repeatable for the team, with fewer bespoke one-off pipelines
  • Failures are diagnosable and actionable (clear reasons + next steps), reducing operational churn
  • The system has a clear maturity path: quick wins for simple sources plus a roadmap for handling harder patterns (JS-heavy sites, layout changes, higher-volume parsing) over time

What we offer

  • Remote-first team with high autonomy and ownership
  • Competitive compensation and meaningful equity
  • Health, dental, vision, and other benefits (or local equivalent)
  • Generous time off and a culture that supports learning and growth

Sigma360 is an equal opportunity employer. We are committed to fair hiring practices and to creating a welcoming environment for all team members. All qualified applicants will receive consideration without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, disability, age, familial status, or veteran status.