AI Data Ingestion Platform Engineer

Sigma360

Software Engineering, Data Science
New York, NY, USA · Remote
Posted on Jan 21, 2026

About Sigma360

Sigma360 is an MIT-incubated, venture-backed (Series B), AI-driven global data and analytics company that helps clients manage risk. We convert the world’s messy data into actionable insights for financial institutions, corporates, and governments, powering workflows like name screening, investigations, and risk research.

We’re a collaborative team that values ownership, clarity, and practical problem solving.

Why this role matters

We’re building an Automated Ingestion System (AIS) that turns ingestion into a repeatable capability.

Instead of engineering a one-off pipeline for every new data source, AIS will let us:

  • Register a new source (domain / endpoint / feed)
  • Map it into our existing ingestion templates
  • Run a one-time backfill
  • Keep it updated automatically on a schedule
  • Monitor freshness, volume, and failures in one place
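For a flavor of the idea, here is a rough sketch of what registering a source could look like (every name below is illustrative, not an existing internal schema): once a source is declared, AIS handles the backfill, scheduling, and monitoring around it.

    from dataclasses import dataclass, field

    @dataclass
    class SourceRegistration:
        """One row in a hypothetical AIS source registry."""
        source_id: str               # stable identifier for the source
        endpoint: str                # domain, API endpoint, or feed URL
        ingestion_mode: str          # "files" | "api" | "html" | "js_rendered"
        template: str                # ingestion template to map fields into
        schedule_cron: str           # cadence for continuous updates
        backfill_done: bool = False  # flipped after the one-time backfill
        tags: list[str] = field(default_factory=list)

    # Registering a new feed becomes a small declaration instead of a new pipeline:
    sanctions_feed = SourceRegistration(
        source_id="example-sanctions-list",
        endpoint="https://example.gov/sanctions.csv",
        ingestion_mode="files",
        template="watchlist_v2",
        schedule_cron="0 6 * * *",  # daily at 06:00
    )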

This is a greenfield, high-ownership project. Your job is to build the system that makes new integrations fast, consistent, and maintainable—not to build individual integrations yourself.

What you’ll do

You will own AIS end-to-end and ship it into production in our Databricks environment.

  • Build the “control plane” for ingestion: a source registry that tracks each source’s configuration, template mapping, schedule, health, and run history
  • Build the onboarding workflow that turns a new source into a working pipeline using our ingestion templates, including one-time backfill and continuous updates
  • Implement an AI-assisted “source understanding” flow that can look at a domain or endpoint and propose an ingestion plan:
    • identify what’s available and what’s valuable
    • choose an ingestion approach (files, API, HTML pages, JS-rendered pages)
    • propose how to map fields into our templates
  • Use agentic AI to accelerate extraction and parser creation:
    • generate and iterate on site-specific extraction rules/parsers when needed
    • fall back to deterministic parsers for high-volume patterns
    • create “known pattern” modules so new sources get easier over time
  • Build the operational backbone that makes AIS reliable at scale: scheduling, retries, rate limiting, incremental updates, change detection, and clear failure reporting
  • Build visibility that non-authors can trust: dashboards and alerts for freshness, volume, failures, and “what changed” when a source breaks
  • Establish the human-in-the-loop workflow so AIS can run through thousands of sources with oversight (a rough sketch follows this list):
    • confidence scoring and safe defaults
    • review queues for uncertain mappings or risky changes
    • guardrails that prevent silent bad data from landing in production tables
  • Partner with data engineering and AI teams to align AIS outputs with our schemas, downstream systems, and product needs
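To make the human-in-the-loop idea concrete, here is a loose sketch of the confidence gate described above (the thresholds, names, and queue are invented for illustration, not an existing system):

    # Hypothetical thresholds; real values would be tuned per source type.
    CONFIDENCE_AUTO_ACCEPT = 0.90  # above this, a proposed mapping ships with safe defaults
    CONFIDENCE_REVIEW = 0.60       # between the thresholds, a human reviews first

    def route_proposed_mapping(source_id: str, confidence: float, review_queue: list) -> str:
        """Decide whether an AI-proposed field mapping lands, waits for review, or is rejected."""
        if confidence >= CONFIDENCE_AUTO_ACCEPT:
            return "accept"                 # high confidence: flows through to production tables
        if confidence >= CONFIDENCE_REVIEW:
            review_queue.append(source_id)  # uncertain mapping: park it for human review
            return "review"
        return "reject"                     # guardrail: low-confidence data never lands silently

    queue: list[str] = []
    assert route_proposed_mapping("src-001", 0.95, queue) == "accept"
    assert route_proposed_mapping("src-002", 0.72, queue) == "review"
    assert route_proposed_mapping("src-003", 0.30, queue) == "reject"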

Tech stack

  • Databricks for development, orchestration, and scheduled workflows
  • Python + PySpark + pandas for pipelines and tooling
  • Delta tables for lakehouse storage and system metadata
  • Data shipped downstream to Postgres and Neo4j, supporting a Golang backend
  • AWS for underlying cloud infrastructure

What we’re looking for

This role is ideal for a builder who likes greenfield internal platforms: equal parts product thinking and engineering execution.

Required

  • 5+ years of professional experience in data engineering, backend engineering, or automation-heavy engineering roles
  • Strong Python; comfortable designing maintainable systems (not just notebooks)
  • Experience with data ingestion patterns (APIs, file feeds, semi-structured data, web content) and normalization/QA
  • Familiarity with web data realities (HTML, basic JS rendering concepts, rate limits, site changes)
  • Comfort owning a system in production: observability, reliability, incident response, and iteration
  • Strong written communication and ability to work autonomously in a remote environment
  • Ability to overlap at least 4 hours with NYC business hours (9am–5pm ET)
  • Bachelor’s degree (Computer Science, Engineering, Data Science, or related field), or equivalent practical experience

Nice to have

  • Databricks experience (Jobs/Workflows, Delta, production operations)
  • PySpark / distributed processing depth
  • Prior experience building an internal platform or developer tool used by other engineers
  • Experience with agentic workflows, LLM prompting/tooling, or AI-assisted automation (or strong interest and ability to ramp quickly)

What success looks like (first 6–12 months)

  • AIS is live and reliably onboarding new sources through templates (one-time backfill + continuous ingestion) with clear health and freshness visibility
  • Source onboarding becomes meaningfully faster and more repeatable for the team, with fewer bespoke one-off pipelines
  • Failures are diagnosable and actionable (clear reasons + next steps), reducing operational churn
  • The system has a clear maturity path: quick wins for simple sources plus a roadmap for handling harder patterns (JS-heavy sites, layout changes, higher-volume parsing) over time

What we offer

  • Remote-first team with high autonomy and ownership
  • Competitive compensation and meaningful equity
  • Health, dental, vision, and other benefits (or local equivalent)
  • Generous time off and a culture that supports learning and growth

Sigma360 is an equal opportunity employer. We are committed to fair hiring practices and to creating a welcoming environment for all team members. All qualified applicants will receive consideration without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, disability, age, familial status, or veteran status.