Location: Fully remote
Reports to: CTO
Compensation: up to EUR 4,000 / month
Horizon: 6-month ramp
Mission
Build and own the data pipelines that turn carrier invoices and tariffs — messy, high-volume,
multi-layout — into structured, auditable data for a portfolio of freight clients. You'll
own each integration end-to-end: extraction, matching logic, tariff translation, and the
technical relationship with the client's operations team. This is a hands-on engineering role
with direct client exposure, not a project-management role with code on the side.
What you'll own (per client)
1. Carrier integration
Onboard new carriers and new freight modes for existing clients. Concretely, this means
designing and shipping extraction + matching pipelines that hold up under production
conditions:
- Invoice extraction at scale. Carrier invoices arrive as PDFs running to 600+ pages with
100,000+ structured values that must be extracted at ≥ 99% field-level accuracy. Layouts vary
by carrier, by office, by region, and by year — you'll build pipelines that handle
multi-layout variance instead of one brittle parser per format.
- End-to-end pipeline ownership. PDF parsing, regex extraction, LLM-based structured
extraction, pydantic-based validation, tariff matching, surcharge handling, error recovery.
You write it, you debug it in production, you own its accuracy numbers.
- Hard validation. Every value you extract is auditable against the carrier's rate card.
Numbers that are almost right are wrong. Detail orientation isn't a nice-to-have — it's the
job.
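To make "almost right is wrong" concrete, here is a minimal sketch of an exact-match audit. It assumes a deliberately simplified rate card (one per-kg rate per zone), and the names `InvoiceLine`, `Finding`, `expected_amount` and `audit` are hypothetical. The stack below lists pydantic for validation; this sketch sticks to the standard library (`dataclasses`, `decimal`) so it runs on its own. Amounts are `Decimal` rather than float, and any difference, even one cent, is reported.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

@dataclass(frozen=True)
class InvoiceLine:
    # Hypothetical fields; real invoice lines carry many more values.
    shipment_id: str
    zone: str
    weight_kg: Decimal
    amount: Decimal  # amount as extracted from the carrier invoice

@dataclass(frozen=True)
class Finding:
    shipment_id: str
    expected: Decimal
    invoiced: Decimal

def expected_amount(line: InvoiceLine, rate_per_kg: dict[str, Decimal]) -> Decimal:
    """Price one line from the simplified rate card, rounded to the cent.
    An unknown zone raises KeyError instead of being silently priced."""
    return (line.weight_kg * rate_per_kg[line.zone]).quantize(CENT, rounding=ROUND_HALF_UP)

def audit(lines: list[InvoiceLine], rate_per_kg: dict[str, Decimal]) -> list[Finding]:
    """Report every line whose invoiced amount differs from the rate card."""
    findings = []
    for line in lines:
        expected = expected_amount(line, rate_per_kg)
        if line.amount != expected:  # no tolerance: almost right is wrong
            findings.append(Finding(line.shipment_id, expected, line.amount))
    return findings

# Usage with made-up rates and invoice lines:
rates = {"Z1": Decimal("0.85"), "Z2": Decimal("1.10")}
lines = [
    InvoiceLine("S1", "Z1", Decimal("120"), Decimal("102.00")),
    InvoiceLine("S2", "Z2", Decimal("45.5"), Decimal("50.06")),
]
findings = audit(lines, rates)  # one finding: S2 should be 50.05, not 50.06
```

A real rate card adds weight breaks, minimums and surcharges on top of this, but the principle of comparing exact decimal values with no tolerance stays the same.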
2. Tariff translation & management
Translate carrier rate cards into the canonical, structured format our matching pipeline
consumes.
- The inputs are ugly. Excel files with 50+ tabs, merged cells, multi-currency, multi-zone,
embedded surcharge tables, footnotes that change the meaning of entire sections, and
conventions that differ per carrier.
- You'll build translation pipelines, not edit cells by hand. For most carriers the right
move is a small automated transformer (openpyxl + carrier-specific rules + validation) that
maps the carrier's structure into ours, plus a verification step that cross-checks against
the original. Manual translation doesn't scale across renewals.
- Keep tariffs accurate over time. Renewals, fuel surcharge (FSC) updates, rate-card refreshes, mid-year
amendments — all flow through the same pipeline.
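The transformer-plus-verification pattern described above might look like the following minimal sketch. It assumes one hypothetical carrier layout (zone names across the first row, weight breaks down the first column, rates in the grid, footnote rows at the bottom) and a hypothetical canonical `TariffRow` record; real rate cards need many more rules. In practice the rows could come from openpyxl, for example `load_workbook(path, data_only=True)` followed by `ws.iter_rows(values_only=True)`; here they are plain tuples so the sketch runs without it. `verify` re-reads the original grid by position rather than reusing `translate`'s loop, so a translation bug shows up as a mismatch.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Sequence

Cell = Optional[object]

@dataclass(frozen=True)
class TariffRow:
    # Hypothetical canonical record; the real target schema is not described here.
    zone: str
    max_weight_kg: Decimal
    rate: Decimal
    currency: str

def _is_number(value: Cell) -> bool:
    # openpyxl yields int/float for numeric cells; bool is an int subclass, so exclude it.
    return isinstance(value, (int, float, Decimal)) and not isinstance(value, bool)

def _dec(value: Cell) -> Decimal:
    # str() first so 0.85 becomes Decimal("0.85"), not the binary float expansion.
    return Decimal(str(value))

def translate(rows: Sequence[Sequence[Cell]], currency: str) -> list[TariffRow]:
    """Hypothetical carrier rule: row 0 holds zone names, column 0 holds weight
    breaks, other cells hold rates. A blank cell means "no rate", not zero;
    rows without a numeric weight break (e.g. footnotes) are skipped."""
    header, *body = rows
    zones = [str(z).strip() for z in header[1:]]
    out: list[TariffRow] = []
    for row in body:
        if not row or not _is_number(row[0]):
            continue
        weight = _dec(row[0])
        for zone, cell in zip(zones, row[1:]):
            if _is_number(cell):
                out.append(TariffRow(zone, weight, _dec(cell), currency))
    return out

def verify(rows: Sequence[Sequence[Cell]], records: list[TariffRow]) -> list[str]:
    """Cross-check every record against the original grid; returns the problems found."""
    header, *body = rows
    col_of = {str(z).strip(): i for i, z in enumerate(header) if i > 0}
    row_of = {_dec(r[0]): r for r in body if r and _is_number(r[0])}
    problems = []
    for rec in records:
        src = row_of.get(rec.max_weight_kg)
        col = col_of.get(rec.zone)
        if src is None or col is None or col >= len(src) or not _is_number(src[col]):
            problems.append(f"{rec} has no source cell")
        elif _dec(src[col]) != rec.rate:
            problems.append(f"{rec} != source value {src[col]}")
    # Every numeric rate cell in the source must have produced exactly one record.
    source_cells = sum(
        1 for r in row_of.values() for c in r[1:len(header)] if _is_number(c)
    )
    if source_cells != len(records):
        problems.append(f"{source_cells} rate cells in source, {len(records)} records")
    return problems

# Usage with a made-up two-zone sheet:
grid = [
    ("Weight (kg)", "Zone A", "Zone B"),
    (10, 12.5, 14.0),
    (20, 18.0, None),  # blank cell: no Zone B rate at this break
    ("* Rates exclude fuel surcharge", None, None),
]
records = translate(grid, "EUR")  # three TariffRow records
problems = verify(grid, records)  # [] when the translation matches the source
```

Reading a blank cell as "no rate" rather than zero is itself a carrier-specific rule; another carrier's sheet might need a different convention, which is why such rules live in the transformer rather than in hand edits.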
3. Rule management
Tune matching rules per client/carrier: zone logic, chargeable weight, surcharge fallback
chains, identifier mapping, country normalization. Adjust extraction logic when a carrier
changes invoice layouts mid-contract (they will).
4. Customer contact & request handling
You are the client's primary technical contact. Handle small platform asks directly,
coordinate and quality-control the invoice checks performed by the ops team, and own the
outcome end-to-end.
Tech stack
- Backend: Python, FastAPI, Celery, SQLAlchemy
- Extraction & AI: GPT-class LLMs for structured extraction, pdfplumber, openpyxl, pydantic
- Frontend: Next.js (App Router), React, TypeScript
- Infra: Kubernetes
- Data: MSSQL
What we're looking for
- Comfortable shipping Python in a real codebase — you can read an async pipeline, follow
data through it, and add a new integration without hand-holding
- Track record of building extraction or data-transformation pipelines against messy
real-world inputs (PDFs, spreadsheets, scanned documents, EDI, etc.)
- Obsessive about accuracy. You don't ship a parser at 92% and call it done; you find the
long tail and close it
- Pragmatic about LLMs — you know when to use them, when regex is better, and how to validate
either
- Direct with clients and teammates. You can tell a client "this rate card is ambiguous, we
need clarification" without softening it into uselessness
- Ownership mindset: if a client invoice is wrong, it's your problem until it's fixed,
regardless of which layer caused it
- Strong written English — most communication is async across timezones
- Bonus: freight / logistics domain experience, or a prior forward-deployed engineer (FDE) /
solutions-engineering role at a B2B SaaS company
How the role evolves
In the first 3 months you'll be mostly hands-on: integrating carriers, building translation
pipelines, hunting accuracy bugs. As your portfolio grows, the mix shifts: more time on
client conversations and scoping, less on writing integrations from scratch. We expect you to
grow into that split rather than stay purely technical.