Available for new work

Emanuele Czofei

AI Product Engineer

Building production LLM systems · RAG · Agents · Python + TypeScript

I find the failure your eval is hiding — a TNR of 0.00, a miscalibrated threshold — then rebuild the judge until the numbers are real.

Basque Country, Spain · Open to remote (EU / US via EOR)

What I build

Three things I do at production depth.

01

RAG systems with real eval harnesses

Retrieval pipelines built alongside the evaluation infrastructure that proves they work — not just demos that look right.

02

Full-stack AI products

Python backends and Next.js / TypeScript frontends, owned end to end — from retrieval logic to the interface users actually touch.

03

LLM evaluation infrastructure

Retrieval metrics, LLM-as-judge, and meta-evaluation — the tooling that tells you whether the judge itself can be trusted.

Projects

The work, with the receipts.

Spanish Safety RAG Assistant

View repo

A RAG assistant over the Spanish INSHT / NTP occupational-safety corpus. It answers regulatory queries with retrieved evidence and grounded generation — every claim traceable to a source.

Python · LangChain · ChromaDB · VoyageAI voyage-3 · rerank-2.5 · Claude Haiku API · Next.js · TypeScript
The differentiator

The judge initially showed a true negative rate (TNR) of 0.00, caused by threshold miscalibration and missing reference answers. I diagnosed the failure, rebuilt the judge with reference-aware prompting and explicit fault enumeration, and validated the fix.
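For readers new to judge meta-evaluation, here is a minimal sketch of how TPR and TNR can be computed by comparing judge verdicts with human labels. It assumes boolean pass/fail labels; the names `JudgeRates` and `judge_rates` are illustrative and not taken from the repo.

```python
from dataclasses import dataclass


@dataclass
class JudgeRates:
    tpr: float  # share of human-approved items the judge also approves
    tnr: float  # share of human-rejected items the judge also rejects


def judge_rates(human: list[bool], judge: list[bool]) -> JudgeRates:
    """Compare judge verdicts with human labels (True = answer passes)."""
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must have equal length")
    pairs = list(zip(human, judge))
    true_pos = sum(1 for h, j in pairs if h and j)
    true_neg = sum(1 for h, j in pairs if not h and not j)
    positives = sum(1 for h in human if h)
    negatives = len(human) - positives
    # A rate is undefined (NaN) when the golden set has no items of that class.
    tpr = true_pos / positives if positives else float("nan")
    tnr = true_neg / negatives if negatives else float("nan")
    return JudgeRates(tpr=tpr, tnr=tnr)


# Hypothetical labels: a judge that approves everything looks perfect on TPR
# while never catching a faulty answer.
human_labels = [True, True, False, False]
lenient_judge = [True, True, True, True]
rates = judge_rates(human_labels, lenient_judge)
print(rates)
```

A judge that never flags a faulty answer scores a TNR of 0.00 even with a perfect TPR, which is why both rates are reported side by side.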

eval · results
Faithfulness · TPR 100% · TNR 100%
Completeness · TPR 100% · TNR 87.5%
Hit@8 → 0.98
MRR → 0.955
Precision@8 → 0.69
Golden set · 50 human-labeled items
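As a reference for the retrieval metrics above, here is a minimal sketch of Hit@k, reciprocal rank, and Precision@k for a single query. It uses the common textbook definitions, which may differ in detail from the repo's harness; the function names and document IDs are hypothetical. Averaging each value over all queries gives the aggregate figure (MRR is the mean of the reciprocal ranks).

```python
def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


# Hypothetical retrieval result for one query.
ranked_docs = ["a", "b", "c", "d"]
relevant_docs = {"b", "d"}

hit = hit_at_k(ranked_docs, relevant_docs, k=2)
rr = reciprocal_rank(ranked_docs, relevant_docs)
prec = precision_at_k(ranked_docs, relevant_docs, k=4)
print(hit, rr, prec)
```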

E-Commerce Platform

Full-Stack · Personal Project
View repo

Custom e-commerce platform. I built the backend only: product catalogue, checkout, and order management, with full auth and role-based access control.

FastAPI · PostgreSQL · SQLAlchemy · Stripe · Railway

Async Document Conversion Service

Technical Challenge · SlideSpeak
View repo

Technical challenge for SlideSpeak (an AI presentation startup). I built an asynchronous PowerPoint-to-PDF conversion service: an upload triggers a Celery worker via Redis, LibreOffice handles the conversion server-side, the completed file is stored in S3, and the frontend polls for the result. I was fast-tracked past the intermediate interview rounds.

React · Tailwind CSS · FastAPI · Celery · Redis · Amazon S3 · LibreOffice

Skills

The stack I reach for.

LLM Systems
RAG pipelines · LLM-as-judge · eval harnesses · prompt engineering
Backend
Python · FastAPI · LangChain · ChromaDB · VoyageAI
Frontend
Next.js · TypeScript · React · Tailwind CSS · REST APIs
Currently learning
LangGraph · MCP servers · Instructor

Experience

Where I've shipped.

Fullstack AI Engineer Intern — Gaddr

Sept–Dec 2025

Built authentication flows in Next.js / TypeScript and integrated early API features into the product.

Self-taught developer

since 2024

Went from first principles to production LLM systems — learning by shipping, debugging, and measuring.

Contact

Building something in AI? Let's talk.