About the role
This role sits at the centre of how we measure and improve AI systems in production.
You’ll define what good performance means across LLMs, ASR, TTS, and full speech-to-speech pipelines, and build the datasets, metrics, and evaluation systems that make AI quality measurable and comparable in the real world.
You’ll work closely with engineering and product teams to ensure model changes lead to real improvements in user experience, not just better offline benchmarks.
What you’ll do
- Design and run evaluations across LLM, ASR, TTS, and speech-to-speech systems
- Build real-world datasets and test cases from production behaviour and edge cases
- Define metrics and scorecards for model and system quality
- Benchmark internal models against external and frontier systems
- Evaluate full pipelines (ASR → LLM → TTS), not just individual models
- Build Python tools to automate evaluation workflows
- Create internal leaderboards, red-teaming setups, and regression tests
- Work with engineers and product teams to diagnose system failures
- Turn vague product goals into measurable evaluation frameworks
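To give a flavour of the tooling involved, here is a minimal, hypothetical sketch of the kind of automated evaluation a candidate might build: a word error rate (WER) metric for ASR output plus a simple regression gate over a test set. The function names and the 15% threshold are illustrative assumptions, not part of this role's actual stack.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: token-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

def regression_check(cases, threshold=0.15):
    """Gate a model change: fail if average WER over the test set exceeds the threshold."""
    scores = [wer(ref, hyp) for ref, hyp in cases]
    avg = sum(scores) / len(scores)
    return avg <= threshold, avg
```

In practice a check like this would run in CI on curated production transcripts, comparing the candidate model's average WER against the current baseline before a change ships.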
What this role is about
- Defining and measuring AI quality in production systems
- Turning real user behaviour into structured evaluation signals
- Ensuring model changes improve real-world performance
- Understanding why AI systems fail, not just whether they do
What good looks like
- You can translate fuzzy notions of "better quality" into measurable metrics
- You think in terms of system impact (before vs after), not just accuracy
- You’re comfortable working across code, data, and production systems
- You care about real-world behaviour, not just benchmarks
Core skills
- Strong Python (scripting, data analysis, tooling)
- Experience with ML systems, evaluation, or experimentation
- Understanding of LLMs or speech systems (ASR / TTS)
- Ability to design test cases and structured datasets
- Comfortable working with engineers and product teams
Nice to have
- Experience with LLM evaluation or benchmarking
- Exposure to speech or multimodal systems
- Familiarity with production APIs or ML systems
- Experience with automated testing or CI-style workflows