AI Evaluation Engineer – Design Real‑World Benchmark Tasks
Gramian Consulting · Egypt
Job description
About the role
Gramian Consultancy seeks an AI Evaluation Engineer to create realistic, terminal‑based benchmark tasks that assess how large language models reason through debugging, operational failures, and complex multi‑step workflows. The role is fully remote and can be performed full‑time or part‑time over a five‑week contract.
Key responsibilities
- Design technically deep debugging and investigation scenarios for AI evaluation systems.
- Develop task specifications that involve infrastructure, pipelines, and operational failure modes.
- Write clear solution approaches and deterministic evaluation criteria.
- Identify realistic edge cases, failure modes, and system constraints.
- Craft multi‑step reasoning challenges across complex technical environments.
- Collaborate with reviewers and researchers to refine benchmark quality and validation logic.
Required profile
- 3‑10 years of experience in software engineering or related technical domains.
- Strong analytical, debugging, and systems‑reasoning abilities.
- Good understanding of system architecture, dependencies, and operational processes.
- Experience with terminal, CLI, automation, or developer‑tooling workflows.
- Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is a plus.
Required skills
- Backend engineering
- Infrastructure
- DevOps
- Data systems
- MLOps
- Cybersecurity
- Platform engineering
- Terminal / CLI
- Automation
- Developer tooling
- AI systems
- Large language models (LLMs)
- Benchmarking
- Evaluation frameworks
Published 6 hours ago
Expires 1 month from now