AI Evaluation Engineer – Design Real‑World Benchmark Tasks
Gramian Consulting · Egypt
Job description
About the role
Gramian Consultancy seeks an AI Evaluation Engineer to create realistic, terminal‑based benchmark tasks that assess how large language models reason through debugging, operational failures, and complex multi‑step workflows. The role is fully remote and can be performed full‑time or part‑time over a five‑week contract.
Key responsibilities
- Design technically deep debugging and investigation scenarios for AI evaluation systems.
- Develop task specifications that involve infrastructure, pipelines, and operational failure modes.
- Write clear solution approaches and deterministic evaluation criteria.
- Identify realistic edge cases, failure modes, and system constraints.
- Craft multi‑step reasoning challenges across complex technical environments.
- Collaborate with reviewers and researchers to refine benchmark quality and validation logic.
Required profile
- 3‑10 years of experience in software engineering or related technical domains.
- Strong analytical, debugging, and systems‑reasoning abilities.
- Good understanding of system architecture, dependencies, and operational processes.
- Experience with terminal, CLI, automation, or developer‑tooling workflows.
- Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is a plus.
Required skills
- Backend engineering
- Infrastructure
- DevOps
- Data systems
- MLOps
- Cybersecurity
- Platform engineering
- Terminal / CLI
- Automation
- Developer tooling
- AI systems
- Large language models (LLMs)
- Benchmarking
- Evaluation frameworks