AI Evaluation Engineer - Mathematics & Algorithms

contract, engineering, AI, data, remote from 🇧🇷
Open to candidates in: Pakistan, Egypt, Kenya, Ghana, Nigeria, Brazil
Gramian Consulting Group
🏭 IT Services and IT Consulting
πŸ“ Kumanovo, MK
πŸ‘€ 2-10

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for a highly analytical and computationally strong professional with a solid research background in mathematics or quantitative fields.

In this role, you will design advanced benchmark tasks for multi-agent AI systems, focusing on complex mathematical reasoning, algorithmic problem-solving, and verifiable computational outputs. You will contribute by crafting challenging problems, building validation systems, and structuring tasks that require decomposition into coordinated sub-solutions.

Commitment required: 8 hours per day, including 4 hours of overlap with PST.

Employment type: Contractor assignment (no medical/paid leave)

Duration of contract: 4 weeks+

Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam

Interview: take-home assessment (60 min) + short interview

Responsibilities

  • Design and build multi-agent benchmark tasks requiring multi-step mathematical reasoning and algorithmic problem-solving
  • Create complex, decomposable problems across domains such as:
    • Competition mathematics
    • Numerical analysis
    • Combinatorial optimization
    • Statistical inference
  • Develop verification scripts to validate:
    • Numerical outputs (with tolerance thresholds)
    • Proof correctness and logical steps
    • Algorithmic outputs and constraints
  • Write clear, structured problem statements with precise notation and defined outputs
  • Design task decomposition strategies for parallel or multi-agent execution
  • Implement computational solutions and validation pipelines using Python
  • Work with containerized environments (Docker) for reproducibility and evaluation

Requirements

  • 5+ years in mathematics, quantitative research, or computational science, with a background in competition math, university-level mathematics, or quantitative research
  • Python programming: NumPy, SciPy, or symbolic computation (SymPy)
  • Experience writing mathematical proofs or formal derivations
  • Ability to create problems with precise, verifiable answers, not subjective or open-ended ones
  • Experience with AI coding benchmarks (SWE-bench, Terminal-bench)
  • Comfortable with Docker: writing Dockerfiles, building images, and debugging container issues
  • Understanding of numerical methods: floating-point tolerance, convergence criteria, error bounds

Nice to Have

  • Experience creating competition math problems (AMC, AIME, Putnam, IMO)
  • Background in theoretical computer science or advanced mathematics research
  • Exposure to automated theorem proving or formal verification
  • Familiarity with AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI)
  • Experience in large-scale numerical or scientific computing