Senior Engineer - AI Evaluator
G2i Inc.
Job Overview
Location
Remote
Salary
USD 100 - 200 hourly
Employment Type
Contract
Work Arrangement
Remote
Sector
Information Technology & Software
Experience Level
Senior (5-8 years)
Application Deadline
May 21, 2026
Job Description
We are seeking a highly experienced Senior Engineer to join our team as an AI Evaluator, focusing on modern coding agents like OpenAI Codex and Claude Code.
This is a unique contract role where you will not be writing production code. Instead, your expertise will be crucial in evaluating the quality of interactions with AI coding assistants, assessing whether they truly think like a great engineer.
Your responsibilities will include evaluating AI-generated coding interactions end-to-end, judging their usefulness, correctness, and alignment with strong engineering judgment. You will assess the quality of explanations and reasoning, distinguish between different levels of response quality, and provide clear, opinionated feedback on what worked, what didn’t, and what felt “off.” You will also help define what constitutes excellent AI interaction, particularly with tools like Cursor.
We are looking for individuals with a strong engineering “taste” – those who can make subjective yet rigorous judgments about whether an AI’s response feels like something a strong engineer would say, if an explanation is helpful, and if the model guides the user effectively. This role requires a high bar for what constitutes good engineering.
Key Responsibilities
- Evaluate AI-generated coding interactions end-to-end
- Judge whether outputs are useful, correct (at a high level), and aligned with how a strong engineer would think
- Assess the quality of explanations and reasoning, not just code
- Distinguish between different levels of response quality
- Provide clear, opinionated feedback on what worked, what didn’t, and what felt “off” or misleading
- Help define what great looks like when interacting with tools like Cursor
Qualifications
- Staff / Principal-level engineer (or equivalent experience)
- Strong background in TypeScript / JavaScript or Python
- Hands-on experience using OpenAI Codex, Claude Code, and Cursor
- Deep familiarity with modern AI-assisted dev workflows
- Able to evaluate code without needing to fully execute or deeply review every line
- Comfortable giving direct, opinionated feedback
- High bar for what “good engineering” looks like
Benefits & Perks
- Competitive hourly rate ($100–$200/hour)
- Flexible schedule of roughly 20+ hours per week
- Contract duration through early May with possible extension
- Opportunity to work with cutting-edge AI technology
How to Apply
To apply for this role, click the Apply button on this page and follow the instructions.
The AI evaluation landscape is evolving rapidly, with coding agents like Codex and Claude Code at the forefront of innovation. This role focuses on assessing the nuanced quality of AI-generated coding interactions, moving beyond mere syntax correctness to evaluate a model's engineering judgment and reasoning. You will analyze responses for coherence, the usefulness of preambles and explanations, and alignment with how expert developers think. Your impact will be measured by your contribution to refining AI models, directly influencing their ability to assist developers, boost productivity, and improve the return on AI development initiatives.
Posted Date
May 7, 2026