Senior Engineer - AI Evaluator

G2i Inc.

Job Overview

Location

Remote

Salary

USD 100–200 per hour

Employment Type

Contract

Work Arrangement

Remote

Sector

Information Technology & Software

Experience Level

Senior (5-8 years)

Application Deadline

May 21, 2026

About the Company

G2i Inc. is a platform dedicated to connecting top-tier engineering talent with innovative companies. They specialize in providing highly experienced software engineers for contract roles, focusing on specialized areas like AI evaluation and cutting-edge technology development. Their mission is to facilitate seamless collaboration between businesses and elite developers, ensuring that projects receive the highest caliber of expertise. G2i emphasizes quality and precision in their placements, understanding the critical need for specialized skills in today's rapidly evolving tech landscape. They are committed to fostering an environment where exceptional engineers can contribute to impactful projects.

Job Description

We are seeking a highly experienced Senior Engineer to join our team as an AI Evaluator, focusing on modern coding agents like OpenAI Codex and Claude Code.

This is a unique contract role where you will not be writing production code. Instead, your expertise will be crucial in evaluating the quality of interactions with AI coding assistants, assessing whether they truly think like a great engineer.

Your responsibilities will include evaluating AI-generated coding interactions end-to-end, judging their usefulness, correctness, and alignment with strong engineering judgment. You will assess the quality of explanations and reasoning, distinguish between different levels of response quality, and provide clear, opinionated feedback on what worked, what didn’t, and what felt “off.” You will also help define what constitutes excellent AI interaction, particularly with tools like Cursor.

We are looking for individuals with a strong engineering “taste” – those who can make subjective yet rigorous judgments about whether an AI’s response feels like something a strong engineer would say, if an explanation is helpful, and if the model guides the user effectively. This role requires a high bar for what constitutes good engineering.

Required Skills

  • TypeScript
  • JavaScript
  • Python
  • OpenAI Codex
  • Claude Code
  • Cursor
  • AI-assisted development workflows
  • Code evaluation
  • Engineering judgment
  • Prompt design

Key Responsibilities

  • Evaluate AI-generated coding interactions end-to-end
  • Judge whether outputs are useful, correct (at a high level), and aligned with how a strong engineer would think
  • Assess the quality of explanations and reasoning, not just code
  • Distinguish between different levels of response quality
  • Provide clear, opinionated feedback on what worked, what didn’t, and what felt “off” or misleading
  • Help define what great looks like when interacting with tools like Cursor

Qualifications

  • Staff / Principal-level engineer (or equivalent experience)
  • Strong background in TypeScript / JavaScript or Python
  • Hands-on experience using OpenAI Codex, Claude Code, and Cursor
  • Deep familiarity with modern AI-assisted dev workflows
  • Able to evaluate code without needing to fully execute or deeply review every line
  • Comfortable giving direct, opinionated feedback
  • High bar for what “good engineering” looks like

Benefits & Perks

  • Competitive hourly rate ($100–$200/hour)
  • Flexible schedule of approximately 20+ hours per week
  • Contract duration through early May with possible extension
  • Opportunity to work with cutting-edge AI technology

How to Apply

To apply for this role, click the Apply button on this page and follow the instructions.

The AI evaluation landscape is rapidly evolving, with coding agents like Codex and Claude Code at the forefront of innovation. This role focuses on assessing the nuanced quality of AI-generated coding interactions, moving beyond mere syntax correctness to evaluate the model's engineering judgment and reasoning. You will analyze responses for coherence, the usefulness of their explanations, and alignment with how expert developers think. Your impact will be measured by your contribution to refining AI models, directly influencing their ability to assist developers, enhance productivity, and improve the return on AI development initiatives.

Posted Date

May 7, 2026