OpenAI pits AI agents against each other to red team smart contracts

2026/02/19 09:20

OpenAI said it is becoming increasingly important to evaluate the performance of AI agents in “economically meaningful environments” as their adoption grows.

OpenAI has launched a new benchmark that evaluates how well different AI models detect, patch, and even exploit security vulnerabilities found in crypto smart contracts.

OpenAI released the paper “EVMbench: Evaluating AI Agents on Smart Contract Security” on Wednesday, in collaboration with crypto investment firm Paradigm and crypto security firm OtterSec. The paper evaluates how much AI agents could theoretically exploit across a set of 120 smart contract vulnerabilities.

Anthropic’s Claude Opus 4.6 came out on top with an average “detect award” of $37,824, followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro at $31,623 and $25,112, respectively.
