In the First Proof project, four AI systems were tested on ten research-level mathematics problems. None of the AI models performed as well as top mathematicians; on average, they scored only 6 out of 10. The test was designed to meet three criteria: it used research-level math problems, it avoided problems present in the models' training data, and it was formally graded by human mathematicians. The results were published on the First Proof website on 10 June. The test follows recent advances in AI, such as a chatbot solving an 80-year-old math problem.
Bias read (Center): The article presents factual information about an AI performance test without taking a stance on the implications or outcomes. It reports on the results objectively, mentioning both the limitations of AI and recent advancements without biased language or emphasis.
