Researchers from the Center for AI Safety and Scale AI have created a new evaluation called “Humanity’s Last Exam” in response to the ongoing challenge of testing the intelligence of A.I. systems. The test consists of 3,000 difficult questions spanning a wide range of academic subjects and aims to measure how well A.I. systems can handle demanding intellectual work.
Questions were submitted by experts in fields ranging from analytic philosophy to rocket engineering and then posed to leading A.I. models. The results were humbling: no model scored well, with even the strongest answering fewer than one in ten questions correctly. Still, the researchers believe that, given time, A.I. systems could surpass human experts at answering questions this challenging.
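To make the setup concrete, an exam of this kind reduces to a simple loop: pose each question to the model, grade the response against a reference answer, and aggregate accuracy by subject. Below is a minimal sketch of that loop in Python; the query_model function, the Question record, and the exact-match grading are hypothetical stand-ins for illustration, not the benchmark’s actual harness.

```python
# Minimal sketch of an exam-style evaluation loop.
# `query_model`, `Question`, and exact-match grading are hypothetical
# stand-ins, not the benchmark's actual API or grading scheme.

from dataclasses import dataclass


@dataclass
class Question:
    subject: str   # e.g. "analytic philosophy"
    prompt: str    # the question text shown to the model
    answer: str    # reference answer used for grading


def query_model(prompt: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    raise NotImplementedError


def evaluate(questions: list[Question]) -> dict[str, float]:
    """Return per-subject accuracy under simple exact-match grading."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for q in questions:
        total[q.subject] = total.get(q.subject, 0) + 1
        prediction = query_model(q.prompt)
        if prediction.strip().lower() == q.answer.strip().lower():
            correct[q.subject] = correct.get(q.subject, 0) + 1
    return {s: correct.get(s, 0) / n for s, n in total.items()}
```

In practice, grading free-form answers requires far more care than naive string matching, which is part of what makes building an exam like this difficult.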
The creation of “Humanity’s Last Exam” was inspired by Elon Musk’s concerns that the existing tests given to A.I. models had become too easy. While other benchmarks measure A.I. capabilities in specific domains, this exam is distinctive in attempting to produce a single, general measure of performance across a wide range of academic subjects.
Overall, the research sheds light on the complex and jagged nature of A.I. progress, highlighting the need for more creative methods of tracking and evaluating A.I. systems. Even when A.I. models can answer difficult exam questions, they may still struggle to apply that knowledge in real-world situations that demand the kind of sustained critical thinking and problem-solving a standardized test cannot capture.