Language AI has a human-like capacity for self-examination: a recent study by an academic team from the University of California, Berkeley, and Johns Hopkins University showed that it can not only judge whether its own answers are correct but can also be trained to predict the probability that it knows the answer to a question.
The results sparked heated discussion as soon as they were released. Some people's first reaction was panic, while others believe the result has positive implications for neural network research.
Language AI has the ability to self-examine
The research team believes that self-assessment by a language AI model rests on one premise: when the language AI answers a question, its answers must be calibrated.
Calibration here means that the probability the language AI assigns to an answer being correct matches how often such answers actually turn out to be correct. Only then can the language AI use this calibrated ability to assess whether its own output is correct.
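To make the definition concrete, here is a minimal Python sketch of one common way to measure calibration, the expected calibration error. The metric, the function name, and the equal-width binning are illustrative assumptions; the article does not say which measure the research team used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences: predicted probabilities (0.0 to 1.0) that each answer is correct.
    correct:     booleans saying whether each answer really was correct.
    Illustrative metric only; not taken from the study itself.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # A perfectly calibrated bin has mean_conf equal to accuracy.
        ece += (len(bucket) / total) * abs(mean_conf - accuracy)
    return ece
```

A model that says "80%" and is right four times out of five scores close to zero here, while a model that says "90%" and is right one time out of four scores about 0.65.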
So the first question is: can language AI calibrate its own answers? To test this, the research team prepared 5 multiple-choice questions for the AI.
Answer options are given in the form A, B, and C. If the AI model answers correctly more often than chance, this is taken as evidence that its answers are calibrated.
The test showed that the language AI's answers were correct significantly more often than picking an option at random would be. In other words, language AI models can calibrate their answers very well.
However, the research team found that this calibration ability depends on the options being clear-cut. Adding an indeterminate option such as "None of the above" hurts the language AI's ability to calibrate.
That is, language AI models calibrate their answers very well on multiple-choice questions in a suitable format. With this premise established, the next question is whether the language AI model can judge whether its own answer is correct.
In this round of testing, to bring the AI model's predictions closer to its own effective decision boundary, the research team again used the questions from the previous round, along with samples of the language AI model's answers.
The AI model was then asked to judge whether its own answer was true or false, and the team analyzed whether the model was well calibrated on this "true" or "false" judgment.
After 20 true/false tests, the research team found that the language AI model's evaluations of its own answers as "true" or "false" were well calibrated.
That is, when an AI model is asked a range of questions and then evaluates its answers to those questions as true or false, it does so with reasonable, calibrated confidence.
This also shows that language AI models can indeed judge whether their claims about a problem are correct.
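One simple way to score such true/false self-evaluations is the Brier score, sketched below in Python. This is an illustrative choice, not necessarily the measure used in the study, and the name `brier_score` and its inputs are assumptions made for the example.

```python
def brier_score(p_true, is_correct):
    """Mean squared error between P("true") and the actual outcome.

    p_true:     probability the model gave to "my answer is true" for each sample.
    is_correct: booleans saying whether each sampled answer really was correct.
    Lower is better; always answering 0.5 scores exactly 0.25.
    Illustrative metric only; not taken from the study itself.
    """
    if len(p_true) != len(is_correct):
        raise ValueError("p_true and is_correct must have the same length")
    if not p_true:
        raise ValueError("at least one evaluation is required")
    squared_errors = [
        (p - (1.0 if ok else 0.0)) ** 2 for p, ok in zip(p_true, is_correct)
    ]
    return sum(squared_errors) / len(squared_errors)
```

A model whose self-evaluations carry real information should score below the 0.25 that an uninformative "50% true" judgment would earn.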
Finally, the research team posed a harder question for language AI models: can AI models be trained to predict whether they know the answer to any given question?
For this part, the research team introduced a quantity, P(IK) (the probability that "I know" the answer), and chose between the following two training methods:
Value Head: Train P(IK) as the logit from an additional value head added to the model (independent of the logits for language modeling). The advantage of this approach is that the research team can easily probe P(IK) at general token positions.
Natural Language: This method is simpler. It asks the AI model to literally answer "With what probability do you know the answer to this question?" and to output the answer as a percentage.
Early in training, the research team preferred the natural language method, but the results were not significant, so they turned to the value head method. The research team also said, however, that training of the AI model will eventually return to the natural language method.
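The value head idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in: it trains a single logistic unit on fixed hidden-state vectors, whereas the article describes a head added to the language model itself. The class name `ValueHead`, the plain-list hidden states, and the learning rate are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


class ValueHead:
    """Hypothetical P(IK) head: one extra logit computed from a hidden state."""

    def __init__(self, hidden_size):
        self.weights = [0.0] * hidden_size
        self.bias = 0.0

    def logit(self, hidden):
        # Separate from the language-modeling logits: a single scalar output.
        return sum(w * h for w, h in zip(self.weights, hidden)) + self.bias

    def p_ik(self, hidden):
        return sigmoid(self.logit(hidden))

    def train_step(self, hidden, knows_answer, lr=0.1):
        # Gradient of binary cross-entropy with respect to the logit is (p - y).
        error = self.p_ik(hidden) - (1.0 if knows_answer else 0.0)
        self.weights = [w - lr * error * h for w, h in zip(self.weights, hidden)]
        self.bias -= lr * error
```

In this sketch the training label for each question would come from checking whether the model's sampled answers were actually correct, and the head can be read out at any position for which a hidden state is available.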
After training, the research team found that the language AI model could predict P(IK) well, and that this predictive ability partially generalized across different types of problems.
However, the research team also found that on certain types of problems, such as arithmetic problems, language AI models have some difficulty with out-of-distribution (OOD) calibration.
As for future work, the research team stated that the next step is to extend these results to self-learning and factual reasoning in settings where the language AI model does not simply imitate human text.