AI struggles with Sudoku. More worrying is that it can't explain why


By [email protected]


Chatbots can be genuinely impressive when you watch them do things they're good at, like writing realistic-sounding text or creating strange, futuristic-looking images. But try asking generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.

That's what researchers at the University of Colorado Boulder found when they challenged various large language models to solve Sudoku. And not even the standard 9x9 puzzles. An easier 6x6 puzzle was often beyond the LLMs' capabilities without outside help (in this case, specialized puzzle-solving tools).


The more important finding came when the models were asked to show their work. For the most part, they couldn't. Sometimes they fabricated steps. Sometimes they explained things in ways that made no sense. Sometimes they veered off entirely and started talking about the weather.

Ashutosh Trivedi, a professor of computer science at the University of Colorado Boulder, is one of the authors of a paper on the findings, published in July by the Association for Computational Linguistics.

"We would really like those explanations to be transparent and reflect why the AI made that decision," Trivedi said.

When you make a decision, you can at least try to justify it or explain how you arrived at it. That's a foundational element of society: we're held accountable for the decisions we make. An AI model may not be able to explain itself accurately or transparently. Would you trust it?

Why LLMs struggle with Sudoku

We've seen AI models fail at basic games and puzzles before. OpenAI's ChatGPT (among others) has been thoroughly crushed at chess by a 1979 Atari computer game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.

It comes down to the way LLMs work: by filling gaps in information. These models try to complete those gaps based on what happens in similar cases in their training data or other things they've seen in the past. With Sudoku, the question is one of logic. The AI might try to fill each gap in order, based on what seems like a reasonable answer, but to solve the puzzle properly, it instead has to look at the entire picture and find a logical order that changes from puzzle to puzzle.
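That "look at the entire picture" requirement is what programmers solve with backtracking: commit to a guess, follow its consequences, and undo it when a contradiction appears somewhere else on the grid. A minimal sketch for the 6x6 variant the researchers used (2x3 boxes; all function names here are illustrative, not from the paper):

```python
# Minimal sketch: why Sudoku needs global search, not cell-by-cell guessing.
# Backtracking solver for a 6x6 grid with 2x3 boxes; 0 marks an empty cell.

def candidates(grid, r, c):
    """Digits 1-6 not already used in row r, column c, or the 2x3 box."""
    used = set(grid[r]) | {grid[i][c] for i in range(6)}
    br, bc = (r // 2) * 2, (c // 3) * 3          # top-left corner of the box
    used |= {grid[br + i][bc + j] for i in range(2) for j in range(3)}
    return [d for d in range(1, 7) if d not in used]

def solve(grid):
    """Fill zeros by depth-first search, undoing choices that dead-end."""
    for r in range(6):
        for c in range(6):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                    grid[r][c] = 0   # backtrack: a locally fine guess failed globally
                return False         # no digit fits here; an earlier guess was wrong
    return True                      # no empty cells left: solved
```

The `grid[r][c] = 0` line is the part a purely next-token guesser has no analogue for: the solver explicitly retracts an answer that looked reasonable in isolation once the rest of the grid rules it out.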


Chatbots are bad at chess for a similar reason. They find appropriate next moves, but they don't necessarily think three, four or five moves ahead, which is the basic skill needed to play chess well. Chatbots also tend to move chess pieces in ways that don't actually follow the rules, or to put pieces in meaningless jeopardy.
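"Thinking several moves ahead" is, in game-playing programs, usually implemented as depth-limited search: evaluate your move by asking what the opponent's best reply would be, recursively. A toy sketch using Nim rather than chess, purely for illustration (the names and the `depth=5` default are assumptions, not anything from the article):

```python
# Toy illustration of lookahead: depth-limited negamax for the game of Nim.
# Players alternately take 1-3 stones; whoever takes the last stone wins.

def best_move(stones, depth=5):
    """Return (score, move): score +1 if the side to move can force a win
    within `depth` plies, -1 if it cannot, 0 if the search horizon is hit."""
    if stones == 0:
        return -1, None          # previous player took the last stone: we lost
    if depth == 0:
        return 0, None           # horizon reached: stop looking ahead
    best = (-2, None)
    for take in (1, 2, 3):
        if take <= stones:
            score, _ = best_move(stones - take, depth - 1)
            if -score > best[0]:  # our score is the negation of the opponent's
                best = (-score, take)
    return best
```

With 3 stones the search finds the winning move (take all 3); with 4 stones it correctly reports that every move loses to a perfect opponent, something a program that only rates the immediate next move can never see.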

You might expect LLMs to be able to solve Sudoku because they're computers and the puzzle consists of numbers, but the puzzles themselves aren't really mathematical; they're symbolic. "Sudoku is famous for being a puzzle with numbers that could be done with anything that's not numbers," said Fabio Somenzi, a CU professor and one of the paper's authors.
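Somenzi's point is easy to demonstrate: the only rule Sudoku ever applies to a row, column or box is "all different," and that check never does arithmetic on the symbols, so letters work exactly as well as digits. A tiny sketch (the function name is illustrative):

```python
# Sudoku's rules never compute with the symbols: a row, column or box is
# valid as long as all of its entries are distinct. Digits or letters both work.

def all_different(cells):
    """True if no symbol repeats -- the only 'math' Sudoku ever needs."""
    return len(set(cells)) == len(cells)

assert all_different([1, 2, 3, 4, 5, 6])
assert all_different(["A", "B", "C", "D", "E", "F"])  # letters work just as well
assert not all_different([1, 2, 2, 4, 5, 6])
```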

I took a sample prompt from the researchers' paper and gave it to ChatGPT. The tool showed its work, repeatedly telling me an answer before displaying a grid that didn't work, then going back and correcting it. It was as if the bot were turning in a presentation that kept getting last-second edits: This is the final answer. No, actually, never mind, this is the final answer. It got the answer eventually, through trial and error. But trial and error isn't a practical way for a person to solve a Sudoku in the newspaper. That's far too much erasing, and it ruins the fun.

A robot plays chess against a person.

AI and robots can be good at games if they're built to play them, but general-purpose tools like large language models can struggle with logic puzzles.

Ore Huiying/Bloomberg via Getty Images

AI struggles to show its work

The Colorado researchers didn't just want to know whether the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.

Testing an OpenAI reasoning model, the researchers found that the explanations, even for puzzles solved correctly, didn't accurately explain or justify the model's moves and got basic terms wrong.

"One thing they're good at is providing explanations that seem reasonable," said Maria Pacheco, an assistant professor of computer science at CU. "They align to humans, so they learn to speak the way we like, but whether they're faithful to the actual steps needed to solve the thing is where we're struggling a little bit."

Sometimes, the explanations were completely beside the point. Since the paper's work was finished, the researchers have continued testing new models as they're released. Somenzi said that when he and Trivedi ran a newer OpenAI reasoning model through the same tests, at one point it seemed to give up entirely.

"The next question that we asked, the answer was the weather forecast for Denver," he said.

(Disclosure: Ziff Davis, CNET's parent company, has filed a lawsuit against OpenAI, alleging that it infringed Ziff Davis copyrights in training and operating its AI systems.)

Explaining yourself is an important skill

When you solve a puzzle, you can almost certainly walk someone else through your thinking. That these LLMs failed so spectacularly at this basic task isn't a trivial problem. With AI companies constantly talking about "AI agents" that can take actions on your behalf, being able to explain yourself is essential.

Consider the kinds of jobs being given to AI now, or planned for the near future: driving, doing taxes, deciding business strategies, translating important documents. Imagine what would happen if you, a person, did one of these things and something went wrong.

"When humans have to put their face in front of their decisions, they'd better be able to explain what led to that decision," Somenzi said.

It isn't just a matter of getting a reasonable-sounding answer; the answer needs to be accurate. One day, an AI's explanation of itself may have to hold up in court, but how can its testimony be taken seriously if it's known to lie? You wouldn't trust a person who failed to explain themselves, and you also wouldn't trust someone you knew was telling you what you wanted to hear instead of the truth.

"An explanation comes very close to manipulation if it is done for the wrong reason," Trivedi said. "We have to be very careful with respect to the transparency of these explanations."




