Can OpenAI's Strawberry program fool people?

OpenAI, the company that developed ChatGPT, has launched a new artificial intelligence (AI) system called Strawberry. It is designed not only to provide quick answers to questions, as ChatGPT does, but also to think things through, or "reason".

This raises some major concerns. If Strawberry really is capable of some form of reasoning, could this AI system cheat and deceive people?

OpenAI can program the AI in ways that limit its ability to manipulate people. However, the company's own evaluations rate it as "medium risk" for its ability to support experts in the "operational planning of reproducing a known biological threat" – in other words, a biological weapon. It was also rated medium risk for its ability to persuade people to change their thinking.

It remains to be seen how such a system might be used by people with malicious intent, such as fraudsters or hackers. Even so, OpenAI's evaluation states that medium-risk systems can be released for wider use – a position I believe is misguided.

Strawberry is not one AI “model” or program, but several – collectively known as o1. These models are intended to answer complex questions and solve complicated mathematical problems. They are also able to write computer code – for example, to help you create your own website or app.

The apparent ability to reason might be surprising to some, as it is widely seen as a precursor to judgment and decision-making – something that has often seemed a distant goal for AI. At least on the surface, it appears that artificial intelligence is one step closer to human-like intelligence.

When something looks too good to be true, there is often a catch. This new set of AI models is designed to maximize its goals. What does this mean in practice? To achieve the desired goal, the path or strategy chosen by the AI may not always be fair, or consistent with human values.

True intentions

For example, if you were playing chess against Strawberry, could its reasoning in theory allow it to hack the scoring system rather than work out the best strategies for winning the game?

The AI might also be able to lie to people about its true intentions and capabilities, which would pose a serious safety risk if it were widely deployed. For example, if the AI knew it was infected with malware, could it choose to conceal that fact, knowing that a human operator might opt to disable the whole system if they were aware?

Strawberry goes a step beyond the capabilities of AI chatbots. Robert Way/Shutterstock

These would be classic examples of unethical AI behavior, where cheating or deception is acceptable if it leads to the desired goal. It would also be quicker for the AI, since it wouldn't have to waste time working out the next best move. It would not, however, necessarily be morally right.

This leads to a rather interesting but also worrying discussion. What level of reasoning is Strawberry capable of and what might the unintended consequences be? A powerful AI system that can deceive humans could pose serious ethical, legal and financial risks to us.

Such risks become serious in critical situations, for example in the development of weapons of mass destruction. OpenAI classifies its own Strawberry models as “medium risk” due to their potential to help scientists develop chemical, biological, radiological and nuclear weapons.

OpenAI says: "Our evaluations have shown that o1-preview and o1-mini can help experts operationally plan the reproduction of a known biological threat." But it goes on to say that experts in these areas already have significant expertise, so the risk is limited in practice. It continues: "The models do not enable non-experts to create biological threats because creating such a threat requires practical laboratory knowledge, which the models cannot replace."

Persuasiveness

OpenAI's evaluation of Strawberry also examined the risk that it could persuade people to change their beliefs. The new o1 models proved to be more persuasive and more manipulative than ChatGPT.

OpenAI also tested a mitigation system that could reduce the manipulative capabilities of the AI system. Overall, Strawberry was rated as a medium risk for "persuasion" in OpenAI's testing.

Strawberry was rated low risk for its ability to operate autonomously and for cybersecurity.

OpenAI's policy states that "medium risk" models can be released for widespread use. In my opinion, this underestimates the threat. The use of such models could have disastrous consequences, especially if malicious actors manipulate the technology for their own purposes.

This requires strong checks and balances that will only be possible through AI regulation and legal frameworks, such as penalizing false risk assessments and misuse of AI.

The UK government emphasized the need for "security and robustness" in its 2023 AI White Paper, but this is far from enough. There is an urgent need to prioritize human safety and develop strict audit protocols for AI models like Strawberry.

Shweta Singh, Assistant Professor of Information Systems and Management, Warwick Business School, University of Warwick

This article is republished from The Conversation under a Creative Commons license. Read the original article.
