Quiet-STaR teaches language models to think before they speak

March 23, 2024

Researchers at Stanford University and Notbad AI have developed Quiet-STaR, a method that teaches a language model (LM) to reason internally before generating output.

When people speak, we typically have an internal dialogue that shapes the words we ultimately say aloud. The more we think before we speak, the better the quality of our spoken words.

In their paper, the researchers describe how they taught an LM (Mistral-7B) to mimic this process in a generalized way. Quiet-STaR builds on an earlier technique called STaR, or Self-Taught Reasoner.

STaR is a technique for training a model on a few sample questions that come with explanations (rationales) for the answers. The model uses these chain-of-thought examples to try to answer questions by itself, working out the rationales on its own.

STaR evaluates whether the rationales it generates lead to correct answers and refines them accordingly.
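The filtering idea behind STaR can be illustrated with a minimal Python sketch. This is not the authors' code: `star_filter`, `toy_generate`, and the canned examples are hypothetical names and data introduced only for illustration, and a real system would sample rationales from a language model and then fine-tune on the rationales that were kept.

```python
def star_filter(examples, generate):
    """Keep only the rationales whose final answer matches the known answer.

    `examples` is a list of (question, gold_answer) pairs and `generate` is any
    callable returning (rationale, answer) for a question. Both are
    assumptions made for this sketch.
    """
    kept = []
    for question, gold_answer in examples:
        rationale, answer = generate(question)
        if answer == gold_answer:
            # A rationale that led to the right answer becomes training data.
            kept.append((question, rationale, answer))
    return kept


def toy_generate(question):
    """Stand-in for a language model: returns a canned rationale and answer."""
    canned = {
        "2 + 3": ("Two plus three is five.", "5"),
        "4 * 2": ("Four plus two is six.", "6"),  # wrong reasoning, wrong answer
    }
    return canned[question]


examples = [("2 + 3", "5"), ("4 * 2", "8")]
kept = star_filter(examples, toy_generate)
```

Only the first rationale survives the filter here, because the second one produces an incorrect answer.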

As impressive as STaR is, its reasoning ability is limited to the question-answering (QA) contexts it sees during training. The goal of Quiet-STaR is to give an LM a general ability to learn how to reason across a broader range of text, not only QA datasets.

How does Quiet-STaR work?

Language models today are trained to reason either 1) generally, by imitating online reasoning data, or 2) narrowly, by teaching themselves to solve specific tasks.

Can LMs teach themselves to reason generally? 🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵 pic.twitter.com/WCSxLPZeCX

— Eric Zelikman (@ericzelikman) March 15, 2024

One of the key innovations in Quiet-STaR is that it generates rationales, or thoughts, in parallel, following every token in the text it processes. These chains of thought are not output, hence the “Quiet” part of the algorithm's name.

The algorithm passes the rationales through a “mixing head”. Each rationale is evaluated on how accurately it predicts the next token compared with the base model's prediction.
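One way to picture the mixing step is as an interpolation between the next-token scores produced with and without a thought. The sketch below rests on that assumption: `mix_predictions` and its fixed `weight` argument are hypothetical names, and in a trained system the weight would be produced by a learned component rather than passed in by hand.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_predictions(base_logits, thought_logits, weight):
    """Blend base and thought-conditioned next-token logits.

    weight = 0.0 ignores the thought entirely; weight = 1.0 trusts it fully.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    mixed = [
        (1.0 - weight) * b + weight * t
        for b, t in zip(base_logits, thought_logits)
    ]
    return softmax(mixed)


# Toy two-token vocabulary: the base model prefers token 0, the thought prefers token 1.
base_only = mix_predictions([2.0, 0.0], [0.0, 2.0], weight=0.0)
balanced = mix_predictions([2.0, 0.0], [0.0, 2.0], weight=0.5)
```

With a weight of 0 the thought has no influence on the prediction, which gives the model an easy way to ignore thoughts that do not help.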

If the base model (without Quiet-STaR) produces a better prediction, then the rationale was not a good one. If the rationale leads to a more accurate prediction of the next token, the algorithm knows it is on the right track.
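The comparison described above can be written as a simple score: how much more probability the true next token receives when the thought is used. The function below is an illustrative simplification of that description, not the paper's exact formulation, and `thought_reward` and its parameters are hypothetical names.

```python
import math


def thought_reward(p_true_with_thought, p_true_base):
    """Log-probability gain on the true next token from using a thought.

    Positive: the thought helped. Negative: the base model did better.
    Both arguments are probabilities in (0, 1].
    """
    return math.log(p_true_with_thought) - math.log(p_true_base)


helpful = thought_reward(0.6, 0.3)   # the thought doubled the probability
harmful = thought_reward(0.1, 0.3)   # the thought made the prediction worse
neutral = thought_reward(0.5, 0.5)   # the thought changed nothing
```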

A reinforcement learning algorithm (REINFORCE) is then used to learn which rationales help and which hinder the model's performance. The result is that the model learns a general ability to reason before predicting the next token.
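To show the shape of a REINFORCE update, here is a toy sketch in which a policy chooses between two candidate thoughts, one that helps (reward +1) and one that hurts (reward -1). Everything here is an assumption made for illustration: the two-thought setup, the reward values, the learning rate, and the function names. The real method operates on a full language model rather than two logits.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.1):
    """One REINFORCE update: move logits along reward * grad log pi(action).

    For a softmax policy, the gradient of log pi(action) with respect to
    logit i is (1 if i == action else 0) - pi(i).
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train(rewards, steps=200, seed=0):
    """Sample a thought, observe its reward, and update the policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        logits = reinforce_step(logits, action, rewards[action])
    return softmax(logits)


# Thought 0 is helpful (+1), thought 1 is harmful (-1).
final_probs = train(rewards=[1.0, -1.0])
```

After training, the policy assigns most of its probability to the helpful thought, which is the behaviour the article describes: thoughts that improve predictions are reinforced and those that hurt are discouraged.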

Quiet-STaR results

The researchers tested the Quiet-STaR-trained Mistral-7B model against the GSM8K math and CommonsenseQA common-sense reasoning benchmarks. They found that Quiet-STaR improved perplexity and zero-shot direct reasoning ability on both the CommonsenseQA (36.3% to 47.2%) and GSM8K (5.9% to 10.9%) benchmarks.

Quiet-STaR results on the GSM8K grade-school math and CommonsenseQA common-sense benchmarks. Each line represents an iteration of Quiet-STaR with a different number of thought tokens and tokens ahead. The baseline is Mistral-7B without Quiet-STaR. Source: arXiv

While Mistral-7B's mathematical reasoning is still not great, Quiet-STaR delivered an improvement of almost 85% over the base model, without any dataset-specific fine-tuning.
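The figure of roughly 85% is a relative improvement, not a gain in percentage points. It can be checked from the GSM8K scores quoted above (5.9% to 10.9%); the helper name below is just for illustration.

```python
def relative_improvement(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


gsm8k_gain = relative_improvement(5.9, 10.9)   # about 84.7
csqa_gain = relative_improvement(36.3, 47.2)   # about 30.0
```

By the same calculation, the CommonsenseQA gain (36.3% to 47.2%) works out to a relative improvement of about 30%.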

The test results also showed that the performance improvements were directly related to how many tokens were allocated to the model's internal thoughts. The more the model thinks before answering, the better the answer.

These improvements come at a significant computational cost. The internal monologue that the model carries on while thinking generates a lot of tokens.

As hardware improves, the extra overhead that comes with techniques like these will eventually matter less.

The researchers conclude that future work on optimizing Quiet-STaR could also help. Dynamically predicting whether a thought is needed, or how long it should be, could cut down on unnecessary thought tokens.

The results of training a small model like Mistral-7B with Quiet-STaR are promising. The researchers believe that “the same techniques applied to a better model would likely produce disproportionately better results.”

Ethical issues

Getting a language model to reason more like a human raises some interesting problems and ethical questions.

The researchers note that “it's impossible to know whether the reasoning expressed by the model in language accurately reflects the model's internal processing.” The rationales that the model generates are natural-language representations of its internal reasoning. Are they an accurate reflection?

They also note that “there are no safeguards against harmful or biased thought patterns when the model deems them useful.”

We may be perfectly happy with an AI model's answer, but we may not like, or even understand, the thought process that produced it.

One of the paper's lead authors, Eric Zelikman, just joined Elon Musk's xAI this week. He may find that Grok is less concerned with these ethical questions and more excited about the prospect of advances in AI.
