My LLM’s outputs got 1000% better with this simple trick.

I wish I had known this trick sooner.

When I interned at Adobe Research (Bangalore) last summer, my job was to make open-source LLMs more aligned with the context: no matter what the provided context said, the LLM needed to abide by it.

I tried a method that looked at the input token activations and used existing patterns in them to identify tokens that appear in the context, boosting those more than the other ones. This is called a “logit transformation”. Sometimes, logit transformations can go wrong, pushing low-probability tokens above all the other tokens.
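Here’s a minimal toy sketch of that failure mode (illustrative only: the three-token vocabulary and the per-token boosts are made up, not the actual method I used):

```python
import torch

# Toy vocabulary: 0 = "Seattle", 1 = "Olympia", 2 = "eek" (a junk
# subword that happens to appear somewhere in the context window).
logits = torch.tensor([5.0, 4.0, -3.0])

# Hypothetical per-token boosts derived from activation patterns.
# A miscalibrated boost can be largest for exactly the wrong token.
boosts = torch.tensor([0.0, 3.0, 12.0])

transformed = logits + boosts
print(logits.argmax().item())       # 0 -> "Seattle" (ignores the context)
print(transformed.argmax().item())  # 2 -> "eek" (a junk token now wins)
```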

In the example above, the context says that Olympia is the capital of Washington.

Clearly, without any transformation, the output is “Seattle”.

With the transformation, the output is “eek”.

Neither answer is correct.

Let’s not get into the details of the transformation right now; you can read more about it through this link:

But you might’ve guessed what happened:

My outputs ended up totally garbled.

Example output: “The capital of Washington iseekek0q3n ee”

I was stuck for a while and didn’t know what to do.

The first thing I tried was reducing the magnitude by which tokens were boosted. While that made the outputs less garbled, it also weakened the context alignment I was trying to achieve. It was almost like I had to trade context alignment against the garbled-ness of my outputs.

But after what I tried next, the garbling was completely resolved, without affecting the context alignment at all.

The Trick

I simply filtered out the tokens with very low probability.

That turned out to completely eliminate the garbled outputs, while still letting me improve the context alignment of the outputs.

In the end, my method improved context alignment slightly while maintaining the fluency and grammatical correctness of the outputs.

Filtering function

This is how the overall function looks when you take both the filtering and the logit transformation into consideration.

Assume qN(x) is the output distribution that we’re trying to modify, and qM(x) is a second distribution used for contrast. The modification in this case is the logarithm of qN over qM. Just consider this as some function that changes the output distribution and makes the LLM more “truthful”.
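Written out, the combined function looks something like this (my reconstruction from the description; alpha is the filtering fraction defined below, and the max runs over the whole vocabulary):

F(x) = log( qN(x) / qM(x) )    if qN(x) >= alpha * max_w qN(w)
F(x) = -infinity               otherwise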

Now, the filtering is taken care of by setting a token’s logit to -infinity if its probability is less than a threshold. Highly unlikely tokens like “eek” right after “Washington” are therefore removed. Remember that the softmax we apply to the logits to get a probability distribution uses an exponential function, so setting any logit to -infinity is equivalent to setting that token’s probability to zero.
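In PyTorch, that looks something like this (a minimal sketch; the function name and the alpha default of 0.1 are my own choices, not anything official):

```python
import torch

def filter_low_probability_tokens(logits: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """Set logits to -inf for tokens whose probability is below
    alpha * (probability of the most likely token)."""
    probs = torch.softmax(logits, dim=-1)
    threshold = alpha * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

# Same toy distribution as before: "Seattle", "Olympia", "eek".
logits = torch.tensor([5.0, 4.0, -3.0])
filtered = filter_low_probability_tokens(logits, alpha=0.1)
print(filtered)  # tensor([5., 4., -inf])
```

Since -infinity plus any finite boost is still -infinity, applying the transformation after this filter means a token like “eek” can never be boosted to the top.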

The crucial part is how the filtering threshold is defined.

Essentially, the threshold is some fraction of the probability of the most likely next token. That probability can vary a lot depending on how many tokens are similarly likely to be predicted next, which is why we can’t use a fixed threshold and take a fraction instead.

Having this specific adaptive form of the filtering function mattered a lot in practice, since neither of the following worked as well (the sketch after this list shows why):

  • A fixed number of highest-probability tokens (e.g., the top 10 tokens)
  • A fixed probability threshold (e.g., 0.1)
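Here’s a toy comparison of the adaptive cutoff with a fixed one (made-up distributions): on a flat distribution, a fixed cutoff can eliminate every token, while the adaptive cutoff scales down and keeps all the plausible ones.

```python
import torch

def kept_adaptive(logits, alpha=0.1):
    """Count tokens surviving the adaptive cutoff: alpha * max probability."""
    probs = torch.softmax(logits, dim=-1)
    return (probs >= alpha * probs.max()).sum().item()

def kept_fixed(logits, threshold=0.1):
    """Count tokens surviving a fixed probability cutoff."""
    probs = torch.softmax(logits, dim=-1)
    return (probs >= threshold).sum().item()

peaked = torch.tensor([8.0, 2.0, 1.0, 0.5, 0.0])  # one obvious next token
flat = torch.zeros(20)                            # 20 equally plausible tokens

print(kept_adaptive(peaked), kept_fixed(peaked))  # 1 1   -- both agree here
print(kept_adaptive(flat), kept_fixed(flat))      # 20 0  -- fixed cutoff removes everything
```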

Concluding thoughts and limitations

This is a pretty interesting application of a fairly simple technique, and it has far-reaching consequences for the output quality of LLMs. In particular, transformations like these often only make sense for high-probability tokens, so the low-probability tokens need to be eliminated before transforming the distributions.

I find these decoding approaches an exciting new way to change LLM behaviour, but even with a filtering step like this, it’s important to realise that methods involving transformations of logits have limitations. While this filtering approach worked well to resolve the garbled outputs, the filtering threshold required to keep the outputs fluent might differ across prompts, which makes a standardised filter hard to develop.

Even if the filtering approach works in most cases, it would be hard to prove that it works in ALL cases; we would need more confidence in it before adopting it in a more commercial application.

If you want to know another method that used this filtering function, you might find this blog interesting:

Follow me: LinkedIn | X (Twitter) | Website
