Compression

How NVIDIA Pruned and Distilled Llama 3.1 to Create Minitron 4B and 8B

The new models use state-of-the-art pruning and distillation techniques. I recently started an AI-focused educational newsletter that already has over 170,000 subscribers. TheSe

Large Language Models Just Got A Whole Lot Smaller

And how this might change the game for software startups. This piece was co-written with David Meiborg. *TLDR: Large Language Models (LLMs for short) a
