
Transformer alternative

A 7.3B parameter Mamba-based model designed for code and reasoning tasks. Linear time inference, allowing for theoretically infinite sequence lengths. 256k token context window. Optimized for qu...
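The linear-time claim comes from the Mamba architecture's fixed-size recurrent state: each token updates a state of constant size instead of growing a key/value cache. The following is a toy sketch of a diagonal state-space recurrence to illustrate that property; it is not Mamba's actual selective-scan implementation, and all names in it are illustrative.

```python
# Toy sketch (not Mamba's selective-scan kernel): a diagonal state-space
# recurrence showing why per-token inference cost and memory stay constant.
# The state h has a fixed size no matter how long the sequence grows,
# unlike a transformer's key/value cache, which grows with sequence length.
import numpy as np

def ssm_generate(x_seq, a, b, c):
    """Run the linear recurrence h_t = a * h_{t-1} + b * x_t, y_t = c . h_t."""
    d_state = a.shape[0]
    h = np.zeros(d_state)          # fixed-size recurrent state
    outputs = []
    for x_t in x_seq:              # one O(d_state) update per token
        h = a * h + b * x_t        # elementwise (diagonal) state update
        outputs.append(float(c @ h))
    return outputs

# 10,000 scalar "tokens" processed with a constant-size state.
rng = np.random.default_rng(0)
d_state = 16
a = np.full(d_state, 0.9)          # per-channel decay
b = rng.normal(size=d_state)
c = rng.normal(size=d_state)
ys = ssm_generate(rng.normal(size=10_000), a, b, c)
print(len(ys), "outputs; state size stayed at", d_state)
```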

Mistral: Codestral Mamba
MistralAI
256K context $0.25/M input tokens $0.25/M output tokens
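Below is a minimal sketch of calling this model through an OpenAI-compatible chat-completions endpoint (OpenRouter-style). The model ID `mistralai/codestral-mamba` and the `OPENROUTER_API_KEY` environment variable are assumptions; check the listing for the exact identifier before use.

```python
# Hedged example: request to an OpenAI-compatible chat-completions endpoint.
# The model ID and env var name below are assumptions, not confirmed values.
import os
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/codestral-mamba",   # assumed model ID
        "messages": [
            {"role": "user",
             "content": "Write a Python function that reverses a linked list."},
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because pricing is symmetric ($0.25 per million tokens for both input and output), total cost is simply 0.25 × (total tokens / 1,000,000) regardless of the prompt/completion split.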