Transformer alternative
A 7.3B parameter Mamba-based model designed for code and reasoning tasks.

- Linear-time inference, allowing for theoretically infinite sequence lengths
- 256k token context window
- Optimized for qu...
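To see why a Mamba-style model gives linear-time inference, note that a state-space recurrence carries a fixed-size hidden state, so each new token costs a constant amount of work no matter how long the preceding sequence is (unlike a transformer, whose attention over the growing context makes generation cost scale with sequence length). The sketch below is a deliberately simplified, hypothetical diagonal recurrence for illustration; real Mamba layers use input-dependent (selective) parameters and hardware-aware scans.

```python
# Minimal sketch of linear-time sequence processing with a
# fixed-size state. Hypothetical toy recurrence, not Mamba itself:
#   h_t = a * h_{t-1} + b * x_t      (elementwise, diagonal "A")
#   y_t = c * sum(h_t)
# The state h never grows with sequence length, so per-token cost
# is O(d) — constant — rather than O(t) as in attention.

def ssm_generate(xs, a=0.9, b=0.5, c=1.0, d=4):
    """Process a token stream with a fixed-size state of d floats."""
    h = [0.0] * d                      # hidden state: size never grows
    ys = []
    for x in xs:                       # one pass, O(1) state update per token
        h = [a * hi + b * x for hi in h]
        ys.append(c * sum(h))
    return ys

# An impulse followed by silence: the output decays geometrically,
# showing the state summarizes all past input in constant space.
ys = ssm_generate([1.0, 0.0, 0.0])
```

Because the per-token cost is independent of position, the same loop handles a 256k-token context or, in principle, an unbounded stream, with flat memory use.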