Models

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows ...

Meta Llama, 128K context, $0.01/M input tokens, $0.02/M output tokens

Qwen: QwQ 32B Preview

QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising anal ...

Qwen, 32K context, $0.15/M input tokens, $0.6/M output tokens

Google: Gemini 2.0 Flash Experimental

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to Gemini 1.5 Flash, while maintaining quality on par with larger models like [Gemini 1.5 ...

Google, 976.56K context, $0.2/M input tokens, $0.6/M output tokens

Google: Gemini 2.0 Flash Experimental (free)

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to Gemini 1.5 Flash, while maintaining quality on par with larger models like [Gemini 1.5 ...

Google, 976.56K context, $0 input tokens, $0 output tokens

DeepSeek: R1 (free)

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully ...

DeepSeek, 160K context, $0 input tokens, $0 output tokens

Meta: Llama 3.3 70B Instruct (free)

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimiz ...

Meta Llama, 128K context, $0 input tokens, $0 output tokens

NVIDIA: Llama 3.1 Nemotron 70B Instruct (free)

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinfo ...

NVIDIA, 128K context, $0 input tokens, $0 output tokens

Qwen: Qwen2.5 VL 72B Instruct (free)

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. ...

Qwen, 128K context, $0 input tokens, $0 output tokens

Google: Gemini 2.0 Flash Thinking Experimental (free)

Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stro ...

Google, 39.06K context, $0 input tokens, $0 output tokens

Rogue Rose 103B v0.2 (free)

Rogue Rose demonstrates strong capabilities in roleplaying and storytelling applications, potentially surpassing other models in the 103-120B parameter range. While it occasionally exhibits inconsist ...

Sophosympatheia, 4K context, $0 input tokens, $0 output tokens

Google: Gemini Pro 2.0 Experimental (free)

Gemini 2.0 Pro Experimental is a bleeding-edge version of the Gemini 2.0 Pro model. Because it's currently experimental, it will be heavily rate-limited by Google. Usage of Gemini is subject to ...

Google, 1.91M context, $0 input tokens, $0 output tokens

Google: Gemini Flash Lite 2.0 Preview (free)

Gemini Flash Lite 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like [Gemin ...

Google, 976.56K context, $0 input tokens, $0 output tokens

DeepSeek: R1 Distill Llama 70B (free)

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The m ...

DeepSeek, 128K context, $0 input tokens, $0 output tokens

Qwen: Qwen VL Plus (free)