Generative AI

DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! ...

DeepSeek R1
DeepSeek
62.5K context $0.55/M input tokens $2.19/M output tokens

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 1 ...

Microsoft: Phi 4
Microsoft Azure
16K context $0.07/M input tokens $0.14/M output tokens

1. Introduction We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-eff ...

DeepSeek V3
DeepSeek
62.5K context $0.14/M input tokens $0.28/M output tokens

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to Gemini 1.5 Flash, while maintaining quality on par with larger models like [Gemini 1.5 ...

Google: Gemini 2.0 Flash Experimental
Google
976.56K context $0.2/M input tokens $0.6/M output tokens

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported ...

DeepSeek V3
DeepSeek
62.5K context $0.14/M input tokens $0.28/M output tokens

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge. Crea ...

Sao10K: Llama 3 8B Lunaris
Rifx.Online
8K context $0.03/M input tokens $0.06/M output tokens

Mag Mell is a merge of pre-trained language models created using mergekit, based on Mistral Nemo. It is a great roleplay and storytelling model which combines the best part ...

Inflatebot: Mag Mell R1 12B
Rifx.Online
15.63K context $0.9/M input tokens $0.9/M output tokens
FREE

Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stro ...

Google: Gemini 2.0 Flash Thinking Experimental (free)
Google
39.06K context $0 input tokens $0 output tokens
50% OFF

EVA Llama 3.33 70b is a roleplay and storywriting specialist model. It is a full-parameter finetune of Llama-3.3-70B-Instruct on mixture of ...

EVA Llama 3.33 70b
Eva unit 01
16K context $4/M input tokens $6/M output tokens

Euryale L3.3 70B is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.2. ...

Sao10K: Llama 3.3 Euryale 70B
Rifx.Online
7.81K context $1.5/M input tokens $1.5/M output tokens

Experimental release (December 6, 2024) of Gemini. ...

gemini-exp-1206
Google
8K context $4/M input tokens $16/M output tokens

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimiz ...

Meta: Llama 3.3 70B Instruct
Meta Llama
128K context $0.13/M input tokens $0.4/M output tokens

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthro ...

Magnum v4 72B
Anthracite org
32K context $1.875/M input tokens $2.25/M output tokens
40% OFF

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: code generation, text generation, text editing, problem solving...

Gemini 1.5 Pro
Google
1.91M context $2.5/M input tokens $10/M output tokens $0.003/M image tokens
40% OFF

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https ...

Claude-3-Haiku-20240307
Anthropic
195.31K context $0.5/M input tokens $2.5/M output tokens $0.4/K image tokens

A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit. List of merged models: NousResearch/Nous-Capybara-7B-V1.9 [HuggingFaceH4/zephyr-7b-b...

Toppy M 7B
Undi95
4K context $0.07/M input tokens $0.07/M output tokens

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge ...

ReMM SLERP 13B
Undi95
4K context $1.125/M input tokens $1.125/M output tokens

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more afford ...

GPT-4o mini
OpenAI
125K context $0.15/M input tokens $0.6/M output tokens $0.007/M image tokens
40% OFF

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more afford ...

GPT-4o mini
OpenAI
125K context $0.15/M input tokens $0.6/M output tokens $0.007/M image tokens

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge _These are extended-context endpoints for [MythoMax 13B](/gryphe/mythomax-l2-13b ...

MythoMax 13B (extended)
Gryphe
8K context $1.125/M input tokens $1.125/M output tokens
FREE

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge _These are extended-context endpoints for [MythoMax 13B](/gryphe/mythomax-l2-13b ...

MythoMax 13B (free)
Gryphe
8K context $0 input tokens $0 output tokens

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge ...

ReMM SLERP 13B (extended)
Undi95
4K context $1.125/M input tokens $1.125/M output tokens

PaLM 2 fine-tuned for chatbot conversations that help with code-related questions. ...

Google: PaLM 2 Code Chat 32k
Google
31.99K context $1/M input tokens $2/M output tokens

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch anno ...

Mistral Large 2411
MistralAI
125K context $2/M input tokens $6/M output tokens

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch anno ...

Mistral Large 2407
MistralAI
125K context $2/M input tokens $6/M output tokens

Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/perpl ...

Perplexity: Llama 3.1 Sonar 70B
Perplexity
128K context $1/M input tokens $1/M output tokens

Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/perpl ...

Perplexity: Llama 3.1 Sonar 8B
Perplexity
128K context $0.2/M input tokens $0.2/M output tokens

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been ...

OpenChat 3.5 7B
Openchat
8K context $0.055/M input tokens $0.055/M output tokens

An older GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021. ...

OpenAI: GPT-3.5 Turbo 16k (older v1106)
OpenAI
16K context $1/M input tokens $2/M output tokens
FREE

A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit. List of merged models: NousResearch/Nous-Capybara-7B-V1.9 [HuggingFaceH4/zephyr-7b-b...

Toppy M 7B (free)
Undi95
4K context $0 input tokens $0 output tokens

A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mix ...

Mixtral 8x7B (base)
MistralAI
32K context $0.54/M input tokens $0.54/M output tokens

This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best used for larg ...

Mistral Tiny
MistralAI
31.25K context $0.25/M input tokens $0.25/M output tokens

Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation. See the benchmarks and prompting guidelines from [Deepmind](htt ...

Google: Gemini Pro 1.0
Google
31.99K context $0.5/M input tokens $1.5/M output tokens $0.003/M image tokens

The NeverSleep team is back, with a Llama 3 70B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necess ...

Llama 3 Lumimaid 70B
Meta Llama
8K context $3.375/M input tokens $4.5/M output tokens

A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to @chargoddard for developing the fr...

Goliath 120B
Alpindale
6K context $9.375/M input tokens $9.375/M output tokens

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coher ...

Nous: Hermes 3 405B Instruct (free)
NousResearch
128K context $0 input tokens $0 output tokens

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger open-source leading models. It is a finetune ...

WizardLM-2 7B
Microsoft Azure
31.25K context $0.055/M input tokens $0.055/M output tokens

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: code generation, text generation, text editing, problem solving...

Google: Gemini Pro 1.5
Google
1.91M context $1.25/M input tokens $5/M output tokens $0.003/M image tokens

DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and Mixtral-8x7b on standard indu ...

Databricks: DBRX 132B Instruct
Databricks
32K context $1.08/M input tokens $1.08/M output tokens

Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k. Better prompt adherence. Better anatomy / spatial awareness. Adapts much better to unique and...

Llama 3 Euryale 70B v2.1
Rifx.Online
8K context $0.35/M input tokens $0.4/M output tokens

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest ...

Mistral: Mistral 7B Instruct
MistralAI
32K context $0.055/M input tokens $0.055/M output tokens

Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. Use of this model is ...

Cohere: Command
Cohere
4K context $0.95/M input tokens $1.9/M output tokens

Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows ...

Cohere: Command R
Cohere
125K context $0.475/M input tokens $1.425/M output tokens
FREE

Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning. It features SwiGLU activation, attention QKV bias, and gro ...

Qwen 2 7B Instruct (free)
Qwen
32K context $0 input tokens $0 output tokens

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of text generation ...

Google: Gemma 2 27B
Google
8K context $0.27/M input tokens $0.27/M output tokens

From the maker of Goliath, Magnum 72B is the first in a new family of models designed to achieve the prose quality of the Claude 3 models, notably Opus ...

Magnum 72B
Alpindale
16K context $3.75/M input tokens $4.5/M output tokens

A 7.3B parameter Mamba-based model designed for code and reasoning tasks. Linear time inference, allowing for theoretically infinite sequence lengths; 256k token context window; optimized for qu...

Mistral: Codestral Mamba
MistralAI
250K context $0.25/M input tokens $0.25/M output tokens

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chi ...

Mistral: Mistral Nemo
MistralAI
125K context $0.13/M input tokens $0.13/M output tokens

Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning. It features SwiGLU activation, attention QKV bias, and gro ...

Qwen 2 7B Instruct
Qwen
32K context $0.054/M input tokens $0.054/M output tokens

The first image-to-text model from Mistral AI. Its weights were released via torrent, per their tradition: https://x.com/mistralai/status/1833758285167722836 ...

Mistral: Pixtral 12B
MistralAI
4K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens

Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a ...

Phi-3.5 Mini 128K Instruct
Microsoft Azure
125K context $0.1/M input tokens $0.1/M output tokens

Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Intended for research and evaluation. Note: This model is currently experimental and not suitable fo ...

OpenAI: ChatGPT-4o
OpenAI
125K context $5/M input tokens $15/M output tokens $0.007/M image tokens

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version. It has demonstrated strong performance compared to leading closed-sour ...

Meta: Llama 3.1 405B (base)
Meta Llama
128K context $2/M input tokens $2/M output tokens
FREE

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: code generation, text generation, text editing, problem solving...

Google: Gemini Pro 1.5 Experimental
Google
1.91M context $0 input tokens $0 output tokens $0.003/M image tokens

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark result ...

Anthropic: Claude 3 Opus
Anthropic
195.31K context $15/M input tokens $75/M output tokens $0.024/M image tokens

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and ...

Anthropic: Claude 3 Sonnet
Anthropic
195.31K context $3/M input tokens $15/M output tokens $0.005/M image tokens

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https: ...

Anthropic: Claude 3 Haiku
Anthropic
195.31K context $0.25/M input tokens $1.25/M output tokens $0.4/K image tokens

Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality. It features a 256K effective context window, the longest among open models, enabling im ...

AI21: Jamba 1.5 Large
Ai21
250K context $2/M input tokens $8/M output tokens

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1. ...

Llama 3.1 Euryale 70B v2.2
Rifx.Online
8K context $0.35/M input tokens $0.4/M output tokens

Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency. It works with 9 languages and can h ...

AI21: Jamba 1.5 Mini
Ai21
250K context $0.2/M input tokens $0.4/M output tokens

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning ...

Nous: Hermes 3 70B Instruct
NousResearch
128K context $0.4/M input tokens $0.4/M output tokens

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context cohere ...

Nous: Hermes 3 405B Instruct
NousResearch
128K context $1.79/M input tokens $2.49/M output tokens
FREE

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answ ...

Meta: Llama 3.2 11B Vision Instruct (free)
Meta Llama
128K context $0 input tokens $0 output tokens $0.079/K image tokens

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answ ...

Meta: Llama 3.2 11B Vision Instruct
Meta Llama
128K context $0.055/M input tokens $0.055/M output tokens $0.079/K image tokens

Lumimaid v0.2 8B is a finetune of Llama 3.1 8B with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged. Usage of this model ...

Lumimaid v0.2 8B
Meta Llama
128K context $0.188/M input tokens $1.125/M output tokens

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more afford ...

OpenAI: GPT-4o-mini
OpenAI
125K context $0.15/M input tokens $0.6/M output tokens $0.007/M image tokens

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. For emotional intelligence similar to Pi, ...

Inflection: Inflection 3 Productivity
Inflection
7.81K context $2.5/M input tokens $10/M output tokens

Liquid's 40.3B Mixture of Experts (MoE) model. Liquid Foundation Models (LFMs) are large neural networks built with computational units rooted in dynamic systems. LFMs are general-purpose AI models ...

Liquid: LFM 40B MoE (free)
Liquid
8K context $0 input tokens $0 output tokens

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: Significantly more knowledge and has greatly improved capabilities in coding an...

Qwen2.5 7B Instruct
Qwen
128K context $0.27/M input tokens $0.27/M output tokens

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: expanded vocabulary with unique and expressive word choices; enhanced creativity for vivid narrati...

Rocinante 12B
Thedrummer
32K context $0.25/M input tokens $0.5/M output tokens

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with ...

Meta: Llama 3.2 3B Instruct
Meta Llama
128K context $0.03/M input tokens $0.05/M output tokens
FREE

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with ...

Meta: Llama 3.2 3B Instruct (free)
Meta Llama
128K context $0 input tokens $0 output tokens

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: Significantly more knowledge and has greatly improved capabilities in coding a...

Qwen2.5 72B Instruct
Qwen
128K context $0.35/M input tokens $0.4/M output tokens

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image caption ...

Meta: Llama 3.2 90B Vision Instruct
Meta Llama
128K context $0.35/M input tokens $0.4/M output tokens $0.506/K image tokens

Experimental release (November 21st, 2024) of Gemini. ...

Google: Gemini Experimental 1121 (free)
Rifx.Online
8K context $0 input tokens $0 output tokens

An experimental version of Gemini 1.5 Pro from Google. ...

Google: LearnLM 1.5 Pro Experimental (free)
Rifx.Online
8K context $0 input tokens $0 output tokens

Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411. It is fluent in English, Fren ...

Mistral Large 2411
Rifx.Online
125K context $2/M input tokens $6/M output tokens

ERNIE Bot Overview. Key Capabilities and Use Cases: Engages in interactive dialogues, answers questions, and assists with creative tasks. Facilitates efficient information ret...

ERNIE-Bot-4.0
Ernie bot 4.0
8K context $16.44/M input tokens $16.44/M output tokens

Developer/Company: Baidu. Overview: ERNIE Bot Turbo is an enhanced version of ERNIE Bot, offering expanded capabilities with support for 7K input + 1K output. It includes system ...

ERNIE-Bot-turbo
Ernie
8K context $1.65/M input tokens $1.65/M output tokens

Developer/Company: Baidu Research. Key Capabilities & Use Cases: ERNIE-4.0-8K is valuable in natural language processing (NLP), applicable to search engines, intelligent custome ...

ERNIE-4.0-8K
Ernie
8K context $5.48/M input tokens $16.44/M output tokens

Basic Information: The "GLM-4-AIRX" is an advanced large language model developed by experts in the field of artificial intelligence. It is renowned for its powerful natural language ...

GLM-4 AirX
ChatGLM
7.81K context $1.4/M input tokens $1.4/M output tokens

GLM-4-Flash Model Introduction. Key Capabilities and Primary Use Cases: Handles multi-turn dialogues, web searches, and tool calls. Supports long text inference with a context...

glm-4-flash
ChatGLM
125K context $0.01/M input tokens $0.01/M output tokens

GLM-4V-Plus Model Introduction. Key Capabilities and Primary Use Cases: Multimodal Understanding: Excels in image and video understanding, including temporal sequence analys...

glm-4v-plus
ChatGLM
31.25K context $1.4/M input tokens $1.4/M output tokens

GLM-4-Plus Model Introduction. Key Capabilities and Primary Use Cases: Language Understanding: Advanced capabilities in language comprehension, instruction following, and lo...

glm-4-plus
ChatGLM
125K context $7/M input tokens $7/M output tokens

GLM-4 Long is a state-of-the-art language model designed for extended context processing, making it ideal for applications requiring comprehensive text analysis and genera ...

GLM-4 Long
ChatGLM
976.56K context $0.14/M input tokens $0.14/M output tokens

GLM-4 Air Model Introduction. Key Capabilities and Primary Use Cases: Multilingual Support: Primarily aligned for Chinese and English, with additional support for 24 languag...

GLM-4 Air
ChatGLM
125K context $0.14/M input tokens $0.14/M output tokens

Inferor is a merge of top roleplay models, expert on immersive narratives and storytelling. This model was merged using the Model Stock merge method ...

Mistral Nemo Inferor 12B
Infermatic
31.25K context $0.25/M input tokens $0.5/M output tokens

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Signifi...

Qwen2.5 Coder 32B Instruct
Qwen
32K context $0.18/M input tokens $0.18/M output tokens

SorcererLM is an advanced RP and storytelling model, built as a Low-rank 16-bit LoRA fine-tuned on WizardLM-2-8x22B. Advanced reasoning and emotional intelligence for engaging and im...

Sorcererlm 8x22b
Raifle
15.63K context $4.5/M input tokens $4.5/M output tokens

A roleplaying/storywriting specialist model, full-parameter finetune of Qwen2.5-32B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it ...

Eva Qwen2.5 32B
Eva unit 01
31.25K context $0.5/M input tokens $0.5/M output tokens

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios. ...

Unslopnemo 12b
Thedrummer
31.25K context $0.5/M input tokens $0.5/M output tokens

Lumimaid v0.2 70B is a finetune of Llama 3.1 70B with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged. Us ...

Lumimaid v0.2 70B
Neversleep
128K context $3.375/M input tokens $4.5/M output tokens

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. The model is fine-tuned on top of [Qwen2.5 72B]. ...

Magnum v4 72B
Anthracite org
32K context $1.875/M input tokens $2.25/M output tokens

Grok Beta is xAI's experimental language model with state-of-the-art reasoning capabilities, best for complex and multi-step use cases. It is the successor of [Grok 2](https://x.ai/blo ...

xAI: Grok Beta
xAI
128K context $5/M input tokens $15/M output tokens

Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mi ...

Ministral 3B
Mistralai
125K context $0.04/M input tokens $0.04/M output tokens

Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up ...

Ministral 8B
Mistralai
125K context $0.1/M input tokens $0.1/M output tokens

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architect ...

Nvidia: Llama 3.1 Nemotron 70B Instruct
Nvidia
128K context $0.35/M input tokens $0.4/M output tokens

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. For emotional intelligence s ...

Inflection: Inflection 3 Productivity
Inflection
7.81K context $2.5/M input tokens $10/M output tokens

A model specializing in RP and creative writing, this model is based on Qwen2.5-14B, fine-tuned with a mixture of synthetic and natural data. It is trained on 1.5M tokens of role-play ...

EVA Qwen2.5 14B
Eva unit 01
32K context $0.25/M input tokens $0.5/M output tokens

Liquid's 40.3B Mixture of Experts (MoE) model. Liquid Foundation Models (LFMs) are large neural networks built with computational units rooted in dynamic systems. LFMs are general-purp ...

Liquid: LFM 40B MoE
Liquid
32K context $1/M input tokens $2/M output tokens

From the maker of Goliath, Magnum 72B is the seventh in a family of models designed to achieve the prose quality of the Claude 3 models, ...

Magnum v2 72B
Anthracite org
32K context $3.75/M input tokens $4.5/M output tokens

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: expanded vocabulary with unique and expressive word choices; enhanced creativity for...

Rocinante 12B
Thedrummer
32K context $0.25/M input tokens $0.5/M output tokens

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual ...

Meta: Llama 3.2 11B Vision Instruct
Meta llama
128K context $0.055/M input tokens $0.055/M output tokens $0.079/K image tokens

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its small ...

Meta: Llama 3.2 1B Instruct
Meta llama
128K context $0.01/M input tokens $0.02/M output tokens

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. ...

Meta: Llama 3.2 3B Instruct
Meta llama
128K context $0.03/M input tokens $0.05/M output tokens

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. ...

Meta: Llama 3.2 3B Instruct (free)
Rifx.Online
4K context $0 input tokens $0 output tokens

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in ...

Meta: Llama 3.2 90B Vision Instruct
Meta llama
128K context $0.35/M input tokens $0.4/M output tokens $0.506/K image tokens

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in ...

Meta: Llama 3.2 90B Vision Instruct (free)
Rifx.Online
4K context $0 input tokens $0 output tokens

The first image-to-text model from Mistral AI. Its weights were released via torrent, per their tradition: https://x.com/mistralai/status/1833758285167722836 ...

Mistral: Pixtral 12B
Mistralai
4K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens

command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version ...

Cohere: Command R+ (08-2024)
Cohere
125K context $2.375/M input tokens $9.5/M output tokens

command-r-08-2024 is an update of the Command R with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is be ...

Cohere: Command R (08-2024)
Cohere
125K context $0.143/M input tokens $0.57/M output tokens

Gemini 1.5 Flash 8B Experimental is an experimental, 8B parameter version of the Gemini 1.5 Flash model. Usage of Gemini is subject to Google's [Gemini Term ...

Google: Gemini Flash 8B 1.5 Experimental
Google
976.56K context $0 input tokens $0 output tokens

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1. ...

Llama 3.1 Euryale 70B v2.2
Sao10k
8K context $0.35/M input tokens $0.4/M output tokens

Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art...

Qwen2-VL 7B Instruct
Qwen
32K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens

Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality. It features a 256K effective context window, the longest among open model ...

AI21: Jamba 1.5 Large
Ai21
250K context $2/M input tokens $8/M output tokens

Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available website ...

Phi-3.5 Mini 128K Instruct
Microsoft
125K context $0.1/M input tokens $0.1/M output tokens

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplay ...

Nous: Hermes 3 70B Instruct
Nousresearch
128K context $0.4/M input tokens $0.4/M output tokens

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long ...

Nous: Hermes 3 405B Instruct
Nousresearch
128K context $1.79/M input tokens $2.49/M output tokens

Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Intended for research and evaluation. Note: This model is currently experimental and n ...

OpenAI: ChatGPT-4o
OpenAI
125K context $5/M input tokens $15/M output tokens $0.007/M image tokens

Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. The model is built upon the Llama 3.1 405B and h ...

Perplexity: Llama 3.1 Sonar 405B Online
Perplexity
124.09K context $5/M input tokens $5/M output tokens $0.005/M request tokens

Starcannon 12B is a creative roleplay and story writing model, using nothingiisreal/mn-celeste-12b as a base and [intervitens/mini ...

Mistral Nemo 12B Starcannon
Aetherwiing
11.72K context $2/M input tokens $2/M output tokens

Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat ...

Perplexity: Llama 3.1 Sonar 70B Online
Perplexity
124.09K context $1/M input tokens $1/M output tokens $0.005/M request tokens

Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat ...

Perplexity: Llama 3.1 Sonar 8B Online
Perplexity
124.09K context $0.2/M input tokens $0.2/M output tokens $0.005/M request tokens

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue use cases. It has demonstrate ...

Meta: Llama 3.1 70B Instruct (free)
Rifx.Online
8K context $0 input tokens $0 output tokens

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compar ...

Meta: Llama 3.1 8B Instruct
Meta Llama
128K context $0.055/M input tokens $0.055/M output tokens

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, P ...

Mistral: Mistral Nemo
Mistralai
125K context $0.13/M input tokens $0.13/M output tokens

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of t ...

Google: Gemma 2 27B
Google
8K context $0.27/M input tokens $0.27/M output tokens

Dolphin 2.9 is designed for instruction following, conversation, and coding. This model is a finetune of Mixtral 8x22B Instruct. It features a 64k ...

Dolphin 2.9.2 Mixtral 8x22B 🐬
Cognitivecomputations
64K context $0.9/M input tokens $0.9/M output tokens

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. Mistral 7B Instruct has multiple version variants, and this is intended to ...

Mistral: Mistral 7B Instruct
Mistralai
32K context $0.055/M input tokens $0.055/M output tokens

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. Mistral 7B Instruct has multiple version variants, and this is intended to ...

Mistral: Mistral 7B Instruct (free)
Rifx.Online
8K context $0 input tokens $0 output tokens

Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference ...

Phi-3 Mini 128K Instruct
Microsoft
125K context $0.1/M input tokens $0.1/M output tokens

Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning a ...

Phi-3 Medium 128K Instruct
Microsoft
125K context $1/M input tokens $1/M output tokens

Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its siz ...

Mistral: Mixtral 8x22B Instruct
Mistralai
64K context $0.9/M input tokens $0.9/M output tokens

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger open-source leading models. It ...

WizardLM-2 7B
Microsoft
31.25K context $0.055/M input tokens $0.055/M output tokens

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all ...

WizardLM-2 8x22B
Microsoft
64K context $0.5/M input tokens $0.5/M output tokens

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: Code generation, Text generation, Text editing, Prob...

Google: Gemini Pro 1.5
Google
1.91M context $1.25/M input tokens $5/M output tokens $0.003/M image tokens

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results ...

Anthropic: Claude 3 Haiku
Anthropic
195.31K context $0.25/M input tokens $1.25/M output tokens $0.4/K image tokens

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and be ...

Anthropic: Claude 3 Opus
Anthropic
195.31K context $15/M input tokens $75/M output tokens $0.024/M image tokens

This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best ...

Mistral Tiny
Mistralai
31.25K context $0.25/M input tokens $0.25/M output tokens

This is a 16k context fine-tune of Mixtral-8x7b. It excels in coding tasks due to extensive training with coding data and is known for its obedience, although ...

Dolphin 2.6 Mixtral 8x7B 🐬
Cognitivecomputations
32K context $0.5/M input tokens $0.5/M output tokens

Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response. See the benchmarks and prompting guidelines from [Deepmind](https:// ...

Google: Gemini Pro Vision 1.0
Google
16K context $0.5/M input tokens $1.5/M output tokens $0.003/M image tokens

A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructio ...

Mixtral 8x7B (base)
Mistralai
32K context $0.54/M input tokens $0.54/M output tokens

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learnin ...

OpenChat 3.5 7B
OpenChat
8K context $0.055/M input tokens $0.055/M output tokens

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learnin ...

OpenChat 3.5 7B (free)
Rifx.Online
8K context $0 input tokens $0 output tokens

A Mythomax/MLewd_13B-style merge of selected 70B models. A multi-model merge of several LLaMA2 70B finetunes for roleplaying and creative work. The goal was to create a model that combi ...

lzlv 70B
Lizpreciatior
4K context $0.35/M input tokens $0.4/M output tokens

A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit. List of merged models: NousResearch/Nous-Capybara-7B-V1.9 [HuggingFace...

Toppy M 7B
Undi95
4K context $0.07/M input tokens $0.07/M output tokens

A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit. List of merged models: NousResearch/Nous-Capybara-7B-V1.9 [HuggingFace...

Toppy M 7B (free)
Rifx.Online
4K context $0 input tokens $0 output tokens

PaLM 2 is a language model by Google with improved multilingual, reasoning and coding capabilities. ...

Google: PaLM 2 Chat 32k
Google
31.99K context $1/M input tokens $2/M output tokens

PaLM 2 fine-tuned for chatbot conversations that help with code-related questions. ...

Google: PaLM 2 Code Chat 32k
Google
31.99K context $1/M input tokens $2/M output tokens

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021. ...

OpenAI: GPT-3.5 Turbo Instruct
OpenAI
4K context $1.5/M input tokens $2/M output tokens

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge ...

ReMM SLERP 13B
Undi95
4K context $1.125/M input tokens $1.125/M output tokens

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge These are extended-context endpoints for ReMM SLERP 13B. They may have ...

ReMM SLERP 13B (extended)
Undi95
6K context $1.125/M input tokens $1.125/M output tokens

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge ...

MythoMax 13B
Gryphe
4K context $0.1/M input tokens $0.1/M output tokens