Models
Large language models
The chat API currently supports these models:
Model | Full model name | Supported context length |
---|---|---|
llama3.1 | neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 | 8192 tokens |
llama3.1:70b | neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | 128k (131 072) tokens |
llama3.1:405b | neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16 | 128k (131 072) tokens |
mistral-nemo | neuralmagic/Mistral-Nemo-Instruct-2407-quantized.w4a16 | 128k (128 000) tokens |
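When choosing a model for a request, it can help to check the expected token count against the limits in the table above. The following is a minimal sketch, not part of the API: the `CONTEXT_LENGTHS` dictionary and the `models_that_fit` helper are hypothetical names introduced here, and the code assumes the context length covers prompt and generated tokens combined.

```python
# Context lengths in tokens, copied from the table above.
CONTEXT_LENGTHS = {
    "llama3.1": 8192,
    "llama3.1:70b": 131072,
    "llama3.1:405b": 131072,
    "mistral-nemo": 128000,
}


def models_that_fit(prompt_tokens: int, max_output_tokens: int = 0) -> list[str]:
    """Return the model names whose context window covers the request.

    Assumption: the context window must hold the prompt and the
    generated output together.
    """
    needed = prompt_tokens + max_output_tokens
    return [name for name, limit in CONTEXT_LENGTHS.items() if needed <= limit]


if __name__ == "__main__":
    # A 20 000-token prompt is too long for llama3.1 (8192 tokens).
    print(models_that_fit(20_000))
    # → ['llama3.1:70b', 'llama3.1:405b', 'mistral-nemo']
```

Token counts depend on each model's tokenizer, so a count measured with one model is only an estimate for another.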
Embedding models
The embeddings API currently supports these models:
Model | Full model name | Supported context length | Output dimensions |
---|---|---|---|
gte-large-en-v1.5 | Alibaba-NLP/gte-large-en-v1.5 | 8192 tokens | 1024 |
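The output dimension determines how much space stored embeddings take. As a rough sketch, assuming each vector is stored as uncompressed 32-bit floats (an assumption about your storage, not something the API specifies), the raw size of a set of embeddings can be estimated as follows. The helper name `index_size_bytes` is hypothetical.

```python
EMBEDDING_DIMENSIONS = 1024  # gte-large-en-v1.5, from the table above
BYTES_PER_FLOAT32 = 4        # assumption: vectors are stored as float32


def index_size_bytes(num_vectors: int) -> int:
    """Estimate raw storage for num_vectors embeddings, excluding index overhead."""
    return num_vectors * EMBEDDING_DIMENSIONS * BYTES_PER_FLOAT32


if __name__ == "__main__":
    print(index_size_bytes(1))          # → 4096
    print(index_size_bytes(1_000_000))  # → 4096000000
```

Under these assumptions, one million embeddings need about 4.1 GB before any vector-index overhead or compression.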
Image generation models
The image generation and editing API currently supports these models:
Model name | Supported features |
---|---|
stabilityai/stable-diffusion-xl-base-1.0 | Text2Image, Image2Image |