Supported models

The API currently supports these models:

Llama 3.1 8B

  • Model id: neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
  • Context length: 8k (8192) tokens

Llama 3.1 70B

  • Model id: neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16
  • Context length: 128k (131072) tokens

Llama 3.1 405B

  • Model id: neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16
  • Context length: 128k (131072) tokens
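
As a rough illustration of how these limits might be used on the client side, the sketch below stores the model ids and context lengths listed above in a lookup table and checks whether a request is likely to fit. The names SUPPORTED_MODELS and fits_context are hypothetical and not part of the API, and the check assumes that the context length covers the prompt and the generated tokens together.

```python
# Model ids and context lengths (in tokens), copied from the list above.
SUPPORTED_MODELS = {
    "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8": 8192,
    "neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16": 131072,
    "neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16": 131072,
}


def fits_context(model_id: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Hypothetical helper: True if the request fits the model's context window.

    Assumption: the context length bounds prompt tokens plus generated tokens.
    """
    if model_id not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model id: {model_id}")
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= SUPPORTED_MODELS[model_id]


if __name__ == "__main__":
    # A 10,000-token prompt is too long for the 8B deployment but fits the 70B one.
    print(fits_context("neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8", 10_000, 512))
    print(fits_context("neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16", 10_000, 512))
```

Token counts depend on the model's tokenizer, so a check like this is only as accurate as the count passed in; the server's own validation remains authoritative.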