Token limits count both input and output tokens. See the pricing here.

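| Model | Tier | RPM ¹ | RPD ² | TPM ³ | TPD ⁴ |
| --- | --- | --- | --- | --- | --- |
| Llama 3.1 8B | "Free" | 60 | 28,800 | 40k | 5M |
| Llama 3.1 8B | "Infinite" | | | | |
| Llama 3.1 70B | "Free" | 60 | 28,800 | 40k | 5M |
| Llama 3.1 70B | "Infinite" | | | | |
| Llama 3.1 405B | "Free" | 12 | 6,000 | 40k | 1M |
| Llama 3.1 405B | "Infinite" | | | | |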

¹ RPM: Requests per minute
² RPD: Requests per day
³ TPM: Tokens per minute
⁴ TPD: Tokens per day
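
Because input and output tokens both count toward the token limits, a client can estimate a request's worst-case cost before sending it. The following is a minimal sketch, not part of the API: the function name `fits_token_budget` and its parameters are hypothetical, and it assumes the caller tracks its own token usage for the current minute.

```python
FREE_TIER_TPM = 40_000  # "Free" tier tokens-per-minute limit listed in the table


def fits_token_budget(
    prompt_tokens: int,
    max_output_tokens: int,
    used_this_minute: int,
    tpm_limit: int = FREE_TIER_TPM,
) -> bool:
    """Return True if the worst-case cost of a request fits in this minute's budget.

    The worst case assumes the model produces the full max_output_tokens.
    """
    worst_case = prompt_tokens + max_output_tokens
    return used_this_minute + worst_case <= tpm_limit
```

For example, a 1,500-token prompt with up to 500 output tokens costs at most 2,000 tokens, so it still fits when 38,000 tokens have already been used in the current minute.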

To access the "Infinite" tier, please apply here.

Rate limit headers

Responses include the following headers to inform you of the rate limits that currently apply to you (values are illustrative):

| Header | Value | Notes |
| --- | --- | --- |
| retry-after | 2 | Seconds to wait until retrying* |
| x-ratelimit-limit-requests | 28800 | Requests per day allowed |
| x-ratelimit-limit-tokens | 40000 | Tokens per minute allowed |
| x-ratelimit-remaining-requests | 123 | Requests remaining for the day |
| x-ratelimit-remaining-tokens | 1337 | Tokens remaining for this minute |
| x-ratelimit-reset-requests | 1337s | Seconds until the daily request limit resets |
| x-ratelimit-reset-tokens | 1s | Seconds until the per-minute token limit resets |

* The retry-after header is returned only if the response status code is 429 and the request was rate limited.
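
As an illustration of how a client might use these headers, here is a minimal sketch that decides how long to wait before the next request. The helper names `parse_seconds` and `seconds_to_wait` are hypothetical. The sketch assumes the value formats match the illustrative values listed here (plain seconds for retry-after, seconds with an `s` suffix for the reset headers); the 1-second fallback is an arbitrary choice.

```python
from typing import Mapping, Optional


def parse_seconds(value: str) -> float:
    """Parse a value such as '2' or '1337s' into a number of seconds."""
    return float(value.strip().rstrip("s"))


def seconds_to_wait(status_code: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return how long to wait before the next request, or None if no wait is needed."""
    # HTTP header names are case-insensitive, so normalize them to lowercase.
    h = {name.lower(): value for name, value in headers.items()}

    # A rate-limited request returns 429 along with retry-after.
    if status_code == 429:
        if "retry-after" in h:
            return parse_seconds(h["retry-after"])
        return 1.0  # assumed fallback if the header is missing

    # Otherwise, wait only if a budget is exhausted and its reset time is known.
    for kind in ("requests", "tokens"):
        remaining = h.get(f"x-ratelimit-remaining-{kind}")
        reset = h.get(f"x-ratelimit-reset-{kind}")
        if remaining is not None and reset is not None and int(remaining) <= 0:
            return parse_seconds(reset)
    return None
```

A caller would typically pass the status code and headers of each response to `seconds_to_wait` and sleep for the returned duration before sending the next request.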