Text Generation

Generate text based on a prompt.

If you need a Chat Completion task instead, which generates a response from a list of messages, see the chat-completion task.

For more details about the text-generation task, see its dedicated page, which includes examples and related materials.

Recommended models

google/gemma-2-2b-it: A text-generation model trained to follow instructions.
bigcode/starcoder: A code generation model that can generate code in 80+ languages.
meta-llama/Meta-Llama-3.1-8B-Instruct: Very powerful text generation model trained to follow instructions.
microsoft/Phi-3-mini-4k-instruct: Small yet powerful text generation model.
HuggingFaceH4/starchat2-15b-v0.1: Strong coding assistant model.
mistralai/Mistral-Nemo-Instruct-2407: Very strong open-source large language model.

This is only a subset of the supported models. Find the model that suits you best here.

Using the API

Python
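
A minimal sketch of a non-streaming request, using only the Python standard library. The endpoint URL pattern, the `build_request` and `generate` helper names, and the placeholder token are assumptions for illustration and do not come from this page; adjust them to your setup.

```python
import json
import urllib.request

# Assumed endpoint pattern for the Inference API; change it if yours differs.
API_URL_TEMPLATE = "https://api-inference.huggingface.co/models/{model_id}"


def build_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text-generation call."""
    payload = {"inputs": prompt}  # `inputs` is the only required payload field
    return urllib.request.Request(
        API_URL_TEMPLATE.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(model_id: str, prompt: str, token: str) -> object:
    """Send the request over the network and return the decoded JSON body."""
    with urllib.request.urlopen(build_request(model_id, prompt, token)) as resp:
        return json.load(resp)
```

Calling `generate("google/gemma-2-2b-it", "Once upon a time", token)` with a valid access token performs the actual network call; `build_request` alone can be inspected offline.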

API specification

Request

Payload
inputs*	string
parameters	object
  best_of	integer
  decoder_input_details	boolean
  details	boolean
  do_sample	boolean
  frequency_penalty	number
  grammar	unknown	One of the following:
    (#1)
      type*	enum	Possible values: json.
      value*	unknown	A string that represents a JSON Schema. JSON Schema is a declarative language for annotating JSON documents with types and descriptions.
    (#2)
      type*	enum	Possible values: regex.
      value*	string
  max_new_tokens	integer
  repetition_penalty	number
  return_full_text	boolean
  seed	integer
  stop	string[]
  temperature	number
  top_k	integer
  top_n_tokens	integer
  top_p	number
  truncate	integer
  typical_p	number
  watermark	boolean
stream	boolean
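
As a rough illustration of how the payload fields fit together, the sketch below assembles a request body with a few of the optional parameters and a regex grammar. The `make_payload` helper and its argument names are hypothetical; only the field names (`inputs`, `parameters`, `max_new_tokens`, `temperature`, `stop`, `grammar`, `stream`) come from the table.

```python
from __future__ import annotations

from typing import Any, Optional


def make_payload(
    prompt: str,
    *,
    max_new_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stop: Optional[list[str]] = None,
    regex: Optional[str] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble a request body, leaving out parameters that were not set."""
    parameters: dict[str, Any] = {}
    if max_new_tokens is not None:
        parameters["max_new_tokens"] = max_new_tokens
    if temperature is not None:
        parameters["temperature"] = temperature
    if stop is not None:
        parameters["stop"] = stop
    if regex is not None:
        # Grammar variant (#2): constrain the output with a regular expression.
        parameters["grammar"] = {"type": "regex", "value": regex}

    # `stream` sits at the top level, next to `inputs`, not inside `parameters`.
    payload: dict[str, Any] = {"inputs": prompt, "stream": stream}
    if parameters:
        payload["parameters"] = parameters
    return payload
```

For example, `make_payload("Say a digit:", max_new_tokens=4, regex="[0-9]")` yields a body whose `parameters` object holds both `max_new_tokens` and the `grammar` entry.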

Some options can be configured by passing headers to the Inference API. Here are the available headers:

Headers
authorization	string	Authentication header in the form `'Bearer hf_****'`, where `hf_****` is a personal user access token with Inference API permission. You can generate one from your settings page.
x-use-cache	boolean, defaults to `true`	There is a cache layer on the Inference API to speed up requests it has already seen. Most models can use cached results because they are deterministic (the outputs would be the same anyway). If you use a nondeterministic model, set this header to `false` to bypass the cache and force a new query. Read more about caching here.
x-wait-for-model	boolean, defaults to `false`	If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to `true` only after receiving a 503 error, so that any hanging in your application is limited to known places. Read more about model availability here.

For more information about Inference API headers, check out the parameters guide.
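
The advice on `x-wait-for-model` (set it only after a 503) can be sketched as a small retry wrapper. Here `send` is a hypothetical stand-in for whatever HTTP client you use, and serializing the boolean headers as the strings `"true"` / `"false"` is an assumption.

```python
from typing import Callable, Mapping, Tuple

# A transport sends the request with the given headers and returns
# (status_code, body). It is a hypothetical placeholder for your HTTP client.
Transport = Callable[[Mapping[str, str]], Tuple[int, bytes]]


def call_with_model_wait(
    send: Transport, token: str, use_cache: bool = True
) -> Tuple[int, bytes]:
    """Try once normally; on 503, retry once with x-wait-for-model enabled."""
    headers = {
        "authorization": f"Bearer {token}",
        "x-use-cache": "true" if use_cache else "false",
    }
    status, body = send(headers)
    if status == 503:
        # Model not ready: ask the API to hold the request until it is.
        status, body = send({**headers, "x-wait-for-model": "true"})
    return status, body
```

Keeping the wait flag out of the first attempt means the only place the application can block for a long time is the explicit retry.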

Response

Output type depends on the stream input parameter. If stream is false (default), the response will be a JSON object with the following fields:

Body
details	object
  best_of_sequences	object[]
    finish_reason	enum	Possible values: length, eos_token, stop_sequence.
    generated_text	string
    generated_tokens	integer
    prefill	object[]
      id	integer
      logprob	number
      text	string
    seed	integer
    tokens	object[]
      id	integer
      logprob	number
      special	boolean
      text	string
    top_tokens	array[]
      id	integer
      logprob	number
      special	boolean
      text	string
  finish_reason	enum	Possible values: length, eos_token, stop_sequence.
  generated_tokens	integer
  prefill	object[]
    id	integer
    logprob	number
    text	string
  seed	integer
  tokens	object[]
    id	integer
    logprob	number
    special	boolean
    text	string
  top_tokens	array[]
    id	integer
    logprob	number
    special	boolean
    text	string
generated_text	string
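
To show how the non-streaming body might be consumed, here is a small sketch that extracts the most commonly used fields. The `summarize_response` helper is hypothetical, the sample object is hand-written with made-up values, and treating `details` as possibly absent is an assumption.

```python
from __future__ import annotations

from typing import Any


def summarize_response(body: dict[str, Any]) -> dict[str, Any]:
    """Pull the commonly used fields out of a non-streaming response object."""
    details = body.get("details") or {}  # assumed to be missing in some responses
    tokens = details.get("tokens", [])
    return {
        "text": body["generated_text"],
        "finish_reason": details.get("finish_reason"),
        "generated_tokens": details.get("generated_tokens"),
        # Sum of per-token log-probabilities, skipping special tokens.
        "total_logprob": sum(t["logprob"] for t in tokens if not t["special"]),
    }


# Hand-written sample shaped like the Body fields; all values are made up.
sample = {
    "generated_text": " there!",
    "details": {
        "finish_reason": "eos_token",
        "generated_tokens": 3,
        "seed": 42,
        "prefill": [],
        "tokens": [
            {"id": 11, "logprob": -0.5, "special": False, "text": " there"},
            {"id": 12, "logprob": -0.25, "special": False, "text": "!"},
            {"id": 2, "logprob": -0.125, "special": True, "text": "</s>"},
        ],
    },
}

summary = summarize_response(sample)
```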

If stream is true, generated tokens are returned as a stream, using Server-Sent Events (SSE). For more information about streaming, check out this guide.

Body
details	object
  finish_reason	enum	Possible values: length, eos_token, stop_sequence.
  generated_tokens	integer
  seed	integer
generated_text	string
index	integer
token	object
  id	integer
  logprob	number
  special	boolean
  text	string
top_tokens	object[]
  id	integer
  logprob	number
  special	boolean
  text	string
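
A minimal sketch of reading the streamed form, assuming each event arrives as a single `data:` line carrying one JSON object with the fields listed above, and that `generated_text` is only populated on the final event. The helper names `iter_events` and `collect_text` are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            # Whitespace after the colon is optional; json.loads tolerates it.
            yield json.loads(line[len("data:"):])


def collect_text(lines: Iterable[str]) -> str:
    """Accumulate token text, preferring the final `generated_text` if present."""
    pieces: list[str] = []
    for event in iter_events(lines):
        token = event["token"]
        if not token["special"]:
            pieces.append(token["text"])
        if event.get("generated_text") is not None:
            return event["generated_text"]
    return "".join(pieces)
```

`lines` can be any iterable of decoded text lines, for example the lines of an open HTTP response decoded as UTF-8.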