distil-whisper/distil-large-v3 · Set temperature and prompt possible?

Jun 12

Hi,

is there a way to set the temperature and a prompt?

Something like this:
"temperature": "0.0",
"prompt": "Hello, welcome to my lecture. Today, we will discuss various topics. Let's begin."

I just want to get very precise transcriptions and also make the model always respond with punctuation.

Any suggestions?

Thanks.

sanchit-gandhi

Whisper Distillation org Jun 13

Hey @jeffuli755, you can achieve this with the following:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

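# Add a prompt: the text conditions the decoder on style, punctuation and vocabulary
prompt = "Hello, welcome to my lecture. Today, we will discuss various topics. Let's begin."
prompt_ids = processor.get_prompt_ids(prompt, return_tensors="pt").to(device)
# Sampling is off by default, so this call decodes greedily (the equivalent of temperature 0.0)
result = pipe(sample.copy(), generate_kwargs={"prompt_ids": prompt_ids})
print(result["text"])

# Change the temperature and enable sampling (the output is no longer deterministic)
result = pipe(sample.copy(), generate_kwargs={"prompt_ids": prompt_ids, "do_sample": True, "temperature": 1.0})
print(result["text"])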
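
If you want to keep the "temperature" / "prompt" style of settings from the question, one option is a small wrapper that turns them into generate_kwargs. This is only a sketch: build_generate_kwargs is a hypothetical helper name, not part of transformers, and it assumes that a temperature of 0.0 should mean plain greedy decoding (sampling switched off) rather than passing 0.0 through to the sampler.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(temperature: Any = 0.0, prompt_ids: Optional[Any] = None) -> Dict[str, Any]:
    """Hypothetical helper: map a temperature/prompt pair to pipeline generate_kwargs."""
    temperature = float(temperature)  # accepts "0.0" (as in the question) as well as 0.0
    if temperature < 0:
        raise ValueError("temperature must be non-negative")

    kwargs: Dict[str, Any] = {}
    if temperature == 0.0:
        # Assumption: temperature 0 means greedy decoding, so sampling stays off
        kwargs["do_sample"] = False
    else:
        # Any positive temperature only takes effect when sampling is enabled
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature

    if prompt_ids is not None:
        # prompt_ids as returned by processor.get_prompt_ids(...)
        kwargs["prompt_ids"] = prompt_ids
    return kwargs
```

With the pipe, sample and prompt_ids objects from the snippet above, a deterministic, prompted transcription would then be pipe(sample.copy(), generate_kwargs=build_generate_kwargs("0.0", prompt_ids)).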