Edit model card

How to use

This repository contains Athena-codegemma-2-9b-v1, for use with transformers and with the original llama codebase.

Use with transformers Starting with transformers >= 4.43.0 onward, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.

Make sure to update your transformers installation via pip install --upgrade transformers.

Best use to test or prompt:

You need to prepare prompt in alpaca format to generate properly:

Basic

f"""Below is an instruction that describes a task. \
    Write a response that appropriately completes the request.

    ### Instruction:
    {x['instruction']}

    ### Input:
    {x['input']}

    ### Response:
    """

Here is example:

def format_test(x):

  if x['input']:
    formatted_text = f"""Below is an instruction that describes a task. \
    Write a response that appropriately completes the request.

    ### Instruction:
    {x['instruction']}

    ### Input:
    {x['input']}

    ### Response:
    """

  else:
    formatted_text = f"""Below is an instruction that describes a task. \
    Write a response that appropriately completes the request.

    ### Instruction:
    {x['instruction']}

    ### Response:
    """

  return formatted_text

# using code_instructions_122k_alpaca dataset
Prompt = format_test(data[155])
print(Prompt)
  • huggingface transformers method:
from transformers import TextStreamer

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    Prompt
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 512)
  • unsloth method
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "EpistemeAI/Athena-codegemma-2-9b-v1", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "Create a function to calculate the sum of a sequence of integers.", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

--

Inputs and outputs

  • Input: Text string, such as a question, a prompt, or a document to be summarized.
  • Output: Generated English-language text in response to the input, such as an answer to a question, or a summary of a document.

Citation

@article{gemma_2024,
    title={Gemma},
    url={https://www.kaggle.com/m/3301},
    DOI={10.34740/KAGGLE/M/3301},
    publisher={Kaggle},
    author={Gemma Team},
    year={2024}
}

Uploaded model

  • Developed by: EpistemeAI
  • License: apache-2.0
  • Finetuned from model : EpistemeAI/Athena-codegemma-2-9b

This gemma2 model was trained 2x faster with Unsloth and Huggingface's TRL library.

Downloads last month
31
Inference Examples
Inference API (serverless) is not available, repository is disabled.

Model tree for EpistemeAI/Athena-codegemma-2-9b-v1

Finetuned
this model
Finetunes
1 model
Quantizations
6 models