
By Asif Saleem

Fine Tune Llama3 In 5 Minutes

Llama3 is Meta AI's new and improved language model, built on the foundations of the successful Llama2 family. Llama3 brings several major improvements:

  • A new attention mechanism, grouped-query attention (GQA), replaces the earlier multi-head attention and shrinks the key/value cache.
  • A much larger vocabulary, 128,256 tokens versus 32,000, lets the model represent text more compactly.
  • A different tokenizer (tiktoken-based) breaks text into subwords differently than the previous SentencePiece tokenizer.
  • Larger intermediate dimensions in the MLP (feed-forward) layers give the model more capacity.
  • A larger base value for theta in the rotary positional embeddings (RoPE) helps the model handle context and token positions much better.

Together, these changes make Llama3 perform markedly better across various benchmarks, a clear improvement in both quality and efficiency over the earlier Llama2 family.
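The grouped-query attention change above can be sketched in a few lines of plain Python. The head counts follow the published Llama3-8B configuration (32 query heads sharing 8 key/value heads); the mapping logic is illustrative, not torchtune's actual implementation.

```python
# Minimal sketch of grouped-query attention (GQA) head sharing.
# In Llama2-7B every query head had its own KV head (32 of each);
# Llama3-8B instead shares 8 KV heads among 32 query heads,
# shrinking the KV cache by the group factor.
n_query_heads = 32
n_kv_heads = 8

group_size = n_query_heads // n_kv_heads  # query heads per KV head

# Map each query head to the KV head it attends with.
kv_head_for_query = [q // group_size for q in range(n_query_heads)]

print(group_size)             # 4
print(kv_head_for_query[:8])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The KV cache (and its memory traffic at inference time) scales with the number of KV heads, so this sharing is a large practical win for long contexts.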

Download Llama3 from Hugging Face

tune download meta-llama/Meta-Llama-3-8B \
    --output-dir <download_directory> \
    --hf-token <ACCESS TOKEN>

Run on a single GPU

torchtune supports LoRA, QLoRA, and full fine-tuning of Llama3-8B on one or more GPUs. For example, the command below is sufficient to fine-tune Llama3-8B on a single GPU device. The following properties apply to this tuning run:

  • Epochs: one
  • Dataset: a common instruct dataset
tune run lora_finetune_single_device --config llama3/8B_lora_single_device

To see the full list of recipes and their corresponding configs, simply run tune ls from the command line.

We can also add command-line overrides as needed, for example:

tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
checkpointer.checkpoint_dir=<checkpoint_dir> \
tokenizer.path=<checkpoint_dir>/tokenizer.model \
checkpointer.output_dir=<checkpoint_dir>

This will load the Llama3-8B checkpoint and tokenizer from the checkpoint directory used in the tune download command above, and save the final checkpoint to the same directory in the original format.

Once training is complete, the model checkpoints will be saved and their locations logged. For LoRA fine-tuning, the final checkpoint contains the merged weights, and a copy of the LoRA weights alone is saved separately.
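What "merged weights" means can be shown with a tiny pure-Python sketch: the standard LoRA update folds the trained low-rank matrices into the frozen base weight as W' = W + (alpha / r) * (B @ A). The shapes and values below are illustrative; torchtune does this with real tensors at checkpoint-save time.

```python
# Sketch of merging LoRA adapter weights into the base weight matrix.
def matmul(X, Y):
    # Naive matrix multiply for small nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 2, 1          # model dim 2, LoRA rank 1 (toy sizes)
alpha = 2.0          # LoRA scaling hyperparameter

W = [[1.0, 0.0],
     [0.0, 1.0]]     # frozen base weight (d x d)
B = [[0.5], [0.0]]   # (d x r), trained adapter
A = [[0.0, 1.0]]     # (r x d), trained adapter

delta = matmul(B, A)          # (d x d) low-rank update B @ A
scale = alpha / r
W_merged = [[w + scale * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

print(W_merged)  # [[1.0, 1.0], [0.0, 1.0]]
```

After merging, inference needs no adapter machinery at all, which is why the merged checkpoint can be loaded like a plain Llama3 model; the separately saved LoRA weights remain useful for stacking or further adapter training.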

If you have multiple GPUs available, you can run a distributed version of the recipe.

tune run --nproc_per_node 2 lora_finetune_distributed --config llama3/8B_lora

Reference and Further Information: https://pytorch.org/torchtune/stable/tutorials/llama3.html
