Llama-3 Fine-Tuning Achieves 90% of GPT-4's Performance at Lower Cost
The success of Llama-3 has been remarkable, showcasing that open-source models are closing the gap with their closed-source counterparts, according to together.ai. By leveraging proprietary data, customers have been able to fine-tune smaller open-source software (OSS) models like Llama-3 to achieve higher accuracy than top-tier closed-source models.
Fine-Tuning Process
Together AI's platform allows users to fine-tune Llama-3-8B on proprietary data, creating custom models that outperform larger OSS alternatives like Llama-3-70B and are comparable to leading closed-source models like GPT-4, all at a fraction of the cost. A detailed guide demonstrates how a fine-tuned Llama-3 8B model improved from 47% accuracy to 65%, surpassing Llama-3-70B's 64% and nearing GPT-4's 71% accuracy.
The fine-tuning process involves several steps, including dataset transformation, uploading and verifying datasets, starting a fine-tuning job, and running evaluations to compare the results. The initial step requires downloading the Math Instruct dataset from HuggingFace, cleaning it up, and transforming it into a JSONL file format suitable for Together's platform.
Dataset Transformation
The transformation process involves loading the original JSON data, defining the Llama-3 prompt format, and converting the data into the correct format. This formatted dataset is then validated using Together's SDK before being uploaded for fine-tuning.
Uploading and Fine-Tuning
Once the dataset is prepared, it is uploaded to Together AI via the Python SDK. The fine-tuning job is then created using the Llama-3-8B base model, specifying the dataset, number of epochs, and other parameters. Users can monitor the fine-tuning job through Together AI's dashboard.
Evaluation and Results
After fine-tuning, the model's performance is evaluated using 1000 math problems. The fine-tuned Llama-3-8B model's accuracy is compared to the base Llama-3-8B, Llama-3-70B, and GPT-4. The fine-tuned model achieved a 65.2% accuracy, outperforming the base model's 47.2% and Llama-3-70B's 64.2%, and coming close to GPT-4's 71.4% accuracy.
The results indicate that the fine-tuned Llama-3-8B model outperformed the base model by nearly 20%, surpassed the top OSS model Llama-3-70B, and achieved over 90% of GPT-4's accuracy. Additionally, the fine-tuned model is faster, 50 times cheaper than GPT-4, and offers full ownership of the model and weights.
Conclusion
This fine-tuning approach demonstrates that small open-source models like Llama-3-8B can be customized to perform specific tasks with high accuracy, speed, and cost-efficiency. Users can leverage their proprietary data to fine-tune a model and either host it on Together AI or run it independently, maintaining full control and ownership.
The Llama-3-8B model trained on math problems outperformed leading OSS models and approached GPT-4's performance, with a total fine-tuning cost of less than $100 on Together AI.