Fine-Tuning LLM with DPO: Over 17,000 Downloads on Hugging Face
At the time of publishing, the model ranked 15th on the global Hugging Face leaderboard.
The Story
We set out to experiment with an innovative approach: merging top-tier Direct Preference Optimization (DPO) datasets with a blend of DPO-optimized and non-DPO-optimized models to see if this hybrid approach could push performance boundaries in complex reasoning and language tasks.
Link to the model: https://huggingface.co/MozaicAI/Mozaic-7B
What Did MozaicAI Do?
MozaicAI’s goal was to push the boundaries of performance by training a single model with Direct Preference Optimization (DPO). By selecting a high-quality DPO dataset and implementing a refined chat template, we aimed to improve the model’s response quality and reasoning ability. This streamlined approach focused on optimizing one model through precise data filtering and structured formatting, achieving impressive benchmark results and strong community adoption.
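For context on what DPO optimizes, here is a minimal sketch of the per-pair DPO loss in plain Python. The formula is the standard one from the DPO paper; the `beta` value and log-probabilities are illustrative, not MozaicAI's actual training configuration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_* are summed log-probabilities of the chosen/rejected responses
    under the policy being trained; ref_logp_* are the same quantities
    under the frozen reference model. beta scales the implicit reward
    margin (0.1 is a common default, not MozaicAI's value).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy prefers the chosen
    # response more strongly than the reference model does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The policy favors the chosen answer relative to the reference,
# so the loss falls below log(2) (the zero-margin baseline).
print(dpo_loss(-10.0, -12.0, -11.0, -11.5))
```

Minimizing this loss nudges the model toward the human-preferred ("chosen") responses without a separate reward model, which is what makes DPO a lightweight alternative to RLHF.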
MozaicAI selected the Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp model as the foundation and curated a high-quality dataset from argilla/distilabel-intel-orca-dpo-pairs, containing ~3,000 samples. Rigorous filtering was applied: only samples with a chosen_score of 8 or higher were included, excluding any entries with ties or data from the GSM8k training set.
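The filtering rule above can be sketched as a simple predicate over dataset rows. The field names (`chosen_score`, `status`, `in_gsm8k_train`) are an assumption based on the public argilla/distilabel-intel-orca-dpo-pairs dataset card; verify them against the actual dataset before use.

```python
def keep(row):
    """Keep only high-confidence, non-tied pairs outside the GSM8k train set."""
    return (row["chosen_score"] >= 8
            and row["status"] != "tie"
            and not row["in_gsm8k_train"])

# Toy rows shaped like the dataset's records (field names assumed, see above)
rows = [
    {"chosen_score": 9, "status": "chosen", "in_gsm8k_train": False},  # kept
    {"chosen_score": 7, "status": "chosen", "in_gsm8k_train": False},  # score too low
    {"chosen_score": 9, "status": "tie",    "in_gsm8k_train": False},  # tie
    {"chosen_score": 9, "status": "chosen", "in_gsm8k_train": True},   # GSM8k overlap
]
filtered = [r for r in rows if keep(r)]
print(len(filtered))  # 1
```

With the Hugging Face `datasets` library, the same predicate would be applied via `load_dataset("argilla/distilabel-intel-orca-dpo-pairs")["train"].filter(keep)`.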
To ensure smooth, structured interactions, MozaicAI embedded the ChatML template directly into the model’s tokenizer, aligning it with OpenHermes-2.5’s chat format. This adjustment enabled seamless model responses and an optimized conversational flow.
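To illustrate what that template produces, here is a minimal sketch of the ChatML text layout. In practice the template is stored as a Jinja string in `tokenizer.chat_template` (transformers) and rendered with `tokenizer.apply_chat_template`; this plain function only mirrors the resulting format.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts in the ChatML layout."""
    out = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    if add_generation_prompt:
        # Open an assistant turn to cue the model to respond
        out.append("<|im_start|>assistant\n")
    return "\n".join(out)

print(to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is DPO?"},
]))
```

Baking the template into the tokenizer means downstream users get correctly formatted prompts from `apply_chat_template` without hand-assembling these markers.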
The Results
With over 17k downloads on Hugging Face, the model quickly gained recognition. At the time of publishing, it ranked 15th among 7-billion-parameter models on the Open LLM Leaderboard.
Benchmark results further highlighted its capabilities:
- Average Score: 71.71
- AI2 Reasoning Challenge (25-Shot): 68.94
- HellaSwag (10-Shot): 86.45
- MMLU (5-Shot): 63.97
- TruthfulQA (0-Shot): 64.01
- Winogrande (5-Shot): 79.95
- GSM8k (5-Shot): 66.94
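The reported average is the arithmetic mean of the six benchmark scores, which checks out:

```python
# Benchmark scores as listed above
scores = {
    "ARC (25-shot)": 68.94,
    "HellaSwag (10-shot)": 86.45,
    "MMLU (5-shot)": 63.97,
    "TruthfulQA (0-shot)": 64.01,
    "Winogrande (5-shot)": 79.95,
    "GSM8k (5-shot)": 66.94,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 71.71
```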