Fine-Tuning LLM with DPO: Over 17,000 Downloads on Hugging Face

At the time of publishing, the model ranked 15th on the global Hugging Face Open LLM Leaderboard.

Let’s Work Together for Development

Call us directly, submit a sample or email us!

Business Address
Schellenberg 55C
26133 Oldenburg
Germany
Contact Us
Call us: (+49) 176-41555143
mazen@mozaic-ai-solutions.com
Working Time
Mon - Sat: 8:00 am - 6:00 pm
Holiday: Closed

The Story

We set out to experiment with an innovative approach: merging top-tier Direct Preference Optimization (DPO) datasets with a blend of DPO-optimized and non-DPO-optimized models to see if this hybrid approach could push performance boundaries in complex reasoning and language tasks.

Link to the model: https://huggingface.co/MozaicAI/Mozaic-7B

What MozaicAI Did

MozaicAI’s goal was to push the boundaries of performance by training a single model with Direct Preference Optimization (DPO). By selecting a high-quality DPO dataset and implementing a refined chat template, we aimed to improve the model’s response quality and reasoning ability. This streamlined approach focused on optimizing one model through precise data filtering and structured formatting, achieving impressive benchmark results and strong community adoption.
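To make the DPO objective concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. This is an illustrative reimplementation of the published DPO formulation, not MozaicAI's training code; the function name and arguments are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    response under the trainable policy or the frozen reference model.
    beta controls how strongly the policy is pulled toward the preference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): near zero when the policy clearly prefers
    # the chosen response relative to the reference model
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Notice that when the policy matches the reference model, the margin is zero and the loss is log 2; training lowers the loss by increasing the chosen response's likelihood relative to the rejected one.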

Model Selection and Dataset Curation

MozaicAI selected Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp as the foundation model and curated a high-quality dataset from argilla/distilabel-intel-orca-dpo-pairs, yielding roughly 3,000 samples. Rigorous filtering was applied: only samples with a chosen_score of 8 or higher were kept, and entries with ties or overlap with the GSM8k training set were excluded.
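The filtering rules above can be sketched as a simple predicate. The field names (`chosen_score`, `status`, `in_gsm8k_train`) follow the published schema of argilla/distilabel-intel-orca-dpo-pairs; the sample records here are made up for illustration.

```python
def keep_sample(sample):
    """Return True for samples that pass the curation filter."""
    return (
        sample["chosen_score"] >= 8          # high-quality chosen answers only
        and sample["status"] != "tie"        # drop ties between candidates
        and not sample["in_gsm8k_train"]     # avoid GSM8k train-set contamination
    )

# Toy records standing in for real dataset rows
raw = [
    {"chosen_score": 9,  "status": "ok",  "in_gsm8k_train": False},  # kept
    {"chosen_score": 7,  "status": "ok",  "in_gsm8k_train": False},  # score too low
    {"chosen_score": 10, "status": "tie", "in_gsm8k_train": False},  # tie
    {"chosen_score": 8,  "status": "ok",  "in_gsm8k_train": True},   # GSM8k overlap
]
filtered = [s for s in raw if keep_sample(s)]
```

In practice the same predicate would be passed to a `datasets` `filter` call over the full dataset.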

Chat Template Integration

To ensure smooth, structured interactions, MozaicAI embedded the ChatML template directly into the model’s tokenizer, aligning with OpenHermes-2.5’s chat format. This adjustment enabled seamless model responses and an optimized conversational flow.
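For readers unfamiliar with ChatML, the sketch below shows the text shape the template produces. In practice the template is stored on the tokenizer as a Jinja string (`tokenizer.chat_template`); this standalone renderer is only an illustration of the resulting format.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} messages in ChatML format."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is DPO?"},
])
```

Embedding the template in the tokenizer means downstream users get this formatting automatically via `tokenizer.apply_chat_template` instead of hand-building prompts.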

The Results

With over 17,000 downloads on Hugging Face, the model quickly gained recognition. At the time of publishing, it ranked 15th among 7-billion-parameter models on the Open LLM Leaderboard.

Benchmark results further highlighted its capabilities:

  • Average Score: 71.71
  • AI2 Reasoning Challenge (25-Shot): 68.94
  • HellaSwag (10-Shot): 86.45
  • MMLU (5-Shot): 63.97
  • TruthfulQA (0-Shot): 64.01
  • Winogrande (5-Shot): 79.95
  • GSM8k (5-Shot): 66.94

Contact MozaicAI for personalized AI solutions; our address, phone number, and working hours are listed above. We’re here to help you integrate innovative AI into your processes.