Unsloth Direct Preference Optimization (DPO)

Description

Memory-efficient Direct Preference Optimization (DPO) for aligning language models with human preferences. Trains on paired chosen/rejected data without requiring a separate reference model. Optimized for low-VRAM environments, with FP8 support.
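
For orientation, here is a minimal sketch of the per-pair DPO objective the description refers to, written in plain Python with no Unsloth or TRL calls. The function name dpo_loss, the default beta of 0.1, and the prompt/chosen/rejected field names are illustrative assumptions, not taken from this skill. The reference log-probabilities are passed in as plain numbers, so the sketch says nothing about how a given trainer obtains them.

```python
import math

# Hypothetical preference record; the field names are an assumed convention.
example_pair = {
    "prompt": "Explain what a mutex is.",
    "chosen": "A mutex is a lock that lets only one thread enter a critical section at a time.",
    "rejected": "It is a thing in computers.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    # How much more (or less) likely each response is under the policy
    # than under the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a
    # numerically stable way for either sign of the margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The policy favors the chosen response more than the reference does,
    # so the loss falls below log(2), its value at zero margin.
    print(dpo_loss(-5.0, -9.0, -6.0, -8.0))
```

The loss equals log(2) when the policy and the reference agree, and decreases as the policy raises the chosen response relative to the rejected one. A larger beta penalizes drift from the reference more strongly.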

Repository

https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/unsloth-dpo

Related Tags

Machine Learning, Deep Learning, Natural Language Processing, Automation, High Performance