Unsloth Direct Preference Optimization (DPO)
Description
Memory-efficient Direct Preference Optimization (DPO) for aligning language models with human preferences, using paired chosen/rejected data and without requiring a separate reference model. Optimized for low-VRAM environments, with FP8 support.
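As a rough illustration of what training on paired chosen/rejected data involves, the sketch below computes the standard DPO loss for a single preference pair from summed sequence log-probabilities. It is not taken from the skill itself: the function names, the `beta` default, and the record layout (`prompt`, `chosen`, `rejected`) are assumptions for illustration, and the listing does not say how the skill obtains the reference log-probabilities without a separate reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)), computed as -softplus(-x).
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; tune per task
) -> float:
    # Implicit rewards: how much more likely the policy makes each
    # response than the reference does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss falls as the chosen response is favored over the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)


# Hypothetical preference record; the field names are an assumption.
example = {
    "prompt": "Explain what a mutex is.",
    "chosen": "A mutex is a lock that lets one thread at a time enter a critical section.",
    "rejected": "It is a kind of computer thing.",
}

# Made-up log-probabilities, for illustration only.
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)  # policy identical to reference
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)  # policy shifted toward "chosen"
print(baseline, improved)
```

When the policy matches the reference, the margin is zero and the loss equals log 2; shifting probability mass toward the chosen response lowers it.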
Repository
https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/unsloth-dpo