Unsloth Direct Preference Optimization (DPO)

Description

Memory-efficient Direct Preference Optimization (DPO) with Unsloth for aligning language models with human preferences. Training uses paired chosen/rejected responses and does not require loading a separate reference model. Optimized for low-VRAM environments, with FP8 support.
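
To make the objective concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not taken from the skill or from Unsloth; the function name `dpo_loss`, the example record, and the log-probability values are illustrative assumptions. The inputs are summed token log-probabilities of the chosen and rejected responses under the policy being trained and under a reference policy. In adapter-based setups the reference values are commonly obtained from the same base model with the adapters disabled, which is one way to avoid holding a second model in memory.

```python
import math

# Hypothetical example of one paired preference record (field names are an
# assumption; check the dataset format expected by your trainer).
example_pair = {
    "prompt": "Explain what a mutex is.",
    "chosen": "A mutex is a lock that lets only one thread enter a critical section at a time.",
    "rejected": "A mutex is a kind of database.",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * (policy_margin - ref_margin)))."""
    # How much more the policy prefers the chosen response over the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # The same preference margin under the reference policy.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)); computed in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Illustrative log-probabilities (made-up numbers, not measured results).
loss_aligned = dpo_loss(-12.0, -20.0, -15.0, -15.0)     # policy prefers chosen
loss_misaligned = dpo_loss(-20.0, -12.0, -15.0, -15.0)  # policy prefers rejected
print(loss_aligned < loss_misaligned)
```

The loss falls below log 2 when the policy favors the chosen response more strongly than the reference does, and rises above it otherwise. The `beta` parameter scales how sharply deviations from the reference are rewarded or penalized.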

Repository

https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/unsloth-dpo
