Theoretic

KTO: Model Alignment as Prospect Theoretic Optimization
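1. Introduction: This report introduces KTO (Kahneman-Tversky Optimization), a method for aligning large language models that is based on prospect theory. By designing a human-aware loss function (HALO), the method directly maximizes the utility of the model's generations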
ALIGNMENT, Model, KTO, optimization, Theoretic
admin, 4 months ago
[NIPS2017] A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning: Notes
Table of contents: Preface; Background and Related Work; Neural Fictitious Self-Play; Policy-Space Response Oracles; Meta-Strategy Solvers; Deep Cogni
Notes, GAME, Theoretic, Unified, Reinforcement
admin, 4 months ago
