LLM Optimization DPO PPO GRPO Slide - Search Videos

DeepSeekMath 7B: Open-Source Math Model Surpasses GPT-4 | Byte Goose AI posted on the topic | LinkedIn

115 views · 3 months ago

LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO

40 views · Apr 10, 2025

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

129 views · 4 weeks ago

YouTube · Research Paper Review

Rethinking Trust Region in LLM Reinforcement Learning PPO Limitations and DPPO for Stable FineTuning

3 views · 2 months ago

Why Direct Preference Optimization ! Your LLM is Secretly a Reward Model. #ai #llm #researchpaper

857 views · 1 month ago

YouTube · Tamil AI Hub

Turn-PPO: Optimizing Multi-Turn Reinforcement Learning for LLM Agents and a Comparative Analysis with GRPO

2 views · 4 months ago

S02E05 — Four Models to Teach One to Behave — PPO

YouTube · AI X-Rayed

SFT vs DPO vs GRPO vs PPO (In 30 Seconds) #LLM #ML #AI

36 views · 2 months ago

YouTube · Neurons Decoded

DGPO: Fine-Grained Credit for LLM Reasoning Steps

9 views · 1 week ago

YouTube · AI Research Roundup

#304 DeepSeekMath and RL for LLMs

219 views · 3 months ago

YouTube · Data Science Gems

Is DPO Actually Better? The Shocking Truth About LLM Alignment!

YouTube · mind shift

PPO vs DPO in RLHF: What LLM Job Candidates Should Know

[RL Fine-Tuning] From RLHF to GRPO: The Evolution and Optimization of AI LLM Models Alignment.

275 views · 3 months ago

YouTube · AI Podcast Series. Byte Goose AI.

[DPO] Direct Preference Optimization: Detailed Derivation of the Principles, Quick Start, and Hands-On Practice

7.1K views · 3 months ago

bilibili · 东川路第一可爱猫猫虫

Advanced Concepts in Large Language Models. RL / SFT / MHA / GQA / RoPE, RLVR / DPO/ GRPO Arch

Proximal Policy Optimization Explained

78.7K views · May 20, 2021

YouTube · Edan Meyer

Let's Code Proximal Policy Optimization

17.6K views · May 28, 2021

YouTube · Edan Meyer

Introduction to Proximal Policy Optimization algorithm (PPO)

12.9K views · Mar 31, 2020

YouTube · Python Lessons

Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial

86.5K views · Dec 24, 2020

YouTube · Machine Learning with Phil

PPO | Proximal Policy Optimization (PPO) architecture | PPO Explained

904 views · Jan 29, 2025

YouTube · AILinkDeepTech

RLHF Explained (and DPO!)

18K views · Jun 12, 2024

YouTube · Mark Hennings

Deepseek r1 (prepare) - RLHF & PPO & GRPO

809 views · 11 months ago

YouTube · 酸果酿

MaPPO: New LLM Preference Optimization

153 views · 9 months ago

YouTube · AI Research Roundup

Direct Preference Optimization (DPO)

8.7K views · Nov 13, 2023

YouTube · Trelis Research

Proximal Policy Optimization (PPO) Tutorial - Master Roboschool!!!

18.5K views · Nov 12, 2018

YouTube · Skowster the Geek

Fine Tune Llama 3 using ORPO

6.4K views · Apr 21, 2024

YouTube · AI Anytime

PR-453: Direct Preference Optimization

5.3K views · Aug 13, 2023

YouTube · JoonHo LEE

DPO : Direct Preference Optimization

343 views · Jun 20, 2024

YouTube · Dhiraj Madan

Direct Preference Optimization: Forget RLHF (PPO)

16.1K views · Jun 6, 2023

YouTube · Discover AI

GRPO: The Reinforcement Learning Trick That Changed Everything

217 views · 5 months ago

YouTube · mathtartic
