AlphaStar: Mastering the Real-Time Strategy Game StarCraft II (blog reading notes)

Original: https://deepmind/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii

SL = supervised learning, RL = reinforcement learning

how AlphaStar is trained

input: raw game interface (list of units and their properties) -> DNN -> output: sequence of instructions that make up an in-game action

DNN: transformer torso over the units (similar to relational deep RL), deep LSTM core, auto-regressive policy head with pointer network, centralised value baseline
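To make the "auto-regressive policy head with pointer network" concrete, here is a minimal sketch, assuming the torso and LSTM core have already produced one core output vector plus one embedding per unit. The two-step factorisation (first an action type, then a target unit conditioned on it) and every function and parameter name are illustrative assumptions, not DeepMind's code. The pointer step scores each unit directly, so the number of choices grows with the number of units on the map.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def autoregressive_pointer_head(
    core_output: Sequence[float],                     # hypothetical LSTM core output
    unit_embeddings: Sequence[Sequence[float]],       # hypothetical torso output, one row per unit
    action_type_weights: Sequence[Sequence[float]],   # hypothetical linear readout for action types
    action_type_embeddings: Sequence[Sequence[float]],
    rng: random.Random,
) -> Tuple[int, int]:
    """Sample (action_type, target_unit) auto-regressively."""
    # Step 1: action-type logits are a linear readout of the core output.
    type_logits = [dot(w, core_output) for w in action_type_weights]
    action_type = rng.choices(range(len(type_logits)), weights=softmax(type_logits))[0]

    # Step 2: condition on the sampled action type by adding its embedding to the query.
    query = [c + e for c, e in zip(core_output, action_type_embeddings[action_type])]

    # Pointer network: one logit per unit, from the dot product of the query with that unit.
    unit_logits = [dot(query, u) for u in unit_embeddings]
    target_unit = rng.choices(range(len(unit_logits)), weights=softmax(unit_logits))[0]
    return action_type, target_unit
```

Adding the action-type embedding to the query is one simple way to make the second choice depend on the first. Deeper heads would chain further arguments (for example a target location) in the same way.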

train: SL on anonymised human games -> imitate basic micro/macro strategies
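The SL stage is imitation: the network is trained to assign high probability to the action the human actually took in each replay step. A minimal sketch of that loss, assuming a hypothetical flat action space with one logit per action (AlphaStar's real action space is structured, so this is a simplification):

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def behaviour_cloning_loss(
    policy_logits: Sequence[Sequence[float]],  # one logit vector per replay step
    human_actions: Sequence[int],              # index of the action the human took at each step
) -> float:
    """Mean negative log-likelihood of the human's actions under the policy."""
    nll = [-log_softmax(logits)[a] for logits, a in zip(policy_logits, human_actions)]
    return sum(nll) / len(nll)
```

A policy that is uniform over four actions pays a loss of log 4 per step. Minimising this loss pushes probability mass toward the human's choices, which is how the agent picks up common build orders and micro before RL starts.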

then: league agents compete -> each agent's network weights updated by RL -> Nash distribution over the league -> final agent

multi-agent RL: agents play against each other (population-based, multi-agent RL) -> explore the huge strategic space -> each agent must beat the strongest strategies and not forget how to defeat earlier ones
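One way to picture the league: learners keep playing against a growing pool that includes frozen copies of earlier agents, so old strategies are never dropped from training. The sketch below is a hypothetical matchmaker. Weighting opponents by how often the learner loses to them is an assumption for illustration, not the method described in the blog, and the class and method names are invented.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class League:
    """Hypothetical league of learners plus frozen snapshots of earlier agents."""
    members: List[str] = field(default_factory=list)
    # win_rate[(a, b)] = estimated probability that a beats b
    win_rate: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_member(self, name: str) -> None:
        # New competitors (e.g. branched copies of existing ones) join the pool,
        # and earlier snapshots are kept so learners are still tested against them.
        self.members.append(name)

    def sample_opponent(self, learner: str, rng: random.Random) -> str:
        opponents = [m for m in self.members if m != learner]
        # Favour opponents the learner tends to lose to (unknown pairs default to 0.5);
        # a small floor keeps already-beaten opponents in the mix occasionally.
        weights = [max(1e-3, 1.0 - self.win_rate.get((learner, o), 0.5)) for o in opponents]
        return rng.choices(opponents, weights=weights)[0]
```

Keeping beaten snapshots at a small but non-zero weight is the design choice that guards against forgetting; dropping them entirely would let a learner drift into a strategy that an old opponent can exploit.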

explore new build orders, unit compositions, micro-management plans

personal learning objective per agent: beat a specific competitor / beat a distribution of competitors / build more of a specific unit type
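The three kinds of personal objective can be read as different reward functions over the same game outcome. A minimal sketch, assuming a hypothetical `GameResult` record and a win/loss reward of +1/-1 with an optional per-unit bonus; the field names and the bonus scheme are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GameResult:
    """Hypothetical summary of one finished game."""
    opponent: str
    won: bool
    units_built: Dict[str, int] = field(default_factory=dict)


@dataclass
class PersonalObjective:
    """Hypothetical per-agent objective; which fields are set defines the agent's niche."""
    target_opponent: Optional[str] = None  # set: beat one specific competitor
    unit_type: Optional[str] = None        # set: build more of a specific unit type
    unit_bonus: float = 0.0                # reward per unit of unit_type built

    def reward(self, result: GameResult) -> float:
        # With no target_opponent, every game counts, so the agent is scored
        # against whatever distribution of competitors matchmaking provides.
        if self.target_opponent is not None and result.opponent != self.target_opponent:
            return 0.0
        r = 1.0 if result.won else -1.0
        if self.unit_type is not None:
            r += self.unit_bonus * result.units_built.get(self.unit_type, 0)
        return r
```

Giving different agents different objectives is what keeps the league diverse: an agent rewarded for building one unit type will explore compositions that a pure win-rate maximiser might never try.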

NN weights: off-policy actor-critic RL with experience replay, self-imitation learning, policy distillation
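"Off-policy" matters here because replayed games were generated by an older version of the policy. A standard correction for this is clipped importance weighting; the sketch below uses V-trace (from IMPALA) as a swapped-in illustration of that idea. It is an assumption, not a claim about AlphaStar's exact loss. It returns value targets for the centralised critic and advantages for the actor.

```python
import math
from typing import List, Sequence, Tuple


def vtrace(
    rewards: Sequence[float],
    values: Sequence[float],       # critic estimates V(x_t) for t = 0..T-1
    bootstrap_value: float,        # V(x_T) after the last step
    log_rhos: Sequence[float],     # log pi(a_t|x_t) - log mu(a_t|x_t); mu = behaviour policy
    gamma: float = 0.99,
    rho_bar: float = 1.0,
    c_bar: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """V-trace value targets and policy-gradient advantages for one trajectory."""
    T = len(rewards)
    rhos = [math.exp(lr) for lr in log_rhos]
    clipped_rhos = [min(rho_bar, r) for r in rhos]  # truncated importance weights
    cs = [min(c_bar, r) for r in rhos]              # "trace cutting" coefficients
    next_values = list(values[1:]) + [bootstrap_value]

    vs = [0.0] * T
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}); zero beyond the last step
    for t in reversed(range(T)):
        delta = clipped_rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * cs[t] * acc
        vs[t] = values[t] + acc

    # Actor advantage uses the next V-trace target instead of the raw critic value.
    next_vs = vs[1:] + [bootstrap_value]
    advantages = [
        clipped_rhos[t] * (rewards[t] + gamma * next_vs[t] - values[t]) for t in range(T)
    ]
    return vs, advantages
```

The critic is regressed toward `vs` and the actor's gradient is proportional to `advantages[t] * grad log pi(a_t|x_t)`. When the replayed data is on-policy (all `log_rhos` are 0) the targets reduce to ordinary bootstrapped n-step returns; when the current policy would rarely take the replayed action, the clipped weights shrink the update toward zero.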

training runs on TPUs; final agent: the components of the league's Nash distribution, i.e. the most effective mixture of the strategies discovered
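Given a matrix of head-to-head win rates between league members, the "Nash distribution" is a mixture over members that no single member can exploit. The blog does not say which solver is used; the sketch below assumes a symmetric zero-sum meta-game (payoff = win rate minus 0.5) and approximates its Nash mixture with fictitious play, whose empirical play frequencies converge to equilibrium in zero-sum games. The function name and the example win rates are hypothetical.

```python
from typing import List, Sequence


def league_nash(win_rates: Sequence[Sequence[float]], iterations: int = 20000) -> List[float]:
    """Approximate Nash mixture over league members via symmetric fictitious play.

    win_rates[i][j] is the probability that member i beats member j,
    assumed to satisfy win_rates[i][j] + win_rates[j][i] == 1.
    """
    n = len(win_rates)
    payoff = [[win_rates[i][j] - 0.5 for j in range(n)] for i in range(n)]
    counts = [1.0] * n  # start from a uniform prior over members
    for _ in range(iterations):
        total = sum(counts)
        mix = [c / total for c in counts]
        # Expected payoff of each member against the current empirical mixture.
        values = [sum(payoff[i][j] * mix[j] for j in range(n)) for i in range(n)]
        best = max(range(n), key=lambda i: values[i])  # best response (ties: lowest index)
        counts[best] += 1.0
    total = sum(counts)
    return [c / total for c in counts]
```

If one member beats everyone, nearly all the mass ends up on it. If the league is rock-paper-scissors-like (each member beats one and loses to another), the mixture comes out close to uniform, which is exactly the "mixture of strategies" case where no single agent would be enough.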

how AlphaStar plays and how to evaluate

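human pros (e.g. TLO/MaNa) ~ hundreds of APM on average

existing bots ~ thousands to tens of thousands of APM

AlphaStar vs. TLO/MaNa: AlphaStar averaged ~280 APM, lower than the pros (it observes through the raw game interface rather than reading screen frames)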
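As a back-of-the-envelope check on these figures: average APM is simply actions divided by game minutes, so ~280 APM corresponds to roughly one action every 0.21 s on average. A minimal sketch, assuming a hypothetical list of action timestamps in seconds:

```python
from typing import Sequence


def average_apm(action_times_s: Sequence[float], game_length_s: float) -> float:
    """Average actions per minute over the whole game."""
    return len(action_times_s) / (game_length_s / 60.0)


def seconds_per_action(apm: float) -> float:
    """Mean gap between consecutive actions at a given average APM."""
    return 60.0 / apm
```

An average says nothing about how actions are distributed within the game, so two agents with the same APM can still differ a lot in how fast they act during fights.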

AlphaStar acting: observation -> action delay of ~350 ms on average; process every frame

results: 5:0 vs. TLO and 5:0 vs. MaNa

Reposted from: https://wwwblogs/yaoyaohust/p/10815039.html