After six months of competition (and a few last-minute submissions), we are happy to announce the conclusion and winners of the Obstacle Tower Challenge. We want to thank all of the participants for both rounds and congratulate Alex Nichol, the Compscience.org team, and Songbin Choi for placing in the challenge. We are also excited to share that we have open-sourced Obstacle Tower for the research community to extend for their own needs.
Challenge winners
We started this challenge in February as a way to help foster research in the AI community, by providing a challenging new benchmark of agent performance built in Unity, which we called Obstacle Tower. Obstacle Tower was developed to be difficult for current machine learning algorithms to solve, and to push the boundaries of what is possible in the field by focusing on procedural generation. Key to that was allowing participants access to only one hundred instances of Obstacle Tower, and evaluating their trained agents on a set of unique procedurally generated towers they had never seen before. In this way, agents had to be able not only to solve the versions of the environment they had seen before, but also to do well on unexpected variations, a key property of intelligence referred to as generalization.
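To make the evaluation protocol concrete, here is a minimal sketch of that seed split using the ObstacleTowerEnv wrapper from the open source release. The evaluation seed values and the info-dictionary field name are illustrative assumptions, not the official challenge configuration.

```python
# A minimal sketch of the challenge protocol: train on a fixed pool of 100
# tower seeds, then score the agent on towers it has never seen.
from obstacle_tower_env import ObstacleTowerEnv

TRAIN_SEEDS = list(range(100))          # the 100 instances participants could train on
EVAL_SEEDS = [101, 102, 103, 104, 105]  # held-out towers (illustrative values)

def evaluate(agent, binary_path, seeds):
    env = ObstacleTowerEnv(binary_path, retro=True)
    floors = []
    for seed in seeds:
        env.seed(seed)                  # pins which procedurally generated tower is built
        obs = env.reset()
        done, highest_floor = False, 0
        while not done:
            obs, reward, done, info = env.step(agent.act(obs))
            highest_floor = info.get("current_floor", highest_floor)
        floors.append(highest_floor)
    env.close()
    return sum(floors) / len(floors)    # average floors reached, the challenge metric
```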
Once we created Obstacle Tower, we performed preliminary benchmarking ourselves using two of the state-of-the-art algorithms at the time. Our trained agents were able to solve an average of a little over 3 floors on the unseen instances of the tower used for evaluation. In contrast, humans without experience playing video games are able to solve an average of 15 floors, often getting as high as 20 floors into a tower.
Since the start of the contest, we have received close to 3,000 submitted agents and been delighted to watch as participants continued to submit even more compelling agents for evaluation. The top six final agents submitted by participants were able to solve over 10 floors of unseen versions of the tower, with the top entry solving an average of nearly 20 floors, similar to the performance of experienced human players. We wanted to highlight all participants who solved at least ten floors during evaluation, as well as our top three winners.
Challenge Winners

| Place | Name            | Username    | Average floors | Average reward |
|-------|-----------------|-------------|----------------|----------------|
| 1st   | Alex Nichol     | unixpickle  | 19.4           | 35.86          |
| 2nd   | Compscience.org | giadefa     | 16             | 28.7           |
| 3rd   | Songbin Choi    | sungbinchoi | 13.2           | 23.2           |

Honorable Mentions

| Place | Name      | Username  | Average floors | Average reward |
|-------|-----------|-----------|----------------|----------------|
| 4th   | Joe Booth | joe_booth | 10.8           | 18.06          |
| 5th   | Doug Meng | dougm     | 10             | 16.5           |
| 6th   | UEFDL     | Miffyli   | 10             | 16.42          |
Open source release
We are happy to announce that all of the source code for Obstacle Tower is now available under the Apache 2 license. We waited to open source the project until the contest was completed, to prevent anyone from reverse-engineering the task or evaluation process. Now that it is over, we hope researchers and users are able to take things apart to learn how to solve the task better, as well as modify Obstacle Tower for their own needs. Obstacle Tower was built to be highly modular, and relies heavily on procedural generation of multiple aspects of the environment, from the floor layout to the item and module placement in each room. We expect that this modularity will make it easy for researchers to define their own custom tasks using the pieces and tools we've built.
The focus of the Obstacle Tower Challenge is what we refer to in our paper as weak generalization (sometimes called within-distribution generalization). For the challenge, agents had access to one hundred towers and were tested on an additional five towers. Importantly, all of these towers were generated using the same set of rules. As such, there were no big surprises for the agents.
Also of interest is a different kind of generalization, which we refer to as the strong kind (sometimes called out-of-distribution generalization). In this scenario, the agent would be tested on a version of Obstacle Tower generated using a different set of rules than the training set. In our paper, we held out a separate visual theme for the evaluation phase, which used different textures, geometry, and lighting. Because our baseline agents performed catastrophically in these cases, we opted to test only for weak generalization in the challenge. That being said, we think that strong generalization benchmarks can be an even better measure of progress in artificial intelligence, as humans are easily able to generalize strongly, while agents typically fail at such tasks. We look forward to the community extending our work and proposing their own unique benchmarks using this open source release.
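As a rough illustration, a strong-generalization split along the lines described could look like the sketch below. The "default-theme" config key, the theme indices, and passing the config through the constructor are assumptions to check against the repository documentation.

```python
# A minimal sketch of a strong-generalization split: train on a subset of
# visual themes and evaluate on one the agent has never seen.
from obstacle_tower_env import ObstacleTowerEnv

TRAIN_THEMES = [0, 1, 2, 3]   # themes available during training
HELD_OUT_THEME = 4            # unseen textures, geometry, and lighting

def make_env(theme, worker_id):
    return ObstacleTowerEnv("./ObstacleTower/obstacletower",
                            worker_id=worker_id, retro=True,
                            config={"default-theme": theme})

# Training environments only ever draw from the training themes...
train_envs = [make_env(theme, i) for i, theme in enumerate(TRAIN_THEMES)]

# ...while the strong-generalization evaluation uses the held-out theme.
eval_env = make_env(HELD_OUT_THEME, worker_id=len(TRAIN_THEMES))
```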
Lastly, we want to give a shout-out to our collaborators on the project, Julian Togelius and Ahmed Khalifa, and thank them for their contributions to the design process and for Ahmed's open source procedural generation tool, which we utilized to create the floor layouts in Obstacle Tower.
To learn more about the project, as well as how to extend it for your own uses, head over to the GitHub page for the project.
Meet the Winners
1st Place – Alex Nichol
About Alex
Alex has been programming since he was 11 years old. As a senior in high school, Alex became very interested in AI. He is completely self-taught in AI, using online courses, blogs, and papers as necessary. He studied at Cornell for three semesters before leaving to pursue AI full-time and ultimately joining OpenAI (he has since left but still maintains a strong interest in AI). Recently he has taken up cooking!
Details
Alex trained his agent in several steps. First, he trained a classifier to identify objects (boxes, doors, etc.). This classifier was used throughout the process to tell the agent which objects it had seen in the past 50 timesteps. Then, Alex used behavioral cloning to train an agent to imitate human demonstrations. Lastly, Alex used a variant of Proximal Policy Optimization (PPO), which he calls "prierarchy," to fine-tune his behavior-cloned agent based on the game's reward function. This variant of PPO replaces the entropy term with a KL-divergence term that keeps the agent close to the original behavior-cloned policy. Alex tried a few other approaches that didn't quite pan out: Generative Adversarial Imitation Learning (GAIL) for more sample-efficient imitation learning, CMA-ES to learn a policy from scratch, and stacking last-layer features from the classifier and feeding them into the agent (instead of using the classifier's outputs for the state).
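For readers who want the gist of that objective, here is a minimal sketch of a PPO loss with the entropy bonus swapped for a KL penalty toward a frozen behavior-cloned prior, assuming a discrete action space and PyTorch tensors. The kl_coeff value and tensor shapes are illustrative, not taken from Alex's code.

```python
import torch

def prierarchy_loss(new_logits, old_logits, prior_logits, actions,
                    advantages, clip_eps=0.2, kl_coeff=0.01):
    """PPO surrogate with a KL-to-prior penalty instead of an entropy bonus."""
    new_dist = torch.distributions.Categorical(logits=new_logits)
    old_dist = torch.distributions.Categorical(logits=old_logits)
    prior_dist = torch.distributions.Categorical(logits=prior_logits)

    # Standard clipped PPO surrogate objective.
    ratio = torch.exp(new_dist.log_prob(actions) - old_dist.log_prob(actions))
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = torch.min(ratio * advantages, clipped * advantages)

    # KL toward the frozen behavior-cloned prior keeps the fine-tuned policy
    # close to the demonstrations (this replaces the entropy term).
    kl_to_prior = torch.distributions.kl_divergence(new_dist, prior_dist)
    return -(surrogate - kl_coeff * kl_to_prior).mean()
```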
If you would like to learn more, Alex wrote up a detailed blog post and has shared the code he used for the challenge. You can also find Alex on Twitter, Github, and his personal website.
2nd Place – Compscience.org
About Compscience.org
At the Computational Science Laboratory (www.compscience.org) at Universitat Pompeu Fabra, Gianni and Miha work at the interface between computing and different application areas, looking at developing computational models with intelligent behavior. Gianni is the head of the Computational Science Laboratory at Universitat Pompeu Fabra, an ICREA research professor, and a founder of Acellera. Miha is a PhD student in Gianni's biology group. The team felt that the Obstacle Tower Challenge was a good way to quickly learn and iterate on new ideas in a relevant 3D environment.
Details
The team's final model was PPO with a reduced action set and a reshaped reward function. For the first floors, the team also used a KL-divergence term to induce behaviors in the agent, similar to what Alex Nichol did, but this term was dropped on higher floors. The team also used a sampling algorithm at the key floors to focus the actors on the floors and seeds where the agent's performance was neither consistently good nor consistently bad; at higher floors, they used more standard sampling. The team did not have enough time to assess the exact benefits of each method, which they plan to do in the future. They plan to release the source code as soon as they understand and can generalize these aspects better. Lastly, the team tried world models (creating a very compressed representation of the observation with an autoencoder and building a policy over this space using evolutionary algorithms). It did not work, but the team learned a lot.
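A minimal sketch of that kind of frontier-focused sampling follows, assuming success rates are tracked per (floor, seed) pair; the class name, window size, and temperature are illustrative assumptions, not the team's implementation.

```python
import math
import random
from collections import defaultdict

class FrontierSampler:
    """Bias episode starts toward (floor, seed) pairs with ~50% success."""

    def __init__(self, floors, seeds, temperature=5.0):
        self.keys = [(f, s) for f in floors for s in seeds]
        self.history = defaultdict(list)   # (floor, seed) -> recent outcomes
        self.temperature = temperature

    def record(self, floor, seed, solved):
        outcomes = self.history[(floor, seed)]
        outcomes.append(float(solved))
        del outcomes[:-20]                 # keep a sliding window of 20

    def sample(self):
        def weight(key):
            outcomes = self.history[key]
            rate = sum(outcomes) / len(outcomes) if outcomes else 0.5
            # Peaks when the success rate is 0.5; decays toward 0.0 and 1.0.
            return math.exp(-self.temperature * (rate - 0.5) ** 2)
        weights = [weight(k) for k in self.keys]
        return random.choices(self.keys, weights=weights, k=1)[0]
```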
The team enjoyed Obstacle Tower and believes that environments with more realistic physics will be important, so that agents can do amazing things given enough samples. The team used 10 billion environment steps to train their agent. You can find out more about the team on Github and the lab's website.
3rd Place – Songbin Choi
About Songbin
Based in Seoul, Songbin has a PhD in biomedical engineering. Like many others who are fascinated with deep learning, Songbin is self-taught. He leverages the many papers, lectures, libraries, and code that are freely available online. He has tackled several computer vision tasks and challenges in the past. Songbin was excited about the Obstacle Tower and the chance to wrestle with a reinforcement learning problem.
Details
Songbin used the PPO algorithm implemented as part of the ML-Agents toolkit. During the challenge, his agent took actions in a sequentially coordinated fashion to achieve certain subtasks (for example, moving the box to a certain position). He used a gated recurrent unit (GRU) so the agent could make memory-backed decisions. To reduce overfitting, dropout layers were added, and left-right flipping, a common data augmentation method in imaging tasks, was also used. He then recorded human play and repeatedly added those experiences to the replay buffer while training. As a side effect of playing Obstacle Tower during the challenge, Songbin has become an expert player of the game. Although human play is brutally expensive to collect, it is of high quality and reduced the amount of simulation time needed. Songbin also tried longer sequence lengths but, contrary to his expectation, failed to achieve better performance. He is still trying to figure out why it did not work. He used all 100 tower seeds during training, with no separate validation set for evaluation. He suspected overfitting in the model, even though he tried to reduce it as much as possible.
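As a rough illustration of the flipping augmentation, here is a minimal sketch that mirrors an observation and swaps the left/right action components together so the pair stays consistent; the MultiDiscrete branch layout below is a hypothetical encoding, not the environment's exact one.

```python
import numpy as np

# Hypothetical MultiDiscrete action layout: [move, camera, jump, strafe],
# where branch values are (0=none, 1=left, 2=right) for camera and strafe.
CAMERA, STRAFE = 1, 3

def flip_experience(obs, action):
    """Mirror an (H, W, C) observation and swap left/right action components."""
    flipped_obs = obs[:, ::-1, :].copy()        # flip along the width axis
    flipped_action = np.array(action, copy=True)
    for branch in (CAMERA, STRAFE):
        if flipped_action[branch] == 1:
            flipped_action[branch] = 2
        elif flipped_action[branch] == 2:
            flipped_action[branch] = 1
    return flipped_obs, flipped_action
```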
Lastly, although deep learning methods in computer vision, especially for image classification tasks, have matured in recent years, deep reinforcement learning tasks remain relatively tricky. His top-scoring agent failed to show performance comparable to a human player (around floor 30). Watching AlphaGo and AlphaStar beat professional players, Songbin believes there is still a lot of room for improvement on the Obstacle Tower.
Honorable mentions
Joe Booth
About Joe
Joe Booth has over 25 years of experience in the video game industry and has worked on many familiar titles and franchises, such as FIFA, Need For Speed, Ghost Recon, and RollerCoaster Tycoon. He is currently the VP of development for an incubator called Orions Wave. Its main focus is Orions Systems, a video analytics platform that uses humans and AI/CV compute in an interchangeable, distributed way to get around the limits of today's AI.
Details
Joe used an optimized version of PPO plus demonstrations for the Obstacle Tower Challenge. He focused on compressing the input/output of the network, adding a recurrent memory, and basing the hyperparameters on those used for the Unity environment in Large-Scale Study of Curiosity-Driven Learning. For Round 2, he added demonstrations that passed floor 10, but his agent never cleared it consistently. He also tried working towards using semantics; although he realized it would not pay off in time, it is the direction he wants to go in the long term.
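As a rough sketch of what compressing a network's input/output can look like in practice, the following Gym wrappers downsample observations and expose only a small set of action combinations; the resolution, the hypothetical action combos, and the class names are illustrative assumptions, not Joe's actual settings.

```python
import gym
import numpy as np
import cv2

class CompressObs(gym.ObservationWrapper):
    """Downsample frames to a small square resolution."""

    def __init__(self, env, size=42):
        super().__init__(env)
        self.size = size
        self.observation_space = gym.spaces.Box(0, 255, (size, size, 3), np.uint8)

    def observation(self, obs):
        return cv2.resize(obs, (self.size, self.size), interpolation=cv2.INTER_AREA)

class ReduceActions(gym.ActionWrapper):
    """Expose only a few useful combinations of a multi-discrete action."""

    # Hypothetical combos: forward, forward+jump, turn left, turn right.
    COMBOS = [[1, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 2, 0, 0]]

    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(len(self.COMBOS))

    def action(self, act):
        return np.array(self.COMBOS[act])
```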
Joe wrote up a separate blog post on the Obstacle Tower and released a round 1 paper on arXiv. His round 1 code can be found here. You can find Joe on Twitter, LinkedIn, Github, and his personal website.
Doug Meng
About Doug
Doug Meng is a solution architect at NVIDIA, focusing on applied machine learning, enabling GPGPU in the cloud. Previously, he had a few years of experience in machine learning, statistics, and distributed systems, with some research experience in signal processing.
Details
Doug trained his agent using a modified DeepMind IMPALA with batched inference and a customized replay buffer. He used Obstacle Tower's retro mode with a frame stack of 4 and a few other tricks from OpenAI Baselines. The agent took about 12 days to train, and most of that time was spent struggling to reduce the training time in order to try out more algorithms. He also tried PPO and Rainbow, but his hypothesis was that fully off-policy learning hurts model performance quite a bit, whereas IMPALA is only slightly off-policy. Those agents could not consistently get past floor 7.
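For reference, a frame stack of 4 is commonly implemented as a Gym wrapper like the minimal sketch below, which keeps the last four observations and concatenates them along the channel axis so the policy can infer motion from a single input; this is a standard pattern, not Doug's exact code.

```python
from collections import deque
import numpy as np
import gym

class FrameStack(gym.Wrapper):
    """Stack the last k observations along the channel axis."""

    def __init__(self, env, k=4):
        super().__init__(env)
        self.frames = deque(maxlen=k)
        h, w, c = env.observation_space.shape
        self.observation_space = gym.spaces.Box(0, 255, (h, w, c * k), np.uint8)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        for _ in range(self.frames.maxlen):     # seed the stack with the first frame
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```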
UEFDL
About UEFDL
UEFDL is a three-member team: Anssi Kanervisto, Janne Karttunen, and Ville Hautamaki. The team is from the School of Computing, University of Eastern Finland. Anssi is a 2nd-year Ph.D. student working on using video games in reinforcement learning research. Janne is a recent MSc graduate with a thesis on deep reinforcement learning and transferring learning from games to robotics. Ville is a senior researcher focusing on machine learning, Bayesian inference, and speech technology.
Details
The team used Advantage Actor Critic (A2C) with a long short-term memory unit (LSTM) from the stable-baselines package. They trained one model for floors 0-4 and another for floors 5-9; the models for floors 10+ did not learn to solve the puzzles, so they were not included. The team first trained a model for floors 0-4, then used that model as a starting point for floors 5-9. This way, the agent could focus on finding the key in later levels, while avoiding the risk of the floors 5-9 model forgetting how to complete earlier floors (unlikely, but just in case). They also tried a few other experiments, such as A2C with curiosity, PPO at different entropies, and replacing some of the Obstacle Tower environments with "replay environments" of human gameplay. Overall, the team was excited about the competition and other machine learning video game competitions.
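A minimal sketch of that two-model setup with stable-baselines (v2) follows, assuming the environment's floor() helper sets the starting floor; the hyperparameters, paths, and timestep counts are illustrative, not the team's exact code.

```python
# A2C with a CNN+LSTM policy, one model per floor range.
from stable_baselines import A2C
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env(starting_floor, worker_id):
    def _init():
        from obstacle_tower_env import ObstacleTowerEnv
        env = ObstacleTowerEnv("./ObstacleTower/obstacletower",
                               worker_id=worker_id, retro=True)
        env.floor(starting_floor)   # episodes begin at this floor
        return env
    return _init

# Model for floors 0-4.
env_low = SubprocVecEnv([make_env(0, i) for i in range(8)])
model_low = A2C("CnnLstmPolicy", env_low, verbose=1)
model_low.learn(total_timesteps=int(1e7))
model_low.save("floors_0_4")

# Model for floors 5-9, warm-started from the floors 0-4 weights.
env_high = SubprocVecEnv([make_env(5, 8 + i) for i in range(8)])
model_high = A2C.load("floors_0_4", env=env_high)
model_high.learn(total_timesteps=int(1e7))
```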
Thank you!
Thanks so much to everyone who participated, to our partners at Google Cloud for providing GCP credits, and to AICrowd for hosting the challenge. When we started the competition, we weren't sure participants would be able to pass the ten-floor threshold, but the community impressed us by getting as far as 19 floors into unseen versions of the tower. That being said, each instance of Obstacle Tower contains 100 floors, which means 80% of the tower is still left unsolved! Furthermore, there is a greater need for control and planning on the upper floors, as enemies, more dangerous rooms, and more complicated floor layouts are introduced. We think this means there is a lot of room for new methods to be developed in the field to make additional progress. We look forward to seeing what progress is made over the coming months and years as researchers continue to tackle Obstacle Tower.
If you have any questions about the challenge please email us at OTC@unity3d.com. If you’d like to work on this exciting intersection of Machine Learning and Games, we are hiring for several positions, please apply!