Motor Learning and Control - Study Notes (3): Motor Control Theory


I. Understanding Control Theory and Two Key Concepts

1. Motor Control Theory

  • What a theory is
    A theory helps us understand phenomena and explains the reasons why these phenomena exist or behave as they do.

  • What a good theory must do
    1) Accurately describe a large class of observations.
    2) Make definite predictions about the results of future observations.

2. Two Key Concepts: Coordination and the Degrees of Freedom Problem

  • Coordination
    Coordination is the patterning of head, body, and limb movements relative to the patterning of environmental objects and events.
    Note: patterning is not restricted to highly skilled performers; a movement pattern can emerge regardless of skill level.
    Coordination is commonly described with angle-angle diagrams.

  • Reading the angle-angle diagram
    [Figure from the original notes not preserved; see the plotting sketch after this list.]

  • Degrees of Freedom (DoF) Problem
    The number of independent components in a control system and the number of ways each component can vary.
    Key idea: it is a control problem in a complex system; achieving a specific result means determining how to constrain the system's many DoFs.
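
The angle-angle diagram mentioned above is easy to generate in code. A minimal plotting sketch, using synthetic knee and hip angle curves (assumed illustrative values, not real motion-capture data):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic joint-angle trajectories over one movement cycle
# (illustrative values only, not real motion-capture data).
t = np.linspace(0, 2 * np.pi, 200)           # one movement cycle
knee_angle = 35 + 30 * np.sin(t)             # knee flexion/extension (deg)
hip_angle = 20 + 15 * np.sin(t + np.pi / 3)  # hip angle, phase-shifted (deg)

# Plot one joint angle against the other: a repeatable closed loop
# indicates a stable coordination pattern between the two joints.
plt.plot(knee_angle, hip_angle)
plt.xlabel("Knee angle (deg)")
plt.ylabel("Hip angle (deg)")
plt.title("Angle-angle diagram (one cycle)")
plt.show()
```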

II. Open-Loop vs. Closed-Loop Control Systems, with Examples from Sport

1. Definitions

  • Open-loop control system: a control system in which all the information needed to initiate and carry out an action as planned is contained in the initial instructions to the effectors.

  • Closed-loop control system: a system of control in which, during the course of an action, feedback is compared against a standard or reference to enable the action to be carried out as planned.

  • How to tell them apart

  • Whether feedback is present: feedback lets the control center correct the movement in a timely way.

  • Or whether usable feedback is produced: if feedback is not necessary, or there is not enough time to use it, and the command center has already supplied all the information needed to complete the movement, the system still counts as open-loop.

  • In a closed-loop system, the movement command only initiates the action; execution and completion are governed by feedback (a control-loop sketch follows this list).

  • Note the distinction from open/closed motor skills
    Open/closed motor skills are classified by whether the environment changes; that criterion has no bearing on open/closed control systems.
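
The definitions above can be mirrored in a short control-loop sketch. This is a minimal illustration, not a model of any real effector: the plant and gain are made-up, and `plant` simply accumulates the command.

```python
def open_loop(commands, plant):
    """Open loop: all information is in the initial instructions;
    no feedback is consulted during execution."""
    state = 0.0
    for u in commands:            # pre-planned command sequence
        state = plant(state, u)   # effector executes blindly
    return state

def closed_loop(reference, plant, steps, gain=0.5):
    """Closed loop: feedback is compared against a standard/reference
    and the command is corrected during the action."""
    state = 0.0
    for _ in range(steps):
        error = reference - state           # feedback vs. reference
        state = plant(state, gain * error)  # corrective command
    return state

# Toy plant: the state simply accumulates the command (illustrative only).
plant = lambda state, u: state + u

print(open_loop([0.2] * 10, plant))       # ends wherever the plan dictates
print(closed_loop(1.0, plant, steps=20))  # converges toward the reference
```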

2. Examples

Open-loop control system:
a goalkeeper defending a penalty kick in soccer; the 100 m sprint.
Closed-loop control system:
a distance runner controlling pace over a long race; a triple jumper adjusting stride length before the takeoff board and center of gravity after it.

III. Two Theories of Motor Control

1. Motor Program Theory

  • Motor program
    A memory-based construct that controls coordinated movement.
  • General motor program (GMP)
    Controls a class of actions: a set of different actions that share a common but unique set of features.
  • Invariant features
    The basis of what is stored; movement-related features that form the fundamental pattern of the class of actions and do not vary from one performance to another.
  • Parameters
    Movement-related features added to the memory representation that can be varied from one performance to another (see the sketch after this list).
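
A minimal sketch of how invariant features and parameters interact, assuming a hypothetical GMP whose invariant feature is its relative timing (the 30/50/20 phase proportions below are made up):

```python
# Invariant feature of a hypothetical GMP: the relative timing of its
# components, as proportions of total movement time (assumed values).
RELATIVE_TIMING = {"backswing": 0.30, "swing": 0.50, "follow_through": 0.20}

def instantiate(total_duration_s, overall_force):
    """Parameters (total duration, overall force) vary per performance;
    the relative timing pattern does not."""
    return {
        phase: {"duration_s": round(prop * total_duration_s, 3),
                "force_share": prop * overall_force}
        for phase, prop in RELATIVE_TIMING.items()
    }

# Two performances from the same class of actions: different parameter
# values, identical relative timing (each phase keeps its 30/50/20 share).
print(instantiate(total_duration_s=1.0, overall_force=10.0))
print(instantiate(total_duration_s=2.0, overall_force=25.0))
```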

2. Dynamic Systems Theory

  • Stability: refers to behavioral steady states.
  • Attractors: the stable states of behavior that occur when the system is allowed to operate in its preferred manner.
  • Order parameters and control parameters: an order parameter (e.g., relative phase) captures the coordination pattern itself; a control parameter (e.g., movement frequency or speed) is the variable whose change moves the system from one pattern to another.
  • Self-organization
    When certain conditions characterize a situation, a specific stable pattern of behavior emerges.
  • Coordinative structures
    In skilled action, the nervous system constrains specific collections of muscles and joints to act cooperatively (these structures can be intrinsic or acquired through practice).
  • Perception-action coupling: perceptual variables and movement variables interact so that a specific movement state of the action matches specific features of the perceptual variable (see the HKB sketch after this list).
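
A well-known concrete instance of these ideas is the Haken-Kelso-Bunz (HKB) model of bimanual coordination (not covered explicitly in these notes, so treat this as supplementary). Relative phase φ is the order parameter, movement frequency the control parameter, and the attractors are the minima of the potential V(φ) = -a cos φ - b cos 2φ; as frequency rises (the ratio b/a shrinks), the anti-phase attractor vanishes and behavior self-organizes into in-phase:

```python
import numpy as np

def hkb_potential(phi, a=1.0, b=1.0):
    """HKB potential V(phi) = -a*cos(phi) - b*cos(2*phi).
    Minima of V are the attractors of the relative phase."""
    return -a * np.cos(phi) - b * np.cos(2 * phi)

def is_attractor(phi, a=1.0, b=1.0):
    """A fixed point of the dynamics is stable (an attractor) when
    the second derivative V''(phi) = a*cos(phi) + 4b*cos(2*phi) > 0."""
    return a * np.cos(phi) + 4 * b * np.cos(2 * phi) > 0

# b/a shrinks as movement frequency rises.  In-phase (phi = 0) stays
# stable, but anti-phase (phi = pi) loses stability once b/a < 0.25:
# a self-organized transition to a single coordination pattern.
for b in (1.0, 0.1):
    v0, vpi = hkb_potential(0.0, b=b), hkb_potential(np.pi, b=b)
    print(f"b/a = {b}: V(0) = {v0:+.2f} (stable: {bool(is_attractor(0.0, b=b))}), "
          f"V(pi) = {vpi:+.2f} (stable: {bool(is_attractor(np.pi, b=b))})")
```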

IV. Homework

1. Give one example each of an open-loop and a closed-loop control system, and state whether each is an open or closed motor skill.
Answer:
Open-loop control system: the 100 m sprint. The command center has determined the course of the movement before it begins, and the limited feedback produced cannot be used in time, so it is an open-loop control system. Because opponents introduce some environmental variation during the run, the skill is an open motor skill.
Closed-loop control system: the triple jump. Adjusting stride length before the takeoff board and the center of gravity after it depends on muscle feedback and responses from the command center, so it is a closed-loop control system. Because the environment is relatively stable during the jump, the skill is a closed motor skill.
2. Try to describe the action of catching a ball using Motor Program Theory and Dynamic Systems Theory (hint: at about 75% of the ball's flight time, the hands get ready to grasp it; set the parameters yourself).
Answer:
Using Motor Program Theory:
I think passing is governed by a GMP: it controls a class of actions. For example, classified by passing style we get the lob pass, the triangular pass, and the ground pass; classified by a parameter such as velocity we get fast, medium-speed, and slow passes.
Next I will describe the general passing action in terms of its relative timing.
First, while the server holds the ball in front of the chest, the receiver sets his pose and prepares mentally to receive; I call this the preparation phase.
Then, from the moment the server releases the ball until 75% of the flight time has elapsed is what I call the modification phase: the receiver continuously adjusts his hand position and receiving angle.
Next, from 75% of the flight time until the receiver grasps the ball is the catching phase: the receiver activates his hand muscles to grasp the ball.
Last, from first contact until the ball is securely held is the buffering phase: the receiver absorbs the ball's kinetic energy by drawing his hands in toward his chest.
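
The phase structure above can be written as a tiny motor-program sketch. The 75% boundary comes from the exercise prompt; the flight duration is a self-set parameter, and rescaling it leaves the relative timing (the invariant feature) unchanged:

```python
def phase_at(flight_fraction):
    """Return the active phase for a given fraction of ball flight time."""
    if flight_fraction < 0.0:
        return "preparation"   # before release: set pose, prepare mentally
    if flight_fraction < 0.75:
        return "modification"  # adjust hand position and receiving angle
    if flight_fraction < 1.0:
        return "catching"      # activate hand muscles to grasp the ball
    return "buffering"         # after contact: draw hands toward the chest

# Total flight time is a parameter of the program; the 75% boundary is
# a relative-time invariant and survives any rescaling of the duration.
total_flight_s = 1.2  # assumed flight duration
for t in (-0.3, 0.2, 1.0, 1.3):
    print(f"t = {t:+.2f} s -> {phase_at(t / total_flight_s)}")
```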

Using Dynamic Systems Theory:
I think passing is nonlinear behavior, because it shows many abrupt, irregular changes.
For the order parameters: passing styles such as the lob pass, triangular pass, and ground pass can serve as order parameters; they capture the cooperative action patterns that reliably reappear.
For the control parameters: the ball's velocity can act as a control parameter. When the pass is fast, the receiver reaches his hands farther forward to buffer the ball's kinetic energy.
As this control parameter varies, passes at different speeds have different stable states and attractors. Reaching the hands farther forward is the more stable adjustment to a fast pass, so it is a stable state of the high-speed pattern; and because people prefer this forward-hands state for receiving, that stable behavior acts as an attractor.
When the receiver is about to grasp the ball, the temporal coordination of vision and the hands enables eye-hand coordination skills; I take this to be the perception-action coupling in the passing action pattern.
Furthermore, a coordinative structure is formed as the nervous system constrains collections of hand and arm muscles and joints to act cooperatively, helping the player achieve the action goal.

