Commercial-Grade Mahjong AI

A set of AI logic that is essentially ready for commercial use; it has no trouble simulating a medium-level player.

Shortcoming: hand patterns (scoring shapes) are not taken into account.

Main flow:

```java
// Several special flows have to be branched out based on the gamecode
public void Analysis(MjPlayer player, int[] cards) {

    TileType originType = new TileType();
    SliceCardList(originType, cards, false);

    // Handle claim actions (chi / peng / gang)
    if (player.isCanChi() || player.isCanPeng() || player.isCanGang() || player.isCanAnGang() || player.isCanBuGang()) {
        int tmpCode = AiEventLogic(player, originType, cards, AnalyTingPaiLackCard(player, cards));
        if (tmpCode >= 1) {
            // The current player (concealed kong and added kong) does not need to cancel; just discard
            if (tmpCode == 1 && player.getGame().getCurrentPlayer() == player.getPosition()) {
                setActionCode(0);
            } else {
                setActionCode(tmpCode);
                return;
            }
        }
    }

    // Only the current player checks whether a tile can be discarded
    if (player.getGame().getCurrentPlayer() == player.getPosition()) {
        // ...
```
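
The helper routines referenced above (SliceCardList, AnalyTingPaiLackCard, AiEventLogic) are not shown in the post. As a rough illustration of the kind of "which draws still help my hand" analysis that a routine like AnalyTingPaiLackCard performs, here is a small Python sketch that scores a hand with a greedy set-counting heuristic and picks the discard that leaves the most useful draws. The tile encoding, hand_score heuristic, and best_discard loop are illustrative assumptions, not the author's implementation.

```python
from collections import Counter

# Assumed tile encoding: 0-8 characters, 9-17 dots, 18-26 bamboo, 27-33 honors.


def hand_score(tiles):
    """Crude closeness-to-winning heuristic: greedily count triplets,
    sequences, pairs and two-tile partial sequences. Not an exact
    shanten calculation, just enough to rank candidate discards."""
    counts = Counter(tiles)
    score = 0
    # complete triplets
    for t in list(counts):
        while counts[t] >= 3:
            counts[t] -= 3
            score += 4
    # complete sequences (suited tiles only; cannot start on an 8 or 9)
    for t in range(27):
        if t % 9 > 6:
            continue
        while counts[t] and counts[t + 1] and counts[t + 2]:
            counts[t] -= 1
            counts[t + 1] -= 1
            counts[t + 2] -= 1
            score += 4
    # pairs
    for t in list(counts):
        if counts[t] >= 2:
            counts[t] -= 2
            score += 2
    # two-tile partial sequences
    for t in range(27):
        if t % 9 <= 7 and counts[t] and counts[t + 1]:
            counts[t] -= 1
            counts[t + 1] -= 1
            score += 1
    return score


def best_discard(hand):
    """Try every discard, keep the one whose remaining 13 tiles score highest;
    break ties by how many distinct draws would raise that score."""
    best_key, best_tile = None, None
    for i, tile in enumerate(hand):
        rest = hand[:i] + hand[i + 1:]
        base = hand_score(rest)
        # count distinct improving draws (ignores how many copies of each tile actually remain)
        useful = sum(1 for draw in range(34) if hand_score(rest + [draw]) > base)
        key = (base, useful)
        if best_key is None or key > best_key:
            best_key, best_tile = key, tile
    return best_tile


if __name__ == "__main__":
    # 14-tile hand: three complete sets, a pair, a partial sequence and one lone honor (33)
    hand = [0, 1, 2, 9, 10, 11, 18, 18, 18, 4, 5, 27, 27, 33]
    print("discard:", best_discard(hand))  # prints: discard: 33
```

A production bot would swap hand_score for a real shanten/ting-pai calculation, but the "try every discard, keep the one with the most useful draws" loop is the kind of discard logic a middling rule-based player uses.
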
Below is an example of a Mahjong AI implemented with TensorFlow and Keras, based mainly on a deep reinforcement learning algorithm:

```python
import random

import numpy as np
import tensorflow as tf
from tensorflow import keras


class MahjongAI:
    def __init__(self):
        self.model = self.build_model()
        self.target_model = self.build_model()
        self.target_model.set_weights(self.model.get_weights())
        self.replay_buffer = []
        self.batch_size = 32
        self.gamma = 0.99
        self.epsilon = 1.0
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.update_freq = 1000
        self.steps_since_update = 0

    def build_model(self):
        # 34 tile types in, one Q-value per tile type out
        model = keras.Sequential()
        model.add(keras.layers.Dense(128, input_dim=34, activation='relu'))
        model.add(keras.layers.Dense(128, activation='relu'))
        model.add(keras.layers.Dense(34, activation='linear'))
        model.compile(loss='mse', optimizer=keras.optimizers.Adam())
        return model

    def select_action(self, state):
        # epsilon-greedy action selection
        if np.random.rand() < self.epsilon:
            return np.random.randint(0, 34)
        q_values = self.model.predict(state.reshape(1, -1))[0]
        return np.argmax(q_values)

    def update_replay_buffer(self, state, action, reward, next_state, done):
        self.replay_buffer.append((state, action, reward, next_state, done))
        if len(self.replay_buffer) > 100000:
            self.replay_buffer.pop(0)

    def update_model(self):
        if len(self.replay_buffer) < self.batch_size:
            return
        # sample a random mini-batch of transitions from the replay buffer
        batch = random.sample(self.replay_buffer, self.batch_size)
        states = np.array([transition[0] for transition in batch])
        actions = np.array([transition[1] for transition in batch])
        rewards = np.array([transition[2] for transition in batch])
        next_states = np.array([transition[3] for transition in batch])
        dones = np.array([transition[4] for transition in batch])
        q_values = self.model.predict(states)
        next_q_values = self.target_model.predict(next_states)
        for i in range(self.batch_size):
            if dones[i]:
                q_values[i][actions[i]] = rewards[i]
            else:
                q_values[i][actions[i]] = rewards[i] + self.gamma * np.max(next_q_values[i])
        self.model.fit(states, q_values, verbose=0)
        self.steps_since_update += 1
        if self.steps_since_update % self.update_freq == 0:
            self.target_model.set_weights(self.model.get_weights())

    def train(self, env, episodes=100):
        for episode in range(episodes):
            state = env.reset()
            done = False
            total_reward = 0
            while not done:
                action = self.select_action(state)
                next_state, reward, done, _ = env.step(action)
                self.update_replay_buffer(state, action, reward, next_state, done)
                self.update_model()
                state = next_state
                total_reward += reward
            print('Episode', episode + 1, 'Total Reward:', total_reward)
            if self.epsilon > self.epsilon_min:
                self.epsilon *= self.epsilon_decay

    def play(self, env):
        state = env.reset()
        done = False
        while not done:
            action = self.select_action(state)
            print('Action:', action)
            state, _, done, _ = env.step(action)


if __name__ == '__main__':
    ai = MahjongAI()
    # TODO: implement a Mahjong game environment and pass it to train() or play()
```

This example uses deep reinforcement learning: a neural network predicts a quality (Q) score for each action, and experience replay together with a target network is used to make training more efficient. In practice, the code needs to be modified and adapted to the actual game.
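
The listing above stops at the TODO: it needs a game environment with reset()/step() before train() or play() can run. As a minimal sketch of how the pieces wire together, here is a placeholder environment; DummyMahjongEnv, its 34-length tile-count state, and its reward are made-up stand-ins for smoke-testing the loop, not a real Mahjong rules engine.

```python
import numpy as np


class DummyMahjongEnv:
    """Placeholder environment exposing the reset()/step() interface that
    MahjongAI.train() and MahjongAI.play() expect. It returns random
    34-dimensional tile-count vectors and an arbitrary reward, so it only
    verifies that the training loop runs; it does not teach the agent Mahjong."""

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        # state: how many of each of the 34 tile types the player currently holds
        return np.random.randint(0, 4, size=34).astype(np.float32)

    def step(self, action):
        self.steps += 1
        next_state = np.random.randint(0, 4, size=34).astype(np.float32)
        # purely arbitrary reward: +1 if the chosen tile type appears in the next state, small penalty otherwise
        reward = 1.0 if next_state[action] > 0 else -0.1
        done = self.steps >= self.max_steps
        return next_state, reward, done, {}


if __name__ == '__main__':
    # assumes MahjongAI from the listing above is defined in the same file or imported
    env = DummyMahjongEnv()
    ai = MahjongAI()
    ai.train(env, episodes=2)  # a couple of episodes just to confirm the loop runs end to end
```

Replacing DummyMahjongEnv with an environment that encodes real hands and rewards winning is exactly the part the original TODO leaves open.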
