Pitfalls when running glm4-chat inference with Xinference

qq_42060824

Last modified 2024-08-23 08:32:26

Tags: artificial intelligence

First published 2024-08-14 15:11:54

Article link: https://blog.csdn.net/qq_42060824/article/details/141191516

The error

GLM-4 chat 9b:'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'
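
The message is an ordinary Python AttributeError: the loaded model class simply does not define `stream_chat`. As a quick sanity check after changing the modeling file, something like the sketch below could confirm whether the method is present. The two classes and the `supports_stream_chat` helper are hypothetical stand-ins for illustration, not the real ChatGLM classes or anything from Xinference.

```python
class ModelWithoutStreamChat:
    """Stand-in for a model class whose modeling file lacks stream_chat."""


class ModelWithStreamChat:
    """Stand-in for a model class whose modeling file defines stream_chat."""

    def stream_chat(self, tokenizer, query):
        # Toy generator; the real method streams model output.
        yield f"echo: {query}"


def supports_stream_chat(model: object) -> bool:
    """Return True if the object exposes a callable stream_chat attribute."""
    return callable(getattr(model, "stream_chat", None))


# Accessing the missing attribute reproduces the shape of the error message.
try:
    ModelWithoutStreamChat().stream_chat
except AttributeError as exc:
    message = str(exc)

print(supports_stream_chat(ModelWithStreamChat()))
print(supports_stream_chat(ModelWithoutStreamChat()))
print(message)
```

With a real model, the same `supports_stream_chat(model)` call on the loaded object would tell you whether the replaced modeling_chatglm.py was actually picked up.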

Solution steps:

Replace modeling_chatglm.py in the original model files with the modeling_chatglm.py from THUDM/glm-4-9b-chat-1m, then make two changes to that file. The changes are as follows:

## Original
    def _update_model_kwargs_for_generation(
            self,
            outputs: ModelOutput,
            model_kwargs: Dict[str, Any],
            is_encoder_decoder: bool = False,
    ) -> Dict[str, Any]:
## After modification
    def _update_model_kwargs_for_generation(
            self,
            outputs: ModelOutput,
            model_kwargs: Dict[str, Any],
            is_encoder_decoder: bool = False,
            standardize_cache_format: bool = False,  # added this parameter
    ) -> Dict[str, Any]:

## Original
        # update past_key_values
        model_kwargs["past_key_values"] = self._extract_past_from_model_output(
            outputs, standardize_cache_format=standardize_cache_format
        )
## Replacement
        # update past_key_values
        past_output = self._extract_past_from_model_output(
            outputs, standardize_cache_format=standardize_cache_format
        )
        # adapt transformers update (https://github.com/huggingface/transformers/pull/31116)
        if type(past_output) is tuple and type(past_output[0]) is str:
            # a (cache name, cache) pair was returned: keep the cache only if the name matches
            if past_output[0] == "past_key_values":
                model_kwargs["past_key_values"] = past_output[1]
            else:
                model_kwargs["past_key_values"] = None
                print(f"WARN: Get \"{past_output[0]}\" during self._extract_past_from_model_output, not \"past_key_values\"")
        else:
            # the cache object itself was returned
            model_kwargs["past_key_values"] = past_output
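
To make the second change easier to reason about, here is a minimal standalone sketch of the same branching, pulled out of the model class so it can be run without transformers. It assumes, based only on the patch above and its reference to the transformers pull request, that `_extract_past_from_model_output` may return either the cache itself or a `(name, cache)` pair. The function name `resolve_past_key_values` and the placeholder values are hypothetical.

```python
from typing import Any, Optional


def resolve_past_key_values(past_output: Any) -> Optional[Any]:
    """Pick the value to store under model_kwargs["past_key_values"].

    Hypothetical helper that mirrors the branch logic of the patch.
    """
    # Pair form: (cache name, cache), recognised by a string in position 0.
    if type(past_output) is tuple and type(past_output[0]) is str:
        if past_output[0] == "past_key_values":
            return past_output[1]
        print(f'WARN: got "{past_output[0]}", not "past_key_values"')
        return None
    # Plain form: the cache object itself is passed through unchanged.
    return past_output


# Placeholder values; a real cache would hold tensors.
legacy_cache = (("key_layer_0", "value_layer_0"),)

print(resolve_past_key_values(("past_key_values", "CACHE")))
print(resolve_past_key_values(("some_other_cache", "CACHE")))
print(resolve_past_key_values(legacy_cache) is legacy_cache)
```

A tuple-of-tuples cache falls through to the plain branch because its first element is a tuple rather than a string, which is why the patch keeps working with both return shapes.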