vLLM's SamplingParams parameters

vLLM deployment example

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="facebook/opt-125m")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Parameter list (short usage sketches for the main parameter groups follow the list)

n: Number of output sequences to return for the given prompt.
best_of: Number of output sequences that are generated from the prompt.
    From these `best_of` sequences, the top `n` sequences are returned.
    `best_of` must be greater than or equal to `n`. This is treated as
    the beam width when `use_beam_search` is True. By default, `best_of`
    is set to `n`.
presence_penalty: Float that penalizes new tokens based on whether they
    appear in the generated text so far. Values > 0 encourage the model
    to use new tokens, while values < 0 encourage the model to repeat
    tokens.
frequency_penalty: Float that penalizes new tokens based on their
    frequency in the generated text so far. Values > 0 encourage the
    model to use new tokens, while values < 0 encourage the model to
    repeat tokens.
repetition_penalty: Float that penalizes new tokens based on whether
    they appear in the prompt and the generated text so far. Values > 1
    encourage the model to use new tokens, while values < 1 encourage
    the model to repeat tokens.
temperature: Float that controls the randomness of the sampling. Lower
    values make the model more deterministic, while higher values make
    the model more random. Zero means greedy sampling.
top_p: Float that controls the cumulative probability of the top tokens
    to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
top_k: Integer that controls the number of top tokens to consider. Set
    to -1 to consider all tokens.
min_p: Float that represents the minimum probability for a token to be
    considered, relative to the probability of the most likely token.
    Must be in [0, 1]. Set to 0 to disable this.
use_beam_search: Whether to use beam search instead of sampling.
length_penalty: Float that penalizes sequences based on their length.
    Used in beam search.
early_stopping: Controls the stopping condition for beam search. It
    accepts the following values: `True`, where the generation stops as
    soon as there are `best_of` complete candidates; `False`, where a
    heuristic is applied and the generation stops when it is very
    unlikely to find better candidates; `"never"`, where the beam search
    procedure only stops when there cannot be better candidates
    (canonical beam search algorithm).
stop: List of strings that stop the generation when they are generated.
    The returned output will not contain the stop strings.
stop_token_ids: List of tokens that stop the generation when they are
    generated. The returned output will contain the stop tokens unless
    the stop tokens are special tokens.
include_stop_str_in_output: Whether to include the stop strings in output
    text. Defaults to False.
ignore_eos: Whether to ignore the EOS token and continue generating
    tokens after the EOS token is generated.
max_tokens: Maximum number of tokens to generate per output sequence.
logprobs: Number of log probabilities to return per output token.
    Note that the implementation follows the OpenAI API: The return
    result includes the log probabilities on the `logprobs` most likely
    tokens, as well as the chosen tokens. The API will always return the
    log probability of the sampled token, so there may be up to
    `logprobs+1` elements in the response.
prompt_logprobs: Number of log probabilities to return per prompt token.
skip_special_tokens: Whether to skip special tokens in the output.
spaces_between_special_tokens: Whether to add spaces between special
    tokens in the output. Defaults to True.
logits_processors: List of functions that modify logits based on
    previously generated tokens.
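
A minimal sketch of the randomness-related parameters (temperature, top_p, top_k, min_p, and the penalties). The concrete values are illustrative, not recommendations, and min_p requires a vLLM version recent enough to include it.

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# Stochastic sampling: nucleus (top_p), top_k, and min_p filtering, with a
# mild presence penalty to discourage reusing tokens already generated.
creative = SamplingParams(
    temperature=1.0,       # flatter distribution, more random
    top_p=0.9,             # keep the smallest token set covering 90% of mass
    top_k=50,              # of those, consider at most the 50 most likely
    min_p=0.05,            # drop tokens below 5% of the top token's probability
    presence_penalty=0.5,  # > 0 nudges the model toward new tokens
)
# temperature=0 means greedy sampling, so the output is deterministic.
greedy = SamplingParams(temperature=0.0)

print(llm.generate(["The capital of France is"], creative)[0].outputs[0].text)
print(llm.generate(["The capital of France is"], greedy)[0].outputs[0].text)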
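
Beam search uses a different set of knobs. This sketch (reusing the llm object above) assumes a vLLM version in which use_beam_search is still a SamplingParams field, as in the docstring above; those versions also require temperature=0, top_p=1, and top_k=-1 when beam search is enabled.

# best_of acts as the beam width; the top n beams are returned.
beam = SamplingParams(
    use_beam_search=True,
    best_of=4,            # beam width
    n=2,                  # return the best 2 of the 4 beams
    temperature=0.0,      # required with beam search in these versions
    length_penalty=1.0,   # scales beam scores by sequence length
    early_stopping=True,  # stop once best_of complete candidates exist
)
for seq in llm.generate(["The future of AI is"], beam)[0].outputs:
    print(seq.text)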
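
The stopping parameters compose: generation ends at the first matching stop string or stop token id, at the EOS token (unless ignore_eos=True), or after max_tokens, whichever comes first. The stop strings and the commented-out token id below are made-up examples.

stopping = SamplingParams(
    temperature=0.8,
    max_tokens=64,                     # hard cap per output sequence
    stop=["\n\n", "###"],              # stop strings, stripped from the output
    include_stop_str_in_output=False,  # default; set True to keep them
    # stop_token_ids=[2],              # model-specific token ids also work
    ignore_eos=False,                  # True would generate past the EOS token
)
print(llm.generate(["Hello, my name is"], stopping)[0].outputs[0].text)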
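
Requesting log probabilities follows the OpenAI-style semantics described above: each generated position reports the top-logprobs candidates plus the sampled token itself, so up to logprobs + 1 entries per position. A small sketch:

lp = SamplingParams(temperature=0.8, max_tokens=8,
                    logprobs=3,         # top-3 candidates per generated token
                    prompt_logprobs=1)  # also score the prompt tokens
completion = llm.generate(["The capital of France is"], lp)[0].outputs[0]
for position in completion.logprobs:
    # One dict per generated token, mapping token id -> log probability info.
    print(position)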
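
Finally, logits_processors gives a hook into decoding. In the versions matching this docstring, a processor is a callable that takes the previously generated token ids and the raw logits for the next position and returns modified logits. The processor below is a hypothetical example that bans one arbitrary token id.

from typing import List
import torch

def ban_token_42(token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
    # Hypothetical: forbid token id 42 at every decoding step.
    logits[42] = -float("inf")
    return logits

processed = SamplingParams(temperature=0.8, logits_processors=[ban_token_42])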
