OpenAI Embedding Client — vLLM
https://github.com/huggingface/text-embeddings-inference
Once the image is built, launch the service like this:
text-embeddings-router --model-id /mnt/model --port 8811 [--pooling cls, optional] --dtype float16 --json-output [plus a few other flags; see the HF link above]
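Before wiring up a client, it's worth smoke-testing the endpoint. A minimal sketch, assuming the router launched above is reachable at localhost:8811 (TEI's /embed route takes {"inputs": [...]} and returns one vector per input):

import requests

# Assumes the router from the command above is listening on localhost:8811.
resp = requests.post(
    "http://localhost:8811/embed",
    json={"inputs": ["hello world", "another sentence"]},
    timeout=30,
)
resp.raise_for_status()
embeddings = resp.json()  # one embedding vector per input text
print(len(embeddings), len(embeddings[0]))  # 2 vectors, each of the model's hidden size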
Client code (I call the service concurrently, asynchronously):
import asyncio
import aiohttp

MAX_CHARS = 500

class EmbeddingClient:
    def __init__(self, api_url):
        self.api_url = api_url  # e.g. "http://localhost:8811/embed"

    async def _process_batch_async(self, session, batch_texts, batch_qids, batch_idx, max_retries=1):
        """Processes a single batch asynchronously with retries."""
        payload = {"inputs": batch_texts}
        for attempt in range(max_retries):
            try:
                async with session.post(
                    self.api_url, json=payload, timeout=aiohttp.ClientTimeout(total=30.0)
                ) as response:
                    response.raise_for_status()
                    response_json = await response.json()
                    # Map each query id back to its embedding vector.
                    batch_results = {}
                    for j, data in enumerate(response_json):
                        batch_results[batch_qids[j]] = data
                    return batch_results
            except Exception:
                if attempt + 1 == max_retries:
                    return {}  # Return an empty dict on final failure
                await asyncio.sleep(1)  # Wait 1 second before retrying

    async def embed_all(self, examples, qids, batch_size):
        # Either cap the maximum token count when deploying the server,
        # or truncate client-side as done here.
        input_texts = [text[:MAX_CHARS] for text in examples]
        dqid2emb = {}
        tasks = []
        async with aiohttp.ClientSession() as session:
            for i in range(0, len(input_texts), batch_size):
                batch_texts = input_texts[i:i + batch_size]
                batch_qids = qids[i:i + batch_size]
                tasks.append(self._process_batch_async(session, batch_texts, batch_qids, i))
            # Gather inside the session context so it stays open while requests run.
            batch_results_list = await asyncio.gather(*tasks)
        for batch_results in batch_results_list:
            if batch_results:
                dqid2emb.update(batch_results)
        return dqid2emb
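For completeness, a minimal driver for the class above might look like the sketch below; the endpoint URL, texts, and ids are placeholder assumptions, not from the original post:

async def main():
    # Placeholder endpoint and inputs -- substitute your own deployment and data.
    client = EmbeddingClient(api_url="http://localhost:8811/embed")
    examples = ["first document", "second document", "third document"]
    qids = ["q1", "q2", "q3"]
    dqid2emb = await client.embed_all(examples, qids, batch_size=2)
    print({qid: len(emb) for qid, emb in dqid2emb.items()})  # qid -> vector dimension

asyncio.run(main())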