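OSS Subscription Agent
Preface: this series follows Datawhale's "MetaGPT Multi-Agent Course" to understand and practice multi-agent system development in depth. Thanks to Datawhale for open-sourcing the material and organizing the study group!
Introduction
A subscription agent collects information from external sources at scheduled times and summarizes it.
A basic familiarity with the following Python tools is needed.
1. Making a simple HTTP request: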
import asyncio
import aiohttp

async def fetch_webpage():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.example.com') as response:
            print(await response.text())

asyncio.run(fetch_webpage())
2. Extracting information from HTML with BeautifulSoup:
from bs4 import BeautifulSoup  # import BeautifulSoup
# Parse the document
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# Find all <a> tags
all_links = soup.find_all('a')
for link in all_links:
    print(link.get('href'))
# Find the element whose class is 'title'
title_element = soup.find('p', {'class': 'title'})
print(title_element.text)
# Find the element whose id is 'link2'
link2 = soup.find('a', {'id': 'link2'})
print(link2.text)
# Get an element's text content
print(soup.title.string)
# Get an element's attribute value (link1 has to be looked up first)
link1 = soup.find('a', {'id': 'link1'})
print(link1['href'])
# Iterate over an element's child nodes
for child in soup.body.children:
    print(child)
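3. The trigger() function periodically produces the messages to be sent: it passes a message to the caller with the yield keyword, then uses asyncio.sleep() to wait 24 hours before producing the next one. The callback() function handles the messages that originate from trigger(): it receives a Message object and prints its content.
>>> async def trigger():
...     while True:
...         yield Message("the latest news about OpenAI")
...         await asyncio.sleep(3600 * 24)
>>> async def callback(msg: Message):
...     print(msg.content)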
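The open source software (OSS) subscription agent consists of three parts: a role, a trigger, and a callback, i.e. the agent itself, the trigger, and the data callback. The OSSWatcher agent watches web page information for us. When the trigger condition is met, the trigger fires, and the callback handles the information OSSWatcher generates, for example by sending it to WeChat or another channel.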
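To make the division of labor concrete, here is a minimal sketch of the role/trigger/callback loop using only the standard library. The `Message` dataclass, `fake_role`, `run_subscription`, and `collect` are hypothetical stand-ins invented for illustration, not MetaGPT APIs, and the trigger stops after two messages so the example terminates.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for metagpt.schema.Message."""
    content: str


async def trigger(times: int = 2, interval: float = 0.01):
    # Emit a message, then wait; a real trigger would loop forever
    for _ in range(times):
        yield Message("https://github.com/trending")
        await asyncio.sleep(interval)


async def fake_role(msg: Message) -> Message:
    # Stand-in for OSSWatcher: a real role would crawl and analyze here
    return Message(f"report for {msg.content}")


async def run_subscription(trigger_iter, role, callback):
    # For every trigger message, run the role and pass its reply to the callback
    async for msg in trigger_iter:
        await callback(await role(msg))


received = []


async def collect(msg: Message):
    # Stand-in callback: store the content instead of pushing it anywhere
    received.append(msg.content)


asyncio.run(run_subscription(trigger(), fake_role, collect))
print(received)
```

Swapping `collect` for a function that posts to a chat service changes the delivery channel without touching the role or the trigger.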
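The OSSWatcher role
OSSWatcher analyzes trending information, so it needs two actions: the first fetches the trending information and the second analyzes the content. The trending source is the GitHub Trending list. The filter conditions are as follows:
Crawling GitHub Trending
Extracting the part we need
Each repository corresponds to an element with the Box-row class. First copy this content into the file github-trending-raw.html, then slim the HTML down with the following script to get github-trending-slim.html.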
from bs4 import BeautifulSoup

with open("github-trending-raw.html") as f:
    html = f.read()
soup = BeautifulSoup(html, "html.parser")
# Drop every non-empty attribute except class
for i in soup.find_all(True):
    for name in list(i.attrs):
        if i[name] and name not in ["class"]:
            del i[name]
# Remove media elements that carry no useful text
for i in soup.find_all(["svg", "img", "video", "audio"]):
    i.decompose()
with open("github-trending-slim.html", "w") as f:
    f.write(str(soup))
Then extract the HTML for two repositories and ask GPT to write the crawling code:
import aiohttp
import asyncio
from bs4 import BeautifulSoup

async def fetch_html(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def parse_github_trending(html):
    soup = BeautifulSoup(html, 'html.parser')
    repositories = []
    for article in soup.select('article.Box-row'):
        repo_info = {}
        repo_info['name'] = article.select_one('h2 a').text.strip()
        repo_info['url'] = article.select_one('h2 a')['href'].strip()
        # Description
        description_element = article.select_one('p')
        repo_info['description'] = description_element.text.strip() if description_element else None
        # Language
        language_element = article.select_one('span[itemprop="programmingLanguage"]')
        repo_info['language'] = language_element.text.strip() if language_element else None
        # Stars and Forks
        stars_element = article.select('a.Link--muted')[0]
        forks_element = article.select('a.Link--muted')[1]
        repo_info['stars'] = stars_element.text.strip()
        repo_info['forks'] = forks_element.text.strip()
        # Today's Stars
        today_stars_element = article.select_one('span.d-inline-block.float-sm-right')
        repo_info['today_stars'] = today_stars_element.text.strip() if today_stars_element else None
        repositories.append(repo_info)
    return repositories

async def main():
    url = 'https://github.com/trending'
    html = await fetch_html(url)
    repositories = await parse_github_trending(html)
    for repo in repositories:
        print(f"Name: {repo['name']}")
        print(f"URL: https://github.com{repo['url']}")
        print(f"Description: {repo['description']}")
        print(f"Language: {repo['language']}")
        print(f"Stars: {repo['stars']}")
        print(f"Forks: {repo['forks']}")
        print(f"Today's Stars: {repo['today_stars']}")
        print()

# Run the crawler
asyncio.run(main())
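The complete crawling code is shown below and is saved in metagpt/actions/githubTrending.py. The full program appears later; since everything can be defined directly in a single .py file, saving this file separately is optional.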
import aiohttp
from bs4 import BeautifulSoup
from metagpt.actions.action import Action
from metagpt.config import CONFIG

class CrawlOSSTrending(Action):
    async def run(self, url: str = "https://github.com/trending"):
        async with aiohttp.ClientSession() as client:
            async with client.get(url, proxy=CONFIG.global_proxy) as response:
                response.raise_for_status()
                html = await response.text()
        soup = BeautifulSoup(html, 'html.parser')
        repositories = []
        for article in soup.select('article.Box-row'):
            repo_info = {}
            repo_info['name'] = article.select_one('h2 a').text.strip().replace("\n", "").replace(" ", "")
            repo_info['url'] = "https://github.com" + article.select_one('h2 a')['href'].strip()
            # Description
            description_element = article.select_one('p')
            repo_info['description'] = description_element.text.strip() if description_element else None
            # Language
            language_element = article.select_one('span[itemprop="programmingLanguage"]')
            repo_info['language'] = language_element.text.strip() if language_element else None
            # Stars and Forks
            stars_element = article.select('a.Link--muted')[0]
            forks_element = article.select('a.Link--muted')[1]
            repo_info['stars'] = stars_element.text.strip()
            repo_info['forks'] = forks_element.text.strip()
            # Today's Stars
            today_stars_element = article.select_one('span.d-inline-block.float-sm-right')
            repo_info['today_stars'] = today_stars_element.text.strip() if today_stars_element else None
            repositories.append(repo_info)
        return repositories
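The crawler stores stars, forks, and today's stars as raw text. If numeric values are wanted, for sorting or filtering before analysis, a small helper could convert them. This is a sketch under the assumption that the scraped strings look like "12,345" or "1,234 stars today"; `parse_count` is a hypothetical helper, not part of MetaGPT.

```python
import re


def parse_count(text):
    """Return the first integer found in text such as '12,345' or '1,234 stars today'."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Remove thousands separators before converting
    return int(match.group().replace(",", ""))


# Example values are made up for illustration
repo = {"stars": "12,345", "forks": "678", "today_stars": "1,234 stars today"}
numeric = {key: parse_count(value) for key, value in repo.items()}
print(numeric)
```

Returning None for missing or unparseable text mirrors how the crawler already uses None for absent fields.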
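Trend analysis
This is mainly a matter of writing the prompt, asking the LLM to analyze the list from the following angles:
1. The overall trend of today's list, e.g. which programming languages are popular, which projects are the hottest, and which domains they concentrate in
2. A categorization of the repositories on the list
3. Which repositories are worth following further, and why
This is also written into githubTrending.py.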
from typing import Any
from metagpt.actions.action import Action
TRENDING_ANALYSIS_PROMPT = """# Requirements
You are a GitHub Trending Analyst, aiming to provide users with insightful and personalized recommendations based on the latest
GitHub Trends. Based on the context, fill in the following missing information, generate engaging and informative titles,
ensuring users discover repositories aligned with their interests.
# The title about Today's GitHub Trending
## Today's Trends: Uncover the Hottest GitHub Projects Today! Explore the trending programming languages and discover key domains capturing developers' attention. From ** to **, witness the top projects like never before.
## The Trends Categories: Dive into Today's GitHub Trending Domains! Explore featured projects in domains such as ** and **. Get a quick overview of each project, including programming languages, stars, and more.
## Highlights of the List: Spotlight noteworthy projects on GitHub Trending, including new tools, innovative projects, and rapidly gaining popularity, focusing on delivering distinctive and attention-grabbing content for users.
---
# Format Example
```
# [Title]
## Today's Trends
Today, ** and ** continue to dominate as the most popular programming languages. Key areas of interest include **, ** and **.
The top popular projects are Project1 and Project2.
## The Trends Categories
1. Generative AI
- [Project1](https://github/xx/project1): [detail of the project, such as star total and today, language, ...]
- [Project2](https://github/xx/project2): ...
...
## Highlights of the List
1. [Project1](https://github/xx/project1): [provide specific reasons why this project is recommended].
...
```
---
# Github Trending
{trending}
"""
class AnalysisOSSTrending(Action):
    async def run(
        self,
        trending: Any
    ):
        return await self._aask(TRENDING_ANALYSIS_PROMPT.format(trending=trending))
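The prompt is filled with `str.format`, so whatever is passed as `trending` is converted with `str()`. The sketch below shows what the LLM would see for a list of dicts, next to a JSON rendering as one possible alternative. `PROMPT_TEMPLATE` is a shortened hypothetical template and the repository entry is made up.

```python
import json

# Shortened, hypothetical template; the real TRENDING_ANALYSIS_PROMPT is much longer
PROMPT_TEMPLATE = """# Github Trending
{trending}
"""

# Made-up entry shaped like the crawler's output
repos = [
    {"name": "owner/project", "language": "Python", "stars": "1,234"},
]

# str.format() applies str() to the list, which yields a Python repr
as_repr = PROMPT_TEMPLATE.format(trending=repos)
# Serializing to JSON first is one alternative layout
as_json = PROMPT_TEMPLATE.format(
    trending=json.dumps(repos, indent=2, ensure_ascii=False)
)
print(as_json)
```

Braces inside the inserted value are harmless. Only literal braces in the template itself would need to be doubled.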
Implementing the trigger
aiocron makes this straightforward.
import time
from aiocron import crontab
from typing import Optional
from pytz import BaseTzInfo
from pydantic import BaseModel, Field
from metagpt.schema import Message

class GithubTrendingCronTrigger:
    def __init__(
        self,
        spec: str,  # cron expression that sets when the GitHub Trending page is visited
        tz: Optional[BaseTzInfo] = None,  # optional time zone for the cron expression
        url: str = "https://github.com/trending",
    ) -> None:
        self.crontab = crontab(spec, tz=tz)
        self.url = url

    def __aiter__(self):  # makes the object an asynchronous iterator
        return self

    async def __anext__(self):
        await self.crontab.next()
        return Message(content=self.url)
To update every day at 8:00, use GithubTrendingCronTrigger("0 8 * * *", tz=beijing_tz):
from pytz import timezone
beijing_tz = timezone('Asia/Shanghai')  # time zone for Beijing time
cron_trigger = GithubTrendingCronTrigger("0 8 * * *", tz=beijing_tz)
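To see what the expression "0 8 * * *" means in practice, the sketch below computes the next 08:00 in Beijing time by hand with the standard library. It is only an illustration of the schedule, not how aiocron works internally. `next_daily_run` is a hypothetical helper, and a fixed UTC+8 offset stands in for Asia/Shanghai.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset used as a stand-in for Asia/Shanghai
BEIJING = timezone(timedelta(hours=8))


def next_daily_run(now: datetime, hour: int = 8) -> datetime:
    """Return the next moment the clock reads hour:00 in Beijing time."""
    local = now.astimezone(BEIJING)
    candidate = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= local:
        # Today's slot has already passed, so use tomorrow's
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 1, 1, 30, tzinfo=timezone.utc)  # 09:30 in Beijing
print(next_daily_run(now))
```

Passing the time zone explicitly matters because the same cron expression fires at a different instant when it is interpreted in the server's local time.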
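If we only want to push when the list has been updated, how could that be done?
import requests
from typing import Optional
from aiocron import crontab
from dateutil.parser import parse
from pytz import BaseTzInfo
from metagpt.schema import Message

class GithubTrendingCronTrigger:
    def __init__(
        self,
        spec: str,
        tz: Optional[BaseTzInfo] = None,
        url: str = "https://github.com/trending",
    ) -> None:
        self.crontab = crontab(spec, tz=tz)
        self.url = url
        # New attribute: time of the most recent GitHub Trending page update
        self.last_updated = None

    def __aiter__(self):
        return self

    async def __anext__(self):
        # Keep waiting for cron ticks until the page reports a newer update;
        # returning None here would hand None to the subscribed role
        while True:
            await self.crontab.next()
            try:
                # Fetch the GitHub Trending page (note: requests blocks the event loop)
                response = requests.get(self.url)
                response.raise_for_status()
                # Parse the Last-Modified timestamp from the response headers
                last_modified = parse(response.headers['Last-Modified'])
            except (requests.exceptions.RequestException, KeyError, ValueError):
                # Request failed, or the header is missing or unparseable: wait for the next tick
                continue
            # Only emit a message when the page is newer than the last one seen
            if self.last_updated is None or last_modified > self.last_updated:
                self.last_updated = last_modified
                return Message(content=self.url)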
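The approach above relies on the server sending a Last-Modified header, which is not confirmed here for the Trending page. One alternative, sketched below under that caveat, is to fingerprint the crawled repository names and push only when the fingerprint changes. `ChangeDetector` is a hypothetical helper and the repository names are made up.

```python
import hashlib


class ChangeDetector:
    """Remember a fingerprint of the last seen list and report when it changes."""

    def __init__(self):
        self.last_fingerprint = None

    def has_changed(self, repo_names):
        # Order is part of the fingerprint, so a reshuffled ranking counts as a change
        joined = "\n".join(repo_names)
        fingerprint = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        changed = fingerprint != self.last_fingerprint
        self.last_fingerprint = fingerprint
        return changed


detector = ChangeDetector()
first = detector.has_changed(["a/one", "b/two"])
second = detector.has_changed(["a/one", "b/two"])
third = detector.has_changed(["b/two", "a/one"])
print(first, second, third)
```

The first call always reports a change because nothing has been seen yet, which matches the `last_updated is None` case in the trigger.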
If using crontab directly is inconvenient for debugging, asyncio.sleep() can simulate the timer instead: await asyncio.sleep(self.interval) pauses for self.interval seconds, mimicking a timer.
import asyncio
import requests
from dateutil.parser import parse
from metagpt.schema import Message

class GithubTrendingCronTrigger:
    def __init__(
        self,
        interval: int = 60,  # changed to an interval between checks, in seconds
        url: str = "https://github.com/trending",
    ) -> None:
        self.interval = interval
        self.url = url
        self.last_updated = None

    async def __aiter__(self):
        while True:
            try:
                response = requests.get(self.url)
                response.raise_for_status()
                last_modified = parse(response.headers['Last-Modified'])
                if self.last_updated is None or last_modified > self.last_updated:
                    self.last_updated = last_modified
                    yield Message(content=self.url)
            except (requests.exceptions.RequestException, KeyError, ValueError):
                # Request failed, or the header is missing or unparseable: try again later
                pass
            await asyncio.sleep(self.interval)
Message push is provided through a third-party official account, e.g. WxPusher.
Below is an asynchronous client implemented according to its developer documentation.
import os
from typing import Optional
import aiohttp

class WxPusherClient:
    def __init__(self, token: Optional[str] = None, base_url: str = "http://wxpusher.zjiecode.com"):
        self.base_url = base_url
        self.token = token or os.environ["WXPUSHER_TOKEN"]

    async def send_message(
        self,
        content,
        summary: Optional[str] = None,
        content_type: int = 1,
        topic_ids: Optional[list[int]] = None,
        uids: Optional[list[str]] = None,  # UIDs are strings, as in the env var fallback
        verify: bool = False,
        url: Optional[str] = None,
    ):
        payload = {
            "appToken": self.token,
            "content": content,
            "summary": summary,
            "contentType": content_type,
            "topicIds": topic_ids or [],
            "uids": uids or os.environ["WXPUSHER_UIDS"].split(","),
            "verifyPay": verify,
            "url": url,
        }
        url = f"{self.base_url}/api/send/message"
        return await self._request("POST", url, json=payload)

    async def _request(self, method, url, **kwargs):
        async with aiohttp.ClientSession() as session:
            async with session.request(method, url, **kwargs) as response:
                response.raise_for_status()
                return await response.json()
Finally, implement the callback:
async def wxpusher_callback(msg: Message):
    client = WxPusherClient()
    await client.send_message(msg.content, content_type=3)
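WXPUSHER_TOKEN: the api_token obtained from WxPusher.
WXPUSHER_UIDS: the user UIDs, available under "User Management -> User List" on the application management page. To send to multiple users, separate the UIDs with commas.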
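Because WXPUSHER_UIDS is a comma-separated string, stray spaces or a trailing comma would otherwise end up in the UID list. The sketch below shows one tolerant way to split it. `resolve_uids` is a hypothetical helper, not part of WxPusher or MetaGPT, and the UID values are placeholders.

```python
import os


def resolve_uids(uids=None, env_name="WXPUSHER_UIDS"):
    """Return explicit UIDs if given, else split the comma-separated env var."""
    if uids:
        return list(uids)
    raw = os.environ.get(env_name, "")
    # Strip whitespace and drop empty entries such as the one after a trailing comma
    return [part.strip() for part in raw.split(",") if part.strip()]


# Placeholder UIDs for illustration only
os.environ["WXPUSHER_UIDS"] = "UID_aaa, UID_bbb,"
print(resolve_uids())
```

An empty result is a useful signal that the variable was never set, which is easier to diagnose than a failed push.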
Create a .env file in the project root to hold the environment variables, then load it in main.py:
from dotenv import load_dotenv
# Load the .env file
load_dotenv()
Complete code: main.py
import asyncio
import os
from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, Optional
import aiohttp
import discord
from aiocron import crontab
from bs4 import BeautifulSoup
from pydantic import BaseModel, Field
from pytz import BaseTzInfo
from metagpt.actions.action import Action
from metagpt.config import CONFIG
from metagpt.logs import logger
from metagpt.roles import Role
from metagpt.schema import Message
# fix SubscriptionRunner not fully defined
from metagpt.environment import Environment as _ # noqa: F401
# Subscription module. It can be imported with "from metagpt.subscription import SubscriptionRunner"; the code is pasted here for reference
class SubscriptionRunner(BaseModel):
    """A simple wrapper to manage subscription tasks for different roles using asyncio.
    Example:
        >>> import asyncio
        >>> from metagpt.subscription import SubscriptionRunner
        >>> from metagpt.roles import Searcher
        >>> from metagpt.schema import Message
        >>> async def trigger():
        ...     while True:
        ...         yield Message("the latest news about OpenAI")
        ...         await asyncio.sleep(3600 * 24)
        >>> async def callback(msg: Message):
        ...     print(msg.content)
        >>> async def main():
        ...     pb = SubscriptionRunner()
        ...     await pb.subscribe(Searcher(), trigger(), callback)
        ...     await pb.run()
        >>> asyncio.run(main())
    """

    tasks: Dict[Role, asyncio.Task] = Field(default_factory=dict)

    class Config:
        arbitrary_types_allowed = True

    async def subscribe(
        self,
        role: Role,
        trigger: AsyncGenerator[Message, None],
        callback: Callable[
            [
                Message,
            ],
            Awaitable[None],
        ],
    ):
        """Subscribes a role to a trigger and sets up a callback to be called with the role's response.
        Args:
            role: The role to subscribe.
            trigger: An asynchronous generator that yields Messages to be processed by the role.
            callback: An asynchronous function to be called with the response from the role.
        """
        loop = asyncio.get_running_loop()

        async def _start_role():
            async for msg in trigger:
                resp = await role.run(msg)
                await callback(resp)

        self.tasks[role] = loop.create_task(_start_role(), name=f"Subscription-{role}")

    async def unsubscribe(self, role: Role):
        """Unsubscribes a role from its trigger and cancels the associated task.
        Args:
            role: The role to unsubscribe.
        """
        task = self.tasks.pop(role)
        task.cancel()

    async def run(self, raise_exception: bool = True):
        """Runs all subscribed tasks and handles their completion or exception.
        Args:
            raise_exception: _description_. Defaults to True.
        Raises:
            task.exception: _description_
        """
        while True:
            for role, task in self.tasks.items():
                if task.done():
                    if task.exception():
                        if raise_exception:
                            raise task.exception()
                        logger.opt(exception=task.exception()).error(
                            f"Task {task.get_name()} run error"
                        )
                    else:
                        logger.warning(
                            f"Task {task.get_name()} has completed. "
                            "If this is unexpected behavior, please check the trigger function."
                        )
                    self.tasks.pop(role)
                    break
            else:
                await asyncio.sleep(1)
# Action implementations
TRENDING_ANALYSIS_PROMPT = """# Requirements
You are a GitHub Trending Analyst, aiming to provide users with insightful and personalized recommendations based on the latest
GitHub Trends. Based on the context, fill in the following missing information, generate engaging and informative titles,
ensuring users discover repositories aligned with their interests.
# The title about Today's GitHub Trending
## Today's Trends: Uncover the Hottest GitHub Projects Today! Explore the trending programming languages and discover key domains capturing developers' attention. From ** to **, witness the top projects like never before.
## The Trends Categories: Dive into Today's GitHub Trending Domains! Explore featured projects in domains such as ** and **. Get a quick overview of each project, including programming languages, stars, and more.
## Highlights of the List: Spotlight noteworthy projects on GitHub Trending, including new tools, innovative projects, and rapidly gaining popularity, focusing on delivering distinctive and attention-grabbing content for users.
---
# Format Example
```
# [Title]
## Today's Trends
Today, ** and ** continue to dominate as the most popular programming languages. Key areas of interest include **, ** and **.
The top popular projects are Project1 and Project2.
## The Trends Categories
1. Generative AI
- [Project1](https://github/xx/project1): [detail of the project, such as star total and today, language, ...]
- [Project2](https://github/xx/project2): ...
...
## Highlights of the List
1. [Project1](https://github/xx/project1): [provide specific reasons why this project is recommended].
...
```
---
# Github Trending
{trending}
"""
class CrawlOSSTrending(Action):
    async def run(self, url: str = "https://github.com/trending"):
        async with aiohttp.ClientSession() as client:
            async with client.get(url, proxy=CONFIG.global_proxy) as response:
                response.raise_for_status()
                html = await response.text()
        soup = BeautifulSoup(html, "html.parser")
        repositories = []
        for article in soup.select("article.Box-row"):
            repo_info = {}
            repo_info["name"] = (
                article.select_one("h2 a")
                .text.strip()
                .replace("\n", "")
                .replace(" ", "")
            )
            repo_info["url"] = (
                "https://github.com" + article.select_one("h2 a")["href"].strip()
            )
            # Description
            description_element = article.select_one("p")
            repo_info["description"] = (
                description_element.text.strip() if description_element else None
            )
            # Language
            language_element = article.select_one(
                'span[itemprop="programmingLanguage"]'
            )
            repo_info["language"] = (
                language_element.text.strip() if language_element else None
            )
            # Stars and Forks
            stars_element = article.select("a.Link--muted")[0]
            forks_element = article.select("a.Link--muted")[1]
            repo_info["stars"] = stars_element.text.strip()
            repo_info["forks"] = forks_element.text.strip()
            # Today's Stars
            today_stars_element = article.select_one(
                "span.d-inline-block.float-sm-right"
            )
            repo_info["today_stars"] = (
                today_stars_element.text.strip() if today_stars_element else None
            )
            repositories.append(repo_info)
        return repositories


class AnalysisOSSTrending(Action):
    async def run(self, trending: Any):
        return await self._aask(TRENDING_ANALYSIS_PROMPT.format(trending=trending))
# Role implementation
class OssWatcher(Role):
    def __init__(
        self,
        name="Codey",
        profile="OssWatcher",
        goal="Generate an insightful GitHub Trending analysis report.",
        constraints="Only analyze based on the provided GitHub Trending data.",
    ):
        super().__init__(name=name, profile=profile, goal=goal, constraints=constraints)
        self._init_actions([CrawlOSSTrending, AnalysisOSSTrending])
        self._set_react_mode(react_mode="by_order")

    async def _act(self) -> Message:
        logger.info(f"{self._setting}: ready to {self.rc.todo}")
        # By choosing the Action by order under the hood
        # todo will be first CrawlOSSTrending() then AnalysisOSSTrending()
        todo = self.rc.todo
        msg = self.get_memories(k=1)[0]  # find the most k recent messages
        result = await todo.run(msg.content)
        msg = Message(content=str(result), role=self.profile, cause_by=type(todo))
        self.rc.memory.add(msg)
        return msg
# Trigger
class GithubTrendingCronTrigger:
    def __init__(
        self,
        spec: str,
        tz: Optional[BaseTzInfo] = None,
        url: str = "https://github.com/trending",
    ) -> None:
        self.crontab = crontab(spec, tz=tz)
        self.url = url

    def __aiter__(self):
        return self

    async def __anext__(self):
        await self.crontab.next()
        return Message(content=self.url)


# callback
async def discord_callback(msg: Message):
    intents = discord.Intents.default()
    intents.message_content = True
    client = discord.Client(intents=intents, proxy=CONFIG.global_proxy)
    token = os.environ["DISCORD_TOKEN"]
    channel_id = int(os.environ["DISCORD_CHANNEL_ID"])
    async with client:
        await client.login(token)
        channel = await client.fetch_channel(channel_id)
        # Send the report in chunks, starting a new chunk at each heading
        lines = []
        for i in msg.content.splitlines():
            if i.startswith(("# ", "## ", "### ")):
                if lines:
                    await channel.send("\n".join(lines))
                    lines = []
            lines.append(i)
        if lines:
            await channel.send("\n".join(lines))
class WxPusherClient:
    def __init__(
        self,
        token: Optional[str] = None,
        base_url: str = "http://wxpusher.zjiecode.com",
    ):
        self.base_url = base_url
        self.token = token or os.environ["WXPUSHER_TOKEN"]

    async def send_message(
        self,
        content,
        summary: Optional[str] = None,
        content_type: int = 1,
        topic_ids: Optional[list[int]] = None,
        uids: Optional[list[str]] = None,  # UIDs are strings, as in the env var fallback
        verify: bool = False,
        url: Optional[str] = None,
    ):
        payload = {
            "appToken": self.token,
            "content": content,
            "summary": summary,
            "contentType": content_type,
            "topicIds": topic_ids or [],
            "uids": uids or os.environ["WXPUSHER_UIDS"].split(","),
            "verifyPay": verify,
            "url": url,
        }
        url = f"{self.base_url}/api/send/message"
        return await self._request("POST", url, json=payload)

    async def _request(self, method, url, **kwargs):
        async with aiohttp.ClientSession() as session:
            async with session.request(method, url, **kwargs) as response:
                response.raise_for_status()
                return await response.json()


async def wxpusher_callback(msg: Message):
    client = WxPusherClient()
    await client.send_message(msg.content, content_type=3)
# Entry point
async def main(spec: str = "0 9 * * *", discord: bool = True, wxpusher: bool = True):
    callbacks = []
    if discord:
        callbacks.append(discord_callback)
    if wxpusher:
        callbacks.append(wxpusher_callback)
    if not callbacks:
        # Fall back to printing when no push channel is enabled
        async def _print(msg: Message):
            print(msg.content)
        callbacks.append(_print)

    async def callback(msg):
        await asyncio.gather(*(call(msg) for call in callbacks))

    runner = SubscriptionRunner()
    await runner.subscribe(OssWatcher(), GithubTrendingCronTrigger(spec), callback)
    await runner.run()


if __name__ == "__main__":
    import fire
    fire.Fire(main)
Before running, configure the environment variables used by the callbacks in the .env file.
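A missing variable otherwise only surfaces as a KeyError when the first push is attempted, which with a daily schedule may be hours after startup. A small pre-flight check, sketched below, could report the gaps up front. The variable names come from the callbacks in main.py, while `REQUIRED` and `missing_env` are hypothetical names introduced for illustration.

```python
import os

# Variable names read by the callbacks in main.py
REQUIRED = {
    "discord": ["DISCORD_TOKEN", "DISCORD_CHANNEL_ID"],
    "wxpusher": ["WXPUSHER_TOKEN", "WXPUSHER_UIDS"],
}


def missing_env(enabled, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [
        name
        for channel in enabled
        for name in REQUIRED[channel]
        if not environ.get(name)
    ]


# Placeholder environment for illustration only
example_env = {"WXPUSHER_TOKEN": "AT_xxx"}
print(missing_env(["wxpusher"], example_env))
```

Calling this after load_dotenv() and before subscribing would let the program exit early with a clear message.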