MetaGPT in Practice: Building GitHubTrendingWatcher on the Subscription Agent

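MetaGPT ships with several ready-made agent wrappers that can be used directly. One of them is the subscription agent, SubscriptionRunner: when an event you care about occurs, it fetches the information, processes it, and then sends the result through a channel such as email, WeChat, or Discord.

Once you understand the implementation approach below, you can also build your own agent wrappers.

Here is the source code of the SubscriptionRunner class: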

class SubscriptionRunner(BaseModel):
    """A simple wrapper to manage subscription tasks for different roles using asyncio.

    Example:
        >>> import asyncio
        >>> from metagpt.subscription import SubscriptionRunner
        >>> from metagpt.roles import Searcher
        >>> from metagpt.schema import Message

        >>> async def trigger():
        ...     while True:
        ...         yield Message(content="the latest news about OpenAI")
        ...         await asyncio.sleep(3600 * 24)

        >>> async def callback(msg: Message):
        ...     print(msg.content)

        >>> async def main():
        ...     pb = SubscriptionRunner()
        ...     await pb.subscribe(Searcher(), trigger(), callback)
        ...     await pb.run()

        >>> asyncio.run(main())
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    tasks: dict[Role, asyncio.Task] = Field(default_factory=dict)

    async def subscribe(
        self,
        role: Role,
        trigger: AsyncGenerator[Message, None],
        callback: Callable[
            [
                Message,
            ],
            Awaitable[None],
        ],
    ):
        """Subscribes a role to a trigger and sets up a callback to be called with the role's response.

        Args:
            role: The role to subscribe.
            trigger: An asynchronous generator that yields Messages to be processed by the role.
            callback: An asynchronous function to be called with the response from the role.
        """
        loop = asyncio.get_running_loop()

        async def _start_role():
            async for msg in trigger:
                resp = await role.run(msg)
                await callback(resp)

        self.tasks[role] = loop.create_task(_start_role(), name=f"Subscription-{role}")

    async def unsubscribe(self, role: Role):
        """Unsubscribes a role from its trigger and cancels the associated task.

        Args:
            role: The role to unsubscribe.
        """
        task = self.tasks.pop(role)
        task.cancel()

    async def run(self, raise_exception: bool = True):
        """Runs all subscribed tasks and handles their completion or exception.

        Args:
            raise_exception: _description_. Defaults to True.

        Raises:
            task.exception: _description_
        """
        while True:
            for role, task in self.tasks.items():
                if task.done():
                    if task.exception():
                        if raise_exception:
                            raise task.exception()
                        logger.opt(exception=task.exception()).error(f"Task {task.get_name()} run error")
                    else:
                        logger.warning(
                            f"Task {task.get_name()} has completed. "
                            "If this is unexpected behavior, please check the trigger function."
                        )
                    self.tasks.pop(role)
                    break
            else:
                await asyncio.sleep(1)

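As the code shows, subscribing a role with SubscriptionRunner takes three arguments:

    role: Role,
    trigger: AsyncGenerator[Message, None],
    callback: Callable[[Message], Awaitable[None]],

role: a Role object, the concrete implementation of the subscription agent.

trigger: the condition that starts a run of the agent, either a timer or some event.

callback: handles the message produced by the role's run, for example by sending it to WeChat or Discord.

These three arguments have to be defined according to what the subscription agent is meant to do.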
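
To make the trigger, role, callback flow concrete, here is a minimal sketch that uses only asyncio and no MetaGPT classes. The names ticker, shout, and subscribe_once are hypothetical stand-ins (for the trigger, for role.run, and for the inner _start_role loop); plain strings take the place of Message objects.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable


async def ticker(count: int, interval: float = 0.01) -> AsyncGenerator[str, None]:
    """Hypothetical trigger: yields `count` messages, pausing between them."""
    for i in range(count):
        yield f"tick {i}"
        await asyncio.sleep(interval)


async def shout(msg: str) -> str:
    """Hypothetical stand-in for role.run(): turns a message into a response."""
    return msg.upper()


async def subscribe_once(
    role_run: Callable[[str], Awaitable[str]],
    trigger: AsyncGenerator[str, None],
    callback: Callable[[str], Awaitable[None]],
) -> None:
    """Same shape as the inner _start_role loop: trigger -> role -> callback."""
    async for msg in trigger:
        resp = await role_run(msg)
        await callback(resp)


def run_demo() -> list[str]:
    """Run the loop with a callback that simply collects the responses."""
    received: list[str] = []

    async def collect(resp: str) -> None:
        received.append(resp)

    asyncio.run(subscribe_once(shout, ticker(3), collect))
    return received


if __name__ == "__main__":
    print(run_demo())
```

Unlike this sketch, the real SubscriptionRunner wraps the loop in an asyncio task per role, so several subscriptions can run side by side.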

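The goal here is to reproduce an agent that fetches the day's trending GitHub projects on a daily schedule and sends the list to WeChat.

First, break the requirement down into role, trigger, and callback.

role: GitHubTrendingWatcher

It has two actions:

CrawlGitHubTrending, which crawls the GitHub Trending page.

AnalyzeGitHubTrending, which extracts the information we care about from the GitHub Trending page.

callback: wxpusher_callback

Sends the result produced by GitHubTrendingWatcher to WeChat. You can use the message-push feature offered by third-party official accounts such as Server酱, wxpusher, or Pushplus. This code uses wxpusher: it is open source and well documented. See the developer docs at https://wxpusher.zjiecode.com/docs/#/.

trigger: GithubTrendingCronTrigger

GitHub Trending is updated at roughly 10:00 AM UTC each day, so the timer is set to 9:00 AM: every day at 9:00 AM it triggers one SubscriptionRunner cycle, which runs GitHubTrendingWatcher and then wxpusher_callback.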
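
A cron spec is evaluated in some time zone, so it can help to convert the 10:00 AM UTC update time into local time before picking an hour. The sketch below does that arithmetic with the standard library. It assumes Beijing time can be modelled as a fixed UTC+8 offset; the helper names utc_hour_to_beijing and cron_spec_for are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Beijing time is modelled as a fixed UTC+8 offset (no DST).
BEIJING = timezone(timedelta(hours=8), name="UTC+8")


def utc_hour_to_beijing(hour_utc: int, minute: int = 0) -> str:
    """Return the Beijing wall-clock time (HH:MM) for a given UTC time of day."""
    # The date is arbitrary; only the time of day matters here.
    moment = datetime(2024, 5, 19, hour_utc, minute, tzinfo=timezone.utc)
    return moment.astimezone(BEIJING).strftime("%H:%M")


def cron_spec_for(hour: int, minute: int = 0) -> str:
    """Build a 5-field daily cron spec: minute hour day-of-month month day-of-week."""
    return f"{minute} {hour} * * *"


if __name__ == "__main__":
    print(utc_hour_to_beijing(10))  # 10:00 UTC expressed in Beijing time
    print(cron_spec_for(9))         # a daily 9:00 spec
```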

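Implement the CrawlGitHubTrending action. (You can write the crawler code below with help from ChatGPT. You could also hand the fetched page source straight to an LLM and let it extract the content, but that needs a fairly elaborate prompt and a dependable LLM, and because of LLM hallucination the result is still not 100% controllable.)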

import aiohttp
from bs4 import BeautifulSoup
from metagpt.actions.action import Action

class CrawlGitHubTrending(Action):

    async def run(self, url: str = "https://github.com/trending"):
        async with aiohttp.ClientSession() as client:
            # async with client.get(url, proxy=CONFIG.global_proxy) as response:
            async with client.get(url) as response:
                response.raise_for_status()
                html = await response.text()

        soup = BeautifulSoup(html, 'html.parser')

        repositories = []

        for article in soup.select('article.Box-row'):
            repo_info = {}

            repo_info['name'] = article.select_one('h2 a').text.strip().replace("\n", "").replace(" ", "")
            repo_info['url'] = "https://github.com" + article.select_one('h2 a')['href'].strip()

            # Description
            description_element = article.select_one('p')
            repo_info['description'] = description_element.text.strip() if description_element else None

            # Language
            language_element = article.select_one('span[itemprop="programmingLanguage"]')
            repo_info['language'] = language_element.text.strip() if language_element else None

            # Stars and Forks
            stars_element = article.select('a.Link--muted')[0]
            forks_element = article.select('a.Link--muted')[1]
            repo_info['stars'] = stars_element.text.strip()
            repo_info['forks'] = forks_element.text.strip()

            # Today's Stars
            today_stars_element = article.select_one('span.d-inline-block.float-sm-right')
            repo_info['today_stars'] = today_stars_element.text.strip() if today_stars_element else None

            repositories.append(repo_info)

        return repositories
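
The crawler stores stars, forks, and today_stars as raw page text. If numeric values are needed later (for sorting, say), a small post-processing step can convert them. This is only a sketch: it assumes the text looks like "1,024" or "4,583 stars today", and parse_count and normalize_repo are hypothetical helpers that are not part of the original code.

```python
import re
from typing import Optional

# Matches a run of digits that may contain thousands separators, e.g. "4,583".
_NUMBER = re.compile(r"\d[\d,]*")


def parse_count(text: Optional[str]) -> Optional[int]:
    """Extract the first integer from text such as '4,583 stars today'."""
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


def normalize_repo(repo: dict) -> dict:
    """Return a copy of a crawled record with numeric star and fork counts."""
    cleaned = dict(repo)
    for key in ("stars", "forks", "today_stars"):
        cleaned[key] = parse_count(repo.get(key))
    return cleaned
```

Since the analysis step is done by an LLM, the raw strings work too; converting is mainly useful if you want to filter or rank repositories in code before building the prompt.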

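Implement the AnalyzeGitHubTrending action. (This part is content generation, so it can only be done with an LLM.) Note that MetaGPT does not separate prompts from code, which seriously hurts readability!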

TRENDING_ANALYSIS_PROMPT = """# Requirements
You are a GitHub Trending Analyst, aiming to provide users with insightful and personalized recommendations based on the latest
GitHub Trends. Based on the context, fill in the following missing information, generate engaging and informative titles, 
ensuring users discover repositories aligned with their interests.

# The title about Today's GitHub Trending
## Today's Trends: Uncover the Hottest GitHub Projects Today! Explore the trending programming languages and discover key domains capturing developers' attention. From ** to **, witness the top projects like never before.
## The Trends Categories: Dive into Today's GitHub Trending Domains! Explore featured projects in domains such as ** and **. Get a quick overview of each project, including programming languages, stars, and more.
## Highlights of the List: Spotlight noteworthy projects on GitHub Trending, including new tools, innovative projects, and rapidly gaining popularity, focusing on delivering distinctive and attention-grabbing content for users.
---
# Format Example

```
# [Title]

## Today's Trends
Today, ** and ** continue to dominate as the most popular programming languages. Key areas of interest include **, ** and **.
The top popular projects are Project1 and Project2.

## The Trends Categories
1. Generative AI
    - [Project1](https://github/xx/project1): [detail of the project, such as star total and today, language, ...]
    - [Project2](https://github/xx/project2): ...
...

## Highlights of the List
1. [Project1](https://github/xx/project1): [provide specific reasons why this project is recommended].
...
```

---
# Github Trending
{trending}
"""

from typing import Any

class AnalyzeGitHubTrending(Action):
    async def run(
        self,
        trending: Any
    ):
        return await self._aask(TRENDING_ANALYSIS_PROMPT.format(trending=trending))

Implement the GitHubTrendingWatcher role:

from metagpt.logs import logger
from metagpt.roles import Role
from metagpt.schema import Message

class GitHubTrendingWatcher(Role):
    name: str = "Codey"
    profile: str = "GitHubTrendingWatcher"
    goal: str = "Generate an insightful GitHub Trending analysis report."
    constraints: str = "Only analyze based on the provided GitHub Trending data."
    def __init__(
        self,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.set_actions([CrawlGitHubTrending, AnalyzeGitHubTrending])
        self._set_react_mode(react_mode="by_order")

    async def _act(self) -> Message:
        logger.info(f"{self._setting}: ready to {self.rc.todo}")
        # By choosing the Action by order under the hood
        # todo will be first CrawlGitHubTrending() then AnalyzeGitHubTrending()
        todo = self.rc.todo

        msg = self.get_memories(k=1)[0] # find the most k recent messages
        result = await todo.run(msg.content)

        msg = Message(content=str(result), role=self.profile, cause_by=type(todo))
        self.rc.memory.add(msg)
        return msg
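
The _act method above reads the most recent memory entry, passes its content to the current action, and stores the result back in memory. With actions run in order, this chains the trigger's URL into the crawler and the crawler's output into the analyzer. The following sketch imitates that data flow with plain asyncio; crawl, analyze, and run_by_order are hypothetical stand-ins and make no MetaGPT calls.

```python
import asyncio
from typing import Awaitable, Callable

Step = Callable[[str], Awaitable[str]]


async def crawl(url: str) -> str:
    """Hypothetical first action: pretends to fetch data for the URL."""
    return f"data from {url}"


async def analyze(data: str) -> str:
    """Hypothetical second action: pretends to summarise the crawled data."""
    return f"report on [{data}]"


async def run_by_order(steps: list[Step], first_input: str) -> list[str]:
    """Run steps in order; each step reads the newest memory entry."""
    memory = [first_input]
    for step in steps:
        latest = memory[-1]  # plays the part of get_memories(k=1)[0]
        memory.append(await step(latest))
    return memory


if __name__ == "__main__":
    for entry in asyncio.run(run_by_order([crawl, analyze], "https://github.com/trending")):
        print(entry)
```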

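Implement wxpusher_callback.

First, build an asynchronous client on top of wxpusher.

wxpusher does have a Python client, but it is synchronous. Based on the API documentation, though, an asynchronous client is quick and simple to write.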

import os
from typing import Optional

import aiohttp

class WxPusherClient:
    def __init__(self, token: Optional[str] = None, base_url: str = "http://wxpusher.zjiecode.com"):
        self.base_url = base_url
        self.token = token or os.environ["WXPUSHER_TOKEN"]

    async def send_message(
        self,
        content,
        summary: Optional[str] = None,
        content_type: int = 1,
        topic_ids: Optional[list[int]] = None,
        uids: Optional[list[str]] = None,  # UIDs are strings such as "UID_xxxxxxxx"
        verify: bool = False,
        url: Optional[str] = None,
    ):
        payload = {
            "appToken": self.token,
            "content": content,
            "summary": summary,
            "contentType": content_type,
            "topicIds": topic_ids or [],
            "uids": uids or os.environ["WXPUSHER_UIDS"].split(","),
            "verifyPay": verify,
            "url": url,
        }
        url = f"{self.base_url}/api/send/message"
        return await self._request("POST", url, json=payload)

    async def _request(self, method, url, **kwargs):
        async with aiohttp.ClientSession() as session:
            async with session.request(method, url, **kwargs) as response:
                response.raise_for_status()
                return await response.json()

Then implement the callback:

async def wxpusher_callback(msg: Message):
    client = WxPusherClient()
    await client.send_message(msg.content, content_type=3)

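Next, obtain the two push parameters, WXPUSHER_TOKEN and WXPUSHER_UIDS.

Follow the official documentation to get the appToken; this is WXPUSHER_TOKEN.

For WXPUSHER_UIDS, look up each user's UID under "User Management -> User List" on the application management page. To send to several users, separate their UIDs with commas.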
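
Because WxPusherClient reads both values straight from os.environ, a missing variable only shows up as a KeyError at send time. A small up-front check is one way to fail earlier with a clearer message. The sketch below is an optional addition, not part of the original code; load_wxpusher_config is a hypothetical helper that accepts any mapping so it can be exercised without touching the real environment.

```python
from typing import Mapping


def load_wxpusher_config(env: Mapping[str, str]) -> tuple[str, list[str]]:
    """Read the token and UID list from an environment-like mapping."""
    token = env.get("WXPUSHER_TOKEN", "").strip()
    if not token:
        raise ValueError("WXPUSHER_TOKEN is not set")
    # Split on commas and drop blanks so 'UID_a, UID_b,' still parses cleanly.
    raw_uids = env.get("WXPUSHER_UIDS", "")
    uids = [uid.strip() for uid in raw_uids.split(",") if uid.strip()]
    if not uids:
        raise ValueError("WXPUSHER_UIDS is not set")
    return token, uids
```

In a real run you would pass os.environ, for example `token, uids = load_wxpusher_config(os.environ)`.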

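Implement GithubTrendingCronTrigger

Using crontab for scheduled triggering is a very common approach, and Python has an asynchronous cron tool, aiocron. With aiocron we can define scheduled tasks directly in cron syntax.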

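import time
from typing import Optional

from aiocron import crontab
from metagpt.schema import Message
from pydantic import BaseModel, Field
from pytz import BaseTzInfo

class OssInfo(BaseModel):
    url: str
    timestamp: float = Field(default_factory=time.time)

class GithubTrendingCronTrigger:

    def __init__(self, spec: str, tz: Optional[BaseTzInfo] = None, url: str = "https://github.com/trending") -> None: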
        self.crontab = crontab(spec, tz=tz)
        self.url = url

    def __aiter__(self):
        return self

    async def __anext__(self):
        await self.crontab.next()
        return Message(content=self.url, instruct_content=OssInfo(url=self.url))

Cron syntax makes the schedule very flexible to configure.
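
The trigger does not have to be an async generator function: any object with __aiter__ and __anext__ works in an `async for` loop, which is what the class above relies on. The sketch below shows the same protocol with a fixed sleep in place of aiocron. IntervalTrigger and collect are hypothetical names, and the limit parameter exists only so the example terminates.

```python
import asyncio


class IntervalTrigger:
    """Hypothetical trigger: yields `payload` every `seconds`, `limit` times."""

    def __init__(self, payload: str, seconds: float, limit: int) -> None:
        self.payload = payload
        self.seconds = seconds
        self.remaining = limit

    def __aiter__(self) -> "IntervalTrigger":
        return self

    async def __anext__(self) -> str:
        if self.remaining <= 0:
            raise StopAsyncIteration  # ends the `async for` loop
        self.remaining -= 1
        await asyncio.sleep(self.seconds)  # stands in for `await crontab.next()`
        return self.payload


async def collect(trigger: IntervalTrigger) -> list[str]:
    """Drain the trigger the same way an `async for` loop would."""
    return [item async for item in trigger]


if __name__ == "__main__":
    print(asyncio.run(collect(IntervalTrigger("https://github.com/trending", 0.01, 3))))
```

The cron-based trigger never raises StopAsyncIteration, so its subscription keeps running until the task is cancelled.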

Here are some examples of calling the GithubTrendingCronTrigger class:

# Create a GithubTrendingCronTrigger that fires every day at 10:00 AM
# (no tz is passed, so aiocron falls back to the machine's local time zone)
cron_trigger = GithubTrendingCronTrigger("0 10 * * *")

# Create a GithubTrendingCronTrigger that fires every 10 minutes
cron_trigger = GithubTrendingCronTrigger("*/10 * * * *")

from pytz import timezone
# Use the Asia/Shanghai time zone
shanghai_tz = timezone('Asia/Shanghai')
# Create a GithubTrendingCronTrigger that fires every day at 10:00 AM Beijing time
cron_trigger = GithubTrendingCronTrigger("0 10 * * *", tz=shanghai_tz)

# Create a GithubTrendingCronTrigger with a custom URL
custom_url = "https://github.com/trending/python"
cron_trigger = GithubTrendingCronTrigger("0 10 * * *", url=custom_url)

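Cron expressions in detail:

A cron expression is a compact, powerful time expression for configuring scheduled tasks. It consists of 5 or 6 fields and is typically used by the cron daemon on Unix/Linux systems to schedule jobs. Each field represents a unit of time, and fields are separated by spaces. From left to right, the fields are:

1. **Minute (0 - 59)**: a specific number or a special character (for example, `*` means every minute).
2. **Hour (0 - 23)**: a specific number or a special character.
3. **Day of month (1 - 31)**: a specific number or a special character.
4. **Month (1 - 12 or JAN-DEC)**: a number or a month abbreviation.
5. **Day of week (0 - 7 or SUN-SAT)**: a number or a weekday abbreviation; 0 and 7 both mean Sunday.
6. **Year (optional)**: some cron dialects add a year field, but in most cases it is optional. Other 6-field dialects instead put a seconds field first.

### Special characters
- `*`: all possible values. In the minute field, `*` fires on every minute of every hour.
- `?`: used only in the day-of-month and day-of-week fields, meaning "no specific value". When one of the two is specified, the other can be `?`. This comes from Quartz-style cron and is not part of standard Unix cron.
- `-`: a range. In the hour field, `10-12` matches hours 10, 11, and 12; how often the job fires within those hours depends on the minute field.
- `/`: a step from a starting value. In the minute field, `0/30` (commonly written `*/30` in standard cron) fires every 30 minutes, at minute 0 and minute 30 of each hour.

### Examples
Here are some cron expressions and what they mean:

- `0 * * * *`: fires on the hour, every hour (00:00, 01:00, 02:00 ...).
- `0 9,18 * * *`: fires every day at 9 AM and 6 PM.
- `0 0/30 * * * *`: 6-field form with a leading seconds field; fires every 30 minutes, at minute 0 and minute 30 of each hour.
- `0 9-17 * * MON-FRI`: fires on the hour from 9 AM to 5 PM on weekdays (Monday to Friday).
- `0 8,13,18 * * 1-5`: fires at 8 AM, 1 PM, and 6 PM, Monday to Friday.
- `0 0 8,14 * * MON-FRI`: 6-field form with a leading seconds field; fires at 8 AM and 2 PM on weekdays.
- `0 0 9 ? * 1-5`: 6-field Quartz-style form; fires at 9:00 AM on days 1 to 5 of the week. The 5-field weekday version is `0 9 * * 1-5`. Quartz numbers days from 1 = Sunday, so `MON-FRI` is the unambiguous choice there.

Note that cron expressions may be parsed differently by different cron tools or libraries, especially around edge cases and time zones. In practice, check the documentation of the specific tool or library to make sure the expression does what you expect.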
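
To see how `*`, `-`, `,`, and `/` combine, here is a small matcher for numeric 5-field expressions. It is a teaching sketch, not a replacement for aiocron or any real cron parser: it ignores names such as MON and the Quartz-only `?`, and it requires every field to match, whereas classic cron treats day-of-month and day-of-week as alternatives when both are restricted. The function names expand_field and matches are made up here.

```python
def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one numeric cron field into the set of values it matches.

    Handles `*`, single values, `a-b` ranges, `,` lists and `/step`.
    """
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # `a/step` means "from a up to the maximum"; a bare `a` is just a.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return values


def matches(spec: str, minute: int, hour: int, day: int, month: int, weekday: int) -> bool:
    """Check a 5-field spec against a time; weekday uses 0 = Sunday."""
    fields = spec.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    bounds = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
    actual = [minute, hour, day, month, weekday]
    for index, (field, (low, high)) in enumerate(zip(fields, bounds)):
        allowed = expand_field(field, low, high)
        if index == 4 and 7 in allowed:
            allowed.add(0)  # 0 and 7 both mean Sunday
        if actual[index] not in allowed:
            return False
    return True
```

For example, `matches("0 9 * * 1-5", 0, 9, 20, 5, 1)` checks 9:00 on a Monday against the weekday-morning spec.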

Write the code that runs GitHubTrendingWatcher (note that main must be an async function so that it can call other async functions).

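import asyncio

from metagpt.subscription import SubscriptionRunner

# Entry point. For debugging, the default cron spec here is "*/5 * * * *", which runs the task every 5 minutes.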
async def main(spec: str = "*/5 * * * *", wxpusher: bool = True):
    callbacks = []

    if wxpusher:
        callbacks.append(wxpusher_callback)

    if not callbacks:
        async def _print(msg: Message):
            print(msg.content)
        callbacks.append(_print)

    async def callback(msg):
        await asyncio.gather(*(call(msg) for call in callbacks))

    runner = SubscriptionRunner()
    await runner.subscribe(GitHubTrendingWatcher(), GithubTrendingCronTrigger(spec), callback)
    await runner.run()

if __name__ == "__main__":
    import fire
    fire.Fire(main)
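
The main function merges several callbacks into one with asyncio.gather. The sketch below isolates that fan-out pattern, with one deliberate difference from the code above: it passes return_exceptions=True, so a failing push channel is reported in the results instead of raising and hiding the others. fan_out and demo are hypothetical names, and plain strings stand in for Message objects.

```python
import asyncio
from typing import Awaitable, Callable

Callback = Callable[[str], Awaitable[None]]


def fan_out(callbacks: list[Callback]) -> Callable[[str], Awaitable[list]]:
    """Combine several callbacks into one that runs them concurrently."""

    async def combined(msg: str) -> list:
        # return_exceptions=True collects failures instead of raising the first one.
        return await asyncio.gather(*(cb(msg) for cb in callbacks), return_exceptions=True)

    return combined


def demo() -> tuple[list[str], list]:
    """Run one working and one failing callback against the same message."""
    seen: list[str] = []

    async def record(msg: str) -> None:
        seen.append(msg)

    async def broken(msg: str) -> None:
        raise RuntimeError("push failed")

    results = asyncio.run(fan_out([record, broken])("report"))
    return seen, results


if __name__ == "__main__":
    print(demo())
```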

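Before running the program above, set the environment variables WXPUSHER_TOKEN and WXPUSHER_UIDS.

On Windows, enter the following commands in the PowerShell terminal inside VS Code to set them (temporary, valid only for the current session).

$env:WXPUSHER_TOKEN="AT_xxxxxxxx"

$env:WXPUSHER_UIDS="UID_xxxxxxxx"

Running the program produces a log like this: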

2024-05-19 16:37:37.090 | INFO     | metagpt.const:get_metagpt_package_root:29 - Package root set to D:\0GPT\playground
2024-05-19 16:39:59.996 | INFO     | __main__:_act:122 - Codey(GitHubTrendingWatcher): ready to CrawlGitHubTrending
2024-05-19 16:40:01.820 | INFO     | __main__:_act:122 - Codey(GitHubTrendingWatcher): ready to AnalyzeGitHubTrending
# Today's GitHub Trending Report

## Today's Trends
Today, Rust and TypeScript continue to be the most popular programming languages on GitHub, dominating the trending list. Key areas of interest include AI chatbots, no-code platforms, and parallel programming languages. The top trending projects are HigherOrderCO/HVM and lencx/ChatGPT.

## The Trends Categories
1. AI and Machine Learning
    - [HigherOrderCO/HVM](https://github.com/HigherOrderCO/HVM): A massively parallel, optimal functional runtime in Rust, with 651 stars today.
    - [lencx/ChatGPT](https://github.com/lencx/ChatGPT): ChatGPT Desktop Application, written in Rust, has gained 830 stars today.
    - [likejazz/llama3.np](https://github.com/likejazz/llama3.np): A pure NumPy implementation for Llama 3 model, with 158 stars today.
    - [zhayujie/ChatGPT-on-wechat](https://github.com/zhayujie/ChatGPT-on-wechat): A chatbot based on large models, with 93 stars today.
    - [Azure-Samples/chat-with-your-data-solution-accelerator](https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator): An Azure AI Search and large language model-powered chat solution, with 4 stars today.

2. No-Code and Low-Code Platforms
    - [nocobase/nocobase](https://github.com/nocobase/nocobase): A scalability-first, open-source no-code/low-code platform, with 270 stars today.
    - [mendableai/firecrawl](https://github.com/mendableai/firecrawl): A tool to turn websites into LLM-ready markdown, with 111 stars today.

3. Parallel and High-Level Programming Languages
    - [HigherOrderCO/Bend](https://github.com/HigherOrderCO/Bend): A massively parallel, high-level programming language, with a staggering 4,583 stars today.

4. Developer Tools and Frameworks
    - [expo/react-conf-app](https://github.com/expo/react-conf-app): A React conference application, with 83 stars today.
    - [LazyVim/LazyVim](https://github.com/LazyVim/LazyVim): Neovim configuration for the lazy, with 30 stars today.
    - [tjdevries/config.nvim](https://github.com/tjdevries/config.nvim): An nvim configuration, with 19 stars today.

## Highlights of the List
1. **HigherOrderCO/HVM**: This project stands out for its innovative approach to parallel functional programming and has seen a significant increase in stars today.        
2. **lencx/ChatGPT**: As an AI chatbot application, it showcases the growing interest in AI-driven communication tools and has also seen a notable rise in stars.
3. **HigherOrderCO/Bend**: The Bend programming language is gaining attention for its parallel programming capabilities, with an impressive number of stars today.
4. **zhayujie/ChatGPT-on-wechat**: This project integrates large model-based chatbots with various messaging platforms and is worth noting for its application in social media and customer service.
5. **nocobase/nocobase**: This no-code/low-code platform is trending for enabling users to build business applications with ease, reflecting the industry's shift towards simplifying development processes.

Explore these projects to stay updated with the latest trends and technologies in the GitHub community.
2024-05-19 16:40:31.138 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.049 | Max budget: $10.000 | Current cost: $0.049, prompt_tokens: 2752, completion_tokens: 763

At the same time, on the WeChat side, users who followed by scanning the QR code receive the pushed message.
