Datawhale AI夏令营MobileAgent学习活动笔记

最新推荐文章于 2024-10-07 16:07:33 发布

qq_39826554

最新推荐文章于 2024-10-07 16:07:33 发布

阅读量359

点赞数 5

文章标签：学习笔记

本文链接：https://blog.csdn.net/qq_39826554/article/details/141831783

版权

本次活动是打算跑通mobile agent 框架，并根据这个来做出自己的agent应用。我在跑通过程遇到了一些问题。

就比如报错：找不到文件".//screenshot//screenshot.jpg"，这是拍照这一步出现了问题，我定位到拍照的代码

def get_perception_infos(adb_path, screenshot_file):
    get_screenshot(adb_path)
    
    width, height = Image.open(screenshot_file).size
    
    text, coordinates = ocr(screenshot_file, ocr_detection, ocr_recognition)
    text, coordinates = merge_text_blocks(text, coordinates)
    
    center_list = [[(coordinate[0]+coordinate[2])/2, (coordinate[1]+coordinate[3])/2] for coordinate in coordinates]
    draw_coordinates_on_image(screenshot_file, center_list)
    
    perception_infos = []
    for i in range(len(coordinates)):
        perception_info = {"text": "text: " + text[i], "coordinates": coordinates[i]}
        perception_infos.append(perception_info)
    
    coordinates = det(screenshot_file, "icon", groundingdino_model)
    
    for i in range(len(coordinates)):
        perception_info = {"text": "icon", "coordinates": coordinates[i]}
        perception_infos.append(perception_info)
    
    image_box = []
    image_id = []
    for i in range(len(perception_infos)):
        if perception_infos[i]['text'] == 'icon':
            image_box.append(perception_infos[i]['coordinates'])
            image_id.append(i)
    
    for i in range(len(image_box)):
        crop(screenshot_file, image_box[i], image_id[i])
    images = get_all_files_in_folder(temp_file)
    if len(images) > 0:
        images = sorted(images, key=lambda x: int(x.split('/')[-1].split('.')[0]))
        image_id = [int(image.split('/')[-1].split('.')[0]) for image in images]
        icon_map = {}
        prompt = 'This image is an icon from a phone screen. Please briefly describe the shape and color of this icon in one sentence.'
        if caption_call_method == "local":
            for i in range(len(images)):
                image_path = os.path.join(temp_file, images[i])
                icon_width, icon_height = Image.open(image_path).size
                if icon_height > 0.8 * height or icon_width * icon_height > 0.2 * width * height:
                    des = "None"
                else:
                    des = generate_local(tokenizer, model, image_path, prompt)
                icon_map[i+1] = des
        else:
            for i in range(len(images)):
                images[i] = os.path.join(temp_file, images[i])
            icon_map = generate_api(images, prompt)
        for i, j in zip(image_id, range(1, len(image_id)+1)):
            if icon_map.get(j):
                perception_infos[i]['text'] = "icon: " + icon_map[j]
    for i in range(len(perception_infos)):
        perception_infos[i]['coordinates'] = [int((perception_infos[i]['coordinates'][0]+perception_infos[i]['coordinates'][2])/2), int((perception_infos[i]['coordinates'][1]+perception_infos[i]['coordinates'][3])/2)]
        
    return perception_infos, width, height

先排查输入。是否是(adb_path, screenshot_file)出现了问题，于是定位到代码行：

adb_path = "D:/leidian/LDPlayer9/adb.exe.exe"

就发现原来是自己输错了，多敲了.exe，汗。

虽然是一个非常初学者的错误，但是把排查过程记录下来还是很有意义的。希望大家遇到报错不必慌，一点一点捋顺逻辑就好了。

qq_39826554

关注

5
点赞
踩
6

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫