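The goal of this exercise was to get the mobile agent framework running and then build my own agent application on top of it. I ran into a few problems along the way.
For example, this error: file not found ".//screenshot//screenshot.jpg". That means something went wrong at the screenshot step, so I tracked down the code that takes the screenshot: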
def get_perception_infos(adb_path, screenshot_file):
    get_screenshot(adb_path)

    width, height = Image.open(screenshot_file).size

    text, coordinates = ocr(screenshot_file, ocr_detection, ocr_recognition)
    text, coordinates = merge_text_blocks(text, coordinates)

    center_list = [[(coordinate[0]+coordinate[2])/2, (coordinate[1]+coordinate[3])/2] for coordinate in coordinates]
    draw_coordinates_on_image(screenshot_file, center_list)

    perception_infos = []
    for i in range(len(coordinates)):
        perception_info = {"text": "text: " + text[i], "coordinates": coordinates[i]}
        perception_infos.append(perception_info)

    coordinates = det(screenshot_file, "icon", groundingdino_model)

    for i in range(len(coordinates)):
        perception_info = {"text": "icon", "coordinates": coordinates[i]}
        perception_infos.append(perception_info)

    image_box = []
    image_id = []
    for i in range(len(perception_infos)):
        if perception_infos[i]['text'] == 'icon':
            image_box.append(perception_infos[i]['coordinates'])
            image_id.append(i)

    for i in range(len(image_box)):
        crop(screenshot_file, image_box[i], image_id[i])

    images = get_all_files_in_folder(temp_file)
    if len(images) > 0:
        images = sorted(images, key=lambda x: int(x.split('/')[-1].split('.')[0]))
        image_id = [int(image.split('/')[-1].split('.')[0]) for image in images]
        icon_map = {}
        prompt = 'This image is an icon from a phone screen. Please briefly describe the shape and color of this icon in one sentence.'
        if caption_call_method == "local":
            for i in range(len(images)):
                image_path = os.path.join(temp_file, images[i])
                icon_width, icon_height = Image.open(image_path).size
                if icon_height > 0.8 * height or icon_width * icon_height > 0.2 * width * height:
                    des = "None"
                else:
                    des = generate_local(tokenizer, model, image_path, prompt)
                icon_map[i+1] = des
        else:
            for i in range(len(images)):
                images[i] = os.path.join(temp_file, images[i])
            icon_map = generate_api(images, prompt)
        for i, j in zip(image_id, range(1, len(image_id)+1)):
            if icon_map.get(j):
                perception_infos[i]['text'] = "icon: " + icon_map[j]

    for i in range(len(perception_infos)):
        perception_infos[i]['coordinates'] = [int((perception_infos[i]['coordinates'][0]+perception_infos[i]['coordinates'][2])/2), int((perception_infos[i]['coordinates'][1]+perception_infos[i]['coordinates'][3])/2)]

    return perception_infos, width, height
Start by checking the inputs. Was something wrong with (adb_path, screenshot_file)? That led me to this line:
adb_path = "D:/leidian/LDPlayer9/adb.exe.exe"
It turned out I had simply mistyped the path and added an extra .exe. Embarrassing.
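A typo like this can be caught before the agent even starts by checking that the configured path points at a real file. Below is a minimal sketch of that idea; `check_adb_path` is a hypothetical helper I made up for illustration and is not part of the framework.

```python
import os


def check_adb_path(adb_path):
    """Return adb_path if it points at an existing file, else raise.

    Hypothetical helper for illustration; not part of the framework code.
    """
    if not os.path.isfile(adb_path):
        # Include the exact string so a typo such as a doubled extension is visible.
        raise FileNotFoundError(f"adb executable not found: {adb_path!r}")
    return adb_path
```

Calling something like this right after `adb_path` is assigned would report the bad path itself, instead of a missing screenshot several steps later.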
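It is a very beginner-level mistake, but writing down the troubleshooting process is still worthwhile. I hope that when you hit an error you won't panic; just work through the logic step by step.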
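Part of what made this confusing is that the error surfaced as a missing file rather than as a failed external command. One general way to shorten that kind of search is to have the step that runs the external tool verify its own result. The sketch below shows the pattern with the standard `subprocess` module; `run_and_expect_file` is a hypothetical name, and this is not how the framework itself takes screenshots.

```python
import os
import subprocess


def run_and_expect_file(cmd, expected_file):
    """Run cmd (a list of arguments) and confirm expected_file exists afterwards.

    Hypothetical helper for illustration; not part of the framework code.
    """
    # Without shell=True, a wrong executable path raises FileNotFoundError here.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed ({result.returncode}): {cmd}\n{result.stderr.strip()}"
        )
    if not os.path.isfile(expected_file):
        raise FileNotFoundError(
            f"command succeeded but did not create {expected_file!r}"
        )
    return expected_file
```

With a check like this, the traceback points at the command that failed rather than at the first line that tries to open its output.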