A Power Tool for Developing with Large AI Models: Streamlining the Workflow and Boosting Efficiency

Latest recommended article published on 2024-09-27 10:21:15

AI程序猿人



Tags: artificial intelligence, ai, language models

Article link: https://blog.csdn.net/python1222_/article/details/138675454


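Steve Jobs once called the computer a "bicycle for the mind." The context of the metaphor is less well known: he used it while talking about how efficiently different species on Earth move.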

Image generated by DALL·E 3 from the prompt "Imagine a computer as a bicycle for the mind"

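The condor won. It came in at the top of the list, beating every other species. Humans came in about a third of the way down the list... But a human riding a bicycle blew the condor away, right off the top of the list. That made a big impression on me. Humans are tool builders, and we can build tools that amplify our inherent abilities to spectacular magnitudes. For me, the computer has always been a bicycle of the mind, something that takes us far beyond our inherent abilities. I think we are only at the early stages of this tool, the very early stages. We have come only a very short distance, and it is still taking shape, but we have already seen enormous changes. I think that is nothing compared to what will happen in the next 100 years.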

-- Steve Jobs (1990)

#01

Cautious Optimism

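The role of LLMs in accelerating software development is widely debated. Some argue that generated code is of such low quality that using it does more harm than good. At the other extreme, many claim that the era of programming is over. Numerous studies have tried to objectively evaluate LLMs on code-quality benchmark datasets such as HumanEval and MBPP. These evaluations are vital to the field, but they are not the focus of this article.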

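This article aims to give practical advice to developers, especially those who are hesitant to use these models. I should state my position up front: I believe that, used appropriately, this technology can dramatically increase an individual's productivity.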

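To be clear, using an LLM does not guarantee performant code. Even the most advanced models make serious mistakes. But remember that the best human engineers make mistakes too. That is why we don't ship code without guardrails such as automated tests and peer review. LLMs have not changed what best practice is, but they may have changed how easy it is to follow.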

What shipping code looks like today (image generated by DALL·E 3)

Does the speed of writing performant code really matter that much?

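I would be very surprised if most professional software engineers spent most of their time typing. If they did, we might take "lines of code" more seriously as a performance metric. **I think engineers actually spend much more of their time deciding what code needs to be written in the first place.** LLMs cannot yet do this kind of high-level abstract reasoning. OpenAI and many others are working to make these steps possible, but that will likely take more than adding parameters or training data; it probably requires an entirely new way of thinking. Even so, LLMs can already substantially accelerate a large share of our work.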

#02

Writing Efficient Functions

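Let me use an example to show the steps needed to take a small snippet of LLM-generated code and make it useful in practice. Measuring the distance between a point on Earth and a set of other coordinates is a common task in geospatial analysis. Consider the following scenario.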


The first code output looked like this:

import math

def haversine(lat1, lon1, lat2, lon2):
    # Radius of the Earth in kilometers
    R = 6371.0

    # Convert degrees to radians
    lat1_rad = math.radians(lat1)
    lon1_rad = math.radians(lon1)
    lat2_rad = math.radians(lat2)
    lon2_rad = math.radians(lon2)

    # Differences in coordinates
    dlat = lat2_rad - lat1_rad
    dlon = lon2_rad - lon1_rad

    # Haversine formula
    a = math.sin(dlat / 2)**2 + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(dlon / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    # Distance in kilometers
    distance = R * c

    return distance

def find_closest_oil_platform(vessel_lat, vessel_lon, platforms):
    closest_platform = None
    min_distance = float('inf')
    
    for platform in platforms:
        distance = haversine(vessel_lat, vessel_lon, platform['latitude'], platform['longitude'])
        if distance < min_distance:
            min_distance = distance
            closest_platform = platform

    return closest_platform, min_distance

So how good is this code?

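Not very. It has quite a few problems. The most obvious is that it is hard to read: the variable names are cryptic and inconsistent, including single-letter names in both upper and lower case. It also lacks documentation and type hints, uses an unnecessary and inefficient for loop, and does not handle edge cases such as several platforms lying at exactly the same distance.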

After a round of revisions, it looks like this:


from typing import Tuple, List
import numpy as np

def haversine_vectorized(vessel_lat: float, vessel_lon: float, platform_lats: np.ndarray, platform_lons: np.ndarray) -> np.ndarray:
    """
    Calculate the Haversine distance between a vessel and multiple platforms in a vectorized manner.

    Parameters:
    - vessel_lat: Latitude of the vessel in decimal degrees.
    - vessel_lon: Longitude of the vessel in decimal degrees.
    - platform_lats: Numpy array of latitudes of oil platforms in decimal degrees.
    - platform_lons: Numpy array of longitudes of oil platforms in decimal degrees.

    Returns:
    - distances: Numpy array of distances from the vessel to each platform in kilometers.
    """
    # Convert decimal degrees to radians
    lat1, lon1, lat2, lon2 = map(np.radians, [vessel_lat, vessel_lon, platform_lats, platform_lons])

    # Haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    r = 6371  # Radius of Earth in kilometers
    return c * r

def find_closest_platform(vessel_lat: float, vessel_lon: float, platform_lats: np.ndarray, platform_lons: np.ndarray) -> Tuple[List[int], float]:
    """
    Finds the closest oil platform(s) to a vessel given arrays of platform latitudes and longitudes, handling equidistant platforms.

    Parameters:
    - vessel_lat: Latitude of the vessel in decimal degrees.
    - vessel_lon: Longitude of the vessel in decimal degrees.
    - platform_lats: Numpy array of latitudes for oil platforms.
    - platform_lons: Numpy array of longitudes for oil platforms.

    Returns:
    - A tuple containing a list of indices of the closest platforms and the distance to them in kilometers.
    """
    # Calculate distances to all platforms
    distances = haversine_vectorized(vessel_lat, vessel_lon, platform_lats, platform_lons)
    
    # Find the minimum distance
    min_distance = np.min(distances)
    
    # Find all indices with the minimum distance
    closest_indices = np.where(distances == min_distance)[0].tolist()

    # Return the indices of all closest platforms and the minimum distance
    return closest_indices, min_distance

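The revised code is a clear improvement. It is easier to read, includes documentation and type hints, and replaces the for loop with a more efficient vectorized computation.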

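Whether the code is "good", and more importantly whether it meets the requirements, depends on the environment in which it will run. We cannot meaningfully judge the quality of code from a few lines alone, and that is as true for humans as it is for LLMs.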

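For example: Is the code accurate enough for its users? Will it run often, once a year or once every microsecond? What hardware will it run on? Do the expected usage and scale justify chasing small optimizations? And once you factor in your own salary, is that effort worth it?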

Let's evaluate the code against these factors.

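On accuracy: the haversine formula works well, but it is not the most accurate option. It treats the Earth as a perfect sphere, while the Earth is actually closer to an oblate spheroid. The difference matters when you need millimeter-level precision over large distances. If that precision were truly needed, more accurate formulas such as Vincenty's formula are available, at some cost in performance. Users of this code do not need millimeter precision. In fact, given the error in vessel coordinates derived from satellite imagery, that level of precision would not even be meaningful. So the haversine formula is a reasonable choice for accuracy.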
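To put numbers on this trade-off, here is a sketch that compares the spherical haversine distance with Vincenty's inverse formula on the WGS-84 ellipsoid. It follows the textbook formulation and is not code from the article. The names `haversine_km` and `vincenty_km` are hypothetical. Vincenty's iteration can fail to converge for nearly antipodal points, which is one reason it is slower and more delicate to use than haversine.

```python
import math

# Hypothetical helpers for illustration; not code from the article's repository.
EARTH_RADIUS_KM = 6371.0           # mean Earth radius used by the haversine formula
WGS84_A = 6378137.0                # WGS-84 semi-major axis (m)
WGS84_F = 1 / 298.257223563        # WGS-84 flattening
WGS84_B = (1 - WGS84_F) * WGS84_A  # WGS-84 semi-minor axis (m)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere (the Earth model the article's code uses)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def vincenty_km(lat1, lon1, lat2, lon2, tol=1e-12, max_iter=200):
    """Geodesic distance on the WGS-84 ellipsoid via Vincenty's inverse formula."""
    a, b, f = WGS84_A, WGS84_B, WGS84_F
    L = math.radians(lon2 - lon1)
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))  # reduced latitudes
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sin_u1, cos_u1 = math.sin(U1), math.cos(U1)
    sin_u2, cos_u2 = math.sin(U2), math.cos(U2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam, cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # Along the equator cos2_alpha is 0 and cos(2*sigma_m) is taken as 0
        cos_2sm = cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha if cos2_alpha != 0 else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sm + C * cos_sigma * (-1 + 2 * cos_2sm ** 2))
        )
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty's formula did not converge (nearly antipodal points?)")

    u2 = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (
        cos_2sm
        + B / 4 * (
            cos_sigma * (-1 + 2 * cos_2sm ** 2)
            - B / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)
        )
    )
    return b * A * (sigma - delta_sigma) / 1000.0  # metres -> kilometres


if __name__ == "__main__":
    # One degree of longitude along the equator
    print(f"haversine: {haversine_km(0, 0, 0, 1):.3f} km")  # haversine: 111.195 km
    print(f"vincenty:  {vincenty_km(0, 0, 0, 1):.3f} km")   # vincenty:  111.319 km
```

Over that 111 km span the two results differ by about 125 m (roughly 0.1%). Whether that matters depends on the distances involved and the precision the application requires.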

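Is it fast enough? We only need distances to a few thousand offshore oil platforms, and with the vectorized approach that computation is very efficient. If the use case changed to computing the distance to arbitrary points along the coastline (hundreds of millions of points), a divide-and-conquer strategy might be more appropriate. In practice, to keep compute costs down, this function is designed to run about 100 million times a day on the smallest virtual machine possible.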
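The article does not say which divide-and-conquer strategy it has in mind. One common option is a spatial tree, sketched here under that assumption. scikit-learn's `BallTree` supports the haversine metric directly. It expects `[latitude, longitude]` pairs in radians and returns distances as central angles. Building the tree costs O(n log n) once, and each query then typically visits only a small fraction of the points instead of scanning all of them. The helper names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0


# Hypothetical helpers for illustration; the article does not specify this approach.
def build_shoreline_index(shore_lats, shore_lons, leaf_size=40):
    """Index shoreline points once so that later queries avoid a full scan."""
    # scikit-learn's haversine metric expects [latitude, longitude] pairs in radians
    points = np.radians(np.column_stack([shore_lats, shore_lons]))
    return BallTree(points, leaf_size=leaf_size, metric="haversine")


def nearest_shore_km(tree, vessel_lats, vessel_lons):
    """Return (distances_km, indices) of the nearest indexed point for each vessel."""
    queries = np.radians(np.column_stack([vessel_lats, vessel_lons]))
    dist_rad, idx = tree.query(queries, k=1)  # distances come back as central angles (radians)
    return dist_rad[:, 0] * EARTH_RADIUS_KM, idx[:, 0]
```

The index is built once from the coastline points and reused for every batch of vessel positions. That reuse is where the savings come from when the same large point set is queried many times a day.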

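Given these details about accuracy and scale, the vectorized implementation is reasonable. That said, before it is merged it should still be tested (I generally don't recommend relying solely on an LLM for testing) and reviewed by a human peer.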
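The text recommends testing before merging rather than trusting the LLM alone. Here is a minimal pytest-style sketch. It contains condensed copies of `haversine_vectorized` and `find_closest_platform` so that it runs on its own. It uses a scalar haversine, in the spirit of the first LLM draft, as an independent oracle, and it checks the equidistant-platform case the first draft ignored. `scalar_haversine` and the test names are hypothetical.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0


# Condensed copies of the two functions above so this file runs on its own;
# in a real project you would import them from their module instead.
def haversine_vectorized(vessel_lat, vessel_lon, platform_lats, platform_lons):
    lat1, lon1, lat2, lon2 = map(np.radians, [vessel_lat, vessel_lon, platform_lats, platform_lons])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)) * EARTH_RADIUS_KM


def find_closest_platform(vessel_lat, vessel_lon, platform_lats, platform_lons):
    distances = haversine_vectorized(vessel_lat, vessel_lon, platform_lats, platform_lons)
    min_distance = np.min(distances)
    return np.where(distances == min_distance)[0].tolist(), min_distance


def scalar_haversine(lat1, lon1, lat2, lon2):
    """Hypothetical scalar oracle, one pair of points at a time (like the first draft)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2) - math.radians(lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a)) * EARTH_RADIUS_KM


def test_matches_scalar_reference():
    rng = np.random.default_rng(0)
    lats, lons = rng.uniform(-80, 80, 100), rng.uniform(-180, 180, 100)
    expected = [scalar_haversine(10.0, 20.0, la, lo) for la, lo in zip(lats, lons)]
    np.testing.assert_allclose(haversine_vectorized(10.0, 20.0, lats, lons), expected, rtol=1e-9)


def test_one_degree_along_equator():
    (d,) = haversine_vectorized(0.0, 0.0, np.array([0.0]), np.array([1.0]))
    assert math.isclose(d, EARTH_RADIUS_KM * math.pi / 180, rel_tol=1e-12)


def test_equidistant_platforms_are_all_returned():
    # For a vessel on the equator, a platform 1 degree north and one 1 degree east
    # are exactly equidistant, so both indices must be returned.
    lats, lons = np.array([1.0, 0.0, 5.0]), np.array([0.0, 1.0, 5.0])
    indices, distance = find_closest_platform(0.0, 0.0, lats, lons)
    assert indices == [0, 1]
    assert math.isclose(distance, EARTH_RADIUS_KM * math.pi / 180, rel_tol=1e-12)
```

Running `pytest` on this file exercises all three checks. Comparing against a slow but obviously correct oracle is a cheap way to catch mistakes introduced during vectorization.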

#03

Picking Up Speed

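Using an LLM to generate a useful function, as above, already saves time. The value multiplies once you use LLMs to generate whole libraries: handling dependencies between modules, writing documentation, producing visualizations (through multimodal capabilities), writing README files, building command-line interfaces, and so on.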

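Let's try to create, train, evaluate, and run inference with a brand-new computer vision model from scratch, with extensive help from an LLM. Our motivation and inspiration come from a recently published paper, "Keypoints Method for Recognition of Ship Wake Components in Sentinel-2 Images by Deep Learning" (Del Prete et al., IEEE GRSL, 2023).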


A vessel and its wake in a Sentinel-2 satellite image.

Why should we care about a vessel's direction of travel in satellite imagery, and what makes the task hard?

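Knowing a vessel's heading from a static image is extremely valuable to organizations that monitor human activity on the water. For example, a vessel heading toward a marine protected area may warrant an alert or an interception. Publicly available global satellite imagery usually lacks the resolution to determine a vessel's orientation precisely, especially for small vessels that span only a few pixels (Sentinel-2 imagery, for instance, has a resolution of 10 m per pixel). However, even small vessels can leave quite visible wakes. A wake gives us a clue to the vessel's orientation and direction of travel even when its stern cannot be made out directly.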

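This study stands out because its model is based on EfficientNetB0, which is small enough to run at scale without heavy compute costs. I could not find a code implementation, but the authors did publish the dataset, including the annotations, which is a commendable step.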

Let's get started!

As with any new machine learning project, visualizing the data first is a highly instructive step.

import os
import json
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
import seaborn as sns

# Define the path to your data directory
data_dir = "/path/to/your/data"  # Adjust this to the path of your data directory
annotations_dir = os.path.join(data_dir, "annotations")
images_dir = os.path.join(data_dir, "imgs")

# Initialize Seaborn for better visual aesthetics
sns.set(style="whitegrid", palette="muted")

# Create a list to hold file paths for images and their corresponding annotations
image_files = []
annotation_files = []

# Loop through the annotations directory to get the list of annotation files
for annotation_file in os.listdir(annotations_dir):
    if annotation_file.endswith(".json"):
        annotation_path = os.path.join(annotations_dir, annotation_file)
        image_file = annotation_file.replace(".json", ".png")  # Assuming image file names match annotation file names
        image_path = os.path.join(images_dir, image_file)

        # Check if the corresponding image file exists
        if os.path.exists(image_path):
            annotation_files.append(annotation_path)
            image_files.append(image_path)

# Plotting
num_examples = min(len(image_files), 10)  # Limiting to 10 examples for visualization
fig, axes = plt.subplots(2, 5, figsize=(20, 8))

for idx, (image_path, annotation_path) in enumerate(zip(image_files[:num_examples], annotation_files[:num_examples])):
    # Load the image
    img = Image.open(image_path).convert("RGB")  # Ensure the image is treated as RGB
    draw = ImageDraw.Draw(img)

    # Load the corresponding annotations and draw keypoints
    with open(annotation_path, 'r') as f:
        annotations = json.load(f)
        for point in annotations["tooltips"]:
            x, y = point["x"], point["y"]
            # Draw keypoints in cyan for visibility
            draw.ellipse([(x-10, y-10), (x+10, y+10)], fill='cyan', outline='black')

    # Plot the image with keypoints
    ax = axes[idx // 5, idx % 5]
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(f"Image {idx+1}")

plt.tight_layout()
plt.show()

Here is the result:

Sample Sentinel-2 images of vessels and their wakes, with keypoint annotations shown in cyan.

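The results reveal several things. First, the number of annotated keypoints (cyan dots) varies from image to image. Second, the images are grayscale and come in different shapes. Third, although the sample is small, it contains considerable variation. With these basics established, we will next write code to load the dataset, define the neural network, and train and validate it. Before asking for code, we give the LLM some background context.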
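The first observation, a varying number of keypoints per image, is easy to quantify before designing the data pipeline. Here is a minimal sketch. It assumes the annotation layout used in the visualization code: one JSON file per image, containing a `"tooltips"` list of points. `summarize_annotations` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


# Hypothetical helper; assumes one JSON file per image with a "tooltips" list of points.
def summarize_annotations(annotations_dir):
    """Return a Counter mapping "number of keypoints" -> "number of annotation files"."""
    counts = Counter()
    for path in sorted(Path(annotations_dir).glob("*.json")):
        points = json.loads(path.read_text())["tooltips"]
        counts[len(points)] += 1
    return counts
```

Calling it on the annotations directory shows how many annotations a minimum-keypoint filter would drop, and what fixed length the keypoint vectors need to be padded to.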


With that context, the LLM produced the following dataset code:

import os
import json
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from torchvision.io import read_image
from typing import Tuple, List

class WakeDataset(Dataset):
    """
    Custom Dataset for loading wake detection imagery and annotations, with preprocessing for EfficientNet.
    Images are grayscale and will be converted to three-channel images for model compatibility.
    """
    
    def __init__(self, annotations_dir: str, images_dir: str, transform=None):
        """
        Initializes the dataset with paths to annotations and images directories, and a transform.

        Parameters:
        - annotations_dir: Path to the directory containing annotation JSON files.
        - images_dir: Path to the directory containing grayscale image files.
        - transform: A torchvision.transforms transform to be applied on a sample.
        """
        self.annotations_dir = annotations_dir
        self.images_dir = images_dir
        self.transform = transform if transform else self.default_transform()
        self.annotations = self._load_annotations()

    def default_transform(self):
        """
        Defines the default transform to resize images, convert them to three-channel format, and to tensors.

        Returns:
        - A torchvision.transforms composition for image preprocessing.
        """
        return transforms.Compose([
            transforms.Resize((224, 224)),  # Resize to the input size for EfficientNet
            transforms.Grayscale(num_output_channels=3),  # Convert grayscale to three-channel
            transforms.ConvertImageDtype(torch.float),  # Ensures image is a float tensor
        ])

    def _load_annotations(self) -> List[Tuple[str, List[int]]]:
        """
        Loads and filters annotations with at least 2 keypoints.

        Returns:
        - A list of tuples, where each tuple contains the image filename and its keypoints.
        """
        annotations = []
        for filename in os.listdir(self.annotations_dir):
            if not filename.endswith('.json'):
                continue
            filepath = os.path.join(self.annotations_dir, filename)
            with open(filepath, 'r') as f:
                data = json.load(f)
                if len(data['tooltips']) >= 2:  # Filter annotations with at least 2 keypoints
                    keypoints = [item for sublist in [[kp['x'], kp['y']] for kp in data['tooltips']] for item in sublist]
                    annotations.append((filename.replace('.json', '.png'), keypoints))
        return annotations

    def __len__(self) -> int:
        """
        Returns the number of items in the dataset.
        """
        return len(self.annotations)

    def __getitem__(self, idx) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Retrieves an item from the dataset at the specified index, with preprocessing for EfficientNet.

        Parameters:
        - idx: Index of the item to retrieve.

        Returns:
        - A tuple containing the preprocessed image tensor and its corresponding keypoints tensor.
        """
        img_name, keypoints = self.annotations[idx]
        img_path = os.path.join(self.images_dir, img_name)
        image = read_image(img_path)  # Load image
        keypoints = torch.tensor(keypoints + [0] * (6 - len(keypoints)), dtype=torch.float)  # Pad keypoints
        if self.transform:
            image = self.transform(image)
        return image, keypoints

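The resulting code looks reasonable and seems to have picked up on some details of the problem. It isn't perfect. For instance, converting the grayscale images to three-channel RGB in default_transform is an odd choice, and I would personally use pathlib rather than os.listdir. But we can set these details aside for now.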
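For comparison, here is a sketch of the annotation loading written with pathlib, as the text prefers. It is a hypothetical standalone variant of `WakeDataset._load_annotations`, not the repository's code. It keeps the "at least 2 keypoints" filter. It also truncates to a maximum (3 here, matching the model's default `num_keypoints`), so an annotation with extra points cannot produce a keypoint vector longer than the six values the model predicts.

```python
import json
from pathlib import Path
from typing import List, Tuple

MAX_KEYPOINTS = 3  # assumption: matches EfficientNetB0KeypointDetector's default num_keypoints


# Hypothetical pathlib-based variant of WakeDataset._load_annotations.
def load_annotations(annotations_dir, min_keypoints=2, max_keypoints=MAX_KEYPOINTS):
    """Return (image_filename, [x1, y1, x2, y2, ...]) pairs padded/truncated to 2 * max_keypoints."""
    annotations: List[Tuple[str, List[float]]] = []
    for path in sorted(Path(annotations_dir).glob("*.json")):
        points = json.loads(path.read_text())["tooltips"]
        if len(points) < min_keypoints:
            continue  # same filter as the original: skip annotations with too few keypoints
        flat = [float(v) for kp in points[:max_keypoints] for v in (kp["x"], kp["y"])]
        flat += [0.0] * (2 * max_keypoints - len(flat))  # zero-pad to a fixed length
        annotations.append((path.with_suffix(".png").name, flat))
    return annotations
```

Sorting the glob results also makes the dataset order deterministic, which `os.listdir` does not guarantee.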

import torch
import torch.nn as nn
from torchvision import models
from typing import Tuple

class EfficientNetB0KeypointDetector(nn.Module):
    """
    A custom EfficientNet-B0 model for keypoint detection in grayscale images.
    The network modifies the input layer to accept single-channel images and 
    adds a custom output layer for predicting a fixed number of keypoints.
    """

    def __init__(self, num_keypoints: int = 3):
        """
        Initializes the model.

        Parameters:
        - num_keypoints: The number of keypoints to predict. Default is 3, 
                         corresponding to the maximum number of keypoints in the dataset.
        """
        super(EfficientNetB0KeypointDetector, self).__init__()
        self.num_keypoints = num_keypoints
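        # Load EfficientNet-B0 with ImageNet weights (the `weights` API replaces the deprecated `pretrained=True`)
        self.efficientnet_b0 = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)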
        
        # Modify the first convolutional layer to accept single-channel (grayscale) images
        original_first_conv = self.efficientnet_b0.features[0][0]
        self.efficientnet_b0.features[0][0] = nn.Conv2d(
            in_channels=1, 
            out_channels=original_first_conv.out_channels, 
            kernel_size=original_first_conv.kernel_size, 
            stride=original_first_conv.stride, 
            padding=original_first_conv.padding, 
            bias=False)

        # Modify the classifier to predict keypoints (num_keypoints * 2 because x, y for each keypoint)
        self.efficientnet_b0.classifier = nn.Linear(
            in_features=self.efficientnet_b0.classifier[1].in_features, 
            out_features=num_keypoints * 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of the network.

        Parameters:
        - x: Input tensor of shape (batch_size, 1, 224, 224)

        Returns:
        - A tensor of shape (batch_size, num_keypoints * 2) representing the predicted keypoints.
        """
        return self.efficientnet_b0(x)

# Example usage
if __name__ == "__main__":
    model = EfficientNetB0KeypointDetector()
    print(model)

    # Example input tensor (batch_size, channels, height, width)
    example_input = torch.randn(1, 1, 224, 224)
    output = model(example_input)
    print(output.shape)  # Expected shape: (1, 6) for 3 keypoints (each with x, y coordinates)
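The model above replaces the first convolution with a freshly initialized single-channel layer, so that layer loses its ImageNet weights. A common alternative is to sum the pretrained RGB kernels across the channel dimension. A grayscale image copied into three channels then produces exactly the same activations. This is sketched here as an assumption, not as what the article's repository does. `rgb_conv_to_grayscale` is a hypothetical helper.

```python
import torch
import torch.nn as nn


# Hypothetical helper; not part of the article's repository.
def rgb_conv_to_grayscale(conv: nn.Conv2d) -> nn.Conv2d:
    """Return a 1-input-channel copy of `conv` that reuses its pretrained weights.

    Feeding a grayscale image replicated into all three channels to `conv` equals
    convolving the single channel with the sum of the three RGB kernels.
    """
    if conv.in_channels != 3 or conv.groups != 1:
        raise ValueError("expected an ordinary 3-channel input convolution")
    gray_conv = nn.Conv2d(
        in_channels=1,
        out_channels=conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        gray_conv.weight.copy_(conv.weight.sum(dim=1, keepdim=True))
        if conv.bias is not None:
            gray_conv.bias.copy_(conv.bias)
    return gray_conv
```

Inside `__init__`, the `nn.Conv2d(...)` call would become `self.efficientnet_b0.features[0][0] = rgb_conv_to_grayscale(original_first_conv)`.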


import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from dataset import WakeDataset, get_dataloader  # Assuming dataset.py is in the same directory
from net import EfficientNetB0KeypointDetector
from torchvision import transforms
import wandb

# Initialize Weights & Biases
wandb.init(project="wake_model_llm_assist", entity="your_wandb_entity", mode="offline")

def custom_collate_fn(batch):
    """
    Custom collate function to pad keypoints to a fixed size.
    
    Parameters:
    - batch: A list of tuples (image, keypoints).
    
    Returns:
    - Batched images and keypoints with padding as necessary.
    """
    images, keypoints = zip(*batch)
    images = torch.stack(images, 0)
    max_keypoints = 3 * 2  # Maximum number of keypoints (3) times 2 for (x, y)
    padded_keypoints = [torch.cat((kp, torch.zeros(max_keypoints - len(kp)))) for kp in keypoints]
    keypoints = torch.stack(padded_keypoints, 0)
    return images, keypoints

def train_one_epoch(model: nn.Module, dataloader: DataLoader, optimizer: optim.Optimizer, device: torch.device):
    """
    Trains the model for one epoch.
    
    Parameters:
    - model: The neural network model.
    - dataloader: DataLoader providing the training data.
    - optimizer: Optimizer used for model training.
    - device: The device to train on.
    """
    model.train()
    total_loss = 0.0
    for images, keypoints in dataloader:
        images, keypoints = images.to(device), keypoints.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = nn.MSELoss()(outputs, keypoints)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    
    average_loss = total_loss / len(dataloader)
    wandb.log({"train_loss": average_loss})

def evaluate(model: nn.Module, dataloader: DataLoader, device: torch.device):
    """
    Evaluates the model on the validation set.
    
    Parameters:
    - model: The neural network model.
    - dataloader: DataLoader providing the validation data.
    - device: The device to evaluate on.
    """
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for images, keypoints in dataloader:
            images, keypoints = images.to(device), keypoints.to(device)
            outputs = model(images)
            loss = nn.MSELoss()(outputs, keypoints)
            total_loss += loss.item()
    
    average_loss = total_loss / len(dataloader)
    wandb.log({"val_loss": average_loss})

def main():
    # Setup
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = EfficientNetB0KeypointDetector().to(device)
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    num_epochs = 1000  # Number of training epochs; must be defined before the loop below

    # DataLoaders
    train_dataloader = get_dataloader('path/to/train/annotations', 'path/to/train/images', batch_size=4, transform=transforms.ToTensor(), collate_fn=custom_collate_fn)
    val_dataloader = get_dataloader('path/to/val/annotations', 'path/to/val/images', batch_size=4, transform=transforms.ToTensor(), collate_fn=custom_collate_fn)

    # Training loop
    for epoch in range(num_epochs):
        train_one_epoch(model, train_dataloader, optimizer, device)
        evaluate(model, val_dataloader, device)
        # Log model checkpoints, images, or other metrics as needed
        wandb.log({})

if __name__ == "__main__":
    main()
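There is one subtlety in this training script. `custom_collate_fn` (and the dataset) pad missing keypoints with zeros, so `nn.MSELoss` also trains the network to predict (0, 0) for keypoints that do not exist. A masked loss that ignores padded entries is one way around this. The sketch below assumes padding is exactly (0, 0) and that no real keypoint lies at pixel (0, 0). `masked_keypoint_mse` is a hypothetical helper.

```python
import torch


# Hypothetical helper; not part of the article's training script.
def masked_keypoint_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean squared error over real (non-padded) keypoints only.

    pred, target: (batch, num_keypoints * 2) tensors laid out as [x1, y1, x2, y2, ...].
    A target keypoint equal to (0, 0) is treated as padding and ignored.
    """
    pred_xy = pred.reshape(pred.shape[0], -1, 2)
    target_xy = target.reshape(target.shape[0], -1, 2)
    mask = (target_xy != 0).any(dim=-1, keepdim=True).to(pred.dtype)  # (batch, K, 1)
    squared_error = (pred_xy - target_xy) ** 2 * mask
    # Each real keypoint contributes two coordinates; clamp avoids dividing by zero
    return squared_error.sum() / (2 * mask.sum()).clamp(min=1.0)
```

It can replace `nn.MSELoss()(outputs, keypoints)` in both `train_one_epoch` and `evaluate`.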

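When I started running the code, I ran into a series of tensor shape mismatch errors (recall the RGB versus grayscale discrepancy and the custom collate function). I spent a few minutes debugging, but in the end I pasted the code of every module into the prompt and asked the LLM to help find the problem.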


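That resolved all the outstanding issues, and I started training the model... but training was extremely slow. Only then did I realize I wasn't using the Metal Performance Shaders (MPS) backend on Apple silicon. I rarely train models on my personal computer, so MPS was fairly new to me, but I decided to add a conditional check so the code would use it.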
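The article does not show the exact check it added. Here is a minimal sketch of device selection that prefers CUDA, then MPS on Apple silicon, then the CPU. `select_device` is a hypothetical name, and `torch.backends.mps` requires PyTorch 1.12 or later.

```python
import torch


# Hypothetical helper; the article's exact conditional is not shown.
def select_device() -> torch.device:
    """Prefer CUDA, then Apple's Metal Performance Shaders (MPS), then the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple silicon with an MPS-enabled PyTorch build
        return torch.device("mps")
    return torch.device("cpu")
```

In `main()`, `device = select_device()` would replace the CUDA-only line.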

Because the training set is modest (581 images in total) and EfficientNet was already pretrained on ImageNet, I decided to train for 1,000 epochs.


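After 500 epochs, train_loss is still decreasing, but the validation loss appears to have converged (at least well enough for a quick evaluation). Plots copied from Weights & Biases.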
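The training script logs losses to Weights & Biases but never writes weights to disk. The inference CLI, however, loads a saved `state_dict`, and the repository ships weights from epoch 500. Here is a sketch that keeps the best checkpoint by validation loss. `BestCheckpoint` is hypothetical, and the sketch assumes `evaluate()` is changed to return its `average_loss`.

```python
import math
from pathlib import Path

import torch
import torch.nn as nn


# Hypothetical helper; not part of the article's training script.
class BestCheckpoint:
    """Save a model's state_dict whenever the validation loss improves."""

    def __init__(self, path):
        self.path = Path(path)
        self.best_loss = math.inf

    def update(self, model: nn.Module, val_loss: float, epoch: int) -> bool:
        """Return True if a new checkpoint was written for this epoch."""
        if val_loss >= self.best_loss:
            return False
        self.best_loss = val_loss
        self.path.parent.mkdir(parents=True, exist_ok=True)
        torch.save(model.state_dict(), self.path)  # the format load_state_dict expects
        print(f"epoch {epoch}: val_loss improved to {val_loss:.4f}, saved {self.path}")
        return True
```

In `main()`, create `checkpoint = BestCheckpoint("checkpoints/best.pt")` before the loop, then call `checkpoint.update(model, val_loss, epoch)` after each evaluation. The saved file is what `load_model` in the inference script loads.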

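Note that some foundation models are pretrained on satellite imagery rather than ImageNet, and they would likely work better for this task. However, these networks are much larger than EfficientNet, which makes them slower to train, and they are too new to appear in LLM training data.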

Now let's write an inference script and a command-line interface (CLI) to test the model we just trained.


import torch
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import click

# Assuming net.py contains the EfficientNetB0KeypointDetector class
# and dataset.py contains the transforms used during training
from net import EfficientNetB0KeypointDetector
from dataset import WakeDataset  # Modify as necessary to import transforms

def load_model(model_path):
    """Load the trained model from a file."""
    model = EfficientNetB0KeypointDetector()
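    # map_location="cpu" lets weights saved on a GPU or MPS device load on any machine
    model.load_state_dict(torch.load(model_path, map_location="cpu"))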
    model.eval()  # Set the model to inference mode
    return model

def preprocess_image(image_path, transform):
    """Load and preprocess an image."""
    image = Image.open(image_path).convert("L")  # Assuming grayscale conversion as in your dataset
    image = transform(image)
    # Add batch dimension (BxCxHxW)
    image = image.unsqueeze(0)
    return image

def plot_keypoints(image, keypoints):
    """Plot keypoints on the image."""
    plt.imshow(image, cmap='gray')  # `image` is a PIL grayscale image, which imshow accepts directly
    plt.scatter(keypoints[:, 0], keypoints[:, 1], s=50, marker='.', c='red')
    plt.show()

@click.command()
@click.argument('model_path', type=click.Path(exists=True))
@click.argument('image_path', type=click.Path(exists=True))
def run_inference(model_path, image_path):
    """Run inference on an image using a trained model."""
    # Use the same transforms as during training
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Grayscale(num_output_channels=3),
    ])
    
    model = load_model(model_path)
    image = preprocess_image(image_path, transform)
    
    # Perform inference
    with torch.no_grad():
        keypoints = model(image)
        keypoints = keypoints.view(-1, 2).cpu().numpy()  # Reshape and convert to numpy for plotting
    
    # Load original image for plotting
    original_image = Image.open(image_path).convert("L")
    plot_keypoints(original_image, keypoints)

if __name__ == '__main__':
    run_inference()

Let's try it!


Not perfect, but reasonable for a first pass.


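The complete code is on GitHub, including all modules, the model and its weights (from epoch 500), and a README. Generating the entire library took me less than an hour, far less time than writing this article. All of this was done in my personal development environment: a MacBook Air M2, VS Code, Copilot, format-on-save (using black, isort, etc.), and a Python 3.9.6 virtual environment (.venv).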

GitHub: https://github.com/pbeukema/wakemodel_llmassist

Lessons Learned

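Give the model as much relevant context as possible to help it solve the task. Remember that the model lacks many assumptions you may take for granted.
LLM-generated code is usually far from perfect, and it is hard to predict how it will fail. That is why having an assistant inside the IDE (such as Copilot) is so helpful.
When your workflow relies heavily on an LLM, remember that the speed at which it generates code is often the limiting factor. Avoid asking it to regenerate code that needs no changes; this wastes energy and slows you down.
LLMs struggle to "remember" every line of code they have produced, so you often need to remind them of the current state, especially when there are dependencies across multiple modules.
Be skeptical of LLM-generated code. Validate as much as you can with tests, visualizations, and similar checks, and spend your time where it matters. I spent more time on the haversine function than on the neural network, because the expected scale made its performance critical. For the neural network, I cared more about detecting failures quickly.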

#04

LLMs and the Future of Engineering

The only constant is change.

-- Heraclitus

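With all the hype and enormous flows of money around LLMs, it is easy to expect perfection from the start. Using these tools effectively, however, requires us to experiment, learn, and adapt.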

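Will LLMs change the fundamental structure of software engineering teams? Possibly; we are only on the threshold of a new world. But LLMs have already democratized access to code. Even people with no programming experience can quickly and easily build functional prototypes. If you have strict requirements, it may be wiser to apply LLMs in areas you already know well. In my own experience, LLMs cut the time needed to write performant code by roughly 90%. If you find that they consistently produce low-quality code, it may be time to revisit your inputs.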