Face XY project

okimaru

Article tags: artificial intelligence

Article link: https://blog.csdn.net/okimaru/article/details/138287536

The goal of this project is to build an image regression model that predicts the X and Y coordinates of a facial feature in a live image.

Interactive Tool Startup Steps

You will implement the project by collecting your own data using a clickable image display tool, training a model to find the XY coordinates of the feature, and then testing and updating your model as needed using images from the live camera. Since you are collecting two values for each category, the model may require more training and data to get a satisfactory result.

Be patient! Building your model is an iterative process.

Step 1: Open The Notebook
To get started, navigate to the regression folder in your JupyterLab interface and double-click the regression_interactive.ipynb notebook to open it.
Step 2: Execute All Of The Code Blocks
The notebook is designed to be reusable for any XY regression task you wish to build. Step through the code blocks and execute them one at a time.
  1. Camera
    This block sets the size of the images and starts the camera. If your camera is already active in this notebook or in another notebook, first shut down the kernel in the active notebook before running this code cell. Make sure that the correct camera type is selected for execution (USB). This cell may take several seconds to execute.
  2. Task
    You define your TASK and CATEGORIES parameters here, as well as how many datasets you want to track. For the Face XY Project, this has already been defined for you as the face task with the categories nose, left_eye, and right_eye. Each category in the XY regression tool requires both an X and a Y value. Go ahead and execute the cell. Subdirectories for each category are created to store the example images you collect. The file names of the images will contain the XY coordinates that you tag the images with during the data collection step. This cell should only take a few seconds to execute.
  3. Data Collection
    You’ll collect images for your categories with a special clickable image widget set up in this cell. As you click the “nose” or “eye” in the live feed image, the data image filename is automatically annotated and saved using the X and Y coordinates from the click.
  4. Model
    This project uses a pre-trained ResNet18 model:
    
    model = torchvision.models.resnet18(pretrained=True)
    
    For more information on available PyTorch pre-trained models, see the PyTorch documentation. In addition to choosing the model, the last layer of the model is modified to output only the number of values we are training for. In the case of the Face XY Project, that is twice the number of categories, since each category requires both an X and a Y coordinate (i.e. nose X, nose Y, left_eye X, left_eye Y, right_eye X, and right_eye Y).
    
    output_dim = 2 * len(dataset.categories)
    
    model.fc = torch.nn.Linear(512, output_dim)
    
    This code cell may take several seconds to execute.
  5. Live Execution
    This code block sets up threading to run the model in the background so that you can view the live camera feed and visualize the model performance in real time. This cell should only take a few seconds to execute. For this project, a blue circle will mark the model's predicted location of the selected feature.
  6. Training and Evaluation
    The training code cell sets the hyper-parameters for model training (number of epochs, batch size, learning rate, momentum) and loads the images for training or evaluation. The regression version is very similar to the simple classification training, though the loss is calculated differently: the mean squared error over the X and Y value errors is calculated and used as the loss for backpropagation during training. This code cell may take several seconds to execute.
  7. Display the Interactive Tool!
    This is the last code cell. All that's left to do is pack all the widgets into one comprehensive tool and display it. This cell may take several seconds to run and should display the full tool for you to work with. There are three image windows. Initially, only the left camera feed is populated. The middle window will display the most recent annotated snapshot image once you start collecting data. The right-most window will display the live prediction view once the model has been trained.
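The Task and Data Collection cells above store each click's coordinates in the image file name. As a minimal sketch of that idea, the round trip could look like the following. The `<x>_<y>_<unique id>.jpg` pattern, the `face_A` directory name, and the helper names `make_filename` and `parse_xy` are assumptions made for illustration; the notebook's own naming scheme may differ.

```python
import os
import uuid


def make_filename(x, y):
    """Build a file name that starts with the clicked pixel coordinates."""
    # Assumed pattern: "<x>_<y>_<unique id>.jpg" (hypothetical, for illustration).
    return "%d_%d_%s.jpg" % (x, y, uuid.uuid4())


def parse_xy(path):
    """Recover the (x, y) click position from a file name built above."""
    name = os.path.basename(path)          # drop the category directory
    x_str, y_str = name.split("_")[:2]     # first two fields are X and Y
    return int(x_str), int(y_str)


# A click at pixel (112, 97) with the nose category selected.
path = os.path.join("face_A", "nose", make_filename(112, 97))
print(parse_xy(path))  # (112, 97)
```

Keeping the label in the file name means each image carries its own annotation, so no separate label file has to stay in sync with the image folder.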
Step 3: Collect Data, Train, Test

Position the camera in front of your face and collect initial data. With the mouse cursor, point to the target feature that matches the category you've selected (such as the nose) and click to collect data. The annotated snapshot you just collected will appear in the middle display box. As you collect each image, vary your head position and pose:
  1. Add 20 images of your nose with the nose category selected
  2. Add 20 images of your left eye with the left_eye category selected
  3. Add 20 images of your right eye with the right_eye category selected
  4. Set the number of epochs to 10 and click the train button
  5. Once the training is complete, try the live view and observe the prediction. A blue circle should appear on the feature selected.
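When you click the train button, the loss is the mean squared error over the X and Y errors. The sketch below works through that arithmetic in plain Python for a single image. It assumes that each category owns two consecutive output slots (X, then Y) and that only the slots of the image's own category contribute to the loss; the helper name `xy_mse` and the numbers are made up for illustration.

```python
categories = ["nose", "left_eye", "right_eye"]


def xy_mse(outputs, category, target_xy):
    """Mean squared error over the X and Y slots of one category."""
    start = 2 * categories.index(category)     # assumed layout: [x0, y0, x1, y1, ...]
    pred_x, pred_y = outputs[start:start + 2]
    err_x = pred_x - target_xy[0]
    err_y = pred_y - target_xy[1]
    return (err_x ** 2 + err_y ** 2) / 2.0


# Six made-up outputs: 2 values for each of the 3 categories.
outputs = [0.0, 0.0, 0.5, -0.5, 0.25, 0.25]

print(xy_mse(outputs, "left_eye", (0.5, -0.5)))  # 0.0, prediction matches the click
print(xy_mse(outputs, "nose", (0.5, -0.5)))      # 0.25, both errors are 0.5
```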
Step 4: Improve Your Model

Use the live inference as a guide to improve your model! The live feed shows the model's prediction. As you move your head, does the target circle correctly follow your nose (or left_eye, right_eye)? If not, then click the correct location and add data. After you've added some data for a new scenario, train the model some more. For example:
  - Move the camera so that the face is closer. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
  - Move the camera to provide a different background. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
  - Are there any other scenarios in which you think the model might not perform well? Try them out!
  - Can you get a friend to try your model? Does it work the same? You know the drill: more data and training!
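To reason about why the target circle lands where it does, it can help to see how two model outputs might become a pixel position. This is only a sketch under assumptions not stated in this article: that the model predicts coordinates normalized to the range [-1, 1], that the image is 224 by 224 pixels, and that each category owns two consecutive output slots. The helper names `to_pixel` and `circle_center` are hypothetical.

```python
def to_pixel(value, size=224):
    """Map a normalized coordinate in [-1, 1] to a pixel position in [0, size]."""
    return int(size * (value / 2.0 + 0.5))


def circle_center(outputs, categories, selected, width=224, height=224):
    """Pick the selected category's X and Y outputs and convert them to pixels."""
    start = 2 * categories.index(selected)   # assumed layout: [x0, y0, x1, y1, ...]
    x, y = outputs[start:start + 2]
    return to_pixel(x, width), to_pixel(y, height)


categories = ["nose", "left_eye", "right_eye"]
outputs = [0.0, 0.0, 0.5, -0.5, -1.0, 1.0]  # made-up model outputs

print(circle_center(outputs, categories, "nose"))      # (112, 112), image center
print(circle_center(outputs, categories, "left_eye"))  # (168, 56)
```

If the circle is consistently offset in one scenario (for example, a closer face), that is a hint to add clicks for that scenario and retrain.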
Step 5: Save Your Model

When you are satisfied with your model, save it by entering a name in the "model path" box and clicking "save model".