Legacy-main Project: Update Iteration


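Preface

This article describes how to integrate the llm.c component and the SMLNJIntegration, DataLoader, and DataProcessing modules into the Legacy-main project. The main goals are to place the llm.c project under the root directory of Legacy-main, to add the new header directories and make sure the rest of the project can reference them correctly, and to add new build and test steps to the install script to verify the integration.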

Steps

1. Integrate llm.c:

        1. Get llm.c itself to build under CMake:

        In the root directory of the llm.c project, create a file named CMakeLists.txt with the following contents:

# The FindPython3 module used below requires CMake 3.12 or newer
cmake_minimum_required(VERSION 3.12)
project(llm_c_project LANGUAGES C)

# Set the C language standard
set(CMAKE_C_STANDARD 99)
set(CMAKE_C_STANDARD_REQUIRED ON)

# Find OpenMP and link against it when available
find_package(OpenMP)
if(OpenMP_C_FOUND)
    link_libraries(OpenMP::OpenMP_C)
endif()

# Find the Python interpreter and provide a target that installs all Python dependencies
find_package(Python3 REQUIRED COMPONENTS Interpreter)
add_custom_target(install_python_deps
    COMMAND ${Python3_EXECUTABLE} -m pip install -r "${PROJECT_SOURCE_DIR}/requirements.txt"
    COMMENT "Installing Python dependencies"
)

# Add compiler optimization and warning flags
add_compile_options(-Wall -Wextra -pedantic -Ofast)

# Specify the source files
set(SOURCE_FILES
    train_gpt2.c
)

# Add the executable
add_executable(llm_c_project ${SOURCE_FILES})

# Link the math library
target_link_libraries(llm_c_project m)

# Custom target: preprocess the data
add_custom_target(preprocess
    COMMAND ${Python3_EXECUTABLE} "${PROJECT_SOURCE_DIR}/prepro_tinyshakespeare.py"
    COMMENT "Preprocessing TinyShakespeare dataset"
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
)

# Custom target: download and initialize the GPT-2 model
add_custom_target(prepare_model
    COMMAND ${Python3_EXECUTABLE} "${PROJECT_SOURCE_DIR}/train_gpt2.py"
    COMMENT "Preparing GPT-2 model weights"
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
)

# Dependency: prepare_model depends on preprocess
add_dependencies(prepare_model preprocess)

# Dependency: the main project depends on model preparation, data preprocessing, and Python dependency installation
add_dependencies(llm_c_project install_python_deps prepare_model)
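
As a rough sketch of how this CMakeLists.txt might be tried by hand before it is wired into install.sh, the helper below configures into a separate build directory and then builds. The function name build_with_cmake and the CMAKE_BIN override are illustrative assumptions, not part of llm.c or Legacy-main.

```shell
#!/bin/sh
# Hypothetical helper: configure and build a CMake project out of source.
# CMAKE_BIN is an assumed override so a different cmake binary can be used.
# src_dir should be an absolute path, because the configure step runs from
# inside the build directory.
build_with_cmake() {
    src_dir=$1
    build_dir=$2
    cmake_bin=${CMAKE_BIN:-cmake}

    mkdir -p "$build_dir" || return 1
    # Configure from inside the build directory, as the article does.
    (cd "$build_dir" && "$cmake_bin" "$src_dir") || return 1
    # `cmake --build` drives whichever native build tool was generated.
    "$cmake_bin" --build "$build_dir"
}

# Example (assumes the current directory is the llm.c source tree):
# build_with_cmake "$PWD" "$PWD/build"
```

Because of the add_dependencies calls above, building llm_c_project this way also runs the install_python_deps, preprocess, and prepare_model targets first.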

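  2. Define a function that handles the llm.c installation:

       In install.sh, under the config folder of the Legacy-main project, define a new function responsible for building and installing llm.c. Place it near the top of the file so it is easy to maintain: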

install_llm_c() {
    echo "$this: Installing llm.c components..."
    cd "$ROOT/llm.c"  # Assuming llm.c is directly under the root directory of legacy-main

    # This assumes llm.c is built with CMake
    if [ -d build ]; then
        rm -rf build
    fi
    mkdir build && cd build
    cmake ..
    if make; then
        echo "$this: llm.c components built successfully."
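        # Optionally install to a specific location. This needs install()
        # rules in CMakeLists.txt, which the file above does not define,
        # so the step is left disabled here.
        # make install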
    else
        complain "$this: Building llm.c components failed."
    fi
    cd "$ROOT"
}

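 3. Call the install_llm_c function:

Call this function at a suitable point in install.sh. Where exactly depends on whether other components depend on llm.c. Because this integration is shallow (llm.c sits in the root directory) and nothing else links against it, calling it at the end of all the installation steps is fine:

# Install llm.c last, after the other libraries and programs have been installed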
if [ $nolib = false ] ; then
    echo $this: Installing other libraries and programs:
    export ROOT INSTALLDIR CONFIGDIR BINDIR
    CM_TOLERATE_TOOL_FAILURES=true
    export CM_TOLERATE_TOOL_FAILURES
    if "$BINDIR"/sml $SIZE_OPT -m \$smlnj/installer.cm
    then
        # Install llm.c components
        install_llm_c

        # because we create heap2exec without knowing if heap2asm is going
        # to be installed, we need this hack to remove heap2exec when heap2asm
        # is not available
        if [ ! -x "$BINDIR"/heap2asm ] ; then
            rm -f "$BINDIR"/heap2exec
        fi
        vsay $this: Installation complete.
    else
        complain "$this: !!! Installation of libraries and programs failed."
    fi
fi

exit 0

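4. Error handling:

Error handling matters when integrating and installing a new component, so install.sh should report errors for the new llm.c step as well. The script already provides a function for this, complain():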

complain() {
    echo "$@"
    exit 1
}

What remains is to add error reporting to the install_llm_c function above:

install_llm_c() {
    echo "$this: Installing llm.c components..."
    if [ ! -d "$ROOT/llm.c" ]; then
        complain "$this: llm.c directory does not exist."
    fi

    cd "$ROOT/llm.c" || complain "$this: Failed to enter the llm.c directory."

    # Create and enter the build directory
    mkdir -p build && cd build || complain "$this: Failed to create or enter the build directory."

    # Run CMake to configure the project
    cmake .. || complain "$this: CMake configuration failed."

    # Build the project
    if ! make; then
        complain "$this: Building llm.c components failed."
    fi

    echo "$this: llm.c components built successfully."
    cd "$ROOT"  # Go back to the root directory
}
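
The `command || complain "..."` pattern used above can be tried in isolation with the minimal sketch below. The names demo.sh and enter_dir are hypothetical stand-ins; complain() has the same shape as the function already present in install.sh.

```shell
#!/bin/sh
# Stand-in for the script name that install.sh keeps in $this (assumed here).
this=demo.sh

# Same shape as the existing complain() in install.sh: print and abort.
complain() {
    echo "$@"
    exit 1
}

# Hypothetical step: enter a directory, or abort with a message.
enter_dir() {
    cd "$1" 2>/dev/null || complain "$this: Failed to enter $1."
    echo "$this: now in $(pwd)"
}

# Run in a subshell so that the `exit 1` inside complain() ends only the
# subshell; the path below is assumed not to exist.
(enter_dir /nonexistent-dir-for-demo) || echo "subshell exited with status $?"
```

Since complain() calls exit, any step written this way stops the whole install script at the first failure, which is why no later step needs to re-check the earlier ones.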

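In addition, check at the start of install.sh that the required tools, such as cmake and make, are installed and available. Add the following near the top of install.sh, after the definition of complain():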

# Check for necessary tools
command -v cmake >/dev/null 2>&1 || complain "cmake is not installed. Please install it and run this script again."
command -v make >/dev/null 2>&1 || complain "make is not installed. Please install it and run this script again."
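
If more tools are added later, the same check can be written as a loop. The sketch below is one possible shape; the function name missing_tools is a hypothetical helper, not something install.sh already defines.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Collect everything that is missing first, so the user sees the full list
# in a single run instead of one tool per attempt.
missing=$(missing_tools cmake make)
if [ -n "$missing" ]; then
    # Unquoted on purpose: joins the names onto one line.
    echo "Missing required tools:" $missing
fi
```

In install.sh the final echo would presumably become a call to complain so that the script stops.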

5. Verify the run:

To verify that the integration succeeded, run the install script from a terminal:

./config/install.sh

Tip

An error encountered along the way

Error:
CMake Error: The source "/mnt/g/legacy-main/llm.c/CMakeLists.txt" does not match the source "/mnt/g/llm.c-master1/CMakeLists.txt" used to generate cache. Re-run cmake with a different source directory.
./config/install.sh: CMake configuration failed.

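Analysis: this CMake error means that CMake was previously run against a different source directory (/mnt/g/llm.c-master1), which produced a configuration cache, and it is now being run against a new source directory (/mnt/g/legacy-main/llm.c). CMake cannot reuse the old cache because the source path has changed.

Solution: clear the old CMake cache, then rerun CMake against the new source directory.

1. Remove the old build directory

Remove the build directory that contains the old CMake cache so that no stale build information remains.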

cd /mnt/g/legacy-main/llm.c
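rm -rf build  # Remove the old build directory

2. Recreate the build directory

Create a new build directory and rerun CMake inside it, so that all paths and settings are based on the current source directory.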

mkdir build
cd build
cmake ..

Because users may hit the same problem again, we add code to the script so that the old build directory is always removed before each run:

install_llm_c() {
    echo "$this: Installing llm.c components..."
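    # llm.c is directly under the root directory of legacy-main.
    # Abort if cd fails, so that rm -rf below never runs in the wrong directory.
    cd "$ROOT/llm.c" || complain "$this: Failed to enter the llm.c directory."

    # Clean up any existing build directory
    rm -rf build
    mkdir -p build && cd build || complain "$this: Failed to create or enter the build directory."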

    # Run CMake to configure the project
    cmake .. || complain "$this: CMake configuration failed."

    # Build the project
    if ! make; then
        complain "$this: Building llm.c components failed."
    fi

    echo "$this: llm.c components built successfully."
    cd "$ROOT"  # Go back to the root directory
}
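
Deleting the build directory on every run is simple but also discards incremental build state. As an alternative sketch, the build directory could be removed only when its cache was generated from a different source tree. This relies on the CMAKE_HOME_DIRECTORY entry that CMake records in CMakeCache.txt; the function names cached_source_dir and clean_if_stale are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the source directory recorded in a build
# directory's CMakeCache.txt (the CMAKE_HOME_DIRECTORY entry).
# Returns 1 when there is no cache file at all.
cached_source_dir() {
    cache_file="$1/CMakeCache.txt"
    [ -f "$cache_file" ] || return 1
    sed -n 's/^CMAKE_HOME_DIRECTORY:INTERNAL=//p' "$cache_file"
}

# Hypothetical helper: remove the build directory only when its cache was
# generated from a source tree other than src_dir.
clean_if_stale() {
    src_dir=$1
    build_dir=$2
    # No cache file means there is nothing stale to clean up.
    recorded=$(cached_source_dir "$build_dir") || return 0
    if [ "$recorded" != "$src_dir" ]; then
        echo "Stale CMake cache (from $recorded); removing $build_dir"
        rm -rf "$build_dir"
    fi
}

# Example, using the paths from the article:
# clean_if_stale "$ROOT/llm.c" "$ROOT/llm.c/build"
```

Whether this is worth the extra code depends on how long a full rebuild takes; for a single-file project the unconditional rm -rf is likely good enough.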

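2. Add the py library features

1. Add a separate CMakeLists.txt file to each of the three library directories, SMLNJIntegration, DataLoader, and DataProcessing:

1. SMLNJIntegration/CMakeLists.txt:

Contains the interface for interacting with SML/NJ code: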

cmake_minimum_required(VERSION 3.10)
project(SMLNJIntegration)

# Set the C++ standard
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Add the header directory to the include path
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)

# To install the header to a system path, enable the install command below
# install(FILES ${CMAKE_CURRENT_SOURCE_DIR}/SMLNJBridge.hpp DESTINATION include/SMLNJIntegration)

2. DataLoader/CMakeLists.txt:

Provides the data-loading functionality:

cmake_minimum_required(VERSION 3.10)
project(DataLoader)

# Set the C++ standard
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Add the header directory to the include path
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)

# To install the header to a system path, enable the install command below
# install(FILES ${CMAKE_CURRENT_SOURCE_DIR}/DataLoader.hpp DESTINATION include/DataLoader)

3. DataProcessing/CMakeLists.txt:

Provides the data-processing functionality:

cmake_minimum_required(VERSION 3.10)
project(DataProcessing)

# Set the C++ standard
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Add the header directory to the include path
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)

# To install the header to a system path, enable the install command below
# install(FILES ${CMAKE_CURRENT_SOURCE_DIR}/DataFrame.hpp DESTINATION include/DataProcessing)

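 The include_directories command in the CMakeLists.txt files above adds include directories, which ensures that these headers can be found at compile time.

2. Modify the install.sh script

Insert the following code:

# Create the directories that hold the headers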
mkdir -p "$INSTALLDIR/include/SMLNJIntegration"
mkdir -p "$INSTALLDIR/include/DataLoader"
mkdir -p "$INSTALLDIR/include/DataProcessing"

# Copy the headers into the install directory
cp "$ROOT/SMLNJIntegration/SMLNJBridge.hpp" "$INSTALLDIR/include/SMLNJIntegration/"
cp "$ROOT/DataLoader/DataLoader.hpp" "$INSTALLDIR/include/DataLoader/"
cp "$ROOT/DataProcessing/DataFrame.hpp" "$INSTALLDIR/include/DataProcessing/"

# Set CXXFLAGS or INCLUDE_DIRS
CXXFLAGS="$CXXFLAGS -I$INSTALLDIR/include/SMLNJIntegration -I$INSTALLDIR/include/DataLoader -I$INSTALLDIR/include/DataProcessing"
export CXXFLAGS
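
The mkdir and cp pairs above could also be folded into one small function that checks for the header before copying. This is only a sketch; install_header is a hypothetical name, and the example calls assume the same $ROOT and $INSTALLDIR variables that install.sh already uses.

```shell
#!/bin/sh
# Hypothetical helper: copy one header into <install_dir>/include/<module>/,
# creating the directory first and reporting a missing header on stderr.
install_header() {
    header=$1
    module=$2
    install_dir=$3
    if [ ! -f "$header" ]; then
        echo "Header not found: $header" >&2
        return 1
    fi
    mkdir -p "$install_dir/include/$module" &&
        cp "$header" "$install_dir/include/$module/"
}

# Example, using the module/header pairs from the article:
# install_header "$ROOT/SMLNJIntegration/SMLNJBridge.hpp" SMLNJIntegration "$INSTALLDIR"
# install_header "$ROOT/DataLoader/DataLoader.hpp" DataLoader "$INSTALLDIR"
# install_header "$ROOT/DataProcessing/DataFrame.hpp" DataProcessing "$INSTALLDIR"
```

With this layout, passing -I"$INSTALLDIR/include" lets code write #include "DataLoader/DataLoader.hpp", while the per-module -I flags in CXXFLAGS let code write #include "DataLoader.hpp".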

 3. Verify that the three features were added successfully

1. Create the test directory and test file

In tests/functionality_tests.cpp, add the test code:

#include "SMLNJIntegration/SMLNJBridge.hpp"
#include "DataLoader/DataLoader.hpp"
#include "DataProcessing/DataFrame.hpp"
#include <iostream>

int main() {
    std::cout << "Running Functionality Tests...\n";
    std::cout << "Tests completed successfully.\n";
    return 0;
}

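 2. Add commands to install.sh that compile and run the tests

Insert them after all the necessary directories have been created and the project components built, and before any installation tasks: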

echo "Compiling functionality tests..."
g++ -std=c++11 -o tests/functionality_tests tests/functionality_tests.cpp -I./include

echo "Running functionality tests..."
./tests/functionality_tests
if [ $? -ne 0 ]; then
    echo "Tests failed."
    exit 1
else
    echo "All tests passed successfully."
fi
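
Note that the snippet above runs the test binary even when compilation failed. One way to stop at the first failing step is a small wrapper like the sketch below; run_step is a hypothetical helper name, not part of install.sh.

```shell
#!/bin/sh
# Hypothetical helper: run a command, report the outcome, and return the
# command's own exit status to the caller.
run_step() {
    description=$1
    shift
    echo "$description..."
    if "$@"; then
        echo "$description: ok"
    else
        status=$?
        echo "$description: failed (exit $status)" >&2
        return "$status"
    fi
}

# Example: the second step is skipped when the first one fails.
# run_step "Compiling functionality tests" \
#     g++ -std=c++11 -o tests/functionality_tests tests/functionality_tests.cpp -I./include &&
# run_step "Running functionality tests" ./tests/functionality_tests ||
# exit 1
```

Chaining the two calls with && means a compile error is reported once, instead of being followed by a confusing "not found" message for the missing binary.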

 3. Run the tests

Errors and solutions:

Error 1:
In file included from /mnt/g/legacy-main/tests/functionality_tests.cpp:3:
/mnt/g/legacy-main/include/DataProcessing/DataFrame.hpp:4:10: fatal error: xtensor/xarray.hpp: No such file or directory
    4 | #include <xtensor/xarray.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Running functionality tests...
./config/install.sh: 279: /mnt/g/legacy-main/tests/functionality_tests: not found
Tests failed.

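Analysis: the compiler could not find xtensor/xarray.hpp while compiling DataFrame.hpp.

Solution: add a step to the sh script that installs xtensor automatically

# Check whether xtensor is installed, and install it if it is not.
# stderr is discarded because dpkg -L reports an error for a package that is not installed.
xtensor_path=$(dpkg -L libxtensor-dev 2>/dev/null | grep xtensor/xarray.hpp || true)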
if [ -z "$xtensor_path" ]; then
    echo "xtensor is not installed. Installing xtensor..."
    sudo apt-get update
    sudo apt-get install -y libxtensor-dev
    if [ $? -ne 0 ]; then
        complain "Failed to install xtensor. Please install it manually and run this script again."
    fi
fi
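
The dpkg-based check only works on Debian-style systems. A more portable sketch is to look for the header file directly under a list of include roots; header_exists is a hypothetical helper, and the two directories in the example are assumptions about a typical Linux layout.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the header (given relative to an include
# root, e.g. xtensor/xarray.hpp) exists under any of the listed directories.
header_exists() {
    header=$1
    shift
    for dir in "$@"; do
        [ -f "$dir/$header" ] && return 0
    done
    return 1
}

# Example: /usr/include and /usr/local/include are assumed locations;
# adjust them for the target system.
if header_exists xtensor/xarray.hpp /usr/include /usr/local/include; then
    echo "xtensor headers found"
else
    echo "xtensor headers not found"
fi
```

This only confirms that the file is present in the directories it was told to search; it does not prove the compiler looks in those directories.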

 It is best to place this early in the script (after complain() is defined) so that the error does not recur.

Error 2:
from /usr/include/xtensor/xcontainer.hpp:23,
from /usr/include/xtensor/xarray.hpp:20,
from /mnt/g/legacy-main/include/DataProcessing/DataFrame.hpp:4,
from /mnt/g/legacy-main/tests/functionality_tests.cpp:3:
/usr/include/xtensor/xiterator.hpp:479:21: note: candidate: ‘xt::linear_end<int>(const int&)::<lambda(int)>’
  479 | }, /*else*/ [&](auto self)
      | ^
/usr/include/xtensor/xiterator.hpp:479:21: note: no known conversion for argument 1 from ‘xtl::identity’ to ‘int’
..........

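Analysis: the output is long and messy, but it boils down to C++ type errors inside the xtensor headers. xtensor needs C++14 or later (the log shows a generic lambda, [&](auto self), which is a C++14 feature), whereas the compile command used -std=c++11.

Solution:

In install.sh, change the compiler invocation to pass the -std=c++14 flag: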

echo "Compiling functionality tests with C++14 support..."
g++ -std=c++14 -o "$INSTALLDIR/tests/functionality_tests" "$ROOT/tests/functionality_tests.cpp" -I"$INSTALLDIR/include"

echo "Running functionality tests..."
"$INSTALLDIR/tests/functionality_tests"
if [ $? -ne 0 ]; then
    echo "Tests failed."
    exit 1
else
    echo "All tests passed successfully."
fi

Error 3:
from /mnt/g/legacy-main/include/DataProcessing/DataFrame.hpp:4,
from /mnt/g/legacy-main/tests/functionality_tests.cpp:3:
/usr/include/xtensor/xtensor_forward.hpp:136:11: note: ‘xt::xtensor’ declared here
  136 | using xtensor = xtensor_container<XTENSOR_DEFAULT_DATA_CONTAINER(T, A), N, L>;
      | ^~~~~~~
In file included from /mnt/g/legacy-main/tests/functionality_tests.cpp:3:
/mnt/g/legacy-main/include/DataProcessing/DataFrame.hpp:26:49: error: template argument 2 is invalid
   26 | std::map<std::string, xtensor::xarray<double>> data;
      | ^~
/mnt/g/legacy-main/include/DataProcessing/DataFrame.hpp:26:49: error: template argument 4 is invalid
Running functionality tests...
./config/install.sh: 290: /mnt/g/legacy-main/tests/functionality_tests: not found
Tests failed.

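Analysis: again a long list of errors, but the gist is that "xtensor" is not declared as a namespace, so xarray cannot be resolved through it; the header refers to the library incorrectly.

Solution:

In DataFrame.hpp, change how xtensor's xarray is referenced

 Modified DataFrame.hpp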

#ifndef DATAFRAME_HPP
#define DATAFRAME_HPP

#include <xtensor/xarray.hpp> // the correct xtensor header
#include <xtensor/xio.hpp>
#include <xtensor/xcsv.hpp>
#include <xtensor/xview.hpp>
#include <xtensor/xsort.hpp>
#include <string>
#include <map>
#include <vector>
#include <iostream>

class DataFrame {
public:
    DataFrame();
    // Load data from CSV
    void load_csv(const std::string& filename);
    // Data selection, filtering, joining, and grouping
    DataFrame filter(const std::string& column, const std::string& value) const;
    DataFrame join(const DataFrame& other, const std::string& key) const;
    void group_by(const std::string& column);
    // Add more functions as needed

private:
    std::map<std::string, xt::xarray<double>> data; // use the correct xt namespace
    // Utility functions
};

#endif // DATAFRAME_HPP

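Changes:

 Changed the type of the data container: xtensor::xarray becomes xt::xarray, which is the correct namespace.

 Make sure the compile command specifies a suitable C++ standard, C++14 or later, because xtensor uses modern C++ features.

3. Run the whole build again

Run the install.sh script

./config/install.sh

If nothing is wrong, the terminal prints the normal output.

Reference results

Functionality tests and compilation:

Final result: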


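Summary

In this update we integrated the llm.c component and the new SMLNJIntegration, DataLoader, and DataProcessing modules into the Legacy-main project, extending the project's functionality and its data-handling capabilities. llm.c was integrated by creating and configuring a CMakeLists.txt and adding build and install steps to the install script. We also wrote test code and added steps to the install script that compile and run it, which helped verify that the new features work. Along the way we resolved the compile errors caused by the missing xtensor library and the wrong C++ standard version, which keeps the code stable and compatible. Possible follow-up work includes performance optimization and setting up an automated build and test pipeline.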
