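Problem 1:
The system disk is attached at pcie 4:1:0, and the data disk is attached at pcie 7:1:0 through a CFE card adapter. Possibly because the link through the CFE adapter is unstable, NVMe commands sometimes time out during UEFI boot, and the system can even hang outright (recovered by removing power and cold booting).
Error message: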
(Private->Cap.Mpsmin + 12) <= 12
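Analysis:
Get the official UEFI source, release uefi-202404.0.
In edk2/MdeModulePkg/Bus/Pci/NvmExpressDxe/NvmExpressHci.c, function NvmeControllerInit:
Status = ReadNvmeControllerCapabilities (Private, &Private->Cap); this read can fail at the link level, in which case Private->Cap comes back as 0xFFFFFFFFFFFFFFFF while the return value still indicates success. With all bits set, Mpsmin reads as 0xF, so the following
ASSERT ((Private->Cap.Mpsmin + 12) <= EFI_PAGE_SHIFT); fires and triggers a cold reboot. That cold reboot can itself fail and hang.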
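Fix:
After Private->Cap is read, add the following check so that an all-FF value makes initialization fail cleanly instead of hitting the assertion and cold rebooting:
if (Private->Cap.Mpsmin == 0x0f) {
  return EFI_TIMEOUT; // report a timeout so NVMe initialization fails
}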
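The reasoning behind this check can be sketched in standalone C. This is an illustration only, not EDK2 code: the helper names (cap_mpsmin, cap_read_looks_dead, cap_page_size_ok) and the PAGE_SHIFT_4K macro are hypothetical. It assumes the NVMe CAP register layout in which MPSMIN occupies bits 51:48, and that a failed read typically returns all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for EFI_PAGE_SHIFT (4 KiB pages); hypothetical name. */
#define PAGE_SHIFT_4K 12u

/* Extract MPSMIN, assumed to sit in bits 51:48 of the 64-bit CAP value. */
static unsigned cap_mpsmin(uint64_t cap)
{
    return (unsigned)((cap >> 48) & 0xFu);
}

/* An all-ones value is what a read usually yields when the device
 * did not answer, so treat it as "no valid capabilities". */
static bool cap_read_looks_dead(uint64_t cap)
{
    return cap == UINT64_MAX;
}

/* The condition the ASSERT in NvmeControllerInit requires to hold. */
static bool cap_page_size_ok(uint64_t cap)
{
    return cap_mpsmin(cap) + PAGE_SHIFT_4K <= PAGE_SHIFT_4K;
}
```

With an all-FF value, cap_mpsmin returns 0xF, so 0xF + 12 = 27 exceeds 12 and the assertion condition fails. Checking the whole register with cap_read_looks_dead would be a stricter alternative to testing Mpsmin alone, since a controller could in principle report MPSMIN = 0xF legitimately.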
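Problem 2:
Under repeated reboots, the contents of the parameter partition can end up modified (cause unknown), after which boot hangs on a UEFI assertion.
Fix:
1. This step was provided by the vendor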
1) edk2-nvidia/Silicon/NVIDIA/Drivers/FvbNorFlashDxe/FvbNorFlashStandaloneMm.c: ValidateFvHeader
if (IsMeasurementPartitionErasedOrZero (NorFlashProtocol, MeasurementOffset, MeasurementPartitionSize) == TRUE) {
  DEBUG ((DEBUG_ERROR, "%a: No Valid Measurements found. Re-initializing the Variable Store\n", __FUNCTION__));
  Status = EraseMeasurementPartition (NorFlashProtocol, MeasurementOffset, MeasurementPartitionSize);
  if (EFI_ERROR (Status)) {
    DEBUG ((
      DEBUG_ERROR,
      "%a: Failed to Erase Partition %r\n",
      __FUNCTION__,
      Status
      ));
---    return Status;
+++    return EFI_NOT_FOUND; // force re-init
  }
}
2) edk2-nvidia/Silicon/NVIDIA/Drivers/FvbNorFlashDxe/VarIntCheck.c: VarIntValidate
+++ ZeroMem (This->CurMeasurement, This->MeasurementSize);
return Status;
2. In the build configuration, disable the assertion that halts boot when the variable store integrity check fails
edk2-nvidia/Platform/NVIDIA/StandaloneMmOptee/StandaloneMmOptee.dsc.inc: [PcdsFeatureFlag]
--- gNVIDIATokenSpaceGuid.PcdAssertOnVarStoreIntegrityCheckFail|TRUE
+++ gNVIDIATokenSpaceGuid.PcdAssertOnVarStoreIntegrityCheckFail|FALSE
The changes above require rebuilding UEFI and the matching OP-TEE. The build steps follow.
uefi
1. Set up the docker environment as described in the official wiki
edk2_docker edk2-nvidia/Platform/NVIDIA/Jetson/build.sh
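// Replace Linux_for_Tegra/bootloader/uefi_jetson.bin in the official SDK with images/uefi_Jetson_RELEASE.bin
2. The official documentation does not say how to build standalonemm_optee_t234.bin, which is required for building OP-TEE. It is built with: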
edk2_docker edk2-nvidia/Platform/NVIDIA/StandaloneMmOptee/build.sh
// Replace Linux_for_Tegra/bootloader/standalonemm_optee_t234.bin with images/uefi_StandaloneMmOptee_RELEASE.bin
optee
1. Download the official Linux BSP source package public_sources.tbz2
Copy nvidia-jetson-optee-source.tbz2 and atf_src.tbz2 out of it and unpack both
Set the environment variables
export UEFI_STMM_PATH=$(pwd)/../Linux_for_Tegra/bootloader/standalonemm_optee_t234.bin
export CROSS_COMPILE_AARCH64_PATH=$(pwd)/../l4t-gcc
export CROSS_COMPILE_AARCH64=${CROSS_COMPILE_AARCH64_PATH}/bin/aarch64-buildroot-linux-gnu-
2. Build OP-TEE
cd nvidia-jetson-optee
./optee_src_build.sh -p t234
dtc -I dts -O dtb -o ./optee/tegra234-optee.dtb ./optee/tegra234-optee.dts
3. Build ARM Trusted Firmware
cd arm-trusted/arm-trusted-firmware
make BUILD_BASE=./build \
CROSS_COMPILE="${CROSS_COMPILE_AARCH64}" \
DEBUG=0 LOG_LEVEL=20 PLAT=tegra SPD=opteed TARGET_SOC=t234 V=0
4. Create tos-optee_t234.img
../../../Linux_for_Tegra/nv_tegra/tos-scripts/gen_tos_part_img.py --monitor $(pwd)/build/tegra/t234/release/bl31.bin \
--os $(pwd)/../../nvidia-jetson-optee/optee/build/t234/core/tee-raw.bin \
--dtb $(pwd)/../../nvidia-jetson-optee/optee/tegra234-optee.dtb \
--tostype optee \
./tos-optee_t234.img
cp tos-optee_t234.img ../../../Linux_for_Tegra/bootloader/tos-optee_t234.img
// The new tos-optee_t234.img replaces Linux_for_Tegra/bootloader/tos-optee_t234.img
- Flash tos-optee_t234.img and UEFI
cd Linux_for_Tegra
sudo ./flash.sh -k A_cpu-bootloader -c bootloader/t186ref/cfg/flash_t234_qspi.xml jetson-orin-nano-devkit-nvme nvme0n1p1
sudo ./flash.sh -k A_secure-os -c bootloader/t186ref/cfg/flash_t234_qspi.xml jetson-orin-nano-devkit-nvme nvme0n1p1
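After flashing, a new problem appears at boot: the system hangs on an assertion in edk2/StandaloneMmPkg/Core/Dispatcher.c: MmLoadImage
Analysis:
Tracing the code shows that when MmLoadImage loads an EFI image, it allocates memory from mMmMemoryMap through InternalAllocMaxAddress, and that allocation runs out of memory.
mMmMemoryMap is a linked list that StandaloneMmPkg maintains for dynamic memory allocation. Each node describes a run of memory pages. During default initialization, the following call chain attaches one large contiguous block of memory to mMmMemoryMap as a single node: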
MemoryAllocationLibConstructor
MmInitializeMemoryServices
MmAddMemoryRegion
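The region information comes from the gEfiMmPeiMmramMemoryReserveGuid HOB, which is of type EFI_HOB_GUID_TYPE.
The gEfiMmPeiMmramMemoryReserveGuid HOB is created in
edk2-nvidia/Silicon/NVIDIA/Library/StandaloneMmCoreEntryPointOptee/Arm/StandaloneMmCoreEntryPoint.c: _ModuleEntryPoint, by the call HobStart = CreateHobListFromBootInfo (&CpuDriverEntryPoint, PayloadBootInfo);
The address layout PayloadBootInfo is obtained by casting SharedBufAddress, an argument of _ModuleEntryPoint. The size of mMmMemoryMap comes from PayloadBootInfo->SpHeapSize (its value is PayloadBootInfo->SpStackBase - PayloadBootInfo->SpHeapBase, and it is further constrained by PayloadBootInfo->SpMemLimit). The origin of SharedBufAddress has not been found yet, so the value of SpHeapSize cannot be changed for now.
Although SpHeapSize cannot be changed to enlarge mMmMemoryMap and avoid the memory shortage while loading images, analysis shows that the last image loaded, UserAuthenticationMm, can be dropped.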
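The shortage described in this analysis can be illustrated with a small standalone model in C. It is a simplified sketch, not the StandaloneMmPkg implementation: the types boot_info_t and free_region_t and all function names are hypothetical, the fields only mirror the roles of SpHeapBase and SpStackBase, and the model assumes a single free region that hands out pages from its top end.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* 4 KiB pages, as with EFI_PAGE_SIZE */

/* Hypothetical stand-in for the heap-related boot info fields. */
typedef struct {
    uint64_t heap_base;   /* plays the role of SpHeapBase  */
    uint64_t stack_base;  /* plays the role of SpStackBase */
} boot_info_t;

/* One free region, like the single node added at initialization. */
typedef struct {
    uint64_t base;
    uint64_t free_pages;
} free_region_t;

/* Heap size as described above: stack base minus heap base. */
static uint64_t heap_size(const boot_info_t *bi)
{
    return bi->stack_base - bi->heap_base;
}

static void region_init(free_region_t *r, const boot_info_t *bi)
{
    r->base = bi->heap_base;
    r->free_pages = heap_size(bi) / MODEL_PAGE_SIZE;
}

/* Round a byte count up to whole pages. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
    return (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Take pages from the top of the region. Returns the start address,
 * or 0 when the region cannot satisfy the request. */
static uint64_t region_alloc_top(free_region_t *r, uint64_t pages)
{
    if (pages == 0 || pages > r->free_pages) {
        return 0;
    }
    r->free_pages -= pages;
    return r->base + r->free_pages * MODEL_PAGE_SIZE;
}
```

In this model every loaded image permanently shrinks free_pages, so once the earlier images have consumed most of the heap, the last request fails no matter how it is ordered. That is why removing one image frees enough room when the heap size itself cannot be raised.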
Fix (comment out the UserAuthenticationMm entries):
edk2-nvidia/Platform/NVIDIA/StandaloneMmOptee/StandaloneMmOptee.dsc.inc: [Components.common]
# Silicon/NVIDIA/Drivers/UserAuthenticationDxeMm/UserAuthenticationMm.inf
edk2-nvidia/Platform/NVIDIA/StandaloneMmOptee/StandaloneMmOptee.fdf.inc:
# INF Silicon/NVIDIA/Drivers/UserAuthenticationDxeMm/UserAuthenticationMm.inf