The Core Ultra 7 255H has 16 cores and 16 threads, a 2.0 GHz base frequency, a boost frequency of up to 5.1 GHz, 24 MB of L3 cache, and 8 Xe-LPG GPU cores running at 2.25 GHz.
The R9 8945HX uses the Zen 4 architecture on a 4 nm process, with 12 cores and 24 threads, a 3.0 GHz base frequency, a maximum boost frequency of 5.2 GHz, and 64 MB of L3 cache.
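The thread counts above matter when sizing a worker pool like the one in the demo below. As a minimal sketch (the helper name is mine, not from the original post), the logical core count reported by the OS can be used to pick the pool size at runtime instead of hard-coding it:

```python
import os

def pick_worker_count(cap: int = 32) -> int:
    """Choose a worker count from the logical core count, with a safety cap."""
    logical = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(logical, cap))

if __name__ == "__main__":
    n = pick_worker_count()
    print(f"logical cores: {os.cpu_count()}, chosen workers: {n}")
```

On a 16-thread Ultra 7 255H this yields 16 workers; on a 24-thread 8945HX it yields 24, without editing the code.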
Code
import numpy as np
import multiprocessing as mp
from numba import cuda

# CPU multi-process matrix multiplication: each worker computes one row block
# and returns it. (Writing into a slice of `result` inside a child process
# would only modify that process's copy, so the chunk must be returned.)
def parallel_matmul_cpu(matrix_a, matrix_b, start_row, end_row):
    return np.dot(matrix_a[start_row:end_row], matrix_b)

# GPU-accelerated matrix multiplication kernel: one thread per output element
@cuda.jit
def matmul_gpu(a, b, c):
    row, col = cuda.grid(2)
    if row < c.shape[0] and col < c.shape[1]:
        tmp = 0.0
        for k in range(a.shape[1]):
            tmp += a[row, k] * b[k, col]
        c[row, col] = tmp

def main():
    size = 4096  # large matrix size
    matrix_a = np.random.rand(size, size).astype(np.float32)
    matrix_b = np.random.rand(size, size).astype(np.float32)

    # CPU multi-process computation
    workers = 16  # match 16 hardware threads
    chunk_size = size // workers
    tasks = [(matrix_a, matrix_b,
              i * chunk_size,
              (i + 1) * chunk_size if i < workers - 1 else size)
             for i in range(workers)]
    with mp.Pool(workers) as pool:
        chunks = pool.starmap(parallel_matmul_cpu, tasks)
    result_cpu = np.vstack(chunks)

    # GPU-accelerated computation
    d_a = cuda.to_device(matrix_a)
    d_b = cuda.to_device(matrix_b)
    d_c = cuda.device_array((size, size), dtype=np.float32)
    threads_per_block = (16, 16)
    blocks_per_grid_x = (size + threads_per_block[0] - 1) // threads_per_block[0]
    blocks_per_grid_y = (size + threads_per_block[1] - 1) // threads_per_block[1]
    blocks_per_grid = (blocks_per_grid_x, blocks_per_grid_y)
    matmul_gpu[blocks_per_grid, threads_per_block](d_a, d_b, d_c)
    result_gpu = d_c.copy_to_host()

    # Verify the results agree. float32 accumulation order differs between the
    # two paths, so the tolerance must be loose for 4096-term dot products.
    assert np.allclose(result_cpu, result_gpu, rtol=1e-3, atol=1e-2)

if __name__ == "__main__":
    main()
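The grid-size arithmetic in the GPU launch is ceiling division: enough blocks to cover every output element even when the matrix size is not a multiple of the block edge (the bounds check in the kernel masks off the overhang). A small stand-alone sketch of that calculation, with a helper name of my own choosing:

```python
def grid_dim(size: int, block: int) -> int:
    """Ceiling division: number of blocks needed to cover `size` elements."""
    return (size + block - 1) // block

# 4096 divides evenly by 16, so the grid is exactly 256 blocks per axis;
# a non-multiple size rounds up so no rows or columns are left uncovered.
print(grid_dim(4096, 16))  # 256
print(grid_dim(4100, 16))  # 257
```

The same rounding also explains the CPU task list: the last worker's slice is extended to `size` so leftover rows are not dropped when `size` is not a multiple of the worker count.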