Ascend AI Native Innovative Operator Challenge (Season S1) Retrospective: the Cross Operator

Latest recommended article published on 2024-10-16 17:48:16

william_myq

Tags: AI-native, python, machine learning

Article link: https://blog.csdn.net/xiujiti6871/article/details/139362776

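Before getting started, thanks to everyone on the Ascend team: without your hard work we would not have made this progress.
This post shares how the Cross operator from the contest was implemented.
The difficulty of this problem lies in implementing the formula and, above all, the broadcasting logic around it.
The code is based on the code in 0xcccccccc's repository, with some modifications. Thanks again to 0xcccccccc for sharing.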

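Note the following requirements:
The operator returns the vector (cross) product of the two tensors input and other along dimension dim. According to the specification, input and other must have the same shape, and the size of dimension dim must be 3. The verification data at the end of this post additionally has other broadcast against input, which is why broadcasting has to be handled.
If dim is not specified, it defaults to the first dimension whose size is 3.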
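
The default-dim rule can be sketched on the host in plain C++. The helper name `resolve_cross_dim` and the convention that a negative `dim` means "not specified" are assumptions made for this illustration; they are not taken from the original kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original kernel): resolve the `dim`
// argument of Cross. In this sketch a negative `dim` means "not specified".
// Returns the index of the dimension to use, or -1 if there is no valid one.
inline int64_t resolve_cross_dim(const std::vector<int64_t>& shape, int64_t dim = -1) {
    const int64_t rank = static_cast<int64_t>(shape.size());
    if (dim >= 0) {
        // An explicit dim must be in range and have size exactly 3.
        return (dim < rank && shape[dim] == 3) ? dim : -1;
    }
    // Default: the first dimension whose size is 3.
    for (int64_t k = 0; k < rank; ++k) {
        if (shape[k] == 3) return k;
    }
    return -1;
}
```

For a shape such as [4, 3, 6, 5, 7, 8, 3], both dimension 1 and dimension 6 have size 3, and the default picks dimension 1.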

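The formula itself is simple; the only question is how to implement it.
This reuses the index-lookup approach from the GreaterEqual operator: every read of a broadcast element is redirected to the corresponding element of the original, non-broadcast data.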
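
For reference, this is the standard definition of the cross product of two 3-vectors. Here a and b stand for one slice of input and other along dim; the component names are chosen to match the variables a1..a3 and b1..b3 in the kernel code.

```latex
\[
\mathbf{a} \times \mathbf{b} =
\begin{pmatrix}
a_2 b_3 - a_3 b_2 \\
a_3 b_1 - a_1 b_3 \\
a_1 b_2 - a_2 b_1
\end{pmatrix},
\qquad
\mathbf{a} = (a_1, a_2, a_3), \quad \mathbf{b} = (b_1, b_2, b_3).
\]
```
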

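    // Map linear index i of the broadcast output to the matching linear index
    // in input j (0 = x1, 1 = x2), walking the axes from last to first.
    __aicore__ inline int64_t get_index(int64_t i, int64_t j) {
        int64_t index = 0, step = 1, offset = 1;
        for (int64_t k = numshapes - 1; k >= 0; --k) {
            // Coordinate of i along axis k of the output shape.
            const int64_t idx = i / step % outshape[k];
            step *= outshape[k];
            // A broadcast axis has shape[j][k] == 1, so idx % 1 == 0 there.
            index += (idx % shape[j][k]) * offset;
            offset *= shape[j][k];
        }
        return index;
    }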

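    __aicore__ inline void Process() {
        // F is the type used for the arithmetic; map<T> is defined elsewhere in the kernel.
        using F = typename map<T>::type;
        // The output is viewed as [maxbatchSize, 3, maxstepSize].
        for (int64_t i = 0; i < maxbatchSize; ++i) {
            for (int64_t j = 0; j < maxstepSize; ++j) {
                // Linear output indices of the three components along dim.
                auto index1 = i * 3 * maxstepSize + 0 * maxstepSize + j;
                auto index2 = i * 3 * maxstepSize + 1 * maxstepSize + j;
                auto index3 = i * 3 * maxstepSize + 2 * maxstepSize + j;
                // Read both operands through get_index so that broadcast
                // axes resolve to the original source data.
                F a1 = Gm_x1.GetValue(get_index(index1, 0));
                F a2 = Gm_x1.GetValue(get_index(index2, 0));
                F a3 = Gm_x1.GetValue(get_index(index3, 0));
                F b1 = Gm_x2.GetValue(get_index(index1, 1));
                F b2 = Gm_x2.GetValue(get_index(index2, 1));
                F b3 = Gm_x2.GetValue(get_index(index3, 1));
                // Cross product components.
                auto result1 = a2 * b3 - a3 * b2;
                auto result2 = a3 * b1 - a1 * b3;
                auto result3 = a1 * b2 - a2 * b1;
                // Cast back to the output type and write to global memory.
                Gm_y(index1) = (T)result1;
                Gm_y(index2) = (T)result2;
                Gm_y(index3) = (T)result3;
            }
        }
    }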
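
To check the indexing logic without a device, the two kernel functions above can be mirrored on the host in standard C++. This is only a sketch under assumptions: `broadcast_index` and `cross_reference` are names made up for this example, plain `std::vector<int64_t>` buffers stand in for the global-memory tensors, both inputs are assumed to have the same rank as the output, and every input axis is assumed to be either equal to the output axis or 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Host-side mirror of get_index: map a linear index into the broadcast output
// onto a linear index into an input of shape `in` (row-major layout).
inline int64_t broadcast_index(int64_t linear, const Shape& out, const Shape& in) {
    assert(out.size() == in.size());
    int64_t index = 0, step = 1, offset = 1;
    for (int64_t k = static_cast<int64_t>(out.size()) - 1; k >= 0; --k) {
        const int64_t idx = linear / step % out[k];  // coordinate along axis k
        step *= out[k];
        index += (idx % in[k]) * offset;             // a size-1 axis always maps to 0
        offset *= in[k];
    }
    return index;
}

// Reference cross product along `dim` with broadcasting, on flat buffers.
inline std::vector<int64_t> cross_reference(const std::vector<int64_t>& x1, const Shape& s1,
                                            const std::vector<int64_t>& x2, const Shape& s2,
                                            const Shape& out, int64_t dim) {
    const int64_t rank = static_cast<int64_t>(out.size());
    assert(dim >= 0 && dim < rank && out[dim] == 3);
    // View the output as [batch, 3, stride], as the kernel's Process() does.
    int64_t batch = 1, stride = 1;
    for (int64_t k = 0; k < dim; ++k) batch *= out[k];
    for (int64_t k = dim + 1; k < rank; ++k) stride *= out[k];

    std::vector<int64_t> y(static_cast<std::size_t>(batch * 3 * stride));
    for (int64_t i = 0; i < batch; ++i) {
        for (int64_t j = 0; j < stride; ++j) {
            int64_t pos[3], a[3], b[3];
            for (int c = 0; c < 3; ++c) {
                pos[c] = (i * 3 + c) * stride + j;              // output index of component c
                a[c] = x1[broadcast_index(pos[c], out, s1)];
                b[c] = x2[broadcast_index(pos[c], out, s2)];
            }
            y[pos[0]] = a[1] * b[2] - a[2] * b[1];
            y[pos[1]] = a[2] * b[0] - a[0] * b[2];
            y[pos[2]] = a[0] * b[1] - a[1] * b[0];
        }
    }
    return y;
}
```

With an output shape of [2, 3, 2] and a second operand of shape [1, 3, 1], every output position reads its b components from the same three source values, which is exactly the substitution the index lookup is meant to perform.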

The following data is used for verification:

  x1_tensor = np.random.uniform(-1e9, 1e9, [4, 3, 6, 5, 7, 8, 3]).astype(np.int8)
  x2_tensor = np.random.uniform(-1e9, 1e9, [1, 3, 1, 1, 1, 1, 1]).astype(np.int8)
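
The post does not show how `outshape`, `maxbatchSize` and `maxstepSize` are computed. The sketch below is one plausible host-side derivation, assuming that `maxbatchSize` is the product of the output dimensions before dim and `maxstepSize` the product of those after it. That reading is consistent with the index formula `i * 3 * maxstepSize + c * maxstepSize + j` in `Process()`, but it is an assumption. The names `broadcast_shape`, `CrossSplit` and `split_around_dim` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Element-wise broadcast of two shapes of equal rank.
// Returns an empty shape if the ranks differ or a pair of sizes is incompatible.
inline Shape broadcast_shape(const Shape& a, const Shape& b) {
    if (a.size() != b.size()) return {};
    Shape out(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) {
        if (a[k] != b[k] && a[k] != 1 && b[k] != 1) return {};
        out[k] = (a[k] == 1) ? b[k] : a[k];
    }
    return out;
}

// Sizes of the [batch, 3, step] view of the output around `dim`.
struct CrossSplit {
    int64_t batch;  // product of the dimensions before dim
    int64_t step;   // product of the dimensions after dim
};

inline CrossSplit split_around_dim(const Shape& out, int64_t dim) {
    const int64_t rank = static_cast<int64_t>(out.size());
    assert(dim >= 0 && dim < rank && out[dim] == 3);
    CrossSplit s{1, 1};
    for (int64_t k = 0; k < dim; ++k) s.batch *= out[k];
    for (int64_t k = dim + 1; k < rank; ++k) s.step *= out[k];
    return s;
}
```

For the shapes above, the broadcast output keeps the shape of x1_tensor, the default dim is 1, and the split gives batch = 4 and step = 6 * 5 * 7 * 8 * 3 = 5040, for 4 * 3 * 5040 = 60480 output elements in total.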