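New features in .NET 9, image processing, using AI to optimize code, and optimizing the code AI gives you

Article link: https://blog.csdn.net/withcsharp2/article/details/148491362

How it started

In a chat group, someone said that image processing in C# is slow. He shared a 3440×1440 image and asked for a blur with a radius of 30.

The first thing I thought of when I saw the question was a new feature in .NET 9:

The System.Drawing.Imaging.Effects namespace contains effects you can apply:

The code is a single line: img.ApplyEffect(new BlurEffect(30, false));

It runs in 560 ms. (Another user ran the same line in 286 ms. I suspect my machine is simply lower-spec. All the timings below were also measured on my machine, so they may look a bit slow.)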

private void button1_Click(object sender, EventArgs e)
{
    // Time only the blur call itself.
    Stopwatch stopwatch = Stopwatch.StartNew();
    img.ApplyEffect(new BlurEffect(30, false));
    stopwatch.Stop();
    label1.Text = $"Elapsed: {stopwatch.ElapsedMilliseconds} ms";
    pictureBox1.Image = img;
    pictureBox1.Refresh();
}

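At this point we seem to have a decent result. Time to call it a day?

My answer: not yet. I want to keep going.

The user said he is on .NET Framework 4.5.2.

No problem. Hold Ctrl and click ApplyEffect to jump to its definition.

There we find the function name GdipBitmapApplyEffect. Looks familiar, doesn't it? It comes from gdiplus. With the function name in hand, the declaration is easy to look up; the code I found is in the appendix.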

[DllImport( "gdiplus.dll",SetLastError = true, ExactSpelling = true,CharSet = CharSet.Unicode)]
private static extern int GdipBitmapApplyEffect(IntPtr bitmap, IntPtr effect, ref Rectangle rectOfInterest, bool useAuxData, IntPtr auxData, int auxDataSize);

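This runs in 598 ms, essentially the same speed as img.ApplyEffect(new BlurEffect(30, false)); on .NET 9.

So the same effect is available even on very old versions of .NET Framework.

At this point we seem to have a decent result. Time to call it a day?

My answer: not yet. I want to keep going.

Time to learn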

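We'll learn by writing the code by hand. Results first:

Framework	Code	Build	Concurrency	Time
.NET 9	ApplyEffect(new BlurEffect(30, false))	64-bit Release	None	560 ms
.NET Framework 4	DllImport "gdiplus.dll"	64-bit	None	598 ms
Delphi	Graphics32 FastBlur	32-bit	None	690 ms
.NET Framework 4	Hand-written FastBlurEx.Apply(img, 30, 1)	64-bit Release	None	684 ms
.NET Framework 4	Hand-written FastBlurEx.Apply(img, 30, 6)	64-bit Release	6	273 ms

The numbers show that C# on .NET Framework 4, compiled as 64-bit, holds up well.

There is a gap compared with C++ (gdiplus.dll): 684 ms vs 598 ms, about 14% slower.

But Parallel.For makes concurrency easy in C#, and with it the hand-written code is much faster than gdiplus.dll: 273 ms vs 598 ms.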
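To make the Parallel.For point concrete, here is a minimal sketch of the pattern, independent of any bitmap code. It averages each row of a small grid, one row per loop iteration, with a capped degree of parallelism like the last argument of FastBlurEx.Apply in the table. The `ParallelRows` class and `RowMeans` method are names made up for this illustration, not part of the original code.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelRows
{
    // Hypothetical helper: computes the mean of every row of a
    // height x width grid. Rows are independent of each other, so each
    // one can be handled by a separate Parallel.For iteration.
    public static double[] RowMeans(int[,] grid, int parallelDegree)
    {
        int height = grid.GetLength(0);
        int width = grid.GetLength(1);
        var means = new double[height];
        var options = new ParallelOptions { MaxDegreeOfParallelism = parallelDegree };

        Parallel.For(0, height, options, y =>
        {
            long sum = 0;
            for (int x = 0; x < width; x++)
                sum += grid[y, x];
            // Each iteration writes only its own slot, so no locking is needed.
            means[y] = (double)sum / width;
        });

        return means;
    }
}
```

Calling `ParallelRows.RowMeans(grid, 6)` lets at most six rows run at once. The blur code in the appendix uses the same shape: one row (or column) per iteration, each writing to its own part of the output.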

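Starting with Gaussian blur

The following compares eight mainstream blur algorithms by characteristics, performance, and typical use cases, compiled from industry practice and performance test data:

Blur algorithm comparison

Algorithm	Real-time suitability	Time complexity	Visual character	Typical use cases
Gaussian Blur	Medium-low (poor with large kernels)	O(n²·k)	Smooth and natural, soft edge transitions	Depth-of-field bokeh, high-precision bloom
Box Blur	Very high	O(n) (with sliding-window optimization)	Uniform blur, prone to blocky artifacts	Real-time UI blur on mobile, low-end devices
Kawase Blur	High	O(n²·log k)	Close to Gaussian; accuracy improves with more iterations	Full-screen bloom, glow effects
Dual Kawase Blur	Very high	O(n²·log k)	Wide-range blur without distortion; better than single-pass Kawase	Real-time bloom at 4K resolution
Radial Blur	Medium	O(n²·r) (r is the radius)	Spreads outward from a center, simulating light scattering	Lens flare, speed-line effects
Surface Blur	Low	O(n²·k²)	Edge-preserving smoothing; suppresses noise in flat regions	Portrait skin smoothing, medical image denoising
Median Blur	Medium-high	O(n²·log k)	Strong salt-and-pepper noise suppression, moderate edge preservation	Sensor denoising, old photo restoration
Motion Blur	Low (poor in dynamic scenes)	O(n²·d) (d is the number of directions)	Directional streaks, simulating fast-moving objects	Racing games, dynamic UI effects

Key performance and scenario analysis

Preferred on mobile
- Box blur: 3 iterations approximate about 90% of the Gaussian result; a 512×512 image takes ≤5 ms (Snapdragon 8 Gen 2)
- Dual Kawase: holds a steady 60 fps for full-screen 1080p blur, more than 2× the performance of Gaussian blur
High-quality rendering
- Gaussian blur: best detail retention with small kernels (σ≤2); suited to fine bloom blending
- Surface blur: an adaptive threshold protects texture and avoids plastic-looking skin in portrait work
Extreme performance optimization
- Separable Gaussian convolution: split the 2D convolution into two 1D convolutions, reducing the cost to O(2nk)
- Box blur + downsampling: reduce the resolution to 1/4 before blurring for a 4× speedup (at the cost of edge sharpness)
Physical simulation
- Radial blur: light scattering needs a dynamically adjusted radius; real-time use depends on GPU parallelism
- Motion blur: needs optical flow to estimate the motion direction, which is computationally expensive

Note: in practice algorithms are often combined. Bloom, for example, uses a Dual Kawase pre-blur + Gaussian fine blending, and portrait processing uses surface blur for edge preservation + median filtering for blemish removal.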

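There are actually two kinds of Gaussian blur

Implementation

You can actually have AI write the code for this kind of algorithm. The prompt was:

/*
You are a C# programmer. Please write a class that applies FastBlur to a bitmap. The framework is .NET Framework 4.8. Use unsafe pointers for speed, and make the for loops run concurrently.
Do the horizontal pass first, then update the remaining pixels in the row.
For color handling, use the fast approach of summing and then dividing by the count.
*/

The AI produces code that compiles and runs (see FastBlur in the appendix).

Single-threaded it takes 1846 ms; with 6-way concurrency, 703 ms.

At this point we seem to have a decent result. Time to call it a day?

My answer: not yet. I want to keep going.

Let's first analyze the code the AI produced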

private static unsafe void HorizontalPass(byte* src, byte* dest, int width, int height, int stride,
    int radius, int y)
{
    int kernelSize = radius * 2 + 1;
    int kernelRadius = radius;

    for (int x = 0; x < width; x++)
    {
        int bSum = 0, gSum = 0, rSum = 0, aSum = 0;
        int count = 0;
        for (int i = -kernelRadius; i <= kernelRadius; i++)
        {
            // Clamp the sample column to the row, then compute a byte offset.
            int px = Math.Min(Math.Max(x + i, 0), width - 1);
            int offset = y * stride + px * 4;

            bSum += src[offset];
            gSum += src[offset + 1];
            rSum += src[offset + 2];
            aSum += src[offset + 3];
            count++;
        }
        // Write the per-channel average for this pixel.
        int destOffset = y * stride + x * 4;
        dest[destOffset] = (byte)(bSum / count);
        dest[destOffset + 1] = (byte)(gSum / count);
        dest[destOffset + 2] = (byte)(rSum / count);
        dest[destOffset + 3] = (byte)(aSum / count);
    }
}

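The AI's code has no major problems; it just doesn't take advantage of pointers. So let's adjust the prompt:

I added two sentences to the prompt (they appear in the FastBlurEx comment in the appendix: handle colors through int* pointers, operating on 4 bytes at a time, and use pointers for the for-loop variables too). The AI then produced this code.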

int kernelSize = radius * 2 + 1;
int* rowStart = src + y * stride;   // stride is measured in ints (pixels) here
int* rowEnd = rowStart + width;
int* destRow = dest + y * stride;
int count = kernelSize;

for (int* px = rowStart; px < rowEnd; px++)
{
    int bSum = 0, gSum = 0, rSum = 0;
    int* kernelStart = px - radius;
    int* kernelEnd = px + radius;
    for (int* k = kernelStart; k <= kernelEnd; k++)
    {
        // Clamp the sample pointer to the current row.
        int* sample = k;
        if (sample < rowStart) sample = rowStart;
        if (sample >= rowEnd) sample = rowEnd - 1;
        byte* channels = (byte*)sample;
        bSum += channels[0];
        gSum += channels[1];
        rSum += channels[2];
    }

    byte* destChannels = (byte*)(destRow + (px - rowStart));
    destChannels[0] = (byte)(bSum / count);
    destChannels[1] = (byte)(gSum / count);
    destChannels[2] = (byte)(rSum / count);
    destChannels[3] = 255;   // alpha is forced to opaque in this version
}

That matches what I wanted from the for loop.
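The second prompt asked for 4 bytes per operation, yet the generated loop still casts each int* back to byte* and reads the channels one at a time. As a rough sketch of what handling a whole pixel as one int could look like, the helper below splits a 32-bit value into channels with shifts and masks. It assumes the Format32bppArgb layout on a little-endian machine, where blue sits in the lowest byte. `PixelPacking`, `Unpack` and `Pack` are hypothetical names, and whether this beats byte access would have to be measured.

```csharp
using System;

public static class PixelPacking
{
    // Splits one 32-bit pixel (0xAARRGGBB as an int) into its channels.
    public static void Unpack(int pixel, out int b, out int g, out int r, out int a)
    {
        b = pixel & 0xFF;
        g = (pixel >> 8) & 0xFF;
        r = (pixel >> 16) & 0xFF;
        a = (pixel >> 24) & 0xFF; // the mask removes the sign extension
    }

    // Rebuilds the 32-bit pixel from channel values in the range 0..255.
    public static int Pack(int b, int g, int r, int a)
    {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

In a blur loop, a value read as `*k` could go through `Unpack`, and the averaged channels could be written back with a single store via `Pack`.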

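At this point we seem to have a decent result. Time to call it a day?

My answer: not yet. I want to keep going.

This code clearly still has room for optimization. The two lines below are quite wasteful: they only matter at the two ends of a row, and the long middle stretch does not need them.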

if (sample < rowStart) sample = rowStart;
if (sample >= rowEnd) sample = rowEnd - 1;

The fix is simple: handle each row in three segments

for (int* px = rowStart; px < rowStart + radius; px++)

for (int* px = rowStart + radius; px < rowEnd - radius; px++)

for (int* px = rowEnd - radius; px < rowEnd; px++)

The middle segment becomes

for (int* px = rowStart + radius; px < rowEnd - radius; px++)
{
    ...
    for (int* k = kernelStart; k <= kernelEnd; k++)
    {
        // No clamping here: every k is inside the row in this segment.
        byte* channels = (byte*)k;
        bSum += channels[0];
        gSum += channels[1];
        rSum += channels[2];
    }
    ...
}

This makes the part with the heaviest computation noticeably faster.

The time dropped to 684 ms single-threaded and 273 ms with 6-way concurrency.
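The three-segment idea can be checked on a plain array without any bitmap or pointer code. The sketch below is a one-channel, one-row version: `Clamped` tests every sample index the way the earlier loop does, and `Split` clamps only in the two edge segments. The class and method names are invented for this example. One assumption worth flagging: the pointer version in the appendix implicitly expects the row to be at least 2 × radius wide, while this sketch clips the segment boundaries so shorter rows stay in bounds.

```csharp
using System;

public static class BoxBlur1D
{
    // Reference version: clamps every sample index.
    public static int[] Clamped(int[] src, int radius)
    {
        int n = src.Length, count = radius * 2 + 1;
        var dst = new int[n];
        for (int x = 0; x < n; x++)
            dst[x] = EdgeAverage(src, x, radius, count);
        return dst;
    }

    // Split version: clamping only near the edges, none in the middle.
    public static int[] Split(int[] src, int radius)
    {
        int n = src.Length, count = radius * 2 + 1;
        var dst = new int[n];
        int left = Math.Min(radius, n);          // end of the left edge segment
        int right = Math.Max(n - radius, left);  // start of the right edge segment

        for (int x = 0; x < left; x++)
            dst[x] = EdgeAverage(src, x, radius, count);

        for (int x = left; x < right; x++)
        {
            int sum = 0;
            for (int k = x - radius; k <= x + radius; k++)
                sum += src[k];                   // always in range here
            dst[x] = sum / count;
        }

        for (int x = right; x < n; x++)
            dst[x] = EdgeAverage(src, x, radius, count);

        return dst;
    }

    // Average over the window around x, with out-of-range indices clamped.
    private static int EdgeAverage(int[] src, int x, int radius, int count)
    {
        int sum = 0;
        for (int i = -radius; i <= radius; i++)
            sum += src[Math.Min(Math.Max(x + i, 0), src.Length - 1)];
        return sum / count;
    }
}
```

Both methods should return identical arrays for any input, which is a cheap way to confirm that splitting the loop changes only the speed and not the result.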

There is actually plenty more that could be optimized; the change above just happens to be small with a significant payoff.
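One candidate for further optimization is a running sum (the sliding-window trick mentioned for box blur in the comparison table). The inner loop above adds 2 × radius + 1 samples per pixel, which is 61 for radius 30. Since neighboring windows differ by only one sample at each end, the sum can be updated with one addition and one subtraction instead. The following is a one-channel sketch on a plain array, assuming the same clamp-to-edge behavior. `SlidingBoxBlur` and `Blur` are illustrative names and this is not the code that was timed in the article.

```csharp
using System;

public static class SlidingBoxBlur
{
    // Box blur of one row using a running sum: O(1) work per output pixel
    // after the first window, independent of the radius.
    public static int[] Blur(int[] src, int radius)
    {
        int n = src.Length, count = radius * 2 + 1;
        var dst = new int[n];
        if (n == 0) return dst;

        // Build the window centered on x = 0 (edges clamped).
        int sum = 0;
        for (int i = -radius; i <= radius; i++)
            sum += src[Clamp(i, n)];
        dst[0] = sum / count;

        // Slide right: add the sample entering the window,
        // subtract the one leaving it.
        for (int x = 1; x < n; x++)
        {
            sum += src[Clamp(x + radius, n)] - src[Clamp(x - radius - 1, n)];
            dst[x] = sum / count;
        }
        return dst;
    }

    private static int Clamp(int i, int n)
    {
        return Math.Min(Math.Max(i, 0), n - 1);
    }
}
```

For a full image the same update would run once per channel, along each row for the horizontal pass and along each column for the vertical pass.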

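Afterwards

Later the discussion turned to displaying the result, and there are many ways to do that:

Searching for "DirectX blur" turns up a huge number of tutorials.

Or just use CSS. In current browsers, <img style="filter: blur(30px); " src="test.jpg" /> is also extremely fast.

All of these can deliver good real-time performance.

Appendix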

GdipEffect

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using System.Drawing;
using System.Reflection;
namespace GdipEffect
{
    // ***************** Sample code for the GDI+ Effect functions *********************
    // Author       : laviewpbt 
    // About        : has a fairly deep understanding of image processing (not recognition)
    // Languages    : VB6.0/C#/VB.NET
    // Contact      : QQ-33184777  E-Mail:laviewpbt@sina.com
    // Developed    : 2012.12.10-2012.12.12
    // Thanks       : Aaron Lee Murgatroyd
    // Copyright    : please keep the personal information above when copying or reposting
    // *****************************************************************

    public static class Effect
    {
        private static Guid BlurEffectGuid = new Guid("{633C80A4-1843-482B-9EF2-BE2834C5FDD4}");
        private static Guid UsmSharpenEffectGuid = new Guid("{63CBF3EE-C526-402C-8F71-62C540BF5142}");

        [StructLayout(LayoutKind.Sequential)]
        private struct BlurParameters
        {
            internal float Radius;
            internal bool ExpandEdges;
        }

        [StructLayout(LayoutKind.Sequential)]
        private struct SharpenParams
        {
            internal float Radius;
            internal float Amount;
        }

        internal enum PaletteType               // GDI+ 1.1 can also derive a special palette for a given image
        {
            PaletteTypeCustom = 0,
            PaletteTypeOptimal = 1,
            PaletteTypeFixedBW = 2,
            PaletteTypeFixedHalftone8 = 3,
            PaletteTypeFixedHalftone27 = 4,
            PaletteTypeFixedHalftone64 = 5,
            PaletteTypeFixedHalftone125 = 6,
            PaletteTypeFixedHalftone216 = 7,
            PaletteTypeFixedHalftone252 = 8,
            PaletteTypeFixedHalftone256 = 9
        };

        internal enum DitherType                    // Mainly used when converting a true-color image to an indexed image while minimizing color loss
        {
            DitherTypeNone = 0,
            DitherTypeSolid = 1,
            DitherTypeOrdered4x4 = 2,
            DitherTypeOrdered8x8 = 3,
            DitherTypeOrdered16x16 = 4,
            DitherTypeOrdered91x91 = 5,
            DitherTypeSpiral4x4 = 6,
            DitherTypeSpiral8x8 = 7,
            DitherTypeDualSpiral4x4 = 8,
            DitherTypeDualSpiral8x8 = 9,
            DitherTypeErrorDiffusion = 10
        }


        [DllImport("gdiplus.dll",SetLastError = true, ExactSpelling = true,CharSet = CharSet.Unicode)]
        private static extern int GdipCreateEffect(Guid guid, out IntPtr effect);

        [DllImport("gdiplus.dll",SetLastError = true, ExactSpelling = true,CharSet = CharSet.Unicode)]
        private static extern int GdipDeleteEffect(IntPtr effect);

        [DllImport("gdiplus.dll", SetLastError = true, ExactSpelling = true, CharSet = CharSet.Unicode)]
        private static extern int GdipGetEffectParameterSize(IntPtr effect, out uint size);

        [DllImport("gdiplus.dll",SetLastError = true, ExactSpelling = true, CharSet = CharSet.Unicode)]
        private static extern int GdipSetEffectParameters(IntPtr effect, IntPtr parameters, uint size);
       
        [DllImport("gdiplus.dll",SetLastError = true, ExactSpelling = true,CharSet = CharSet.Unicode)]
        private static extern int GdipGetEffectParameters(IntPtr effect, ref uint size, IntPtr parameters);
  
        [DllImport( "gdiplus.dll",SetLastError = true, ExactSpelling = true,CharSet = CharSet.Unicode)]
        private static extern int GdipBitmapApplyEffect(IntPtr bitmap, IntPtr effect, ref Rectangle rectOfInterest, bool useAuxData, IntPtr auxData, int auxDataSize);

        [DllImport("gdiplus.dll",SetLastError = true, ExactSpelling = true,CharSet = CharSet.Unicode)]
        private static extern int GdipBitmapCreateApplyEffect(ref IntPtr SrcBitmap, int numInputs, IntPtr effect, ref Rectangle rectOfInterest, ref Rectangle outputRect, out IntPtr outputBitmap, bool useAuxData, IntPtr auxData, int auxDataSize);

        
        // I have called this function successfully from C#
        [DllImport("gdiplus.dll", SetLastError = true, ExactSpelling = true, CharSet = CharSet.Unicode)]
        private static extern int GdipInitializePalette(IntPtr palette, int palettetype, int optimalColors, int useTransparentColor, int bitmap);

        // This one has never worked for me in C#, although calling it from VB6 is simple and works fine; the problem is mainly the struct.
        [DllImport("gdiplus.dll", SetLastError = true, ExactSpelling = true, CharSet = CharSet.Unicode)]
        private static extern int GdipBitmapConvertFormat(IntPtr bitmap, int pixelFormat, int dithertype, int palettetype, IntPtr palette, float alphaThresholdPercent);

        /// <summary>
        /// Gets the value of a private field of an object. Thanks to Aaron Lee Murgatroyd.
        /// </summary>
        /// <typeparam name="TResult">The type of the field.</typeparam>
        /// <param name="obj">The object to read the field value from.</param>
        /// <param name="fieldName">The name of the field.</param>
        /// <returns>The value of the field.</returns>
        /// <exception cref="System.InvalidOperationException">The field could not be found.</exception>
        /// 
        internal static TResult GetPrivateField<TResult>(this object obj, string fieldName)
        {
            if (obj == null) return default(TResult);
            Type ltType = obj.GetType();
            FieldInfo lfiFieldInfo = ltType.GetField( fieldName,System.Reflection.BindingFlags.GetField |System.Reflection.BindingFlags.Instance |System.Reflection.BindingFlags.NonPublic);
            if (lfiFieldInfo != null)
                return (TResult)lfiFieldInfo.GetValue(obj);
            else
                throw new InvalidOperationException(string.Format("Instance field '{0}' could not be located in object of type '{1}'.",fieldName, obj.GetType().FullName));
        }

        public static IntPtr NativeHandle(this Bitmap Bmp)
        {
            return Bmp.GetPrivateField<IntPtr>("nativeImage");
            /*  Decompiling System.Drawing.dll with Reflector shows that the Image class has the following private fields
                internal IntPtr nativeImage;
                private byte[] rawData;
                private object userData;
                and there is also a SetNativeImage function
                internal void SetNativeImage(IntPtr handle)
                {
                    if (handle == IntPtr.Zero)
                    {
                        throw new ArgumentException(SR.GetString("NativeHandle0"), "handle");
                    }
                    this.nativeImage = handle;
                }
                Functions such as FromFile simply call GDI+ functions like GdipLoadImageFromFile and pass the returned GDI+ image handle
                to SetNativeImage, which stores it in nativeImage. So if we can read that value, we can call GDI+ functions that VS2010
                does not wrap yet. And since .NET has certainly initialized GDI+ already, there is no need to call GdiplusStartup ourselves.
             */
        }

        /// <summary>
        /// Applies a Gaussian blur to the image. Reference: http://msdn.microsoft.com/en-us/library/ms534057(v=vs.85).aspx
        /// </summary>
        /// <param name="Rect">The region to blur; the value is clipped to the image bounds and returned.</param>
        /// <param name="Radius">Radius of the Gaussian kernel, valid range [0, 255]; the larger the radius, the blurrier the image.</param>
        /// <param name="ExpandEdge">Whether to expand the edges; set to true for a softer result at the borders.</param>
            
        public static void GaussianBlur(this Bitmap Bmp, ref Rectangle Rect, float Radius = 10, bool ExpandEdge = false)
        {
            int Result;
            IntPtr BlurEffect;
            BlurParameters BlurPara;
            if ((Radius <0) || (Radius>255)) 
            {
                throw new ArgumentOutOfRangeException("Radius", "Radius must be in the range [0, 255].");
            }
            BlurPara.Radius = Radius ;
            BlurPara.ExpandEdges = ExpandEdge;
            Result = GdipCreateEffect(BlurEffectGuid, out BlurEffect);
            if (Result == 0)
            {
                IntPtr Handle = Marshal.AllocHGlobal(Marshal.SizeOf(BlurPara));
                // fDeleteOld is false: the freshly allocated block holds no old structure to release.
                Marshal.StructureToPtr(BlurPara, Handle, false);
                GdipSetEffectParameters(BlurEffect, Handle, (uint)Marshal.SizeOf(BlurPara));
                GdipBitmapApplyEffect(Bmp.NativeHandle(), BlurEffect, ref Rect, false, IntPtr.Zero, 0);
                // GdipBitmapCreateApplyEffect leaves the original image unchanged and writes the blurred result to a new image instead
                GdipDeleteEffect(BlurEffect);
                Marshal.FreeHGlobal(Handle);
            }
            else
            {
                throw new ExternalException("Unsupported GDI+ version: GDI+ 1.1 or later is required, on Windows Vista or later.");
            }
        }


        /// <summary>
        /// Sharpens the image. Reference: http://msdn.microsoft.com/en-us/library/ms534073(v=vs.85).aspx
        /// </summary>
        /// <param name="Rect">The region to sharpen; the value is clipped to the image bounds and returned.</param>
        /// <param name="Radius">Radius of the Gaussian kernel, valid range [0, 255]. This sharpening algorithm is built on Gaussian blur, so it is bound to be slower than Gaussian blur.</param>
        /// <param name="Amount">The sharpening strength; 0 means no sharpening. Valid range [0, 100].</param>
        /// 
        public static void UsmSharpen(this Bitmap Bmp, ref Rectangle Rect, float Radius = 10, float Amount = 50f)
        {
            int Result;
            IntPtr UnSharpMaskEffect;
            SharpenParams sharpenParams;
            if ((Radius < 0) || (Radius > 255))
            {
                throw new ArgumentOutOfRangeException("Radius", "Radius must be in the range [0, 255].");
            }
            if ((Amount < 0) || (Amount > 100))
            {
                throw new ArgumentOutOfRangeException("Amount", "Amount must be in the range [0, 100].");
            }
            sharpenParams.Radius = Radius;
            sharpenParams.Amount = Amount;
            Result = GdipCreateEffect(UsmSharpenEffectGuid, out UnSharpMaskEffect);
            if (Result == 0)
            {
                IntPtr Handle = Marshal.AllocHGlobal(Marshal.SizeOf(sharpenParams));
                // fDeleteOld is false: the freshly allocated block holds no old structure to release.
                Marshal.StructureToPtr(sharpenParams, Handle, false);
                GdipSetEffectParameters(UnSharpMaskEffect, Handle, (uint)Marshal.SizeOf(sharpenParams));
                GdipBitmapApplyEffect(Bmp.NativeHandle(), UnSharpMaskEffect, ref Rect, false, IntPtr.Zero, 0);
                GdipDeleteEffect(UnSharpMaskEffect);
                Marshal.FreeHGlobal(Handle);
            }
            else
            {
                throw new ExternalException("Unsupported GDI+ version: GDI+ 1.1 or later is required, on Windows Vista or later.");
            }
        }
    }
}

FastBlur


using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

public unsafe static class FastBlur
{

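    /*
You are a C# programmer. Please write a class that applies FastBlur to a bitmap. The framework is .NET Framework 4.8. Use unsafe pointers for speed, and make the for loops run concurrently.
Do the horizontal pass first, then update the remaining pixels in the row.
For color handling, use the fast approach of summing and then dividing by the count.
     */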
    public static Bitmap Apply(Bitmap image, int radius, int parallelDegree = 4)
    {
        if (radius < 1) return image;

        var srcData = image.LockBits(
            new Rectangle(0, 0, image.Width, image.Height),
            ImageLockMode.ReadOnly,
            PixelFormat.Format32bppArgb);

        Bitmap blurred = new Bitmap(image.Width, image.Height);
        var destData = blurred.LockBits(
            new Rectangle(0, 0, blurred.Width, blurred.Height),
            ImageLockMode.WriteOnly,
            PixelFormat.Format32bppArgb);

        try
        {
            byte* srcPtr = (byte*)srcData.Scan0;
            byte* destPtr = (byte*)destData.Scan0;
            int width = image.Width;
            int height = image.Height;
            int stride = srcData.Stride;

            // Temporary buffer holding the result of the horizontal pass
            byte* tempBuffer = (byte*)Marshal.AllocHGlobal(stride * height);

            try
            {
                // Horizontal blur: one row per iteration
                Parallel.For(0, height, new ParallelOptions { MaxDegreeOfParallelism = parallelDegree },
                    y => HorizontalPass(srcPtr, tempBuffer, width, height, stride, radius, y));

                // Vertical blur: one column per iteration
                Parallel.For(0, width, new ParallelOptions { MaxDegreeOfParallelism = parallelDegree },
                    x => VerticalPass(tempBuffer, destPtr, width, height, stride, radius, x));
            }
            finally
            {
                Marshal.FreeHGlobal((IntPtr)tempBuffer);
            }
        }
        finally
        {
            image.UnlockBits(srcData);
            blurred.UnlockBits(destData);
        }

        return blurred;
    }

    private static unsafe void HorizontalPass(byte* src, byte* dest, int width, int height, int stride,
        int radius, int y)
    {
        int kernelSize = radius * 2 + 1;
        int kernelRadius = radius;

        for (int x = 0; x < width; x++)
        {
            int bSum = 0, gSum = 0, rSum = 0, aSum = 0;
            int count = 0;   
            for (int i = -kernelRadius; i <= kernelRadius; i++)
            {
                int px = Math.Min(Math.Max(x + i, 0), width - 1);
                int offset = y * stride + px * 4;

                bSum += src[offset];
                gSum += src[offset + 1];
                rSum += src[offset + 2];
                aSum += src[offset + 3];
                count++;
            } 
            int destOffset = y * stride + x * 4;
            dest[destOffset] = (byte)(bSum / count);
            dest[destOffset + 1] = (byte)(gSum / count);
            dest[destOffset + 2] = (byte)(rSum / count);
            dest[destOffset + 3] = (byte)(aSum / count);
        }
    }

    private static unsafe void VerticalPass(byte* src, byte* dest, int width, int height, int stride,
        int radius, int x)
    {
        int kernelSize = radius * 2 + 1;
        int kernelRadius = radius;

        for (int y = 0; y < height; y++)
        {
            int bSum = 0, gSum = 0, rSum = 0, aSum = 0;
            int count = 0;

            for (int i = -kernelRadius; i <= kernelRadius; i++)
            {
                int py = Math.Min(Math.Max(y + i, 0), height - 1);
                int offset = py * stride + x * 4;

                bSum += src[offset];
                gSum += src[offset + 1];
                rSum += src[offset + 2];
                aSum += src[offset + 3];
                count++;
            }

            int destOffset = y * stride + x * 4;
            dest[destOffset] = (byte)(bSum / count);
            dest[destOffset + 1] = (byte)(gSum / count);
            dest[destOffset + 2] = (byte)(rSum / count);
            dest[destOffset + 3] = (byte)(aSum / count);
        }
    }
}

FastBlurEx

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

public unsafe static class FastBlurEx
{
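    /*
     
You are a C# programmer. Please write a class that applies FastBlur to a bitmap. The framework is .NET Framework 4.8. Use unsafe pointers for speed, and make the for loops run concurrently.
Do the horizontal pass first, then update the remaining pixels in the row.
For color handling, use the fast approach of summing and then dividing by the count.
Handle colors through int* pointers, operating on 4 bytes at a time.
Use pointers for the for-loop variables as well.
     */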
    public static Bitmap Apply(Bitmap image, int radius, int parallelDegree = 4)
    {
        if (radius < 1) return image;

        var srcData = image.LockBits(
            new Rectangle(0, 0, image.Width, image.Height),
            ImageLockMode.ReadOnly,
            PixelFormat.Format32bppArgb);

        Bitmap blurred = new Bitmap(image.Width, image.Height);
        var destData = blurred.LockBits(
            new Rectangle(0, 0, blurred.Width, blurred.Height),
            ImageLockMode.WriteOnly,
            PixelFormat.Format32bppArgb);

        try
        {
            int* srcPtr = (int*)srcData.Scan0;
            int* destPtr = (int*)destData.Scan0;
            int width = image.Width;
            int height = image.Height;
            int stride = srcData.Stride / 4; // stride converted to int* units (pixels)

            // Temporary buffer holding the result of the horizontal pass
            int* tempBuffer = (int*)Marshal.AllocHGlobal(height * stride * 4);

            try
            {
                // Horizontal blur: one row per iteration
                Parallel.For(0, height, new ParallelOptions { MaxDegreeOfParallelism = parallelDegree },
                    y => HorizontalPass(srcPtr, tempBuffer, width, height, stride, radius, y));

                // Vertical blur: one column per iteration
                Parallel.For(0, width, new ParallelOptions { MaxDegreeOfParallelism = parallelDegree },
                    x => VerticalPass(tempBuffer, destPtr, width, height, stride, radius, x));
            }
            finally
            {
                Marshal.FreeHGlobal((IntPtr)tempBuffer);
            }
        }
        finally
        {
            image.UnlockBits(srcData);
            blurred.UnlockBits(destData);
        }

        return blurred;
    }

    private static unsafe void HorizontalPass(int* src, int* dest, int width, int height, int stride,
        int radius, int y)
    {
        int kernelSize = radius * 2 + 1;
        int* rowStart = src + y * stride;
        int* rowEnd = rowStart + width;
        int* destRow = dest + y * stride;
        int count = kernelSize;

        for (int* px = rowStart; px < rowStart + radius; px++)
        {
            int bSum = 0, gSum = 0, rSum = 0; 
            int* kernelStart = px - radius;
            int* kernelEnd = px + radius;
            for (int* k = kernelStart; k <= kernelEnd; k++)
            {
                int* sample = k;
                if (sample < rowStart) sample = rowStart;
                if (sample >= rowEnd) sample = rowEnd - 1;
                byte* channels = (byte*)sample;
                bSum += channels[0];
                gSum += channels[1];
                rSum += channels[2];  
            }

            byte* destChannels = (byte*)(destRow + (px - rowStart));
            destChannels[0] = (byte)(bSum / count);
            destChannels[1] = (byte)(gSum / count);
            destChannels[2] = (byte)(rSum / count);
            destChannels[3] = 255;
        }


        for (int* px = rowStart+ radius; px < rowEnd- radius; px++)
        {
            int bSum = 0, gSum = 0, rSum = 0; 
            int* kernelStart = px - radius;
            int* kernelEnd = px + radius;
            for (int* k = kernelStart; k <= kernelEnd; k++)
            {
                byte* channels = (byte*)k;
                bSum += channels[0];
                gSum += channels[1];
                rSum += channels[2];    
            } 
            byte* destChannels = (byte*)(destRow + (px - rowStart));
            destChannels[0] = (byte)(bSum / count);
            destChannels[1] = (byte)(gSum / count);
            destChannels[2] = (byte)(rSum / count);
            destChannels[3] = 255;
        }

        for (int* px = rowEnd - radius; px < rowEnd ; px++)
        {
            int bSum = 0, gSum = 0, rSum = 0;          
            int* kernelStart = px - radius;
            int* kernelEnd = px + radius;
            for (int* k = kernelStart; k <= kernelEnd; k++)
            {
                int* sample = k;
                if (sample < rowStart) sample = rowStart;
                if (sample >= rowEnd) sample = rowEnd - 1;
                byte* channels = (byte*)sample;
                bSum += channels[0];
                gSum += channels[1];
                rSum += channels[2]; 
            }

            byte* destChannels = (byte*)(destRow + (px - rowStart));
            destChannels[0] = (byte)(bSum / count);
            destChannels[1] = (byte)(gSum / count);
            destChannels[2] = (byte)(rSum / count);
            destChannels[3] = 255;
        }


    }

    private static unsafe void VerticalPass(int* src, int* dest, int width, int height, int stride,
        int radius, int x)
    {
        int kernelSize = radius * 2 + 1;
        int* colStart = src + x;
        int* colEnd = src + (height - 1) * stride + x;
        int count = kernelSize;

        for (int y = 0; y < radius; y++)
        {
            int bSum = 0, gSum = 0, rSum = 0;
            int* kernelStart = colStart + (y - radius) * stride;
            int* kernelEnd = colStart + (y + radius) * stride;

            for (int* k = kernelStart; k <= kernelEnd; k += stride)
            {
                int* sample = k;
                if (sample < colStart) sample = colStart;
                if (sample > colEnd) sample = colEnd;

                byte* channels = (byte*)sample;
                bSum += channels[0];
                gSum += channels[1];
                rSum += channels[2];
            }

            byte* destChannels = (byte*)(dest + y * stride + x);
            destChannels[0] = (byte)(bSum / count);
            destChannels[1] = (byte)(gSum / count);
            destChannels[2] = (byte)(rSum / count);
            destChannels[3] = 255;
        }

        for (int y = radius; y < height- radius; y++)
        {
            int bSum = 0, gSum = 0, rSum = 0; 
            int* kernelStart = colStart + (y - radius) * stride;
            int* kernelEnd = colStart + (y + radius) * stride;

            for (int* k = kernelStart; k <= kernelEnd; k += stride)
            {
                byte* channels = (byte*)k;
                bSum += channels[0];
                gSum += channels[1];
                rSum += channels[2];  
            }

            byte* destChannels = (byte*)(dest + y * stride + x);
            destChannels[0] = (byte)(bSum / count);
            destChannels[1] = (byte)(gSum / count);
            destChannels[2] = (byte)(rSum / count);
            destChannels[3] = 255;
        }

        for (int y = height - radius; y < height; y++)
        {
            int bSum = 0, gSum = 0, rSum = 0;
            int* kernelStart = colStart + (y - radius) * stride;
            int* kernelEnd = colStart + (y + radius) * stride;

            for (int* k = kernelStart; k <= kernelEnd; k += stride)
            {
                int* sample = k;
                if (sample < colStart) sample = colStart;
                if (sample > colEnd) sample = colEnd;

                byte* channels = (byte*)sample;
                bSum += channels[0];
                gSum += channels[1];
                rSum += channels[2];
            }

            byte* destChannels = (byte*)(dest + y * stride + x);
            destChannels[0] = (byte)(bSum / count);
            destChannels[1] = (byte)(gSum / count);
            destChannels[2] = (byte)(rSum / count);
            destChannels[3] = 255;
        }
    }
}