C#: Encoding and Decoding Chinese Characters in FTP Paths

m0_72908360

Last modified: 2023-12-21 09:14:49

Tags: c#

First published: 2023-12-21 09:09:29

Article link: https://blog.csdn.net/m0_72908360/article/details/135121794

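Background: when an FTP address is accessed, English characters stay as they are, while Chinese characters are percent-encoded (in the form %XX). Which encoding is used depends on the language of the FTP server and on the encodings it is configured to support. For example, on Windows 10 with the language set to Chinese, Chinese characters are encoded as GB2312.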
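
To make the difference concrete, the sketch below percent-encodes the same text under UTF-8 and under GB2312. The names `PercentEncodingDemo`, `PercentEncode` and `GetGb2312` are made up for illustration. It assumes .NET Core 3.0 or later, where code-page encodings such as GB2312 must be registered through `CodePagesEncodingProvider` before they can be looked up.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PercentEncodingDemo
{
    // Hypothetical helper: writes every byte of `text` (under `encoding`) as %XX.
    public static string PercentEncode(string text, Encoding encoding)
    {
        return string.Concat(encoding.GetBytes(text).Select(b => "%" + b.ToString("X2")));
    }

    // Hypothetical helper: returns the GB2312 encoding.
    // Assumption: running on .NET Core 3.0+ / .NET 5+, where registration is required.
    public static Encoding GetGb2312()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("GB2312");
    }
}
```

With these helpers, `PercentEncode("测试", Encoding.UTF8)` gives `%E6%B5%8B%E8%AF%95` (three bytes per character), while `PercentEncode("测试", GetGb2312())` gives `%B2%E2%CA%D4` (two bytes per character). The same folder name therefore has two different links depending on the encoding.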

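I. Known issues

Before reading on, be aware of the following two issues.

1. Issue one

All methods in this article have been tested, and the links they produce work in both a browser and Windows Explorer. Note, however, that the System.Uri class in C# has problems with how it formats text:

a. It adds extra characters to a link I have already encoded myself (for example, it inserted a single quote in the middle). Passing true to public Uri(string uriString, bool dontEscape) does not help: that constructor is marked obsolete, and its documentation states that dontEscape is always treated as false.

b. When the link passed in contains Chinese characters, it automatically encodes them as UTF-8 (in the form %xx), whereas most systems in China are set to Chinese and only support GB2312.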
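
Point b can be observed with a few lines. This is a minimal sketch; `UriEscapingDemo` and its two methods are illustrative names. It compares `Uri.AbsoluteUri`, which returns the escaped form, with `Uri.OriginalString`, which returns the text exactly as it was passed in.

```csharp
using System;

public static class UriEscapingDemo
{
    // Escaped form produced by System.Uri: non-ASCII characters become UTF-8 %XX escapes.
    public static string Escaped(string uriString)
    {
        return new Uri(uriString).AbsoluteUri;
    }

    // The string exactly as it was handed to the constructor.
    public static string Original(string uriString)
    {
        return new Uri(uriString).OriginalString;
    }
}
```

For `ftp://192.168.250.50:9001/Folder/测试`, `Escaped` returns a link whose last segment is `%E6%B5%8B%E8%AF%95`, the UTF-8 bytes of 测试, not the GB2312 bytes `%B2%E2%CA%D4`. `Original` still returns the Chinese text untouched.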

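2. Issue two

A link generated by the methods in this article may turn out not to work. The cause is not the methods themselves but something else.

Cause:

Assuming "ftp://192.168.250.50:9001/Folder/测试文件夹1/测试文件夹二" exists, typing it straight into Windows Explorer opens it. Clicking the address bar afterwards shows that Explorer has automatically encoded it as ftp://192.168.250.50:9001/Folder/%B2%E2%CA%D4%CE%C4%BC%FE%BC%D01/%B2%E2%CA%D4%CE%C4%BC%FE%BC%D0%B6%FE/. If we copy that link, clear the address bar, paste the link back and try again, it no longer opens, and this error appears: "451 No mapping for the Unicode character exists in the target multi-byte code page."

This is a conflict between encoding settings: the Chinese characters in the link are encoded according to the system language, so GB2312 when it is set to Chinese, but the FTP server defaults to UTF-8, and the two clash.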
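
The clash can be reproduced outside the server. The sketch below only illustrates why a UTF-8 reader cannot make sense of GB2312 bytes; it is not a description of what IIS does internally, and `EncodingConflictDemo.IsValidUtf8` is an illustrative name.

```csharp
using System;
using System.Text;

public static class EncodingConflictDemo
{
    // Returns true when `bytes` form a valid UTF-8 sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes GetString throw instead of
        // silently substituting U+FFFD for bytes it cannot decode.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

The GB2312 bytes of 测试 (`B2 E2 CA D4`) are rejected, because `B2` cannot start a UTF-8 character, while the UTF-8 bytes of the same text (`E6 B5 8B E8 AF 95`) pass.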

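How to turn it off:

This is a matter of which encodings the FTP server is set to support. Go to Control Panel -> System and Security -> Administrative Tools -> Internet Information Services (IIS) Manager, find your FTP site -> Advanced Settings -> set "Allow UTF8" to false, save the setting and restart the service. The link can then be accessed normally.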

II. UTF-8

1. Decoding the UTF-8 codes of Chinese characters

        // Requires: using System; using System.Text; using System.Text.RegularExpressions;

        /// <summary>
        /// Decodes the UTF-8 percent-escapes (%XX) that the C# Uri type produces for Chinese characters.
        /// Only UTF-8 is handled; ASCII escapes such as %20 are left as they are.
        /// </summary>
        /// <param name="input">Link containing UTF-8 percent-escapes.</param>
        /// <returns>The link with the Chinese text restored; the input itself when there is nothing to decode.</returns>
        // Example: ftp://192.168.250.50:9001/2023-12-19/%E5%B0%BA%E5%AF%B8%E6%B5%8B%E9%87%8F/%E7%A8%8B%E5%BA%8F1
        // Output:  ftp://192.168.250.50:9001/2023-12-19/尺寸测量/程序1
        public static string Decode(string input)
        {
            if (string.IsNullOrEmpty(input))
            {
                return "";
            }
            // Match each run of escaped bytes >= 0x80; every byte of a multi-byte UTF-8 character falls in that range.
            string pattern = @"(?:%[89A-Fa-f][0-9A-Fa-f])+";
            return Regex.Replace(input, pattern, match =>
            {
                string[] hexBytes = match.Value.Split(new[] { '%' }, StringSplitOptions.RemoveEmptyEntries);
                byte[] byteArr = new byte[hexBytes.Length];
                for (int index = 0; index < hexBytes.Length; index++)
                {
                    byteArr[index] = Convert.ToByte(hexBytes[index], 16); // hex text -> byte
                }
                return Encoding.UTF8.GetString(byteArr); // bytes -> characters, replacing the whole run in place
            });
        }
        // Joins a sequence of characters into a single string.
        public static string combine(IEnumerable<char> enumerable)
        {
            return string.Concat(enumerable);
        }
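
For the UTF-8 case the framework already ships a decoder, `Uri.UnescapeDataString`, which can serve as a cross-check for a hand-written method. The wrapper below is only a sketch and `BuiltInUtf8Decode` is an illustrative name. Unlike a decoder that targets Chinese characters only, it also decodes ASCII escapes such as `%20`.

```csharp
using System;

public static class BuiltInUtf8Decode
{
    // Decodes %XX escapes as UTF-8 using the framework's own routine.
    public static string Decode(string escaped)
    {
        return Uri.UnescapeDataString(escaped);
    }
}
```

Calling `Decode` on `ftp://192.168.250.50:9001/2023-12-19/%E5%B0%BA%E5%AF%B8%E6%B5%8B%E9%87%8F/%E7%A8%8B%E5%BA%8F1` returns `ftp://192.168.250.50:9001/2023-12-19/尺寸测量/程序1`.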

III. GB2312

1. Encoding only the Chinese characters as GB2312

Note: a C# string stores Unicode text as UTF-16.

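        // Requires: using System.Text;
        // Encodes the Chinese characters (UTF-16 to GB2312) so the result can be used in a Uri.
        // Only common Chinese characters, which take 2 bytes, are supported; rare characters that need 4 bytes or more are not.
        // On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once before looking up GB2312.
        // Example folder name: 测试A哦-C啊C
        // %B2%E2%CA%D4A%C5%B6-C%B0%A1C (as encoded by the browser)
        // %B2%E2%CA%D4A%C5%B6-C%B0%A1C (output of this method)
        public static string EncodeChinese(string input)
        {
            Encoding GB2312 = Encoding.GetEncoding("GB2312");
            StringBuilder output = new StringBuilder();
            foreach (char c in input) // walk the UTF-16 code units of input
            {
                if (0x4E00 <= c && c <= 0x9FA5) // [\u4e00-\u9fa5]: range of common Chinese characters
                {
                    foreach (byte b in GB2312.GetBytes(c.ToString()))
                    {
                        output.Append('%').Append(b.ToString("X2")); // append the GB2312 byte as %XX, two uppercase hex digits
                    }
                }
                else // not a Chinese character: keep it as is
                {
                    output.Append(c);
                }
            }
            return output.ToString();
        }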
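
As a point of comparison, `HttpUtility.UrlEncode(string, Encoding)` from `System.Web` can produce the same GB2312 escapes. The sketch below is one possible way to apply it, not a replacement for the method above: `SegmentEncoder.EncodePath` is an illustrative name, and it assumes .NET Core 3.0 or later for the provider registration. Because `UrlEncode` also escapes `/` and `:`, the path is encoded one segment at a time. It writes spaces as `+`, which is a form-encoding convention and is not suitable for FTP paths that contain spaces.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Web;

public static class SegmentEncoder
{
    // Encodes each '/'-separated segment of a relative path with GB2312.
    public static string EncodePath(string relativePath)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where GB2312 needs registering.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb2312 = Encoding.GetEncoding("GB2312");

        // Encode segment by segment so the separators themselves survive.
        return string.Join("/", relativePath.Split('/').Select(s => HttpUtility.UrlEncode(s, gb2312)));
    }
}
```

For `Folder/测试A哦-C啊C` this yields the escapes `%B2%E2%CA%D4A%C5%B6-C%B0%A1C` for the second segment, apart from the letter case of the hex digits, which does not matter in a percent-escape.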

2. Decoding the GB2312 codes in a link back to Chinese characters

        // Requires: using System; using System.Text; using System.Text.RegularExpressions;
        // Decodes to Chinese (for the FTP server of a Windows system whose language is set to Chinese:
        // turns the GB2312 escapes in its Uri back into Chinese characters).
        // On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once before looking up GB2312.
        // Example folder name: 测试A哦-C啊C
        // %B2%E2%CA%D4A%C5%B6-C%B0%A1C (as encoded by the browser)
        // 测试A哦-C啊C (output of this method)
        public static string DecodeChinese(string input)
        {
            if (string.IsNullOrEmpty(input))
            {
                return "";
            }
            Encoding GB2312 = Encoding.GetEncoding("GB2312");
            // A GB2312 Chinese character is two escaped bytes, and the first of them is always >= 0x80.
            // Matching whole pairs leaves ASCII escapes such as %20 untouched.
            string pattern = @"(?:%[89A-Fa-f][0-9A-Fa-f]%[0-9A-Fa-f]{2})+";
            return Regex.Replace(input, pattern, match =>
            {
                string[] hexBytes = match.Value.Split(new[] { '%' }, StringSplitOptions.RemoveEmptyEntries);
                byte[] byteArr = new byte[hexBytes.Length];
                for (int index = 0; index < hexBytes.Length; index++)
                {
                    byteArr[index] = Convert.ToByte(hexBytes[index], 16); // hex text -> byte
                }
                return GB2312.GetString(byteArr); // bytes -> Chinese, e.g. "%CA%D4" becomes "试"
            });
        }
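
The decoding direction also has a framework counterpart, `HttpUtility.UrlDecode(string, Encoding)`, which can be used to cross-check results. This is a sketch under the same assumptions as before (.NET Core 3.0 or later for the provider registration), and `BuiltInGb2312Decode` is an illustrative name. Note that `UrlDecode` decodes every escape, not just Chinese ones, and turns `+` into a space.

```csharp
using System;
using System.Text;
using System.Web;

public static class BuiltInGb2312Decode
{
    // Decodes %XX escapes, interpreting the bytes as GB2312.
    public static string Decode(string escaped)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where GB2312 needs registering.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return HttpUtility.UrlDecode(escaped, Encoding.GetEncoding("GB2312"));
    }
}
```

`Decode("%B2%E2%CA%D4A%C5%B6-C%B0%A1C")` returns `测试A哦-C啊C`, matching the example in the comment above.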
