Implementing UTF-16 to UTF-8 Conversion in Code

Latest recommended article published on 2023-07-08 16:35:11

ls1300005

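Category: 日常 (Daily) · Tag: python

Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/ls1300005/article/details/131610340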

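Summary (auto-generated by CSDN): This is a function written in Python that converts a UTF-16 encoded string to UTF-8. The code parses the UTF-16 characters one by one, handles surrogate pairs, and generates the corresponding UTF-8 byte sequence from each Unicode code point.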

Below is sample code that implements UTF-16 to UTF-8 conversion, written in Python:

```python
def utf16_to_utf8(utf16_str):
    """Encode a string to UTF-8 bytes, combining UTF-16 surrogate pairs."""
    utf8_bytes = bytearray()
    i = 0
    while i < len(utf16_str):
        # Get the code of the current character. In Python 3 this is either
        # a full code point or, for a surrogate, a single UTF-16 code unit.
        unicode_code = ord(utf16_str[i])

        # Combine a high (lead) surrogate and a low (trail) surrogate
        if 0xD800 <= unicode_code <= 0xDBFF and i + 1 < len(utf16_str):
            surrogate_pair_code = ord(utf16_str[i + 1])
            if 0xDC00 <= surrogate_pair_code <= 0xDFFF:
                unicode_code = ((unicode_code - 0xD800) << 10) + (surrogate_pair_code - 0xDC00) + 0x10000
                i += 1  # skip the low surrogate that was just consumed

        # Convert the Unicode code point to its UTF-8 byte sequence
        if unicode_code < 0x80:
            utf8_bytes.append(unicode_code)
        elif unicode_code < 0x800:
            utf8_bytes.append((unicode_code >> 6) | 0xC0)
            utf8_bytes.append((unicode_code & 0x3F) | 0x80)
        elif unicode_code < 0x10000:
            utf8_bytes.append((unicode_code >> 12) | 0xE0)
            utf8_bytes.append(((unicode_code >> 6) & 0x3F) | 0x80)
            utf8_bytes.append((unicode_code & 0x3F) | 0x80)
        else:
            utf8_bytes.append((unicode_code >> 18) | 0xF0)
            utf8_bytes.append(((unicode_code >> 12) & 0x3F) | 0x80)
            utf8_bytes.append(((unicode_code >> 6) & 0x3F) | 0x80)
            utf8_bytes.append((unicode_code & 0x3F) | 0x80)
        i += 1
    # Return bytes: building a str with chr() would yield mojibake in Python 3
    return bytes(utf8_bytes)


# Test code
utf16_str = "你好，世界！"  # the string to convert
utf8_bytes = utf16_to_utf8(utf16_str)
print("UTF-8 encoded bytes:", utf8_bytes)
print("Matches str.encode:", utf8_bytes == utf16_str.encode("utf-8"))
```

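The code first converts each character of the UTF-16 string to its integer Unicode code point, combining surrogate pairs where present. It then computes the corresponding UTF-8 encoding from that code point: one byte below 0x80, two bytes below 0x800, three bytes below 0x10000, and four bytes otherwise. The details of the algorithm are explained in the code comments. Running the code prints the converted UTF-8 result.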
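To make the arithmetic concrete, here is a small worked example for a single character outside the Basic Multilingual Plane. It is only a sketch: the helper name `combine_surrogates` is hypothetical and not part of the original code, and U+1F600 was picked simply because its UTF-16 form (D83D DE00) is well known.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000


# U+1F600 is stored in UTF-16 as the surrogate pair D83D DE00
code_point = combine_surrogates(0xD83D, 0xDE00)
print(hex(code_point))  # 0x1f600

# Four-byte UTF-8 layout: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
encoded = bytes([
    (code_point >> 18) | 0xF0,           # top 3 bits
    ((code_point >> 12) & 0x3F) | 0x80,  # next 6 bits
    ((code_point >> 6) & 0x3F) | 0x80,   # next 6 bits
    (code_point & 0x3F) | 0x80,          # lowest 6 bits
])
print(encoded.hex())  # f09f9880
```

The result agrees with what Python's built-in encoder produces for the same character, which is a convenient way to check a hand-written conversion.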

Note: This code assumes that the input UTF-16 string is valid and does not contain a BOM (byte order mark).
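If the input arrives as raw UTF-16 bytes that may start with a BOM, one possible approach is to let Python's built-in codecs do the work. The sketch below is an alternative to the manual conversion, not the author's method. The function name `utf16_bytes_to_utf8` is hypothetical, and treating BOM-less input as little-endian is an assumption that may not fit every data source.

```python
import codecs


def utf16_bytes_to_utf8(data):
    """Convert raw UTF-16 bytes to UTF-8 bytes, honoring an optional BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    else:
        # Assumption: input without a BOM is little-endian
        text = data.decode("utf-16-le")
    return text.encode("utf-8")


# Big-endian sample with a BOM in front
sample = codecs.BOM_UTF16_BE + "你好".encode("utf-16-be")
result = utf16_bytes_to_utf8(sample)
print(result == "你好".encode("utf-8"))  # True
```

Unlike the manual version, the built-in decoder raises `UnicodeDecodeError` on invalid input such as an unpaired surrogate, so invalid data is reported rather than silently encoded.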
