PHP: converting binary to UTF-8, and converting UTF-8 back to one-byte binary in PHP

润0713

Published 2021-04-13 11:28:49



I have a lot of images that were imported from an SQL dump with UTF-8 encoding. As a result, instead of "FF D8 FF E0" I see "C3 BF C3 98 C3 BF C3 A0" at the beginning of the JPEG images.

I've tried iconv('utf-8', 'iso-8859-1', $data), but it does not convert the whole file (the UTF-8 data contains characters that cannot be converted to ISO-8859-1).

How can I simply convert the UTF-8 back to one-byte binary, regardless of encoding?
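The byte pattern in the question is what you get when each raw byte is treated as the character with the same value (U+0000..U+00FF, as in ISO-8859-1) and then written out as UTF-8: bytes 0x80..0xBF become C2 xx and bytes 0xC0..0xFF become C3 xx. The sketch below reproduces that corruption for the JPEG signature. The helper name `bytes_to_utf8()` is hypothetical, and the assumption that the dump was read as a single-byte encoding is an illustration, not something the asker confirmed.

```php
<?php
// Sketch: reproduce the corruption, ASSUMING each raw byte was read as a
// single-byte character (U+0000..U+00FF) and then encoded as UTF-8.
// bytes_to_utf8() is a hypothetical helper name.
function bytes_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII range: stays a single byte.
            $out .= chr($b);
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// The JPEG signature FF D8 FF E0 turns into the bytes seen in the question.
echo strtoupper(implode(' ', str_split(bin2hex(bytes_to_utf8("\xFF\xD8\xFF\xE0")), 2))), "\n";
// prints: C3 BF C3 98 C3 BF C3 A0
```

Because every byte maps to exactly one code point here, the damage is reversible as long as each decoded code point can be turned back into one byte.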

Original: https://stackoverflow.com/questions/20332030

Updated: 2020-01-06 16:35

Best answer


The problem was that UTF-8 has more than one representation for some characters, the so-called "non-shortest" form. These characters can be converted mathematically, but iconv treats them as erroneous and does not convert them.
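Converting "mathematically" can be sketched as a UTF-8 decoder that rebuilds each code point from the payload bits of the lead and continuation bytes and, unlike a strict decoder, does not reject non-shortest (overlong) sequences. This is an illustrative sketch, not the answer's linked code; `utf8_to_codepoints()` is a hypothetical name, and the choice to throw on invalid lead or continuation bytes is an assumption.

```php
<?php
// Sketch: decode UTF-8 into code points by bit arithmetic only.
// Overlong ("non-shortest") sequences are ACCEPTED, e.g. C1 81 -> U+0041.
// utf8_to_codepoints() is a hypothetical helper name.
function utf8_to_codepoints(string $s): array
{
    $cps = [];
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                      // 0xxxxxxx: 1 byte
            $cp = $b;        $n = 0;
        } elseif ($b >= 0xC0 && $b < 0xE0) {  // 110xxxxx: 2 bytes
            $cp = $b & 0x1F; $n = 1;
        } elseif ($b >= 0xE0 && $b < 0xF0) {  // 1110xxxx: 3 bytes
            $cp = $b & 0x0F; $n = 2;
        } elseif ($b >= 0xF0 && $b < 0xF8) {  // 11110xxx: 4 bytes
            $cp = $b & 0x07; $n = 3;
        } else {
            throw new UnexpectedValueException(sprintf('Invalid lead byte 0x%02X at offset %d', $b, $i));
        }
        if ($i + $n >= $len) {
            throw new UnexpectedValueException("Truncated sequence at offset $i");
        }
        for ($k = 1; $k <= $n; $k++) {
            $c = ord($s[$i + $k]);
            if (($c & 0xC0) !== 0x80) {       // continuation bytes are 10xxxxxx
                throw new UnexpectedValueException(sprintf('Invalid continuation byte 0x%02X at offset %d', $c, $i + $k));
            }
            $cp = ($cp << 6) | ($c & 0x3F);   // append 6 payload bits
        }
        $cps[] = $cp;
        $i += $n + 1;
    }
    return $cps;
}

var_dump(utf8_to_codepoints("\xC3\xBF") === [0xFF]); // shortest form: bool(true)
var_dump(utf8_to_codepoints("\xC1\x81") === [0x41]); // overlong form: bool(true)
```

A strict decoder would reject `C1 81` because `A` has a one-byte encoding; accepting it is exactly the leniency the answer describes.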

I've written a short function that converts text made of any UTF-8 characters into an array of Unicode code points, and then remaps some non-ASCII values to single-byte values with a simple table (for example, 0x20AC is the same as 0x80, etc.). You can find the complete code and the remapping table here: Converting UTF-8 with non-shortest characters to one-byte encoding
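The remapping step might look like the sketch below. It assumes the table is Windows-1252, where U+20AC (the euro sign) is byte 0x80, which matches the example in the answer; the answer's own table is only available at the linked question and may differ. `codepoints_to_bytes()` is a hypothetical name, and throwing on unmapped code points is an assumption.

```php
<?php
// Sketch: turn Unicode code points back into single bytes.
// U+0000..U+00FF map to the byte with the same value; the table ASSUMES
// Windows-1252 for bytes 0x80..0x9F (e.g. U+20AC -> 0x80).
// codepoints_to_bytes() is a hypothetical helper name.
function codepoints_to_bytes(array $cps): string
{
    // Windows-1252 characters in 0x80..0x9F, keyed by code point
    // (0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252).
    static $cp1252 = [
        0x20AC => 0x80, 0x201A => 0x82, 0x0192 => 0x83, 0x201E => 0x84,
        0x2026 => 0x85, 0x2020 => 0x86, 0x2021 => 0x87, 0x02C6 => 0x88,
        0x2030 => 0x89, 0x0160 => 0x8A, 0x2039 => 0x8B, 0x0152 => 0x8C,
        0x017D => 0x8E, 0x2018 => 0x91, 0x2019 => 0x92, 0x201C => 0x93,
        0x201D => 0x94, 0x2022 => 0x95, 0x2013 => 0x96, 0x2014 => 0x97,
        0x02DC => 0x98, 0x2122 => 0x99, 0x0161 => 0x9A, 0x203A => 0x9B,
        0x0153 => 0x9C, 0x017E => 0x9E, 0x0178 => 0x9F,
    ];

    $out = '';
    foreach ($cps as $cp) {
        if ($cp >= 0 && $cp <= 0xFF) {
            $out .= chr($cp);             // same value, one byte
        } elseif (isset($cp1252[$cp])) {
            $out .= chr($cp1252[$cp]);    // remapped via the table
        } else {
            throw new UnexpectedValueException(sprintf('No single-byte mapping for U+%04X', $cp));
        }
    }
    return $out;
}

echo bin2hex(codepoints_to_bytes([0xFF, 0xD8, 0x20AC, 0x41])), "\n"; // ffd88041
```

Given the code points decoded from a corrupted image, this would rebuild one byte per code point, for example restoring the JPEG signature FF D8 FF E0.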

2013-12-09

