PHP, XML, DOM: how to make sure that the final XML file encoding is UTF-8?
Please help me.
Here is the detailed scenario.
I have an XML file containing XML tags, e.g.
data.xml, with the following content:
-----------------
some text
-------------------
I upload this file to my translation service and my code loads it. Here is the PHP code:
$dom = new DOMDocument('1.0', 'utf-8');
if ( !$dom->load($target_file) ) {
    echo "Cannot load file $target_file";
    exit;
}
Then my logic runs and replaces the node value with some accented characters, e.g. nënë. That works fine, and finally I save the file:
$dom->save($target_file);
Now the output should look as follows:
data.xml, with the following content:
-----------------
nënë
-------------------
But when I open the file, the output is as follows:
-------------------
n&#xEB;n&#xEB;
-------------------
Please help me. How can I make sure that the XML file encoding is UTF-8?
Waiting......
Source: https://stackoverflow.com/questions/9998131
Updated: 2020-02-20 14:27
Accepted answer
Don't know if you have already solved it or not:
If your data is UTF-8 encoded and you discover that saveXML() turned all non-ASCII characters into numeric entities (e.g. ö -> &#xF6;):
Chances are that the XML declaration was missing when you loaded the source data. Try adding <?xml version="1.0" encoding="UTF-8"?> to the beginning of the document before you read it with load() or loadXML(). Then the non-ASCII characters should remain untouched. Worked for me.
2012-04-21
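The suggestion in the answer above can be tried out with a minimal sketch like the following. The helper replaceRootText() and the <a> element are hypothetical, chosen only for illustration (the original tag name is not shown in the question). The third call shows an alternative that is not part of the answer: setting the document's encoding property before saving, on the assumption that the source data really is UTF-8.

```php
<?php
// Hypothetical helper: parse $xml, replace the root element's text, serialize.
function replaceRootText(string $xml, string $text, bool $forceUtf8 = false): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Cannot parse XML');
    }
    $dom->documentElement->nodeValue = $text;
    if ($forceUtf8) {
        // Alternative (assumption, not from the answer): declare the
        // output encoding explicitly before serializing.
        $dom->encoding = 'UTF-8';
    }
    return $dom->saveXML();
}

$accented = "n\u{EB}n\u{EB}"; // "nënë" as UTF-8 bytes (PHP 7+ escape syntax)
$bare     = '<a>some text</a>'; // no XML declaration
$declared = '<?xml version="1.0" encoding="UTF-8"?>' . "\n" . $bare;

$withoutDecl = replaceRootText($bare, $accented);       // entities expected
$withDecl    = replaceRootText($declared, $accented);   // raw UTF-8 expected
$forced      = replaceRootText($bare, $accented, true); // raw UTF-8 expected

echo $withoutDecl, $withDecl, $forced;
```

With a file on disk, the same idea applies to load() and save(): either make sure the file starts with the declaration, or set the encoding property after loading and before saving.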
Related Q&A
I believe you encountered a streamed XML parsing bug in Chrome. The error will point to the beginning of the XML tag, but in fact the "error" is somewhere in the content. This is because the server responds in chunks, and one of the chunks is split in the middle of a multi-byte UTF character.
...
When ç is "ç", then your encoding is Windows-1252 (or maybe ISO-8859-1), but not UTF-8.
Using the item function you can get the element you are looking for: $path->item(0)->nodeValue;
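As a rough illustration of the item() call mentioned above, here is a small sketch. The XML content and the XPath expression are made up for the example; only item(0)->nodeValue comes from the snippet itself.

```php
<?php
// Hypothetical document with two <a> elements.
$dom = new DOMDocument();
$dom->loadXML('<?xml version="1.0" encoding="UTF-8"?><root><a>first</a><a>second</a></root>');

// query() returns a DOMNodeList; item(n) returns the n-th node or null.
$xpath = new DOMXPath($dom);
$path  = $xpath->query('/root/a');

$first   = $path->item(0)->nodeValue; // text of the first match
$missing = $path->item(5);            // out of range, so null

echo $first, "\n";
```

Because item() returns null for an index that does not exist, it is worth checking the result before reading nodeValue when the query might match nothing.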
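There is a lot of confusion about encodings in Vim. There are two encoding settings, 'encoding' and 'fileencoding'. 'encoding' is the one associated with the current Vim session; I always leave it at 'utf-8', but then I only use gVim or a Unicode-enabled terminal. 'fileencoding' is the encoding of the file itself, which can be detected automatically or overridden with a setting (++enc) or a modeline, I believe. It is detected based on the 'fileencodings' option. Try this: vim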
:set encoding=utf-8
:e ++enc=ut
...
The problem is short_open_tag = On in your php.ini.
I think you did everything correctly, except that your terminal is in Latin-1. The UTF-8 sequence for ä is C3 A4, which is Ã¤ if displayed as Latin-1.
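The byte-level claim above can be checked with a short sketch. It assumes the mbstring extension is available; the variable names are only illustrative.

```php
<?php
// "ä" (U+00E4) encoded as UTF-8 is the two bytes C3 A4.
$utf8 = "\u{E4}";
$hex  = bin2hex($utf8); // "c3a4"

// Simulate a Latin-1 terminal: treat each byte as one ISO-8859-1 character
// and re-encode the result as UTF-8 so it can be displayed.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Reversing the step recovers the original bytes, which shows that the
// data itself was intact and only the interpretation was wrong.
$restored = mb_convert_encoding($misread, 'ISO-8859-1', 'UTF-8');

echo $hex, ' ', $misread, ' ', $restored, "\n";
```

If the reverse conversion gives back readable text, the file is most likely fine and the viewer or terminal is using the wrong encoding.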
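You are passing a FileWriter to the XMLWriter. A Writer already deals with String or char[] data, so it has already handled the encoding, which means the XMLWriter has no chance to influence it. Also, FileWriter is a particularly problematic kind of Writer, because you can never specify which encoding it should use; it always uses the platform default encoding (usually something like ISO-8859-1 on Windows and UTF-8 on Linux). So it should basically never be used. For the XMLWriter to apply what it was given as configuration, pass an OutputStream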
...
At first glance, the problem appears to be the XML encoding in your response.
URL url = new URL("http://myurl.com");
InputSource is = new InputSource(url.openStream());
is.setEncoding("ISO-8859-1"); // Also Try UTF-8 or UTF-16
BufferedReader br = new BufferedReader(new InputStreamReader(is.getByteStream())
...
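Don't know if you have already solved it or not: if your data is UTF-8 encoded and you discover that saveXML() turned all non-ASCII characters into numeric entities (e.g. ö -> &#xF6;), chances are that the XML declaration was missing when you loaded the source data. Try adding <?xml version="1.0" encoding="UTF-8"?> to the beginning of the document before you read it with load() or loadXML(). Then the non-ASCII characters should remain untouched. Worked for me. Source: http://www.php.net/manual/en/domdocument.savexml.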
...
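Make sure your input data is not already UTF-8 encoded, because if it is, you are double-encoding it by calling utf8_encode(). If you expect to encounter strings that are encoded as UTF-8 as well as strings in another character set (ISO-8859-9, I guess), then I think it is better to replace utf8_encode() with a function like this:
function encode_to_utf8_if_needed($string)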
{
$encoding = mb_detect_encoding($string, 'UTF-8, ISO-8859-9, ISO-8859-1');
...