Western European Character Set to Turkish in Oracle SQL

This post discusses Turkish characters displaying inconsistently against a database using the ISO-8859-1 character set: the text renders correctly on Unix screens and in TOAD but characters are lost during extraction and transfer. The fix likely involves changing the character set of the database or interface, e.g. migrating to ISO-8859-9 or Unicode (UTF-8) to support the full Turkish alphabet.



I am having a serious issue with character encoding. To give some background:

I have Turkish business users who enter some data on Unix screens in the Turkish language.

My database NLS parameter is set to AMERICAN, WE8ISO8859P1 and Unix NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1.

The Turkish business users are able to see all the Turkish characters on UNIX screens and in TOAD, while I am not. I can only see them in the Western European character set.

At business end: ÖZER İNŞAAT TAAHHÜT VE

At our end: ÖZER ÝNÞAAT TAAHHÜT VE

Notice that the Turkish characters İ and Ş are being converted to their ISO 8859-1 counterparts. However, all the settings (NLS parameters in the DB and Unix) are the same at both ends: ISO 8859-1 (Western European).
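The corruption is mechanical: İ and Ş sit at code points 0xDD and 0xDE in ISO 8859-9 (the Turkish code page), and ISO 8859-1 assigns those same bytes to Ý and Þ. A minimal Python sketch (illustrative only, not part of the original setup) reproduces the exact string seen at our end:

```python
# Turkish text encoded with the Turkish code page (ISO 8859-9)...
text = "ÖZER İNŞAAT TAAHHÜT VE"
raw = text.encode("iso-8859-9")   # İ -> 0xDD, Ş -> 0xDE

# ...but decoded by a client configured for ISO 8859-1 (Western European)
seen = raw.decode("iso-8859-1")
print(seen)  # ÖZER ÝNÞAAT TAAHHÜT VE
```

The bytes themselves never change; only the table used to interpret them does, which is why the letters that ISO 8859-1 and ISO 8859-9 share (Ö, Ü) survive while İ and Ş do not.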

From some reading, I understand that the Turkish machines can display Turkish data by converting it in real time (the DB NLS settings are overridden by the local NLS settings).

Now, I have an interface running in my DB: some PL/SQL scripts (run through a shell script) extract data from the database and spool it to a .csv file on a Unix path. That .csv file is then transferred to an external system via MFT (Managed File Transfer).
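As a hedged sketch of what a fixed spool step could look like (the file name and rows below are hypothetical, and this assumes the database itself already stores the Turkish text correctly), writing the extract with an explicit ISO 8859-9 encoding keeps İ and Ş intact through the MFT transfer:

```python
import csv

# Hypothetical stand-in for the spool step: write the extract in the
# character set the external system expects (ISO 8859-9 per the P.S.).
rows = [["ÖZER İNŞAAT TAAHHÜT VE", "INVOICE-001"]]
with open("extract.csv", "w", encoding="iso-8859-9", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading it back with the same encoding round-trips the Turkish letters.
with open("extract.csv", encoding="iso-8859-9", newline="") as f:
    assert next(csv.reader(f))[0] == "ÖZER İNŞAAT TAAHHÜT VE"
```

The same idea applies to the SQL*Plus spool: the client's NLS_LANG must name the character set the file is actually meant to contain, so that conversion happens explicitly rather than by accident.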

The problem is that the extract never contains any Turkish characters. Every Turkish character is converted into the Western European character set and is sent that way to the external system. This is treated as a case of data conversion/loss, and my business users are really unhappy.

Could anyone tell me how I could retain all the Turkish characters?

P.S.: The external system's character set could be set to the ISO 8859-9 character set.

Many thanks in advance.

Source: https://stackoverflow.com/questions/12627868

Updated: 2020-03-19 17:03

Accepted answer


If you are saying that your database character set is ISO-8859-1, i.e.

SELECT parameter, value

FROM v$nls_parameters

WHERE parameter = 'NLS_CHARACTERSET'

returns a value of WE8ISO8859P1 and you are storing the data in CHAR, VARCHAR, or VARCHAR2 columns, the problem is that the database character set does not support the full set of Turkish characters. If a character is not in the ISO-8859-1 codepage layout, it cannot be stored properly in database columns governed by the database character set. If you want to store Turkish data in an ISO-8859-1 database, you could potentially use the workaround characters instead (i.e. substituting S for Ş). If you want to support the full range of Turkish characters, however, you would need to move to a character set that supported all those characters-- either ISO-8859-9 or UTF-8 would be relatively common.
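The "workaround characters" idea can be sketched as a transliteration applied before storing text into a WE8ISO8859P1 column. The mapping below is illustrative; only the six Turkish letters missing from ISO 8859-1 need fallbacks, since the rest (Ö, Ü, Ç, etc.) already exist in that code page:

```python
# ASCII fallbacks for the six Turkish letters absent from ISO 8859-1.
WORKAROUNDS = str.maketrans("ĞğİıŞş", "GgIiSs")

def latin1_safe(s: str) -> bytes:
    """Transliterate, then encode for an ISO 8859-1 (WE8ISO8859P1) column."""
    return s.translate(WORKAROUNDS).encode("iso-8859-1")

print(latin1_safe("ÖZER İNŞAAT TAAHHÜT VE").decode("iso-8859-1"))
# ÖZER INSAAT TAAHHÜT VE
```

This preserves legibility but is lossy by design, which is why moving to ISO 8859-9 or Unicode is the real fix when the full alphabet must survive.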

Changing the character set of your existing database is a non-trivial undertaking, however. There is a chapter in the Globalization Support Guide for whatever version of Oracle you are using that covers character set migration. If you want to move to a Unicode character set (which is generally the preferred approach rather than sticking with one of the single-byte ISO character sets), you can potentially leverage the Oracle Database Migration Assistant for Unicode.

At this point, you'll commonly see the objection that at least some applications are seeing the data "correctly", so the database must support the Turkish characters. The problem is that if you set up your NLS_LANG incorrectly, it is possible to bypass character set conversion entirely, meaning that whatever binary representation a character has on the client gets persisted to the database without modification. As long as every process that reads the data configures its NLS_LANG identically incorrectly, things may appear to work.

However, you will very quickly find some application that cannot configure its NLS_LANG identically incorrectly. A Java application, for example, will always convert the data from the database into a Unicode string internally. So if you are storing the data incorrectly in the database, as it sounds like you are, there is no way to get those applications to read it correctly.

If you are simply using SQL*Plus in a shell script to generate the file, it is almost certainly possible to configure your client incorrectly so that the data file appears correct. But it would be a very bad idea to let the existing misconfiguration persist. You open yourself up to much bigger problems in the future (if you're not already there): different clients inserting data in different character sets into the database, making the mess much harder to disentangle; tools like the Oracle export utility corrupting the data they export; or wanting to view the data with a tool that cannot be configured incorrectly. You are much better served getting the problem corrected early.
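The pass-through scenario described above can be simulated in a few lines of Python (illustrative only): when conversion is bypassed, the raw bytes round-trip for identically misconfigured clients but turn to garbage for any client that converts correctly.

```python
# When client and database NLS settings "match" (even though both are wrong),
# Oracle skips conversion and persists the client's raw bytes untouched.
stored = "İNŞAAT".encode("iso-8859-9")   # the Turkish client's raw bytes

# A client misconfigured the same way decodes them back and sees correct text:
assert stored.decode("iso-8859-9") == "İNŞAAT"

# A correctly configured client (e.g. a Java driver, which always converts
# through Unicode using the declared database character set) sees garbage:
assert stored.decode("iso-8859-1") == "ÝNÞAAT"
```

This is exactly why the data "looks right" in some tools and wrong in others: the bytes are the same everywhere, and only the decoding table differs.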

2012-09-27

