python多个条件组合,在列表理解中组合多个条件表达式

I utf-8 encode characters like \u2013 before inserting them into SQLite.

When I pull them out with a SELECT, they are back in their unencoded form, so I need to re-encode them if I want to do anything with them. In this case, I want to write the rows to a CSV.Before writing the rows to CSV, I want to first add hyperlink to any row whose value starts with 'http'. Some values will be ints, dates etc, so I do the folliowing conditional expression - list comprehension combo:

row = ['=HYPERLINK("%s")' % cell if 'http' in str(cell) else cell for cell in row].

The str() operation then results in the well-known:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in

position 15: ordinal not in range(128) error.

What I need then is to perform the .encode('utf-8') encoding again, but only on those elements in the lists that are strings to begin with. The following won't work (since not all elements are strings):

['=HYPERLINK("%s")' % cell if 'http' in str(cell).encode('utf8') else cell.encode('utf8') for cell in row]

TLDR: How do I expand /modify the list comprehension to only encode an element if it's a string?

解决方案

In general, work in terms of unicode as long as possible, and encoded unicode to

bytes (i.e. strs) only when necessary, such as writing output to a network

socket or file.

Do not mix strs with unicode -- although this is permitted in Python2,

it causes Python2 to implicitly convert str to unicode or vice versa as necessary using the ascii codec. If the implicit encoding or decoding fails, then you get a UnicodeEncodingError or UnicodedDecodingError, respectively, such as the one you are seeing.

Since cell is unicode, use u'=HYPERLINK("{}")'.format(cell) or u'=HYPERLINK("%s")' % cell instead of '=HYPERLINK("%s")' % cell. (Note that you may want to url-encode cell in case cell contains a double quote).

row = [u'=HYPERLINK("{}")'.format(cell)

if isinstance(cell, unicode) and cell.startswith(u'http') else cell

for cell in row]

Later, when/if you need to convert row to strs, you could use

row = [cell.encode('utf-8') if isinstance(cell, unicode) else str(cell)

for cell in row]

Alternatively, convert everything in row to strs first:

row = [cell.encode('utf-8') if isinstance(cell, unicode) else str(cell)

for cell in row]

and then you could use

row = ['=HYPERLINK("{}")'.format(cell) if cell.startswith('http') else cell

for cell in row]

Similarly, since row contains cells which are unicode, perform the test

if u'http' in cell

using the unicode u'http' instead of the str 'http', or better yet,

if isinstance(cell, unicode) and cell.startswith(u'http')

Although no error arises if you keep 'http' here (since the ascii codec can decode bytes in the 0-127 range), it is a good practice to use u'http' anyway since conforms to the rule never mix str and unicode, and supports mental clarity.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值