PHP Chinese URL encoding, PHP: rawurldecode - Manual

Hi everybody =) My name is Javier and I'm from Argentina.

I've had a little issue with Latin characters like "ñ", "Ñ", "á", "é", "í", etc.

They are not decoded to single-byte Latin-1 characters by rawurldecode() (it returns the two UTF-8 bytes instead), so I've made this:

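<?php
function urlRawDecode($raw_url_encoded)
{
    # Hex conversion table
    $hex_table = array(
        0 => 0x00, 1 => 0x01, 2 => 0x02, 3 => 0x03,
        4 => 0x04, 5 => 0x05, 6 => 0x06, 7 => 0x07,
        8 => 0x08, 9 => 0x09, "A" => 0x0a, "B" => 0x0b,
        "C" => 0x0c, "D" => 0x0d, "E" => 0x0e, "F" => 0x0f
    );

    # Fixin' latin character problem
    # (the byte after %C3 is always in the range 80..BF)
    if (preg_match_all("/%C3%([89AB][0-9A-F])/i", $raw_url_encoded, $res))
    {
        # Full matches such as "%C3%B1", each one only once
        $res = array_values(array_unique($res[0]));
        $arr_unicoded = array();

        foreach ($res as $match) {
            # Two hex digits of the second byte, upper-cased for the table
            $value = strtoupper(substr($match, -2));

            $arr_unicoded[] = chr(
                # high nibble: 0xc OR the high nibble of the second byte
                (0xc0 | ($hex_table[substr($value, 0, 1)] << 4))
                # low nibble: the low nibble of the second byte, unchanged
                | (0x0f & $hex_table[substr($value, 1, 1)])
            );
        }

        $raw_url_encoded = str_replace($res, $arr_unicoded, $raw_url_encoded);
    }

    # Return decoded raw url encoded data
    return rawurldecode($raw_url_encoded);
}

print urlRawDecode("%C3%A1%C3%B1"); // output (as ISO-8859-1 bytes 0xe1 0xf1):
// áñ
?>

For example, you have the character "ñ" encoded like this: "%C3%B1".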

This is nothing more and nothing less than the two bytes 0xc3 and 0xb1.

They are binary numbers of the form HHHH LLLL, where HHHH = high half and LLLL = low half:

0xc3 = 1100 0011 (8-bit binary word), 0xb1 = 1011 0001 (8-bit binary word).
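A small sketch to check those bytes and bit patterns. It only uses the built-in rawurldecode(), bin2hex(), ord() and sprintf(); the variable names are just for illustration.

```php
<?php
// rawurldecode() turns each %XX into one raw byte, so "%C3%B1" becomes two bytes.
$bytes = rawurldecode("%C3%B1");

$hex  = bin2hex($bytes); // hex digits of both bytes
$bits = array();
foreach (str_split($bytes) as $byte) {
    // ord() gives the byte value; %08b prints it as 8 binary digits
    $bits[] = sprintf("%08b", ord($byte));
}

print $hex . "\n";                // c3b1
print implode(" ", $bits) . "\n"; // 11000011 10110001
```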

To convert this raw-encoded character to a single-byte (Latin-1) character we have to apply boolean operations to these two operands (0xc3 and 0xb1). Boolean algebra was defined by George Boole, and we need to use it here. The operations we are going to use are the logical OR ("|" or "pipe") and the logical AND ("&" or "ampersand").

A logical OR implies the following truth table:

a b (a OR b)
0 0     0
0 1     1   (a, b, or both must be true to get a true result)
1 0     1
1 1     1

A logical AND implies the following truth table:

a b (a AND b)
0 0     0
0 1     0
1 0     0
1 1     1   (both a AND b must be true to get a true result)
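In PHP the same tables apply bit by bit when | and & are used on integers. A minimal sketch using the two high nibbles from the "ñ" example (the variable names are only illustrative):

```php
<?php
$a = 0b1100; // 0xc, high nibble of 0xc3
$b = 0b1011; // 0xb, high nibble of 0xb1

// Each result bit follows the truth tables, column by column.
$or  = $a | $b; // 1100 OR  1011
$and = $a & $b; // 1100 AND 1011

printf("%04b OR  %04b = %04b\n", $a, $b, $or);  // 1100 OR  1011 = 1111
printf("%04b AND %04b = %04b\n", $a, $b, $and); // 1100 AND 1011 = 1000
```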

So, here we have to make a logical OR between the HIGH nibbles of 0xc3 and 0xb1. A nibble is half a byte (4 bits), so we have to make a logical OR between 1100 (0xc) and 1011 (0xb), and we are going to get this: 1111 (0xf). Then we have to keep the LOW nibble of the second byte, 0001 (0x1), which we do with a logical AND against the mask 1111 (0xf), and we are going to get this: 0001. So, if we want to see the final result, we have to put the HIGH and LOW nibbles in their byte positions, like this: 1111 0001 (0xf1), and that is nothing more and nothing less than "ñ" in ISO-8859-1 (to check this out, try the following: print(chr(0xf1));). The same steps turn "é" (%C3%A9) into 1110 1001 (0xe9).
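The nibble steps can be sketched as a small helper. The function name latin1FromC3Pair() is hypothetical, and the sketch assumes the first byte is 0xc3 and the second byte is in the range 0x80 to 0xbf.

```php
<?php
// Hypothetical helper: turns the second byte of a "%C3%XX" pair into one Latin-1 byte value.
function latin1FromC3Pair($second)
{
    $high = 0x0c | ($second >> 4); // shift right by 4 to get the high nibble, then OR it with 0xc
    $low  = 0x0f & $second;        // keep the low nibble of the second byte
    return ($high << 4) | $low;    // put both nibbles back in their byte positions
}

printf("%02x\n", latin1FromC3Pair(0xb1)); // f1 ("ñ")
printf("%02x\n", latin1FromC3Pair(0xa9)); // e9 ("é")
```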

This "<<" is the left-shift operator: it moves the bits to the left, so with 0001 << 2 we'll get 0100 (4); the right bits are filled with 0's.

<?php
print(
    chr(
        (0xc0 | (0x0b << 4)) | (0x0f & 0x01)
    )
);
// Output will be:
// ñ
// 1100 0000 OR 1011 0000 = 1111 0000 (0xf0)
// 0000 1111 AND 0000 0001 = 0000 0001 (0x01)
// 1111 0000 OR 0000 0001 = 1111 0001 (0xf1)
?>
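As a more general sketch (not the original function), the same idea can be written with the standard two-byte UTF-8 rule, where the character code is ((first & 0x1f) << 6) | (second & 0x3f). The name urlRawDecodeLatin1() is hypothetical, and the sketch assumes that only %C2 and %C3 pairs should be converted, since those are the two-byte sequences whose result fits in one byte.

```php
<?php
// Hypothetical variant of urlRawDecode(): converts %C2%XX and %C3%XX pairs to one byte each.
function urlRawDecodeLatin1($raw_url_encoded)
{
    $converted = preg_replace_callback(
        '/%(C[23])%([89AB][0-9A-F])/i',
        function ($m) {
            $first  = hexdec($m[1]); // 0xc2 or 0xc3
            $second = hexdec($m[2]); // 0x80..0xbf
            // Two-byte UTF-8 rule: 5 payload bits from the first byte, 6 from the second
            return chr((($first & 0x1f) << 6) | ($second & 0x3f));
        },
        $raw_url_encoded
    );

    // Everything else is decoded as usual
    return rawurldecode($converted);
}

print bin2hex(urlRawDecodeLatin1("%C3%A9%C3%B1%C2%A9")) . "\n"; // e9f1a9
```

Using hexdec() on the matched digits avoids the lookup table and accepts lower-case hex digits as well.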

PS: I'm so sorry about my English, I know it's horrible :P
