[Solved] How to set the character encoding to UTF-8 for an HTTP GET request made with the requests library in Python

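While working on:

[Solved] Python error when encoding unicode: UnicodeEncodeError gbk codec can't encode character u'\xe6' in position 0 illegal multibyte sequence

I wanted to:

set the character encoding to UTF-8 when making an HTTP GET request with requests.

Otherwise requests defaults to ISO-8859-1 here,

and the returned text comes back looking garbled.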
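To see why that default matters, here is a minimal Python 3 sketch with no network access. The JSON payload is made up for illustration; it simply decodes the same UTF-8 bytes once as ISO-8859-1 and once as UTF-8:

```python
import json

# Hypothetical response body: UTF-8 encoded JSON, as a server might send it.
raw = json.dumps({"city": "苏州"}, ensure_ascii=False).encode("utf-8")

# Decoding with the wrong charset never raises (ISO-8859-1 maps every byte
# to some character); it just yields the wrong characters.
wrong = json.loads(raw.decode("iso-8859-1"))
right = json.loads(raw.decode("utf-8"))

print(repr(wrong["city"]))  # six unrelated Latin-1 characters
print(right["city"])        # the intended two Chinese characters
```

Because the wrong decode succeeds silently, the only visible symptom is garbled text further down the line.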

Searched for:

Python request set character encoding

python – Encoding problems when scraping pages with the requests library – SegmentFault

Python + Requests encoding problem | Chu's BLoG

Developer Interface — Requests 2.11.1 documentation

Python requests character encoding

Looking at the code, it seemed like:

manually setting the

charset=UTF-8

parameter might be enough?

Quickstart - Requests 2.10.0 documentation

says you can set:

r.encoding = 'UTF-8'

which looks like it should work?

Python+Requests encoding detection bug

Python requests header charset

Advanced Usage - Requests 2.10.0 documentation

So I tried setting:

'Content-Type': 'application/json; charset=UTF-8'

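So, in:

/Users/crifan/dev/dev_root/daryun/SIPEvents/sourcecode/flask/wechat_sdk/lib/request.py

inside:

def request(self, method, url, access_token=None, **kwargs):

I changed the call to:

r = requests.request(
    method=method,
    headers={
        'Content-Type': 'application/json; charset=UTF-8'
    },
    url=url,
    **kwargs
)

and then tried:

headers={
    'Content-Type': 'application/json; charset=UTF-8',
    'Accept-Type': 'application/json; charset=UTF-8'
},

The problem persisted. These are request headers: Content-Type describes the body being sent, so it does not change how requests decodes the response ('Accept-Type' is not a standard HTTP header either).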

flask charset utf-8

flask mimetype

Python Flask, how to set content type – Stack Overflow

python – Forcing application/json MIME type in a view (Flask) – Stack Overflow

Python http request charset

Next, I added some print statements (Python 2) inside the try block:

try:
    print 'r.encoding', r.encoding
    print 'r.apparent_encoding', r.apparent_encoding
    print 'r.headers', r.headers
    print 'r.url', r.url
    print 'r.text', r.text
    print 'type(r)=', type(r)
    print 'r=', r
    response_json = r.json()

The output was:

r.encoding ISO-8859-1

r.apparent_encoding utf-8

r.headers {'date': 'Sun, 21 Aug 2016 09:20:45 GMT', 'connection': 'keep-alive', 'content-type': 'text/plain', 'content-length': '298'}

r.url https://api.weixin.qq.com/sns/userinfo?access_token=nW9PbJSaEL7nI2PUupnHWzNB7zip1bwDlbxbx5TwhO32uRiDLJ2ft7kQcmFARMnjFFMn2h0iaHTIAXi_MuxjqnJC9rqgPuR3oTV3x8vXDQw&openid=oswjmv4X0cCXcfkIwjoDfCkeTVVY&lang=zh_CN

r.text {"openid":"oswjmv4X0cCXcfkIwjoDfCkeTVVY","nickname":"ç¤¼è²","sex":1,"language":"zh_CN","city":"èå·","province":"æ±è","country":"ä¸å½","headimgurl":"http:\/\/wx.qlogo.cn\/mmopen\/ajNVdqHZLLDYtIJicNl7MjwZK5c1lxAJZ253c9v3JzDib7GeE5OFrWiaRqsK1ruW1HmGaziaYETV5vQhIIbic6wHKFQ\/0","privilege":[]}

type(r)= <class 'requests.models.Response'>

r= <Response [200]>

So I tried setting encoding to UTF-8:

r.encoding = 'UTF-8'

response_json = r.json()

With that, the content is decoded as UTF-8 and prints correctly:

DEBUG in sipevents [/usr/share/nginx/html/SIPEvents/sipevents.py:90]:

type(respUserInfoDict)=<type 'dict'>, respUserInfoDict={u'province': u'\u6c5f\u82cf', u'openid': u'oswjmv4X0cCXcfkIwjoDfCkeTVVY', u'headimgurl': u'http://wx.qlogo.cn/mmopen/ajNVdqHZLLDYtIJicNl7MjwZK5c1lxAJZ253c9v3JzDib7GeE5OFrWiaRqsK1ruW1HmGaziaYETV5vQhIIbic6wHKFQ/0', u'language': u'zh_CN', u'city': u'\u82cf\u5dde', u'country': u'\u4e2d\u56fd', u'sex': 1, u'privilege': [], u'nickname': u'\u793c\u8c8c'}

--------------------------------------------------------------------------------

DEBUG in sipevents [/usr/share/nginx/html/SIPEvents/sipevents.py:101]:

province=江苏, city=苏州, country=中国, nickname=礼貌

--------------------------------------------------------------------------------

DEBUG in sipevents [/usr/share/nginx/html/SIPEvents/sipevents.py:102]:

type(province)=<type 'unicode'>

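At this point I did not bother (and would not know how) to follow:

Python+Requests encoding detection bug

and apply some monkey patch.

Understanding the encoding involved is enough.

When I get a chance, I should check:

whether the latest requests release already fixes this.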

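Because earlier logs had also shown:

r.encoding ISO-8859-1

r.apparent_encoding ascii

I settled, for now, on:

#r.encoding = 'UTF-8'

r.encoding = r.apparent_encoding

response_json = r.json()

This ensures, for the time being, that:

whenever encoding and apparent_encoding disagree, apparent_encoding is used for decoding,

which yields the correct unicode string,

so Chinese text displays properly.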
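The "use apparent_encoding when the two disagree" rule can be sketched as a small helper. This is a Python 3 illustration under assumptions: apply_detected_encoding and FakeResponse are hypothetical names, and FakeResponse only stands in for the two attributes of requests.Response used here. Since apparent_encoding is a guess made from the body bytes and may be None, the sketch keeps the declared encoding in that case:

```python
def apply_detected_encoding(resp):
    """Switch resp.encoding to resp.apparent_encoding when the two disagree."""
    detected = resp.apparent_encoding
    # Compare case-insensitively so 'UTF-8' and 'utf-8' count as the same.
    if detected and (resp.encoding or "").lower() != detected.lower():
        resp.encoding = detected
    return resp


class FakeResponse:
    """Hypothetical stand-in exposing the same two attributes as requests.Response."""

    def __init__(self, encoding, apparent_encoding):
        self.encoding = encoding
        self.apparent_encoding = apparent_encoding


resp = apply_detected_encoding(FakeResponse("ISO-8859-1", "utf-8"))
print(resp.encoding)  # utf-8
```

With a real response object, the helper would be called before r.json() or r.text. Detection on very short bodies can be unreliable, so hard-coding 'UTF-8' may be the safer choice when the API is known to return UTF-8.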

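[Summary]

The problem here:

the content returned by requests.request came back garbled.

The cause:

When the requests library decodes the data of an HTTP response, the encoding it uses is "taken from the Content-Type response header: if a charset is present, it is recognized correctly; if there is no charset but the type contains text, ISO-8859-1 is assumed".

Here, the returned content is actually UTF-8,

which corresponds to:

r.encoding ISO-8859-1

r.apparent_encoding utf-8

but requests still decodes with r.encoding, i.e. ISO-8859-1, so the content ends up garbled.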
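The header rule quoted above can be approximated with the standard library alone. This is a Python 3 sketch of the described behaviour, not the actual requests source; encoding_from_content_type is a hypothetical name:

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Approximate the quoted rule: explicit charset wins, text/* falls back."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")  # None when no charset parameter exists
    if charset:
        return charset
    if "text" in msg.get_content_type():
        return "ISO-8859-1"
    return None


print(encoding_from_content_type("text/plain"))                       # ISO-8859-1
print(encoding_from_content_type("application/json; charset=UTF-8"))  # UTF-8
```

The first call matches the situation in this post: the server replied with content-type text/plain and no charset, so ISO-8859-1 was assumed even though the body was UTF-8.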

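Solutions:

1. The clumsy way, more a workaround than a fix:

If the goal is only to turn the garbled text back into readable text, you can:

encode the unicode string that was wrongly decoded as ISO-8859-1 back into bytes with ISO-8859-1, which recovers the original UTF-8 bytes,

then decode those bytes as UTF-8 to get a correct Unicode string.

The code:

encodedProvinceIso8859_1 = province.encode('ISO-8859-1')

app.logger.debug('encodedProvinceIso8859_1=%s', encodedProvinceIso8859_1)

decodedProvinceUnicode = encodedProvinceIso8859_1.decode('utf-8')

app.logger.debug('decodedProvinceUnicode=%s', decodedProvinceUnicode)

This turns the garbled province back into the proper string:

province=江苏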
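The same round trip as a self-contained Python 3 sketch. The name repair_mojibake is hypothetical, and the garbled input is produced locally rather than taken from a real response:

```python
def repair_mojibake(garbled):
    """Undo a UTF-8 body that was wrongly decoded as ISO-8859-1."""
    # Step 1: ISO-8859-1 encoding restores the original bytes one-to-one.
    # Step 2: those bytes are then decoded with the charset they really use.
    return garbled.encode("iso-8859-1").decode("utf-8")


# Simulate the bug: UTF-8 bytes decoded with the wrong charset.
garbled = "江苏".encode("utf-8").decode("iso-8859-1")
print(repair_mojibake(garbled))  # 江苏
```

This only works when the text really was UTF-8 decoded as ISO-8859-1. A string containing characters outside Latin-1 raises UnicodeEncodeError, and bytes that are not valid UTF-8 raise UnicodeDecodeError.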

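2. The standard way:

Make requests use the correct encoding, UTF-8 here, when it decodes the response (the JSON):

r.encoding = 'UTF-8'

or,

somewhat smarter:

r.encoding = r.apparent_encoding

response_json = r.json()

This decodes with the real encoding and yields the correct string.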

The relevant code here:

vim /root/Envs/SIPEvents/lib/python2.7/site-packages/wechat_sdk-0.6.4-py2.7.egg/wechat_sdk/lib/request.py

that is, in:

wechat_sdk-0.6.4-py2.7.egg/wechat_sdk/lib/request.py

in the method:

def request(self, method, url, access_token=None, **kwargs):
    # ...
    try:
        #r.encoding = 'UTF-8'
        r.encoding = r.apparent_encoding
        response_json = r.json()

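Please credit the source when reposting: 在路上 » [Solved] How to set the character encoding to UTF-8 for an HTTP GET request made with the requests library in Python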
