【Solved】Python script error: libs/thirdparty\chardet\universaldetector.py:90: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

【Problem】

While working on one of my Python scripts, I upgraded the chardet library it uses from 1.0.1 to 1.1. After that, running the script printed this warning:

    LINE 810  : INFO     [0001] http://againinput4.blog.163.com/blog/static/172799491201091513711591
    LINE 886  : INFO       Title = intro
    libs/thirdparty\chardet\universaldetector.py:90: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      if aBuf[:len(chunk)] == chunk:
    LINE 1617 : INFO     Exporting items at last ...

【Solution Process】

1. I switched back to the old chardet 1.0.1, and the script ran normally again, without this warning.

2. Next I compared the relevant code in the two versions:

(1) The relevant code in universaldetector.py of chardet 1.0.1:

    def feed(self, aBuf):
        if self.done: return

        charmap = (
            # EF BB BF  UTF-8 with BOM
            ('\xEF\xBB\xBF', {'encoding': "UTF-8", 'confidence': 1.0}),
            # FF FE 00 00  UTF-32, little-endian BOM
            ('\xFF\xFE\x00\x00', {'encoding': "UTF-32LE", 'confidence': 1.0}),
            # 00 00 FE FF  UTF-32, big-endian BOM
            ('\x00\x00\xFE\xFF', {'encoding': "UTF-32BE", 'confidence': 1.0}),
            # FE FF 00 00  UCS-4, unusual octet order BOM (3412)
            (u'\xFE\xFF\x00\x00', {'encoding': "X-ISO-10646-UCS-4-3412", 'confidence': 1.0}),
            # 00 00 FF FE  UCS-4, unusual octet order BOM (2143)
            (u'\x00\x00\xFF\xFE', {'encoding': "X-ISO-10646-UCS-4-2143", 'confidence': 1.0}),
            # FF FE  UTF-16, little endian BOM
            ('\xFF\xFE', {'encoding': "UTF-16LE", 'confidence': 1.0}),
            # FE FF  UTF-16, big endian BOM
            ('\xFE\xFF', {'encoding': "UTF-16BE", 'confidence': 1.0}),
        )

        aLen = len(aBuf)
        if not aLen: return

        if not self._mGotData:
            # If the data starts with BOM, we know it is UTF
            for chunk, result in charmap:
                if aBuf[:len(chunk)] == chunk:
                    self.result = result
                    break
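The table-driven style of this listing is fine in itself; what matters is that every BOM in the table has the same type as the buffer it is compared against. As a sketch only (Python 3, not chardet's actual code; `BOM_TABLE` and `detect_bom` are hypothetical names), here is the same lookup with every BOM held as `bytes`, using the standard `codecs` BOM constants where they exist. Longer BOMs stay ahead of their 2-byte prefixes, as in the original table, because `FF FE 00 00` also starts with `FF FE`.

```python
import codecs

# Every BOM is a bytes object, so comparing it with a bytes buffer never
# mixes types. Order matters: 4-byte BOMs come before the 2-byte BOMs
# that are their prefixes (FF FE 00 00 starts with FF FE).
BOM_TABLE = (
    (codecs.BOM_UTF8, "UTF-8"),                       # EF BB BF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),                # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32BE"),                # 00 00 FE FF
    (b"\xFE\xFF\x00\x00", "X-ISO-10646-UCS-4-3412"),  # unusual octet order
    (b"\x00\x00\xFF\xFE", "X-ISO-10646-UCS-4-2143"),  # unusual octet order
    (codecs.BOM_UTF16_LE, "UTF-16LE"),                # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),                # FE FF
)


def detect_bom(buf):
    """Hypothetical helper: return a chardet-style result dict, or None."""
    if not isinstance(buf, (bytes, bytearray)):
        raise TypeError("expected bytes, got %s" % type(buf).__name__)
    for bom, encoding in BOM_TABLE:
        if buf.startswith(bom):
            return {"encoding": encoding, "confidence": 1.0}
    return None


print(detect_bom(codecs.BOM_UTF8 + b"abc"))
print(detect_bom(b"\xFF\xFE\x00\x00rest")["encoding"])  # UTF-32LE, not UTF-16LE
print(detect_bom(b"<html>"))                            # no BOM
```

Raising `TypeError` for text input is a design choice of this sketch: it turns a silent type mismatch into an immediate error at the call site.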

(2) The relevant code in universaldetector.py of chardet 1.1:

    def feed(self, aBuf):
        if self.done: return

        aLen = len(aBuf)
        if not aLen: return
        
        if not self._mGotData:
            # If the data starts with BOM, we know it is UTF
            if aBuf[:3] == '\xEF\xBB\xBF':
                # EF BB BF  UTF-8 with BOM
                self.result = {'encoding': "UTF-8", 'confidence': 1.0}
            elif aBuf[:4] == '\xFF\xFE\x00\x00':
                # FF FE 00 00  UTF-32, little-endian BOM
                self.result = {'encoding': "UTF-32LE", 'confidence': 1.0}
            elif aBuf[:4] == '\x00\x00\xFE\xFF': 
                # 00 00 FE FF  UTF-32, big-endian BOM
                self.result = {'encoding': "UTF-32BE", 'confidence': 1.0}
            elif aBuf[:4] == '\xFE\xFF\x00\x00':
                # FE FF 00 00  UCS-4, unusual octet order BOM (3412)
                self.result = {'encoding': "X-ISO-10646-UCS-4-3412", 'confidence': 1.0}
            elif aBuf[:4] == '\x00\x00\xFF\xFE':
                # 00 00 FF FE  UCS-4, unusual octet order BOM (2143)
                self.result = {'encoding': "X-ISO-10646-UCS-4-2143", 'confidence': 1.0}
            elif aBuf[:2] == '\xFF\xFE':
                # FF FE  UTF-16, little endian BOM
                self.result = {'encoding': "UTF-16LE", 'confidence': 1.0}
            elif aBuf[:2] == '\xFE\xFF':
                # FE FF  UTF-16, big endian BOM
                self.result = {'encoding': "UTF-16BE", 'confidence': 1.0}

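At first glance the two listings do the same thing. The difference is in how the BOMs are written. The line named in the warning, `if aBuf[:len(chunk)] == chunk:`, appears only in the first listing, and two of that listing's entries (the UCS-4 3412 and 2143 BOMs) are unicode literals (`u'...'`) while all the others are byte strings. Python 2 emits exactly this UnicodeWarning when it compares a byte string with a unicode string and cannot decode the byte side with the default ASCII codec. So the first listing looks like the code that actually produced the warning, which suggests the version labels on the two listings may be swapped.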
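The warning comes from comparing a byte-string slice of `aBuf` with a unicode literal. Python 2 tries an implicit ASCII decode, warns when that fails, and treats the values as unequal. Python 3 does no implicit conversion at all: `bytes` and `str` never compare equal, and the mismatch is silent unless the interpreter runs with `-b`, which emits a `BytesWarning`. A minimal Python 3 illustration, with made-up sample data:

```python
# Python 3 illustration of the bytes-vs-text mismatch behind the warning.
# The sample data is made up; only the first four bytes (a BOM) matter.
buf = b"\xFE\xFF\x00\x00" + b"data"   # raw input, as fed to a detector
bom_text = "\xFE\xFF\x00\x00"         # text literal, like u'...' in Python 2
bom_bytes = b"\xFE\xFF\x00\x00"       # byte literal, like '...' in Python 2

# Mixed types: always unequal in Python 3 (Python 2 warns, then says False).
text_match = buf[:len(bom_text)] == bom_text
# Same types: an ordinary byte-by-byte comparison.
bytes_match = buf[:len(bom_bytes)] == bom_bytes
# latin-1 maps U+0000..U+00FF to the byte with the same value,
# so encoding the text literal that way yields the intended bytes.
fixed_match = buf[:4] == bom_text.encode("latin-1")

print(text_match, bytes_match, fixed_match)   # False True True
```

In other words, such a BOM entry can never match, whichever Python version runs it; the only difference is whether you get told about it.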

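3. I also looked online for explanations of this warning. Many of them say that before using chardet you should first decode the string with something like yourStr.decode("utf16-be"), and that this solves the problem.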

I didn't feel like digging into it in detail, though. After all, the warning only appeared because chardet was upgraded from 1.0.1 to 1.1.
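For reference, this is what a decode step like the one suggested in point 3 does to a BOM-prefixed UTF-16 buffer in Python 3 (the sample bytes are made up). A codec with an explicit byte order, such as `utf-16-be`, keeps the BOM as an ordinary U+FEFF character, while the generic `utf-16` codec uses the BOM to choose the byte order and then drops it.

```python
import codecs

# Hypothetical sample: "hi" encoded as UTF-16 big-endian, with a BOM in front.
raw = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")   # b'\xfe\xff\x00h\x00i'

# Explicit byte order: the BOM is decoded as a U+FEFF character.
explicit = raw.decode("utf-16-be")
# Generic codec: the BOM selects the byte order and is then removed.
generic = raw.decode("utf-16")

print(repr(explicit), repr(generic))   # '\ufeffhi' 'hi'
```

Either way, decoding requires already knowing (or guessing) the encoding, and the result is text rather than the raw bytes a BOM check works on.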

【Summary】

Perhaps I need to handle encoding/decoding myself before calling chardet, or perhaps the chardet library itself is not written carefully enough.

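For my purposes, though, chardet only needs to be good enough, so I have no interest in debugging this in detail right now; I'll come back to it if the need arises.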

So the solution for now is:

Don't upgrade chardet to 1.1 for now. The old 1.0.1 is good enough and does not produce this warning.

Please credit the source when reposting: 在路上 » 【Solved】Python script error: libs/thirdparty\chardet\universaldetector.py:90: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
