How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte"

as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
  File "/usr/local/bin/wok", line 4, in <module>
    Engine()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 104, in __init__
    self.load_pages()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 238, in load_pages
    p = Page.from_file(os.path.join(root, f), self.options, self, renderer)
  File "/usr/local/lib/python2.7/site-packages/wok/page.py", line 111, in from_file
    page.meta['content'] = page.renderer.render(page.original)
  File "/usr/local/lib/python2.7/site-packages/wok/renderers.py", line 46, in render
    return markdown(plain, Markdown.plugins)
  File "/usr/local/lib/python2.7/site-packages/markdown/__init__.py", line 419, in markdown
    return md.convert(text)
  File "/usr/local/lib/python2.7/site-packages/markdown/__init__.py", line 281, in convert
    source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 1: ordinal not in range(128). -- Note: Markdown only accepts unicode input!

How can I fix this?

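In some other Python-based static blog apps, Chinese posts publish without problems, for example with this app: http://github.com/vrypan/bucket3. On my site http://bc3.brite.biz/, Chinese posts publish successfully.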

Current answer

I ran into this error in Python 2.7. It happened when I tried to run many different Python programs, but I managed to reproduce it with this simple script:

#!/usr/bin/env python

import subprocess
import sys

result = subprocess.Popen([u'svn', u'info'])
if not callable(getattr(result, "__enter__", None)) and not callable(getattr(result, "__exit__", None)):
    print("foo")
print("bar")

When it works, it should print 'foo' and 'bar', possibly with an error message if you are not inside an svn working copy.

When it fails, it prints 'UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 39: ordinal not in range(128)'.

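After trying to regenerate locales and many of the other solutions posted in this question, I learned that the error occurred because my PATH environment variable contained a special character (ĺ). After fixing PATH in ~/.bashrc, then logging out of my session and back in (apparently sourcing ~/.bashrc did not work), the problem went away.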
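If you suspect the same cause, one quick way to find it is to list the environment variables whose values contain non-ASCII characters. This is a minimal Python 3 sketch; find_non_ascii_env is a hypothetical helper, not something from the answer. On POSIX, bytes that cannot be decoded show up in os.environ as surrogate-escape characters (code points above 127), so they are flagged too.

```python
import os


def find_non_ascii_env(env=None):
    """Return {name: value} for variables whose value has a non-ASCII character.

    Hypothetical diagnostic helper; defaults to the current process environment.
    """
    if env is None:
        env = os.environ
    return {
        name: value
        for name, value in env.items()
        if any(ord(ch) > 127 for ch in value)
    }


if __name__ == "__main__":
    # Print each suspicious variable with repr() so odd characters are visible.
    for name, value in sorted(find_non_ascii_env().items()):
        print(name, "=", repr(value))
```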

2021-01-25 14:23:05

Other answers

Here is my solution: just add the encoding when opening the file, i.e. with open(file, encoding='utf8') as f

Because reading the GloVe file takes a long time, I suggest converting it to a NumPy file once. That saves you time every time you load the embedding weights.

import numpy as np
from tqdm import tqdm


def load_glove(file):
    """Loads GloVe vectors in numpy array.
    Args:
        file (str): a path to a glove file.
    Return:
        dict: a dict of numpy arrays.
    """
    embeddings_index = {}
    with open(file, encoding='utf8') as f:
        for i, line in tqdm(enumerate(f)):
            values = line.split()
            word = ''.join(values[:-300])
            coefs = np.asarray(values[-300:], dtype='float32')
            embeddings_index[word] = coefs

    return embeddings_index

# EMBEDDING_PATH = '../embedding_weights/glove.840B.300d.txt'
EMBEDDING_PATH = 'glove.840B.300d.txt'
embeddings = load_glove(EMBEDDING_PATH)

np.save('glove_embeddings.npy', embeddings)

Gist link: https://gist.github.com/BrambleXu/634a844cdd3cd04bb2e3ba3c83aef227
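One detail when reading the file back: np.save stores a Python dict as a 0-d object array via pickle, so np.load needs allow_pickle=True (the default has been False since NumPy 1.16.3), and .item() unwraps the dict. A minimal round trip, using a tiny made-up dict in place of the real 300-dimensional GloVe vectors; only enable allow_pickle for files you trust.

```python
import os
import tempfile

import numpy as np

# Stand-in for the output of load_glove(); real vectors have 300 dimensions.
embeddings = {
    "中文": np.zeros(3, dtype="float32"),
    "word": np.ones(3, dtype="float32"),
}

path = os.path.join(tempfile.mkdtemp(), "glove_embeddings.npy")
np.save(path, embeddings)  # the dict is pickled into a 0-d object array

# allow_pickle=True is required for object arrays; .item() returns the dict.
loaded = np.load(path, allow_pickle=True).item()
print(sorted(loaded))
```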

2018-09-11 06:06:40

This is the classic "Unicode problem". I believe that fully explaining what is going on is beyond the scope of a StackOverflow answer.

There is a good explanation here.

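In short, you have passed something that is interpreted as a byte string to something that needs to decode it into Unicode characters, and the default codec (ascii) is failing.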

The presentation I pointed you to gives advice on avoiding this: make your code a "unicode sandwich". In Python 2, using from __future__ import unicode_literals helps.
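A minimal Python 3 sketch of the "unicode sandwich": decode bytes to text as soon as they enter the program, work only with text in the middle, and encode back to bytes only on output. The helper names to_text and to_bytes and the choice of UTF-8 are assumptions for illustration. The first part reproduces the failure from the traceback: decoding non-ASCII bytes with the ascii codec.

```python
def to_text(data, encoding="utf-8"):
    """Input boundary: decode bytes to str; pass str through unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def to_bytes(text, encoding="utf-8"):
    """Output boundary: encode str to bytes."""
    return text.encode(encoding)


raw = "中文 post".encode("utf-8")  # bytes, e.g. as read from a file

# Decoding with ascii fails on the first non-ASCII byte, like the traceback.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    print("ascii failed:", exc)

text = to_text(raw)        # top slice: bytes -> str
processed = text.upper()   # filling: work only with str
out = to_bytes(processed)  # bottom slice: str -> bytes
```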

Update: how to fix the code:

OK - in your variable "source" you have some bytes. It is not clear from your question how they got in there - maybe you read them from a web form? In any case, they are not encoded with ascii, but python is trying to convert them to unicode assuming that they are. You need to explicitly tell it what the encoding is. This means that you need to know what the encoding is! That is not always easy, and it depends entirely on where this string came from. You could experiment with some common encodings - for example UTF-8. You tell unicode() the encoding as a second parameter:

source = unicode(source, 'utf-8')
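In Python 3 the equivalent of unicode(source, 'utf-8') is source.decode('utf-8') (or str(source, 'utf-8')). To "experiment with some common encodings" programmatically, one option is to try a short candidate list in order. The helper decode_with_fallback and its candidate list are assumptions. A successful decode only proves the bytes are valid in that encoding, not that it is the right one, and latin-1 accepts any byte sequence, so it has to come last.

```python
def decode_with_fallback(data, encodings=("utf-8", "gb18030", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes `data`.

    Hypothetical helper: the candidate list is an assumption, not a detector.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback("中文".encode("gb18030"))
print(used, text)
```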

2014-01-15 05:04:19

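In a Django (1.9.10) / Python 2.7.5 project I frequently got UnicodeDecodeError exceptions, mostly when passing unicode strings to logging. I wrote a helper for arbitrary objects that essentially formats them as ASCII byte strings, replacing any character outside the ASCII table with '?'. I don't think it's the best solution, but since the default encoding is ascii (and I don't want to change it), it does the job: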

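from collections import Iterable

def encode_for_logging(c, encoding='ascii'):
    if isinstance(c, basestring):
        # text -> bytes; characters the encoding can't represent become '?'
        return c.encode(encoding, 'replace')
    elif isinstance(c, Iterable):
        # lists, tuples, sets, dicts (keys only): encode each element
        c_ = []
        for v in c:
            c_.append(encode_for_logging(v, encoding))
        return c_
    else:
        # any other object: convert to unicode first, then encode
        return encode_for_logging(unicode(c), encoding)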
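The helper above is Python 2 only (basestring and unicode no longer exist). A sketch of a Python 3 port of the same idea, assuming str replaces basestring and collections.abc.Iterable replaces collections.Iterable. One adjustment is needed: bytes are Iterable in Python 3 and yield ints, so they are handled before the generic Iterable branch.

```python
from collections.abc import Iterable


def encode_for_logging(c, encoding="ascii"):
    """Python 3 port (sketch): return bytes in `encoding`, or a list of them.

    Characters the encoding cannot represent are replaced with '?'.
    """
    if isinstance(c, str):
        return c.encode(encoding, "replace")
    if isinstance(c, (bytes, bytearray)):
        # Decode leniently (bad bytes -> U+FFFD), then re-encode so that
        # non-ASCII bytes also come out as '?'.
        return bytes(c).decode(encoding, "replace").encode(encoding, "replace")
    if isinstance(c, Iterable):
        # Lists, tuples, sets, dicts (keys only): encode each element.
        return [encode_for_logging(v, encoding) for v in c]
    # Any other object: convert to str first, then encode.
    return encode_for_logging(str(c), encoding)
```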

2017-01-13 09:44:28

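This error occurs when a string contains non-ASCII characters and we operate on it without decoding it properly. Decoding explicitly solved it for me: I was reading a CSV file with ID and text columns and decoding the characters like this: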

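import pandas as pd

train_df = pd.read_csv("Example.csv")
train_data = train_df.values
for i in train_data:
    print("ID :" + str(i[0]))  # str() in case the ID column is numeric
    # Python 2: i[1] is a byte str, so decode it explicitly, dropping invalid bytes
    text = i[1].decode("utf-8", errors="ignore").strip().lower()
    print("Text: " + text)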
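In Python 3 the values are already str, so .decode() on them fails; the usual approach is to decode once when the file is opened (pandas.read_csv also takes an encoding parameter). A standard-library sketch of the same loop, assuming the CSV has a header row with columns named ID and text (hypothetical names; adjust to your file). errors="ignore" mirrors the decode(..., errors="ignore") call above and silently drops undecodable bytes.

```python
import csv


def read_texts(path):
    """Return a list of (id, normalized_text) pairs from a CSV file.

    Assumes hypothetical header names 'ID' and 'text'.
    """
    rows = []
    # Decode the whole file once, at the boundary; bad bytes are dropped.
    with open(path, encoding="utf-8", errors="ignore", newline="") as f:
        for row in csv.DictReader(f):
            rows.append((row["ID"], row["text"].strip().lower()))
    return rows
```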

2018-07-26 06:47:12

Finally I figured it out:

as3:/usr/local/lib/python2.7/site-packages# cat sitecustomize.py
# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

Let me check:

as3:~/ngokevin-site# python
Python 2.7.6 (default, Dec  6 2013, 14:49:02)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.getdefaultencoding()
'utf8'
>>>

The above shows that Python's default encoding is now utf8, and the error no longer occurs.
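For comparison, this sitecustomize.py approach is specific to Python 2: Python 3 removed sys.setdefaultencoding, and sys.getdefaultencoding() always returns 'utf-8' there. What can still vary in Python 3 is the encoding open() uses when no encoding= is given, which comes from the locale. A small check of the relevant settings:

```python
import locale
import sys

# Encoding for implicit str <-> bytes conversions (always 'utf-8' in Python 3).
print("default:", sys.getdefaultencoding())
# Encoding used for file names and, on POSIX, environment variables.
print("filesystem:", sys.getfilesystemencoding())
# What open() uses when no encoding= argument is passed (locale-dependent).
print("open() default:", locale.getpreferredencoding(False))
```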

2014-01-17 16:03:41
