如何修复:"UnicodeDecodeError: 'ascii'编解码器不能解码字节"

as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
  File "/usr/local/bin/wok", line 4, in
    Engine()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 104, in init
    self.load_pages()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 238, in load_pages
    p = Page.from_file(os.path.join(root, f), self.options, self, renderer)
  File "/usr/local/lib/python2.7/site-packages/wok/page.py", line 111, in from_file
    page.meta['content'] = page.renderer.render(page.original)
  File "/usr/local/lib/python2.7/site-packages/wok/renderers.py", line 46, in render
    return markdown(plain, Markdown.plugins)
  File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 419, in markdown
    return md.convert(text)
  File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 281, in convert
    source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 1: ordinal not in range(128). -- Note: Markdown only accepts unicode input!

如何解决?

在其他一些基于python的静态博客应用中，中文帖子可以成功发布。比如这个应用:http://github.com/vrypan/bucket3。在我的网站http://bc3.brite.biz/，中文帖子可以成功发布。

当前回答

为了在Ubuntu安装的操作系统层面上解决这个问题，请检查以下内容:

$ locale charmap

如果你得到

locale: Cannot set LC_CTYPE to default locale: No such file or directory

而不是

UTF-8

然后像这样设置LC_CTYPE和LC_ALL:

$ export LC_ALL="en_US.UTF-8"
$ export LC_CTYPE="en_US.UTF-8"

2019-02-20 15:53:11

其他回答

我得到了字符串“PastelerÃ-a Mallorca”同样的问题，我用:

unicode("PastelerÃa Mallorca", 'latin-1')

2017-05-29 08:47:33

在Django (1.9.10)/Python 2.7.5项目中，我经常出现UnicodeDecodeError异常;主要是当我试图向日志记录提供unicode字符串时。我为任意对象创建了一个辅助函数，基本上格式化为8位ascii字符串，并将表中不包含的任何字符替换为'?'。我认为这不是最好的解决方案，但由于默认编码是ascii(我不想改变它)，它会这样做:

encode_for_logging(c, encoding='ascii'): 如果isinstance(c, basestring): 返回c.encode(encoding， 'replace') elif isinstance(c, Iterable): C_ = [] 对于v (c) c_。追加(encode_for_logging (v,编码) 返回c_ 其他: 返回encode_for_logging (unicode (c)) ｀

2017-01-13 09:44:28

当字符串中有一些非ASCII字符，并且我们在没有正确解码的情况下对该字符串执行任何操作时，就会发生此错误。这帮我解决了我的问题。我正在阅读一个列ID，文本和解码字符的CSV文件，如下所示:

train_df = pd.read_csv("Example.csv")
train_data = train_df.values
for i in train_data:
    print("ID :" + i[0])
    text = i[1].decode("utf-8",errors="ignore").strip().lower()
    print("Text: " + text)

2018-07-26 06:47:12

这是我的解决方案，只需添加编码。用open(file, encoding='utf8')作为f

因为读取glove文件需要很长时间，所以我建议将glove文件转换为numpy文件。当你读取嵌入权重时，它将节省你的时间。

import numpy as np
from tqdm import tqdm


def load_glove(file):
    """Loads GloVe vectors in numpy array.
    Args:
        file (str): a path to a glove file.
    Return:
        dict: a dict of numpy arrays.
    """
    embeddings_index = {}
    with open(file, encoding='utf8') as f:
        for i, line in tqdm(enumerate(f)):
            values = line.split()
            word = ''.join(values[:-300])
            coefs = np.asarray(values[-300:], dtype='float32')
            embeddings_index[word] = coefs

    return embeddings_index

# EMBEDDING_PATH = '../embedding_weights/glove.840B.300d.txt'
EMBEDDING_PATH = 'glove.840B.300d.txt'
embeddings = load_glove(EMBEDDING_PATH)

np.save('glove_embeddings.npy', embeddings)

Gist链接:https://gist.github.com/BrambleXu/634a844cdd3cd04bb2e3ba3c83aef227

2018-09-11 06:06:40

我发现最好的方法是始终转换为unicode -但这很难实现，因为在实践中，您必须检查并将每个参数转换为您编写的包含某种形式的字符串处理的每个函数和方法。

因此，我提出了以下方法，以保证从任何一个输入中获得unicode或字节字符串。简而言之，包括并使用以下lambdas:

# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt) 
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)

例子:

text='Some string with codes > 127, like Zürich'
utext=u'Some string with codes > 127, like Zürich'
print "==> with _u, _uu"
print _u(text), type(_u(text))
print _u(utext), type(_u(utext))
print _uu(text, utext), type(_uu(text, utext))
print "==> with u8, uu8"
print _u8(text), type(_u8(text))
print _u8(utext), type(_u8(utext))
print _uu8(text, utext), type(_uu8(text, utext))
# with % formatting, always use _u() and _uu()
print "Some unknown input %s" % _u(text)
print "Multiple inputs %s, %s" % _uu(text, text)
# but with string.format be sure to always work with unicode strings
print u"Also works with formats: {}".format(_u(text))
print u"Also works with formats: {},{}".format(*_uu(text, text))
# ... or use _u8 and _uu8, because string.format expects byte strings
print "Also works with formats: {}".format(_u8(text))
print "Also works with formats: {},{}".format(*_uu8(text, text))

这里有更多关于这个的推理。

2015-01-02 17:20:06

如何修复:"UnicodeDecodeError: 'ascii'编解码器不能解码字节"

推荐文章

最新文章

标签