as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
  File "/usr/local/bin/wok", line 4, in
    Engine()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 104, in init
    self.load_pages()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 238, in load_pages
    p = Page.from_file(os.path.join(root, f), self.options, self, renderer)
  File "/usr/local/lib/python2.7/site-packages/wok/page.py", line 111, in from_file
    page.meta['content'] = page.renderer.render(page.original)
  File "/usr/local/lib/python2.7/site-packages/wok/renderers.py", line 46, in render
    return markdown(plain, Markdown.plugins)
  File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 419, in markdown
    return md.convert(text)
  File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 281, in convert
    source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 1: ordinal not in range(128). -- Note: Markdown only accepts unicode input!

如何解决?

在其他一些基于python的静态博客应用中,中文帖子可以成功发布。 比如这个应用:http://github.com/vrypan/bucket3。在我的网站http://bc3.brite.biz/,中文帖子可以成功发布。


当前回答

Got a same error and this solved my error. Thanks! python 2 and python 3 differing in unicode handling is making pickled files quite incompatible to load. So Use python pickle's encoding argument. Link below helped me solve the similar problem when I was trying to open pickled data from my python 3.7, while my file was saved originally in python 2.x version. https://blog.modest-destiny.com/posts/python-2-and-3-compatible-pickle-save-and-load/ I copy the load_pickle function in my script and called the load_pickle(pickle_file) while loading my input_data like this:

input_data = load_pickle("my_dataset.pkl")

load_pickle函数在这里:

def load_pickle(pickle_file):
    try:
        with open(pickle_file, 'rb') as f:
            pickle_data = pickle.load(f)
    except UnicodeDecodeError as e:
        with open(pickle_file, 'rb') as f:
            pickle_data = pickle.load(f, encoding='latin1')
    except Exception as e:
        print('Unable to load data ', pickle_file, ':', e)
        raise
    return pickle_data

其他回答

这是典型的“统一码问题”。我相信,解释这个问题已经超出了StackOverflow回答的范围,无法完全解释正在发生的事情。

这里有很好的解释。

简单地说,您已经将一个被解释为字节字符串的内容传递给了需要将其解码为Unicode字符的内容,但是默认的编解码器(ascii)失败了。

我给你们看的演示提供了避免这种情况的建议。让你的代码成为“unicode三明治”。在Python 2中,使用from __future__ import unicode_literals会有所帮助。

更新:如何修复代码:

OK - in your variable "source" you have some bytes. It is not clear from your question how they got in there - maybe you read them from a web form? In any case, they are not encoded with ascii, but python is trying to convert them to unicode assuming that they are. You need to explicitly tell it what the encoding is. This means that you need to know what the encoding is! That is not always easy, and it depends entirely on where this string came from. You could experiment with some common encodings - for example UTF-8. You tell unicode() the encoding as a second parameter:

source = unicode(source, 'utf-8')

我发现最好的方法是始终转换为unicode -但这很难实现,因为在实践中,您必须检查并将每个参数转换为您编写的包含某种形式的字符串处理的每个函数和方法。

因此,我提出了以下方法,以保证从任何一个输入中获得unicode或字节字符串。简而言之,包括并使用以下lambdas:

# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt) 
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)

例子:

text='Some string with codes > 127, like Zürich'
utext=u'Some string with codes > 127, like Zürich'
print "==> with _u, _uu"
print _u(text), type(_u(text))
print _u(utext), type(_u(utext))
print _uu(text, utext), type(_uu(text, utext))
print "==> with u8, uu8"
print _u8(text), type(_u8(text))
print _u8(utext), type(_u8(utext))
print _uu8(text, utext), type(_uu8(text, utext))
# with % formatting, always use _u() and _uu()
print "Some unknown input %s" % _u(text)
print "Multiple inputs %s, %s" % _uu(text, text)
# but with string.format be sure to always work with unicode strings
print u"Also works with formats: {}".format(_u(text))
print u"Also works with formats: {},{}".format(*_uu(text, text))
# ... or use _u8 and _uu8, because string.format expects byte strings
print "Also works with formats: {}".format(_u8(text))
print "Also works with formats: {},{}".format(*_uu8(text, text))

这里有更多关于这个的推理。

Encode将unicode对象转换为字符串对象。我认为你正在尝试编码一个字符串对象。首先将结果转换为unicode对象,然后将该unicode对象编码为'utf-8'。 例如

    result = yourFunction()
    result.decode().encode('utf-8')

我遇到了同样的问题,但它不适用于Python 3。我遵循了这个方法,解决了我的问题:

enc = sys.getdefaultencoding()
file = open(menu, "r", encoding = enc)

在读取/写入文件时,必须设置编码。

最后我明白了:

as3:/usr/local/lib/python2.7/site-packages# cat sitecustomize.py
# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

让我查一下:

as3:~/ngokevin-site# python
Python 2.7.6 (default, Dec  6 2013, 14:49:02)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.getdefaultencoding()
'utf8'
>>>

上面显示了python的默认编码是utf8。那么错误就不再存在了。