UnicodeDecodeError: 'utf-8' codec can't decode byte

This is my code:

for line in open('u.item'):
    pass  # Read each line

Whenever I run this code, it gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

I tried to fix this by passing an extra parameter to open(). The code looks like this:

for line in open('u.item', encoding='utf-8'):
    pass  # Read each line

But it gives the same error again. What should I do?

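Current answer

I'm leaving my solution here for others, so that this page turns up faster in Google searches for similar questions about UTF-8 errors.

I had a problem opening a .csv file. The error was: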

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 150: invalid continuation byte

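I opened the file in Notepad and counted to position 150: it was a Cyrillic character. I re-saved the file with the "Save as..." command, choosing the "UTF-8" encoding, and my program started working.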
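The manual re-save can also be scripted. The following is only a minimal sketch, assuming you already know (or can guess) the file's real encoding. The helper name reencode_to_utf8, the file names, and the choice of cp1251 (a common Windows encoding for Cyrillic text) are assumptions for illustration, not details from the answer.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding):
    """Read src_path as src_encoding and write the same text to dst_path as UTF-8."""
    # newline='' keeps the original line endings untouched in both directions.
    with open(src_path, encoding=src_encoding, newline='') as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)


# Demo on a throwaway file: the letter 'й' is the single byte 0xe9 in cp1251,
# which is exactly the byte the UTF-8 decoder complains about.
tmp_dir = tempfile.mkdtemp()
legacy_path = os.path.join(tmp_dir, 'legacy.csv')
utf8_path = os.path.join(tmp_dir, 'utf8.csv')
with open(legacy_path, 'wb') as f:
    f.write('id,name\n1,Андрей\n'.encode('cp1251'))

reencode_to_utf8(legacy_path, utf8_path, 'cp1251')

# The converted copy now opens cleanly as UTF-8.
with open(utf8_path, encoding='utf-8') as f:
    converted = f.read()
print(converted)
```

Writing to a separate output path keeps the original file intact in case the guessed source encoding turns out to be wrong.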

2021-08-03 05:39:11

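Other answers

Open the file in Notepad++ and use the "Encoding" menu to identify the file's encoding, or to convert it from ANSI to UTF-8 or to the ISO 8859-1 code page.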

2021-01-22 09:46:39

Try reading it with Pandas (m_cols is your own list of column names):

import pandas as pd
pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')
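Why latin-1 helps here: in ISO 8859-1 (Latin-1) the byte 0xe9 is the letter 'é', while in UTF-8 a 0xe9 byte starts a multi-byte sequence and must be followed by continuation bytes. The sketch below shows the difference on made-up sample bytes (they are not taken from u.item). Keep in mind that latin-1 maps every possible byte to some character, so it never raises, even when the file is really in a different encoding.

```python
# Made-up sample: "Café (1995)" encoded as Latin-1, so 'é' is the single byte 0xe9.
raw = b'Caf\xe9 (1995)'

# Decoding as UTF-8 fails because the byte after 0xe9 is a plain space,
# not a continuation byte.
try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = str(exc)
print(utf8_error)

# Decoding as Latin-1 succeeds and recovers the accented letter.
latin1_text = raw.decode('latin-1')
print(latin1_text)
```

The same encoding name works without Pandas: open('u.item', encoding='latin-1').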

2017-01-31 20:35:31

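Building on another Stack Overflow question and the earlier answers here, I'd like to add a helper for finding the right encoding.

If your script runs on Linux, you can get the encoding with the file command: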

file --mime-encoding <filename>

Here is a Python script that does this for you:

import sys
import subprocess

if len(sys.argv) < 2:
    print("Usage: {} <filename>".format(sys.argv[0]))
    sys.exit(1)

def find_encoding(fname):
    """Find the encoding of a file using file command
    """

    # find fullname of file command
    which_run = subprocess.run(['which', 'file'], stdout=subprocess.PIPE)
    if which_run.returncode != 0:
        print("Unable to find 'file' command ({})".format(which_run.returncode))
        return None

    file_cmd = which_run.stdout.decode().replace('\n', '')

    # run file command to get MIME encoding
    file_run = subprocess.run([file_cmd, '--mime-encoding', fname],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    if file_run.returncode != 0:
        print(file_run.stderr.decode(), file=sys.stderr)
        return None

    # output looks like "<filename>: <encoding>"; return the encoding name only
    return file_run.stdout.decode().rsplit(':', 1)[-1].strip()

# test
print("Encoding of {}: {}".format(sys.argv[1], find_encoding(sys.argv[1])))
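Where the file command is not available (on Windows, for example), a rough fallback is to try a few candidate encodings in order and keep the first one that decodes without error. This is only a sketch: the function name guess_encoding and the candidate list are assumptions, and a successful decode does not prove the guess is right. latin-1 in particular accepts every byte sequence, so it belongs last as a catch-all.

```python
import os
import tempfile


def guess_encoding(path, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return the first candidate encoding that decodes the whole file, else None."""
    with open(path, 'rb') as f:
        data = f.read()
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next one
        return encoding
    return None


# Demo on a throwaway file containing a lone 0xe9 byte (not valid UTF-8).
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(sample_path, 'wb') as f:
    f.write(b'Caf\xe9 (1995)\n')

guessed = guess_encoding(sample_path)
print(guessed)
```

Putting utf-8 first matters: it is strict enough that text in a legacy single-byte encoding rarely decodes as UTF-8 by accident, so a successful UTF-8 decode is a fairly strong signal.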

2021-08-30 05:19:54

You can try this approach:

open('u.item', encoding='utf8', errors='ignore')
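Be aware that errors='ignore' silently drops every byte that cannot be decoded, so characters go missing from the text. The sketch below compares it with errors='replace', which marks each bad byte with U+FFFD instead. The sample bytes are made up for illustration.

```python
# Made-up sample with one byte (0xe9) that is not valid UTF-8 in this position.
raw = b'Caf\xe9 (1995)'

# 'ignore' drops the undecodable byte without any trace.
ignored = raw.decode('utf-8', errors='ignore')

# 'replace' substitutes U+FFFD (the replacement character) so the loss stays visible.
replaced = raw.decode('utf-8', errors='replace')

print(repr(ignored))
print(repr(replaced))
```

open() accepts the same errors argument, so open('u.item', encoding='utf8', errors='replace') behaves the same way when reading a file.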

2020-05-23 19:53:03

If you are using Python 2, here is the solution:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    pass  # Do something

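This is needed because the built-in open() in Python 2 does not accept an encoding argument; using it gives the following error:

'encoding' is an invalid keyword argument for this function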

2017-03-03 17:32:48
