Why does the code below fail, and why does it succeed with the "latin-1" codec?
o = "a test of \xe9 char" #I want this to remain a string as this is what I am receiving
v = o.decode("utf-8")
The result is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 10: invalid continuation byte
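TL;DR: I recommend investigating the root cause of the problem in depth before switching codecs just to make the error go away.
I got this error because I was processing a large number of zip files that contained further zip files.
My workflow was as follows:
Read the zip
Read the child zip
Read the text from the child zip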
At some point I was hitting the encoding error above. Upon closer inspection, it turned out that some child zips erroneously contained further zips. Reading these zips as text led to some funky character representation that I could silence with encoding="latin-1", but which in turn caused issues further down the line. Since I was working with international data it was not completely foolish to assume it was an encoding problem (I had problems with 0xc2: Â), but in the end it was not the actual issue.
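A guard against this situation can be sketched roughly as follows. This is only a minimal Python 3 illustration: the helper names `read_texts` and `make_zip` and the in-memory sample archive are hypothetical, and it assumes the text members are supposed to be UTF-8. The idea is to test each member's content with `zipfile.is_zipfile` before decoding it, instead of trusting the file name.

```python
import io
import zipfile


def read_texts(zip_bytes, encoding="utf-8"):
    """Yield (name, text) for every non-zip member, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            data = archive.read(name)
            # Check the content, not the extension: a nested archive decoded
            # as text is what produces the misleading codec errors.
            if zipfile.is_zipfile(io.BytesIO(data)):
                for inner_name, text in read_texts(data, encoding):
                    yield name + "/" + inner_name, text
            else:
                yield name, data.decode(encoding)


def make_zip(members):
    """Build a zip archive in memory from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical sample: a child zip that wrongly contains yet another zip.
innermost = make_zip({"note.txt": "caf\u00e9".encode("utf-8")})
child = make_zip({"unexpected.zip": innermost, "readme.txt": b"hello"})
outer = make_zip({"child.zip": child})

texts = dict(read_texts(outer))
```

With a check like this, an unexpected archive is unpacked (or could be logged and skipped) rather than being pushed through a text codec, so a genuine `UnicodeDecodeError` still points at a real encoding problem.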
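UTF-8 decode errors like this one usually show up when the input contains byte values outside the range 0 to 127 that were not produced by a UTF-8 encoder.
The reasoning behind the exception is:
1) If the code point is < 128, each byte is the same as the value of the code point. This holds for ASCII, UTF-8 and Latin-1 alike.
2) If the code point is 128 or greater, the character cannot be represented in ASCII at all (Python raises a UnicodeEncodeError in that case), and UTF-8 represents it as a multi-byte sequence. Here 0xe9 is the lead byte of a three-byte UTF-8 sequence, but the byte that follows is a space (0x20) rather than a continuation byte in the range 0x80 to 0xBF, hence "invalid continuation byte".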
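To get around this limitation there is a whole family of single-byte encodings; one of the most widely used is Latin-1, also known as ISO-8859-1.
In ISO-8859-1, Unicode code points 0-255 are identical to the Latin-1 byte values, so converting to this encoding simply means converting each code point to a byte value; if a code point greater than 255 is encountered, the string cannot be encoded as Latin-1. Conversely, every byte decodes to some character, which is why decoding with Latin-1 never raises an error, even when the result is not the text that was intended.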
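The difference between the two codecs can be seen in a short sketch. It is written for Python 3, so a `bytes` literal stands in for the Python 2 `str` from the question; the variable names are only for illustration.

```python
raw = b"a test of \xe9 char"  # Python 3 bytes stand in for the Python 2 str

# 0xe9 announces a three-byte UTF-8 sequence, but a space follows it.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# Latin-1 maps every byte to the code point with the same value, so it cannot fail.
as_latin1 = raw.decode("latin-1")

# Had the sender used UTF-8, the same character would arrive as two bytes.
as_utf8 = "a test of \u00e9 char".encode("utf-8")

# Code points above 255 (here the euro sign) have no Latin-1 byte.
try:
    "\u20ac".encode("latin-1")
    latin1_encode_ok = True
except UnicodeEncodeError:
    latin1_encode_ok = False
```

So the byte string in the question was most likely produced with Latin-1 (or a close relative such as Windows-1252), not UTF-8; decoding succeeds with "latin-1" because that codec accepts any byte, not because it has been verified as the right one.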
If this exception occurs while you are trying to load a dataset, try this form:
df=pd.read_csv("top50.csv",encoding='ISO-8859-1')
Adding the encoding argument at the end of the call lets the dataset load.
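Rather than hard-coding the encoding, one option is to try UTF-8 first and fall back only if it fails. The sketch below is an illustration under assumptions: `pick_encoding` is a hypothetical helper, not part of pandas, and the sample bytes are made up. Because Latin-1 accepts any byte sequence, it has to come last, and a "latin-1" result is a guess rather than a confirmation.

```python
def pick_encoding(raw, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings fit")


# Made-up sample rows, once encoded as Latin-1 and once as UTF-8.
sample_latin1 = "Beyonc\u00e9,1\n".encode("latin-1")
sample_utf8 = "Beyonc\u00e9,1\n".encode("utf-8")

guess_latin1 = pick_encoding(sample_latin1)
guess_utf8 = pick_encoding(sample_utf8)
```

For a real file, the bytes would come from `open("top50.csv", "rb").read()`, and the returned name can be passed as the `encoding` argument of `pd.read_csv`.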
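In my case, I was trying to run a .py that executes a path/file.sql.
My solution was to change the encoding of file.sql to "UTF-8 without BOM", and it worked!
You can do this with Notepad++.
I'll leave part of my code here.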
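import sys
import psycopg2
# host, port, dbname, user and password are taken from the command line
con = psycopg2.connect(host=sys.argv[1], port=sys.argv[2], dbname=sys.argv[3],
                       user=sys.argv[4], password=sys.argv[5])
cursor = con.cursor()
# path (defined elsewhere) points at the .sql file saved as UTF-8 without BOM
sqlfile = open(path, 'r')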
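If re-saving the file is not an option, the BOM can also be handled when reading. The sketch below assumes Python 3 and covers only the case where the file really is UTF-8 with a BOM; the answer does not say whether the original file had a BOM or a legacy encoding, and this would not help with the latter. The temporary file is a hypothetical stand-in for file.sql.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for file.sql, saved as "UTF-8 with BOM".
with tempfile.NamedTemporaryFile("wb", suffix=".sql", delete=False) as handle:
    handle.write(codecs.BOM_UTF8 + "SELECT 'caf\u00e9';\n".encode("utf-8"))
    sql_path = handle.name

try:
    # Plain "utf-8" keeps the BOM as a leading U+FEFF character.
    with open(sql_path, "r", encoding="utf-8") as sqlfile:
        with_bom = sqlfile.read()

    # "utf-8-sig" drops the BOM if present and otherwise behaves like "utf-8".
    with open(sql_path, "r", encoding="utf-8-sig") as sqlfile:
        without_bom = sqlfile.read()
finally:
    os.remove(sql_path)
```

Passing an explicit `encoding` to `open` also avoids depending on the platform default, which on Windows is often not UTF-8.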