Excel to CSV with UTF-8 encoding

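I have an Excel file containing some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV, it mangles the "special" Spanish characters that are not ASCII. It also seems to do this to the left and right quotation marks and long dashes, which appear to come from the original user having created the Excel file on a Mac.

Since CSV is just a text file, I'm sure it can handle UTF-8 encoding, so I'm guessing this is an Excel limitation. I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.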
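The symptom described here is typical of an encoding mismatch rather than of CSV itself. As a rough illustration only, and assuming (the question does not confirm this) that Excel wrote the file in a legacy single-byte code page such as Windows-1252 while the importer reads UTF-8, this Python sketch shows the same characters surviving a UTF-8 round trip but not a mismatched one. The sample string and the `round_trip` helper are made up for the example.

```python
# Hypothetical sample: n-tilde, curly quotes and an em dash (escaped for clarity).
text = "se\u00f1or \u201cquoted\u201d \u2014 a\u00f1o"


def round_trip(value, write_codec, read_codec):
    """Encode with one codec and decode with another, as a mismatched import would."""
    return value.encode(write_codec).decode(read_codec, errors="replace")


# Same codec on both sides: nothing is lost.
print(round_trip(text, "utf-8", "utf-8") == text)   # prints True

# Written as Windows-1252 (assumed legacy export), read as UTF-8: characters are damaged.
print(round_trip(text, "cp1252", "utf-8") == text)  # prints False
```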

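Current answer

I've written a small Python script that exports worksheets as UTF-8.

You just have to provide the Excel file as the first parameter, followed by the sheets you would like to export. If you do not provide any sheets, the script exports all worksheets present in the Excel file.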

#!/usr/bin/env python3

# Export worksheets from an .xlsx file to UTF-8 encoded CSV files.
# Updated for Python 3 and current openpyxl (read_only, sheetnames, workbook[name]).

import csv
import sys

from openpyxl import load_workbook


def get_all_sheets(excel_file):
    # Return the names of all worksheets in the workbook.
    workbook = load_workbook(excel_file, read_only=True, data_only=True)
    return list(workbook.sheetnames)


def csv_from_excel(excel_file, sheets):
    workbook = load_workbook(excel_file, read_only=True, data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")

        try:
            worksheet = workbook[worksheet_name]
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)

        # Open in text mode with an explicit encoding; newline='' lets the
        # csv module control line endings. The file is closed per sheet.
        with open(worksheet_name + '.csv', 'w', encoding='utf-8', newline='') as csv_file:
            wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
            for row in worksheet.iter_rows():
                wr.writerow([cell.value for cell in row])
        print(" ... done")


if not 2 <= len(sys.argv) <= 3:
    print("Call with " + sys.argv[0] + " <xlsx file> [comma separated list of sheets to export]")
    sys.exit(1)
else:
    if len(sys.argv) == 3:
        sheets = sys.argv[2].split(',')
    else:
        sheets = get_all_sheets(sys.argv[1])
    assert sheets  # at least one sheet name is required
    csv_from_excel(sys.argv[1], sheets)

2016-07-07 10:00:02

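Other answers

I've found OpenOffice's spreadsheet application, Calc, to be really good at handling CSV data.

In the "Save As..." dialog, click "Format Options" to choose a different encoding for the CSV. LibreOffice works the same way, AFAIK.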
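If the conversion has to run without the dialog, LibreOffice can also be driven from the command line. The sketch below is an assumption layered on top of this answer, not something it describes: it builds a `soffice --headless --convert-to` command whose filter options request comma (44) as the separator, double quote (34) as the text delimiter and UTF-8 (76) as the character set. The function names and the `soffice` executable name are illustrative, and the filter options are worth checking against your LibreOffice version.

```python
import subprocess

# Filter options: field separator 44 (comma), text delimiter 34 (double quote),
# character set 76 (UTF-8 in LibreOffice's numbering).
CSV_UTF8_FILTER = "csv:Text - txt - csv (StarCalc):44,34,76"


def build_convert_command(xlsx_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless LibreOffice CSV export."""
    return [soffice, "--headless", "--convert-to", CSV_UTF8_FILTER,
            "--outdir", out_dir, xlsx_path]


def convert_to_utf8_csv(xlsx_path, out_dir, soffice="soffice"):
    """Run the conversion; raises CalledProcessError if LibreOffice fails."""
    subprocess.run(build_convert_command(xlsx_path, out_dir, soffice), check=True)
```

Calling `convert_to_utf8_csv("data.xlsx", "out")` should leave `out/data.csv` behind. By default this kind of export typically covers a single sheet, so multi-sheet workbooks may need extra handling.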

2010-11-19 00:59:34

I needed to automate this process on my Mac. I originally tried using catdoc/xls2csv as suggested by mpowered, but xls2csv had trouble detecting the original encoding of the document and not all documents were the same. What I ended up doing was setting the default webpage output encoding to be UTF-8 and then providing the files to Apple's Automator, applying the Convert Format of Excel Files action to convert to Web Page (HTML). Then using PHP, DOMDocument and XPath, I queried the documents and formatted them to CSV.

Here is the PHP script (process.php):

<?php
$pi = pathinfo($argv[1]);
$file = $pi['dirname'] . '/' . $pi['filename'] . '.csv';
$fp = fopen($file,'w+');
$doc = new DOMDocument;
$doc->loadHTMLFile($argv[1]);
$xpath = new DOMXPath($doc);
$table = [];
foreach($xpath->query('//tr') as $row){
    $_r = [];
    foreach($xpath->query('td',$row) as $col){
        $_r[] = trim($col->textContent);
    }
    fputcsv($fp,$_r);
}
fclose($fp);
?>

And here is the shell command I used to convert the HTML documents to CSV:

find . -name '*.htm' | xargs -I{} php ./process.php {}

This is a really, really roundabout way to do it, but it is the most reliable method I have found.

2016-06-07 18:46:01

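1. Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are in UTF-16 (note, not UTF-8). However, the new "*.txt" file is tab-delimited, not comma-delimited, and therefore is not a true CSV.
2. (Optional) Unless you can use a tab-delimited file for import, open it in your favorite text editor and replace the tab characters with commas ",".
3. Import your *.txt file in the target application. Make sure it can accept UTF-16.

If UTF-16 has been properly implemented, with support for non-BMP code points, you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favorite method of doing so.

I use this procedure to import data from Excel into Moodle.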
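One way to do the tab-to-comma and UTF-16-to-UTF-8 steps together is a few lines of Python. This is a minimal sketch, not part of the original procedure: it assumes the "Unicode Text" export starts with a byte order mark (which the `utf-16` codec uses to detect byte order), and the function name is hypothetical. Going through the `csv` module instead of a plain search-and-replace means fields that already contain commas get quoted rather than split.

```python
import csv


def utf16_tsv_to_utf8_csv(src_path, dst_path):
    """Convert a UTF-16, tab-delimited text file into a UTF-8, comma-delimited CSV."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # default dialect: commas, minimal quoting
        for row in reader:
            writer.writerow(row)
```

For example, `utf16_tsv_to_utf8_csv("export.txt", "export.csv")` would rewrite a saved "Unicode Text" file as a CSV next to it (both file names are placeholders).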

2013-03-19 12:51:59

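A simple workaround is to use Google Spreadsheets. Paste (values only, if you have complex formulas) or import the sheet, then download it as CSV. I just tried a few characters and it works rather well.

NOTE: Google Sheets has limitations when importing. See here.

NOTE: Be careful with sensitive data when using Google Sheets.

EDIT: Another alternative: there are solutions that use a VB macro or add-ins to force saving as UTF-8. I have not tried any of them, but they sound reasonable.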

2010-11-19 01:08:33

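I ran into the same problem and found this post through Google. None of the above worked for me. In the end I converted my Unicode .xls to .xml (choose Save As... XML Spreadsheet 2003), which produced the correct characters. Then I wrote code to parse the XML and extract the content for my use.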
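The answer does not show its parsing code. As one possible sketch in Python (not the author's code), the following reads the first worksheet of an "XML Spreadsheet 2003" file with `xml.etree.ElementTree` and writes the rows as UTF-8 CSV. It assumes the usual SpreadsheetML layout (`Workbook/Worksheet/Table/Row/Cell/Data` in the `urn:schemas-microsoft-com:office:spreadsheet` namespace) and honors the `ss:Index` attribute that marks skipped empty cells. Skipped rows, merged cells and additional sheets are ignored, and the function names are illustrative.

```python
import csv
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def rows_from_spreadsheetml(xml_source):
    """Yield each row of the first worksheet as a list of strings."""
    root = ET.parse(xml_source).getroot()
    worksheet = root.find("ss:Worksheet", NS)
    if worksheet is None:
        raise ValueError("no Worksheet element found")
    for row in worksheet.iterfind("ss:Table/ss:Row", NS):
        values = []
        for cell in row.iterfind("ss:Cell", NS):
            # ss:Index is the 1-based column of this cell; pad any skipped columns.
            index = cell.get("{%s}Index" % SS)
            if index is not None:
                while len(values) < int(index) - 1:
                    values.append("")
            data = cell.find("ss:Data", NS)
            values.append("".join(data.itertext()) if data is not None else "")
        yield values


def spreadsheetml_to_csv(xml_source, csv_path):
    """Write the first worksheet to csv_path as UTF-8 CSV."""
    with open(csv_path, "w", encoding="utf-8", newline="") as out:
        csv.writer(out).writerows(rows_from_spreadsheetml(xml_source))
```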

2015-09-01 15:57:16
