Pandas read_csv: low_memory and dtype options

df = pd.read_csv('somefile.csv')

...gives an error:

…/pandas/io/parsers.py:1130: DtypeWarning: Columns (4,5,7,16) have mixed types. Specify dtype option on import or set low_memory=False.

Why is the dtype option related to low_memory, and why does low_memory=False help?

Current answer

Passing low_memory=False when importing the DataFrame worked for me. This is the only change I needed:

df = pd.read_csv('export4_16.csv',low_memory=False)
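
The pandas documentation describes low_memory=True (the default) as processing the file in internal chunks, which can lead to mixed type inference, whereas low_memory=False infers each column's type from the whole column at once. A minimal sketch follows, assuming a made-up in-memory CSV (csv_text) in place of the real export. The sample is far too small to trigger the DtypeWarning, so it only shows how the call is made and how to inspect the result.

```python
import io

import pandas as pd

# Hypothetical data standing in for the real file: column "b" mixes numbers and text.
csv_text = "a,b\n1,10\n2,20\n3,unknown\n"

# low_memory=False makes pandas infer each column's type from the whole column at once.
df = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Column "a" is read as integers; column "b" is kept as text because of "unknown".
print(df.dtypes)
```

Checking df.dtypes after loading is a quick way to see which columns ended up as text and may need an explicit dtype.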

2019-04-17 14:40:40

Other answers

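As fireynx mentioned earlier, if dtype is specified explicitly and the data contains mixed values that are incompatible with that dtype, loading will fail. As a workaround, I used a converter like this to replace the values with an incompatible data type, so the data can still be loaded.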

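import numpy as np
import pandas as pd

def conv(val):
    # Empty cells become 0
    if not val:
        return 0
    try:
        return np.float64(val)
    except (ValueError, TypeError):
        # Values that cannot be parsed as a float also become 0
        return np.float64(0)

df = pd.read_csv(csv_file, converters={'COL_A': conv, 'COL_B': conv})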
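
To see what a converter of this kind does to bad cells, here is a small self-contained sketch. It assumes a made-up in-memory CSV (csv_text) with one empty cell and one non-numeric cell; the column names COL_A and COL_B are taken from the answer above, everything else is illustrative.

```python
import io

import numpy as np
import pandas as pd


def conv(val):
    # Converters receive the raw cell text; empty or unparseable cells become 0.0.
    if not val:
        return np.float64(0)
    try:
        return np.float64(val)
    except ValueError:
        return np.float64(0)


# Hypothetical sample: row 2 has an empty COL_A and a non-numeric COL_B.
csv_text = "COL_A,COL_B\n1.5,2\n,oops\n3,4.25\n"

df = pd.read_csv(io.StringIO(csv_text), converters={"COL_A": conv, "COL_B": conv})
print(df)
```

Note that this approach silently turns bad values into 0, which may hide data problems; whether that is acceptable depends on the data.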

2016-09-02 18:17:01

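I ran into a similar problem while processing a huge CSV file (6 million rows). I had three issues:

1. The file contained strange characters (fixed using encoding)
2. The data types were not specified (fixed using the dtype option)
3. Even with the above, I still had a problem with file_format, which could not always be derived from the filename (fixed using try .. except ..)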

    from pathlib import Path
    import pandas as pd

    df = pd.read_csv(csv_file,sep=';', encoding = 'ISO-8859-1',
                     names=['permission','owner_name','group_name','size','ctime','mtime','atime','filename','full_filename'],
                     dtype={'permission':str,'owner_name':str,'group_name':str,'size':str,'ctime':object,'mtime':object,'atime':object,'filename':str,'full_filename':str,'first_date':object,'last_date':object})
    
    # Derive the file extension from each filename; fall back to an empty column on failure
    try:
        df['file_format'] = [Path(f).suffix[1:] for f in df.filename.tolist()]
    except Exception:
        df['file_format'] = ''
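
With the try/except above, a single failing filename leaves the whole file_format column empty. One possible variation, sketched here under the assumption that the failures come from missing filenames (which pandas represents as NaN rather than a string), handles each row separately. The helper name file_format and the sample DataFrame are hypothetical.

```python
from pathlib import Path

import pandas as pd


def file_format(name):
    # Missing filenames are not strings, and Path() cannot accept them.
    if not isinstance(name, str):
        return ""
    # suffix includes the leading dot, so drop it; names without an extension give "".
    return Path(name).suffix[1:]


# Hypothetical sample with one missing filename and one name without an extension.
df = pd.DataFrame({"filename": ["report.pdf", "archive.tar.gz", None, "README"]})
df["file_format"] = df["filename"].map(file_format)
print(df)
```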

2020-05-18 11:16:12

As the error message says, you should specify the data types when calling read_csv(). So you should write

file = pd.read_csv('example.csv', dtype='unicode')
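
Passing a single dtype for the whole file reads every column as text. If only some columns are problematic, the dtype option also accepts a mapping from column name to type. A minimal sketch, assuming a made-up in-memory CSV with hypothetical columns id, code and amount:

```python
import io

import pandas as pd

# Hypothetical sample: "code" must stay text so that the leading zeros survive.
csv_text = "id,code,amount\n1,007,3.5\n2,A12,4.0\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "int64", "code": str, "amount": "float64"},
)
print(df.dtypes)
```

With explicit types pandas does not need to guess, so numeric columns stay numeric while the ambiguous column is kept as text.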

2020-08-15 16:01:11

This worked well for me!

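# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on_bad_lines='warn' skips malformed lines and warns about each
dashboard_df = pd.read_csv(p_file, sep=';', on_bad_lines='warn', index_col=False, dtype='unicode')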

2022-11-17 12:12:21
