你可以使用str.split by空格(默认分隔符)和参数expand=True为DataFrame赋值给新列:
df = pd.DataFrame({'row': ['00000 UNITED STATES', '01000 ALABAMA',
'01001 Autauga County, AL', '01003 Baldwin County, AL',
'01005 Barbour County, AL']})
print (df)
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
df[['a','b']] = df['row'].str.split(n=1, expand=True)
print (df)
row a b
0 00000 UNITED STATES 00000 UNITED STATES
1 01000 ALABAMA 01000 ALABAMA
2 01001 Autauga County, AL 01001 Autauga County, AL
3 01003 Baldwin County, AL 01003 Baldwin County, AL
4 01005 Barbour County, AL 01005 Barbour County, AL
修改,如果需要删除原始列datafframe .pop
df[['a','b']] = df.pop('row').str.split(n=1, expand=True)
print (df)
a b
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
什么是一样的:
df[['a','b']] = df['row'].str.split(n=1, expand=True)
df = df.drop('row', axis=1)
print (df)
a b
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
如果get错误:
#remove n=1 for split by all whitespaces
df[['a','b']] = df['row'].str.split(expand=True)
ValueError:列的长度必须与键的长度相同
你可以检查,它返回4列DataFrame,而不是只有2:
print (df['row'].str.split(expand=True))
0 1 2 3
0 00000 UNITED STATES None
1 01000 ALABAMA None None
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
那么解决方案是通过join追加新的DataFrame:
df = pd.DataFrame({'row': ['00000 UNITED STATES', '01000 ALABAMA',
'01001 Autauga County, AL', '01003 Baldwin County, AL',
'01005 Barbour County, AL'],
'a':range(5)})
print (df)
a row
0 0 00000 UNITED STATES
1 1 01000 ALABAMA
2 2 01001 Autauga County, AL
3 3 01003 Baldwin County, AL
4 4 01005 Barbour County, AL
df = df.join(df['row'].str.split(expand=True))
print (df)
a row 0 1 2 3
0 0 00000 UNITED STATES 00000 UNITED STATES None
1 1 01000 ALABAMA 01000 ALABAMA None None
2 2 01001 Autauga County, AL 01001 Autauga County, AL
3 3 01003 Baldwin County, AL 01003 Baldwin County, AL
4 4 01005 Barbour County, AL 01005 Barbour County, AL
与删除原始列(如果还有其他列):
df = df.join(df.pop('row').str.split(expand=True))
print (df)
a 0 1 2 3
0 0 00000 UNITED STATES None
1 1 01000 ALABAMA None None
2 2 01001 Autauga County, AL
3 3 01003 Baldwin County, AL
4 4 01005 Barbour County, AL