要列出Pandas DataFrame列

我正在根据另一列中的条件从一列中提取数据子集。

我可以得到正确的值，但它在pandas.core.frame.DataFrame中。我怎么把它转换成列表?

import pandas as pd

tst = pd.read_csv('C:\\SomeCSV.csv')

lookupValue = tst['SomeCol'] == "SomeValue"
ID = tst[lookupValue][['SomeCol']]
#How To convert ID to a list

你可以使用级数。to_list方法。

例如:

import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

print(df['a'].to_list())

输出:

[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

要删除副本，您可以执行以下操作之一:

>>> df['a'].drop_duplicates().to_list()
[1, 3, 5, 7, 4, 6, 8, 9]
>>> list(set(df['a'])) # as pointed out by EdChum
[1, 3, 4, 5, 6, 7, 8, 9]

2014-05-20 00:09:04

如果所有数据都是相同的dtype，那么上面的解决方案是很好的。Numpy数组是同构容器。当你计算df的时候。值，输出为numpy数组。因此，如果数据中有int型和float型，那么输出将是int型或float型，列将失去它们原始的dtype。考虑df

a  b 
0  1  4
1  2  5 
2  3  6 

a    float64
b    int64

如果你想保持原始的dtype，你可以这样做

row_list = df.to_csv(None, header=False, index=False).split('\n')

这将以字符串的形式返回每一行。

['1.0,4', '2.0,5', '3.0,6', '']

然后拆分每一行得到list of list。拆分后的每个元素都是unicode。我们需要转换它所需的数据类型。

def f(row_str): 
  row_list = row_str.split(',')
  return [float(row_list[0]), int(row_list[1])]

df_list_of_list = map(f, row_list[:-1])

[[1.0, 4], [2.0, 5], [3.0, 6]]

2016-04-21 22:10:58

你可以使用pandas.Series.tolist

例如:

import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

Run:

>>> df['a'].tolist()

你会得到

>>> [1, 2, 3]

2016-08-20 11:57:05

我想澄清几件事:

As other answers have pointed out, the simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top voted answer leads off with using pandas.Series.values.tolist() since as far as I can tell, it adds syntax/confusion with no added benefit. tst[lookupValue][['SomeCol']] is a dataframe (as stated in the question), not a series (as stated in a comment to the question). This is because tst[lookupValue] is a dataframe, and slicing it with [['SomeCol']] asks for a list of columns (that list that happens to have a length of 1), resulting in a dataframe being returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], then you are asking for just that one column rather than a list of columns, and thus you get a series back. You need a series to use pandas.Series.tolist(), so you should definitely skip the second set of brackets in this case. FYI, if you ever end up with a one-column dataframe that isn't easily avoidable like this, you can use pandas.DataFrame.squeeze() to convert it to a series. tst[lookupValue]['SomeCol'] is getting a subset of a particular column via chained slicing. It slices once to get a dataframe with only certain rows left, and then it slices again to get a certain column. You can get away with it here since you are just reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a series). Using the syntax from #4, you could reasonably do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()

演示代码:

import pandas as pd
df = pd.DataFrame({'colA':[1,2,1],
                   'colB':[4,5,6]})
filter_value = 1

print "df"
print df
print type(df)

rows_to_keep = df['colA'] == filter_value
print "\ndf['colA'] == filter_value"
print rows_to_keep
print type(rows_to_keep)

result = df[rows_to_keep]['colB']
print "\ndf[rows_to_keep]['colB']"
print result
print type(result)

result = df[rows_to_keep][['colB']]
print "\ndf[rows_to_keep][['colB']]"
print result
print type(result)

result = df[rows_to_keep][['colB']].squeeze()
print "\ndf[rows_to_keep][['colB']].squeeze()"
print result
print type(result)

result = df.loc[rows_to_keep, 'colB']
print "\ndf.loc[rows_to_keep, 'colB']"
print result
print type(result)

result = df.loc[df['colA'] == filter_value, 'colB']
print "\ndf.loc[df['colA'] == filter_value, 'colB']"
print result
print type(result)

ID = df.loc[rows_to_keep, 'colB'].tolist()
print "\ndf.loc[rows_to_keep, 'colB'].tolist()"
print ID
print type(ID)

ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
print "\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()"
print ID
print type(ID)

结果:

df
   colA  colB
0     1     4
1     2     5
2     1     6
<class 'pandas.core.frame.DataFrame'>

df['colA'] == filter_value
0     True
1    False
2     True
Name: colA, dtype: bool
<class 'pandas.core.series.Series'>

df[rows_to_keep]['colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df[rows_to_keep][['colB']]
   colB
0     4
2     6
<class 'pandas.core.frame.DataFrame'>

df[rows_to_keep][['colB']].squeeze()
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[df['colA'] == filter_value, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB'].tolist()
[4, 6]
<type 'list'>

df.loc[df['colA'] == filter_value, 'colB'].tolist()
[4, 6]
<type 'list'>

2017-02-16 18:08:44

要列出Pandas DataFrame列

推荐文章

最新文章

标签