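Is there a way to widen the display of output in either interactive or script-execution mode?
Specifically, I am using the describe() function on a Pandas DataFrame. When the DataFrame is five columns (labels) wide, I get the descriptive statistics I want. However, if the DataFrame has any more columns, the statistics are suppressed and something like this is returned: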
>> Index: 8 entries, count to max
>> Data columns:
>> x1 8 non-null values
>> x2 8 non-null values
>> x3 8 non-null values
>> x4 8 non-null values
>> x5 8 non-null values
>> x6 8 non-null values
>> x7 8 non-null values
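The "8" value is given whether there are 6 or 7 columns. What does the "8" refer to?
I have already tried dragging the IDLE window larger, as well as increasing the "Configure IDLE" width options, to no avail.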
import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)
SentenceA = "William likes Piano and Piano likes William"
SentenceB = "Sara likes Guitar"
SentenceC = "Mamoosh likes Piano"
SentenceD = "William is a CS Student"
SentenceE = "Sara is kind"
SentenceF = "Mamoosh is kind"
bowA = SentenceA.split(" ")
bowB = SentenceB.split(" ")
bowC = SentenceC.split(" ")
bowD = SentenceD.split(" ")
bowE = SentenceE.split(" ")
bowF = SentenceF.split(" ")
# Creating a set consisting of all words
wordSet = set(bowA).union(set(bowB)).union(set(bowC)).union(set(bowD)).union(set(bowE)).union(set(bowF))
print("Set of all words is: ", wordSet)
# Initiating dictionary with 0 value for all BOWs
wordDictA = dict.fromkeys(wordSet, 0)
wordDictB = dict.fromkeys(wordSet, 0)
wordDictC = dict.fromkeys(wordSet, 0)
wordDictD = dict.fromkeys(wordSet, 0)
wordDictE = dict.fromkeys(wordSet, 0)
wordDictF = dict.fromkeys(wordSet, 0)
for word in bowA:
    wordDictA[word] += 1
for word in bowB:
    wordDictB[word] += 1
for word in bowC:
    wordDictC[word] += 1
for word in bowD:
    wordDictD[word] += 1
for word in bowE:
    wordDictE[word] += 1
for word in bowF:
    wordDictF[word] += 1
# Printing term frequency
print("SentenceA TF: ", wordDictA)
print("SentenceB TF: ", wordDictB)
print("SentenceC TF: ", wordDictC)
print("SentenceD TF: ", wordDictD)
print("SentenceE TF: ", wordDictE)
print("SentenceF TF: ", wordDictF)
print(pd.DataFrame([wordDictA, wordDictB, wordDictB, wordDictC, wordDictD, wordDictE, wordDictF]))
Output:
   CS  Guitar  Mamoosh  Piano  Sara  Student  William  a  and  is  kind  likes
0   0       0        0      2     0        0        2  0    1   0     0      2
1   0       1        0      0     1        0        0  0    0   0     0      1
2   0       1        0      0     1        0        0  0    0   0     0      1
3   0       0        1      1     0        0        0  0    0   0     0      1
4   1       0        0      0     0        1        1  1    0   1     0      0
5   0       0        0      0     1        0        0  0    0   1     1      0
6   0       0        1      0     0        0        0  0    0   1     1      0
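If you would rather not change the display options globally, the same two settings can be applied temporarily with pd.option_context. This is only a minimal sketch: the 12-column random DataFrame and the names rng, df and wide are made up for illustration, and the width of 1000 is simply the value used above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 20 rows, 12 numeric columns named x1..x12.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(20, 12)),
    columns=[f"x{i}" for i in range(1, 13)],
)

# The options apply only inside the with-block and are restored afterwards.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    wide = repr(df.describe())

print(wide)
```

Passing None for display.max_columns removes the column limit, and the large display.width keeps pandas from wrapping the table across several blocks.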
This is not strictly an answer, but keep in mind that we can use df.describe().transpose(), df.head(n).transpose(), or df.tail(n).transpose().
I also find it easier to read headers as a column when they are structured:
header1_xxx,
header2_xxx,
header3_xxx,
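I think terminals and applications handle vertical scrolling more naturally, should scrolling be needed after transposing.
Headers are usually longer than their values, and putting them all in one column (the index) minimizes their impact on the total table width.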
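To make the transposing idea concrete, here is a small sketch. The DataFrame and its header names (header1_xxx and so on) are invented for illustration. After transposing, each original column becomes a row, and the eight statistics from describe() (count to max) become the columns, so the table stays narrow no matter how many columns the data has.

```python
import numpy as np
import pandas as pd

# Hypothetical data with long, structured header names.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(20, 12)),
    columns=[f"header{i}_xxx" for i in range(1, 13)],
)

# One row per original column, one column per statistic.
summary = df.describe().transpose()
print(summary)
```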
Finally, other descriptions of df can be merged in as well; here is one possible idea:
import pandas as pd

def df_overview(df: pd.DataFrame, max_colwidth=25, head=3, tail=3):
    # One row per column: summary statistics, dtype, then the first and last rows.
    return (
        df.describe([0.5]).transpose()
        .merge(df.dtypes.rename('dtypes'), left_index=True, right_index=True)
        .merge(df.head(head).transpose(), left_index=True, right_index=True)
        .merge(df.tail(tail).transpose(), left_index=True, right_index=True)
        .to_string(max_colwidth=max_colwidth, float_format=lambda x: "{:.4G}".format(x))
    )
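A quick usage sketch, with the function repeated so that it runs on its own. The sample DataFrame (x1, x2, label) is made up. One thing to be aware of: describe() only covers numeric columns by default and merge() defaults to an inner join, so a non-numeric column such as label should drop out of the overview.

```python
import pandas as pd

def df_overview(df: pd.DataFrame, max_colwidth=25, head=3, tail=3):
    # Same function as above, copied here to keep the sketch self-contained.
    return (
        df.describe([0.5]).transpose()
        .merge(df.dtypes.rename('dtypes'), left_index=True, right_index=True)
        .merge(df.head(head).transpose(), left_index=True, right_index=True)
        .merge(df.tail(tail).transpose(), left_index=True, right_index=True)
        .to_string(max_colwidth=max_colwidth, float_format=lambda x: "{:.4G}".format(x))
    )

# Hypothetical sample data: two numeric columns and one text column, ten rows.
df = pd.DataFrame({
    "x1": [float(i) for i in range(10)],
    "x2": [i * i / 3 for i in range(10)],
    "label": list("abcdefghij"),
})

overview = df_overview(df)  # a plain string, ready to print or log
print(overview)
```

With ten rows and the default head=3 and tail=3, the head and tail columns do not overlap. On a DataFrame with fewer than six rows they would, and merge() would then add its _x and _y suffixes to the duplicated column labels.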