我有一个很大的数据集,我想阅读特定的列或放弃所有其他列。

data <- read.dta("file.dta")

我选择我不感兴趣的列:

var.out <- names(data)[!names(data) %in% c("iden", "name", "x_serv", "m_serv")]

然后我想做的事情是:

for(i in 1:length(var.out)) {
   paste("data$", var.out[i], sep="") <- NULL
}

删除所有不需要的列。这是最优解吗?


当前回答

我试图在使用包数据时删除一列。表并得到了意想不到的结果。我觉得下面的内容可能值得发表。只是一个小警告。

[编辑:Matthew…]

DF = read.table(text = "
     fruit state grade y1980 y1990 y2000
     apples Ohio   aa    500   100   55
     apples Ohio   bb      0     0   44
     apples Ohio   cc    700     0   33
     apples Ohio   dd    300    50   66
", sep = "", header = TRUE, stringsAsFactors = FALSE)

DF[ , !names(DF) %in% c("grade")]   # all columns other than 'grade'
   fruit state y1980 y1990 y2000
1 apples  Ohio   500   100    55
2 apples  Ohio     0     0    44
3 apples  Ohio   700     0    33
4 apples  Ohio   300    50    66

library('data.table')
DT = as.data.table(DF)

DT[ , !names(dat4) %in% c("grade")]    # not expected !! not the same as DF !!
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

DT[ , !names(DT) %in% c("grade"), with=FALSE]    # that's better
    fruit state y1980 y1990 y2000
1: apples  Ohio   500   100    55
2: apples  Ohio     0     0    44
3: apples  Ohio   700     0    33
4: apples  Ohio   300    50    66

基本上就是数据的语法。table与data.frame并不完全相同。实际上有很多不同之处,参见FAQ 1.1和FAQ 2.17。我警告过你!

其他回答

我试图在使用包数据时删除一列。表并得到了意想不到的结果。我觉得下面的内容可能值得发表。只是一个小警告。

[编辑:Matthew…]

DF = read.table(text = "
     fruit state grade y1980 y1990 y2000
     apples Ohio   aa    500   100   55
     apples Ohio   bb      0     0   44
     apples Ohio   cc    700     0   33
     apples Ohio   dd    300    50   66
", sep = "", header = TRUE, stringsAsFactors = FALSE)

DF[ , !names(DF) %in% c("grade")]   # all columns other than 'grade'
   fruit state y1980 y1990 y2000
1 apples  Ohio   500   100    55
2 apples  Ohio     0     0    44
3 apples  Ohio   700     0    33
4 apples  Ohio   300    50    66

library('data.table')
DT = as.data.table(DF)

DT[ , !names(dat4) %in% c("grade")]    # not expected !! not the same as DF !!
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

DT[ , !names(DT) %in% c("grade"), with=FALSE]    # that's better
    fruit state y1980 y1990 y2000
1: apples  Ohio   500   100    55
2: apples  Ohio     0     0    44
3: apples  Ohio   700     0    33
4: apples  Ohio   300    50    66

基本上就是数据的语法。table与data.frame并不完全相同。实际上有很多不同之处,参见FAQ 1.1和FAQ 2.17。我警告过你!

如果你确切地知道原始数据框架df中的列的名称:

cols_to_drop <- c("A", "B", "C")
df_clean = df[,!(names(df) %in% cols_to_drop)]

Src: https://www.listendata.com/2015/06/r-keep-drop-columns-from-data-frame.html

你也可以尝试dplyr包:

R> df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
R> df
  x y z u
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
R> library(dplyr)
R> dplyr::select(df2, -c(x, y))  # remove columns x and y
  z u
1 3 4
2 4 5
3 5 6
4 6 7
5 7 8
df = mtcars 
remove vs and am because they are categorical. In the dataset vs is in column number 8, am is in column number 9

Dfnum = df[,-c(8,9)]

我将代码更改为:

# read data
dat<-read.dta("file.dta")

# vars to delete
var.in<-c("iden", "name", "x_serv", "m_serv")

# what I'm keeping
var.out<-setdiff(names(dat),var.in)

# keep only the ones I want       
dat <- dat[var.out]

无论如何,朱巴的答案是我问题的最佳解决方案!