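How to convert a data frame column to numeric type?
Current answer
To convert a character column to numeric, you have to convert it to a factor first by applying: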
BankFinal1 <- transform(BankLoan, LoanApproval=as.factor(LoanApproval))
BankFinal1 <- transform(BankFinal1, LoanApp=as.factor(LoanApproval))
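You have to create two columns with the same data, because a single column cannot be converted to numeric. If you do the conversion in one step, it gives the following warning: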
transform(BankData, LoanApp=as.numeric(LoanApproval))
Warning message: In eval(substitute(list(...)), `_data`, parent.frame()) : NAs introduced by coercion
So, after creating two columns with the same data, apply:
BankFinal1 <- transform(BankFinal1, LoanApp = as.numeric(LoanApp),
                        LoanApproval = as.numeric(LoanApproval))
This converts the character column to numeric successfully.
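A minimal sketch of where the warning comes from and what the factor route returns. The original data is not shown, so this assumes LoanApproval holds labels such as "Yes" and "No"; the values below are invented for illustration.

```r
# Hypothetical stand-in for the BankLoan data
BankLoan <- data.frame(LoanApproval = c("Yes", "No", "Yes"),
                       stringsAsFactors = FALSE)

# Direct conversion: the labels are not numbers, so every value becomes NA
# (this is the "NAs introduced by coercion" warning, suppressed here)
direct <- suppressWarnings(as.numeric(BankLoan$LoanApproval))

# Factor route: as.numeric() returns the integer level codes
# (levels are sorted alphabetically, so "No" = 1 and "Yes" = 2)
codes <- as.numeric(as.factor(BankLoan$LoanApproval))

print(direct)
print(codes)
```

Under these assumptions the numbers produced by the factor route are level codes, not parsed values, and a single as.numeric(as.factor(...)) call is enough to obtain them.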
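Other answers
With hablar::convert
To easily convert multiple columns to different data types, you can use hablar::convert. Simple syntax: df %>% convert(num(a)) converts the column a of df to numeric.
Detailed example
Let's convert all columns of mtcars to character.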
library(dplyr)
df <- mtcars %>% mutate_all(as.character) %>% as_tibble()
> df
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
With hablar::convert:
library(hablar)
# Convert columns to integer, numeric and factor
df %>%
  convert(int(cyl, vs),
          num(disp:wt),
          fct(gear))
Result:
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <fct> <chr>
1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
2 21 6 160 110 3.9 2.88 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.44 1 0 3 1
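For comparison, the same conversions can be sketched in base R without hablar. This is only an illustration under the same setup (mtcars with every column stored as character); the variable num_cols is introduced here and is not part of the original answer.

```r
# Start from mtcars with every column stored as character
df <- as.data.frame(lapply(mtcars, as.character), stringsAsFactors = FALSE)

# Integer columns
df[c("cyl", "vs")] <- lapply(df[c("cyl", "vs")], as.integer)

# Numeric columns: disp through wt (disp, hp, drat, wt)
num_cols <- c("disp", "hp", "drat", "wt")
df[num_cols] <- lapply(df[num_cols], as.numeric)

# Factor column
df$gear <- factor(df$gear)

# Column classes after conversion
sapply(df, class)
```

Assigning with lapply() into a column subset keeps the data frame intact and touches only the named columns.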
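If the data frame has columns of several types, some character and some numeric, try the following to convert only the columns that contain numeric values to numeric:
for (i in seq_along(data)) {
  # Non-missing values of column i, coerced to numeric (non-numeric strings become NA)
  converted <- as.numeric(data[, i][!is.na(data[, i])])
  # Convert the column only if at least one value parsed as a number
  if (any(!is.na(converted))) {
    data[, i] <- as.numeric(data[, i])
  }
}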
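The loop above converts a column as soon as at least one value parses as a number, so a mostly textual column with one numeric entry is converted and its text turns into NA. A stricter variant, sketched below, converts a column only when every non-missing value parses. The helper is_all_numeric and the example data are hypothetical.

```r
# Hypothetical helper: TRUE when every non-missing value parses as a number
is_all_numeric <- function(x) {
  x <- as.character(x[!is.na(x)])
  length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
}

# Invented example data with mixed column types
data <- data.frame(id    = c("1", "2", "3"),
                   score = c("4.5", NA, "6"),
                   name  = c("a", "b", "7"),
                   stringsAsFactors = FALSE)

# Decide per column, then convert only the qualifying ones in place
to_convert <- vapply(data, is_all_numeric, logical(1))
data[to_convert] <- lapply(data[to_convert],
                           function(x) as.numeric(as.character(x)))

str(data)
```

Here id and score become numeric (the missing score stays NA), while name is left as character because "a" and "b" do not parse.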
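Though others have covered the topic pretty well, I'd like to add one more quick thought/hint. You can use a regular expression to check in advance whether character columns potentially consist only of numbers.
# Flag the columns of d whose values contain no letters
potential_numcol <- logical(ncol(d))
for (i in seq_along(names(d))) {
  potential_numcol[i] <- all(!grepl("[a-zA-Z]", d[, i]))
}
# and now just convert only the numeric ones, leaving the other columns as they are
d[potential_numcol] <- lapply(d[potential_numcol], as.numeric)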
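For more sophisticated regular expressions, and a neat way to learn and experience their power, see this really nice website: http://regexr.com/
To convert a data frame column to numeric, you just have to do:
Factor to numeric: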
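As one possible refinement of the "[a-zA-Z]" check, the sketch below matches the whole string against a number pattern instead of only ruling out letters. The pattern and the helper looks_numeric are assumptions made for illustration, not part of the original answer.

```r
# Pattern for an optionally signed decimal number with an optional exponent
num_pattern <- "^[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?$"

# Hypothetical helper: TRUE when every non-missing value matches the pattern
looks_numeric <- function(x) {
  x <- trimws(as.character(x[!is.na(x)]))
  length(x) > 0 && all(grepl(num_pattern, x))
}

looks_numeric(c("1", "-2.5", "3e4"))  # TRUE
looks_numeric(c("12", "$5"))          # FALSE: "$5" has no letters but is not a number
looks_numeric(c("10", "abc"))         # FALSE
```

A letters-only check would accept "$5" and reject "3e4"; anchoring the pattern with ^ and $ avoids both cases.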
data_frame$column <- as.numeric(as.character(data_frame$column))
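A short sketch of why the intermediate as.character() matters, using an invented factor whose labels are numbers:

```r
# Invented factor; its levels are sorted as strings: "10", "20", "5"
f <- factor(c("10", "5", "20"))

codes  <- as.numeric(f)                # level codes, not the values
values <- as.numeric(as.character(f))  # the values themselves
same   <- as.numeric(levels(f))[f]     # same result, converting each level only once

print(codes)
print(values)
```

Calling as.numeric() directly on the factor returns the positions of the levels, which is rarely what is wanted when the labels are themselves numbers.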
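While your question is strictly about numeric conversion, there are many conversions that are difficult to understand when starting out with R. I'll aim to address methods that help. This question is similar to this question.
Type conversion can be a pain in R because (1) factors can't be converted directly to numeric, they need to be converted to character class first, (2) dates are a special case that you typically need to deal with separately, and (3) looping across data frame columns can be tricky. Fortunately, the "tidyverse" has solved most of these issues.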
This solution uses mutate_all() to apply a function to all columns in a data frame. In this case, we want to apply the type.convert() function, which converts strings to numeric where it can. Because R loves factors (not sure why), character columns that should stay character get changed to factor. To fix this, the mutate_if() function is used to detect columns that are factors and change them back to character. Last, I wanted to show how lubridate can be used to change a timestamp stored as character into a date-time, because this is also often a sticking point for beginners.
library(tidyverse)
library(lubridate)
# Recreate data that needs to be converted to numeric, date-time, etc.
data_df
#> # A tibble: 5 × 9
#> TIMESTAMP SYMBOL EX PRICE SIZE COND BID BIDSIZ OFR
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2012-05-04 09:30:00 BAC T 7.8900 38538 F 7.89 523 7.90
#> 2 2012-05-04 09:30:01 BAC Z 7.8850 288 @ 7.88 61033 7.90
#> 3 2012-05-04 09:30:03 BAC X 7.8900 1000 @ 7.88 1974 7.89
#> 4 2012-05-04 09:30:07 BAC T 7.8900 19052 F 7.88 1058 7.89
#> 5 2012-05-04 09:30:08 BAC Y 7.8900 85053 F 7.88 108101 7.90
# Converting columns to numeric using "tidyverse"
data_df %>%
  mutate_all(type.convert) %>%
  mutate_if(is.factor, as.character) %>%
  mutate(TIMESTAMP = as_datetime(TIMESTAMP, tz = Sys.timezone()))
#> # A tibble: 5 × 9
#> TIMESTAMP SYMBOL EX PRICE SIZE COND BID BIDSIZ OFR
#> <dttm> <chr> <chr> <dbl> <int> <chr> <dbl> <int> <dbl>
#> 1 2012-05-04 09:30:00 BAC T 7.890 38538 F 7.89 523 7.90
#> 2 2012-05-04 09:30:01 BAC Z 7.885 288 @ 7.88 61033 7.90
#> 3 2012-05-04 09:30:03 BAC X 7.890 1000 @ 7.88 1974 7.89
#> 4 2012-05-04 09:30:07 BAC T 7.890 19052 F 7.88 1058 7.89
#> 5 2012-05-04 09:30:08 BAC Y 7.890 85053 F 7.88 108101 7.90
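The same idea can be sketched in base R, in case the tidyverse is not available. The data frame below rebuilds only the first two rows and four of the columns shown above, by hand, and the "UTC" time zone is an assumption chosen to keep the example reproducible.

```r
# First two rows of the example data, rebuilt by hand for illustration
data_df <- data.frame(
  TIMESTAMP = c("2012-05-04 09:30:00", "2012-05-04 09:30:01"),
  SYMBOL    = c("BAC", "BAC"),
  PRICE     = c("7.8900", "7.8850"),
  SIZE      = c("38538", "288"),
  stringsAsFactors = FALSE
)

# as.is = TRUE keeps non-numeric strings as character instead of factor,
# so no factor-to-character step is needed afterwards
data_df[] <- lapply(data_df, type.convert, as.is = TRUE)

# Parse the timestamp; the time zone "UTC" is an assumption
data_df$TIMESTAMP <- as.POSIXct(data_df$TIMESTAMP, tz = "UTC")

str(data_df)
```

Assigning into data_df[] replaces the columns while keeping the object a data frame, which sidesteps the column-looping difficulty mentioned earlier.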