如何将数据帧列转换为数字类型?


当前回答

要将字符转换为数字,您必须通过应用将其转换为因数

BankFinal1 <- transform(BankLoan,   LoanApproval=as.factor(LoanApproval))
BankFinal1 <- transform(BankFinal1, LoanApp=as.factor(LoanApproval))

您必须用相同的数据创建两列,因为一列不能转换为数字。如果你做一次转换,它会给出如下错误

transform(BankData, LoanApp=as.numeric(LoanApproval))

警告信息: 在eval(替代(列表 (...)), `_ 数据”,parent.frame ()): 胁迫引入的NAs

所以,在做了两列相同的数据应用后

BankFinal1 <- transform(BankFinal1, LoanApp      = as.numeric(LoanApp), 
                                    LoanApproval = as.numeric(LoanApproval))

它将成功地将字符转换为数字

其他回答

要将字符转换为数字,您必须通过应用将其转换为因数

BankFinal1 <- transform(BankLoan,   LoanApproval=as.factor(LoanApproval))
BankFinal1 <- transform(BankFinal1, LoanApp=as.factor(LoanApproval))

您必须用相同的数据创建两列,因为一列不能转换为数字。如果你做一次转换,它会给出如下错误

transform(BankData, LoanApp=as.numeric(LoanApproval))

警告信息: 在eval(替代(列表 (...)), `_ 数据”,parent.frame ()): 胁迫引入的NAs

所以,在做了两列相同的数据应用后

BankFinal1 <- transform(BankFinal1, LoanApp      = as.numeric(LoanApp), 
                                    LoanApproval = as.numeric(LoanApproval))

它将成功地将字符转换为数字

与hablar::转换

要轻松地将多个列转换为不同的数据类型,可以使用hablar::convert。简单的语法:df %>% convert(num(a))将列a从df转换为数值。

详细的例子

让我们将mtcars的所有列转换为字符。

df <- mtcars %>% mutate_all(as.character) %>% as_tibble()

> df
# A tibble: 32 x 11
   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4    
 2 21    6     160   110   3.9   2.875 17.02 0     1     4     4    
 3 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1    

与hablar::转换:

library(hablar)

# Convert columns to integer, numeric and factor
df %>% 
  convert(int(cyl, vs),
          num(disp:wt),
          fct(gear))

结果:

# A tibble: 32 x 11
   mpg     cyl  disp    hp  drat    wt qsec     vs am    gear  carb 
   <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <fct> <chr>
 1 21        6  160    110  3.9   2.62 16.46     0 1     4     4    
 2 21        6  160    110  3.9   2.88 17.02     0 1     4     4    
 3 22.8      4  108     93  3.85  2.32 18.61     1 1     4     1    
 4 21.4      6  258    110  3.08  3.22 19.44     1 0     3     1   

如果x是dataframe dat的列名,x的类型是factor,使用:

as.numeric(as.character(dat$x))

蒂姆是对的,谢恩有个遗漏。以下是其他例子:

R> df <- data.frame(a = as.character(10:15))
R> df <- data.frame(df, num = as.numeric(df$a), 
                        numchr = as.numeric(as.character(df$a)))
R> df
   a num numchr
1 10   1     10
2 11   2     11
3 12   3     12
4 13   4     13
5 14   5     14
6 15   6     15
R> summary(df)
  a          num           numchr    
 10:1   Min.   :1.00   Min.   :10.0  
 11:1   1st Qu.:2.25   1st Qu.:11.2  
 12:1   Median :3.50   Median :12.5  
 13:1   Mean   :3.50   Mean   :12.5  
 14:1   3rd Qu.:4.75   3rd Qu.:13.8  
 15:1   Max.   :6.00   Max.   :15.0  
R> 

我们的data.frame现在有了因子列的摘要(counts)和as.numeric()的数值摘要(这是错误的,因为它得到了数值因子级别)以及as.numeric(as.character())的(正确的)摘要。

虽然你的问题严格是关于数字的,但在开始r时,有许多转换是难以理解的。我将致力于解决帮助的方法。这个问题和这个问题类似。

在R中,类型转换可能是一种痛苦,因为(1)因子不能直接转换为数字,它们需要首先转换为字符类,(2)日期是一种特殊情况,通常需要单独处理,(3)跨数据帧列的循环可能很棘手。幸运的是,“潮流宇宙”已经解决了大部分问题。

This solution uses mutate_each() to apply a function to all columns in a data frame. In this case, we want to apply the type.convert() function, which converts strings to numeric where it can. Because R loves factors (not sure why) character columns that should stay character get changed to factor. To fix this, the mutate_if() function is used to detect columns that are factors and change to character. Last, I wanted to show how lubridate can be used to change a timestamp in character class to date-time because this is also often a sticking block for beginners.


library(tidyverse) 
library(lubridate)

# Recreate data that needs converted to numeric, date-time, etc
data_df
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX  PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                 <chr>  <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr>
#> 1 2012-05-04 09:30:00    BAC     T 7.8900 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.8850   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.8900  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.8900 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.8900 85053     F  7.88 108101  7.90

# Converting columns to numeric using "tidyverse"
data_df %>%
    mutate_all(type.convert) %>%
    mutate_if(is.factor, as.character) %>%
    mutate(TIMESTAMP = as_datetime(TIMESTAMP, tz = Sys.timezone()))
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                <dttm>  <chr> <chr> <dbl> <int> <chr> <dbl>  <int> <dbl>
#> 1 2012-05-04 09:30:00    BAC     T 7.890 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.885   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.890  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.890 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.890 85053     F  7.88 108101  7.90