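I am using the function ifelse() to manipulate a date vector. I expected the result to be of class Date, and was surprised to get a numeric vector instead. Here is an example: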

dates <- as.Date(c('2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05'))
dates <- ifelse(dates == '2011-01-01', dates - 1, dates)
str(dates)

This is especially surprising because performing the same operation across the entire vector returns a Date object.

dates <- as.Date(c('2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04','2011-01-05'))
dates <- dates - 1
str(dates)

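Should I be using some other function to operate on date vectors? If so, which function? If not, how do I force ifelse to return a vector of the same type as the input?

The help page for ifelse indicates that this is a feature, not a bug, but I am still struggling to find an explanation for what I found to be surprising behavior.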


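It relates to the documented Value of ifelse:

"A vector of the same length and attributes (including dimensions and "class") as test and data values from the values of yes or no. The mode of the answer will be coerced from logical to accommodate first any values taken from yes and then any values taken from no."

Boiled down to its implications, ifelse makes factors lose their levels and Dates lose their class; only their mode ("numeric") is restored. Try this instead: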

dates[dates == '2011-01-01'] <- dates[dates == '2011-01-01'] - 1
str(dates)
# Date[1:5], format: "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"
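
To make the attribute loss concrete, here is a minimal base R sketch (the variable names are only illustrative) showing what ifelse() hands back for a Date vector and for a factor:

```r
dates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03"))

# ifelse() fills a copy of the logical test vector, so the Date class is dropped
res <- ifelse(dates == "2011-01-01", dates - 1, dates)
class(res)   # "numeric": day counts since 1970-01-01

# The same happens with factors: only the integer codes survive
f <- factor(c("low", "high", "low"))
codes <- ifelse(f == "low", f, f)
class(codes) # "integer"
```

The indexed assignment above avoids this because `[<-` on a Date vector keeps the class of the object being modified.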

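You could create a safe.ifelse:

safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)         # remember the class of the 'yes' values
  X <- ifelse(cond, yes, no)    # ifelse() drops the class here
  class(X) <- class.y           # put the class back
  return(X)
}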

safe.ifelse(dates == '2011-01-01', dates - 1, dates)
# [1] "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"

A later note: I see that Hadley has built an if_else into the magrittr/dplyr/tidyr complex of data-shaping packages.


DWin's explanation is spot on. I fiddled with this for a while before I realized I could simply force the class after the ifelse statement:

dates <- as.Date(c('2011-01-01','2011-01-02','2011-01-03','2011-01-04','2011-01-05'))
dates <- ifelse(dates=='2011-01-01',dates-1,dates)
str(dates)
class(dates)<- "Date"
str(dates)

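At first this felt a little "hackish" to me. But now I just think of it as a small price to pay for the performance returns I get from ifelse(). Plus, it is still a lot more concise than a loop.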
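
An equivalent way to undo the conversion, sketched here on the assumption that the numbers really are day counts since 1970-01-01 (which is how R stores Date values), is to pass them back through as.Date() with an explicit origin:

```r
dates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03"))
num <- ifelse(dates == "2011-01-01", dates - 1, dates)

# The numbers are day counts from the epoch, so as.Date() can rebuild the dates
restored <- as.Date(num, origin = "1970-01-01")
str(restored)
```

This reads a little more explicitly than assigning the class attribute, at the cost of one extra function call.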


The suggested method does not work with factor columns. I would like to suggest this improvement:

safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)
  if (class.y == "factor") {
    levels.y = levels(yes)
  }
  X <- ifelse(cond,yes,no)
  if (class.y == "factor") {
    X = as.factor(X)
    levels(X) = levels.y
  } else {
    class(X) <- class.y
  }
  return(X)
}

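By the way: ifelse sucks... with great power comes great responsibility, i.e. type conversions of 1x1 matrices and/or numerics (for example, when they should be added) are OK to me, but this type conversion in ifelse is clearly unwanted. I have bumped into this very same ifelse "bug" multiple times now, and it just keeps stealing my time :-(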

FW


The answer provided by @fabian-werner is great, but objects can have multiple classes, and "factor" may not necessarily be the first one returned by class(yes), so I suggest this small modification to check all class attributes:

safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)
  if ("factor" %in% class.y) {  # Note the small condition change here
    levels.y = levels(yes)
  }
  X <- ifelse(cond,yes,no)
  if ("factor" %in% class.y) {  # Note the small condition change here
    X = as.factor(X)
    levels(X) = levels.y
  } else {
    class(X) <- class.y
  }
  return(X)
}

I have also submitted a request with the R Development team to add a documented option to have base::ifelse() preserve attributes based on user selection of which attributes to preserve. The request is here: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16609 - It has already been flagged as "WONTFIX" on the grounds that it has always been the way it is now, but I have provided a follow-up argument on why a simple addition might save a lot of R users headaches. Perhaps your "+1" in that bug thread will encourage the R Core team to take a second look.

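Edit: Here is a better version that lets the user specify which attributes to preserve: "cond" (the default ifelse() behaviour), "yes" (the behaviour of the code above), or "no" (for cases where the attributes of the "no" value are the better choice):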

safe_ifelse <- function(cond, yes, no, preserved_attributes = "yes") {
    # Capture the user's choice for which attributes to preserve in return value
    preserved           <- switch(EXPR = preserved_attributes, "cond" = cond,
                                                               "yes"  = yes,
                                                               "no"   = no);
    # Preserve the desired values and check if object is a factor
    preserved_class     <- class(preserved);
    preserved_levels    <- levels(preserved);
    preserved_is_factor <- "factor" %in% preserved_class;

    # We have to use base::ifelse() for its vectorized properties
    # If we do our own if() {} else {}, then it will only work on first variable in a list
    return_obj <- ifelse(cond, yes, no);

    # If the object whose attributes we want to retain is a factor
    # Typecast the return object as.factor()
    # Set its levels()
    # Then check to see if it's also one or more classes in addition to "factor"
    # If so, set the classes, which will preserve "factor" too
    if (preserved_is_factor) {
        return_obj          <- as.factor(return_obj);
        levels(return_obj)  <- preserved_levels;
        if (length(preserved_class) > 1) {
          class(return_obj) <- preserved_class;
        }
    }
    # In all cases we want to preserve the class of the chosen object, so set it here
    else {
        class(return_obj)   <- preserved_class;
    }
    return(return_obj);

} # End safe_ifelse function
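
One caveat with the factor branch in the functions above may be worth checking: as.factor() on the integer codes only creates levels for codes that actually occur, so relabelling them afterwards can misalign when some level is unused in the result. The following is a minimal sketch of an alternative that maps the codes back through the original levels. The name factor_ifelse is hypothetical, and the sketch assumes yes and no are factors sharing identical levels:

```r
# Hypothetical helper: assumes 'yes' and 'no' are factors with identical levels
factor_ifelse <- function(cond, yes, no) {
  stopifnot(is.factor(yes), is.factor(no),
            identical(levels(yes), levels(no)))
  # Work on the integer codes explicitly, then look the labels up again
  codes <- ifelse(cond, as.integer(yes), as.integer(no))
  factor(levels(yes)[codes], levels = levels(yes))
}

f <- factor(c("a", "c", "c", "a"), levels = c("a", "b", "c"))
g <- factor(c("c", "c", "a", "a"), levels = c("a", "b", "c"))
res <- factor_ifelse(f == "a", g, f)
res
```

Here level "b" never appears in the result, yet the remaining values keep their original labels because the lookup goes through levels(yes) rather than through the codes that happen to be present.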

You can use data.table::fifelse (data.table >= 1.12.3) or dplyr::if_else.


data.table::fifelse

Unlike ifelse, fifelse preserves the type and class of the inputs.

library(data.table)
dates <- fifelse(dates == '2011-01-01', dates - 1, dates)
str(dates)
# Date[1:5], format: "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"

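dplyr::if_else

From the dplyr 0.5.0 release notes:

"[if_else] has stricter semantics than ifelse(): the true and false arguments must be the same type. This gives a less surprising return type, and preserves S3 vectors like dates."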

library(dplyr)
dates <- if_else(dates == '2011-01-01', dates - 1, dates)
str(dates)
# Date[1:5], format: "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05" 

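The reason this does not work is that ifelse() drops the Date class and returns the underlying numeric values. A nice workaround is to convert the dates to character before evaluating, then convert the result back.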

dates <- as.Date(c('2011-01-01','2011-01-02','2011-01-03','2011-01-04','2011-01-05'))
dates_new <- dates - 1
dates <- as.Date(ifelse(dates =='2011-01-01',as.character(dates_new),as.character(dates)))

This needs no library other than base R.


Why not use indexing here?

> dates <- as.Date(c('2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05'))
> dates[dates == '2011-01-01'] <- NA
> str(dates)
 Date[1:5], format: NA "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"
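
If you want the indexed assignment as a single expression that leaves the original vector untouched, base R's replace() wraps the same `x[i] <- value` step. A small sketch, with hit and shifted as illustrative names only:

```r
dates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03"))
hit <- dates == "2011-01-01"

# replace() performs x[hit] <- values on a copy, so the Date class is kept
shifted <- replace(dates, hit, dates[hit] - 1)
str(shifted)
```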