Test if characters are in a string

I'm trying to determine whether a string is a substring of another string. For example:

chars <- "test"
value <- "es"

If "value" appears as part of the string "chars", I want to return TRUE. In the following scenario, I want to return FALSE:

chars <- "test"
value <- "et"

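Current answer

Use grep or grepl, but be aware of whether or not you want regular expressions.

By default, grep and related functions treat the pattern as a regular expression, not a literal substring. If you are not expecting that and you try to match an invalid regular expression, the call fails with an error: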

> grep("[", "abc[")
Error in grep("[", "abc[") : 
  invalid regular expression '[', reason 'Missing ']''

To do a true substring test, use fixed = TRUE.

> grep("[", "abc[", fixed = TRUE)
[1] 1

If you do want regular expressions, great, but that is not what the OP asked for.
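
The difference also matters when the pattern happens to be a valid regular expression, because then there is no error to warn you. A minimal sketch, using made-up inputs that are not from the question:

```r
# Illustrative inputs (an assumption for this example, not from the question)
x <- c("abc", "a.c")

# Regex matching: "." matches any single character, so both elements match
regex_hits <- grepl(".", x)

# Literal matching: only the element that really contains a period matches
fixed_hits <- grepl(".", x, fixed = TRUE)

print(regex_hits)
print(fixed_hits)
```

Only the second call answers the question "does this string contain a period?".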

2016-04-13 21:59:25

Other answers

Use this function from the stringi package:

> stri_detect_fixed("test",c("et","es"))
[1] FALSE  TRUE

Some benchmarks:

library(stringi)
set.seed(123L)
value <- stri_rand_strings(10000, ceiling(runif(10000, 1, 100))) # 10000 random ASCII strings
head(value)

chars <- "es"
library(microbenchmark)
microbenchmark(
   grepl(chars, value),
   grepl(chars, value, fixed=TRUE),
   grepl(chars, value, perl=TRUE),
   stri_detect_fixed(value, chars),
   stri_detect_regex(value, chars)
)
## Unit: milliseconds
##                               expr       min        lq    median        uq       max neval
##                grepl(chars, value) 13.682876 13.943184 14.057991 14.295423 15.443530   100
##  grepl(chars, value, fixed = TRUE)  5.071617  5.110779  5.281498  5.523421 45.243791   100
##   grepl(chars, value, perl = TRUE)  1.835558  1.873280  1.956974  2.259203  3.506741   100
##    stri_detect_fixed(value, chars)  1.191403  1.233287  1.309720  1.510677  2.821284   100
##    stri_detect_regex(value, chars)  6.043537  6.154198  6.273506  6.447714  7.884380   100

2014-03-14 09:46:12

You want grepl:

> chars <- "test"
> value <- "es"
> grepl(value, chars)
[1] TRUE
> chars <- "test"
> value <- "et"
> grepl(value, chars)
[1] FALSE

2012-04-12 17:28:40

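Here is a similar problem: given a string and a list of keywords, detect which keywords, if any, are contained in the string.

The suggestions in this thread are stringr's str_detect and grepl. Below are benchmarks run with the microbenchmark package:

Using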

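library(stringr)         # provides str_detect
library(microbenchmark)  # provides microbenchmark

map_keywords <- c("once", "twice", "few")
t <- "yes but only a few times"

# str_detect is vectorised over the patterns and treats them as regexes
mapper1 <- function(x) {
  r <- str_detect(x, map_keywords)
  r
}

# grepl is called once per keyword, matching each as a literal string
mapper2 <- function(x) {
  r <- sapply(map_keywords, function(k) grepl(k, x, fixed = TRUE))
  r
}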

and then

microbenchmark(mapper1(t), mapper2(t), times = 5000)

we find

Unit: microseconds
       expr    min     lq     mean  median      uq      max neval
 mapper1(t) 26.401 27.988 31.32951 28.8430 29.5225 2091.476  5000
 mapper2(t) 19.289 20.767 24.94484 23.7725 24.6220 1011.837  5000

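As you can see, over 5000 iterations of the keyword search on a realistic string and keyword vector, grepl was somewhat faster than str_detect (a median of about 23.8 versus 28.8 microseconds). Note that mapper2 matches with fixed = TRUE, while mapper1 uses str_detect's default regular-expression matching, so the comparison is not strictly like for like.

The outcome is the logical vector r, which identifies which keywords, if any, are contained in the string.

I would therefore suggest using grepl to determine whether any keywords are in a string.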
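
As a sketch of how the logical vector can be turned into the matching keywords themselves, the following uses base R only. The names txt, found and matched are hypothetical, and vapply is swapped in for sapply here because it guarantees a logical result with one element per keyword:

```r
map_keywords <- c("once", "twice", "few")
txt <- "yes but only a few times"

# One literal (fixed) substring test per keyword; the result is a
# logical vector named after the keywords
found <- vapply(
  map_keywords,
  function(k) grepl(k, txt, fixed = TRUE),
  logical(1)
)

# Keep only the names of the keywords that occur in the string
matched <- names(found)[found]
print(matched)
```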

2020-06-07 16:34:43

Use the grepl function

grepl( needle, haystack, fixed = TRUE)

like so:

grepl(value, chars, fixed = TRUE)
# TRUE

Use ?grepl to find out more.
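
Because grepl takes the needle first and the haystack second, it can be convenient to wrap it. A minimal sketch, where contains_substring is a hypothetical helper name and not part of base R:

```r
# Hypothetical helper: TRUE if `needle` occurs literally in `haystack`
contains_substring <- function(haystack, needle) {
  grepl(needle, haystack, fixed = TRUE)
}

res_es <- contains_substring("test", "es")  # "es" is contiguous in "test"
res_et <- contains_substring("test", "et")  # "e" and "t" are not adjacent

print(c(es = res_es, et = res_et))
```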

2012-04-12 17:28:43

This can also be done with the "stringr" library:

> library(stringr)
> chars <- "test"
> value <- "es"
> str_detect(chars, value)
[1] TRUE

### For multiple value case:
> value <- c("es", "l", "est", "a", "test")
> str_detect(chars, value)
[1]  TRUE FALSE  TRUE FALSE  TRUE
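
Like grepl, str_detect treats its pattern as a regular expression by default, so the caveat about literal substrings applies here too. A short sketch, assuming stringr is installed (the guard simply skips the example otherwise), that wraps the pattern in fixed() to request literal matching:

```r
if (requireNamespace("stringr", quietly = TRUE)) {
  # fixed() marks the pattern as a literal string rather than a regex,
  # so the otherwise invalid pattern "[" is matched as a plain character
  literal_hit <- stringr::str_detect("abc[", stringr::fixed("["))
  print(literal_hit)
}
```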

2017-06-14 02:38:39
