在R中,mean()和median()是标准函数,它们执行您所期望的功能。Mode()告诉您对象的内部存储模式,而不是参数中出现次数最多的值。但是是否存在一个标准库函数来实现向量(或列表)的统计模式?
当前回答
基于@Chris的函数来计算模态或相关指标,但是使用Ken Williams的方法来计算频率。这个方法修复了根本没有模式(所有元素频率相等)的情况,并提供了一些更易读的方法名。
Mode <- function(x, method = "one", na.rm = FALSE) {
x <- unlist(x)
if (na.rm) {
x <- x[!is.na(x)]
}
# Get unique values
ux <- unique(x)
n <- length(ux)
# Get frequencies of all unique values
frequencies <- tabulate(match(x, ux))
modes <- frequencies == max(frequencies)
# Determine number of modes
nmodes <- sum(modes)
nmodes <- ifelse(nmodes==n, 0L, nmodes)
if (method %in% c("one", "mode", "") | is.na(method)) {
# Return NA if not exactly one mode, else return the mode
if (nmodes != 1) {
return(NA)
} else {
return(ux[which(modes)])
}
} else if (method %in% c("n", "nmodes")) {
# Return the number of modes
return(nmodes)
} else if (method %in% c("all", "modes")) {
# Return NA if no modes exist, else return all modes
if (nmodes > 0) {
return(ux[which(modes)])
} else {
return(NA)
}
}
warning("Warning: method not recognised. Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
}
由于它使用Ken的方法来计算频率,性能也得到了优化,使用AkselA的帖子,我对之前的一些答案进行了基准测试,以显示我的函数在性能上是如何接近Ken的,各种输出选项的条件只导致很小的开销:
其他回答
这个黑客应该工作良好。给你的值以及模式的计数:
Mode <- function(x){
a = table(x) # x is a vector
return(a[which.max(a)])
}
这里有另一个解决方案:
freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])
下面的函数有三种形式:
method = "mode"[默认值]:计算单模态向量的模式,否则返回NA Method = "nmodes":计算vector中模式的个数 Method = "modes":列出单模态或多模态向量的所有模态
modeav <- function (x, method = "mode", na.rm = FALSE)
{
x <- unlist(x)
if (na.rm)
x <- x[!is.na(x)]
u <- unique(x)
n <- length(u)
#get frequencies of each of the unique values in the vector
frequencies <- rep(0, n)
for (i in seq_len(n)) {
if (is.na(u[i])) {
frequencies[i] <- sum(is.na(x))
}
else {
frequencies[i] <- sum(x == u[i], na.rm = TRUE)
}
}
#mode if a unimodal vector, else NA
if (method == "mode" | is.na(method) | method == "")
{return(ifelse(length(frequencies[frequencies==max(frequencies)])>1,NA,u[which.max(frequencies)]))}
#number of modes
if(method == "nmode" | method == "nmodes")
{return(length(frequencies[frequencies==max(frequencies)]))}
#list of all modes
if (method == "modes" | method == "modevalues")
{return(u[which(frequencies==max(frequencies), arr.ind = FALSE, useNames = FALSE)])}
#error trap the method
warning("Warning: method not recognised. Valid methods are 'mode' [default], 'nmodes' and 'modes'")
return()
}
有一个包谦和提供单变量单模态(有时是多模态)数据的模态估计和通常概率分布的模态值。
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
library(modeest)
mlv(mySamples, method = "mfv")
Mode (most likely value): 19
Bickel's modal skewness: -0.1
Call: mlv.default(x = mySamples, method = "mfv")
欲了解更多信息,请参阅本页
你也可以在CRAN任务视图:概率分布中寻找“模式估计”。已经提出了两个新的一揽子计划。
基于@Chris的函数来计算模态或相关指标,但是使用Ken Williams的方法来计算频率。这个方法修复了根本没有模式(所有元素频率相等)的情况,并提供了一些更易读的方法名。
Mode <- function(x, method = "one", na.rm = FALSE) {
x <- unlist(x)
if (na.rm) {
x <- x[!is.na(x)]
}
# Get unique values
ux <- unique(x)
n <- length(ux)
# Get frequencies of all unique values
frequencies <- tabulate(match(x, ux))
modes <- frequencies == max(frequencies)
# Determine number of modes
nmodes <- sum(modes)
nmodes <- ifelse(nmodes==n, 0L, nmodes)
if (method %in% c("one", "mode", "") | is.na(method)) {
# Return NA if not exactly one mode, else return the mode
if (nmodes != 1) {
return(NA)
} else {
return(ux[which(modes)])
}
} else if (method %in% c("n", "nmodes")) {
# Return the number of modes
return(nmodes)
} else if (method %in% c("all", "modes")) {
# Return NA if no modes exist, else return all modes
if (nmodes > 0) {
return(ux[which(modes)])
} else {
return(NA)
}
}
warning("Warning: method not recognised. Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
}
由于它使用Ken的方法来计算频率,性能也得到了优化,使用AkselA的帖子,我对之前的一些答案进行了基准测试,以显示我的函数在性能上是如何接近Ken的,各种输出选项的条件只导致很小的开销: