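I need to plot a bar chart showing counts and a line chart showing a rate in a single figure. I can do each of them separately, but when I put them together, the scale of my first layer (i.e. the geom_bar) is overlapped by the second layer (i.e. the geom_line).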

Can I move the axis of the geom_line to the right?


Current answer

We can certainly build a plot with two y-axes using base R's plot() function.

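# pseudo dataset
df <- data.frame(x  = seq(1, 1000, 1),
                 y1 = sample.int(100, 1000, replace = TRUE),
                 y2 = sample(50, 1000, replace = TRUE))

# widen the right margin so the second axis and its label fit
par(mar = c(5, 4, 4, 4) + 0.1)

# plot first plot
with(df, plot(y1 ~ x, col = "red"))

# draw the next plot on top of the current one instead of starting a new page
par(new = TRUE)

# plot second plot, but without axes or axis labels
with(df, plot(y2 ~ x, type = "l", xaxt = "n", yaxt = "n", xlab = "", ylab = ""))

# draw the second y-axis on the right and label it
axis(4)
mtext("y2", side = 4, line = 3)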
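For the ggplot2 case the question asks about, a comparable result can be sketched with sec_axis(), which is available from ggplot2 2.2.0 onward. This is only a minimal sketch: the data frame `d`, its columns, and the scaling factor `k` are made up for illustration. The rate is multiplied by `k` so it is drawn on the count scale, and the right-hand axis divides by `k` so its labels show the original rate.

```r
library(ggplot2)

# Hypothetical data: yearly counts and a rate in percent
d <- data.frame(
  year  = 2001:2006,
  count = c(120, 150, 90, 200, 170, 130),
  rate  = c(4.0, 5.5, 3.0, 8.0, 6.5, 5.0)
)

# Assumed scaling choice: stretch the rate so its maximum matches the count maximum
k <- max(d$count) / max(d$rate)

p <- ggplot(d, aes(x = year)) +
  # bars on the primary (left) axis
  geom_col(aes(y = count), fill = "grey70") +
  # line drawn in count units, i.e. rate * k
  geom_line(aes(y = rate * k), colour = "red") +
  scale_y_continuous(
    name = "count",
    # the right axis undoes the scaling so it is labelled in rate units
    sec.axis = sec_axis(~ . / k, name = "rate (%)")
  )

print(p)
```

Note that the secondary axis is always a one-to-one transformation of the primary axis, so the choice of `k` determines how the two series appear relative to each other.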

Other answers

This looks like a simple question, but it turns on two more fundamental ones: A) how to present data on very different scales in one comparative chart, and B) whether that can be done without the usual R rules of thumb, namely i) melting the data, ii) faceting, or iii) adding another layer to an existing one. The solution given below satisfies both conditions: it handles the data without rescaling it, and it uses none of the techniques mentioned.

Here is the result:

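If you are interested in learning more about this method, follow the link below: How to plot a 2 y-axis chart with bars side by side without re-scaling the data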

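I acknowledge and agree with Hadley (and others) that separate y scales are "fundamentally flawed". Having said that, I often wish ggplot2 had this feature, particularly when the data is in wide format and I want to quickly visualise or check the data (i.e. for personal use only).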

While the tidyverse library makes it fairly easy to convert the data to long format (so that facet_grid() will work), the process is still not trivial, as seen below:

library(tidyverse)
df.wide %>%
    # Select only the columns you need for the plot.
    select(date, column1, column2, column3) %>%
    # Create an id column (one id per row), kept alongside `date` through `gather()`.
    mutate(id = row_number()) %>%
    # The `gather()` function converts to long-format.
    # In which the `type` column will contain three distinct values (column1, column2, column3),
    # and the `value` column will contain the respective values.
    # All the while we retain the `id` and `date` columns.
    gather(type, value, -id, -date) %>%
    # Create the plot according to your specifications
    ggplot(aes(x = date, y = value)) +
        geom_line() +
        # Create a panel for each `type` (ie. column1, column2, column3).
        # If the types have different scales, you can use the `scales="free"` option.
        facet_grid(type~., scales = "free")

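Sometimes a client wants two y scales. Giving them the "flawed" speech is often pointless. But I do like ggplot2's insistence on doing things the right way. I am sure that ggplot is in fact educating the average user about proper visualization techniques.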

Maybe you can use faceting and free scales to compare the two data series? See here: https://github.com/hadley/ggplot2/wiki/Align-two-plots-on-a-page

Hadley's answer refers to Stephen Few's report "Dual-Scaled Axes in Graphs: Are They Ever the Best Solution?".

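I don't know what "counts" and "rate" mean in the OP, but a quick search gives me Counts and Rates, so I got some data about accidents in North American mountaineering [1]: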

Years<-c("1998","1999","2000","2001","2002","2003","2004")
Persons.Involved<-c(281,248,301,276,295,231,311)
Fatalities<-c(20,17,24,16,34,18,35)
rate=100*Fatalities/Persons.Involved
df<-data.frame(Years=Years,Persons.Involved=Persons.Involved,Fatalities=Fatalities,rate=rate)
print(df,row.names = FALSE)

 Years Persons.Involved Fatalities      rate
  1998              281         20  7.117438
  1999              248         17  6.854839
  2000              301         24  7.973422
  2001              276         16  5.797101
  2002              295         34 11.525424
  2003              231         18  7.792208
  2004              311         35 11.254019

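Then I tried to draw the graph as Few suggests on page 7 of the report mentioned above (and, as the OP requested, with the counts as a bar chart and the rate as a line chart):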

The other less obvious solution, which works only for time series, is to convert all sets of values to a common quantitative scale by displaying percentage differences between each value and a reference (or index) value. For instance, select a particular point in time, such as the first interval that appears in the graph, and express each subsequent value as the percentage difference between it and the initial value. This is done by dividing the value at each point in time by the value for the initial point in time and then multiplying it by 100 to convert the rate to a percentage, as illustrated below.

library(ggplot2)

# Index both series to their first value (1998 = 100)
df2 <- df
df2$Persons.Involved <- 100 * df$Persons.Involved / df$Persons.Involved[1]
df2$rate <- 100 * df$rate / df$rate[1]
plot(ggplot(df2) +
  geom_bar(aes(x = Years, weight = Persons.Involved)) +
  geom_line(aes(x = Years, y = rate, group = 1)) +
  theme(text = element_text(size = 30))
  )

And this is the result:

But I don't like it very much, and I can't easily put a legend on it...
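One common way to get a legend in ggplot2 is to map fill and colour to constant labels inside aes() and then set the actual colours with manual scales. The sketch below applies that idea to the indexed plot. The label strings and colours are arbitrary choices, and the data is repeated so the block runs on its own.

```r
library(ggplot2)

# Same data as above, repeated so the sketch is self-contained
df <- data.frame(
  Years            = c("1998", "1999", "2000", "2001", "2002", "2003", "2004"),
  Persons.Involved = c(281, 248, 301, 276, 295, 231, 311),
  Fatalities       = c(20, 17, 24, 16, 34, 18, 35)
)
df$rate <- 100 * df$Fatalities / df$Persons.Involved

# Index both series to their first value (1998 = 100)
df2 <- df
df2$Persons.Involved <- 100 * df$Persons.Involved / df$Persons.Involved[1]
df2$rate <- 100 * df$rate / df$rate[1]

p <- ggplot(df2, aes(x = Years)) +
  # constant strings inside aes() create one legend entry per layer
  geom_col(aes(y = Persons.Involved, fill = "Persons involved")) +
  geom_line(aes(y = rate, colour = "Fatality rate", group = 1)) +
  # manual scales assign the colours and drop the legend titles
  scale_fill_manual(name = NULL, values = c("Persons involved" = "grey60")) +
  scale_colour_manual(name = NULL, values = c("Fatality rate" = "red")) +
  ylab("Index (1998 = 100)")

print(p)
```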

[1] Williamson, Jed, et al. Accidents in North American Mountaineering 2005. The Mountaineers Books, 2005.

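Here are my two cents on how to do the transformations for a secondary axis. First, you want to couple the ranges of the primary and secondary data. This is usually messy, because it pollutes your global environment with variables you don't want.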

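To make this easier, we'll write a function factory that produces two functions, in which scales::rescale() does all the heavy lifting. Because these are closures, they are aware of the environment in which they were created, so they "remember" the to and from parameters generated before they were created.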

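One function does the forward transformation: it transforms the secondary data to the primary scale. The second function does the reverse transformation: it transforms data in primary units to secondary units.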

library(ggplot2)
library(scales)

# Function factory for secondary axis transforms
train_sec <- function(primary, secondary, na.rm = TRUE) {
  # Thanks Henry Holm for including the na.rm argument!
  from <- range(secondary, na.rm = na.rm)
  to   <- range(primary, na.rm = na.rm)
  # Forward transform for the data
  forward <- function(x) {
    rescale(x, from = from, to = to)
  }
  # Reverse transform for the secondary axis
  reverse <- function(x) {
    rescale(x, from = to, to = from)
  }
  list(fwd = forward, rev = reverse)
}
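As a quick sanity check of the factory, the two closures should be inverses of each other. The sketch below repeats a compact copy of train_sec() so it runs on its own, and trains it on made-up ranges (primary 0 to 100, secondary 0 to 1) that are chosen only for illustration.

```r
library(scales)

# Compact copy of the factory from above, so this block is self-contained
train_sec <- function(primary, secondary, na.rm = TRUE) {
  from <- range(secondary, na.rm = na.rm)
  to   <- range(primary, na.rm = na.rm)
  list(
    fwd = function(x) rescale(x, from = from, to = to),
    rev = function(x) rescale(x, from = to, to = from)
  )
}

# Hypothetical toy ranges: primary spans 0..100, secondary spans 0..1
sec <- train_sec(primary = c(0, 100), secondary = c(0, 1))

# Secondary values expressed on the primary scale
fwd_out <- sec$fwd(c(0, 0.25, 1))
print(fwd_out)

# Primary-scale positions mapped back to secondary units
rev_out <- sec$rev(c(0, 25, 100))
print(rev_out)

# Round trip: rev() undoes fwd()
round_trip <- sec$rev(sec$fwd(c(0.1, 0.5, 0.9)))
print(all.equal(round_trip, c(0.1, 0.5, 0.9)))
```

Because both closures share the same trained `from` and `to` ranges, the data layer and the axis labels cannot drift apart.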

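This seems rather complicated, but having the function factory makes everything else easier. Now, before making the plot, we'll produce the relevant functions by showing the factory the primary and secondary data. We'll use the economics dataset, whose unemploy and psavert columns have very different ranges.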

sec <- with(economics, train_sec(unemploy, psavert))

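Then we use y = sec$fwd(psavert) to rescale the secondary data to the primary axis, and specify ~ sec$rev(.) as the transformation argument of the secondary axis. This gives us a plot in which the primary and secondary ranges occupy the same space.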

ggplot(economics, aes(date)) +
  geom_line(aes(y = unemploy), colour = "blue") +
  geom_line(aes(y = sec$fwd(psavert)), colour = "red") +
  scale_y_continuous(sec.axis = sec_axis(~sec$rev(.), name = "psavert"))

The factory is slightly more flexible than that: if you just want to rescale the maximum, you can pass in data that has a lower limit of 0.

# Rescaling the maximum
sec <- with(economics, train_sec(c(0, max(unemploy)),
                                 c(0, max(psavert))))

ggplot(economics, aes(date)) +
  geom_line(aes(y = unemploy), colour = "blue") +
  geom_line(aes(y = sec$fwd(psavert)), colour = "red") +
  scale_y_continuous(sec.axis = sec_axis(~sec$rev(.), name = "psavert"))

Created on 2021-02-05 by the reprex package (v0.3.0)

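I'll admit that the difference in this example is not very obvious, but if you look closely you can see that the maxima are the same and the red line goes below the blue one.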

EDIT:

This approach has now been captured and expanded upon in the help_secondary() function of the ggh4x package. Disclaimer: I'm the author of ggh4x.
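For reference, here is a minimal sketch of how the same economics plot might look with help_secondary(). It assumes ggh4x is installed and that the helper returns a secondary-axis object carrying a `proj` function for the forward transformation, as I understand the package documentation. Check ?help_secondary for the exact arguments in your version.

```r
library(ggplot2)
library(ggh4x)

# Train the helper on the primary and secondary columns
# (assumed usage; the default method matches the two ranges)
sec <- help_secondary(economics, primary = unemploy, secondary = psavert)

p <- ggplot(economics, aes(date)) +
  geom_line(aes(y = unemploy), colour = "blue") +
  # project the secondary data onto the primary scale
  geom_line(aes(y = sec$proj(psavert)), colour = "red") +
  # the helper object itself is passed as the secondary axis
  scale_y_continuous(sec.axis = sec)

print(p)
```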