我需要在一个图表中绘制一个显示计数的柱状图和一个显示率的折线图,我可以分别做这两个,但当我把它们放在一起时,我的第一层(即geom_bar)的比例被第二层(即geom_line)重叠。
我可以将geom_line的轴向右移动吗?
我需要在一个图表中绘制一个显示计数的柱状图和一个显示率的折线图,我可以分别做这两个,但当我把它们放在一起时,我的第一层(即geom_bar)的比例被第二层(即geom_line)重叠。
我可以将geom_line的轴向右移动吗?
当前回答
常见的用例有双y轴,例如,显示每月温度和降水的气体图。这里是一个简单的解决方案,从威震天的解决方案中推广,允许你设置变量的下限为零:
示例数据:
climate <- tibble(
Month = 1:12,
Temp = c(-4,-4,0,5,11,15,16,15,11,6,1,-3),
Precip = c(49,36,47,41,53,65,81,89,90,84,73,55)
)
将以下两个值设置为接近数据限制的值(您可以使用这些值来调整图形的位置;坐标轴仍然是正确的):
ylim.prim <- c(0, 180) # in this example, precipitation
ylim.sec <- c(-4, 18) # in this example, temperature
下面根据这些极限进行必要的计算,并制作出图本身:
b <- diff(ylim.prim)/diff(ylim.sec)
a <- ylim.prim[1] - b*ylim.sec[1]) # there was a bug here
ggplot(climate, aes(Month, Precip)) +
geom_col() +
geom_line(aes(y = a + Temp*b), color = "red") +
scale_y_continuous("Precipitation", sec.axis = sec_axis(~ (. - a)/b, name = "Temperature")) +
scale_x_continuous("Month", breaks = 1:12) +
ggtitle("Climatogram for Oslo (1961-1990)")
如果你想确保红线对应右边的y轴,你可以在代码中添加一个主题句:
ggplot(climate, aes(Month, Precip)) +
geom_col() +
geom_line(aes(y = a + Temp*b), color = "red") +
scale_y_continuous("Precipitation", sec.axis = sec_axis(~ (. - a)/b, name = "Temperature")) +
scale_x_continuous("Month", breaks = 1:12) +
theme(axis.line.y.right = element_line(color = "red"),
axis.ticks.y.right = element_line(color = "red"),
axis.text.y.right = element_text(color = "red"),
axis.title.y.right = element_text(color = "red")
) +
ggtitle("Climatogram for Oslo (1961-1990)")
右轴的颜色:
其他回答
Hadley的回答参考了Stephen Few的报告《双缩放轴在图中是最好的解决方案吗?》
我不知道OP中的“counts”和“rate”是什么意思,但快速搜索会给我counts和Rates,所以我得到了一些关于北美登山事故的数据:
Years<-c("1998","1999","2000","2001","2002","2003","2004")
Persons.Involved<-c(281,248,301,276,295,231,311)
Fatalities<-c(20,17,24,16,34,18,35)
rate=100*Fatalities/Persons.Involved
df<-data.frame(Years=Years,Persons.Involved=Persons.Involved,Fatalities=Fatalities,rate=rate)
print(df,row.names = FALSE)
Years Persons.Involved Fatalities rate
1998 281 20 7.117438
1999 248 17 6.854839
2000 301 24 7.973422
2001 276 16 5.797101
2002 295 34 11.525424
2003 231 18 7.792208
2004 311 35 11.254019
然后,我尝试按照Few在上述报告第7页建议的那样绘制图表(并按照OP的要求将计数绘制为柱状图,将率绘制为折线图):
The other less obvious solution, which works only for time series, is to convert all sets of values to a common quantitative scale by displaying percentage differences between each value and a reference (or index) value. For instance, select a particular point in time, such as the first interval that appears in the graph, and express each subsequent value as the percentage difference between it and the initial value. This is done by dividing the value at each point in time by the value for the initial point in time and then multiplying it by 100 to convert the rate to a percentage, as illustrated below.
df2<-df
df2$Persons.Involved <- 100*df$Persons.Involved/df$Persons.Involved[1]
df2$rate <- 100*df$rate/df$rate[1]
plot(ggplot(df2)+
geom_bar(aes(x=Years,weight=Persons.Involved))+
geom_line(aes(x=Years,y=rate,group=1))+
theme(text = element_text(size=30))
)
这就是结果:
但我不是很喜欢它,我不能轻易地给它加上一个传奇……
1 威廉森,杰德,等人。2005年北美登山事故。The Mountaineers Books, 2005。
这在ggplot2中是不可能的,因为我认为具有单独y尺度的图(不是相互转换的y尺度)从根本上是有缺陷的。一些问题:
The are not invertible: given a point on the plot space, you can not uniquely map it back to a point in the data space. They are relatively hard to read correctly compared to other options. See A Study on Dual-Scale Data Charts by Petra Isenberg, Anastasia Bezerianos, Pierre Dragicevic, and Jean-Daniel Fekete for details. They are easily manipulated to mislead: there is no unique way to specify the relative scales of the axes, leaving them open to manipulation. Two examples from the Junkcharts blog: one, two They are arbitrary: why have only 2 scales, not 3, 4 or ten?
你也可能想要阅读Stephen Few关于双缩放轴在图形中的主题的冗长讨论,它们是最好的解决方案吗?
您可以创建一个缩放因子,应用于第二个geom和右y轴。这是从塞巴斯蒂安的解推导出来的。
library(ggplot2)
scaleFactor <- max(mtcars$cyl) / max(mtcars$hp)
ggplot(mtcars, aes(x=disp)) +
geom_smooth(aes(y=cyl), method="loess", col="blue") +
geom_smooth(aes(y=hp * scaleFactor), method="loess", col="red") +
scale_y_continuous(name="cyl", sec.axis=sec_axis(~./scaleFactor, name="hp")) +
theme(
axis.title.y.left=element_text(color="blue"),
axis.text.y.left=element_text(color="blue"),
axis.title.y.right=element_text(color="red"),
axis.text.y.right=element_text(color="red")
)
注意:使用ggplot2 v3.0.0
有时客户想要两个y刻度。给他们“有缺陷”的演讲通常是毫无意义的。但是我喜欢ggplot2坚持以正确的方式做事。我确信ggplot实际上是在向普通用户传授正确的可视化技术。
也许你可以使用面形和无比例来比较两个数据序列?看这里:https://github.com/hadley/ggplot2/wiki/Align-two-plots-on-a-page
以下内容结合了Dag Hjermann的基本数据和编程,改进了user4786271创建“转换函数”的策略,以优化组合图和数据轴,并响应了浸信会的提示,这样的函数可以在R中创建。
#Climatogram for Oslo (1961-1990)
climate <- tibble(
Month = 1:12,
Temp = c(-4,-4,0,5,11,15,16,15,11,6,1,-3),
Precip = c(49,36,47,41,53,65,81,89,90,84,73,55))
#y1 identifies the position, relative to the y1 axis,
#the locations of the minimum and maximum of the y2 graph.
#Usually this will be the min and max of y1.
#y1<-(c(max(climate$Precip), 0))
#y1<-(c(150, 55))
y1<-(c(max(climate$Precip), min(climate$Precip)))
#y2 is the Minimum and maximum of the secondary axis data.
y2<-(c(max(climate$Temp), min(climate$Temp)))
#axis combines y1 and y2 into a dataframe used for regressions.
axis<-cbind(y1,y2)
axis<-data.frame(axis)
#Regression of Temperature to Precipitation:
T2P<-lm(formula = y1 ~ y2, data = axis)
T2P_summary <- summary(lm(formula = y1 ~ y2, data = axis))
T2P_summary
#Identifies the intercept and slope of regressing Temperature to Precipitation:
T2PInt<-T2P_summary$coefficients[1, 1]
T2PSlope<-T2P_summary$coefficients[2, 1]
#Regression of Precipitation to Temperature:
P2T<-lm(formula = y2 ~ y1, data = axis)
P2T_summary <- summary(lm(formula = y2 ~ y1, data = axis))
P2T_summary
#Identifies the intercept and slope of regressing Precipitation to Temperature:
P2TInt<-P2T_summary$coefficients[1, 1]
P2TSlope<-P2T_summary$coefficients[2, 1]
#Create Plot:
ggplot(climate, aes(Month, Precip)) +
geom_col() +
geom_line(aes(y = T2PSlope*Temp + T2PInt), color = "red") +
scale_y_continuous("Precipitation", sec.axis = sec_axis(~.*P2TSlope + P2TInt, name = "Temperature")) +
scale_x_continuous("Month", breaks = 1:12) +
theme(axis.line.y.right = element_line(color = "red"),
axis.ticks.y.right = element_line(color = "red"),
axis.text.y.right = element_text(color = "red"),
axis.title.y.right = element_text(color = "red")) +
ggtitle("Climatogram for Oslo (1961-1990)")
Most noteworthy is that a new "transformation function" works better with just two data points from the data set of each axes—usually the maximum and minimum values of each set. The resulting slopes and intercepts of the two regressions enable ggplot2 to exactly pair the plots of the minimums and maximums of each axis. As user4786271 pointed out, the two regressions transform each data set and plot to the other. One transforms the break points of the first y axis to the values of the second y axis. The second transforms the data of the secondary y axis to be "normalized" according to the first y axis. The following output shows how the axis align the minimums and maximums of each dataset:
使最大值和最小值匹配可能是最合适的;但是,这种方法的另一个好处是,如果需要,可以通过更改与主轴数据相关的编程行轻松地移动与次要轴相关的绘图。下面的输出只是将y1编程行中输入的最小降水量更改为“0”,从而将最小温度水平与“0”降水水平对齐。
从:y1<-(c(max(气候$ precp), min(气候$ precp)))
到:y1<-(c(max(气候$ precp), 0))
请注意,生成的新回归和ggplot2如何自动调整绘图和轴,以正确地将最低温度与“0”降水水平的新“基数”对齐。同样,可以很容易地提升Temperature图,使其更加明显。下面的图是通过简单地将上面提到的线更改为:
“日元<——(c(150年,55岁))”
上面的线表示温度曲线的最大值与“150”降水水平相吻合,温度曲线的最小值与“55”降水水平相吻合。再次注意,ggplot2和由此产生的新的回归输出如何使图保持与轴的正确对齐。
以上可能不是理想的输出;然而,这是一个例子,说明了如何容易地操纵图形,并且在图和轴之间仍然有正确的关系。 Dag Hjermann的主题的结合提高了与情节对应的轴的识别。