Friday, July 18, 2014

R ggplot coord_cartesian(ylim()) vs ylim()

The former only limits what the plot displays, while the latter limits the data that goes into the analysis, so it changes the result.

Let's use boxplot as an example.

require("ggplot2")
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.0.3
df = data.frame(y = c(0, 50, 100, 501, 600))
median(df$y)
## [1] 100
p <- ggplot(df, aes(y = y)) + geom_boxplot(aes(x = factor(1)))

coord_cartesian(ylim = ...) will

  • compute the boxplot statistics from the entire dataset
  • then zoom the display to the specified range
p + coord_cartesian(ylim = c(0, 200))

[Figure: boxplot drawn with coord_cartesian(ylim = c(0, 200))]
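To check this without reading the value off the picture, here is a small sketch that pulls the computed boxplot statistics out of the plot. It assumes a reasonably recent ggplot2 that provides layer_data(), which the original example did not use; the name stats_zoom is just made up for illustration.

```r
library(ggplot2)

# Same toy data and plot as in the example above
df <- data.frame(y = c(0, 50, 100, 501, 600))
p <- ggplot(df, aes(y = y)) + geom_boxplot(aes(x = factor(1)))

# layer_data() returns the statistics computed for the first layer
# (assumption: ggplot2 is new enough to export layer_data())
stats_zoom <- layer_data(p + coord_cartesian(ylim = c(0, 200)))

# Median line of the box, computed from all five values
stats_zoom$middle
## [1] 100
```

The median matches median(df$y), so zooming did not touch the underlying statistics.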

ylim alone will

  • drop the values that fall outside the range (they are treated as missing)
  • then compute the boxplot statistics from what is left
p + ylim(c(0, 200))
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).

[Figure: boxplot drawn with ylim(c(0, 200))]
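The same kind of check can be done for ylim. This is only a sketch, again assuming a ggplot2 version that exports layer_data(); stats_ylim is a made-up name, and the call is expected to repeat the "Removed 2 rows" warning.

```r
library(ggplot2)

# Same toy data and plot as in the example above
df <- data.frame(y = c(0, 50, 100, 501, 600))
p <- ggplot(df, aes(y = y)) + geom_boxplot(aes(x = factor(1)))

# With ylim(), 501 and 600 are dropped before the statistics are computed
# (assumption: ggplot2 is new enough to export layer_data())
stats_ylim <- layer_data(p + ylim(c(0, 200)))

# Median line of the box, computed from the remaining values 0, 50, 100
stats_ylim$middle
## [1] 50
```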

As shown above,

  • the warning tells you that 2 rows were left out, and
  • the median with ylim alone is 50 rather than 100.
