Fall 2016

Why visualize data?

R base graphics

Don't use barplots

R base graphics

  • stats::heatmap() - basic heatmap

Alternatives:

  • gplots::heatmap.2() - an extension of heatmap
  • heatmap3::heatmap3() - another extension of heatmap
  • ComplexHeatmap::Heatmap() - highly customizable, interactive heatmap

Other options:

  • pheatmap::pheatmap() - grid-based heatmap
  • NMF::aheatmap() - another grid-based heatmap
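
As a starting point, here is a minimal sketch of the base stats::heatmap() call. The built-in mtcars dataset and the temporary output file are assumptions made for illustration; they are not part of the original slides.

```r
# Assumption: the built-in mtcars data stands in for your own numeric matrix
m <- scale(as.matrix(mtcars))   # center and scale each column so variables are comparable

# Draw into a temporary PDF so the example also runs non-interactively
out <- tempfile(fileext = ".pdf")
pdf(out, width = 7, height = 7)
hm <- heatmap(m, scale = "none", margins = c(6, 8))  # rows and columns are clustered by default
dev.off()

# heatmap() invisibly returns the row and column order used in the plot
head(rownames(m)[hm$rowInd])
```

The alternative packages listed above generally accept a numeric matrix in the same way, but their arguments differ, so check each function's help page before switching.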

More heatmaps

Other useful plots

Special plots

Saving plots

  • Save to PDF
pdf("filename.pdf", width = 7, height = 5)
plot(1:10, 1:10)
dev.off()
  • Other formats: bmp(), jpeg(), png(), or tiff()

  • Learn more: ?Devices
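
The bitmap devices follow the same open, plot, close pattern as pdf(). A minimal sketch with png(), assuming your R build has PNG support (check with capabilities("png")); the file name and dimensions are illustrative only.

```r
# Assumption: a file in the session's temporary directory; use your own path in practice
out <- file.path(tempdir(), "example_plot.png")

png(out, width = 800, height = 600, res = 120)  # size in pixels, resolution in ppi
plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared")
dev.off()                                        # nothing is written until the device is closed

file.exists(out)
```

Note that pdf() takes width and height in inches, while the bitmap devices default to pixels.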

R base graphic cheat-sheet

Data manipulation

dplyr: data manipulation with R

dplyr: a grammar of data manipulation

The pipe %>% operator

  • Pipes the output of one command into the input of the next, chaining commands together
  • Analogous to the "|" (pipe) operator in a Unix shell
  • Read it as "then": take the dataset, then do …
library(dplyr)
library(ggplot2)
data(diamonds)
head(diamonds)
diamonds %>% head
summary(diamonds$price)
diamonds$price %>% summary(object = .)
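
To see that a pipe chain is only a different way of writing nested calls, the following sketch computes the same value both ways. The particular chain of sqrt(), mean(), and round() is an arbitrary illustration, not something from the slides.

```r
library(dplyr)                    # attaching dplyr makes %>% available
diamonds <- ggplot2::diamonds     # the dataset used throughout these slides

# Nested form: read from the inside out
nested <- round(mean(sqrt(diamonds$price)), 2)

# Piped form: read left to right, each result becomes the first argument of the next call
piped <- diamonds$price %>% sqrt() %>% mean() %>% round(2)

identical(nested, piped)
```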

dplyr::filter()

  • Keep only the rows that satisfy a condition on one or more columns
diamonds %>% head
df.diamonds_ideal <- filter(diamonds, cut == "Ideal")
df.diamonds_ideal <- diamonds %>% filter(cut == "Ideal")
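
filter() also accepts several conditions at once. A short sketch, with thresholds chosen arbitrarily for illustration:

```r
library(dplyr)
diamonds <- ggplot2::diamonds

# Comma-separated conditions are combined with AND
ideal_large <- diamonds %>% filter(cut == "Ideal", carat >= 2)

# Use | for OR, and %in% to match any of several values
top_colors <- diamonds %>% filter(color %in% c("D", "E") | price > 15000)

nrow(ideal_large)
nrow(top_colors)
```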

dplyr::select()

  • Select columns from the dataset by names
df.diamonds_ideal %>% head
select(df.diamonds_ideal, carat, cut, color, price, clarity)
df.diamonds_ideal <- df.diamonds_ideal %>% select(., carat, cut, color, price, clarity)
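
Besides listing names, select() can drop columns or match them by pattern. A minimal sketch using the full diamonds data:

```r
library(dplyr)
diamonds <- ggplot2::diamonds

# A minus sign drops columns instead of keeping them
no_dims <- diamonds %>% select(-x, -y, -z)

# Selection helpers such as starts_with() match columns by name
c_cols <- diamonds %>% select(starts_with("c"))

names(no_dims)
names(c_cols)
```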

dplyr::mutate()

  • Add columns to your dataset
df.diamonds_ideal %>% head
mutate(df.diamonds_ideal, price_per_carat = price/carat)
df.diamonds_ideal <- df.diamonds_ideal %>% mutate(price_per_carat = price/carat)
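
A single mutate() call can create several columns, and later columns may refer to ones defined earlier in the same call. In this sketch the `expensive` column and its 5000 threshold are hypothetical, chosen only to illustrate the idea.

```r
library(dplyr)
diamonds <- ggplot2::diamonds

priced <- diamonds %>%
  mutate(
    price_per_carat = price / carat,
    expensive = price_per_carat > 5000   # hypothetical flag; uses the column created just above
  )

priced %>% select(carat, price, price_per_carat, expensive) %>% head()
```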

dplyr::arrange()

  • Sort your data by columns
df.diamonds_ideal %>% head
arrange(df.diamonds_ideal, price)
df.diamonds_ideal %>% arrange(price, price_per_carat)
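
arrange() sorts in ascending order by default; wrapping a column in desc() reverses it. A brief sketch:

```r
library(dplyr)
diamonds <- ggplot2::diamonds

# Most expensive diamonds first
by_price <- diamonds %>% arrange(desc(price))

# Sort by cut first, then by descending price within each cut
by_cut_price <- diamonds %>% arrange(cut, desc(price))

head(by_price)
```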

dplyr::summarize()

  • Summarize columns by custom summary statistics
summarize(df.diamonds_ideal, length = n(), avg_price = mean(price))
df.diamonds_ideal %>% summarize(length = n(), avg_price = mean(price))
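
Any function that reduces a vector to a single value can be used inside summarize(). A sketch with a few common statistics; the output column names are arbitrary:

```r
library(dplyr)
diamonds <- ggplot2::diamonds

price_stats <- diamonds %>%
  summarize(
    n            = n(),            # number of rows
    min_price    = min(price),
    median_price = median(price),
    max_price    = max(price)
  )

price_stats
```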

dplyr::group_by()

  • Group rows by the values of one or more columns, then summarize each group separately
group_by(diamonds, cut) %>% summarize(mean(price))
group_by(diamonds, cut, color) %>% summarize(mean(price))
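
Naming the summary columns makes the grouped result easier to work with in later steps. A minimal sketch that counts diamonds and averages price per cut, then sorts the groups:

```r
library(dplyr)
diamonds <- ggplot2::diamonds

by_cut <- diamonds %>%
  group_by(cut) %>%
  summarize(n = n(), avg_price = mean(price)) %>%   # one row per cut
  arrange(desc(avg_price))

by_cut
```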

The power of pipe %>%

  • The same sequence of steps written as nested calls and as a pipe chain
# Nested calls must be read from the inside out
arrange(mutate(arrange(filter(diamonds, cut == "Ideal"), price), price_per_carat = price/carat), price_per_carat)
# The same nested call, indented to show its structure
arrange(
  mutate(
    arrange(
      filter(diamonds, cut == "Ideal"),
      price),
    price_per_carat = price/carat),
  price_per_carat)
# The same steps as a pipe chain, read top to bottom
diamonds %>%
  filter(cut == "Ideal") %>%
  arrange(price) %>%
  mutate(price_per_carat = price/carat) %>%
  arrange(price_per_carat)

ggplot2 - the grammar of graphics

ggplot2 package

The basics of ggplot2 graphics

  • Data mapped to graphical elements
  • Add graphical layers and transformations
  • Layers are added to a plot with the "+" operator
Object              Description
Data                The raw data that you want to plot
Aesthetics aes()    How to map your data onto the x and y axes, color, size, shape (aesthetics)
Geometries geom_    The geometric shapes that will represent the data

data + aesthetic mappings of data to plot coordinates + geometry to represent the data
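
The three components map directly onto code. A minimal sketch, assuming the diamonds data used elsewhere in these slides; the alpha value is an arbitrary choice to reduce overplotting.

```r
library(ggplot2)

p <- ggplot(data = diamonds,              # data
            aes(x = carat, y = price)) +  # aesthetic mappings
  geom_point(alpha = 0.2)                 # geometry

p   # printing the object draws the plot
```

Because the plot is an ordinary R object, further layers can be added to `p` later with "+".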

Examples of ggplot2 graphics

diamonds %>% filter(cut == "Good", color == "E") %>% 
  ggplot(aes(x = price, y = carat)) +
  geom_point()  # variation to try: geom_point(aes(size = price))

Try other geoms

  geom_smooth()  # variation to try: geom_smooth(method = "lm")
  geom_line()
  geom_boxplot()
  geom_bar(stat="identity")
  geom_histogram()
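
As a sketch of how swapping or stacking geoms changes the plot, the example below reuses the filtered subset and axis mapping from the scatter plot above. The object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

good_e <- diamonds %>% filter(cut == "Good", color == "E")

# Two geoms layered on the same data: points plus a linear fit
p_smooth <- ggplot(good_e, aes(x = price, y = carat)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)

# A different geom needs a suitable mapping: one box of prices per cut
p_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()

p_smooth
p_box
```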

Fine tuning ggplot2 graphics

Parameter                           Description
Facets facet_                       Split one plot into multiple plots based on a grouping variable
Scales scale_                       Map between the data ranges and the dimensions of the plot
Visual themes theme                 The overall visual defaults of a plot: background, grids, axes, default typeface, sizes, colors, etc.
Statistical transformations stat_   Statistical summaries of the data that can be plotted, such as quantiles, fitted curves (loess, linear models, etc.), sums, etc.
Coordinate systems coord_           Express coordinates in a system other than Cartesian
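
A short sketch that touches the stat_, scale_, coord_, and theme families in one plot. It assumes ggplot2 3.3.0 or later, where stat_summary() takes a `fun` argument; the dollar-sign label function is an illustrative choice.

```r
library(ggplot2)

p_tuned <- ggplot(diamonds, aes(x = cut, y = price)) +
  stat_summary(fun = median, geom = "point", size = 3) +      # stat_: median price per cut
  scale_y_continuous(labels = function(v) paste0("$", v)) +   # scale_: format the axis labels
  coord_flip() +                                              # coord_: swap the x and y axes
  theme_bw()                                                  # theme: a predefined visual theme

p_tuned
```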

Putting it all together

diamonds %>%                 # Start with the 'diamonds' dataset
  filter(cut == "Ideal") %>% # Then, filter rows where cut == Ideal
  ggplot(aes(price)) +       # Then, plot using ggplot
  geom_histogram() +         # and plot histograms
  facet_wrap(~ color) +      # in a 'small multiple' plot, broken out by 'color' 
  ggtitle("Diamond price distribution per color") +
  labs(x="Price", y="Count") +
  theme(panel.background = element_rect(fill="lightblue")) +
  theme(plot.title = element_text(family="Trebuchet MS", size=28, face="bold", hjust=0, color="#777777")) +
  theme(axis.title.y = element_text(angle=0)) +
  theme(panel.grid.minor = element_blank())

Other resources