There are many ways to display data. The fundamental idea is that the graphical depiction of data should communicate the truth the data has to offer about the situation of interest.

### Histograms

1 Quantitative Variable

#### Overview

Great for showing the distribution of data for a single quantitative variable when the sample size is large. Dotplots are a good alternative for smaller sample sizes. Gives a good feel for the mean and standard deviation of the data.

#### Explanation

Histograms group data that are close to each other into “bins” (the vertical bars in the plot). The height of a bin is determined by the number of data points that are contained within the bin. For example, if we group together all the sections of the book of scripture known as the Doctrine and Covenants that occurred in a given year (Jan. 1st - Dec. 31st) then we get the following counts.

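| Year | Number of Sections |
|------|--------------------|
| 1823 | 1 |
| 1824 | 0 |
| 1825 | 0 |
| 1826 | 0 |
| 1827 | 0 |
| 1828 | 1 |
| 1829 | 16 |
| 1830 | 19 |
| 1831 | 37 |
| 1832 | 16 |
| 1833 | 12 |
| 1834 | 5 |
| 1835 | 3 |
| 1836 | 4 |
| 1837 | 1 |
| 1838 | 8 |
| 1839 | 3 |
| 1840 | 0 |
| 1841 | 3 |
| 1842 | 2 |
| 1843 | 4 |
| 1844 | 1 |
| 1845 | 0 |
| 1846 | 0 |
| 1847 | 1 |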

*Note that Section 138 occurred in 1918 and is removed from this example.

In this example, each “bin” spans one calendar year (Jan. 1 - Dec. 31). Since “dates” can be used as quantitative data, it makes sense to make a histogram of these data. (Remember, histograms are only for quantitative data.)

In a histogram of these data, the left edge of each bin sits on the year the data correspond to, and the right edge lands on the following year. For example, the first bin has its left edge on 1823 and its right edge on 1824. Since there was one revelation in 1823, this bin has a height of 1. The bin with 1831 on the left and 1832 on the right has a height of 37, showing that 37 revelations occurred in 1831. It is powerful to notice the number of revelations occurring around 1830, the year the Church of Jesus Christ of Latter-day Saints was organized.
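The binning described above can be sketched in R. This is only one possible way to draw the plot, not necessarily how the original graphic was made. The variable names (`years`, `counts`, `section_year`) are illustrative, and because the table gives only yearly counts, the sketch assumes each section falls at mid-year so that it lands inside its own bin.

```r
# Counts per year, copied from the table above (1823 through 1847).
years  <- 1823:1847
counts <- c(1, 0, 0, 0, 0, 1, 16, 19, 37, 16, 12, 5, 3,
            4, 1, 8, 3, 0, 3, 2, 4, 1, 0, 0, 1)

# Expand the counts into one observation per section.
# Assumption: exact dates are not in the table, so each section is
# placed at mid-year (year + 0.5) purely for illustration.
section_year <- rep(years, times = counts) + 0.5

# One bin per calendar year: left edge on the year, right edge on the
# following year (right = FALSE makes each bin include its left edge).
h <- hist(section_year, breaks = 1823:1848, right = FALSE,
          main = "Doctrine and Covenants Sections by Year",
          xlab = "Year", ylab = "Number of Sections",
          col = "skyblue4")

# The bin heights computed by hist() should match the table.
h$counts
```

Setting `breaks` explicitly is what forces one bin per year. Without it, `hist()` picks its own bin edges and the bars may no longer line up with calendar years.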

### Boxplots

1 Quantitative Variable | 2+ Groups

#### Overview

Graphical depiction of the five-number summary. Great for comparing the distributions of data across several groups or categories. Provides a quick visual understanding of the location of the median as well as the range of the data. Can be useful in showing outliers. Sample size should be at least five, or computing the five-number summary is not very meaningful. Side-by-side dotplots are a good alternative for smaller sample sizes.

#### Explanation

Understanding how a boxplot is created is the best way to understand what the boxplot shows.

##### How Boxplots are Made
1. The five-number summary is computed.
2. A box is drawn with one edge located at the first quartile and the opposite edge located at the third quartile.
3. This box is then divided into two boxes by placing another line inside the box at the location of the median.
4. The maximum value and minimum value are marked on the plot.
5. Whiskers are drawn from the first quartile out towards the minimum and from the third quartile out towards the maximum.
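6. If the minimum or maximum is too far away (by common convention, more than 1.5 times the interquartile range beyond the box), then the whisker is ended early, at the most extreme data point within that distance.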
7. Any points beyond the line ending the whisker are marked on the plot as dots. This helps identify possible outliers in the data.
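
The steps above can be traced by hand in R. The sketch below uses the built-in `CO2` dataset as example data and assumes the common 1.5 times box-length rule for ending whiskers, which is the default in R's `boxplot()`. The variable names are illustrative. Note that `fivenum()` reports Tukey's hinges, which can differ slightly from other quartile definitions.

```r
# Example data: CO2 uptake for the Quebec plants (built-in CO2 dataset).
x <- CO2$uptake[CO2$Type == "Quebec"]

# Step 1: five-number summary (min, lower hinge, median, upper hinge, max).
five <- fivenum(x)

# Steps 2-3: the box runs from the lower hinge to the upper hinge,
# with a line at the median.
box_length <- five[4] - five[2]

# Steps 4-6: each whisker stops at the most extreme data point that is
# within 1.5 box lengths of the box (assumed rule; R's default).
lower_whisker <- min(x[x >= five[2] - 1.5 * box_length])
upper_whisker <- max(x[x <= five[4] + 1.5 * box_length])

# Step 7: points beyond the whiskers are drawn as dots (possible outliers).
outliers <- x[x < lower_whisker | x > upper_whisker]

# Compare the hand computation with what R uses internally.
stats <- boxplot.stats(x)
stats$stats   # whisker end, hinge, median, hinge, whisker end
stats$out     # points drawn as dots

# Side-by-side boxplots comparing the two groups.
boxplot(uptake ~ Type, data = CO2,
        xlab = "Type", ylab = "CO2 uptake", col = "skyblue4")
```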

### Custom Plots

Creativity Required

#### Overview

Sometimes no standard plot sufficiently describes the data. In these cases, the only guideline is the one stated originally, “the graphical depiction of data should communicate the truth the data has to offer about the situation of interest.”

#### R Examples

You should add links to examples you find of interesting plots made in R.

Here is the R Code for the graphic to the left:

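    # Density curve of CO2 uptake for the Quebec plants, with no title, labels, or axes
    plot(density(CO2$uptake[CO2$Type == "Quebec"]),
         main = "", col = 'skyblue4',
         xlab = "", ylab = "", xaxt = 'n', yaxt = 'n')
    # Overlay the density curve for the Mississippi plants
    lines(density(CO2$uptake[CO2$Type == "Mississippi"]),
          col = 'firebrick')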