A box plot, or box-and-whisker diagram, is a method for organizing numerical data along a single number line, which can be either horizontal or vertical. When the plot is horizontal, the box itself sits slightly above the number line and is made of three vertical lines joined at the top and bottom by horizontal lines. The left and right edges of the box mark the first and third quartiles (the 25th and 75th percentiles), and the line between them marks the median, or 50th percentile. From the middle of each outer edge, a horizontal line called a whisker extends outward. When a whisker reaches the minimum or maximum of the data set, it usually ends in a short vertical line, though the whiskers may stop short of the extremes when the data contain outliers.
A good box plot rests on a few key numbers that anyone creating one needs to know. The first set is the five-number summary, sometimes abbreviated as "five num. sum.": the minimum, first quartile, median, third quartile, and maximum of the data. Some applications call for listing these numbers next to the plot, but with a well-labeled number line a reader can also recover them from the box's three vertical lines and the ends of the whiskers. For the person drawing the plot, there is no chicken-and-egg problem: the five-number summary must be calculated first, because the plot is built from it.
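As a sketch of how the five-number summary might be computed, the Python function below uses the standard library's `statistics.quantiles`. The function name `five_number_summary` and the sample data are hypothetical, and the choice of `method="inclusive"` is an assumption: quartile conventions differ between textbooks, calculators, and software packages, so other tools may report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Hypothetical helper. Quartiles come from statistics.quantiles with
    method="inclusive"; other quartile rules can give slightly different
    Q1 and Q3 values.
    """
    if len(data) < 2:
        raise ValueError("need at least two data points")
    # quantiles() sorts its input and returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical sample data.
print(five_number_summary([2, 4, 4, 5, 7, 8, 9, 11, 12]))  # (2, 4.0, 7.0, 9.0, 12)
```

These five values are exactly the positions of the box edges, the median line, and the whisker ends when no points are flagged as outliers.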
People also need to know the interquartile range (IQR), which is found by subtracting the first quartile from the third quartile. Statistical software and many scientific calculators can compute the IQR and the five-number summary once all the data are entered. The IQR matters because, in the common convention, whiskers extend only to the most extreme data points lying within 1.5 times the IQR below the first quartile or above the third quartile. Data beyond those limits are drawn as individual dots rather than as part of a continuous line, and these dots are usually treated as possible outliers.
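The 1.5 × IQR rule can be illustrated with a minimal Python sketch. It assumes Tukey's convention (fences at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR) and Python's `statistics.quantiles` with `method="inclusive"` for the quartiles; the function name `box_plot_parts`, its `k` parameter, the dictionary keys, and the sample data are all hypothetical.

```python
import statistics


def box_plot_parts(data, k=1.5):
    """Compute the pieces of a Tukey-style box plot (hypothetical helper).

    k is the fence multiplier; 1.5 is the common convention.
    """
    if len(data) < 2:
        raise ValueError("need at least two data points")
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences: points outside these limits are drawn as separate dots.
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical data with one unusually large value.
parts = box_plot_parts([1, 3, 4, 4, 5, 6, 6, 7, 30])
print(parts["iqr"], parts["outliers"])  # 2.0 [30]
```

Here the fences fall at 1.0 and 9.0, so the upper whisker stops at 7 and the value 30 would be drawn as a dot.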
Box plots have a variety of uses. Several plots can be stacked above a single shared number line to compare similar data sets that differ by some important factor. For example, scientists or statisticians might record the heart rates of men and women and then draw two stacked box plots to look for meaningful differences in range and quartiles.
Box plots do not show data frequency. Because there is no second scale (vertical or horizontal), they leave out information about repeated values, the size of the data set, and most individual data points. Someone reading a box plot can mainly learn the five-number summary, the range, and whether the data have any outliers. The size of the box, the position of the median relative to the quartiles, and the lengths of the whiskers can show whether the data are skewed, but a box plot cannot reveal the mean, mode, or standard deviation. When people want to show frequency or get a fuller picture of how the data are distributed, other charts such as histograms may be more useful.
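To illustrate that limitation, the two hypothetical data sets below were chosen to have identical five-number summaries, and therefore identical box plots, while their means and standard deviations differ and only the second has a repeated value. The sketch assumes the inclusive quartile method of Python's `statistics.quantiles`; under a different quartile rule the two summaries might not match exactly.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical data sets constructed so their five-number summaries match.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [1, 3, 3, 3, 5, 7, 7, 8, 9]

print(five_number_summary(a))  # (1, 3.0, 5.0, 7.0, 9)
print(five_number_summary(b))  # (1, 3.0, 5.0, 7.0, 9)

# Quantities a box plot does not display still differ between the sets.
print(statistics.mean(a), statistics.mean(b))
print(statistics.stdev(a), statistics.stdev(b))
```

A histogram of `a` and `b` would show the difference at once: `b` piles three values at 3, while `a` has no repeated values at all.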