Histogram distribution in statistics refers to the patterns, shapes and locations of univariate data bars on a histogram. How and where the bars are distributed can be used to analyze and draw conclusions about the data. Histogram distribution analysis is important in identifying traits such as data normality, multimodal distributions and skewed data.
A histogram is a univariate data display that uses rectangles proportional in area to class or bin frequencies to visually show features of data. The data points in the histogram are organized into bins, and the histogram distribution itself is a visual approximation of the data's frequency distribution or probability density function. The apparent shape of the distribution can change with the number of bins, even though the underlying data stay the same.
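As a rough illustration of how binning shapes what a histogram shows, the sketch below counts the same values into two and then four equal-width bins. The function name `histogram_counts` and the sample data are made up for this example, and equal-width bins are an assumption, not the only option.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin instead of a new one.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample: mostly small values plus one large value.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

coarse = histogram_counts(data, 2)  # two wide bins
fine = histogram_counts(data, 4)    # four narrower bins

print(coarse)
print(fine)
```

With two bins the data look like one large block and a small remainder; with four bins a peak in the second bin and a thin tail become visible.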
Histogram distribution analysis is often used as a qualitative check for data normality. Although analytical methods for determining normality exist, a histogram provides a quick, common-sense check that saves time. If the bars form a roughly symmetric, bell-shaped pattern with a single peak near the mean, the data are provisionally treated as normal. Although fast and relatively easy, this kind of qualitative check is subjective, and analytical methods should be used if a higher standard of accuracy is required.
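One simple numeric companion to the visual check is to compare the share of values lying within one standard deviation of the mean against the share expected under a normal distribution (about 68 percent). The sketch below is only a rough heuristic, not a formal normality test; the function names and both synthetic data sets are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def share_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    inside = sum(1 for v in values if abs(v - m) <= k * s)
    return inside / len(values)


def expected_share(k):
    """Fraction expected within k standard deviations under normality."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Synthetic, deterministic samples built from evenly spaced quantiles.
n = 200
quantiles = [(i + 0.5) / n for i in range(n)]
bell_shaped = [NormalDist().inv_cdf(q) for q in quantiles]
right_skewed = [-math.log(1 - q) for q in quantiles]  # exponential-like

gap_bell = abs(share_within(bell_shaped, 1) - expected_share(1))
gap_skewed = abs(share_within(right_skewed, 1) - expected_share(1))

print(round(gap_bell, 3), round(gap_skewed, 3))
```

A small gap is consistent with normality but does not prove it; a large gap suggests the bell-shaped reading of the histogram deserves a closer look.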
Histogram distribution analysis can also be used to determine whether a data set exhibits skewness. Data skewness is defined as pronounced asymmetry in the data. Negative skew, or skewing to the left, is seen when most values are high and a few low values stretch the tail of the histogram to the left. Positive skew, or skewing to the right, is seen when most values are low and a few high values stretch the tail to the right. Observing the histogram distribution can reveal both skewed data and outliers.
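The direction of skew read from a histogram can be cross-checked with a moment-based skewness coefficient, which is positive for a right tail and negative for a left tail. The sketch below uses the population form (third central moment divided by the second central moment raised to the power 1.5); the function name and sample lists are hypothetical.

```python
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # one high value stretches the tail
left_tailed = [-v for v in right_tailed]   # mirror image of the list above
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_tailed) > 0)   # positive skew
print(sample_skewness(left_tailed) < 0)    # negative skew
print(abs(sample_skewness(symmetric)) < 1e-12)
```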
In addition to revealing the characteristics of data with a single mode, the shape of a histogram can also reveal characteristics of multimodal data. Multimodal data sets contain more than one mode and are characterized by frequency distributions that have more than one peak, or local maximum. Political affiliations in a town, approval ratings in opinion polls, and body sizes of bees are examples of data sets that may be multimodal. Observing the shape of the histogram and noting the various peaks in multimodal data can often provide a researcher with more insight than simple univariate summary statistics, such as the mean, would.
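A crude way to flag possible multimodality is to count the bins that are taller than both of their neighbors. The sketch below assumes the bin counts are already available; the function name and the two count lists are invented for illustration. Flat-topped peaks are not counted, and with real data, random noise can create spurious peaks, so the result depends heavily on the binning.

```python
def count_peaks(counts):
    """Count bins strictly taller than both neighbouring bins."""
    # Pad with zeros so the first and last bins can also qualify as peaks.
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


unimodal_counts = [1, 4, 9, 4, 1]
bimodal_counts = [2, 8, 3, 1, 4, 9, 2]

print(count_peaks(unimodal_counts))
print(count_peaks(bimodal_counts))
```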
The analysis of histograms and the apparent distribution of data are highly dependent on the chosen bin sizes. In practice, the number of bins can be estimated by taking the square root of the number of observations, although other bin counts or bin boundaries may be used. For example, a teacher may choose to analyze test grades by choosing bin boundaries that reflect letter grades.
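Both binning choices mentioned here can be sketched briefly. The first function applies the square-root rule, rounding up to a whole number of bins (the rounding direction is an assumption). The second groups scores by letter grade; the cut-offs of 60, 70, 80 and 90 are hypothetical and would differ between grading schemes.

```python
import math
from bisect import bisect_right


def sqrt_bin_count(num_observations):
    """Square-root rule: number of bins is about sqrt(n), rounded up."""
    return math.ceil(math.sqrt(num_observations))


# Hypothetical grade boundaries: a score of 60 is a D, 90 is an A, and so on.
GRADE_EDGES = [60, 70, 80, 90]
GRADE_LABELS = ["F", "D", "C", "B", "A"]


def grade_counts(scores):
    """Count scores per letter grade using the custom bin edges above."""
    counts = {label: 0 for label in GRADE_LABELS}
    for score in scores:
        # bisect_right returns how many edges are <= score.
        counts[GRADE_LABELS[bisect_right(GRADE_EDGES, score)]] += 1
    return counts


scores = [55, 62, 71, 78, 85, 88, 90, 93, 99]

print(sqrt_bin_count(100))
print(grade_counts(scores))
```

Unlike the square-root rule, the grade bins are unequal in width (the F bin spans everything below 60), which illustrates that bins can be chosen for their meaning and not only for their count.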