Frequencies and Distributions
Whether a study involves 10 or 10,000 subjects, the researcher eventually ends up with a collection of numbers, often glorified by names such as “data set” or “database.” In the case of nominal or ordinal variables, the numbers are an indicator of the category to which each subject belongs (for example, the sex, the religion, or the diagnosis of each subject). For interval and ratio variables, the researchers will have the actual numerical value of the variable for each subject—the subject’s height, blood pressure, pulse rate, or number of cigarettes smoked. There is a subtle difference among these latter variables, by the way, even though all are ratio variables. Things like height and blood pressure are continuous variables; they can be measured to as many decimal places as the measuring instrument allows. By contrast, although the average American family has 2.1 children, no one has ever found a family with one-tenth of a child, making counts such as these discrete variables.
In either case, these numbers are distributed in some manner among the various categories or throughout the various possible values. If you plotted the numbers, you would end up with distributions similar to those shown in Figures 2-1 and 2-2.
Figure 2-1: Distribution of ice cream preferences in 10,000 children.

Note that there are a couple of things that can be done to make these figures more understandable. If we divide the number in each category by the total number of people studied, we are then displaying the proportion of the total sample in each category, as shown on the right side of each graph. Some manipulations can be performed on these proportions. For example, to find the probability in Figure 2-2 that one of our folks is 69 or 70 years old, we must add up the probabilities for these two age categories. We can also address questions such as “What is the probability that a senior citizen is more than 71 years of age?” by adding the probabilities in the categories above age 71.
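The arithmetic described above can be sketched in a few lines of Python. The counts below are invented purely for illustration; they are not read off the roller derby figure, and the names `counts` and `probability` are hypothetical. The sketch also assumes one-year age categories.

```python
# Hypothetical counts per one-year age category. These numbers are made up
# for illustration (they are NOT taken from the figure); they sum to 10,000.
counts = {
    65: 1800, 66: 1600, 67: 1400, 68: 1200, 69: 1000, 70: 900,
    71: 700, 72: 600, 73: 400, 74: 250, 75: 150,
}

total = sum(counts.values())

# Proportion of the whole sample in each category
# (what the right-hand axis of the graph would display).
proportions = {age: n / total for age, n in counts.items()}


def probability(ages):
    """Probability that a randomly chosen entrant falls in any of the given categories."""
    return sum(counts[age] for age in ages) / total


# "69 or 70 years old": add the two categories.
p_69_or_70 = probability([69, 70])

# "More than 71 years of age": add every category above 71.
p_over_71 = probability(age for age in counts if age > 71)

print(f"P(69 or 70) = {p_69_or_70:.2f}")
print(f"P(over 71)  = {p_over_71:.2f}")
```

With these made-up counts, the two probabilities come out to 0.19 and 0.14. Summing the counts first and dividing once gives the same answer as adding the individual proportions, and it avoids small floating-point rounding differences.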
The basic notion is that we can view the original distribution of numbers as an expression of the probability that any individual chosen at random from the original sample may fall in a particular category or within a range of categories. This transformation from an original frequency distribution to a distribution of probability is a recurrent and fundamental notion in statistics.
Although a frequency distribution is a convenient way to summarize data, it has certain disadvantages. It is difficult to compare two distributions derived from different samples because the information is buried in the number of responses in each category. It’s also tedious to draw graphs and a lot easier to get the computer to blurt out a series of numbers. As a result, some way to summarize the information is necessary. The conventional approach is to develop standard methods that describe where the center of the distribution lies and how wide the distribution is.
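As a rough illustration of why summary numbers help, the sketch below reduces two frequency tables of very different sizes to a center and a width. It assumes the mean and the standard deviation as the two summaries, which is only one possible choice, and it divides by n rather than n - 1 for simplicity. Both samples and the function name `summarize` are hypothetical.

```python
import math


def summarize(counts):
    """Return (mean, standard deviation) for a frequency table {value: count}.

    Assumption: the standard deviation divides by n, treating the table
    as the whole group of interest rather than a sample from it.
    """
    n = sum(counts.values())
    mean = sum(value * c for value, c in counts.items()) / n
    variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / n
    return mean, math.sqrt(variance)


# Two invented samples of very different sizes (1,000 and 100 people).
sample_a = {68: 100, 69: 200, 70: 400, 71: 200, 72: 100}
sample_b = {70: 10, 71: 20, 72: 40, 73: 20, 74: 10}

mean_a, sd_a = summarize(sample_a)
mean_b, sd_b = summarize(sample_b)

print(f"Sample A: center = {mean_a:.1f}, width = {sd_a:.2f}")
print(f"Sample B: center = {mean_b:.1f}, width = {sd_b:.2f}")
```

The raw counts differ by a factor of 10, which makes the two tables awkward to compare directly. The summaries show at a glance that the two invented distributions have the same width but are centered 2 years apart.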
Figure 2-2: Age distribution of 10,000 entrants in senior citizen roller derby.

Content on this page was last changed on March 19, 2009.
© 2002 BC Decker Inc.
Norman GR, Streiner DL. PDQ Statistics. 3rd ed. Hamilton, Ontario: BC Decker Inc.; 2003.