Everything is data. Every entity, living, non-living, or artificial, can be described by data. You, dear reader, could fill a gigabyte-sized file with yours: the details of your academic & professional career, your personal & financial credentials, browser history, bookmarks & passwords. Now, think of the number of human beings on this planet. Think of all the different tangible & abstract entities in existence: governments, businesses, institutions, animals, cars, computers, tools, applications, etc. Everything possesses and can be represented using data.
Statistics, a branch of applied mathematics, provides us with the tools to process, understand, manipulate, and extract information from data. And frequency distributions are one of the most commonly used and effective tools among them all.
This lucid guide provides you with everything you need to know about frequency distributions and their tabulations.
What is a Frequency Distribution Table?
Statistics applies mathematical laws, rules, and operations to make better sense of data. Statistical distributions such as frequency distributions are, in essence, functions that operate on data to uncover information, identify relationships, and more.
In any research, the next step after data collection is always cleaning and organizing the data. This is crucial if we are to carry out effective, in-depth analysis and uncover meaningful insights.
Frequency distributions are functions that help to identify any pattern in a dataset. They do so by visualizing the frequency of occurrence of elements in the data set across observations and looking at their variability. Frequency distributions allow analysts to determine the number of times a value or an outcome appears across observations. They also showcase the range of values or the spread of an attribute.
Frequency distribution is vital in descriptive statistics and generally appears in tabular or graphical format. Let’s have a look.
How to Construct Frequency Distribution Tables in Statistics?
A frequency distribution table is a common way to present the frequency of values/outcomes and their range across observations. Tables organize the observations of the attributes/parameters under study with columns presenting the values recorded.
- Frequency: In statistics, the number of times a value or range of values appears when studying a variable;
- Distribution: Distributions in statistics map observed values of attributes to a certain measure. Probability and frequency distributions are the two primary categories of statistical distributions.
A frequency distribution, thus, maps attribute values to their frequency of appearance across observations. They generally appear as tables or graphs. An accurate frequency distribution or graph requires complete information about the range of values research attributes take. The range is then divided into class intervals for clarity, organization, and easy visualization.
Thus, if we had to define frequency distributions, it could go something like this:
Frequency distributions are collections of observations obtained by segregating observations into classes and observing the frequency of occurrence of values/outcomes in every class.
There are different kinds of frequency distributions in descriptive statistics. Each kind has its nuances and applications, and developing each type differs slightly from the others. We will take a look at each of these types in detail below.
Different Types of Frequency Distributions
The five major types of frequency distributions are:
- Grouped Frequency Distribution
- Ungrouped Frequency Distribution
- Cumulative Frequency Distribution
- Relative Frequency Distribution
- Relative Cumulative Frequency Distribution
Each of these variants tells us the different possible outcomes of one or more attributes. They also show the number of times an attribute takes a particular value, the range of its values, and the data interval in which values lie.
Let’s look at them one by one.
Frequency Distribution Table for Grouped Data
Consider the following table that lists readings of the weights of male students in a statistics course.
The table below shows a common way of classifying the above observations according to the frequency of occurrence.
Frequency Distribution Grouped Data
As evident, the entire range of values observed are grouped into classes. Each class or class interval comprises a group of values. This is called a grouped frequency distribution.
Observation data are grouped into 12 weight classes, each with an interval of 10. Every class interval can take up to 10 possible values. From the list of weights, we find that the lowest weight belongs to the lowest class while the highest value of weight belongs to the topmost class. The entire distance between the bottom and top classes is divided into multiple class intervals. The frequency column informs us how many times data from each class interval appears and shows us the total frequency of observations.
As can be seen, frequencies peak at 150-159, with a lower peak at 160-169. The frequency gradually decreases on either side, but we find heavy concentrations in the 160s and 170s. If we plotted a frequency distribution graph, we would find that the distribution of weights is not balanced but skewed toward the heavier side.
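The grouping described above can be sketched in a few lines of Python. The weight readings below are hypothetical stand-ins (the article's full observation table is not reproduced here); the binning logic with a class width of 10 is the point.

```python
from collections import Counter

def grouped_frequency(values, class_width):
    """Tally values into class intervals of the given width (e.g. 150-159)."""
    counts = Counter((v // class_width) * class_width for v in values)
    return {f"{lo}-{lo + class_width - 1}": counts[lo] for lo in sorted(counts)}

# Hypothetical weight readings, just to illustrate the mechanics
weights = [118, 125, 134, 151, 152, 155, 158, 163, 165, 172, 148, 159]
print(grouped_frequency(weights, 10))   # the 150-159 class gets the most hits
```

Each value is mapped to the lower bound of its class with integer division, and `Counter` does the tallying.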
Next, let’s look at the frequency distributions of ungrouped data.
Frequency Distribution Table for Ungrouped Data
Ungrouped frequency distributions present the frequencies of individual data elements instead of data classes. These distribution types come in handy when determining the number of times specific values appear in a dataset or set of observations.
One key thing to note is that ungrouped frequency distributions work best when the number of samples or observations is low. Things become a problem when there are hundreds of observations, most with unique values.
Here’s an example of an ungrouped frequency distribution:
An ungrouped frequency distribution table will work fine because the number of unique values is low. Ungrouped frequency distributions are also called discrete frequency distributions.
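Because each distinct value gets its own row, an ungrouped table is just a tally of individual values. A minimal sketch, using a hypothetical small dataset (number of pets per household):

```python
from collections import Counter

# Hypothetical dataset: number of pets per household
pets = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]
ungrouped = Counter(pets)            # maps each distinct value -> its frequency
for value in sorted(ungrouped):
    print(value, ungrouped[value])
```

With only four distinct values, the table stays compact, which is exactly when ungrouped distributions work best.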
Let’s now take a look at relative frequency distributions.
Relative Frequency Distribution Tables
This is an important variation of the generic frequency distribution that shows the frequency of every class interval as a fraction or percentage of the total frequency of the entire distribution. The advantage of relative frequency distributions is that they show us the concentration of observations among the different classes of a distribution.
Above is a relative frequency distribution table developed from the weight observations. The relative frequencies are obtained by expressing the frequency of each class as a percentage of the total frequency.
So, for the 160-169 class interval, we find the relative frequency as follows:
(12/53) * 100 = 22.64 % or 23 %
This tells us that values within the 160-169 class interval comprise around 23% of all observations. Relative frequencies help compare two or more distributions based on the total number of observations.
You can easily convert a frequency distribution table into a relative frequency distribution table. Do so by dividing the frequency of each class by the total frequency of the whole distribution and then multiplying by 100. This gives you the percentage. If you want proportions, then skip the multiplication by 100. Proportions will vary between 0 and 1, while percentages will vary between 0 and 100. In other words, relative frequency is the absolute frequency normalized by the total number of observations.
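The divide-and-scale conversion is easy to automate. In the sketch below, only the 12-out-of-53 relationship for the 160-169 class comes from the worked example; the other class frequencies are hypothetical fillers that sum to 53.

```python
def relative_frequencies(freqs, as_percent=True):
    """Convert absolute class frequencies into relative ones."""
    total = sum(freqs.values())
    scale = 100 if as_percent else 1   # skip the x100 to get proportions
    return {cls: round(f / total * scale, 2) for cls, f in freqs.items()}

# Hypothetical class frequencies totalling 53; 160-169 has 12 as in the text
freqs = {"150-159": 19, "160-169": 12, "170-179": 10, "180-189": 7, "190-199": 5}
print(relative_frequencies(freqs))                    # 160-169 -> 22.64 (%)
print(relative_frequencies(freqs, as_percent=False))  # proportions in [0, 1]
```

The percentages across all classes add up to 100, which is a handy sanity check on any relative frequency table.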
Let’s move on to the next type, cumulative frequency distributions.
Cumulative Frequency Distribution Tables
Simply put, cumulative frequencies show the absolute frequencies of all values/outcomes/groups at or below a certain level. Cumulative frequencies allow us to find out the relative standing of a group in a distribution. In most cases, cumulative frequencies are converted into percentages, also known as percentile ranks.
Here are the steps to crafting a cumulative frequency table.
- Develop the generic frequency table.
- Add the frequency of each class to the sum of frequencies of all classes below it. The sum will give you the cumulative frequency of every class.
- Always begin with the lower-most class and then work your way upwards. Keep on calculating the cumulative frequencies in ascending order.
Here’s the cumulative frequency table for our weight observations.
If relative comparisons among different classes are particularly important, then cumulative frequencies are converted to cumulative percentages. Cumulative percentages show the relative changes among classes, that is, how much one class interval differs from the other.
The above example shows a huge increase in cumulative frequency percentages at 150-159 and 160-169. Subsequent increases are relatively low. This indicates that values are concentrated at these class intervals.
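The add-up-from-the-bottom procedure for cumulative frequencies can be sketched with a running sum. The class frequencies below are hypothetical; the point is that each entry is its own frequency plus everything below it.

```python
from itertools import accumulate

def cumulative_frequencies(freqs):
    """freqs: dict of class -> frequency, ordered from the lowest class up."""
    classes = list(freqs)                         # dicts keep insertion order
    running = accumulate(freqs[c] for c in classes)
    return dict(zip(classes, running))

# Hypothetical class frequencies, lowest class first
freqs = {"120-129": 4, "130-139": 9, "140-149": 13, "150-159": 19}
print(cumulative_frequencies(freqs))
# {'120-129': 4, '130-139': 13, '140-149': 26, '150-159': 45}
```

Note that the last cumulative value always equals the total number of observations, another easy consistency check.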
Let’s move on to the final variant, the relative cumulative frequency distribution.
Relative Cumulative Frequency Table
A relative cumulative frequency distribution sums the relative frequency of all values at and below a certain class interval. We can also define cumulative relative frequency as the percentage of times a value appears at or below a certain class interval.
Finding the relative cumulative frequency is not difficult. All you need to do is find the cumulative frequencies of all class intervals, then express each cumulative frequency as a percentage of the total frequency.
Here are new data and their cumulative & relative cumulative frequencies.
All data is rounded off to the nearest value. As per the process, frequencies are added starting from the lowest class interval to obtain the cumulative frequencies. Relative frequencies are determined by dividing the frequency of each class interval by the total frequency and multiplying by 100; cumulative relative frequencies do the same with the cumulative frequencies.
If we had to interpret the cumulative relative frequency, we could say that 53.3 % and 83.3 % of all values lie at or below the 91-103 and 103-115 class intervals, respectively.
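Combining the two previous steps gives relative cumulative frequencies in one pass. The class frequencies below are hypothetical but chosen to be consistent with the 53.3 % and 83.3 % figures quoted above (30 observations in total).

```python
from itertools import accumulate

def relative_cumulative(freqs):
    """Percentage of observations at or below each class interval."""
    total = sum(freqs.values())
    cum = accumulate(freqs.values())
    return {cls: round(c / total * 100, 1) for cls, c in zip(freqs, cum)}

# Hypothetical frequencies consistent with the percentages in the text
freqs = {"79-91": 5, "91-103": 11, "103-115": 9, "115-127": 5}
print(relative_cumulative(freqs))
# {'79-91': 16.7, '91-103': 53.3, '103-115': 83.3, '115-127': 100.0}
```

The top class always reads 100.0, since every observation lies at or below it.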
And those were the five major variants of frequency distributions.
One key thing to note is that the nature of variables affects frequency distributions and their nature. Though the essence of both processes is the same, the format for developing discrete frequency distributions differs slightly from continuous frequency distributions.
Discrete vs Continuous Frequency Distributions
Discrete variables can take discrete values within the range of their variation. It is, thus, natural to come up with appropriate classes for accommodating all discrete values. Below is an example.
- Given are the weekly wages of workers. Develop a discrete frequency distribution from the data.
300, 240, 240, 150, 120, 240, 120, 120, 150, 150, 150, 240, 150, 150, 120, 300, 120, 150, 240, 150, 150, 120, 240, 150, 240, 150, 120, 120, 240, 150
The frequency distribution of all the given discrete values will be of the following form:
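A quick way to build such a discrete table is to tally the wage values from the exercise directly, for instance with Python's `Counter`:

```python
from collections import Counter

# Weekly wages of the 30 workers, as given in the exercise
wages = [300, 240, 240, 150, 120, 240, 120, 120, 150, 150,
         150, 240, 150, 150, 120, 300, 120, 150, 240, 150,
         150, 120, 240, 150, 240, 150, 120, 120, 240, 150]
table = Counter(wages)
for wage in sorted(table):        # one row per distinct wage value
    print(wage, table[wage])
# 120 8 / 150 12 / 240 8 / 300 2
```

Only four distinct wage values occur, so a discrete (ungrouped) table is the natural choice here.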
Below is data showcasing the daily maximum temperature in a city for 50 consecutive days.
As can be seen, the temperature variable takes discrete whole-number values. The data is numeric, not categorical. Hence, converting these discrete values into a continuous distribution will be relatively easy. All we need to do is define appropriate class intervals.
To do that, we need to define:
- The minimum value among the data: 17
- The maximum value: 35
- The entire range of values: 35 - 17 = 18
If we take the number of classes to be 5, then an appropriate width for every class is 4, since the product of class width and number of classes (5 × 4 = 20) must cover the entire data range (18).
The continuous frequency distribution will, thus, be as follows:
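Binning the readings into these intervals can be sketched as below. The temperature values are hypothetical samples within the 17-35 range discussed above; the interval arithmetic (start 17, width 4, 5 classes) follows the text.

```python
def bin_values(values, start, width, n_classes):
    """Count how many values fall into each class interval of the given width."""
    counts = [0] * n_classes
    for v in values:
        k = min((v - start) // width, n_classes - 1)  # clamp the maximum into the last bin
        counts[k] += 1
    labels = [f"{start + k*width}-{start + (k+1)*width - 1}" for k in range(n_classes)]
    return dict(zip(labels, counts))

# Hypothetical daily maximum temperatures within the article's 17-35 range
temps = [17, 19, 22, 23, 25, 27, 28, 30, 31, 35]
print(bin_values(temps, start=17, width=4, n_classes=5))
```

Five classes of width 4 (17-20, 21-24, 25-28, 29-32, 33-36) comfortably cover the range of 18.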
Let’s wrap up this write-up with a look at the key parameters/characteristics of frequency distributions.
Characteristics or Parameters of Frequency Distributions
Frequency distributions possess key parameters or characteristics central to descriptive statistics.
The three most important among them are:
- Measures of central tendency
- Measures of dispersion or variability
- Measures of symmetry/asymmetry or skewness
Let’s take a quick look at each one of these parameters.
- Measures of Central Tendency
Measures of central tendency define how much the data tends towards the central position of the dataset. They help us find the center around which the data is distributed.
Mean, median, and mode are the three key measures of central tendency. The mean is the average value, the median is the middle value, and the mode is the value that appears the most.
Consider the following dataset: 1, 9, 2, 5, 55, 47, 3, 4, 7, 101
- The mean is calculated by dividing the sum of all data elements by the total number of all data elements. For the given dataset, the mean is
(1+9+2+5+55+47+3+4+7+101)/10 = 234/10=23.4
- The median involves arranging all data elements in an ordered manner, preferably in ascending order. If the number of elements is odd, then the median is the middle value. If even, then we take the average of the two middle values.
If we arrange the given dataset in an ascending manner, we will have the following:
1, 2, 3, 4, 5, 7, 9, 47, 55, 101
There are 10 elements, so the average of the two middle elements is the median.
So, the median is:
(5+7)/2 = 6
- The mode is the value that appears the most in the dataset. If no value repeats itself, then there’s no mode. And that is the case for our dataset.
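The three calculations above can be checked with Python's standard `statistics` module, using the same dataset:

```python
from statistics import mean, median
from collections import Counter

data = [1, 9, 2, 5, 55, 47, 3, 4, 7, 101]
print(mean(data))     # 23.4 -- sum of 234 divided by 10 elements
print(median(data))   # 6.0  -- average of the two middle values, 5 and 7
# No value repeats, so this dataset has no mode:
print(max(Counter(data).values()) == 1)
```

`median` sorts the data internally, so the list does not need to be pre-ordered.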
- Measures of Dispersion or Variability
Dispersion or variability defines how spread or scattered the data in a distribution is. Measures of dispersion are range, interquartile range, standard deviation, and variance.
- The range is calculated by subtracting the minimum value in a dataset from the maximum value.
- Variance defines the average degree by which all values or points in a dataset differ from the mean of all data points. It combines all the values in a data set and produces a measure of spread. The formula is:
s² = Σ (Xi − m)² / N, where the sum runs over i = 1 to N
Here, Xi are the data elements, N is the total number of data elements, and m is the sample mean for sample variance or the population mean for population variance.
- Standard deviation is the square root of variance. It denotes the dispersion of a dataset from the mean. Higher standard deviations indicate higher dispersion.
s = √[ Σ (Xi − m)² / N ]
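Both formulas can be verified against the `statistics` module, again using the earlier dataset:

```python
from statistics import pvariance, pstdev
import math

data = [1, 9, 2, 5, 55, 47, 3, 4, 7, 101]
m = sum(data) / len(data)                            # the mean, 23.4
var = sum((x - m) ** 2 for x in data) / len(data)    # population variance by hand
print(var)                                           # ~1014.44, matches pvariance(data)
print(pstdev(data))                                  # square root of the variance
```

`pvariance`/`pstdev` divide by N (population); `variance`/`stdev` divide by N − 1 for sample estimates.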
- To understand the interquartile range, we first need to understand quartiles and percentiles.
Percentiles, or Nth percentiles, state that N percent of values are less than or equal to a certain value. Intuitively, (100 − N) percent of values lie above it. The index of the Nth percentile in an ordered dataset is:
i = (N/100) * n
where N is the desired percentile and n is the total number of values
Quartiles divide the entire dataset into four parts of more or less equal size. The dataset needs to be ordered from low to high to find quartiles. If we sort in ascending order and then divide the entire dataset into four roughly equal parts, we find the:
- 1st or lower quartile or the 25th percentile (25% of all values)
- 2nd quartile or 50th percentile (50% of all values or the median)
- 3rd or upper quartile or 75th percentile (75% of all values)
The interquartile range is then calculated by subtracting the lower quartile from the upper quartile. This is the middle 50% range of all the data items.
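Quartiles and the interquartile range can be computed with `statistics.quantiles` (Python 3.8+), applied here to the earlier dataset. Note that different interpolation methods give slightly different quartile values; the figures below use the module's default "exclusive" method.

```python
from statistics import quantiles

data = [1, 9, 2, 5, 55, 47, 3, 4, 7, 101]
q1, q2, q3 = quantiles(data, n=4)   # cut points for four equal parts
iqr = q3 - q1                       # the middle 50% of the data
print(q1, q2, q3, iqr)              # 2.75 6.0 49.0 46.25
```

As expected, the second quartile (6.0) equals the median found earlier.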
Well, that’s all the space we have for today. Hope this was an interesting and informative read for one & all. Use this article for quick reference anytime you need help with frequency distribution tables.
However, if you wish to avail yourself of expert help, connect with us today. At MyAssignmenthelp.com, we have industry-leading academic writing and tutoring professionals to help you.
Call, mail, or drop a message at our live chat portal today.
Frequently Asked Questions (FAQs)
What is a frequency distribution table, and how is it used to organize data?
Frequency distribution tables are tabulated representations that present or organize the frequency counts of the values or outcomes of a set of variables. When it comes to organizing data, absolute, relative, and cumulative frequencies help to:
- Organize data items in an ordered fashion,
- Show the number of times a specific outcome appears,
- Indicate the class to which it belongs,
- Show how data elements are concentrated or spread out,
- Show how different classes compare to one another.
How do you create a frequency distribution table, and what are its key components?
To create a frequency distribution table, you need to:
- Know the complete dataset to be analyzed
- Determine the range of the entire dataset
- Determine the frequency of every element in the dataset
- Develop an appropriate number of class intervals for grouped data
- If the data is ungrouped, group it by developing the right number of class intervals of the right size
- Calculate the frequencies of all the classes and then find relative, cumulative, or relative cumulative frequencies as needed
What is the purpose of constructing a frequency distribution table, and what insights can it provide about a dataset?
Frequency distribution is one of the most powerful tools in descriptive statistics. It allows us to:
- Take in the entire dataset at a glance
- Organize and present data in an intuitive and easy-to-understand manner
- Find out the amount of dispersion or spread in the data
- Identify outliers or extremities
- Develop apt visual representations
- Compare information among different data sets
You can also determine the mean, variance, quartiles & percentiles, and standard deviation from the data.
How can outliers or extreme values impact the interpretation of a frequency distribution table?
Outliers and extremities lie at an abnormal distance from the central position and/or other data points in a distribution. If there are quite a few outliers in a dataset, both central tendency and dispersion measures can be affected. Outliers pull the mean strongly but have little effect on the median or mode. Outliers on the left side of a frequency distribution graph drag the mean down, while those on the right push it up.
A common rule of thumb classifies a value as an outlier if it lies at least 3 standard deviations from the mean.
In what situations or fields is a frequency distribution table commonly employed for data analysis and representation?
Frequency distributions are employed in any field that needs to make better sense of its data or extract information from it. From psychology to AI, law to medicine, education to social sciences, natural sciences to engineering, frequency distribution tables and descriptive statistics are employed EVERYWHERE for easy data representation & analysis.