Excel offers a large number of Statistics functions. This article will show you how to use these functions in your analyses.
To illustrate the statistical functions, we will start from the following table which contains the (fictitious) salaries of employees
And the distribution of wages is represented by this chart
The file to do the example is here
Common Excel Statistics functions
To perform the wage analysis, we will use many of the most common Excel statistics functions.
Min and Max functions
These functions return the minimum and maximum of the data set
=MIN(tbl_Employee[Salary]) => $1,520
=MAX(tbl_Employee[Salary]) => $4,750
How many values are in the series
Watch out! There may be a trap because COUNT and COUNTA don't return the same information
=COUNT(tbl_Employee[Salary]) => 50
=COUNTA(tbl_Employee[Salary]) => 50
The two functions return the same value here, but this is not always the case 🤔
The COUNT function counts only the numeric values in your range of cells. The COUNTA function counts ALL the values (texts and numbers)
Functions to analyze the entire series
Here we will see several functions that return different results. Properly used, these functions will reveal important information about your data.
Everyone knows the principle of averaging; we divide the sum of the values by the number of values
On the other hand, many people forget that the average gives only a partial vision. Indeed, it is always necessary to complete the mean by the standard deviation
=AVERAGE( tbl_Employee[Salary]) => $1,722.31
I have already presented the interest of the standard deviation in this article. But to summarize, it allows you to find the dispersion of your values around the average.
=STDEV.S(tbl_Employee[Salary]) => $688.56
The difference between Average and Standard Deviation is essential.
- 68% of the data are between the mean and 1 times the standard deviation
- 95% of the data are between the mean and 2 times the standard deviation
The MEDIAN is not the most known function but returns essential information.
The median returns the value that divides your series into two equal parts.
=MEDIAN(tbl_Salairés[Salary]) => $1,765
Therefore, we can say that half of the salaries are less than $1,765 (and also the opposite).
You can also find that the result of an Average and a Median has nothing to do with it and should not be confused.
The MODE function returns the most present value. different of the MOD function for modulo)
=MODE.SNGL(tbl_Employee[Salary]) => $1,560
So this indicates that the most common salary is $1,560 in our company. But we do not know how many times this salary is present on our list 🤔
This information could be easily obtained with the COUNTIFS function.
=COUNTIFS(tbl_Employee[Salary];MODE.SNGL(tbl_Employee[Salary])) => 4
The salary of $1,560 is present 4 times in our data set.