Data Science - Statistics Variance

Variance

Variance is another number that indicates how spread out the values are.

In fact, if you take the square root of the variance, you get the standard deviation. Or the other way around, if you multiply the standard deviation by itself, you get the variance!
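As a quick check of this relationship, here is a minimal sketch using NumPy. It takes the Average_Pulse values from the 10-observation table in this lesson. Both np.var() and np.std() divide by the number of values by default, so the two functions agree with each other:

```python
import numpy as np

# Average_Pulse column from the 10-observation table in this lesson
average_pulse = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])

variance = np.var(average_pulse)  # population variance (divides by N)
std_dev = np.std(average_pulse)   # population standard deviation

# The square root of the variance is the standard deviation...
print(np.sqrt(variance), std_dev)
# ...and the standard deviation multiplied by itself is the variance
print(std_dev * std_dev, variance)
```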

We will first use the data set with 10 observations to give an example of how we can calculate the variance:

Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
30        80             120        240              10          7
30        85             120        250              10          7
45        90             130        260              8           7
45        95             130        270              8           7
45        100            140        280              0           7
60        105            140        290              7           8
60        110            145        300              7           8
60        115            145        310              8           8
75        120            150        320              0           8
75        125            150        330              8           8

Tip

Variance is often represented by the symbol sigma squared: σ².
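Written as a formula, the steps below compute the following. Here N is the number of observations, x_i is the i-th value, and μ is their mean. This is the population variance (it divides by N), which is what this lesson calculates by hand:

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left(x_i - \mu\right)^2
```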

Step 1: Find the Mean

We want to find the variance of Average_Pulse.

Find the mean:
(80 + 85 + 90 + 95 + 100 + 105 + 110 + 115 + 120 + 125) / 10 = 1025 / 10 = 102.5

The mean is 102.5
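Step 1 can be sketched in plain Python, with no libraries needed. The list holds the Average_Pulse values from the table above:

```python
# Average_Pulse values from the table above
average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# Step 1: the mean is the sum of the values divided by how many there are
mean = sum(average_pulse) / len(average_pulse)
print(mean)  # 102.5
```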

Step 2: For Each Value - Find the Difference From the Mean

Find the difference from the mean for each value:
80 - 102.5 = -22.5
85 - 102.5 = -17.5
90 - 102.5 = -12.5
95 - 102.5 = -7.5
100 - 102.5 = -2.5
105 - 102.5 = 2.5
110 - 102.5 = 7.5
115 - 102.5 = 12.5
120 - 102.5 = 17.5
125 - 102.5 = 22.5
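Continuing the plain-Python sketch, Step 2 subtracts the mean from every value with a list comprehension. The mean is recomputed here so the snippet runs on its own:

```python
average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
mean = sum(average_pulse) / len(average_pulse)  # 102.5, from Step 1

# Step 2: subtract the mean from each value
differences = [x - mean for x in average_pulse]
print(differences)
# [-22.5, -17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5]
```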

Step 3: For Each Difference - Find the Square Value

Find the square value for each difference:
(-22.5)^2 = 506.25
(-17.5)^2 = 306.25
(-12.5)^2 = 156.25
(-7.5)^2 = 56.25
(-2.5)^2 = 6.25
2.5^2 = 6.25
7.5^2 = 56.25
12.5^2 = 156.25
17.5^2 = 306.25
22.5^2 = 506.25
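In the same sketch, Step 3 squares each difference. Note that the negative differences come out positive, so both ends of the list give the same squares:

```python
average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
mean = sum(average_pulse) / len(average_pulse)   # Step 1
differences = [x - mean for x in average_pulse]  # Step 2

# Step 3: square each difference (negative differences become positive)
squared = [d ** 2 for d in differences]
print(squared)
# [506.25, 306.25, 156.25, 56.25, 6.25, 6.25, 56.25, 156.25, 306.25, 506.25]
```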

Note

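We square the differences so that every value adds to the total spread. Differences below the mean are negative and differences above it are positive. Without squaring they would cancel out, because the differences from the mean always sum to zero.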
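A short sketch makes the cancellation concrete with the Average_Pulse values. The raw differences add up to zero, while the squared differences add up to a positive total:

```python
average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
mean = sum(average_pulse) / len(average_pulse)
differences = [x - mean for x in average_pulse]

# Raw differences cancel out: the ones below the mean offset the ones above it
print(sum(differences))  # 0.0

# Squared differences are all non-negative, so they add up instead
print(sum(d ** 2 for d in differences))  # 2062.5
```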

Step 4: The Variance is the Average of These Squared Values

Sum the squared values and find their average:
(506.25 + 306.25 + 156.25 + 56.25 + 6.25 + 6.25 + 56.25 + 156.25 + 306.25 + 506.25) / 10 = 2062.5 / 10 = 206.25

The variance is 206.25.
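Putting the four steps together, here is a minimal sketch of a hypothetical helper named variance. It is written for illustration and is not a library function. It divides by the number of values, exactly like the hand calculation:

```python
def variance(values):
    """Population variance: the average of the squared differences from the mean."""
    n = len(values)
    mean = sum(values) / n                    # Step 1: find the mean
    differences = [x - mean for x in values]  # Step 2: difference from the mean
    squared = [d ** 2 for d in differences]   # Step 3: square each difference
    return sum(squared) / n                   # Step 4: average the squares


average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
print(variance(average_pulse))  # 206.25
```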

Use Python to Find the Variance of health_data

We can use NumPy's var() function to find the variance of each column (remember that we now use the first data set with 10 observations):

Example

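import numpy as np
# health_data holds the 10-observation table above
# axis=0 returns one population variance (divided by N) per column
var = np.var(health_data, axis=0)
print(var)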
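The example above depends on health_data already being loaded. Below is a self-contained sketch that rebuilds the 10-row table from this lesson by hand as a pandas DataFrame (this assumes pandas is installed). It lets you check that NumPy reproduces the 206.25 calculated in Step 4:

```python
import numpy as np
import pandas as pd

# The 10-observation table from this lesson, typed in by hand
health_data = pd.DataFrame({
    "Duration": [30, 30, 45, 45, 45, 60, 60, 60, 75, 75],
    "Average_Pulse": [80, 85, 90, 95, 100, 105, 110, 115, 120, 125],
    "Max_Pulse": [120, 120, 130, 130, 140, 140, 145, 145, 150, 150],
    "Calorie_Burnage": [240, 250, 260, 270, 280, 290, 300, 310, 320, 330],
    "Hours_Work": [10, 10, 8, 8, 0, 7, 7, 8, 0, 8],
    "Hours_Sleep": [7, 7, 7, 7, 7, 8, 8, 8, 8, 8],
})

# np.var divides by N (ddof=0), matching the hand calculation
print(np.var(health_data["Average_Pulse"]))  # 206.25

# pandas' own .var() divides by N - 1 by default; pass ddof=0 to match np.var
print(health_data.var(ddof=0))
```

Keep one caveat in mind. np.var divides by N by default (ddof=0), like the manual steps. pandas' DataFrame.var() and Series.var() instead divide by N - 1 (ddof=1) unless you pass ddof=0. So if you rely on the defaults, the two libraries return different numbers for the same column.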

Use Python to Find the Variance of Full Data Set

Here we calculate the variance of each column in the full data set:

Example

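import numpy as np
# full_health_data holds the full data set used in the previous examples
# axis=0 returns one population variance (divided by N) per column
var_full = np.var(full_health_data, axis=0)
print(var_full)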
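A plain NumPy array shows why "for each column" matters. This minimal sketch uses only the Duration and Average_Pulse columns from the 10-row table, since the full data set is not reproduced here. With axis=0 you get one variance per column. Leaving out axis flattens every value into a single variance:

```python
import numpy as np

# Duration and Average_Pulse from the 10-row table (one row per observation)
data = np.array([
    [30, 80], [30, 85], [45, 90], [45, 95], [45, 100],
    [60, 105], [60, 110], [60, 115], [75, 120], [75, 125],
])

# axis=0 computes one variance per column
print(np.var(data, axis=0))  # [236.25 206.25]

# Without axis, all 20 values are treated as one flat list
print(np.var(data))  # 846.25
```

When you want one variance per column, passing axis=0 states that intent explicitly instead of relying on a default.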
