Database Table
A database table stores structured data, organized into rows and columns.
The following table shows a database table with health data extracted from a sports watch:
| Duration | Average_Pulse | Max_Pulse | Calorie_Burnage | Hours_Work | Hours_Sleep |
|---|---|---|---|---|---|
| 30 | 80 | 120 | 240 | 10 | 7 |
| 30 | 85 | 120 | 250 | 10 | 7 |
| 45 | 90 | 130 | 260 | 8 | 7 |
| 45 | 95 | 130 | 270 | 8 | 7 |
| 45 | 100 | 140 | 280 | 0 | 7 |
| 60 | 105 | 140 | 290 | 7 | 8 |
| 60 | 110 | 145 | 300 | 7 | 8 |
| 60 | 115 | 145 | 310 | 8 | 8 |
| 75 | 120 | 150 | 320 | 0 | 8 |
| 75 | 125 | 150 | 330 | 8 | 8 |
This dataset contains information about typical training sessions, such as duration, average pulse, and calorie burnage.
Database Table Structure
A database table consists of columns and rows:
| | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 |
|---|---|---|---|---|---|---|
| | Duration | Average_Pulse | Max_Pulse | Calorie_Burnage | Hours_Work | Hours_Sleep |
| Row 1 | 30 | 80 | 120 | 240 | 10 | 7 |
| Row 2 | 30 | 85 | 120 | 250 | 10 | 7 |
| Row 3 | 45 | 90 | 130 | 260 | 8 | 7 |
| Row 4 | 45 | 95 | 130 | 270 | 8 | 7 |
| Row 5 | 45 | 100 | 140 | 280 | 0 | 7 |
| Row 6 | 60 | 105 | 140 | 290 | 7 | 8 |
| Row 7 | 60 | 110 | 145 | 300 | 7 | 8 |
| Row 8 | 60 | 115 | 145 | 310 | 8 | 8 |
| Row 9 | 75 | 120 | 150 | 320 | 0 | 8 |
| Row 10 | 75 | 125 | 150 | 330 | 8 | 8 |
A row is a horizontal representation of data.
A column is a vertical representation of data.
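To make the row/column distinction concrete, here is a minimal Python sketch that uses only the first three rows of the health table. The names `columns`, `rows`, `get_row`, and `get_column` are hypothetical helpers chosen for illustration; they are not part of any database API.

```python
# Column names taken from the health data table.
columns = ["Duration", "Average_Pulse", "Max_Pulse",
           "Calorie_Burnage", "Hours_Work", "Hours_Sleep"]

# First three rows of the table; each tuple is one row.
rows = [
    (30, 80, 120, 240, 10, 7),
    (30, 85, 120, 250, 10, 7),
    (45, 90, 130, 260, 8, 7),
]


def get_row(index):
    """Return one row (a horizontal slice) as a dict keyed by column name."""
    return dict(zip(columns, rows[index]))


def get_column(name):
    """Return one column (a vertical slice) as a list of values."""
    position = columns.index(name)
    return [row[position] for row in rows]


print(get_row(0))
print(get_column("Average_Pulse"))  # [80, 85, 90]
```

Reading a row gives every value recorded for one training session, while reading a column gives one kind of value across all sessions.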
Variables
A variable is defined as something that can be measured or counted.
Examples can be characters, numbers or time.
In the example below, each column represents a variable.
| Duration | Average_Pulse | Max_Pulse | Calorie_Burnage | Hours_Work | Hours_Sleep |
|---|---|---|---|---|---|
| 30 | 80 | 120 | 240 | 10 | 7 |
| 30 | 85 | 120 | 250 | 10 | 7 |
| 45 | 90 | 130 | 260 | 8 | 7 |
| 45 | 95 | 130 | 270 | 8 | 7 |
| 45 | 100 | 140 | 280 | 0 | 7 |
| 60 | 105 | 140 | 290 | 7 | 8 |
| 60 | 110 | 145 | 300 | 7 | 8 |
| 60 | 115 | 145 | 310 | 8 | 8 |
| 75 | 120 | 150 | 320 | 0 | 8 |
| 75 | 125 | 150 | 330 | 8 | 8 |
There are 6 columns, meaning that there are 6 variables (Duration, Average_Pulse, Max_Pulse, Calorie_Burnage, Hours_Work, Hours_Sleep).
There are 11 rows, meaning that each variable has 10 observations.
But if there are 11 rows, why are there only 10 observations?
Because the first row is the header: it holds the names of the variables, not measured values.
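The counts can be checked with a short Python sketch. It assumes the table has been exported as comma-separated text (the `csv_text` string is an illustrative stand-in for such an export, not something the sports watch is known to produce) and uses only the standard-library `csv` module.

```python
import csv
import io

# The health data table written out as CSV text (assumed export format).
csv_text = """Duration,Average_Pulse,Max_Pulse,Calorie_Burnage,Hours_Work,Hours_Sleep
30,80,120,240,10,7
30,85,120,250,10,7
45,90,130,260,8,7
45,95,130,270,8,7
45,100,140,280,0,7
60,105,140,290,7,8
60,110,145,300,7,8
60,115,145,310,8,8
75,120,150,320,0,8
75,125,150,330,8,8
"""

# Parse every line, then split the header row from the data rows.
lines = list(csv.reader(io.StringIO(csv_text)))
header, observations = lines[0], lines[1:]

print(len(lines))         # 11 rows in total
print(len(header))        # 6 variables
print(len(observations))  # 10 observations
```

Separating the header from the data rows before counting is what turns 11 rows into 10 observations.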