In this section, we will prepare the data for modeling, training and testing.
The target (as noted previously) is the ‘passed’ column. Here I’ll list the feature columns to get an idea of what’s there.
Variable | Description | Data Values |
---|---|---|
Dalc | workday alcohol consumption | 1, 2, 3, 4, 5 |
Fedu | father’s education | 0, 1, 2, 3, 4 |
Fjob | father’s job | at_home, health, other, services, teacher |
Medu | mother’s education | 0, 1, 2, 3, 4 |
Mjob | mother’s job | at_home, health, other, services, teacher |
Pstatus | parent’s cohabitation status | A, T |
Walc | weekend alcohol consumption | 1, 2, 3, 4, 5 |
absences | number of school absences | 0 ... 75 |
activities | extra-curricular activities | no, yes |
address | student’s home address type | R, U |
age | student’s age | 15, 16, 17, 18, 19, 20, 21, 22 |
failures | number of past class failures | 0, 1, 2, 3 |
famrel | quality of family relationships | 1, 2, 3, 4, 5 |
famsize | family size | GT3, LE3 |
famsup | family educational support | no, yes |
freetime | free time after school | 1, 2, 3, 4, 5 |
goout | going out with friends | 1, 2, 3, 4, 5 |
guardian | student’s guardian | father, mother, other |
health | current health status | 1, 2, 3, 4, 5 |
higher | wants to take higher education | no, yes |
internet | Internet access at home | no, yes |
nursery | attended nursery school | no, yes |
paid | extra paid classes within the course subject (Math or Portuguese) | no, yes |
reason | reason to choose this school | course, home, other, reputation |
romantic | within a romantic relationship | no, yes |
school | student’s school | GP, MS |
schoolsup | extra educational support | no, yes |
sex | student’s sex | F, M |
studytime | weekly study time | 1, 2, 3, 4 |
traveltime | home to school travel time | 1, 2, 3, 4 |
Some machine learning algorithms (e.g. logistic regression) require numeric data, so the columns with string data need to be transformed. The columns in this data set that had ‘yes’ or ‘no’ values had those values converted to 1 and 0 respectively. The columns that had other kinds of categorical data were transformed into dummy-variable columns, with one 0/1 column for each category.
The target data was also changed so that it contains ‘1’ and ‘0’ values instead of ‘yes’ and ‘no’.
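The transformation described above can be sketched roughly as follows. This is a minimal illustration using pandas, not necessarily the exact code used for this project: the function name `preprocess_features`, the toy data frame, and the toy target values are all assumptions made for the example.

```python
import pandas as pd


def preprocess_features(features: pd.DataFrame) -> pd.DataFrame:
    """Map yes/no columns to 1/0 and expand other string columns into dummies.

    Hypothetical helper written for illustration only.
    """
    pieces = []
    for name in features.columns:
        column = features[name]
        if pd.api.types.is_numeric_dtype(column):
            # Numeric columns are kept unchanged.
            pieces.append(column)
        elif set(column.dropna().unique()) <= {"yes", "no"}:
            # Binary yes/no columns become a single 1/0 column.
            pieces.append(column.map({"yes": 1, "no": 0}))
        else:
            # Other categorical columns become one 0/1 column per category,
            # named <column>_<category>.
            pieces.append(pd.get_dummies(column, prefix=name, dtype=int))
    return pd.concat(pieces, axis=1)


# Toy data (made up) that borrows three column names from the table above.
toy = pd.DataFrame({
    "age": [15, 17, 16],
    "internet": ["yes", "no", "yes"],
    "Mjob": ["teacher", "other", "health"],
})
encoded = preprocess_features(toy)
print(list(encoded.columns))

# The target gets the same yes/no to 1/0 mapping.
target = pd.Series(["yes", "no", "yes"], name="passed")
target_encoded = target.map({"yes": 1, "no": 0})
print(target_encoded.tolist())
```

Keeping one dummy column for every category (rather than dropping one per feature) is what the column counts below imply, so the sketch does the same.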
- Original Feature Columns: 30
- With Dummies: 48
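With dummy variables there are now 18 more columns in the feature data. The nine categorical columns that are not yes/no (Fjob, Mjob, Pstatus, address, famsize, guardian, reason, school and sex) were each replaced by one column per category, 27 columns in all, for a net gain of 27 - 9 = 18.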
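As a quick sanity check, the column count can be reproduced from the table of data values alone. The sketch below assumes that every category listed in the table actually occurs in the data and that no dummy column was dropped; the dictionary `levels` is just a hand-copied tally from the table.

```python
# Number of distinct values for each categorical column that is not yes/no,
# copied by hand from the "Data Values" column of the table.
levels = {
    "Fjob": 5,
    "Mjob": 5,
    "Pstatus": 2,
    "address": 2,
    "famsize": 2,
    "guardian": 3,
    "reason": 4,
    "school": 2,
    "sex": 2,
}

original_columns = 30

# Each categorical column is removed and replaced by one column per category.
with_dummies = original_columns - len(levels) + sum(levels.values())
added = with_dummies - original_columns

print(with_dummies)  # 48
print(added)         # 18
```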