Preparing the Data
------------------

In this section, we will prepare the data for modeling, training and testing.

Identify feature and target columns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The target (as noted previously) is the 'passed' column. Here I'll list the feature columns to get an idea of what's there.

.. csv-table:: Features
   :header: Variable, Description, Data Values
   :delim: ;

   Dalc;workday alcohol consumption;1, 2, 3, 4, 5
   Fedu;father's education;0, 1, 2, 3, 4
   Fjob;father's job;at_home, health, other, services, teacher
   Medu;mother's education;0, 1, 2, 3, 4
   Mjob;mother's job;at_home, health, other, services, teacher
   Pstatus;parent's cohabitation status;A, T
   Walc;weekend alcohol consumption;1, 2, 3, 4, 5
   absences;number of school absences;0 ... 75
   activities;extra-curricular activities;no, yes
   address;student's home address type;R, U
   age;student's age;15, 16, 17, 18, 19, 20, 21, 22
   failures;number of past class failures;0, 1, 2, 3
   famrel;quality of family relationships;1, 2, 3, 4, 5
   famsize;family size;GT3, LE3
   famsup;family educational support;no, yes
   freetime;free time after school;1, 2, 3, 4, 5
   goout;going out with friends;1, 2, 3, 4, 5
   guardian;student's guardian;father, mother, other
   health;current health status;1, 2, 3, 4, 5
   higher;wants to take higher education;no, yes
   internet;Internet access at home;no, yes
   nursery;attended nursery school;no, yes
   paid;extra paid classes within the course subject (Math or Portuguese);no, yes
   reason;reason to choose this school;course, home, other, reputation
   romantic;within a romantic relationship;no, yes
   school;student's school;GP, MS
   schoolsup;extra educational support;no, yes
   sex;student's sex;F, M
   studytime;weekly study time;1, 2, 3, 4
   traveltime;home to school travel time;1, 2, 3, 4

Preprocess feature columns
~~~~~~~~~~~~~~~~~~~~~~~~~~

Some Machine Learning algorithms (e.g. Logistic Regression) require numeric data so the columns with string-data need to be transformed.
The columns in this data-set that had 'yes' or 'no' values were converted to 1 and 0 respectively. Columns that held other kinds of categorical data were transformed into dummy-variable columns, one column per category value. The target data was changed in the same way, so that it contains 1 and 0 instead of 'yes' and 'no'.

* Original Feature Columns: 30
* With Dummies: 48

With dummy variables there are now 18 more columns in the feature data.

Split data into training and test sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next the data was shuffled and then split into training and testing sets.

.. csv-table:: Training and Testing Data
   :header: Set, Count

   Training Instances,300
   Test Instances,95
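
As a rough illustration of the preprocessing and splitting steps described in this section, here is a minimal, library-free sketch. It is not the code used to produce the counts above: the function names ``preprocess`` and ``shuffle_split``, the toy rows, and the seed are all hypothetical. In practice a library such as pandas (``get_dummies``) would normally do this work.

```python
import random


def preprocess(rows):
    """Map yes/no to 1/0 and expand other string columns into dummy columns."""
    # Collect the distinct values seen in each string-valued column.
    categories = {}
    for row in rows:
        for column, value in row.items():
            if isinstance(value, str):
                categories.setdefault(column, set()).add(value)

    encoded = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column not in categories:
                # Already numeric, keep unchanged.
                new_row[column] = value
            elif categories[column] <= {"yes", "no"}:
                # Binary yes/no column becomes a single 1/0 column.
                new_row[column] = 1 if value == "yes" else 0
            else:
                # Any other categorical column gets one indicator per value.
                for category in sorted(categories[column]):
                    new_row[f"{column}_{category}"] = int(value == category)
        encoded.append(new_row)
    return encoded


def shuffle_split(rows, n_train, seed=0):
    """Shuffle a copy of the rows, then cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical toy rows, using a few column names from the feature table.
students = [
    {"age": 15, "internet": "yes", "Mjob": "teacher", "passed": "yes"},
    {"age": 17, "internet": "no", "Mjob": "health", "passed": "no"},
    {"age": 16, "internet": "yes", "Mjob": "other", "passed": "yes"},
    {"age": 18, "internet": "no", "Mjob": "teacher", "passed": "no"},
]

encoded = preprocess(students)
train, test = shuffle_split(encoded, n_train=3)
```

In this sketch the single ``Mjob`` column becomes three indicator columns, which is the same effect that grows the real feature set from 30 to 48 columns. With the full data, ``n_train`` would be 300, leaving 95 rows for testing.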