Software

The following are auto-generated descriptions of some of the software used. train_test_split, mean_squared_error, DecisionTreeRegressor, and GridSearchCV come from sklearn; the rest were written as part of this project.

Evaluating Model Performance

Splitting the Data

shuffle_split_data(X, y[, test_size, ...]) Shuffles and splits data into training and testing subsets
train_test_split(*arrays, **options) Split arrays or matrices into random train and test subsets
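
As a rough illustration of the split described above, the sketch below calls train_test_split directly on synthetic data. The data, the 25% test fraction, and the random_state value are assumptions made for the example; the project's own shuffle_split_data may be implemented differently.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 3 features (not the project's data).
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Hold out 25% of the rows; random_state fixes the shuffle for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(X_train.shape, X_test.shape)  # (75, 3) (25, 3)
```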

The Performance Metric

performance_metric(y_true, y_predict) Calculates total error between true and predicted values
mean_squared_error(y_true, y_pred[, ...]) Mean squared error regression loss
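
For reference, a small example of mean_squared_error on hand-picked values. Whether performance_metric simply wraps this function is not stated here, so treat the connection as an assumption.

```python
from sklearn.metrics import mean_squared_error

# Illustrative values only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Squared errors are 0.25, 0.25, 0.0 and 1.0, so the mean is 1.5 / 4.
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.375
```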

Decision Tree Regressor

fit_model(X, y[, k, n_jobs]) Tunes a decision tree regressor model using GridSearchCV
DecisionTreeRegressor([criterion, splitter, ...]) A decision tree regressor.
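
A minimal sketch of tuning a DecisionTreeRegressor with GridSearchCV follows. The parameter grid (max_depth from 1 to 10), the scoring string, the 5-fold setting, and the synthetic data are all assumptions for illustration; the k argument of fit_model is assumed to be the number of cross-validation folds, and the actual fit_model may search a different grid.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: a noisy sine curve (not the project's data).
rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 10
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Assumed search space: only the tree depth is tuned.
param_grid = {"max_depth": list(range(1, 11))}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=5,       # assumed to play the role of k in fit_model
    n_jobs=1,
)
search.fit(X, y)

# The refit model with the best cross-validated score.
best_model = search.best_estimator_
print(search.best_params_)
```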

Analyzing Model Performance

learning_curves(X_train, y_train, X_test, y_test) Calculates performance of several models with varying training data sizes
model_complexity(X_train, y_train, X_test, ...) Calculates the performance of the model as model complexity increases.
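
The two analyses above can be sketched as two loops: one that varies the tree depth and one that varies the amount of training data. This is only an outline under assumed settings (synthetic data, max_depth as the complexity measure, mean squared error as the metric, a fixed depth of 4 for the learning curve); the project's learning_curves and model_complexity functions may compute or plot things differently.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: a noisy sine curve (not the project's data).
rng = np.random.RandomState(0)
X = rng.rand(300, 1) * 10
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model complexity: training and testing error as max_depth grows.
depths = list(range(1, 11))
train_errors, test_errors = [], []
for depth in depths:
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    test_errors.append(mean_squared_error(y_test, model.predict(X_test)))

# Learning curve: testing error as the training set grows, at a fixed depth.
sizes = [25, 50, 100, 150, len(X_train)]
lc_test_errors = []
for n in sizes:
    model = DecisionTreeRegressor(max_depth=4, random_state=0)
    model.fit(X_train[:n], y_train[:n])
    lc_test_errors.append(mean_squared_error(y_test, model.predict(X_test)))

for depth, tr, te in zip(depths, train_errors, test_errors):
    print(f"max_depth={depth:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")
for n, te in zip(sizes, lc_test_errors):
    print(f"n_train={n:3d}  test MSE={te:.3f}")
```

A training error that keeps falling while the testing error levels off or rises is the usual sign that the deeper trees are overfitting.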

Model Prediction

ci(data[, statfunction, alpha, n_samples, ...]) Given a data set data and a statistics function statfunction that applies to it, computes the bootstrap confidence interval for statfunction on that data.
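
To make the idea of a bootstrap confidence interval concrete, here is a simplified percentile bootstrap written with NumPy. The helper percentile_ci is hypothetical and only mirrors the argument names listed for ci; the ci function itself may use a different (for example, bias-corrected) method, so the numbers will not necessarily match.

```python
import numpy as np


def percentile_ci(data, statfunction=np.mean, alpha=0.05, n_samples=10000, seed=0):
    """Percentile bootstrap confidence interval for statfunction(data).

    Hypothetical helper for illustration; not the project's ci function.
    """
    rng = np.random.RandomState(seed)
    data = np.asarray(data)
    # Draw index sets with replacement, one row per bootstrap replicate.
    idx = rng.randint(0, len(data), size=(n_samples, len(data)))
    stats = np.array([statfunction(data[row]) for row in idx])
    # Take the alpha/2 and 1 - alpha/2 percentiles of the replicated statistic.
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


# Illustrative sample: 50 draws from a normal distribution.
sample = np.random.RandomState(1).normal(loc=20.0, scale=5.0, size=50)
low, high = percentile_ci(sample)
print(low, high)
```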