Software

The following are auto-generated summaries of the software used. train_test_split, mean_squared_error, DecisionTreeRegressor, and GridSearchCV come from sklearn; the rest were written for this project.

Evaluating Model Performance

Splitting the Data

shuffle_split_data(X, y[, test_size, ...]) Shuffles and splits data into training and testing subsets

train_test_split(*arrays, **options) Split arrays or matrices into random train and test subsets
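As a rough illustration of the split step, the sketch below calls train_test_split on made-up toy arrays. It assumes a recent scikit-learn in which the function lives in sklearn.model_selection; the project's own shuffle_split_data wrapper is not shown here and may behave differently.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 30% of the samples for testing.
# random_state fixes the shuffle so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
```

Fixing random_state is a convenience for reproducibility; leaving it unset gives a different shuffle on each run.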

The Performance Metric

performance_metric(y_true, y_predict) Calculates total error between true and predicted values

mean_squared_error(y_true, y_pred[, ...]) Mean squared error regression loss
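To make the metric concrete, here is a small sketch that compares mean_squared_error with the same quantity computed by hand. The values are made up, and this shows only the sklearn function, not the project's performance_metric wrapper.

```python
from sklearn.metrics import mean_squared_error

# Invented true and predicted values, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Library result.
mse = mean_squared_error(y_true, y_pred)

# The same quantity by hand: the average of the squared differences.
manual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse, manual)
```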

Decision Tree Regressor

fit_model(X, y[, k, n_jobs]) Tunes a decision tree regressor model using GridSearchCV

DecisionTreeRegressor([criterion, splitter, ...]) A decision tree regressor.

Grid Search

GridSearchCV(estimator, param_grid[, ...]) Exhaustive search over specified parameter values for an estimator.
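One plausible way to combine DecisionTreeRegressor and GridSearchCV is sketched below: a cross-validated search over max_depth scored by mean squared error. The synthetic data, the depth range, and the choice of 5 folds are all assumptions made for illustration; this is not the project's fit_model implementation.

```python
import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption: a noisy sine curve, for illustration).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# MSE is a loss, so tell the scorer that lower values are better.
scorer = make_scorer(mean_squared_error, greater_is_better=False)

# Try every max_depth from 1 to 10 with 5-fold cross-validation.
search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring=scorer,
    cv=5,
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
print("best max_depth:", best_depth)
```

After fitting, search.best_estimator_ holds a tree refit on all the supplied data with the selected depth, so it can be used directly for prediction.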

Analyzing Model Performance

learning_curves(X_train, y_train, X_test, y_test) Calculates performance of several models with varying training data sizes

model_complexity(X_train, y_train, X_test, ...) Calculates the performance of the model as model complexity increases.
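The idea behind a learning curve can be sketched as follows: fit the same kind of model on growing prefixes of the training set and record the training and testing error at each size. The function name learning_curve_sketch, its depth and n_steps parameters, and the synthetic data are hypothetical; the project's learning_curves function may differ in both interface and detail.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption: a noisy sine curve, for illustration).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def learning_curve_sketch(X_train, y_train, X_test, y_test, depth=3, n_steps=5):
    """Hypothetical helper: errors of a fixed-depth tree vs. training size."""
    # Evenly spaced training sizes from 10 samples up to the full training set.
    sizes = np.linspace(10, len(X_train), n_steps).astype(int)
    train_err, test_err = [], []
    for n in sizes:
        model = DecisionTreeRegressor(max_depth=depth, random_state=0)
        model.fit(X_train[:n], y_train[:n])
        # Error on the data the model saw, and on the held-out test set.
        train_err.append(mean_squared_error(y_train[:n], model.predict(X_train[:n])))
        test_err.append(mean_squared_error(y_test, model.predict(X_test)))
    return sizes, np.array(train_err), np.array(test_err)


sizes, train_err, test_err = learning_curve_sketch(X_train, y_train, X_test, y_test)
for n, tr, te in zip(sizes, train_err, test_err):
    print(n, round(tr, 4), round(te, 4))
```

A model-complexity curve follows the same pattern with the roles swapped: the training set stays fixed and the loop runs over max_depth instead of the training size.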

Model Prediction

ci(data[, statfunction, alpha, n_samples, ...]) Given a data set data and a statistics function statfunction that applies to it, computes the bootstrap confidence interval for statfunction on that data.
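For intuition about what a bootstrap confidence interval is, the sketch below implements the simple percentile bootstrap with NumPy alone. It is a stand-in for illustration, not the ci function listed above, which may use a different and more refined interval method. The function name percentile_bootstrap_ci and its seed parameter are hypothetical; the other parameter names are borrowed from the signature above for readability.

```python
import numpy as np


def percentile_bootstrap_ci(data, statfunction=np.mean, alpha=0.05,
                            n_samples=2000, seed=0):
    """Hypothetical helper: percentile bootstrap interval for statfunction."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    # Each row holds indices for one resample, drawn with replacement.
    idx = rng.integers(0, len(data), size=(n_samples, len(data)))
    # Evaluate the statistic on every resample.
    stats = np.array([statfunction(data[row]) for row in idx])
    # The interval endpoints are the alpha/2 and 1 - alpha/2 percentiles.
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


# Made-up sample: 100 draws from a normal distribution.
sample = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=100)
low, high = percentile_bootstrap_ci(sample)
print(low, high)
```

With alpha=0.05 the result is a nominal 95% interval for the chosen statistic, here the mean of the sample.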