ISL Default Dataset
Set Up
# python
from pathlib import Path
import os
# pypi
from dotenv import load_dotenv
import numpy
import pandas
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
The Dataset
Load It
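utils = rpackages.importr("utils")
# pick the first CRAN mirror in the list so R doesn't prompt for one
utils.chooseCRANmirror(ind=1)
# only install ISLR if it isn't already in the R library
if not rpackages.isinstalled("ISLR"):
    utils.install_packages("ISLR")
islr = rpackages.importr("ISLR")
# fetch the R data.frame "Default" that ships with the ISLR package
default = rpackages.data(islr).fetch("Default")["Default"]
# convert each R column to a numpy array and build a pandas DataFrame;
# the factor columns (default, student) come through as their integer codes
default = pandas.DataFrame.from_dict({key: numpy.asarray(default.rx2(key))
                                      for key in default.names})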
Summarize It
default.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   default  10000 non-null  int32
 1   student  10000 non-null  int32
 2   balance  10000 non-null  float64
 3   income   10000 non-null  float64
dtypes: float64(2), int32(2)
memory usage: 234.5 KB
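Because default and student are R factors, numpy.asarray picks up their 1-based integer codes rather than their "No"/"Yes" labels, which is why info() reports them as int32. Here is a minimal sketch of decoding them back into labels with pandas.Categorical.from_codes. It assumes both factors have the levels ("No", "Yes") in that order (worth confirming with levels(Default$default) in R first). The helper names and the three-row frame are made up for illustration.

```python
import numpy
import pandas

# Assumption: both ISLR Default factors have the levels ("No", "Yes"),
# stored by R as the 1-based codes 1 and 2.
LEVELS = ["No", "Yes"]


def decode_factor(codes, levels):
    """Turn 1-based R factor codes into a pandas Categorical with labels."""
    codes = numpy.asarray(codes, dtype=int)
    # pandas categorical codes are 0-based, so shift R's codes down by one
    return pandas.Categorical.from_codes(codes - 1, categories=levels)


def decode_factors(frame, columns, levels=LEVELS):
    """Return a copy of `frame` with the given integer columns decoded."""
    decoded = frame.copy()
    for column in columns:
        decoded[column] = decode_factor(decoded[column], levels)
    return decoded


# hypothetical three-row frame standing in for the converted Default data
example = pandas.DataFrame({
    "default": numpy.array([1, 2, 1], dtype="int32"),
    "student": numpy.array([2, 1, 1], dtype="int32"),
})
labeled = decode_factors(example, ["default", "student"])
print(labeled["default"].tolist())  # ['No', 'Yes', 'No']
```

Decoding into a categorical rather than plain strings keeps the column small in memory and makes the set of allowed values explicit, while to_csv still writes the labels as text.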
Save It
The reason for converting the rpy2 object to a pandas DataFrame was to make saving it easier. The rpy2 DataFrame does have a method called to_csvfile, but I've never used it, so pandas' to_csv seemed safer.
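# the env file defines ISL_DEFAULT, the path to save the CSV to
env_path = Path("~/.local/share/env").expanduser()
load_dotenv(env_path, override=True)
path = Path(os.environ["ISL_DEFAULT"]).expanduser()
# index=False keeps pandas' row numbers out of the CSV
default.to_csv(path, index=False)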
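One way to check that the CSV holds what the DataFrame held is to read it back and compare the two. This is a sketch, not part of the original workflow: save_and_check is a hypothetical helper, a temporary file stands in for the ISL_DEFAULT path, and the frame is made-up data with the same columns and dtypes as the real one.

```python
import tempfile
from pathlib import Path

import pandas


def save_and_check(frame: pandas.DataFrame, path: Path) -> pandas.DataFrame:
    """Write `frame` to CSV without the index, read it back, and compare."""
    frame.to_csv(path, index=False)
    reloaded = pandas.read_csv(path)
    # check_dtype=False because read_csv brings int32 columns back as int64
    pandas.testing.assert_frame_equal(frame, reloaded, check_dtype=False)
    return reloaded


# hypothetical stand-in for the Default frame (same columns and dtypes)
example = pandas.DataFrame({
    "default": pandas.Series([1, 2, 1], dtype="int32"),
    "student": pandas.Series([2, 1, 1], dtype="int32"),
    "balance": [100.5, 250.25, 0.0],
    "income": [40000.0, 12000.5, 31000.75],
})

# a temporary directory replaces the .env-configured path for this sketch
with tempfile.TemporaryDirectory() as directory:
    reloaded = save_and_check(example, Path(directory) / "default.csv")
```

If assert_frame_equal finds a difference it raises an AssertionError describing the mismatched column, so a silent return means the round trip preserved the values.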
Citation
- James, G., Witten, D., Hastie, T., and Tibshirani, R. ISLR: Data for an Introduction to Statistical Learning with Applications in R. 2017. [R package version 1.2]. CRAN. [cited 2020 Aug 7]. https://cran.r-project.org/package=ISLR
Comment
This is a simulated dataset created for An Introduction to Statistical Learning. Unlike the ISL Credit dataset, it isn't listed for download on the book's page, which is why I downloaded it through R. It does seem to be included in the gzipped tar file on the site, but in the rda format, a binary format that sounds something like Python's pickle format, but for R.