ISL Default Dataset

Set Up

# python
from pathlib import Path
import os

# pypi
from dotenv import load_dotenv

import numpy
import pandas
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages

The Dataset

Load It

base = rpackages.importr("base")
utils = rpackages.importr("utils")

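# only install ISLR from CRAN if it isn't already there
if not rpackages.isinstalled("ISLR"):
    utils.chooseCRANmirror(ind=1)
    utils.install_packages("ISLR")
islr = rpackages.importr("ISLR")
# fetch the R data.frame, then copy each column into a pandas DataFrame
# (the factor columns come through as their integer codes)
default = rpackages.data(islr).fetch("Default")["Default"]
default = pandas.DataFrame.from_dict({key: numpy.asarray(default.rx2(key))
                                      for key in default.names})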

Summarize It

default.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   default  10000 non-null  int32  
 1   student  10000 non-null  int32  
 2   balance  10000 non-null  float64
 3   income   10000 non-null  float64
dtypes: float64(2), int32(2)
memory usage: 234.5 KB
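
The default and student columns show up as int32 because numpy.asarray on an R factor gives back the integer codes rather than the labels. Here's a minimal sketch of mapping them back, assuming the levels are "No" and "Yes" coded as 1 and 2 (R factor codes start at 1). The small frame, its values, and the LEVELS name are made up for illustration and aren't part of the original notebook.

```python
import pandas

# hypothetical stand-in for a few rows of the converted frame (values made up)
sample = pandas.DataFrame({
    "default": [1, 1, 2],
    "student": [1, 2, 1],
    "balance": [100.0, 200.0, 300.0],
})

# assumption: the factor levels are ("No", "Yes"), so 1 -> "No" and 2 -> "Yes"
LEVELS = {1: "No", 2: "Yes"}

# replace the integer codes with their labels and store them as categories
decoded = sample.assign(
    default=sample["default"].map(LEVELS).astype("category"),
    student=sample["student"].map(LEVELS).astype("category"),
)
print(decoded.dtypes)
```

Whether to decode before saving is a judgment call: the integer codes are compact, but the labels are easier to read back later without remembering the coding.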

Save It

I converted the rpy2 object to a pandas DataFrame mainly to make saving it easier. The rpy2 DataFrame does have a to_csvfile method, but I've never used it, so going through pandas seemed safer.

env_path = Path("~/.local/share/env").expanduser()
load_dotenv(env_path, override=True)
path = Path(os.environ["ISL_DEFAULT"]).expanduser()
default.to_csv(path, index=False)
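
A quick way to convince yourself that the save round-trips is to read the file straight back and compare. This is only a sketch: the temporary path stands in for the ISL_DEFAULT entry in the env file, and the tiny frame (with made-up values) stands in for the real data.

```python
import tempfile
from pathlib import Path

import pandas

# hypothetical stand-in for the converted frame (values made up)
frame = pandas.DataFrame({
    "default": [1, 2],
    "student": [2, 1],
    "balance": [100.5, 200.25],
    "income": [1000.0, 2000.0],
})

with tempfile.TemporaryDirectory() as directory:
    # stand-in for Path(os.environ["ISL_DEFAULT"]).expanduser()
    path = Path(directory) / "default.csv"

    # index=False keeps the RangeIndex out of the file
    frame.to_csv(path, index=False)
    reloaded = pandas.read_csv(path)

# raises an AssertionError if the columns, dtypes, or values differ
pandas.testing.assert_frame_equal(frame, reloaded)
same = frame.equals(reloaded)
```

Without index=False the reloaded frame would pick up an extra unnamed column holding the old index, which is the main thing this check would catch.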

Citation

  • James, G., Witten, D., Hastie, T., and Tibshirani, R. ISLR: Data for an Introduction to Statistical Learning with Applications in R. 2017. [R package version 1.2]. CRAN. [cited 2020 Aug 7]. https://cran.r-project.org/package=ISLR

Comment

This is a simulated dataset created for An Introduction to Statistical Learning. Unlike the ISL Credit data set, it isn't listed for download on the book's page, which is why I pulled it in through R. It does look like it's available in the gzipped tar file on the site, but in the rda format, a binary format that sounds sort of like pickle, but for R.