Logistic Regression
Main Idea: Find the parameters (coefficients) of a line that separates a data set into two classes.
General Approach (MLIA: p. 84)
- Collect Data: Any method.
- Prepare Data: Convert to numeric data if needed.
- Analyze: Any method.
- Train: Find the optimal coefficients to classify the data.
- Use: Given new data, classify it using the coefficients learned from the previously classified (training) data.
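The Train and Use steps above can be sketched roughly as follows. This is a minimal illustration, not the book's code: it assumes batch gradient ascent on the log-likelihood, and the function names (sigmoid, train, classify), the step size alpha, the epoch count, and the toy data are all made up for the example.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)

def train(features, labels, alpha=0.01, epochs=2000):
    """Batch gradient ascent; alpha and epochs are illustrative choices."""
    n = len(features[0])
    weights = [0.0] * (n + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(features, labels):
            row = [1.0] + list(x)  # constant 1.0 pairs with the intercept
            error = y - sigmoid(sum(w * v for w, v in zip(weights, row)))
            for j, v in enumerate(row):
                grad[j] += error * v
        weights = [w + alpha * g for w, g in zip(weights, grad)]
    return weights

def classify(weights, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    row = [1.0] + list(x)
    p = sigmoid(sum(w * v for w, v in zip(weights, row)))
    return 1 if p >= 0.5 else 0

# Made-up, linearly separable toy data: two numeric features per point.
features = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0),
            (4.0, 4.5), (5.0, 4.0), (4.5, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

weights = train(features, labels)
predictions = [classify(weights, x) for x in features]
```

The learned weights define the separating line: points where the weighted sum is positive fall in class 1, the rest in class 0.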
Pros, Cons, and Data Types
Pros:
- Computationally cheap
- Easy to implement
- Easy to interpret
Cons:
- Susceptible to overfitting
- Not always accurate
Data Types:
- Numeric values
- Nominal values
Sidebar on Nominal Values
Nominal values are data that can be compared for equality or assigned to a category, but that have no ordering and support no other numeric calculations.
- Dichotomous: Belongs to one of two groups
- Non-Dichotomous: Belongs to one of multiple groups
- Nominal Values are usually summarized using frequencies or percentages (and sometimes summarized by mode).
- Column (bar) charts and pie charts are the best forms of graphical representation.
- Nominal values are also called categorical or qualitative values.
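As a small illustration of the summaries mentioned above, the sketch below computes frequencies, percentages, and the mode for a made-up list of nominal values. The one_hot helper is an assumption added for the example: it shows one common way to convert nominal values to numeric data for the Prepare Data step, not a method taken from these notes.

```python
from collections import Counter

# Made-up nominal data: the values can be compared for equality only.
colors = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(colors)  # frequency of each category
total = len(colors)
percentages = {value: 100.0 * count / total for value, count in counts.items()}
mode = counts.most_common(1)[0][0]  # most frequent category

# Hypothetical helper: one 0/1 indicator per category, in sorted order.
categories = sorted(counts)

def one_hot(value):
    """Encode a nominal value as a list of 0/1 indicators."""
    return [1 if value == category else 0 for category in categories]

encoded = [one_hot(value) for value in colors]
```

Each encoded row contains exactly one 1, so the numeric form implies no ordering among the categories.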