4.2. Supervised Learning
Learning Outcome
Students will be able to classify data using supervised machine learning techniques, search for and define a function that describes how measured variables are related to one another, and use predictive techniques such as linear regression.
Sample Tasks
Differentiate between supervised and unsupervised learning.
Identify, or give an example of, an unsupervised learning technique.
Identify, or give an example of, a supervised learning technique.
Classify data using k-nearest neighbors.
Classify discrete data using the Naive Bayes algorithm (a short sketch follows this list).
Use simple linear regression analysis to predict the value of a response variable based on a given explanatory variable.
Interpret the y-intercept and make inferences about the slope of a simple linear regression equation.
Evaluate the assumptions of regression analysis and describe how to respond when those assumptions are violated.
Interpret the correlation coefficient.
Describe the purpose of multiple linear regression.
Input variable information and data for multiple linear regression.
Describe and discern the data assumptions required for multiple linear regression.
Interpret scatterplots and probability plots concerning the data assumptions for multiple linear regression.
Write a prediction equation and make predictions based on a multiple linear regression model.
Use a command such as lm() in R to perform multiple linear regression (a Python sketch follows this list).
Use logistic regression to describe the relationship between an explanatory variable and a dichotomous response variable.
Compare and contrast logistic regression and ordinary least squares regression.
Fit a logistic model and use the model to estimate the odds from a single probability.
Describe the statistical model of logistic regression with a single explanatory variable.
Identify the estimates of the regression parameters and write the equation for a fitted model.
For a given logistic model, compute and interpret the threshold value.
Use a command such as glm() in R to perform logistic regression (a Python sketch follows this list).
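Several of the tasks above can be tried directly in Python. As a first sketch, the example below classifies discrete data with the Naive Bayes algorithm; it assumes scikit-learn is available, and the features, encodings, and labels are made up for illustration.

```python
# A minimal Naive Bayes sketch for discrete data (all values are made up).
# CategoricalNB expects each feature to be encoded as a non-negative integer.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical discrete features: weather (0 = sunny, 1 = rainy) and wind (0 = calm, 1 = windy)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1]])
# Hypothetical labels: 1 = played outside, 0 = stayed in
y = np.array([1, 1, 1, 0, 1, 0])

model = CategoricalNB()
model.fit(X, y)

# Predict the class and the estimated class probabilities for a new observation
new_obs = np.array([[1, 0]])          # rainy and calm
print(model.predict(new_obs))         # predicted label
print(model.predict_proba(new_obs))   # estimated probability of each class
```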
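For the multiple linear regression tasks, a rough Python analogue of R's lm() is the statsmodels formula interface. The sketch below uses an invented data frame (the column names and values are illustrative only) to fit a model, read off the prediction equation, and predict a new response.

```python
# A minimal multiple linear regression sketch, roughly analogous to
# lm(price ~ sqft + bedrooms) in R. Requires pandas and statsmodels;
# the data frame below is invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "price":    [200, 250, 180, 320, 290, 260, 310, 230],   # hypothetical response, in $1000s
    "sqft":     [1400, 1800, 1200, 2400, 2100, 1900, 2300, 1600],
    "bedrooms": [2, 3, 2, 4, 3, 3, 4, 2],
})

fit = smf.ols("price ~ sqft + bedrooms", data=df).fit()
print(fit.params)     # intercept and slopes: the coefficients of the prediction equation
print(fit.summary())  # coefficient table, R-squared, and basic diagnostics

# Predict the response for new values of the explanatory variables
new_homes = pd.DataFrame({"sqft": [1600], "bedrooms": [3]})
print(fit.predict(new_homes))
```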
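For the logistic regression tasks, R's glm() with a binomial family has a rough Python analogue in statsmodels as well. The sketch below, again on made-up data with a dichotomous response, fits a logistic model, converts a fitted probability to odds, and computes the value of the explanatory variable at which the fitted probability crosses 0.5.

```python
# A minimal logistic regression sketch, roughly analogous to
# glm(passed ~ hours, family = binomial) in R. The data are made up.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "hours":  [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0],
    "passed": [0,   0,   0,   0,   1,   0,   1,   1,   1,   1],   # dichotomous response
})

# Fit the model logit(p) = b0 + b1 * hours
fit = smf.logit("passed ~ hours", data=df).fit()
print(fit.params)  # estimated intercept b0 and slope b1

# Convert a fitted probability into odds: odds = p / (1 - p)
p = fit.predict(pd.DataFrame({"hours": [3.0]})).iloc[0]
print("probability:", p, "odds:", p / (1 - p))

# The explanatory-variable value where the fitted probability is 0.5 (one way
# to read "threshold"): b0 + b1 * hours = 0, so hours = -b0 / b1
b0, b1 = fit.params["Intercept"], fit.params["hours"]
print("hours at p = 0.5:", -b0 / b1)
```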
Our first readings, from Computational and Inferential Thinking [ADW21], explain the simplest statistical learning model: fitting a line to describe the relationship between two features. A short worked sketch follows the questions below.
Reading Questions
How do you calculate a correlation coefficient?
Why is it called least squares?
Thinking Question
How would an outlier affect the regression line?
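As a pointer toward those questions, the reading defines the correlation coefficient as the mean of the products of the two variables measured in standard units, and the least-squares line as the line that minimizes the squared prediction errors. A minimal NumPy sketch with made-up values:

```python
# A small sketch of the correlation coefficient and the least-squares line.
# The x and y values are made up.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def standard_units(a):
    """Convert an array to standard units (mean 0, SD 1)."""
    return (a - a.mean()) / a.std()

# r is the mean of the products of x and y in standard units
r = np.mean(standard_units(x) * standard_units(y))

# "Least squares" because this line minimizes the sum of squared residuals:
# slope = r * sd_y / sd_x, and the line passes through the point of means
slope = r * y.std() / x.std()
intercept = y.mean() - slope * x.mean()

print("r =", r)                    # agrees with np.corrcoef(x, y)[0, 1]
print("slope =", slope, "intercept =", intercept)
```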
Our second readings, also from Computational and Inferential Thinking [ADW21], explain the simplest machine learning algorithm for classification. A short sketch follows the questions below.
Reading Questions
What is the k in k-nearest neighbors?
What is the difference between an attribute and a feature?
How do you measure the accuracy of a classifier?
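A minimal k-nearest neighbors sketch, assuming scikit-learn and a made-up two-feature dataset, shows what the k is (the number of neighboring training points that vote on each prediction) and one common way to measure a classifier's accuracy on held-out data.

```python
# A minimal k-nearest neighbors sketch with a train/test accuracy check.
# Requires scikit-learn; the points and labels are randomly generated toy data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Two made-up clusters of points in the plane, labeled 0 and 1
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(3, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hold out part of the data so accuracy is measured on points the model has not seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# k (n_neighbors) is the number of nearest training points that vote on each prediction
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

predictions = knn.predict(X_test)
print("test accuracy:", accuracy_score(y_test, predictions))
```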
Further Resource
This reading, from Chapter 5 (Machine Learning) of the Python Data Science Handbook [Van16], goes deeper into linear regression.
Further Resource
These extra readings, from Learning Data Science [LGN23], use calculus and linear algebra concepts that readers of this book are not expected to know.