4.2. Supervised Learning

Learning Outcome

Students will be able to classify data using supervised machine learning techniques, search for and define a function that describes how different measured variables are related to one another, and utilize predictive techniques such as linear regression.

Sample Tasks

  • Differentiate between supervised and unsupervised learning.

  • Identify, or give an example of, an unsupervised learning technique.

  • Identify, or give an example of, a supervised learning technique.

  • Classify data using K-nearest neighbors.

  • Classify discrete data using the Naive Bayes algorithm.

  • Use simple linear regression analysis to predict the value of a response variable based on a given explanatory variable.

  • Interpret the y-intercept and make inferences about the slope of a simple linear regression equation.

  • Evaluate the assumptions of regression analysis and know what to do if the assumptions are violated.

  • Interpret the correlation coefficient.

  • Describe the purpose of multiple linear regression.

  • Input variable information and data for multiple linear regression.

  • Describe and discern the data assumptions required for multiple linear regression.

  • Interpret scatterplots and probability plots concerning the data assumptions for multiple linear regression.

  • Write a prediction equation and make predictions based on a multiple linear regression model.

  • Use a command such as lm() in R to perform multiple linear regression.

  • Use logistic regression to describe the relationship between an explanatory variable and a dichotomous response variable.

  • Compare and contrast logistic regression and ordinary least squares regression.

  • Fit a logistic model and use the model to estimate the odds from a single probability.

  • Describe the statistical model of logistic regression with a single explanatory variable.

  • Identify the estimates of the regression parameters and write the equation for a fitted model.

  • For a given logistic model, compute and interpret the threshold value.

  • Use a command such as glm() in R to perform logistic regression.

[OhioDoHEducation21]
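
Several of the logistic regression tasks above come down to a few short formulas. The following is a minimal Python sketch of them, not a fitted model. The coefficients are made-up values chosen for illustration, the function names are hypothetical, and the sketch assumes that "threshold value" means the value of the explanatory variable at which the predicted probability equals 0.5. In practice the coefficients would come from fitting software, such as glm() in R.

```python
import math


def logistic_probability(x, intercept, slope):
    """Probability of the 'success' outcome under a one-variable logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def odds_from_probability(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1.0 - p)


def threshold_value(intercept, slope):
    """Value of x where the predicted probability is exactly 0.5 (log-odds = 0)."""
    return -intercept / slope


# Hypothetical coefficients, for illustration only (not estimated from data).
intercept, slope = -4.0, 0.5

p = logistic_probability(10.0, intercept, slope)   # log-odds = -4 + 0.5 * 10 = 1
odds = odds_from_probability(p)                    # equals e^1, about 2.718
threshold = threshold_value(intercept, slope)      # 8.0

print(f"P(success | x = 10) = {p:.3f}")
print(f"odds = {odds:.3f}")
print(f"threshold x = {threshold}")
```

Because the log-odds are linear in x, each one-unit increase in x multiplies the odds by e raised to the slope. This is one way logistic regression differs from ordinary least squares, where the slope is an additive change in the response itself.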

Our first readings, from Computational and Inferential Thinking [ADW21], explain the simplest statistical learning model: fitting a line that shows the relationship between two features.

Reading Questions

  • How do you calculate a correlation coefficient?

  • Why is it called least squares?

Thinking Question

  • How would an outlier affect the regression line?
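
After working through the readings, the line-fitting idea can be checked with a small sketch like the one below. It is plain Python written for this illustration, with a made-up five-point data set and hypothetical function names. It is not the code used in the readings. It computes the correlation coefficient and the slope and intercept of the line that minimizes the sum of squared vertical errors.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)
slope, intercept = least_squares_line(xs, ys)

print(f"r = {r:.3f}")
print(f"predicted y = {intercept:.1f} + {slope:.1f} * x")
```

Changing one of the y values to something far from the others and rerunning the sketch is a quick way to explore the thinking question.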

Our second readings, also from Computational and Inferential Thinking [ADW21], explain the simplest machine learning algorithm for classification.

Reading Questions

  • What is the k in k-nearest neighbors?

  • What is the difference between an attribute and a feature?

  • How do you measure the accuracy of a classifier?
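
As a companion to the readings, here is a minimal sketch of a k-nearest neighbors classifier in plain Python. The tiny data set, the labels "A" and "B", and the function names are all invented for this illustration. The sketch assumes Euclidean distance and a simple majority vote, which is one common choice rather than the only one.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two points given as tuples of numbers."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train_points, train_labels, query, k):
    """Label the query point by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean_distance(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(predicted, actual):
    """Proportion of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up training and test sets, for illustration only.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]
test_points = [(1.2, 1.4), (6.8, 6.1), (2.5, 2.0)]
test_labels = ["A", "B", "A"]

predictions = [knn_classify(train_points, train_labels, q, k=3) for q in test_points]
print(predictions)
print(f"accuracy = {accuracy(predictions, test_labels):.2f}")
```

Accuracy is measured on test points that were not used as neighbors. Using an odd k with two classes avoids tied votes.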

Further Resource

This reading, from Chapter 5, Machine Learning, of the Python Data Science Handbook [Van16], goes deeper into linear regression.

Further Resource

These extra readings, from Learning Data Science [LGN23], use calculus and linear algebra concepts that readers of this book are not expected to know.