2.2. Single vs. Two or More Variables
Learning Outcome
Students will be able to select appropriate charting techniques based on the type of data and the number of variables they intend to present.
Sample Tasks
Discuss differences between numerical and categorical data.
What types of charts are appropriate for numerical data?
What types of charts are appropriate for categorical data?
Effects of outliers
Explain the difference between an explanatory variable and a response variable.
Generate a general hypothesis about the relationship between two variables.
Construct a scatter plot using a large data set containing 1000+ points.
Confirm that a trendline is appropriate for the data.
Build a linear model using the trendline.
Examine the outliers and decide if they should be included in the model.
If appropriate, remove the outliers, adjust the model, plot a new trendline, and build a new model.
Compare the two plots, trendlines, and models.
Summarize the effects of the outliers on the response variable.
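The outlier tasks above can be sketched roughly as follows. This is a minimal illustration rather than a prescribed solution: the data set is synthetic, the 20 outliers are injected by hand, and the "more than 3 standard deviations from the first fit" rule is only one possible assumption about what counts as an outlier. NumPy's `polyfit` stands in for a spreadsheet trendline.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; remove in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 1,000 points near y = 2x + 1
x = rng.uniform(0, 10, size=1000)              # explanatory variable
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 1000)   # response variable

# Overwrite 20 points with artificial outliers: large x, unusually low y
outlier_idx = rng.choice(1000, size=20, replace=False)
x[outlier_idx] = rng.uniform(8, 10, size=20)
y[outlier_idx] = rng.uniform(-30, -20, size=20)

# Model 1: least-squares trendline through all points
slope_all, intercept_all = np.polyfit(x, y, deg=1)

# Flag outliers (assumed rule): residual beyond 3 standard deviations of the first fit
residuals = y - (slope_all * x + intercept_all)
keep = np.abs(residuals) < 3 * residuals.std()

# Model 2: trendline refitted on the retained points only
slope_clean, intercept_clean = np.polyfit(x[keep], y[keep], deg=1)

# Plot both scatter plots with their trendlines side by side for comparison
xs = np.linspace(0, 10, 100)
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].scatter(x, y, s=5, alpha=0.4)
axes[0].plot(xs, slope_all * xs + intercept_all, color="red")
axes[0].set_title("All points")
axes[1].scatter(x[keep], y[keep], s=5, alpha=0.4)
axes[1].plot(xs, slope_clean * xs + intercept_clean, color="red")
axes[1].set_title("Outliers removed")
for ax in axes:
    ax.set_xlabel("explanatory variable x")
axes[0].set_ylabel("response variable y")

print(f"all points:       y = {slope_all:.2f} x + {intercept_all:.2f}")
print(f"outliers removed: y = {slope_clean:.2f} x + {intercept_clean:.2f}")
```

Comparing the two printed models shows how a small number of extreme points can pull the slope of the trendline. Whether removing them is justified depends on the data and the question being asked, so the threshold here should be treated as a starting point for discussion. Call `fig.savefig(...)` to write the figure to a file.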
Our first set of readings, from Computational and Inferential Thinking [ADW21], shows how to use bar charts for categorical data and histograms for numerical data. Each of these charts is applied to a single column of data, so it illustrates the behavior of one variable.
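As a rough illustration of the two single-variable chart types, the sketch below draws a bar chart for a categorical column and a histogram for a numerical column. It uses Matplotlib, which may differ from the tools used in the readings, and the `flavors` and `heights` columns are hypothetical data generated only for this example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; remove in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical single columns of data (assumptions for illustration)
flavors = rng.choice(["chocolate", "vanilla", "strawberry"], size=200, p=[0.5, 0.3, 0.2])
heights = rng.normal(170, 10, size=200)  # numerical values, e.g. heights in cm

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Categorical data: one bar per category, bar height = number of rows in that category
labels, counts = np.unique(flavors, return_counts=True)
ax_bar.bar(labels.tolist(), counts)
ax_bar.set_ylabel("count")
ax_bar.set_title("Bar chart (categorical)")

# Numerical data: values are grouped into bins along a numerical axis.
# With density=True the bar areas (height x width) sum to 1.
bar_heights, bin_edges, _ = ax_hist.hist(heights, bins=12, density=True, edgecolor="white")
ax_hist.set_xlabel("height (cm)")
ax_hist.set_ylabel("density")
ax_hist.set_title("Histogram (numerical)")

total_area = np.sum(bar_heights * np.diff(bin_edges))
print(f"total histogram area: {total_area:.3f}")
```

In the bar chart the horizontal axis has no numerical scale and the categories could be reordered freely. In the histogram the horizontal axis is a number line, so bin order and bin width both carry meaning.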
Reading Questions
What is the area principle for visualizations?
What makes a histogram different from a bar chart?
Our second set of readings is from Learning Data Science [LGN23] and is repeated from Section 2.1.
The first of these also considers visualizing the behavior of one variable:
The second of these considers visualizing the relationship between two variables:
Reading Questions
What is a rug plot?
What is the difference between a histogram and a density plot?
How can you illustrate the relationship between a categorical feature and a numerical feature?
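To make the plot types named in these questions concrete, here is a small sketch using only NumPy and Matplotlib. The `scores` data and the three groups are hypothetical, and the hand-written `gaussian_kde_curve` helper is an illustrative stand-in, not the method used in the readings. Seaborn provides `rugplot` and `kdeplot` for the same purpose. Side-by-side box plots are shown as one common option for pairing a categorical feature with a numerical one.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; remove in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Hypothetical data: a numerical feature (score) split by a categorical feature (group)
groups = {
    "A": rng.normal(45, 8, size=100),
    "B": rng.normal(50, 8, size=100),
    "C": rng.normal(60, 8, size=100),
}
scores = np.concatenate(list(groups.values()))


def gaussian_kde_curve(samples, grid, bandwidth):
    """Illustrative helper: average of Gaussian bumps, one centered on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)


grid = np.linspace(0, 110, 551)
density = gaussian_kde_curve(scores, grid, bandwidth=3.0)  # bandwidth chosen by eye

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: histogram (binned), density curve (smoothed), rug (one tick per value)
ax_left.hist(scores, bins=15, density=True, alpha=0.4, edgecolor="white")
ax_left.plot(grid, density, color="black")
ax_left.plot(scores, np.zeros_like(scores), "|", color="black", markersize=10)
ax_left.set_xlabel("score")
ax_left.set_ylabel("density")
ax_left.set_title("Histogram, density curve, and rug")

# Right panel: one box plot of the numerical feature per category
ax_right.boxplot(list(groups.values()))
ax_right.set_xticklabels(list(groups.keys()))
ax_right.set_xlabel("group")
ax_right.set_ylabel("score")
ax_right.set_title("Numerical feature by category")

# Sanity check: the density curve should enclose an area close to 1
area = np.sum(density) * (grid[1] - grid[0])
print(f"area under density curve: {area:.3f}")
```

Changing `bins` or `bandwidth` changes how smooth each summary looks, while the rug marks stay fixed because they show the raw values directly.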
Further Resources
See Section 2.1 for information on plotting with Matplotlib and Seaborn.