2.2. Single vs. Two or More Variables

Learning Outcome

Students will be able to select appropriate charting techniques based on the type of data and the number of variables they intend to present.

Sample Tasks

  • Discuss differences between numerical and categorical data.

    • What types of charts are appropriate for numerical data?

    • What types of charts are appropriate for categorical data?

  • Effects of outliers

    • Explain the difference between an explanatory variable and a response variable.

    • Generate a general hypothesis about the relationship between two variables.

    • Construct a scatter plot using a large data set containing 1000+ points.

    • Confirm that a trendline is appropriate for the data.

    • Build a linear model using the trendline.

    • Examine the outliers and decide if they should be included in the model.

    • If appropriate, remove the outliers, adjust model, plot a new trendline, and build a new model.

    • Compare the two plots, trendlines, and models.

    • Summarize the effects of the outliers on the response variable.

[OhioDoHEducation21]
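The outlier tasks above can be sketched end to end in a few lines. This is a minimal illustration, not a prescribed method: the data are synthetic, the true line (slope 2, intercept 1) and the 20 injected outliers are invented for the example, and the "more than 3 standard deviations from the fitted line" rule is just one common rule of thumb for flagging outliers.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): the response y depends
# linearly on the explanatory variable x, plus random noise.
n = 1000
x = rng.uniform(0, 10, n)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, n)

# Inject a few artificial outliers by shifting 20 responses upward.
outlier_idx = rng.choice(n, size=20, replace=False)
y[outlier_idx] += 25.0


def fit_line(x, y):
    """Least-squares trendline; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Model 1: trendline through all points, outliers included.
slope_all, intercept_all = fit_line(x, y)

# Flag points whose residual is more than 3 standard deviations from the
# fitted line (a rule of thumb, not the only reasonable choice).
residuals = y - (slope_all * x + intercept_all)
keep = np.abs(residuals) < 3 * residuals.std()

# Model 2: trendline refit without the flagged points.
slope_trim, intercept_trim = fit_line(x[keep], y[keep])

# Compare the two scatter plots and trendlines side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
line_x = np.array([x.min(), x.max()])

axes[0].scatter(x, y, s=5, alpha=0.4)
axes[0].plot(line_x, slope_all * line_x + intercept_all, color="red")
axes[0].set_title("All points")

axes[1].scatter(x[keep], y[keep], s=5, alpha=0.4)
axes[1].plot(line_x, slope_trim * line_x + intercept_trim, color="red")
axes[1].set_title("Flagged outliers removed")

for ax in axes:
    ax.set_xlabel("explanatory variable x")
axes[0].set_ylabel("response variable y")

print(f"all points:       y = {slope_all:.2f} x + {intercept_all:.2f}")
print(f"outliers removed: y = {slope_trim:.2f} x + {intercept_trim:.2f}")
print(f"points removed:   {n - keep.sum()}")
```

Call `plt.show()` (or run the cell in a notebook) to display the figure. Whether removing the flagged points is appropriate depends on why they are unusual; the code only shows how to compare the two models once that decision has been made.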

Our first set of readings, from Computational and Inferential Thinking [ADW21], shows how to use bar charts for categorical data and histograms for numerical data. Each of these charts is applied to a single column of data, and so illustrates the behavior of one variable.
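As a rough sketch of the two single-variable charts, the code below draws a bar chart for one categorical column and a histogram for one numerical column using Matplotlib. The `flavors` and `heights_cm` columns are hypothetical data generated for the example; the readings use their own data sets and plotting tools.

```python
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical single-column data sets, invented for illustration.
flavors = rng.choice(
    ["chocolate", "vanilla", "strawberry"], size=200, p=[0.5, 0.3, 0.2]
)
heights_cm = rng.normal(170, 8, size=200)

# Categorical data: count the rows in each category, one bar per category.
counts = Counter(flavors.tolist())
categories = sorted(counts, key=counts.get, reverse=True)

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

ax_bar.barh(categories, [counts[c] for c in categories])
ax_bar.invert_yaxis()  # largest category at the top
ax_bar.set_xlabel("count")
ax_bar.set_title("Bar chart: categorical variable")

# Numerical data: group the values into bins along a numerical axis.
bin_counts, bin_edges, _ = ax_hist.hist(heights_cm, bins=10, edgecolor="black")
ax_hist.set_xlabel("height (cm)")
ax_hist.set_ylabel("count")
ax_hist.set_title("Histogram: numerical variable")

fig.tight_layout()
```

Call `plt.show()` to display the figure. Note that the order of the bars in the bar chart is a choice (here, by descending count), while the order of the bins in the histogram is fixed by the numerical axis.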

Reading Questions

  • What is the area principle for visualizations?

  • What makes a histogram different from a bar chart?
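One way to explore these questions is to draw the same numerical data with unequal bin widths on two vertical scales. The sketch below uses hypothetical right-skewed data and hand-picked bin edges (both are assumptions for the example). On the count scale, wide bins look inflated; on the density scale, each bar's height is its proportion divided by its width, so the bar's area equals the proportion of values in the bin.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Hypothetical right-skewed numerical data, invented for illustration.
values = rng.exponential(scale=20, size=500)

# Unequal-width bins: narrow where the data are dense, wide in the tail.
edges = np.array([0, 5, 10, 20, 40, 80, 320])
widths = np.diff(edges)
counts, _ = np.histogram(values, bins=edges)

# Density scale: height = proportion / width, so area = proportion.
proportions = counts / counts.sum()
heights = proportions / widths

fig, (ax_count, ax_density) = plt.subplots(1, 2, figsize=(10, 4))

ax_count.bar(edges[:-1], counts, width=widths, align="edge", edgecolor="black")
ax_count.set_ylabel("count")
ax_count.set_title("Height = count")

ax_density.bar(edges[:-1], heights, width=widths, align="edge", edgecolor="black")
ax_density.set_ylabel("proportion per unit")
ax_density.set_title("Area = proportion")

for ax in (ax_count, ax_density):
    ax.set_xlabel("value")

fig.tight_layout()
print("total area on the density scale:", (heights * widths).sum())
```

Call `plt.show()` to display the figure and compare how the widest bin appears in each panel.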

Our second set of readings is from Learning Data Science [LGN23] and is repeated from Section 2.1.

Reading Questions

  • What is a rug plot?

  • What is the difference between a histogram and a density plot?

  • How can you illustrate the relationship between a categorical feature and a numerical feature?
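To experiment with the plot types named in these questions, the sketch below overlays a histogram, a density curve, and a rug plot for one numerical feature, then draws side-by-side box plots of that feature for each category. Everything here is an assumption for illustration: the three groups are synthetic, the density curve is a hand-rolled Gaussian kernel density estimate with a rule-of-thumb bandwidth, and box plots are only one of several ways to relate a categorical feature to a numerical one. Seaborn's `histplot`, `kdeplot`, `rugplot`, and `boxplot` offer the same charts with less code.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Hypothetical data: one numerical feature measured in three categories.
groups = {
    "A": rng.normal(50, 5, 150),
    "B": rng.normal(60, 8, 150),
    "C": rng.normal(55, 3, 150),
}
values = np.concatenate(list(groups.values()))


def gaussian_kde(samples, grid, bandwidth):
    """Average of Gaussian bumps centered at each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)


grid = np.linspace(values.min() - 10, values.max() + 10, 400)
# Rule-of-thumb bandwidth (an assumption; other choices give smoother
# or bumpier curves).
bandwidth = 1.06 * values.std() * len(values) ** (-1 / 5)
density = gaussian_kde(values, grid, bandwidth)

fig, (ax_dist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram (binned), density curve (smoothed), and rug (one tick per value).
ax_dist.hist(values, bins=20, density=True, alpha=0.4, label="histogram")
ax_dist.plot(grid, density, label="density curve")
ax_dist.plot(values, np.zeros_like(values), "|", color="black",
             markersize=10, label="rug")
ax_dist.set_xlabel("numerical feature")
ax_dist.set_ylabel("density")
ax_dist.legend()

# Categorical vs. numerical: one box plot per category.
ax_box.boxplot(list(groups.values()))
ax_box.set_xticks(range(1, len(groups) + 1))
ax_box.set_xticklabels(list(groups))
ax_box.set_xlabel("categorical feature")
ax_box.set_ylabel("numerical feature")

fig.tight_layout()
```

Call `plt.show()` to display the figure. Changing `bins` or `bandwidth` is a quick way to see how much each chart depends on its smoothing choice.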

Further Resources

See Section 2.1 for information on plotting with Matplotlib and Seaborn.