1.4. Cleaning data

Learning Outcome

Students will be able to identify incorrect, incomplete, inaccurate, irrelevant, or missing data and then modify, replace, or delete that information as needed.

Sample Tasks:

  • Explain why a data set has missing values and how to account for them.

  • Detect outliers using graphical methods such as boxplots.

  • Classify outliers as errors, missing values, or unusual values.

  • Clean data as necessary and eliminate variables deemed irrelevant.

  • Use a package such as tidyr in R to clean raw data sets.

  • Enhance raw data as necessary by converting time zones, converting currencies, calculating derived values, or converting time units.

  • Address outliers in a data set using mathematical techniques such as log transformations or interpolation, and address missing values using deductive correction or deterministic imputation.

[OhioDoHEducation21]
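
Two of the sample tasks above, detecting outliers with the boxplot rule and filling gaps by interpolation, can be illustrated with a minimal sketch. It is not part of the cited outcome: the function names, the `readings` values, and the conventional whisker factor of 1.5 are assumptions chosen for illustration, and the example uses only the Python standard library rather than tidyr. It also assumes the values form an ordered series, so that interpolating between neighbours is meaningful, and that the flagged value has already been classified as a recording error rather than a genuine unusual value.

```python
from statistics import quantiles


def iqr_outlier_flags(values, k=1.5):
    """Flag values that fall outside the boxplot whiskers.

    The whiskers are Q1 - k*IQR and Q3 + k*IQR. Missing entries (None)
    are ignored when computing the quartiles and are never flagged.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v is not None and not (low <= v <= high) for v in values]


def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Gaps at the very start or end of the series are left as None,
    because they have only one known neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical ordered readings: one missing value and one suspicious spike.
readings = [10.0, 12.0, None, 16.0, 11.0, 13.0, 250.0, 12.0]

flags = iqr_outlier_flags(readings)

# Treat flagged values as missing, then interpolate over every gap.
cleaned = interpolate_missing(
    [None if flagged else value for value, flagged in zip(readings, flags)]
)

print(flags)
print(cleaned)
```

With these readings only 250.0 is flagged, and the cleaned series is [10.0, 12.0, 14.0, 16.0, 11.0, 13.0, 12.5, 12.0]. Whether a flagged value should be replaced, transformed, or kept depends on how it was classified; the sketch shows only the replacement path.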