1.4. Cleaning data
Learning Outcome
Students will be able to identify incorrect, incomplete, inaccurate, irrelevant, or missing data and then modify, replace, or delete that information as needed.
Sample Tasks:
Explain why a data set has missing values and how to account for them in an analysis.
Detect outliers using graphical methods such as boxplots.
Classify outliers as errors, missing values, or unusual values.
Clean data as necessary and eliminate variables deemed irrelevant.
Use a package such as tidyr in R to clean raw data sets.
Enhance raw data as necessary by converting time zones and currencies, calculating derived values, or converting between time units.
Address outliers using mathematical techniques such as log transformations or interpolation, and address missing values by deductive correction or deterministic imputation.
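As a minimal sketch of boxplot-based outlier detection, the function below flags values outside the common Tukey whisker fences, Q1 - 1.5*IQR and Q3 + 1.5*IQR. It is written in Python with only the standard library rather than in R; the function name `iqr_outliers` and the sensor readings are hypothetical, and the quartile convention (`method="inclusive"`) is one of several that plotting tools use, so fences may differ slightly from a given boxplot.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR], in original order.

    k=1.5 is the usual Tukey boxplot convention (an assumption, not a requirement).
    """
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical readings: -999.0 mimics a sentinel code, 48.0 a suspicious spike.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 48.0, 12.3, -999.0]
print(iqr_outliers(readings))  # [48.0, -999.0]
```

The fences only flag candidates. Classifying them still needs context: here -999.0 looks like a code for a missing value, while 48.0 could be a recording error or a genuine unusual value.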
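The enhancement conversions (time zones, currencies, time units) can be sketched with Python's standard library. This assumes timestamps were recorded as naive local strings at a fixed UTC-6 offset and that the exchange rate is supplied by the analyst; the helper names, the offset, and the 0.92 rate are all hypothetical. A fixed offset ignores daylight saving time, so real data should use a named zone from `zoneinfo` instead.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP

# Assumed fixed offset for illustration; does NOT handle daylight saving time.
CST = timezone(timedelta(hours=-6), "CST")

def to_utc(local_str, tz):
    """Parse a naive 'YYYY-MM-DD HH:MM' string recorded in tz and convert it to UTC."""
    naive = datetime.strptime(local_str, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)

def convert_currency(amount, rate):
    """Multiply amount by an exchange rate and round half-up to cents.

    Pass amount and rate as strings so Decimal avoids binary float error.
    """
    return (Decimal(amount) * Decimal(rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def seconds_to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600

print(to_utc("2024-01-15 18:30", CST).isoformat())  # 2024-01-16T00:30:00+00:00
print(convert_currency("19.99", "0.92"))             # 18.39
print(seconds_to_hours(5400))                        # 1.5
```

Converting everything to UTC and a single currency before analysis keeps later joins and aggregations consistent, at the cost of having to record which source zone and rate were used.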
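A rough sketch of the last task, again in standard-library Python with hypothetical helper names: linear interpolation fills interior gaps from their neighbours, median imputation is one simple deterministic (non-random) way to fill every gap, and `log1p` compresses a right-skewed, non-negative variable. Missing values are assumed to be encoded as `None`; an outlier judged to be an error could first be set to `None` and then filled the same way. Deductive correction depends on domain rules and is not shown.

```python
import math
import statistics

def interpolate_missing(series):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading or trailing gaps stay None because no value brackets them.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

def impute_median(series):
    """Replace every None with the median of the observed values.

    Raises statistics.StatisticsError if every value is missing.
    """
    med = statistics.median(v for v in series if v is not None)
    return [med if v is None else v for v in series]

def log_transform(series):
    """Apply log(1 + x); assumes values are non-negative and already imputed."""
    return [math.log1p(v) for v in series]

temps = [10.0, None, None, 16.0, None]
print(interpolate_missing(temps))  # [10.0, 12.0, 14.0, 16.0, None]
print(impute_median(temps))        # [10.0, 13.0, 13.0, 16.0, 13.0]
```

Interpolation respects the local trend but cannot fill edge gaps; median imputation fills everything but flattens variation, so the choice depends on how the values were generated.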