1.2. Collecting Data
Learning Outcome
Students will be able to acquire raw data from a variety of sources.
Sample Tasks:
Distinguish between different sources of data, such as relational databases, automated data collection, and online surveys.
Distinguish between structured data sources (sources that are searchable, such as relational databases) and unstructured data sources (sources that are not searchable, such as social media and text messages).
Collect data from open or public data sources such as data.gov, IPUMS, Kaggle, Quandl, The World Bank, US Census Bureau, NASA, Amazon Web Services or Google Cloud Platform.
Convert a file from its present format into a format that is ready for analysis.
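As an illustration of the last sample task, here is a minimal sketch of one possible format conversion: CSV text to JSON, using only the Python standard library. The sample data, the column names, and the helper names csv_text_to_records and records_to_json are made up for this example and do not come from the reading. A real data set will usually need more cleaning than this.

```python
import csv
import io
import json


def csv_text_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per row, keyed by the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def records_to_json(records):
    """Serialize the records as an indented JSON string."""
    return json.dumps(records, indent=2)


# Hypothetical sample data, invented for illustration only.
sample_csv = "city,population\nSpringfield,30000\nShelbyville,25000\n"

records = csv_text_to_records(sample_csv)

# The csv module reads every field as a string, so numeric columns
# have to be converted explicitly before analysis.
for record in records:
    record["population"] = int(record["population"])

json_text = records_to_json(records)
print(json_text)
```

The same pattern works for a file on disk by passing an opened file to csv.DictReader in place of the io.StringIO wrapper.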
Our first reading, from Learning Data Science [LGN23], gives some examples of different types of data available and the formats they might be in.
Reading Questions
What does CSV stand for?
Why doesn’t everyone use the same file format?
Rather than reading more text on different data formats, spend a few minutes (or a few hours) exploring what is available on the internet. Some places to look:
Reading Questions
What was the most common file format you found?
What was the strangest file format or type of data you found?
What was the most interesting data you found?