1.4.1. More Python

This section continues Section 1.3.1, Introduction to Python, with an emphasis on Python tools for cleaning data.

Our first readings, from Computational and Inferential Thinking [ADW21], are repeated from Section 1.3.2. They remind us how to define a function in Python and how to apply a function to each entry in a column.
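As a reminder of the pattern, here is a minimal sketch. The function name to_celsius and the list temps_f are made up for illustration, and a plain Python list stands in for a table column so that the example runs without any extra libraries. With a Table from the datascience library, the same function could be passed to tbl.apply along with a column label.

```python
def to_celsius(fahrenheit):
    """Convert a Fahrenheit temperature to Celsius."""
    return (fahrenheit - 32) * 5 / 9

# A plain list standing in for a column of a table (hypothetical data).
temps_f = [32, 212, 50]

# Apply the function to each entry, producing a new list.
temps_c = [to_celsius(t) for t in temps_f]
print(temps_c)  # [0.0, 100.0, 10.0]
```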

Our second readings describe the Python string type str and some of its methods. These methods are convenient for cleaning up common problems in string entries, such as stray spaces, unwanted symbols, or inconsistent capitalization.

  • 3.1.2. Strings from the Python tutorial

  • 4.7 Text Sequence Type — str documentation

    • 4.7.1. String Methods. In particular, look at:

      • str.lstrip, str.rstrip, and str.strip, which can remove unwanted characters such as extra spaces, '$', or '%' from the start or end of a string

      • str.lower, str.upper, and str.title, which can change the case of letters in a string

      • str.replace, which can replace characters in a string
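The methods listed above can be tried out with a short sketch like the following. The sample strings are invented for illustration and are deliberately different from the ones in the Reading Questions below.

```python
# Stripping: with no argument, strip() removes whitespace from both ends.
raw = '  85% '
trimmed = raw.strip()            # '85%'
digits = trimmed.rstrip('%')     # '85' (removes '%' from the right end only)

# Changing case: each method returns a new string; the original is unchanged.
city = 'nEW yORK'
lowered = city.lower()           # 'new york'
uppered = city.upper()           # 'NEW YORK'
titled = city.title()            # 'New York'

# Replacing: every occurrence of the first argument is replaced by the second.
phone = '555-867-5309'.replace('-', '')   # '5558675309'

print(digits, titled, phone)
```

Because each method returns a new string, calls can be chained, as in raw.strip().rstrip('%').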

Reading Questions

  • What combination of string methods can convert '$1,000,000' to '1000000'?

  • What combination of string methods can convert ' athens, OH' to 'Athens'?

Our next readings describe two built-in Python functions that convert a string that looks like a number into a number.

  • int converts a string into an integer, so int('123') returns 123.

  • float converts a string into a floating-point number, so float('123.4') returns 123.4.
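A minimal sketch of these two conversions follows. The list prices is hypothetical data, standing in for a column of numeric strings.

```python
# Converting individual strings.
as_int = int('123')          # 123, an int
as_float = float('123.4')    # 123.4, a float

# Converting every entry in a list of numeric strings (hypothetical data).
prices = ['1.5', '2', '3.25']
price_values = [float(s) for s in prices]

print(as_int, as_float, price_values)  # 123 123.4 [1.5, 2.0, 3.25]
```

Note that the converted values support arithmetic, while the original strings do not: as_int + 1 is 124, whereas '123' + 1 raises a TypeError.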

Reading Questions

  • What will int('123.4') return?

  • What will int(123.4) return?

  • What will float('2/3') return?

  • What will float(2/3) return?

Our last readings describe the Python set object. Converting a column in a Table to a set, as in set(tbl.column('label')), is a convenient way to see all distinct entries in a column.
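The sketch below shows why this is useful for cleaning. The list labels is hypothetical and stands in for the array returned by tbl.column('label'), so that the example runs without a Table.

```python
# A column with inconsistent entries (hypothetical data).
labels = ['OH', 'oh', 'OH', 'Ohio', 'OH ']

# A set keeps one copy of each distinct entry, which exposes the inconsistencies.
distinct = set(labels)
print(sorted(distinct))   # ['OH', 'OH ', 'Ohio', 'oh']

# After cleaning each entry with string methods, fewer distinct values remain.
cleaned = {s.strip().upper() for s in labels}
print(sorted(cleaned))    # ['OH', 'OHIO']
```

Sets are unordered, so sorted is used here only to make the printed output predictable.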

Reading Questions

  • What is the value of 'frog' in {'dog','frog'}?

  • What is the value of len(set(['a','b','c','a'])) == 4?

Further Resources

See the Further Resources in Section 1.3.1.