Introduction To Python Data Types


Every programming language is built on fundamental constituents that serve as the building blocks for more sophisticated tools. Learning how and when to use each is the first thing you should do when encountering a new programming language. In any language, there are often multiple ways of accomplishing the same goal, but that does not always mean each solution is equally efficient.

Understanding each data type allows you to take full advantage of the language’s design and program as efficiently and effectively as possible. The principal built-in data types used in Python include:

  1. Numerics
  2. Booleans
  3. Sequences
  4. Mappings
  5. Modules
  6. Classes
  7. Methods
  8. Functions
  9. Exceptions

There are a few others, but these are the most important and most frequently used.

This tutorial reviews the basics of how and when to use each, and (for those migrating from Python 2 to 3) will also point out some of the differences between Python 2 usage versus Python 3 usage.

Installing Python

To follow along with the exercises in this tutorial, you’ll need to have a recent version of Python installed. I’ll be using a free, pre-built distribution of Python 3.6 called ActivePython, which you can download here. Installation instructions can also be found here.

All set? Let’s go.

Numerics

Numeric types consist of integers, floating-point numbers (or floats), and complex numbers. Each is equivalent to its mathematical counterpart:

  1. Integers are whole numbers that can be negative, zero, or positive
  2. Floating-point numbers are real numbers represented in decimal form with a finite, predefined precision
  3. Complex numbers consist of a real and an imaginary component, both represented as floating-point numbers

The constructors int(), float(), and complex() are used to produce numbers of each type in Python:

x = int(2)
y = float(2)
z = complex(2,2)

print(x + x)
print(x + y)
print(x + z)

Output:

4
4.0
(4+2j)

Alternatively, Python will automatically define a number as an integer if you do not include a decimal point; as a float if you do; and as a complex number if you use the form a+bj, where j indicates the imaginary part. The output is the same:

x = 2
y = 2.0
z = 2 + 2j

print(x + x)
print(x + y)
print(x + z)

Output:

4
4.0
(4+2j)
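
To see which type Python infers from each literal form, you can ask for it with the built-in type() function. The following is a minimal sketch; the name type_names is only illustrative:

```python
# Check which numeric type Python infers from each literal form.
x = 2        # no decimal point -> int
y = 2.0      # decimal point -> float
z = 2 + 2j   # a+bj form -> complex

type_names = [type(value).__name__ for value in (x, y, z)]
print(type_names)

# Mixing types in an expression promotes the result to the wider type.
print(type(x + y).__name__)  # int + float -> float
print(type(y + z).__name__)  # float + complex -> complex
```

This is why x + y printed 4.0 rather than 4 in the example above: the integer was promoted to a float before the addition.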

Choosing whether to use an integer, a float, or a complex number is pretty straightforward. If your application needs fractional values, integers won’t cut it, so use floats. If your data has both real and imaginary components, it makes sense to use complex numbers. Addition is not the only operation that numeric types support. The following table lists the operations that can be performed with/on numeric types:


Each data type can undergo certain operations. Many operations overlap between types, but some are unique. A full list of operations for each data type can be found in the documentation.
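
A few of the arithmetic operations that numeric types support can be sketched as follows. This is a short, non-exhaustive illustration; the names a, b, and results are placeholders chosen for this example:

```python
# A handful of common operations on numeric types.
a = 7
b = 2

results = {
    'a + b': a + b,                # addition
    'a - b': a - b,                # subtraction
    'a * b': a * b,                # multiplication
    'a / b': a / b,                # true division, always returns a float
    'a // b': a // b,              # floor division
    'a % b': a % b,                # remainder
    'a ** b': a ** b,              # exponentiation
    'abs(-a)': abs(-a),            # absolute value
    'divmod(a, b)': divmod(a, b),  # (quotient, remainder) pair
}

for expression, value in results.items():
    print(expression, '=', value)
```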

Booleans

After numeric types, perhaps the most commonly encountered data type is the boolean. Boolean values are a special case of numeric types used to express True and False. In mathematical operations, they behave exactly like 1 and 0, but they can also be used in boolean operations (and, or, not) and are the result of comparisons (e.g. greater than, less than, equal, not equal).
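
The behaviour described above can be sketched in a few lines. The names passed, pass_count, and is_adult are hypothetical and used only for illustration:

```python
# Booleans behave like 1 and 0 in arithmetic.
print(True + True)
print(True * 10)

# Summing a list of booleans counts how many are True.
passed = [True, False, True]
pass_count = sum(passed)
print(pass_count)

# Comparisons produce booleans, which combine with and / or / not.
is_adult = 23 >= 18
print(is_adult)
print(is_adult and not False)

# bool is a subclass of int, which is why the arithmetic works.
print(isinstance(True, int))
```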

Sequences

A sequence describes a series of values. The values could be individual words, phrases, numbers, or even a series within a series. The four basic sequence types are:

  1. Lists
  2. Tuples
  3. Ranges
  4. Strings

Lists and tuples can hold values of any type, ranges hold only integers, and strings are limited exclusively to text. Sequences can be either immutable or mutable. A mutable sequence can be changed after it has been created, while an immutable sequence cannot.
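
The difference between mutable and immutable sequences can be sketched as follows. The variable names are illustrative, and the exact wording of the error messages may vary between Python versions:

```python
# Lists are mutable: items can be reassigned and appended in place.
numbers_list = [1, 2, 3]
numbers_list[0] = 10
numbers_list.append(4)
print(numbers_list)

# Tuples are immutable: item assignment raises a TypeError.
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10
    tuple_changed = True
except TypeError as error:
    tuple_changed = False
    print('tuple:', error)

# Strings are immutable as well.
word = 'text'
try:
    word[0] = 'n'
    string_changed = True
except TypeError as error:
    string_changed = False
    print('str:', error)
```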

Let’s take a closer look at each sequence type:

Lists: Lists are mutable sequences that typically hold homogeneous data. They are ideal for use cases where the values are all of the same category (e.g. ages of students in a class, number of pitches in an inning, or items to buy at the store), or when values need to be added or removed repeatedly. They can be created in several ways, but are always denoted with square brackets: [ ]. For example, we can create a shopping list:

shopping_list = ['rice', 'eggs', 'bacon', 'cauliflower', 'apples']
print(shopping_list)
shopping_list.sort()
print(shopping_list)

Output:

['rice', 'eggs', 'bacon', 'cauliflower', 'apples']
['apples', 'bacon', 'cauliflower', 'eggs', 'rice']

List operations allow for quick sorting, accessing individual values (known as indexing) or sub-sequences (known as slicing), and reassigning or deleting values:

print(shopping_list[3])
shopping_list[3] = 'egg whites'
print(shopping_list)
shopping_list.remove('apples')
print(shopping_list)

Output:

eggs
['apples', 'bacon', 'cauliflower', 'egg whites', 'rice']
['bacon', 'cauliflower', 'egg whites', 'rice']

Technically, the shopping list is a list of sequences, as strings are sequences themselves (more on this later).
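
Indexing, slicing, and in-place modification of a list can be sketched side by side. This example uses a separate, hypothetical list named groceries so that it does not disturb the shopping_list used in the rest of the tutorial:

```python
groceries = ['apples', 'bacon', 'cauliflower', 'eggs', 'rice']

# Indexing returns a single value; negative indices count from the end.
first_item = groceries[0]
last_item = groceries[-1]

# Slicing returns a new list holding a sub-sequence.
first_three = groceries[0:3]
every_other = groceries[::2]

print(first_item, last_item)
print(first_three)
print(every_other)

# Lists can grow and shrink in place.
groceries.append('milk')   # add to the end
del groceries[1]           # delete by position
print(groceries)
```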

Tuples: Tuples are nearly identical in concept to lists, except that they are immutable. Because of this, they are ideally suited for use cases that need to preserve the sequence throughout operations. This is typically the case when a sequence consists of heterogeneous values. Rather than square brackets, they are denoted with parentheses: ( ). An example with heterogeneous values:

person = ('John', 'Smith', 23, 'Jane', 'Smith', 26)
print(person[0:3])

Output:

('John', 'Smith', 23)

We can index or slice the tuple to view its values, but cannot delete or reassign them:

person[3] = 25

Output:

TypeError: 'tuple' object does not support item assignment

You can, however, define a new tuple using the original tuple:

john = person[0:3]
print(john)

Output:

('John', 'Smith', 23)
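
Because tuples often group heterogeneous values, a common idiom is to unpack them into named variables. A minimal sketch, reusing the person tuple from above; the names first_name, last_name, age, jane, and older_john are illustrative only:

```python
person = ('John', 'Smith', 23, 'Jane', 'Smith', 26)

# Unpack the first three values into separate variables.
first_name, last_name, age = person[0:3]
print(first_name, last_name, age)

# Slicing never modifies the original; it builds a new tuple.
jane = person[3:]
print(jane)

# To "change" a value, build a new tuple from pieces of the old one.
older_john = person[:2] + (age + 1,)
print(older_john)
```

Note the trailing comma in (age + 1,): it is what makes a one-element tuple rather than a parenthesized number.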

Ranges: The range type is used to represent an immutable sequence of numbers. Ranges are almost exclusively used in iterative operations like for loops to define the number of cycles or loops. The following example creates a range starting at 1, stopping before 11 (the stop value itself is excluded), and incrementing by steps of 1:

r = range(1, 11, 1)
print(list(r))

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

We can use the range object within a simple for loop:

for i in r:
    print('The for loop is on loop: ' + str(i))

Output:

The for loop is on loop: 1
The for loop is on loop: 2
The for loop is on loop: 3
The for loop is on loop: 4
The for loop is on loop: 5
The for loop is on loop: 6
The for loop is on loop: 7
The for loop is on loop: 8
The for loop is on loop: 9
The for loop is on loop: 10
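
Ranges also support the common sequence operations without ever building the full list in memory. A short sketch, with evens and countdown as illustrative names:

```python
# A range with a step of 2; the stop value (10) is excluded.
evens = range(0, 10, 2)
print(list(evens))

# Length, indexing, and membership tests work directly on the range.
print(len(evens))
print(evens[1])
print(6 in evens)

# A negative step counts downward.
countdown = range(5, 0, -1)
print(list(countdown))
```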

Strings: As I alluded to previously, strings are immutable sequences of Unicode code points. Their use cases are restricted to those requiring text information, which, regardless of what you use Python for, are bound to come up. Whether for parsing headers when importing data, or plotting text information on a graph, understanding how to use strings is essential to programming in Python. Returning to the list example, we can index the shopping list to obtain the fourth value, then index again (because the string itself is a sequence) to obtain the first letter:

print(shopping_list[3])
print(shopping_list[3][0])

Output:

rice
r

Strings support all of the common sequence operations (like slicing), as well as a few additional “methods.” Methods are another built-in Python data type that define functions on specific objects, like strings (more on this later). For example, using a method called capitalize will capitalize the first letter in the string:

print(shopping_list[3])

shopping_list[3] = shopping_list[3].capitalize()
print(shopping_list)

Output:

rice
['bacon', 'cauliflower', 'egg whites', 'Rice']
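
A few other string methods are handy for the header-parsing use case mentioned earlier. The following is a sketch only; the header text and variable names are made up for illustration:

```python
# A hypothetical comma-separated header line from a data file.
header = 'name,age,city'

# split() breaks the string into a list of column names.
columns = header.split(',')
print(columns)

# capitalize() returns a new string; the original is unchanged.
title_case = [column.capitalize() for column in columns]
print(title_case)

# join() glues a list of strings back together with a separator.
rejoined = ' | '.join(title_case)
print(rejoined)

# upper() and replace() also return new strings.
print(header.upper())
print(header.replace(',', ';'))
```

Since strings are immutable, each of these methods returns a new string and leaves header untouched.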

It is worth noting that there are additional sequence types tailored for processing binary data (bytes, bytearray, and memoryview), but they are less commonly used. For more information on these types, please see the documentation.

Mappings

There is only one standard mapping type in Python, known as a dictionary. Dictionaries are mutable collections of key-value pairs, where each key maps onto its own value. Keys must be hashable, which in practice means immutable values such as numbers, strings, or tuples, while values can be of any type, including lists, other dictionaries, or other mutable types. Dictionaries are ideally suited for storing sets of information for multiple objects, where each object has its own set of data. Consider an address book:

address_book = {'John': ['925 First Street', 'San Jose', 'California'],
                'Jane': ['501 Market Street', 'San Francisco', 'California'],
                'Mary': ['1911 Lincoln Avenue', 'Los Angeles', 'California']}
print(address_book['Jane'])

Output:

['501 Market Street', 'San Francisco', 'California']

This allows for quick access to the value (the address in this case) by simply knowing the key (name). Notice that I use a list for each value, and of course, the same operations are supported:

address_book['Jane'][1]

Output:

'San Francisco'

The values do not have to be the same type. I could have easily put in a single string, an integer, or anything else for Mary instead of using a list.
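
As a sketch of that flexibility, the dictionary below mixes value types and is then updated in place. The entry for Alex and the placeholder string for Mary are hypothetical, added only for illustration:

```python
address_book = {
    'John': ['925 First Street', 'San Jose', 'California'],
    'Mary': 'address unknown',  # a plain string instead of a list
}

# Assigning to an existing key replaces its value.
address_book['Mary'] = ['1911 Lincoln Avenue', 'Los Angeles', 'California']

# Assigning to a new key adds an entry (hypothetical person).
address_book['Alex'] = ['1 Example Road', 'Sacramento', 'California']

# del removes an entry by key.
del address_book['John']

# Membership tests check the keys; get() avoids a KeyError for missing keys.
print('Mary' in address_book)
print(address_book.get('John', 'no entry'))
print(sorted(address_book))
```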

Dictionaries also support methods. One of the more useful ones is the keys() method, which returns a view of all the keys within the dictionary.

address_book.keys()

Output:

dict_keys(['John', 'Jane', 'Mary'])
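
Alongside keys(), dictionaries provide values() and items() for looping over their contents. A minimal sketch using the same address book; names and cities are illustrative variable names:

```python
address_book = {
    'John': ['925 First Street', 'San Jose', 'California'],
    'Jane': ['501 Market Street', 'San Francisco', 'California'],
    'Mary': ['1911 Lincoln Avenue', 'Los Angeles', 'California'],
}

# keys() returns a view; wrap it in list() to get a plain list.
names = list(address_book.keys())
print(names)

# values() gives the stored values, here the address lists.
cities = [address[1] for address in address_book.values()]
print(cities)

# items() yields (key, value) pairs, convenient in a for loop.
for name, address in address_book.items():
    print(name + ' lives in ' + address[1])
```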

As you can see, dictionaries are a bit different from the sequence types. They are meant to reflect the many real-life databases that have information tied to a single key (e.g. driver’s license number, passport number, or student ID) and should be used in such cases.

Other Built-in Types

In addition to the data types already discussed, there are several that are slightly more advanced, but essential to know when programming in Python. Now that you’re familiar with the numeric, boolean, sequence, and mapping types, it’s time to explore a few more:

  1. Sets: are an unordered collection of objects. A set in Python is equivalent to the mathematical definition of a set. It behaves similarly to a list or tuple, except it’s unordered and cannot contain duplicates. There are two types of sets:
    set() which is mutable, and
    frozenset() which is immutable
  2. Modules: are files containing Python code (a folder of modules is called a package). They are an easy way to organize, share, and download code. The code within a module can define functions, classes and variables, but can also include executable statements. The code can be accessed through a simple import module statement.
  3. Classes: are user-defined prototypes for data objects, complete with their own attributes and methods. They offer a way of creating more sophisticated objects beyond the data types I have already mentioned. As your use case for Python grows in complexity, the built-in data types alone become increasingly awkward for modeling your data. Using classes provides an inherent structure to your code that is logical and readable to you and others. You can read more about them here.
  4. Functions: are self-contained blocks of code that perform a specific function, hence the name. A function typically takes some input, transforms it in some way, and returns a result, although both inputs and return values are optional. Functions can be user-defined, but Python has several built-in ones. In fact, we have already been using the print() function to display some of the data types already discussed. There is really no limit to what a function can do.
  5. Methods: As mentioned previously, a method is a function defined for a specific class. In the string example, the capitalize method is accessed with string.capitalize. Methods provide a shortcut for common operations performed on objects of the same class.
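
The types in the list above can be sketched together in one short script. Everything named here (basket, pantry, full_name, Person, birthday) is hypothetical and chosen for illustration; math is a module from Python's standard library:

```python
import math  # a module is brought in with an import statement

# Sets drop duplicates; frozenset is the immutable variant.
basket = {'apples', 'rice', 'apples'}
pantry = frozenset(['rice', 'eggs'])
print(len(basket))               # the duplicate 'apples' is stored once
print(basket & pantry)           # intersection: items in both
print(sorted(basket | pantry))   # union, sorted for a stable display


# A user-defined function: takes input, returns a result.
def full_name(first, last):
    return first + ' ' + last


# A user-defined class with attributes and one method.
class Person:
    def __init__(self, first, last, age):
        self.first = first
        self.last = last
        self.age = age

    def birthday(self):
        """Increase the age by one year and return the new age."""
        self.age += 1
        return self.age


john = Person('John', 'Smith', 23)
print(full_name(john.first, john.last))  # calling a function
print(john.birthday())                   # calling a method on an object
print(math.sqrt(16))                     # calling a function from a module
```

The only difference between full_name and birthday is where they live: birthday is defined inside the class, so it is called on a Person object and receives that object as self.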

Differences Between Python 2 & 3

The introduction thus far describes data types in Python 3 (3.7.4 to be specific). For those of you who are just now transitioning from Python 2 to 3, you’ll need to keep in mind that the usage for some built-in data types has changed. For example:

  1. The Print Function: Python 3 requires the use of parentheses. For example, the Python 2 syntax of print list becomes print(list) in Python 3.
  2. Division Operator: In Python 2, using the division operator on two integer values produces an integer output (i.e., the result is rounded down if the division produces a remainder). In Python 3, the same operation produces a float output. Before migrating, you may want to examine any division operators in your Python 2 code, and switch to the floor division operator (//) wherever an integer result is intended.
  3. String Type: Python 2 strings are ASCII characters by default, but the default behaviour in Python 3 is to treat all strings as Unicode.
  4. xrange() and range(): In Python 3, the xrange() function has been removed. However, range() in Python 3 produces its values lazily, just as xrange() did in Python 2, so you can use it to accomplish the same goals.
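
The Python 3 side of these differences can be checked with a short script. This is a sketch meant to be run under Python 3 only; the variable names are illustrative:

```python
# Division: / always returns a float, // performs floor division.
true_quotient = 7 / 2
floor_quotient = 7 // 2
print(true_quotient)
print(floor_quotient)

# range() is lazy: printing it shows the range object, not a list.
numbers = range(3)
print(numbers)
print(list(numbers))

# Strings are Unicode: length counts characters, not bytes.
text = 'caf\u00e9'  # 'cafe' with an accented final e
encoded = text.encode('utf-8')
print(len(text))
print(len(encoded))  # the accented character takes two bytes in UTF-8
```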

    Dante is a former physicist who left the laboratory to join the ranks of scientists and engineers turning to techniques within data science to solve today’s problems. He is currently pursuing a Master's degree in Data Science for Complex Economic Systems. He lives in Turin, Italy.

