Python Basics

General information on using Python for Data Science and Machine Learning

Updated: 03 September 2023

Based on this Cognitive Class Course

Labs

Jupyter Notebooks with Examples on these can be found in the labs folder

The Labs are from this Cognitive Class Course and are under the MIT License

Types

Hello World

We can simply print out a string in Python as follows

print('Hello, World!')

Python Version

We can check our version as follows

import sys
print(sys.version)

The sys module is a built-in module that provides many system-specific parameters and functions

Comments

Comments start with the # character and run to the end of the line

# Python comments

Docstrings

Python also supports docstrings, which appear as the first statement in a function, class, or module. They are written as follows

def hello():
    '''
    This function will say hello
    It also takes no input arguments
    '''
    return 'Hello'

hello()

Also note that Python uses ' and " to mean the same thing
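Returning to docstrings, the short sketch below shows where the text ends up: Python stores it on the function object in the `__doc__` attribute, which is also what the built-in help function displays. The variable name `doc` is only illustrative.

```python
def hello():
    '''
    This function will say hello
    It also takes no input arguments
    '''
    return 'Hello'

# The docstring is stored on the function object itself
doc = hello.__doc__

# strip() removes the surrounding blank lines before printing
print(doc.strip())
```

In a notebook, help(hello) would show the same text in a formatted form.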

Types of Objects

Python is object oriented and dynamically typed. We can get the type of a value in Python with the type function

type(12) # int
type(2.14) # float
type("Hello") # str
type(True) # bool
type(False) # bool

We can get information about a type using the attributes of the sys module, for example

sys.float_info

Type Conversion

We can use the following to convert between types

float(2) # 2.0
int(1.1) # 1
int('1') # 1
str(1) # '1'
str(1.1) # '1.1'
int(True) # 1
int(False) # 0
float(True) # 1.0
bool(1) # True
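A few conversions behave in ways that are easy to get wrong, so here is a small sketch of the edge cases. The variable names are only illustrative.

```python
truncated = int(1.9)         # 1: the fractional part is dropped, not rounded
negative = int(-1.9)         # -1: truncation is toward zero
parsed = int(float('1.1'))   # 1: convert the string to a float first, then to an int

try:
    int('1.1')               # a string containing a decimal point is not a valid integer
except ValueError as error:
    message = str(error)

empty_is_false = bool('')    # False: empty strings are falsy
text_is_true = bool('0')     # True: any non-empty string is truthy, even '0'
```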

Expressions

Expressions in Python can include integers, floats, and strings, depending on the operation

We can do the following

1 + 2 # addition
1 - 2 # subtraction
1 / 2 # division
1 // 2 # integer division

Integer (floor) division rounds the result down to the nearest integer rather than to the closest one, so 1 // 2 is 0

Python also follows the standard order of operations (BODMAS)
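As a quick illustration of both points, the sketch below shows how floor division treats negative numbers and how the order of operations changes a result. The variable names are only illustrative.

```python
positive = 7 // 2       # 3
negative = -7 // 2      # -4: floor division rounds down, not toward zero
as_float = 7.0 // 2     # 3.0: the result is a float if either operand is a float
remainder = 7 % 2       # 1: the modulo operator gives what is left over

without_brackets = 2 + 3 * 4    # 14: multiplication happens before addition
with_brackets = (2 + 3) * 4     # 20: brackets are evaluated first
power_first = 2 * 3 ** 2        # 18: exponents bind tighter than multiplication
```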

Variables

Variables can simply be assigned without being declared first, and are dynamically typed

x = 2
y = x / 2

x = 2 + 4
x = 'Hello'

In a notebook we can simply evaluate the value of a variable or expression by placing it as the last line of a cell

Strings

Defining Strings

Strings can be defined with either ' or ", and can be a combination of any characters

'Hello World'
'H3110 Wor!$'
"Hello World"

Indexing

Strings are simply an ordered sequence of characters. We can index them like any other sequence with [] as follows

name = 'John'
name[0] # J
name[3] # n

We can also index negatively as follows

name = 'John'
name[-1] # n
name[-4] # J

Length

We can get the length of a string with len()

len(name) # 4

Slicing

We can slice strings as follows

name = 'John Smith'
name[0:4] # John
name[5:7] # Sm

Or generally as

string[start:end]

Stride

We can also input the stride, which will select every nth value within a certain range

string[::stride]
string[start:stop:stride]

For example

name[::3] # Jnmh
name[0:4:2] # Jh

Concatenation

We can concatenate strings as follows

text = 'Hello'
text + text # HelloHello
text * 3 # HelloHelloHello

Escape Characters

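At times we may need to escape some characters in a Python string. The escape sequences are as follows

| Character | Escape |
| --- | --- |
| Newline (line continuation) | \<NEW LINE> |
| Backslash | \\ |
| Single quote | \' |
| Double quote | \" |
| ASCII Bell | \a |
| ASCII Backspace | \b |
| ASCII FF | \f |
| ASCII LF | \n |
| ASCII CR | \r |
| ASCII Tab | \t |
| ASCII VT | \v |
| Octal Character | \ooo |
| Hex Character | \xhh |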

We can also do multi line strings with the """ or '''
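As a small sketch of the triple-quote form, the example below builds the same text once as a multi-line string and once with an explicit \n escape. The variable names are only illustrative.

```python
# A triple-quoted string keeps the line break exactly as typed
poem = """Roses are red,
Violets are blue"""

# The same text written on one line with an escaped newline
same_poem = 'Roses are red,\nViolets are blue'

# Splitting on the newline character gives one entry per line
lines = poem.split('\n')
```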

If we have a string that would otherwise need escaping, we can use a raw string literal, written with an r prefix, as follows

text = r'\%\n\n\t'
print(text) # \%\n\n\t

String Operations

We have a variety of string operations such as

text = 'Hello'
text.upper() # HELLO
text.lower() # hello
text.replace('He', 'Je') # Jello
text.find('l') # 2
text.find('ell') # 1
text.find('asnfoan') # -1

Tuples

Define

A tuple is a way for us to store an ordered group of values, which can be of different types. This can be done simply as follows

my_tuple = ('Hello', 3, 0.14)
type(my_tuple) # tuple

A key thing about tuples is that they are immutable. We can reassign the entire tuple, but not change its values
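A minimal sketch of what immutability means in practice: assigning to an index raises a TypeError, while rebinding the variable to a new tuple is allowed. The name `point` is only illustrative.

```python
point = (1, 2)

try:
    point[0] = 10            # tuples do not support item assignment
except TypeError as error:
    message = str(error)

# Rebinding the name to a brand new tuple is fine
point = (10, 2)
```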

Indexing

We can index a tuple the same way as a string or list using positive or negative indexing

my_tuple[1] # 3
my_tuple[-2] # 3

Concatenation

We can also concatenate tuples

my_tuple += ('pies', 'are', 3.14)
my_tuple # ('Hello', 3, 0.14, 'pies', 'are', 3.14)

Slice and Stride

We can slice and stride as usual with

my_tuple[start:end]
my_tuple[::2]
my_tuple[0:4:2]

Sorting

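We can sort a tuple with the sorted function, provided its elements can be compared with one another

numbers = (3, 1, 2)
sorted(numbers) # [1, 2, 3]

The sorted function will return a new list and leave the tuple unchanged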

Nesting

Since tuples can hold anything, they can also hold tuples

my_tuple = ('hello', 4)
my_tuple2 = (my_tuple, 'bye')

We can access elements of tuples with double indexing as follows

my_tuple2[0][1] # 4

Lists

Defining

A list is an easy way for us to store data of any form, such as numbers, strings, tuples, and lists

Lists are mutable and have many operations that enable us to work with them more easily

my_list = [1, 2, 3, 'Hello']

Indexing

Lists can also be indexed using the usual method both negatively and positively

my_list[1] # 2
my_list[-1] # Hello

Operations

Slice and Stride

my_list[start:end] # slicing
my_list[::stride]
my_list[start:end:stride]

Extend

Extend will add each item of its input to the end of the list

my_list = [1, 2]
my_list.extend([3, 4])
my_list # [1, 2, 3, 4]

Append

Append will add the input as a single object to the end of the list

my_list = [1, 2]
my_list.append([3, 4])
my_list # [1, 2, [3, 4]]

Modify an element

List elements can be modified by referencing the index

my_list = [1, 2]
my_list[1] = 3
my_list # [1, 3]

Delete an Element

We can delete an element by its index with the del statement

my_list = [1, 2, 3]
del my_list[1]
my_list # [1, 3]

String Splitting

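We can split a string into a list as follows. With no argument, split separates on whitespace, while list breaks a string into its characters

my_list = 'hello world'.split()
my_list # ['hello', 'world']

my_list = list('hello')
my_list # ['h', 'e', 'l', 'l', 'o']

my_list = 'hello, world, !'.split(', ')
my_list # ['hello', 'world', '!']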

Cloning

Lists are stored by reference in Python, so assigning a list to a new variable does not copy it. If we want to clone a list we can make a shallow copy with a full slice as follows

new_list = my_list[:]
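The difference between a second reference and a clone is easiest to see by changing the original afterwards. This is a small sketch; the names `alias` and `clone` are only illustrative.

```python
my_list = [1, 2, 3]

alias = my_list       # a second name for the same list object
clone = my_list[:]    # a new list containing the same elements

my_list[0] = 99

alias_first = alias[0]    # 99: the alias sees the change
clone_first = clone[0]    # 1: the clone is unaffected
```

Note that the copy is shallow: if the list contains other lists, the clone still shares those inner lists with the original.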

Sets

A set is an unordered collection of unique objects in Python. Sets will automatically remove duplicate items

Defining a Set

my_set = {1, 2, 3, 1, 2}
my_set # {1, 2, 3}

Set Operations

Set from a List

We can create a set from a list with the set function

my_set = set(my_list)

Add Element

We can add elements to a set with

my_set.add("New Element")

If the element already exists nothing will happen

Remove Element

We can remove an element from a set with remove, which raises a KeyError if the element is not in the set

my_set.remove("New Element")

Check if Element is in Set

We can check if an element is in a set by using the in operator, which will return a bool

"New Element" in my_set # False

Set Logic

When using sets we can compare them with one another

Intersection

We can find the intersection between sets with & or with the intersection function

set_1 & set_2
set_1.intersection(set_2)

Difference

We can find the difference in a specific set relative to another set with

set_1.difference(set_2)

Which will give us the elements that set_1 has that set_2 does not

Union

We can get the union of two sets with

set_1.union(set_2)

Superset

We can check if one set is a superset of another with

set_1.issuperset(set_2)

Subset

We can check if one set is a subset of another with

set_1.issubset(set_2)
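To make the set operations above concrete, here is a short worked sketch with two small sets. The values in `set_1` and `set_2` are made up for illustration.

```python
set_1 = {1, 2, 3, 4}
set_2 = {3, 4, 5}

common = set_1 & set_2                  # {3, 4}
only_in_1 = set_1.difference(set_2)     # {1, 2}
only_in_2 = set_2.difference(set_1)     # {5}: difference depends on the order
everything = set_1.union(set_2)         # {1, 2, 3, 4, 5}

contains_common = set_1.issuperset(common)   # True: set_1 holds every common element
common_is_part = common.issubset(set_2)      # True
not_a_subset = set_1.issubset(set_2)         # False: 1 and 2 are missing from set_2
```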

Dictionaries

Dictionaries are like lists, but store data by a key instead of an index

Keys can be strings, numbers, or any immutable object such as a tuple

Defining

We can define a dictionary as a set of key-value pairs

my_dictionary = {"key1": 1, "key2": "2", "key3": [3, 3, 3], "key4": (4, 4, 4), 'key5': 5, (0, 1): 6, 92: 'hello'}

Accessing a Value

We can access a value by using its key, such as

my_dictionary['key1'] # 1
my_dictionary[(0, 1)] # 6
my_dictionary[92] # 'hello'

Get All Keys

We can get all the keys in a dictionary as follows

my_dictionary.keys()

Append a Key

Key-value pairs can be added to a dictionary as follows

my_dictionary['New Key'] = 'new value'

Delete an Entry

We can delete an entry by key using

del my_dictionary['New Key']

Verify that Key is in Dictionary

We can use the in operator to check if a key exists in a dictionary

'My Key' in my_dictionary
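The dictionary operations above can be put together in one short sketch. The `prices` dictionary and its contents are made up for illustration; the get and items methods are standard dictionary methods that the text above does not cover.

```python
prices = {'apple': 1.5, 'pear': 2.0}

prices['plum'] = 3.0          # add a new key-value pair
del prices['pear']            # delete an entry by key

has_pear = 'pear' in prices           # False, since we just deleted it
keys = list(prices.keys())            # ['apple', 'plum'], in insertion order

# get returns a fallback value instead of raising a KeyError for a missing key
pear_price = prices.get('pear', 0.0)  # 0.0

# items lets us loop over keys and values together
total = 0.0
for name, price in prices.items():
    total += price
```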

Conditions and Branching

Comparison Operators

We have a few different comparison operators which will produce a boolean based on their condition

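| Operation | Operator | Example (i = 1) |
| --- | --- | --- |
| equal | == | i == 1 |
| not equal | != | i != 0 |
| greater than | > | i > 0 |
| less than | < | i < 2 |
| greater than or equal | >= | i >= 0 and i >= 1 |
| less than or equal | <= | i <= 2 and i <= 1 |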

Logical Operators

Python has the following logical operators

| Operation | Operator | Example (i = 1) |
| --- | --- | --- |
| and | and | i == 1 and i < 2 |
| or | or | i == 1 or i == 2 |
| not | not | not(i != 0) |

String Comparison

When checking for equality Python will check if the strings are the same

'hello' != 'bye' # True

Comparing strings is based on the character codes (Unicode code points, which match ASCII for basic Latin characters), for example 'B' > 'A' because the code for B is 66 and the code for A is 65

When comparing strings like this, the comparison is done character by character from the start of the string, and the first differing character decides the result
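The built-in ord function returns the code of a single character, which makes the comparison rules easy to check. The sketch below uses illustrative variable names.

```python
code_a = ord('A')    # 65
code_b = ord('B')    # 66

b_after_a = 'B' > 'A'                     # True, because 66 > 65
first_difference = 'apple' < 'apricot'    # True: 'ap' matches, then 'p' < 'r'
case_matters = 'Zebra' < 'apple'          # True: uppercase letters have lower codes
prefix_is_smaller = 'app' < 'apple'       # True: a prefix sorts before the longer string
```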

Branching

Branching allows us to run different statements depending on a condition

If

The if statement will only run the code that forms part of its block if the condition is true

i = 0
if i == 0:
    print('Hello')

If-Else

An if-else can be done as follows

i = 0
if i == 1:
    print('Hello')
else:
    print('Bye')

Elif

If we want to have multiple if conditions, but only have the first one that is true be executed we can do

i = 0
if i == 1:
    print('Hello')
elif i == 0:
    print('Hello World')
elif i > 1:
    print('Hello World!!')
else:
    print('Bye')

Loops

For Loops

A for loop in Python iterates through a list, or any other iterable, and executes its internal code block for each item

loop_vals = [1, 6, 2, 9]
for i in loop_vals:
    print(i)
# 1 6 2 9

Range

If we want to iterate through the values without using a predefined list, we can use the range function to generate a sequence of values for us to iterate through

The range function works as follows

range(stop)
range(start, stop, step)
# yields start, start + step, start + 2*step, ... and stops before reaching stop

The range function only requires the stop value, the other two are optional. The stop value is not inclusive. In Python 3, range returns a lazy range object, so we wrap it in list to see the values

list(range(5)) # [0, 1, 2, 3, 4]
list(range(5, 10)) # [5, 6, 7, 8, 9]
list(range(5, 10, 2)) # [5, 7, 9]

Using this we can iterate through the values of our list by index as follows

loop_vals = [1, 6, 2, 9]
for i in range(len(loop_vals)):
    print(loop_vals[i])
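When both the index and the value are needed, the built-in enumerate function is a common alternative to range(len(...)). A minimal sketch, where the `pairs` list is only there to collect the results:

```python
loop_vals = [1, 6, 2, 9]

pairs = []
# enumerate yields (index, value) tuples, starting from index 0
for index, value in enumerate(loop_vals):
    pairs.append((index, value))

# pairs is now [(0, 1), (1, 6), (2, 2), (3, 9)]
```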

While Loops

While loops will continue for as long as their condition is true, and stop once it becomes false

i = 0
while i < 10:
    print(i)
    i += 1
# 0 1 2 3 4 5 6 7 8 9

Functions

Defining

Functions in Python are defined and called as follows

def hello():
    print('Hello')

hello() # Hello

We can have arguments in our function

def my_print(arg1, arg2):
    print(arg1, arg2)

my_print('Hello', 'World') # Hello World

Functions can also return values

def my_sum(val1, val2):
    answer = val1 + val2
    return answer

my_sum(1, 2) # 3

A function can also have a variable number of arguments such as

def sum_all(*vals):
    return sum(vals)

sum_all(1, 2, 3) # 6

The vals object will be taken in as a tuple

Function input arguments can also have default values as follows

def has_default(arg1=4):
    print(arg1)

has_default() # 4
has_default(5) # 5

Or with multiple arguments

def has_defaults(arg1, arg2=4):
    print(arg1, arg2)

has_defaults(5) # 5 4
has_defaults(5, 6) # 5 6

Help

We can get help about a function by calling the help function

help(print)

This will print the documentation for the print function

Scope

Functions have access to variables that are globally defined, as well as their own local scope. Locally defined variables are not accessible from outside the function unless we declare them as global as follows

def make_global():
    global global_var
    global_var = 5

make_global()
global_var # 5

Note that global_var will not be defined until our function has been called at least once
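The sketch below contrasts the three cases: reading a global, shadowing it with a local variable, and modifying it with the global keyword. All function and variable names here are made up for illustration.

```python
counter = 0

def read_only():
    # Reading a global variable needs no declaration
    return counter + 1

def shadow():
    # Assigning without `global` creates a new local variable
    counter = 100
    return counter

def increment():
    # `global` makes the assignment apply to the module-level variable
    global counter
    counter += 1
    return counter

before = read_only()      # 1
local_value = shadow()    # 100
after_shadow = counter    # 0: the global was not touched by shadow()
incremented = increment() # 1: the global itself has now changed
```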

Objects and Classes

Defining a Class

We can define a class Circle which has a constructor, a radius and a colour, as well as a function to increase its radius and one to plot the Circle

We make use of matplotlib to plot our circle here. The %matplotlib inline line is a Jupyter magic command and only works inside a notebook

import matplotlib.pyplot as plt
%matplotlib inline

class Circle(object):

    def __init__(self, radius=3, color='blue'):
        self.radius = radius
        self.color = color

    def add_radius(self, r):
        self.radius += r
        return self.radius

    def draw_circle(self):
        plt.gca().add_patch(plt.Circle((0, 0), radius=self.radius, fc=self.color))
        plt.axis('scaled')
        plt.show()

Instantiating an Object

We can create a new Circle object by using the class's constructor

red_circle = Circle(10, 'red')

Interacting with our Object

We can use the dir function to get a list of all the attributes and methods on an object, many of which are defined by Python already

dir(red_circle)

We can get our object’s property values by simply referring to them

red_circle.color # red
red_circle.radius # 10

We can also manually change the object’s properties with

red_circle.color = 'pink'

We can call our object’s functions the same way

red_circle.add_radius(10) # 20
red_circle.radius # 20

The red_circle can be plotted by calling the draw_circle function

Reading Files

Note that the preferred method for reading files is using with

Open

We can use the built-in open function to read a file which will provide us with a File object

example1 = '/data/test.txt'
file1 = open(example1, 'r')

The 'r' argument opens the file in read mode. We can use 'w' for write mode and 'a' for append mode

Properties

File objects have some properties such as

file1.name
file1.mode

Read

We can read the file contents to a string with the following

file_content = file1.read()

Close

Lastly we need to close our File object with

file1.close()

We can verify that the file is closed with

file1.closed # True

With

A better way to read files is by using the with statement, which will automatically close the file, even if we encounter an exception

with open(example1) as file1:
    file_content = file1.read()
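To check the claim about exceptions, the sketch below raises an error inside a with block and then inspects the file object. It writes its own temporary file so that it can run anywhere; the path and the file contents are made up for illustration.

```python
import os
import tempfile

# Create a throwaway file to read from (hypothetical path, for illustration only)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as out_file:
    out_file.write('first line\nsecond line\n')

try:
    with open(path, 'r') as file1:
        file_content = file1.read()
        # Simulate a failure while the file is still open
        raise RuntimeError('something went wrong while processing')
except RuntimeError:
    pass

# The with statement closed the file on the way out, despite the error
closed_after_error = file1.closed   # True
```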

We can also read the file in by pieces either based on characters or on lines

Read File by Characters

We can read the first four characters with

with open(example1, 'r') as file1:
    content = file1.read(4)

Note that the file object keeps track of its position and does not start over each time we call read(), so we can read the first seven characters like so

with open(example1, 'r') as file1:
    content = file1.read(4)
    content += file1.read(3)

Read File by Lines

A File object can be iterated over much like a list, with each line as a new element

We can read a single line from our file as follows

with open(example1, 'r') as file1:
    content = file1.readline()

We can read each line of our file into a list by iterating over the File object like so

content = []
with open(example1, 'r') as file1:
    for line in file1:
        content.append(line)

Or with the readlines function like so

with open(example1, 'r') as file1:
    content = file1.readlines()

Writing Files

We can also make use of open to write content to a file as follows

out_path = 'data/output.txt'
with open(out_path, 'w') as out_file:
    out_file.write('content')

The write function writes exactly the string we give it and does not add a newline, so if we want to write multiple lines to our file we need to add the newline character ourselves as follows

content = ['Line 1 content', 'Line 2 content', 'Line 3 content']
with open(out_path, 'w') as out_file:
    for line in content:
        out_file.write(line + '\n')
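An alternative to the loop is the writelines method, which writes every string from an iterable but, like write, does not add newlines for us. The sketch below writes to a temporary file and reads the result back; the path is made up for illustration.

```python
import os
import tempfile

# Hypothetical output location, used only so the example can run anywhere
out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')
content = ['Line 1 content', 'Line 2 content', 'Line 3 content']

with open(out_path, 'w') as out_file:
    # writelines does not append newlines, so we add them ourselves
    out_file.writelines(line + '\n' for line in content)

with open(out_path, 'r') as in_file:
    # splitlines drops the trailing newline characters again
    read_back = in_file.read().splitlines()
```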

Copy a File

We can copy data from one file to another by simultaneously reading and writing between the files

with open('readfile.txt', 'r') as readfile:
    with open('newfile.txt', 'w') as writefile:
        for line in readfile:
            writefile.write(line)

Pandas

Pandas is a library that is useful for working with data as a DataFrame in Python

Importing Pandas

The Pandas library will need to be installed and then imported into our notebook as

import pandas as pd

Creating a DataFrame

We can create a new DataFrame in Pandas as follows

df = pd.DataFrame({'Name': ['John', 'Jack', 'Smith', 'Jenny', 'Maria'],
                   'Age': [23, 12, 34, 13, 42],
                   'Height': [1.2, 2.3, 1.1, 1.6, 0.5]})

Read CSV as DataFrame

We can read a csv as a DataFrame with Pandas by doing the following

csv_path = 'data.csv'
df = pd.read_csv(csv_path)

Read XLSX as DataFrame

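We need to install an additional dependency first, and then read the file with the pd.read_excel function. For .xlsx files pandas uses the openpyxl package, since recent versions of xlrd only read the older .xls format

!pip install openpyxl
xlsx_path = 'data.xlsx'
df = pd.read_excel(xlsx_path)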

View DataFrame

We can view the first few lines of our DataFrame as follows

df.head()

Assume our data looks like the following

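|   | Name | Age | Height |
| --- | --- | --- | --- |
| 0 | John | 23 | 1.2 |
| 1 | Jack | 12 | 2.3 |
| 2 | Smith | 34 | 1.1 |
| 3 | Jenny | 13 | 1.6 |
| 4 | Maria | 42 | 0.5 |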

Working with DataFrame

Assigning Columns

We can read the data from a specific column as follows

ages = df[['Age']]

|   | Age |
| --- | --- |
| 0 | 23 |
| 1 | 12 |
| 2 | 34 |
| 3 | 13 |
| 4 | 42 |

We can also assign multiple columns

age_vs_height = df[['Age', 'Height']]

|   | Age | Height |
| --- | --- | --- |
| 0 | 23 | 1.2 |
| 1 | 12 | 2.3 |
| 2 | 34 | 1.1 |
| 3 | 13 | 1.6 |
| 4 | 42 | 0.5 |

Reading Cells

We can read a specific cell in one of two ways. The iloc indexer allows us to access a cell with the row and column position, and the loc indexer lets us do this with the row label and column name

df.iloc[1, 2] # 2.3
df.loc[1, 'Height'] # 2.3

Slicing

We can also do slicing using loc and iloc as follows

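df.iloc[1:3, 0:2]

|   | Name | Age |
| --- | --- | --- |
| 1 | Jack | 12 |
| 2 | Smith | 34 |

df.loc[0:2, 'Age':'Height']

|   | Age | Height |
| --- | --- | --- |
| 0 | 23 | 1.2 |
| 1 | 12 | 2.3 |
| 2 | 34 | 1.1 |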

Saving Data to CSV

Using Pandas, we can save our DataFrame to a CSV with

df.to_csv('my_dataframe.csv')

Arrays

The NumPy library allows us to work with arrays in the same way as we would mathematically. In order to use NumPy we need to import it as follows

import numpy as np

Arrays are similar to lists but are fixed size, and each element is of the same type

1D Arrays

Defining an Array

We can simply define an array as follows

a = np.array([1, 2, 3]) # casting a list to array

Types

An array can only store data of a single type. We can find the type of the data in an array with

a.dtype

Manipulating Values

We can easily manipulate values in an array by changing them as we would in a list. The same can be done with slicing and striding operations

a = np.array([1, 2, 3]) # array([1, 2, 3])
a[1] = 5 # a is now array([1, 5, 3])
b = a[1:3] # array([5, 3])

We can also use a list to select specific indexes and even assign values to those indexes

a = np.array([1, 2, 3]) # array([1, 2, 3])
select = [1, 2]
b = a[select] # array([2, 3])
a[select] = 0 # a is now array([1, 0, 0])

Attributes

An array has various properties and functions such as

a = np.array([1, 2, 3])
a.size # number of elements
a.ndim # number of dimensions
a.shape # shape
a.mean() # mean of values
a.max() # max value
a.min() # min value

Array Operations

We have a few different operations on arrays such as

u = np.array([1, 0])
v = np.array([0, 1])
u + v # vector addition
u * v # element-wise multiplication
np.dot(u, v) # dot product
np.cross(u, v) # cross product
u.T # transpose array

Linspace

The linspace function can be used to generate an array with values over a specific interval

np.linspace(start, end, num=divisions)
np.linspace(-2, 2, num=5) # array([-2., -1., 0., 1., 2.])
np.linspace(0, 2*np.pi, num=10)
# array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
#        3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])

Plotting Values

We can apply a function to these values by using array operations, such as those mentioned above as well as others like

x = np.linspace(0, 2*np.pi, num=100)
y = np.sin(x) + np.cos(x)

2D Arrays

Defining a 2D Array

Two dimensional Arrays can be defined by a list that contains nested lists of the same size as follows

a = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

We can similarly make use of the previously defined array operations

Accessing Values

Values in a 2D array can be indexed in either one of two ways

a[1, 2] # 23
a[1][2] # 23

Slicing

We can perform slicing as follows

a[0][0:2] # array([11, 12])
a[0:2, 2] # array([13, 23])

Mathematical Operations

We can perform the usual mathematical operations with 2D arrays as with 1D

Dancing Man

The following script will make a dancing man if run in Jupyter, because why not

from IPython.display import display, clear_output
import time

val1 = '(•_•)\n<) )╯\n/ \\'
val2 = '\\(•_•)\n( (>\n/ \\'
val3 = '(•_•)\n<) )>\n/ \\'

while True:
    for pos in [val1, val2, val3]:
        clear_output(wait=True)
        print(pos)
        time.sleep(0.6)