From GEST-S482 Digital Business


Introduction (I): Downloading and Installing Python (Windows)

You can download the latest version of Python for Windows, macOS and Ubuntu for free. Keep in mind that, depending on the version installed, some libraries may not work.

On the download page, you will have the choice between a 64-bit and a 32-bit installer, depending on your computer; most likely you need the 64-bit one.

On Windows, select Start → Control Panel → System to check whether your system is 64-bit or 32-bit.
Then download the matching version and double-click on it.

  1. Select install for all users and also add Python to the PATH (this adds Python to the environment variables, which lets you run .py programs on your computer with Python).
  2. You can choose between installing now and customizing the installation (I recommend customizing, which lets you choose which tools get installed).

Introduction (II)

What is Python? No, it is not that weird animal slithering on the ground, nor a film (even though I would strongly suggest you watch the Monty Python movies). Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Then why would a business manager learn it? Besides being pretty useful, it is a tool that can help solve complex mathematical problems. It can be used to retrieve large amounts of information, and to manage, store, analyze and convert it. Moreover, understanding it is essential in order to interact with robots and machines.

The logic of programming closely parallels the logic of writing and optimizing processes within a company. Furthermore, most companies use a language called SQL (see next session) to handle their databases. On the one hand, Python lets you work with SQL (quite useful for doing our exercises); on the other, it helps you develop a programming mindset that applies to any language.

Python is one of the most widely used programming languages today because its very large community has given it a huge set of libraries in various fields: data analysis, image processing, games (less known), deep learning, artificial intelligence, web scraping, and web development (back-end), especially thanks to frameworks such as Django or Flask. Python is best known for its data science libraries; however, it would be a mistake to limit it to just that.

Let's explain what Python is!


Python is a programming language in which variables are one of the most basic elements. A variable contains a value, and the equals sign is used to assign a value to a variable (e.g. my_age = 21). In Python, variables defined within the body of a function are only accessible within that function and are therefore called local variables. A local variable stores its information in a particular scope, for example a function, so assigning to a variable of the same name elsewhere in your script has no effect on the variable inside the function. Global variables, by contrast, are defined at the top level of the script and can be read and reassigned anywhere at that level.

example :

Explication local globale variable.png

The example defines a function that assigns values to a and b as local variables. The script then assigns other values to a and b outside the function. When the function is called, the values inside it have not changed: they are local variables.
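Since the screenshot is not reproduced here, the following is a minimal sketch of the same idea. The function name show_local_values and the numbers are made up for illustration; they are not taken from the original image.

```python
# Global variables: defined at the top level of the script.
a = 1
b = 2

def show_local_values():
    # These assignments create *local* variables named a and b.
    # They exist only while the function runs and do not touch the globals.
    a = 10
    b = 20
    return a + b

local_sum = show_local_values()   # uses the local values 10 and 20
global_sum = a + b                # the globals are still 1 and 2

print(local_sum)    # 30
print(global_sum)   # 3
```

Calling the function does not modify the global a and b, which is why the second result is still computed from 1 and 2.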

Using variables has 3 main advantages:

  • Avoid having to keep information in your head as you program: you simply create a variable that holds the information required.
  • Deal with data that is not known at the moment you program. Think of a program that needs the age of the user to run and display something on the screen: you don't know the age of the user yet. In this case, you can create a variable that will store the information once it becomes available.
  • Make your code more readable.

Data Types

Each variable has a type depending on the value it is associated to. The type is important in python when we use functions. As a matter of fact, functions generally work with specific types. For instance, it only makes sense from a mathematical point of view to divide 2 numbers, you cannot divide a string of characters by a number.

Throughout this class, we will mainly encounter four data types, called primitive types:

  1. The integer number, 'int', is the number without decimal point.
  2. The float number, 'float', is the number that contains decimal point.
  3. The string, 'str', represents a sequence of characters and is surrounded by either single quotation marks, or double quotation marks.
  4. The boolean, 'bool', is a type that can take up only 2 different values True and False. This data type is particularly useful in conditional and comparison expressions.

Most of the time, error messages are useful to understand where the error comes from. TypeError is the general description of an error when you are trying to run a function with a combination of variables of the wrong type. Unsupported operand type(s) for / : ‘str’ and ‘int’ means that this function / (divide) cannot be used with a string of characters as the first value and an integer number as the second one.

It is possible to convert this type of value with the casting process. To do so, we use constructor functions:

  • int() to transform the value into an integer number,
  • float() to transform the value into a float number,
  • str() to transform the value into a string.

This does not change the type of the initial variable. It only creates a copy of the transformed value.

However, casting only works for certain types of conversions.

The type function permits us to find the type of the variable.

For instance:



Python will send us back: float
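As a small illustration of casting and of the type function, here is a sketch with made-up variable names (price_text, price). It also shows that the original variable keeps its type and that not every conversion is possible.

```python
price_text = "19.99"          # a string, e.g. typed by a user
price = float(price_text)     # casting creates a new float value

print(type(price_text))       # <class 'str'>  (the original is unchanged)
print(type(price))            # <class 'float'>
print(int(price))             # 19  (int() drops the decimal part)
print(str(21) + " years")     # 21 years

# Casting only works for conversions that make sense:
try:
    int("twenty")
except ValueError as error:
    print("Cannot convert:", error)

# Dividing a string by a number raises a TypeError:
try:
    "10" / 2
except TypeError as error:
    print("TypeError:", error)
```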

What does adding two strings do?

Concatenation, or string concatenation, is the operation of joining two strings together. I can create a new variable containing the addition of 2 other string variables; its value is the first string followed by the second. I can always retrieve a specific part of the original strings from the combined string by using the same indexing and slicing syntax as for lists.
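A short sketch of concatenation and of retrieving a part of the result, using example names chosen for illustration:

```python
first_name = "Ada"
last_name = "Lovelace"
full_name = first_name + " " + last_name   # concatenation with +

print(full_name)        # Ada Lovelace
print(full_name[0:3])   # Ada  (slicing recovers the first part)
print(len(full_name))   # 12
```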

Data Structures


Dictionaries are structures that store values under specific keys. To create a dictionary, we need curly braces '{}' and pairs of key and value. Each pair is separated by a comma ','. The key (usually a string) and the value are separated by a colon ':'. Keys and values that are strings, as in the example below, are surrounded by quotation marks.

Dico = {"key1" : "value1", "key2" : "value2"}

Dictionaries support three main operations:

a) To retrieve the value of key1, simply write the name of the dictionary followed by the name of the key wanted in brackets '[]'. So, for the example above, the code would be:

Dico["key1"]

And Python would answer:

'value1'

N.B. 1: Sometimes, in some programming interfaces, the print() function will be necessary to get the value inside the key --> print(Dico["key1"])

N.B. 2: The KeyError states that no value was stored under that key and that it does not exist in the dictionary

b) To change the value of a key or add a new pair of key and value, the process is the same as in a) but now the equal sign needs to be used before the new value. So, for the example above, the code would be:

Dico["key1"] = "new_value"

c) To access all the keys of a dictionary, use the keys() method. So, for the example above, the code would be:

Dico.keys()

And Python would answer:

dict_keys(['key1', 'key2'])

There is an important thing to mention about dictionaries: values are accessed by key, not by position. Since Python 3.7 a dictionary remembers the order in which keys were inserted, but in older versions the order was not guaranteed, so it is safer not to write code that depends on one key coming before another.
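The dictionary operations described above can be sketched as follows. The dictionary stock and its contents are hypothetical; get() is shown as one possible way to avoid a KeyError.

```python
stock = {"apples": 12, "pears": 5}

print(stock["apples"])        # 12
stock["pears"] = 7            # change the value of an existing key
stock["kiwis"] = 30           # add a new key/value pair
print(list(stock.keys()))     # ['apples', 'pears', 'kiwis']

# Looking up a missing key raises KeyError; get() returns a default instead.
print(stock.get("bananas", 0))   # 0
try:
    stock["bananas"]
except KeyError as error:
    print("Missing key:", error)
```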


Lists are ordered and changeable collections of values that allow duplicate members; the items can themselves be data structures or variables. In a list, the keys are always integers in a range (this means that if an integer is a valid index in the list, every integer between 0 and this integer is a valid index of the list, too).

We create lists by putting elements separated by commas ',' between square brackets '[' and ']'.

List = ["item0", "item1", "item2", "item3", "item4"]

Lists support several operations:

    • Access items
      a) You can access a list item with its index number. Write the name of the list followed by the index in brackets '[]' to obtain the value stored at that index. The index is an integer.
      For the list above, the code would be:
      List[0]
      And Python would answer:
      'item0'
      b) You can access the last items of a list by negative indexing. [-1] refers to the last item of the list, [-2] to the second-to-last item, and so on.
      For the list above, the code would be:
      List[-1]
      And Python would answer:
      'item4'
      c) You can access several consecutive items in a list by using two indexes separated by a colon. The result is a sublist of the list; the item at the second index is not included.
      For the list above, the code would be:
      List[1:3]
      And Python would answer:
      ['item1', 'item2']
      N.B: If you try to access the list at an index that does not exist, the IndexError states that the list index requested is out of range. It is important not to forget that a list index starts at 0 and not 1.
    • Change item value
      To change the value of a list item, you have to refer to the index number.
      List[Index of the item to be modified] = New value for the item
    • Add items
      It is possible to add elements to your list by using the append() method. The append() method adds one value at the end of the list; it accepts only one value per call.
      For example, after appending "not" to the list ["digital","firm","is","cool"], Python will send us back : ["digital","firm","is","cool","not"]
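The list operations above (access, change, append) can be sketched together. The list name words is an assumption made for this example; the values reuse the output shown above.

```python
words = ["digital", "firm", "is", "cool"]

words.append("not")      # append() adds exactly one item at the end
print(words)             # ['digital', 'firm', 'is', 'cool', 'not']

words[4] = "indeed"      # change the item stored at index 4
print(words[0])          # digital
print(words[-1])         # indeed
print(words[1:3])        # ['firm', 'is']
print(len(words))        # 5
```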

List indices start counting at 0.

A useful tool is the function len(), which returns the length of a list. It is also possible to handle strings as lists; for example, split() turns a string into a list of words.

txt = "welcome to the jungle"

x = txt.split()



Tuples are similar to lists; however, we cannot change the elements of a tuple once it is assigned (whereas we can change the elements of a list). A tuple is a collection of objects which is ordered and immutable. For this reason, if you want to add an element to your tuple, you first have to convert the tuple into a list.

An empty tuple is written with two parentheses and nothing inside.



To obtain a non-empty tuple, we have to add commas between each element.
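A minimal sketch of tuples, with made-up names (empty, point, single), showing creation, immutability and the conversion to a list mentioned above:

```python
empty = ()          # empty tuple
point = (3, 4)      # tuple with two elements
single = (5,)       # one element: the trailing comma is required

print(point[0])     # 3

# Tuples are immutable: item assignment raises a TypeError.
try:
    point[0] = 10
except TypeError as error:
    print("TypeError:", error)

# To "add" an element, convert to a list, modify it, and convert back.
as_list = list(point)
as_list.append(5)
point = tuple(as_list)
print(point)        # (3, 4, 5)
```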




A module is a file that contains and defines a set of functions, classes and/or variables (all those are referred to as attributes). It can be included in an application you are creating. Also, it can be imported into other modules or into the main module. So, we can consider a module as a code library. In fact, using a module also allows you to organize your code logically, this refers to the concept of modular programming.

When we import modules we’re able to call functions that are not built into Python. Some modules are installed as part of Python, and some we will install through pip. So, one main advantage of modules is that you don't need to reinvent the wheel every time you are creating new code, i.e. you will simply import in your code the module containing the function you want to use, and then call the function.

For instance, let's say you try to find the square root of 25, simply import the numpy module and then call the sqrt() function in order to easily get your result.

 import numpy as np
 print(np.sqrt(25))  # 5.0



Create a module

To create a module just save the code you want in a file with the file extension .py:

e.g. save this code in a file named first_module.py

def greeting(name):
  print("Hello, " + name)

Use a module

  • import statement

Once the module has been created, you can use the functions defined in it by using the import statement:

e.g. Import the module named first_module, and call the greeting function:

import first_module

first_module.greeting("Simon")

Hello, Simon

Note: When using a function from a module, use the syntax: module_name.function_name.

So, the import statement allows you to import one or more modules into your Python program which enables you to make use of the definitions built in those modules.

  • from... import statement

To refer to items from a module within your program, you can use the from … import statement. When importing modules in such a way, you can call the functions by name rather than through dot notation .function_name.

By using this statement, you can specify which functions to reference directly.

from first_module import greeting

greeting("Simon")

Hello, Simon

In the example above, we first call the from keyword, then first_module for the module. After that, we use the import keyword and call the specific function greeting, we would like to use.

Sometimes, you may see the import statement take in references to everything defined within the module by using an asterisk *.

from first_module import *

Rename a module

You can modify the names of modules and their functions within Python by using the as keyword.

What's the purpose of doing that? You may want to abbreviate a longer name that you are using a lot so that it is easier and faster to use. You may also want to change a name because you have already used the same name for something else in your program or another module you have already imported also uses that name.


import math as m

print(m.pi)



Within the program, we now refer to the number pi as m.pi rather than math.pi.

Examples of modules

Here is a list of well-known and often used modules within Python and their common alias (but you can re-name the module as you wish).

  • numpy as np
    -> multi-dimensional arrays and matrices and mathematical functions
  • pandas as pd
    -> data analysis and manipulation tool
  • matplotlib.pyplot as plt
    -> create 2D graphs and plots (in fact, pyplot is a submodule of the matplotlib package).

Install module using pip3

Pip3 (Package Installer for Python 3) allows you to install modules from the Python Package Index and other indexes. It handles the management of dependencies for you and acts as version manager for community-managed modules. It becomes handy when you have to install huge modules like Numpy or Pandas, and use it on several projects.
Here are some useful commands you can run in a terminal:

  • pip3 --version
    -> Display the current version of pip3
  • pip3 install packages1 packages2 packages3 ...
    -> Download and install packages on your machine
  • pip3 uninstall packages1 packages2 packages3 ...
    -> Uninstall packages on your machine
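  • pip3 freeze
    -> Display installed packages and their versions
  • pip3 search package_name
    -> Search for modules in the Python Package Index (note: PyPI has disabled the search API, so this command no longer works against PyPI; use the search on the website instead)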
  • pip3 help
    -> Display useful commands

Control Flow

Conditional Logic

Python makes writing conditional logic very close to writing English statements. An "if statement" is written with the if keyword, then the test we want to evaluate, followed by a colon ":". The next lines must be indented; they contain the code to run if the condition is fulfilled. The first non-indented line marks the end of the conditional block.

elif means that if the previous statement is not true, try the other condition. (else... if)

else means that for everything that has not been caught in the previous conditions, run this statement.

N.B.: with if, elif and else, only one of the clauses will run: the first one whose condition is true, reading from top to bottom. Once a clause matches, Python skips all the remaining conditions without evaluating them.
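To make the "only one clause runs" rule concrete, here is a small sketch. The function grade_label and its thresholds are invented for illustration.

```python
def grade_label(score):
    """Return a label for a score out of 20 (thresholds are illustrative)."""
    if score >= 16:
        label = "excellent"
    elif score >= 10:      # only tested when the first condition is False
        label = "pass"
    else:                  # catches everything not matched above
        label = "fail"
    return label

print(grade_label(18))   # excellent
print(grade_label(12))   # pass
print(grade_label(7))    # fail
```

A score of 18 also satisfies score >= 10, but since the first clause already matched, the elif is never evaluated.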

Usually, comparison operators are used in conditional logic, such as a > b, a >= b, a == b, a != b (a different from b). Every conditional keyword is followed by a colon, and the code that belongs to the condition must be indented on the next line(s).

if a > b :
   print (a)

Other words such as in or not in can be used in conditions. For example, to search if a key is present in a dictionary.

dico = {"key1" : "value1", "key2" : "value2"}
if 'key2' in dico:
   print(dico["key2"])    #this will return the value2

It is also possible to combine conditions with and or with or.

if a > b and a != 0:
   print(a)


Loops are useful to lighten your code; they are often used with lists. Instead of writing the same thing multiple times, you can use a loop. To create a loop we use the keywords 'for a in b' followed by a colon, and the body of the loop needs to be indented on the next line(s).

You can exit a loop before it ends with break.

notes = [0, 5, 15, 12, 20, 8, 11]
for i in notes:
   print(i)        #it will print 0, 5, 15, 12 and 20
   if i == 20:
       break       #the loop stops here, so it won't print 8 and 11

To create a loop we can also use the keyword while followed by a condition and a colon. What we want to repeat while the condition is still true needs to be indented on the next line(s).
Be careful not to create an infinite loop! (To stop an infinite loop, press CTRL + C)

count = 0
length = 10
while count < length:
   print(count)   #prints 0, 1, ..., 9
   count += 1     #this line adds one to count (if you forget this line it will print an infinite number of 0)


Loops can often become long and heavy in code. Therefore, Python has a more compact way to write simple loops. It is possible to reduce the number of lines in our code by using what is called a "comprehension". Here is an example found in our exercises; normally we would code our loop this way:

binary_coding = []
for value in pixels_in_a_row:
   binary_coding.append(value%2)
Nevertheless, comprehensions allow people to write the whole loop in only one line.

binary_coding = [value%2 for value in pixels_in_a_row]

Moreover, it is also possible to add conditions in comprehensions. You just have to write the condition wanted in the existing brackets.

number_list = [ x for x in range(20) if x % 2 == 0]
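The loop and the comprehension can be compared side by side. In this sketch the pixel values are made up, and the names binary_loop, binary_comprehension and even_values are chosen for illustration.

```python
pixels_in_a_row = [12, 7, 255, 0, 33]   # made-up pixel values

# Classic loop
binary_loop = []
for value in pixels_in_a_row:
    binary_loop.append(value % 2)

# Equivalent comprehension
binary_comprehension = [value % 2 for value in pixels_in_a_row]

# Comprehension with a condition: keep only the even pixel values
even_values = [value for value in pixels_in_a_row if value % 2 == 0]

print(binary_loop)            # [0, 1, 1, 0, 1]
print(binary_comprehension)   # [0, 1, 1, 0, 1]
print(even_values)            # [12, 0]
```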

By the way, this exercise requires understanding how to translate binary code to a pixel value and this is explained in this section on numerisation.

Python Exceptions

A Python program terminates as soon as it encounters an error that is not handled. An error can be an Exception or a SyntaxError.

There is a multitude of exceptions, for example:

              ZeroDivisionError: division by zero
              NameError: name 'spam' is not defined
              TypeError: Can't convert 'int' object to str implicitly

For any other exception, please refer to this page.

To track down those errors, we can add several print("until now, my code works") calls in our code to find which line causes the problem. Furthermore, more advanced programmers usually write a try ... except statement.

        while True:
           try:
              x = int(input("Please enter a number: "))
              break
           except ValueError:
              print("Oops!  That was no valid number.  Try again...")

The code below the clause try is executed until an exception occurs. When the exception occurs, the code below the clause except is executed.
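The same try / except mechanism can be tried without keyboard input. The two helper functions below (to_number, safe_divide) are hypothetical examples, not part of the course material.

```python
def to_number(text):
    """Convert text to an int; return None when the text is not a number."""
    try:
        return int(text)
    except ValueError:
        return None

def safe_divide(a, b):
    """Divide a by b; return None instead of failing on a division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

print(to_number("42"))      # 42
print(to_number("abc"))     # None
print(safe_divide(10, 2))   # 5.0
print(safe_divide(1, 0))    # None
```

Each except clause names the specific exception it handles, so any other kind of error would still stop the program and remain visible.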

SyntaxErrors occur when the Python interpreter detects an incorrect statement while parsing the code.

   print("hello world"))
   #SyntaxError: invalid syntax

In this example, the problem comes from the additional parenthesis.

Built-in Functions

In Python, there are several functions that already exist. We can create a list of the most important ones:


The print function prints the specified message to the screen. The message can be a string or any other object; the object is converted into a string before being written to the screen.

Illustration print.png
txt = "hello, my name is Nicole"
print(txt)
hello, my name is Nicole


The type() function returns the class type of the object passed as argument. It can be very useful for debugging purposes.

numbers_list = [1, 2]
print(type(numbers_list))
<class 'list'>


The len() function returns the number of items in an object. When the object is a string, len() returns the number of characters in the string.

ListNumber = (1, 2, 3)
print(len(ListNumber))
3


The append() method adds an item to the end of the list.

Courses = ['Digital Firm', 'Accounting', 'Banking']
Courses.append('Marketing')
print(Courses)
['Digital Firm', 'Accounting', 'Banking', 'Marketing']


The split() method splits a string into a list.

txt = "hello, my name is Peter, I am 26 years old"
x = txt.split(", ")
print(x)
['hello', 'my name is Peter', 'I am 26 years old']


The int() function converts the specified value into an integer number. It returns an integer object constructed from a number or string x, or returns 0 if no arguments are given.

Nombre = '2'
result_1 = int(Nombre) + 4
print(result_1)
6


The range() function is used to generate a sequence of numbers.

for i in range(10): 
   print(i, end =" ") 
0 1 2 3 4 5 6 7 8 9

Also, if you pass two arguments, it generates the integers between the first argument (included) and the second (excluded).

for i in range(3,10):
   print(i, end =" ")
3 4 5 6 7 8 9


Lambdas are one-line functions, they are small "anonymous" functions. We use that kind of function when we require a nameless function for a short period of time. This is commonly used when you want to pass a function as an argument to higher-order functions (i.e. functions that take other functions as their arguments).

multiply= lambda x, y: x*y 
print(multiply(3, 5))
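As an illustration of passing a lambda to a higher-order function, the sketch below sorts a list with the built-in sorted() and its key parameter. The course list and credit values are made up.

```python
# Each tuple is (course name, number of credits); the data is invented.
courses = [("Accounting", 5), ("Digital Firm", 10), ("Banking", 3)]

# sorted() is a higher-order function: key receives a function.
# The lambda tells it to compare the courses by their second element.
by_credits = sorted(courses, key=lambda course: course[1])

print(by_credits)
# [('Banking', 3), ('Accounting', 5), ('Digital Firm', 10)]
```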


Map applies a function to all the items of an input list, which must be defined before using map.

Generally, to write Python code for squaring numbers, we would proceed as follows:

numbers = [0, 2, 3, 4]
squared = []
for i in numbers:
   squared.append(i**2)
print(squared)
Output : [0, 4, 9, 16]

This can also be done with map and lambda :

numbers = [0, 2, 3, 4]
squared_numbers=list(map(lambda x : x**2, numbers))
Output : [0, 4, 9, 16]


The lower() method is used to transform a string so that all characters are lowercase.

a = "DIGITAL"   # example value
print (a.lower())
Output : digital


The str() function casts an object into a string.



Mathematical operators

Operator Name
+ Addition
- Subtraction
* Multiplication
/ Division
% Modulus
** Exponentiation
// Floor division
== Equal
!= Not Equal
> Greater than
< Less than
>= Greater or equal to
<= Less than or equal to

Mathematical operations

Addition operator

Symbol : +. This operator adds two operands.

a = 2 + 3
a = a + 5 #or a+=5

Subtraction operator

Symbol : -. This operator subtracts two operands.

a = 10 - 3
a = a - 2 #or a -= 2

Multiplication operator

Symbol : *. This operator multiplies two operands.

a = 2 * 5
a = a * 2  #or a *= 2

Division operator

Symbol : /. This operator divides the first operand by the second.

a = 10 / 2
a = a / 2 #or a /= 2

Modulus operator

Symbol : %. It is used on integers and gives the remainder of the division of one number by another. When dividing by 2, the result is therefore 0 or 1, which can be useful to transform a sequence of even and odd numbers into a binary sequence.

Examples :

remainder = 8%2 #remainder is 0
remainder = 5%2 #remainder is 1

Exponentiation operator

Symbol : **. This operator returns the first operand to the power of the second operand.

a = 2 ** 3
a = a ** 2 #or a**= 2

Floor division operator

Symbol : //. This operator is similar to a normal division except that it rounds the result down to the nearest integer.

a = 25 // 3
a = a // 3 #or a//= 3
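Floor division and modulus belong together: the quotient times the divisor, plus the remainder, gives back the original number. A short sketch with arbitrary example values:

```python
a = 25
b = 3

quotient = a // b     # 8  (floor division)
remainder = a % b     # 1  (modulus)

# The two results always recombine into the original number.
print(quotient * b + remainder == a)   # True

# divmod() returns both values at once.
print(divmod(a, b))   # (8, 1)

# Floor division rounds down, also for negative numbers.
print(-7 // 2)        # -4
```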

Check your understanding


Question 1: How many times will the word "No" be printed on the screen after this loop has executed?

  for a_numb in range(10):
    if a_numb == 7:
      print("No")
  • 10
  • 0
  • 1
  • 9

Question 2: Assume you have the following data structure:

   zlong = { "yop" : [ "inni", "meiney", "miney", "moh"],
                  "zerp" : [1, 2, 3, 4] }

How do you access the value 3?

Question 3: In the following code, at which line is the error located? (pass is a command that does nothing, but is valid)

  1.  : a = 7
  2.  : if a = 4:
  3.  :     print("Yipee Ki Yay.")
  4.  : else:
  5.  :     pass

Question 4: What is the correct way to access the second element of the list "my_list"

  • my_list[1]
  • my_list["2"]
  • my_list["1"]
  • my_list[2]

Question 5: What is the correct way to assign the value "giraffe" to a variable called "animal_with_long_neck" in Python?

   giraffe = "animal_with_long_neck"
   animal_with_long_neck = giraffe
   "giraffe" = animal_with_long_neck
   giraffe = animal_with_long_neck
   animal_with_long_neck = "giraffe"
   "animal_with_long_neck" = giraffe


  • Question 1: 1
  • Question 2: zlong["zerp"][2], zlong["zerp"][-2]
  • Question 3: line 2
  • Question 4: my_list[1]
  • Question 5: animal_with_long_neck = "giraffe"

Introduction to Numpy


NumPy, short for Numerical Python, is a library providing functions for scientific computing. NumPy's core strengths are multidimensional arrays, and the functions to work with these arrays. Numpy is an open-source library available on GitHub and a large community of developers is still working on it.

How to import NumPy

Any time you want to use a package or library in your code, you first need to make it accessible. In order to start using NumPy and all of the functions available in NumPy, you’ll need to import it. This can easily be done with this single line of code (we shorten numpy to np in order to gain time but it's also a convention, every programmer does it!)

import numpy as np

What's an array?

A NumPy array is a powerful N-dimensional object; a two-dimensional array is organised in rows and columns. It is also very important that all the data within an array are of the same type. In a Python list, you don't care if your data have different types. With NumPy arrays this is not the case: the datatype must be the same for all elements. If you mix types, NumPy converts everything to one common type (for example, text), and numerical functions may then no longer work on the array.

The first thing to do is to understand what a Multi-dimensional array is. As you can see in the picture, a Multi-dimensional array is like a matrix in mathematics. The array in this picture has got 2 dimensions because it has rows and columns.


Let's create our first single-dimensional array from a Python list. We first import the library, and then we use the np.array() function to transform the Python list into a single-dimensional array.

m = np.array([1,2,3]) # array is the numpy function to create the array
print(m)

Output – [1 2 3]

For a multi-dimensional array it's pretty similar.

m = np.array([(1,2,3),(4,5,6)])

Output – [[ 1 2 3] [4 5 6]]

Play with arrays


The ndim attribute returns the number of dimensions of the array you are working with. It tells you whether it is a single-dimensional array or a multi-dimensional array.

m = np.array([(1,2,3),(4,5,6)])
print(m.ndim)

Output - 2


The dtype attribute returns the type of the data stored in the array.

m = np.array([(1,2,3)])
print(m.dtype)

Output - int32. The data inside are integers (numbers without decimals), and the 32 says that each one is coded on 32 bits. On many systems the default is int64 instead.

Size & Shape

The size attribute of an array returns the number of elements in that array. The shape attribute returns the shape of the array in the format (rows, columns).

m = np.array([(1,2,3,4,5,6)])
print(m.size)
print(m.shape)

Output - 6 (for the size) Output - (1, 6) (for the shape)

Reshaping a NumPy array

Reshaping means changing the number of rows and columns of your array; it gives you a new view of the NumPy array. Let's take the example array below: it has 3 columns and 2 rows, but we want to reshape it into an array of 2 columns and 3 rows. NymPy2.png

m = np.array([(8,9,10),(11,12,13)]) #2 rows, 3 columns 
m = m.reshape(3,2) #3 rows and 2 columns


Slicing is basically extracting a particular set of elements from an array. Let's put this into practice with the example below, where we want to slice 2 specific values (9 and 11). The syntax of slicing is the following: [rows, columns]. Be careful: the closing index is not included.


m = np.array([(8,9),(10,11),(12,13)])
print(m[0:2,1]) # we take rows index 0&1 but the index 2 is not included and we take columns index 1

Output - [9 11]

Min, Max, Sum, Squared root & Standard deviation

This is a sample of some basic functions that you can apply to NumPy arrays. It's pretty basic but it will allow us to dive into more complex operations.

m = np.array([1,2,3])
print(m.min()) # output - 1
print(m.max()) # output - 3
print(m.sum()) # output - 6
print(np.sqrt(m)) # output - [1. 1.41 1.73] (rounded; one square root per element)
print(np.std(m)) # output - 0.816 (rounded; a single number)

The concepts of axis in NumPy arrays

NumPy4.png As you can see in the picture, we've got a 2-dimensional array with 3 rows and 2 columns. The concept of axis lets you apply an operation along the rows or along the columns. Axis 0 runs down the rows (so an operation along axis 0 gives one result per column) and axis 1 runs across the columns (one result per row). For example, if you want to add all the rows together in order to compute a total per column, you can use this code snippet:

m = np.array([(8,9),(10,11),(12,13)])
print(m.sum(axis=0)) # Output - [30 33] (one total per column)
print(m.sum(axis=1)) # Output - [17 21 25] (one total per row)

Perform operations between arrays

You can perform operations between arrays such as addition, subtraction, multiplication, division.

x= np.array([(1,2,3),(3,4,5)])
y= np.array([(1,2,3),(3,4,5)])
print(x+y) # Output - [[2 4 6] [6 8 10]]
print(x-y) # Output - [[0 0 0] [0 0 0]]
print(x*y) # Output - [[1 4 9] [9 16 25]]
print(x/y) # Output - [[1. 1. 1.] [1. 1. 1.]] (division always returns floats)

Next, let's imagine you want to put 2 arrays together into 1: this is possible with vertical stacking or horizontal stacking. Vertical stacking creates new rows to store the new data; horizontal stacking creates new columns. Be careful: vstack() and hstack() return a new array whose shape differs from that of the input arrays.

x= np.array([(1,2,3),(3,4,5)])
y= np.array([(1,2,3),(3,4,5)])
print(np.vstack((x,y))) # Output - [[1 2 3] [3 4 5] [1 2 3] [3 4 5]]
print(np.hstack((x,y))) # Output - [[1 2 3 1 2 3] [3 4 5 3 4 5]]


There are times when you might want to carry out an operation between an array and a single number (also called an operation between a vector and a scalar) or between arrays of two different sizes. For example, your array (we’ll call it “data”) might contain information about distance in miles but you want to convert the information to kilometers. You can perform this operation with:

data = np.array([1.0, 2.0])
data * 1.6 # Output - [1.6 3.2]
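The second case mentioned above, an operation between arrays of two different sizes, could look like the sketch below. The array names (miles, table, offsets) and values are invented; the example assumes the smaller array has as many elements as the larger one has columns.

```python
import numpy as np

# Array and single number: the number is applied to every element.
miles = np.array([1.0, 2.0, 26.2])
kilometres = miles * 1.6

# Arrays of different sizes: a (2, 3) table and a 3-element row.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = table + offsets      # offsets is added to each row of table

print(kilometres)
print(shifted)                 # rows become [11 22 33] and [14 25 36]
```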

Speed up your code with numpy "vectorized" function

In this section we'll discover that Python loops are quite slow. In fields like data science and machine learning, you often need to speed up the calculations because the datasets you have to deal with are very big.

What's vectorization ?

Remember that arrays can only contain data of a single datatype: they are homogeneous. For example, an array of numbers can contain integers (typically coded on 8 bytes each) or floats (also typically 8 bytes each) but not a mix of integers and floats; if you mix them, NumPy converts the integers to floats. That's a huge difference from Python lists, which can contain integers, floats, strings and even Booleans in the same list. This strict rule on datatypes comes with a huge benefit: because the datatypes are homogeneous, NumPy is able to delegate mathematical operations to optimized C code. This process is called vectorization. The difference in speed between the optimized C code and a Python loop over a list is tremendous, because Python has to check the datatype of every object in the list before each mathematical operation.

Vectorization: In the context of high-level languages like Python, Matlab, and R, the term vectorization describes the use of optimized, pre-compiled code written in a low-level language (e.g. C) to perform mathematical operations over a sequence of data.

Difference in terms of speed - Simple example

Consider, for instance, the task of summing the integers 0-9,999 stored in an array. Calling the sum function from NumPy in fact runs optimized, compiled C code on our array. In order to compare the performance of the NumPy vectorized function and the classic loop, we add %%time at the beginning of each Jupyter Notebook cell in order to get the time required to run the cell.

# np.arange() generates an array with all the integers between 0 and the argument of arange minus 1.
total = np.sum(np.arange(10000))
#Time required to run the cell: 308 µs

Let's do the same with a classic Python loop:

total = 0
for i in np.arange(10000):
    total = i + total
# Time required to run the cell: 9.09 ms

We can see that the vectorized function is roughly 30 times faster here (9.09 ms versus 308 µs). That's pretty impressive, and the gap generally grows as the array gets bigger.
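Outside a notebook, where %%time is not available, the same comparison can be sketched with the timeit module from the standard library. The function names loop_sum and vectorized_sum are made up for illustration, and the measured times will differ from one machine to another.

```python
import timeit

import numpy as np

values = np.arange(10000)


def loop_sum(arr):
    """Sum the elements one by one with a classic Python loop."""
    total = 0
    for i in arr:
        total = i + total
    return total


def vectorized_sum(arr):
    """Sum the elements with NumPy's compiled routine."""
    return np.sum(arr)


# Run each version 20 times and compare the total elapsed time.
loop_time = timeit.timeit(lambda: loop_sum(values), number=20)
vec_time = timeit.timeit(lambda: vectorized_sum(values), number=20)

print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
print(f"speed-up: about {loop_time / vec_time:.0f}x")
```

Both functions return the same total, so only the running time differs.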

NumPy Cheat Sheet

A cheat sheet including the most used NumPy objects, functions and methods. It's something you should always keep next to you while coding.




Introduction to Pandas

According to the creators of the library, 'pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.' It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.

What kind of data does pandas handle?

import pandas as pd

Most of the time, pandas handles data in the form of a DataFrame.

DataFrame.png

A DataFrame is a two-dimensional data structure that can store data of different types in columns. It's similar to a spreadsheet, an SQL table or a data.frame in R.

Create a DataFrame

CS_members = pd.DataFrame({
      "Name": ["Théodore Verhaegen", "Ernest solvay", "Julien Beats"],
      "Level of study": ["BA3", "MA2", 58],
      "email" : ["", "", ""]})

Output for the DataFrame creation.png

  • To create the pandas DataFrame, we used the dictionary's keys as column headers and the values in each list as the entries of the corresponding column.

Each column is called a Series. In our example we therefore have 3 Series: "Name", "Level of study" and "email". With a Series, we are able to work on a single column of the DataFrame.

Selecting a column

To select a column, put the column header between square brackets []. It's very similar to how you select data in a Python dictionary using the key. When selecting a single column of the DataFrame, the result is a pandas Series.

CS_members["Level of study"]

Output - Result2.png

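Here it's interesting to see that pandas reports the datatype of this column as object. By default, pandas has traditionally stored strings (and columns mixing several types, like this one) under the generic object datatype rather than a dedicated string datatype.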
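Column selection and the resulting datatypes can be tried on a small example. This is only a sketch: the members DataFrame and its "Age" column are hypothetical data that mirror the structure of CS_members.

```python
import pandas as pd

# Hypothetical data mirroring the structure of CS_members.
members = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Level of study": ["BA3", "MA2", 58],  # strings and an integer mixed
    "Age": [21, 23, 58],
})

# One header between square brackets returns a Series.
level = members["Level of study"]

# A list of headers returns a smaller DataFrame.
subset = members[["Name", "Age"]]

print(type(level).__name__)   # Series
print(type(subset).__name__)  # DataFrame
print(level.dtype)            # object, because the column mixes types
print(members["Age"].dtype)   # an integer dtype
```

A column holding only numbers gets a numeric datatype, while a column mixing strings and numbers falls back to object.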


Now that we are able to select only some Series in a DataFrame, let's do the same with the rows by filtering them. In order to do that, we'll load a much bigger dataset with all the passengers of the Titanic. The goal of the code is to filter the rows by returning only the passengers who are older than 35.

import pandas as pd
titanic = pd.read_csv("titanic.csv")  # importing data is covered in "Import & Export Data" below
above_35 = titanic[titanic["Age"] > 35]  # new DataFrame containing only passengers older than 35

Filter DataFrame.png
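To see what happens inside the square brackets, the filter can be split into two steps. This is a minimal sketch on a hypothetical four-row passengers DataFrame, so that it runs without the titanic.csv file.

```python
import pandas as pd

# Hypothetical mini passenger list (not the real Titanic data).
passengers = pd.DataFrame({
    "Name": ["Anna", "Bram", "Chloe", "Dirk"],
    "Age": [22, 38, 35, 54],
})

# Step 1: the comparison returns a Series of True/False values, one per row.
mask = passengers["Age"] > 35

# Step 2: passing that Series between brackets keeps only the True rows.
above_35 = passengers[mask]

print(mask.tolist())  # [False, True, False, True]
print(above_35)
```

Note that the passenger aged exactly 35 is excluded, because the condition is strictly greater than 35.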

Import & Export Data

Import export.png [1]

Import Data

Most of the time when you are working with pandas, the first thing you want to do is retrieve data, often from the World Wide Web. The pandas read_* functions are perfect for that. For example, if the dataset you need is stored as CSV, pd.read_csv() loads the CSV file into a pandas DataFrame. Another example: if you need to work on data from an API, the response is often in JSON format. The pd.read_json() function returns a pandas DataFrame containing the JSON elements.
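The read_* functions can be tried without downloading anything. The sketch below assumes small made-up CSV and JSON texts, wrapped in io.StringIO so that pandas can read them as if they were files.

```python
import io

import pandas as pd

# Made-up data, standing in for a downloaded file or an API response.
csv_text = "Name,Age\nAnna,22\nBram,38\n"
json_text = '[{"Name": "Anna", "Age": 22}, {"Name": "Bram", "Age": 38}]'

# io.StringIO turns a string into a file-like object.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

Both calls produce a DataFrame with the same two columns, even though the source formats differ.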

titanic = pd.read_csv("titanic.csv")

A good practice when you are importing a dataset from the web is to always have a look at the data structure. By default, when displaying a large DataFrame, only the first and last 5 rows are shown. If you only want to see the first 5 rows of the DataFrame, you can use the head() method. For the last 5 rows, it's the tail() method. For a complete technical summary of the DataFrame, use the info() method.

Info results.png

Explanations about the output (here for the 3-row CS_members DataFrame created earlier):

  • It's indeed a DataFrame
  • There are 3 entries, meaning 3 rows. Each row has an index from 0 to 2
  • There are 3 columns, all containing data of type object (strings).
  • It's also returning the approximate amount of RAM (memory) used to hold the DataFrame.
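The three inspection methods can be compared on a small example. This sketch uses a hypothetical 12-row DataFrame named df.

```python
import pandas as pd

# Hypothetical 12-row DataFrame, just large enough to see the difference.
df = pd.DataFrame({
    "id": range(12),
    "label": [f"row {i}" for i in range(12)],
})

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail(3)   # last 3 rows; the number of rows is optional

print(first_rows)
print(last_rows)

# info() prints its summary directly: rows, columns, dtypes, memory usage.
df.info()
```

Both head() and tail() accept the number of rows you want as an optional argument.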

Export Data

Let's continue with our example of the Cercle $olvay. Imagine that your colleagues aren't familiar with pandas, so they ask you to export the pandas DataFrame to an Excel spreadsheet.

CS_members.to_excel('CS.xlsx', sheet_name='members', index=False)

Whereas the read_* functions are used to import data into a DataFrame, the to_* methods are used to store data on your computer. The to_excel method stores the data in an Excel file named CS.xlsx. Passing the sheet_name argument renames the default sheet from Sheet1 to members, and setting index to False means that the column containing the row index is not saved in the spreadsheet.
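The effect of index=False is easiest to see as text. Since to_excel needs an extra Excel engine (for example openpyxl) to be installed, the sketch below uses to_csv, another to_* method, as a stand-in; the members DataFrame is hypothetical.

```python
import io

import pandas as pd

# Hypothetical data, similar in shape to CS_members.
members = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Level of study": ["BA3", "MA2"],
})

# Without a file name, to_csv returns the CSV content as a string.
with_index = members.to_csv()                # row index saved as first column
without_index = members.to_csv(index=False)  # row index left out

print(with_index)
print(without_index)

# The exported text can be read back with the matching read_* function.
restored = pd.read_csv(io.StringIO(without_index))
print(restored)
```

With the index kept, the header line starts with an empty column name, which is usually not what your colleagues want to see in a spreadsheet.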

Examination of Python

You may be asked to write a couple of lines of code to do something simple (again, something that can be implemented instantly in a few lines of code), or to predict the output of a simple algorithm, or to spot and fix errors in the code.

Where to go?

Main page Exercises - Next Session ERD

  1. Pandas Documentation