Sql

From GEST-S482 Digital Business
Jump to navigation Jump to search

Introduction

SQL stands for Structured Query Language.

SQL is a query language, it is a:

  • Data definition language: Management of tables and relations;
  • Data management language: Access/Read, creation, modification, suppression;
  • Data security language: definition of user privileges.

SQL is not a programming language, unlike Python. These two however are complementary, and cannot replace one another. SQL is the standard language to deal with Relational Databases. A relational database defines relationships in the form of tables, and an important concept that needs to be mastered when it comes to understanding and representing relational databases is the Entity-Relationship Diagram (ERD).

SQL lets you access and manipulates databases. and it is of the utmost importance since needed to access any database. Nowadays, it is widely used by companies operating in a vast array of sectors and industries. We live in a society in which customers' data are highly important, and companies are ready to pay a lot of money to collect new data. Indeed, a non-negligible part of SQL is the understanding of customer behaviors. By analyzing the databases, it is possible to retrieve specific information about each customer, which can be used to give them a better personal experience. Also, SQL will be used for all dynamic websites, which is database-driven, such as online banking, social media sites, railway reservations, e-commerce, etc. Therefore, mastering SQL is a huge asset for a business engineer.

Relational Databases

SQL is mainly used to retrieve information stored in some structured fashion, generally called relational databases. In those databases, data is stored in tables that are linked to one another. Each record in those tables has a field, or a list of fields, that identifies it uniquely.

Entities are the "nouns" or objects. Basically it should answer the questions: who, what, where and when. They are defined by a certain number of attributes.

Attributes are the properties of the entities, the characteristics of the objects represented by the entities. The attributes that enable to differentiate two rows of data in a table are called "primary keys".

Primary keys are attributes/fields that enable differentiating two records, even if all the other fields are rigorously identical. The primary key can be an attribute, a combination of attributes or a surrogate key (a unique, generated number). The presence of a primary key enables the creation of links between tables.

Foreign keys are attributes of a relational table that is the primary key of another relational table.

Different types of relations exist between tables. Two tables can, for example, be linked through a One-to-Many relationship (1:M), a One-to-One relationship (1:1), or a Many-to-Many relationship (M:M). To deal with a Many-to-Many relationship, a junction table containing the primary key of both tables involved in the relationship must be computed.

Internal Redundancies [1]

Sometimes, some tables show what is called an internal reduction phenomenon. This is when information is repeated unnecessarily.

Many problems can then be present:

  • Waste of space in the tables due to the repetition of elements;
  • If you change a value, you have to change the values of all similar lines;
  • If you delete the elements of a single line, you lose some information about that element;
  • Make sure that all similar lines match (check spelling).

Here is a small example:

Internal redundancy - Example.png

The ISBN code is a piece of information that contains the exact title and author of a book. Therefore, if the title of book 1067 was lost, it could be found by decrypting the ISBN code of book 1022, since it is identical to the ISBN number 1067. The same information is therefore recorded several times.

This phenomenon, internal redundancy, violates the principle offounder of the databases:

"All relevant facts in the field of application must be recorded there once and only once."


Internal redundancy has many drawbacks. In our example, if you want to change the title of a book, you have to do it as many times as there are copies. You therefore cannot guarantee that the titles of the different copies are spelled the same way. Moreover, the copies of a book must be registered in accordance with the first one and the deletion of a single copy would lead to the definitive loss of the information on its title and its author.

If two rows have the same value for the ISBN column, then they have the same value for the TITRE column and for the AUTEUR column. The TITRE and AUTEUR values depend on the ISBN value. We are therefore here in the presence of functional dependency.

    ISBN -> TITRE, AUTEUR

ISBN is the determinant of dependency while TITRE and AUTEUR are the determinants. Functional dependency allows the detection and correction of redundancy situations. Functional dependency is a form of integrity constraint.

Functionnal dependency - Example.png

There is internal redundancy as soon as there is a determinant that is not a table identifier. A functional dependency where the determinant is not an identifier is said to be abnormal. For an identifier, one can always write:

    NUMERO -> TITRE, AUTEUR, ISBN, DATE_ACHAT, EMPL

This is, in a way, the definition of the identifier. Functional dependency is therefore normal, which means that there is no internal redundancy.

To correct the LIVRE table, it is broken down into two tables: an OUVRAGE structure table (determinant, determinate) and an EXEMPLAIRE structure table (determinant, residual). The determinant (ISBN) is a foreign key to the OUVRAGE table.

Now we have two standard tables, whereas before we had a non-standard table. To achieve this, we do what we call table normalization. The disadvantages of the internal redundancy have therefore disappeared, since there is no longer redundancy.

Mandatory clauses

The only mandatory part of a query is composed by the SELECT and FROM clauses. The SELECT clause associated with the FROM clause allows us to retrieve selected data from particular tables in the database.

The SELECT clause allows us to select the data we want the database to return (we can use an asterisk * to return all available data). Say you have a table named "Companies" (just like in the exercises), the clause "SELECT * FROM Companies" will allow you to see all the data contained in the Companies table.

Joining tables

Inside the FROM clause, we can declare a JOIN to join the results of two or more different tables (this will join the tables side by side). To have relevant and observable results, we use the keyword ON to specify the fields that should be common between the different tables.

The most common type of join is called an INNER JOIN, i.e. only the rows that have a correspondence in the other table will show up in the results. When writing JOIN, you will always by default use an inner join.

→ For example, say you have two tables, "Companies" and "Recommendations". The clause "SELECT * FROM Companies JOIN Recommendations ON Companies.ticker = Recommendations.symbol" gathers the two tables in one along the ticker and symbol columns. This means that the records for which the ticker and the symbol are the same will be joined in one record. Basically, you give the name of the first table (here Companies), then JOIN, then the second table (Recommendations) and then, you use the keyword ON to specify the fields that should be common between the two tables (ticker and symbol).

Another type of join is the LEFT JOIN, which shows all the records of the table that comes before the keyword and only shows the information of the rows of the second table, if they match with a row from the first table. The left join differs from the inner join, in the fact the inner join will only show the items for which the "ON" condition is met, whereas the left join will show every element of the first introduced table. The RIGHT JOIN follows the same logic as the LEFT JOIN, except that it will show every element of the table which were introduced after the right join.

→ For example: The clause "SELECT * FROM Companies LEFT JOIN Recommendations ON Companies.ticker = Recommendations.symbol" will show all elements of the table "Companies" and will add elements of the "Recommendations" table if they match one of the "Companies" table on the common field.

In sum, an INNER join will return rows and data only if there is a match between the two tables while a LEFT (or RIGHT) join will return all the rows from the left (or right) table, even if there isn't a match in the other table.

When you do not want fields to be common but just want to have every single item in one table, you can use the UNION ALL. This will allow you to "place" the tables on top of each other (this can be useful with aggregation operations).

Useful clauses

Simple filtering

Where

The WHERE clause allows us to filter out some of the rows from the tables based on a particular condition. This clause indeed checks a condition, or series of conditions, for each row. When these conditions are verified for a row, this row will show up in the results. Otherwise, it is discarded from the query.

Also, the conditions are classical but these two are different from Python conditions.

Operator Name
= Equal
<> Different from


  • You can filter by numeric values:

Example:

SELECT name
FROM students
WHERE age > 12 
  • You can also filter by text:

Example:

SELECT id
FROM students 
WHERE name = 'Marie'

Where and

You could also decide to add multiple conditions with the WHERE AND clause:
Example:

SELECT name
FROM students 
WHERE age >12 
AND age < 21 

Where or

You could decide you want one of two conditions and use the WHERE & OR clause:
Example:

SELECT name 
FROM students 
WHERE age =12 
OR age = 21 

Where in

You could decide you want sereval condition and use the WHERE IN clause:
Example:

SELECT name
FROM students 
WHERE age IN (21,25,28,30)

Where is null

If you only want to take into account non-missing values, use the IS NOT NULL clause

Example:

SELECT id
FROM students
WHERE age IS NOT NULL 

If you want to address the missing values, you will write : "WHERE age IS NULL" .

Where like

You could decide that you want to look for a pattern. You can then use the WHERE, then use the LIKE clause:

Examples:

  • If you want to look for names starting with a:
SELECT name 
FROM students
WHERE name LIKE 'a%'
  • If you want to look for names ending with a, use '%a'
  • To find any name that has "Ai" in it in any position: '%Ai%'

To have records that don't match the pattern, use the NOT LIKE clause.

Aggregation functions

In SQL the functions that are aggregates are AVG(), SUM(), MIN(), MAX(), COUNT().

  • To count all the rows in a table: SELECT COUNT(*)
  • To find the sum of all salaries: SELECT SUM(salary)
  • To find the average salary: SELECT AVG(salary)
  • To find the maximum salary: SELECT MAX(salary)
  • To find the minimum salary: SELECT MIN(salary)

By default, if you do not specify it, aggregation functions in the SELECT clause will squeeze (aggregate multiple values of) the table to 1 row, thus give one final value. Most of the time, the question only requires to squeeze the table up to a certain level, you will therefore use the GROUP BY clause.

Group by

The GROUP BY clause enables to perform aggregations by making it possible to do some operations across rows, on a certain field.

Example: You want to have the average salary of the countries in your data base:

SELECT country, AVG(salary) 
FROM database 
GROUP BY country 
Country Average salary
Belgium 2500€
France 2700€

Having

The HAVING clause is very similar to the WHERE clause. The HAVING clause enables to filter some lines based on the value of an aggregation result.

In other words, the HAVING clause can be used with aggregates functions, when the WHERE clause is not allowed to do that because it works on row's data.

Order by

The ORDER BY clause enables to sort the results of the query. For example, we can use the keyword ASC for ascending order of the results, or DESC for descending.

You could sort multiple columns at once.

SELECT age, height
FROM team 
ORDER BY age, height 

Limit

The LIMIT clause is executed at the end of the query and cuts the results. If it is followed by one positive number x, it will show the x first results. If the numbers is negative, it will show the last x results. Finally, if you give two numbers, x and y, it will show the y rows that follow row x (E.g. if you give 3 and 5 it will show the rows 4 to 8, both included).

Example:

SELECT symbol, AVG(age) as average_age
FROM Executives
GROUP BY symbol
ORDER BY average_age
LIMIT 4, 8 

Rand

The RAND clause is used to return a completely random number or a random number within a range.

Example:

SELECT FLOOR(RAND()*(b-a+1))+a

In the example, a and b are numbers that define the range. "a" is the smallest number and "b" is the largest.

Coalesce

The COALESCE clause substitutes real value for NULLs.
Example:

SELECT COALESCE(comm, 0) 
FROM emp 

Delete from

Delete from enables you to remove rows from a table. If you associate this command to a "where", it is possible to select the concerned lines that will be removed. For instance:

 DELETE from COURS WHERE [credentials >5] 

removes all the courses that have more than 5 credentials from the table. You can also specify two conditions in the WHERE clause by using AND. For instance:

 DELETE from COURS WHERE [credentials > 5] AND [schedule = 'morning']

.

Update

The "UPDATE" clause enables you to modify existing rows. For instance, with the command : "Update inscription set note=10", all the students who are enrolled will have a 10 in all the courses they follow.

Insert into

Insert into[name of the tables in which data should be added]([list of the columns for which data is supplied in the query]) Values([values to insert in the fields listed above])

Drop

The "DROP" clause allows you to drop completely a table in you database.

 DROP TABLE random_table 

Physical Structures

Database tables are stored on the computer disk. If they are large, it can take a long time to access and modify the data. The physical structures ensure good performance for reading and editing operations.

                                          Main mechanisms -> Index and Storage space

However, the user who reads or modifies data is unaware of the presence of these physical structures when formulating a query.

Index

Indexes in databases work a bit like the indexes found in encyclopedias. They make it possible not to browse through everything when you want to search for just one word. One way to do this is to create a correspondence table that associates each value in the column, or columns, with the list of numbers in the corresponding lines.

In order to create an index, we use:

CREATE INDEX 'index_name' ON 'table' 

It is also possible to create an index on a single column:

CREATE INDEX 'index_name' ON 'table' ('column1')

To create an index on two (or more) columns, you can use:

CREATE INDEX 'index_name' ON 'table' ('column1', 'column2')

Finally, to delete an index, you just have to use:

DROP 'index name'

An index will generally be defined on the columns that constitute an identifier in order to speed up access and verification before inserting a row. An index will often be defined on a frequently used selection condition and on the columns that constitute a foreign key in order to speed up access and verification before deleting a row.

Storage Space

The rows of a table are stored on a disk. The rows will be stored in a special space reserved for them called storage space. This space corresponds to a file occupying all or part of a disk, or even several disks. A storage space is characterized (among other things) by its address, its initial volume, the way it grows or shrinks according to needs, by the tables on which it holds the lines and by the technique used to store the lines.

In order to create the storage space, we use:

CREATE DBSPACE 'storage space name'

To put a table in a storage space, you use:

CREATE TABLE 'table name' (...) IN 'storage space name'

Finally, to delete a storage space, you use:

DROP 'storage space name'

SQL with python

A database, Sqlite is a library already installed with python but is very limited. Thus, for light-weighted database there is no need for an external server. However, though quite similar, it still requires a particular grammar. Here are some essentials :

Import sqlite and create a connection

 import sqlite3
connection = sqlite3.connect('name_of_your_DB') 

Object "cursor" and "execute"

In order to interact with the database and perform commands, we need to use the cursor and use its "execute".

 
#create a variable c to use the cursor
c = connection.cursor() 

# Each time you want to command something you must use this variable c and the "execute".
c.execute("INSERT INTO table1 VALUES ('value1','value2','value3', 'value4')")

#It is important to "commit" the changes made, because #once the connection closed, they will be lost.

connection.commit()

#It is important to close the connection when you're done.

c.close()

Check your understanding

Questions

Question 1: What is the relation between movies and actors?

  • Self-Join
  • One-to-Many
  • Many-to-Many
  • One-to-One

Question2 2.Which of the following keyword is not an aggregation operation?

  • SUM
  • AVG
  • MAX
  • PROD
  • COUNT

Question 3.You can create a many-to-many relation between two tables by putting a foreign key in each of the tables.

  • True
  • False

Question 4. A junction table is a table that only contains foreign keys of tables involved in a many-to-many relation.

  • True
  • False

Solutions

  • Question 1: Many-to-Many
  • Question 2: PROD
  • Question 3: False
  • Question 4: False

Link to the assistant web page

Examination of SQL

You could get a simple ERD diagram and you would have to write an SQL statement that would yield a specific resultset. Or, you could get a SQL statement and tell what it will yield, or one that includes errors that you should spot and fix.

Where to go ?

Main page Exercises- Next Session BPMN

  1. Faulkner, S., 2018. EIMIB212 - Bases De Données. UNamur. Année académique 2018-2019.