--> Skip to main content

Featured

Steps to Create a Project on GitHub

Steps to create a project on GitHub:  1.   Start a repo on GitHub 2.   Make a README.md file 3.   Open vs code – new folder – new terminal – git clone http:…. (from the repo). 4.   In terminal – cd theprojectname   à move into the project file 5.   Ls -la is for showing hidden file à not working for me ???? 6.   Make some changes to README file, in terminal git status à shows all the changes on the file 7.   To add all changes update to the folder à git add . (the dot . means all changes to be added) 8.   Add a new file index.html, in terminal à git commit -m “title of commit” -m “description of commit” 9.   Then git push origin master 10.                 ****Initial a repo in local text editor à git init 11.                 After use git add . etc, when pus...

Create a Database with Three ' Many to Many ' Relationships Using Python - Student / Class / Professor

This project is the extension of 'roster.py' introduced in "Python for Everybody" by Dr. Chuck. I've created a simpler version of this project days ago. Based on the 'Student to Class' structure, I extended my query to add a new table of 'Professor', because in reality, each class should have one or two professors to teach, and each professor may teach one or several classes. So there are now several pairs of 'many to many' relationships. 

In the end, there should be a student in a specific class taught by a professor. I did this project followed the flow of my design of the database.

This project needs two files as the data source. One is 'roster_data_sample.json' downloaded from Dr. Chuck's website; another is 'professor.csv' file I created using an online dataset generating tool.

STEP ONE: I created 5 tables in a database. Student and Class tables are connected by SandC joining table, Class and Professor tables are interconnected by CandP table. In Student, Class and Professor tables, I used 'UNIQUE' constraint for all the names/titles so that no duplicated name is allowed.

import sqlite3

# create sqlite database
con = sqlite3.connect('roster.sqlite')
cursor = con.cursor()

# create 5 tables
cursor.executescript('''
DROP TABLE IF EXISTS Student;
DROP TABLE IF EXISTS Class;
DROP TABLE IF EXISTS Professor;
DROP TABLE IF EXISTS SandC;
DROP TABLE IF EXISTS CandP;

CREATE TABLE Student (
    id integer primary key autoincrement not null unique,
    name text unique
);
CREATE TABLE Class (
    id integer primary key autoincrement not null unique,
    courseTitle text unique
);
CREATE TABLE Professor (
    id integer primary key autoincrement not null unique,
    name text unique
);
CREATE TABLE SandC (
    sid integer,
    cid integer,
    primary key (sid, cid)
);
CREATE TABLE CandP (
    cid integer,
    pid integer,
    primary key (cid, pid)
);
''')

Now, let's read json file and collect all data from it.

import json
# read from json file
data = open('roster_data_sample.json').read() # type is str
# conver str to json
jsondata = json.loads(data)

Instead of populating all tables at once, which will be really hard, I decided to add data to Class table first.

# insert all courses into Class table
for entry in jsondata:
    courseName = entry[1]
    cursor.execute(
        '''INSERT OR IGNORE INTO Class (courseTitle) values (?)''',(courseName,)
    )

Open a DB Browser, I can inspect the newly created database with all five tables, and the Class table has been populated successfully.

Next to work on is the Professor table. I need to read the csv file to retrieve each professor's full name. The csv file has two columns: first name and last name, I prefer to use full name as the needed data for Professor table's name column. 

It warrants mentioning that when use csv.reader() to read a csv file, the returned value is an iterator, for this specific project, I need to have the returned result from each line to be a list, so that I can refer to its index for first and last name respectively. This is actually very easy to solve thanks to the iterator to list conversion by type constructor :   list(returned_iterator).
 
Thereafter I inserted full names constructed from the csv file into the Professor table.

# read professor.csv file
import csv

pFile = open('professor.csv')
csvFile = csv.reader(pFile, delimiter=',') #  iterator, can't use indexing
csvList = list(csvFile)
for row in csvList:
    # print(row) # output : ['Alfred', 'Clark'] ['Amelia', 'Richards'] ['Haris', 'Ross']
    fullname = f'{row[0]} {row[1]}'
    # print(name)
    cursor.execute(
        'INSERT INTO Professor (name) values (?);',(fullname,)
    )

Here comes the time to populate the Student Table. Read json file, retrieve the student's name, then store all unique names into the database. For 'unique' constraint, 'INSERT OR IGNORE' is used.

# populate Student table from the json file
# print(jsondata) #[['Charley', 'si110', 1], ['Mea', 'si110', 0], ...]
for entry in jsondata:
    studentName = entry[0]
    # use IGNORE to warrant UNIQUE constraint
    cursor.execute('''
        INSERT OR IGNORE INTO Student (name) values (?)        
    ''',(studentName,))

Also, for generating the 'sid' and 'cid' used in SandC table, I need to loop through json data again. It is a two-steps process. First to get the name of a student from json file, go to the Student table to find its corresponding student id as 'sid', then fetch the course name of this student from the json file, get its respective class id as 'cid' from the Class table. These two ids are the data we wanted for populating the SandC table.

# populate SandC table
for entry in jsondata:
    studentName = entry[0]
    courseName = entry[1]

    # get sid from Student table
    cursor.execute('''
    SELECT id FROM Student WHERE name = (?)
    ''',(studentName,))
    studentId = cursor.fetchone()[0]

    # get cid from Class table
    cursor.execute('''
    SELECT id FROM Class WHERE courseTitle = (?)
    ''', (courseName,))
    courseID = cursor.fetchone()[0]

    # insert pair of ids into the table
    cursor.execute('''
    INSERT INTO SandC (sid, cid) values (?,?)
    ''',(studentId, courseID))

The last table in this database is CandP table, it's the joining table for Class and Professor tables. I imagine that some classes have only one instructor, some may have up to 3 instructors for a bigger enrollment. For a cleaner code, I first made a helper function to randomly assign 1 to 3 instructors to each course stored in the Class table.

Then I noticed something a bit wired. Due to the randomness, several professors don't have any classes assigned yet. To make the project more realistic, I picked out all the professors who have no classes to teach yet, assigned each of them a random class to teach.

One coding trick here is to use 'set'  instead of 'list' for getting the difference between two lists.  Python 'list' can't use minus '-' operator, however 'set' can. I used no_class_pids = list(set(pro_ids)-set(pids)) line of code to find the difference between two lists.

# randomly create CandP table by assign 1-3 professors to each course randomly

import  random

# helper function for assigning professors to this course
def get_professor_ids():
    # how many professors for this course
    num = random.randint(1,3) # output randomly 1,2,3
    
    cursor.execute('select id from Professor')
    professorIDs = [idTuple[0] for idTuple in cursor.fetchall()]
    p_ids = random.choices(professorIDs,k=num)
    return p_ids

# collect all course ids into a list
cursor.execute('SELECT id FROM Class')
courseIDs = [courseID[0] for courseID in cursor.fetchall()]

# assign a random number of professors to each course
for courseID in courseIDs:
    professors_ids = get_professor_ids()
    for p_id in professors_ids:
        cursor.execute('INSERT OR IGNORE INTO CandP (cid, pid) values (?,?)', (courseID, p_id))


# use set() to get list of professors without class to teach, assign one class to him/her
# 1. get all professors' ids into a list from CandP then from Professor table
cursor.execute('SELECT pid FROM CandP;')
pids = [id[0] for id in cursor.fetchall()]

cursor.execute('SELECT id FROM Professor')
pro_ids = [id[0] for id in cursor.fetchall()]

# compare the two list to find difference using set()'s '-' operator
no_class_pids = list(set(pro_ids)-set(pids))
# print(no_class_pids) # [2, 4, 7, 10, 12, 16, 18, 19]

# keep on adding new rows into CandP table with a professor and a randomly assigned class id
for pid in no_class_pids:
    courseId = random.choice(courseIDs)
    cursor.execute('INSERT INTO CandP (cid, pid) values (?,?)', (courseId, pid))

The last step, I need to create a joining table to connect Student, Class and Professor tables, so that each student can go to a unique class that is taught by a unique professor. I call this table SCP, to be simple, that has 3 columns including sid, cid and pid for student id, class id and professor id respectively. I didn't plan out this table at the beginning of the project, so I have to add it at near the end.

The process of creating SCP table is a bit complex. I copied SandC table first, add a new column called 'pid' for storing a professor's id, then using CandP table as a reference, I picked one random professor from all professors who teach the class as the only professor to teach this student.

Again, for picking a random professor id out of professors for a given class name, I made a helper function 'get_one_professor(courseid)'. Since the column of 'pid' in the SCP table has been created, I used "UPDATE" statement for inserting the 'pid' created into each row.

# create a table SCP to connect all tables
# copy SandC, add one column 'pid',
cursor.execute('create table SCP AS select * from SandC')
cursor.execute('alter table SCP add pid;')


# choose random one id to insert into the cell of 'pids' using CandP 

# STEP 1: A helper function to randomly choose a pid from cid
def get_one_professor(courseid):
    cursor.execute('select pid from CandP Where cid=?',(courseid,))
    pid = random.choice([id[0] for id in cursor.fetchall()])
    return pid

# STEP 2: get cid and sid of each row, update SCP with the newly assigned pid
cursor.execute('select cid,sid from SCP')
print(cursor.fetchall())
for theCID, theSID in cursor.fetchall():
    pid = get_one_professor(theCID)
    cursor.execute('''
    UPDATE SCP SET pid = (?)
    WHERE cid = (?) AND sid = (?)
    ''', (pid, theCID, theSID))

The joining table SCP should have a primary key to guarantee a unique combination of (sid, cid, pid), It might be smart to add the constraint when creating this table, but now it's too late. For SQLite, ALTER TABLE command has limited performance, in order to add a primary key into SCP table, I need a multi-steps approach to achieve this at this stage.
1. rename the SCP to OLD_SCP
2. Create a SCP table with primary key defined
3. Insert into SCP all data from OLD_SCP
4. Remove OLD_SCP table from database

# add Primary key to SCP
cursor.executescript(
    '''
    ALTER TABLE SCP RENAME TO OLD_SCP;
    CREATE TABLE SCP (
        sid integer, 
        cid integer, 
        pid integer,
        Primary key (sid, cid, pid));
    INSERT INTO SCP SELECT * FROM OLD_SCP; 
    DROP TABLE OLD_SCP;
    '''
)
So far so good. 

The database with 6 tables are now created. The SCP table has a constraint of primary key with 3 IDs combined. This will guarantee that a unique 'student-class-professor' combination can be generated by this database.

Now, the last step is to join the tables together. This is the happy moment to see all data source are put together for generating a neatly displayed dataset. 

# join tables to show student, class and professor
cursor.execute('''
SELECT Student.name, Class.courseTitle, Professor.name 
From Student JOIN Class JOIN Professor JOIN SCP
ON Student.id = SCP.sid
AND Class.id = SCP.cid
AND Professor.id = SCP.pid
ORDER By Student.name, Class.id, Professor.name;
''')

print(cursor.fetchall())

con.commit()
con.close()

To visualize the resulting table, I used DB Browser:


During the process of this project, remember to always use commit() to save the progress and use close() to close the SQLite connection after each SQLite command.

 
The reward of doing coding projects is the feeling of satisfaction at the end when you finally put all the puzzle pieces together, and becoming a better coder after clearing all the obstacles in your way. 

Happy coding!

For source code, please visit GitHub page.

Popular Posts