I use python sqlite3 module to create a table 'count_email', this table has 'email' and 'count' columns, I retrieve email address from a text file and count how many times a same email has appeared in the file, I store that integer as the value in the 'count' column of the same row.
First, I need a text file with tons of email addresses saved to it. I use an online fake email generator to create 10 random emails, then randomly choose some of the emails to copy and paste several times again so that a list of 50+ emails has been created with only 10 unique email addresses. The emails file looks like this
From earlene57@wiza.com
From donnelly.treva@gmail.com
From shad.hyatt@hotmail.com
From nienow.pascale@koss.info
note:
From dgutmann@boehm.net
From earlene57@wiza.com
From donnelly.treva@gmail.com
From shad.hyatt@hotmail.com
From wisoky.lane@gmail.com
Read this file to get all email address
Create a database to store emails and their counts. For doing so, import sqlite3 package, build connection, , then create a cursor. If the table has been created before, drop it each time when run this program, then create a new table with two columns mentioned above. Check out the code in details to see how to INSERT a new row when no such email has been found in the database, also how to UPDATE the database with an increment count using placeholder '?'.
Once the database has been created, we can do sorting easily using SELECT command with ORDER BY. Check out the whole script below.
import sqlite3
# 1: CREATE/CONNECT a database -> create a table
# create a db
con = sqlite3.connect('emails.db')
cursor = con.cursor()
# starting 'count_email' table fresh each time run this program
cursor.execute('DROP TABLE IF EXISTS count_email')
# create an emtpy table of two columns
cursor.execute('''
CREATE TABLE count_email (email TEXT, count INTEGER)
''')
# 2.
# open email text file get email address
filehandle = open('emaildb.txt')
for ln in filehandle:
if not ln.startswith('From'): continue
email = ln.split()[1]
# TO move the cursor to the right row using placeholder
cursor.execute('SELECT count FROM count_email WHERE email = ?', (email, ) )
# fetch one row
row = cursor.fetchone()
# if no such row existing in the db (None), insert this email into db and put a 1 as count value
if row is None:
cursor.execute('''INSERT INTO count_email (email, count) VALUES (?, 1)''', (email, ))
# if this row has been created before in the db, then update the count by incrementing by 1
else:
cursor.execute('''UPDATE count_email SET count = count + 1 WHERE email = ?''', (email, ))
# commit the changes to the disk
con.commit()
#3 display top 5 emails in the db in descending order
# move cursor to all that meet the criteria
results =cursor.execute('SELECT email, count FROM count_email ORDER BY count DESC LIMIT 5')
all = results.fetchall()
for row in all:
print(f'email: {row[0]} count:{ row[1]}')
con.close()