MusicXML is an XML-based file format for representing western musical notation. I found some random musicXML files online, so I decided to create a database to store part of the music information using Python.
The musicXML file looks like this:
NOTE: for embedding an XML file in an HTML page, such as the doc below, use 'textarea' element, which will preserve both the formatting and the angle brackets of the file.
After examine this XML file, I notice the 'part-list' tag in this XML file has two different children tags rotating with each other. Use some Python code, I can print out the children tags of 'part-list'.
import xml.etree.ElementTree as ET
# find the tree and its root
tree = ET.parse('sqlite3_practice\\xmlsamples\\ActorPreludeSample.musicxml')
root = tree.getroot()
print(root.tag) # output: score-partwise
# print out all children tags of the root element
children = ''
for child in root:
children += child.tag + ' '
print(children)
What I am interested in is the 'score-part' tags. My goal is to retrieve all the sub-tags of 'score-part' tag and register each of the sub-tags into a database. Because each child tag may further have its own child tag (children tags) or have no child tag at all, what I need to do is to track down the the deepest level sub-tags to get the tag names and their text content.
For achieving this, I created a method called 'check()' to check on each tag element recursively till I can get the text value from the deepest sub-tags.
In order to use this check() function to get all needed sub-tags, I use root.findall() of ElementTree method to collect all 'score-part' tags under 'part-list' tag, which is a child tag of root tag. Now I can use check() method to go through each 'score-part' tag for creating a dictionary to store all needed {key:value} pairs.
This is how I coded:
# collect all the deepest level tags into a set, to remove duplicate
# this is for creating the table columns later on
tags = set()
texts = []
# helper method recursively check each sub-tag till find .text
def check(element):
# I used the following code to locate some bugs inside the xml file
if element.text is None:
print(f'OOPS! {element.tag} has None text')
return
# when tag has sub-tags, the 'text' is a long space string
# such as " ", so use strip() to remove all spaces first
if element.text.strip() == "":
# print(f 'GO DOWN {element.tag} for ')
for kid in element:
check(kid)
else:
# print(f'{element.tag}: {element.text}')
tag = element.tag
text = element.text
tags.add(tag)
texts.append(text)
# get 'score-part' tags from XML file
content = root.findall('part-list/score-part') # return a list of 'score-part' tag
# print(content)
for entry in content:
check(entry)
# check each tag group, if a word is repeated often, create a new table for that category if necessary
print(texts)
# ouptput is not repeated often ['Piccolo', 'Picc.', 'score', 'Picc. (V2k)', '1', '73', '80', '0', 'Flutes', 'Fl.', 'score', 'Fl. (V2k)', 'wind.flutes.flute', '2', '74',...] so no need to create another relational table
print(tags)
# output: {'pan', 'part-abbreviation', 'volume', 'midi-channel', 'instrument-sound', 'group', 'part-name', 'midi-program', 'instrument-name'}
Now, with all those unique tags retrieved from the XML file, we can use these tag names as columns' names for the table in a database.
import sqlite3
# creat a table using the tags as column names:
cursor.executescript('''
DROP TABLE IF EXISTS AP;
CREATE TABLE AP (
id integer primary key autoincrement not null unique,
volume text,
pid text,
"group" text,
"part-abbreviation" text,
"instrument-sound" text,
"midi-channel" text,
"part-name" text,
"midi-program" text,
"instrument-name" text);''')
Next step in my plan is to add text of each tag into the table created. Under each "score-part" tag, there are different subtags, I plan to input all tags' text values into the table as one row, for doing this, I need a dictionary to be generated from each "score-part" tag, where keys are sub-tags' names, values are the texts. I therefore created a new recursively function 'get_variables()' for achieving this. A dictionary can be returned from each 'score-part' tag using this function. I can then use this dictionary to input all corresponding values into the right cells in the table. Let's check out this function first:
# helper method recursively check each sub-tag till find .text
def get_variables(element, dic):
# get pid from the attibute of Score-part tag only once by check dic length
if len(dic) == 0:
pid = element.attrib
dic['a'] = pid.get('id')
# I used the following code to locate some bugs inside the xml file
if element.text is None:
print(f'OOPS! {element.tag} has None text')
# when tag has sub-tags, the 'text' is a long space string
# such as " ", so use strip() to remove all spaces first
elif element.text.strip() == "":
# print(f'GO DOWN {element.tag} for ')
for kid in element:
get_variables(kid,dic)
else:
# print(f' {element.tag}: {element.text}')
tag = element.tag
text = element.text
dic[tag] = text
return
# print(f'before final return -> {dic}')
return dic
# print out to test the dic results
for entry in content[:2]:
dic = get_variables(entry, dic={})
print(dic)
This 'get_variables()' function is very similar to the 'check()' function above. However, I stumbled a little bit on making this work initially. The difference is that the second function needs to return a dictionary from a recursive method.
The right logic should be like this:
Only 'return' after you get the final result, the final result could be inside 'if' or 'else' block, it's irrelevent. Using 'return dic' to return the finalized dictionary.
The last but not the least is to populate the table with data stored in each dictionary. I use 'dic.get()' to make sure even if no specific value is available for some tag, a default value can be assigned.
# use all tags to create a table
for entry in content:
# each entry needs a brand new empty dic
dic = get_variables(entry, dic={})
# get all variables
pid = dic.get("a", "0")
part_abrv = dic.get('part-abbreviation', "")
group = dic.get('group', "")
volume = dic.get('volume', "")
midi_channel = dic.get('midi-channel', "")
instrument_sound = dic.get('instrument-sound', "")
part_name = dic.get('part-name', "")
midi_program = dic.get('midi-program', "")
instrument_name = dic.get('instrument-name', "")
# add into table
cursor.execute('''INSERT INTO AP
("pid", "part-abbreviation", volume, "group","midi-channel", "instrument-sound" , "part-name" , "midi-program", "instrument-name")
VALUES (?,?,?,?,?,?,?,?,?);''', (pid, part_abrv, volume, group,midi_channel, instrument_sound, part_name, midi_program, instrument_name) )
con.commit()
# close the connection
con.close()
Here is the table 'AP' created.