Welcome to the SIMSSA Database!
The SIMSSA Database is designed as a repository and discovery tool for symbolic music files (e.g., MEI, Kern, MusicXML, and MIDI). Users can browse existing files or upload their own. The current site is a prototype that is still under development. It forms part of the SIMSSA Project, a SSHRC Partnership Grant. The SIMSSA Database is the successor to an older database created as part of Julie Cumming's Digging into Data grant, which gathered symbolic music files in one place for computer-aided counterpoint analysis. The new database improves on it in several areas, explained below.
Modelling Bibliographic Metadata for Music
Our data model was first presented in 2017 as “A database model for computational music research” (McKay et al., 2017). The model is fairly complex, designed to handle all the different kinds of music items that we might need to catalog, from prints to manuscripts and, in the future, even recordings. Since the original presentation, it has evolved in response to our development and testing process, but the basic premise of extensive support for all the different forms of music items remains key to our model. We have drawn on library standards for bibliographic metadata such as the IFLA-LRM, as well as RISM's Muscat. We also use controlled vocabularies for some fields, so the user is presented with suggested terms; for example, we use the Library of Congress Medium of Performance thesaurus for voices and instruments. We also pull in information about composers from VIAF. Users can override the suggestions and add a new entry; we want to balance data quality with flexibility. We are working towards adding URIs to our data to structure our work in a linked-data-friendly way.
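The suggest-but-allow-override behaviour described above can be sketched in a few lines. This is a hypothetical illustration, not the SIMSSA DB's actual schema: the class and field names are invented, and the sample terms merely stand in for entries from a thesaurus like the LC Medium of Performance.

```python
from dataclasses import dataclass

# Hypothetical sketch of a controlled-vocabulary field: the user is offered
# suggested terms (e.g. from a thesaurus such as the LC Medium of
# Performance), but may override with a new entry, balancing data quality
# with flexibility. Names here are illustrative, not the actual schema.

@dataclass
class VocabularyField:
    suggestions: list            # controlled terms offered to the user
    allow_override: bool = True  # users may enter a term not in the list

    def resolve(self, entry: str) -> str:
        """Normalize to a controlled term when one matches; otherwise
        keep the user's own entry if overrides are allowed."""
        for term in self.suggestions:
            if term.lower() == entry.lower():
                return term
        if self.allow_override:
            return entry
        raise ValueError(f"{entry!r} is not in the controlled vocabulary")

medium = VocabularyField(suggestions=["soprano voice", "violin", "piano"])
print(medium.resolve("Violin"))          # normalized to the controlled term
print(medium.resolve("viola da gamba"))  # override kept as entered
```

Normalizing matched entries to the controlled spelling keeps the data consistent, while the override path preserves the flexibility to record items the vocabulary does not yet cover.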
The database allows users to enter an immediate source, such as a URL or book title, as well as one “parent” source. Modelling more complex hierarchies, and relating sources to other sources, are future goals. Another part of provenance is keeping track of who did what, and when. We have entities for tracking how different file types are encoded, as well as validation records that document checks on file quality and can include details about workflow or reliability (for more on the methodology for building a good corpus, see Cumming et al., 2018). We also make it possible to record information about the software used to encode symbolic files (e.g., a MusicXML file exported from MuseScore).
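To make the relationship between these provenance entities concrete, here is a minimal sketch under stated assumptions: every class and field name below is hypothetical, chosen only to mirror the prose (immediate and parent source, encoding with its software, and an optional validation record), not the database's real entities.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical provenance entities mirroring the description above.
# All names are illustrative, not the SIMSSA DB's actual schema.

@dataclass
class Encoding:
    file_format: str         # e.g. "MusicXML"
    software: Optional[str]  # software used to encode, e.g. "MuseScore 3.6"
    encoded_by: str          # who did the encoding

@dataclass
class Validation:
    validated_by: str        # who checked the file
    notes: str               # workflow or reliability details

@dataclass
class SymbolicFile:
    source: str                        # the immediate source (URL, book title, ...)
    parent_source: Optional[str]       # one "parent" source
    encoding: Encoding
    validation: Optional[Validation] = None  # may not have been checked yet

f = SymbolicFile(
    source="https://example.org/score.xml",
    parent_source="Example Print, 1589",
    encoding=Encoding("MusicXML", "MuseScore 3.6", "A. Encoder"),
    validation=Validation("B. Checker", "spot-checked against the facsimile"),
)
print(f.encoding.software)
```

Keeping validation optional reflects the workflow: a file can enter the database first and have its quality check recorded later, by a different person.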
Searching Musical Content
The SIMSSA DB uses Cory McKay's jSymbolic software to allow content-based search of all the files in the database. When a piece is entered into the database, we automatically use jSymbolic to extract its musical features. This allows users to search for musical content as defined by jSymbolic features, such as range, the presence of certain intervals, or other patterns. Users can also combine metadata and content search.
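A combined metadata-and-content query might work roughly as follows. This is a sketch under assumptions: each piece is represented as a dictionary carrying jSymbolic-style feature values, and the feature names and numbers below are illustrative, not jSymbolic's exact identifiers or outputs.

```python
# Hypothetical sketch of combining a metadata filter (composer) with a
# content filter over extracted feature values. Feature names and values
# are illustrative stand-ins for jSymbolic's output.

pieces = [
    {"title": "Motet A", "composer": "Palestrina",
     "features": {"Range": 28.0, "Melodic Thirds": 0.31}},
    {"title": "Chanson B", "composer": "Josquin",
     "features": {"Range": 17.0, "Melodic Thirds": 0.22}},
]

def search(pieces, composer=None, feature_minimums=None):
    """Return titles of pieces matching a composer (metadata) and
    minimum values for named features (content)."""
    matches = []
    for piece in pieces:
        if composer is not None and piece["composer"] != composer:
            continue  # metadata filter failed
        if all(piece["features"].get(name, 0.0) >= minimum
               for name, minimum in (feature_minimums or {}).items()):
            matches.append(piece["title"])
    return matches

print(search(pieces, feature_minimums={"Range": 20.0}))
print(search(pieces, composer="Josquin", feature_minimums={"Range": 10.0}))
```

Because the features are extracted once at upload time, queries like these only compare stored numbers; no score needs to be re-analyzed at search time.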
It takes a lot of time to build a research corpus and ensure that everything is high quality, so we want to be sure we get to keep it, both for our own future use and for others who may wish to reproduce our results or conduct their own studies. While datasets are being built and edited, they live on GitHub, which lets us track changes and collaborate easily, and keeps work from getting lost in Google Drive. The SIMSSA DB is great for storing a finished corpus and its extracted features, making it discoverable, and conducting searches, but for citation purposes we need something even more robust. We are working towards using Zenodo for “release quality” datasets that can be cited in papers. Zenodo is an open-access repository run by CERN that allows us to generate a DOI for a stable, citable dataset.
Developing the Database on GitHub
You can view our progress on our GitHub repository here, and see our developer documentation here.
If you have any suggestions or feedback about the design of the SIMSSA DB, or would like to ask about the latest updates or the date of the official release, contact us on Twitter: @simssaproject.