Music Information Retrieval
Essay by review • December 30, 2010 • Research Paper • 1,555 Words (7 Pages) • 1,359 Views
Abstract:
This paper describes a retrieval method for a significant amount of music tracks. It takes a melody sung or a text input as a search clue which is sent over the internet to retrieve the song from the database. This system uses a variety of procedures to support multiple search modes. The thorough indexing method allows for rapid recovery of the song. Integration of these components within a single architecture shall improve performance and functionality.
Overview of Objectives:
The aim of this paper is to utilise and develop existing techniques for Music Information Retrieval. The system shall be able to retrieve a song from either a text query or a melody query. This melody can be sung, a riff or humming. Browsing the database shall also be available to the user. The database shall reside on a server. For the purposes of this paper, it shall contain 10,000 songs. However, taking into consideration all the features of this system, I would expect it to be able to process a database 1000% larger.
Text
Query
Search Engines
Text Matching
Theme Matching
Phonetic Matching
Markov Representation
Music
Archive
User
Interface
Aural
Query
Figure 1 - Name That Tune Architecture
Functional Description of the System:
The system shall have a graphical interface containing 2 search engines:
1. Text Input - User may enter the title, line of song, artist
2. Aural Input - User may sing, hum, play a riff
The user would choose their preferable search method.
The aural input is processed through a Speech Recognition System. This comprises of 2 components:
1. An acoustic model - using statistical modelling using hidden Markov models. 2. A language model - model of the expected word sequences. This vocabulary will contain 80,000 words.
The songs shall be contained in a database with a corresponding value recorded in an index. This would contain 4 sections:
1. Traditional Text Matching
2. Theme Matching
3. Phonetic Matching
4. Markov Representation
There are other techniques, but after researching I feel that these are the most productive.
Sections 1 and 2 deal with the Text Input and 3 and 4 with the Aural Input. The file structure shall be that of the inverted file. This stores term locations. It shall also include term proximity. The most effective method to compute term locations is to use a hashing algorithm which will be employed in this system.
As all lyrics shall be recorded with each song, the Text Matching search shall be the most accurate. However, many users may not input the correct words and incorrect songs would be displayed.
This method shall use general information retrieval techniques.
1. Remove all stop words as these do not aid retrieval. Eg. a, the, that, so etc. Research has shown that memory requirements for document representations can often be reduced by more than 50%.
2. This system shall use the Porter stemming algorithm to remove suffixes that inhibit matching. This would require the creation of a dictionary of suffixes. This uses iterative rules of the form
a. remove plurals, -ED, ING .
b. terminal Y - > CARRI, CARRY - > CARRI
c. map double suffices to single, - ISATION
d. deal with -IC, -FULL, -NESS
e. remove -E if word > 2
The option of selecting a theme would be available. The MUSART application utilises this feature. [6] This will be in the form of a drop down menu. To improve this I would add the genre to the classification. Genres to divide the corpus: Rock, Jazz, Country and Folk, Classical, Dance and Rap. If neither achieved successful results or are not selected, the phonetic search would then be deployed.
The user would make an aural input which would be phonetically deconstructed yielding results from the Phonetic Index. This index contains a summary of each song's phonetic decomposition. English has about 45 distinct phones. Our phonetic dictionary would contain more than 200,000 entries. [2] Recognition at the phone level means we do not have to explicitly build models of all words, new words can be added to the vocabulary easily if their phonetic structure is known. This shall make the system adaptable to out-of-vocabulary words.
Searching for a phonetic string has a greater success rate when the text search has yielded no correct results. This is discussed in Witbrock and Hauptmann's paper although they are dealing with incorrectly transcribed documents, the same premise applies in this case. [3]
Hidden Markov Model, H.M.M., is a very powerful tool to statistically model a process that varies in time. It can be seen as a doubly embedded stochastic process with a process that is not observable (hidden process) and can only be observed through another stochastic process (observable process) that produces the time set of
observations. A HMM can be fully specified by
1. N, the number of states in the model
2. M, the number of distinct observation symbols per state
3. A = {aij}, the state transition probability distribution
4. { ( ) } j B = b k , the observation symbol probability distribution
5.
...
...