Data Set

The data set was composed of samples from MedleyDB (R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam and J. P. Bello).

For ease of processing, a small JSON file was created that lists each file's location and its category.
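
As a rough illustration of what such an index might look like, the sketch below parses a small JSON list and checks that each entry has a location and a category. The key names "path" and "category", the file paths, and the load_index helper are all assumptions made for this example; the layout of the real file is not described here.

```python
import json

# Hypothetical index contents: key names and paths are illustrative only.
INDEX_TEXT = """
[
  {"path": "audio/track_01.wav", "category": "vocal"},
  {"path": "audio/track_02.wav", "category": "guitar"}
]
"""


def load_index(text):
    """Parse the index and verify each entry has the two assumed keys."""
    entries = json.loads(text)
    for entry in entries:
        if "path" not in entry or "category" not in entry:
            raise ValueError(f"incomplete index entry: {entry!r}")
    return entries


for entry in load_index(INDEX_TEXT):
    print(entry["path"], "->", entry["category"])
```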

Training

This neural network used a more aggressive form of windowing than the previous one.

Rather than read in a single sample from each file, the software reads in the entire file, splitting it into overlapping windows of variable size.
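
The windowing step could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function name split_into_windows and the parameters window_size and hop_size are assumptions, and a trailing remainder shorter than one window is simply dropped here.

```python
def split_into_windows(samples, window_size, hop_size):
    """Split a sequence of samples into overlapping windows.

    Consecutive windows start hop_size samples apart, so they overlap
    by window_size - hop_size samples whenever hop_size < window_size.
    A trailing remainder shorter than window_size is dropped.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    last_start = len(samples) - window_size
    return [
        samples[start:start + window_size]
        for start in range(0, last_start + 1, hop_size)
    ]


# Ten samples, windows of four samples, each starting two samples later.
windows = split_into_windows(list(range(10)), window_size=4, hop_size=2)
print(windows)
```

Because the windows are produced in file order, the index of a window maps directly back to a position in the song.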

Once divided this way, an entire file can be fed through the network in sequential order, yielding a timeline of the song that indicates where solos occur.

Analysis

After feeding an entire file through the network, an array of softmax results was created: [[0.01, 0.02, 0.90, 0.07], [0.03, 0.02, 0.89, 0.06], ...]

This was then fed through a function that converted it from a flat array with one entry per window into a more manageable array of ranges, formatted as:

[{
    "start": 10,
    "stop": 15,
    "category": "vocal"
}, ...]
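
One way such a conversion could work is sketched below: take the most probable category for each window, then merge runs of consecutive windows that share a category. The function name softmax_to_ranges and the category list are assumptions (only "vocal" appears above), and "start" and "stop" are treated here as window indices with an exclusive stop, which may differ from the units used in the actual output.

```python
def softmax_to_ranges(softmax_rows, categories):
    """Collapse per-window softmax rows into ranges of equal category.

    Each row is assigned the category with the highest probability.
    Consecutive windows with the same category are merged into one
    range; "stop" is exclusive and counted in window indices.
    """
    ranges = []
    for index, row in enumerate(softmax_rows):
        best = max(range(len(row)), key=lambda i: row[i])
        label = categories[best]
        if ranges and ranges[-1]["category"] == label:
            # Same category as the previous window: extend the range.
            ranges[-1]["stop"] = index + 1
        else:
            ranges.append({"start": index, "stop": index + 1, "category": label})
    return ranges


# Hypothetical category order; only "vocal" comes from the text above.
CATEGORIES = ["drums", "guitar", "vocal", "other"]
rows = [
    [0.01, 0.02, 0.90, 0.07],
    [0.03, 0.02, 0.89, 0.06],
    [0.80, 0.10, 0.05, 0.05],
]
print(softmax_to_ranges(rows, CATEGORIES))
```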