Data Dialogue: Audio Cover Songs: Analysis And Synthesis
As humans, we take for granted the ability to recognize songs that are ``similar,'' but this task is challenging to automate algorithmically due to the possible transformations between two different versions of the same song (e.g., tempo changes, instrument changes, mixing differences, added or deleted sections). With the explosion of music data, the automatic audio cover song identification (CSI) problem is more important than ever. Most CSI techniques focus on pitch-based features such as chroma, since pitch is preserved even under extreme instrument substitutions, but these techniques fail for songs in which pitch does not carry the dominant musical expression, such as certain hip hop songs. In our work, we show that it is (surprisingly) possible to use non-pitched features, such as MFCCs, for CSI by computing their self-similarity in small blocks. Furthermore, we show a substantial improvement over pitch-based and non-pitch-based features alone by using an unsupervised ``similarity network fusion'' technique to fuse our features with more traditional pitch-based features upstream of the final comparison. We report state-of-the-art results on several datasets, including ``Covers 1000,'' a new dataset we introduce.