1
|
- Ye Wang, Min Yen Kan, Tin Lay Nwe, Arun Shenoy, Jun Yin
|
2
|
- Motivation
- Is singing voice transcription really necessary?
- Speech recognizers cannot be directly deployed
- Availability of music lyrics on the internet
|
3
|
|
4
|
- Bryan Adams – Back to You
|
5
|
|
6
|
|
7
|
|
8
|
|
9
|
|
10
|
|
11
|
|
12
|
|
13
|
- Chorus sections are detected by their high level of repetition.
- Accounts for phoneme-, word-, and line-level repetition (the line-level case is sketched below).
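A minimal sketch of the line-level case only, assuming a simple normalise-and-count approach; the normalisation rule and the min_repeats threshold are illustrative assumptions, not the system's actual detector or parameters.

```python
from collections import Counter

def likely_chorus_lines(lyric_lines, min_repeats=2):
    """Flag lyric lines that repeat often enough to be chorus candidates."""
    def normalise(line):
        # Lower-case and strip punctuation so near-identical lines match.
        return "".join(ch for ch in line.lower()
                       if ch.isalnum() or ch.isspace()).strip()

    counts = Counter(normalise(line) for line in lyric_lines if line.strip())
    return [line for line in lyric_lines
            if line.strip() and counts[normalise(line)] >= min_repeats]

# Toy usage: the repeated line is flagged as a chorus candidate.
lyrics = ["verse line one", "hook line we repeat",
          "verse line two", "hook line we repeat"]
print(likely_chorus_lines(lyrics))
```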
|
14
|
|
15
|
|
16
|
- Observation: gaps between sections are shorter and more stable than the sections themselves
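A small illustration of how this observation could be quantified from annotated section boundaries: the inter-section gaps come out both shorter and less variable than the sections. The boundary times below are made-up example values, not data from the system.

```python
import statistics

# Hypothetical (start, end) times in seconds for annotated sections.
sections = [(10.0, 32.0), (36.0, 60.0), (63.5, 88.0), (91.0, 118.0)]

section_lengths = [end - start for start, end in sections]
gap_lengths = [nxt_start - end
               for (_, end), (nxt_start, _) in zip(sections, sections[1:])]

# Gaps are shorter and show less spread than the sections themselves.
print("sections: mean=%.1fs stdev=%.1fs" %
      (statistics.mean(section_lengths), statistics.stdev(section_lengths)))
print("gaps:     mean=%.1fs stdev=%.1fs" %
      (statistics.mean(gap_lengths), statistics.stdev(gap_lengths)))
```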
|
17
|
|
18
|
|
19
|
|
20
|
|
21
|
- Starting-point calculation is more difficult than duration estimation
|
22
|
- Decreasing order of criticality:
|
23
|
- Line-level alignment of text and musical audio
- Text is crucial for duration estimation (see the sketch after this list)
- Rhythm detection can inform downstream components
- Accuracy of chorus detection is vital
- The vocal detection model uses a training-based approach
- For real-time performance, alternative vocal detection models need to be explored
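As a rough sketch of why the text helps with duration estimation: a lyric line's sung duration can be approximated from its syllable (or phoneme) count times an average per-unit length. The crude vowel-group syllable counter and the 0.3 s average below are simplifying assumptions for illustration, not the system's trained duration models.

```python
def estimate_line_duration(line, avg_syllable_sec=0.3):
    """Very rough estimate: count vowel groups as syllables and multiply
    by an assumed average sung-syllable length in seconds."""
    vowels = "aeiouy"
    syllables = 0
    prev_was_vowel = False
    for ch in line.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_was_vowel:
            syllables += 1
        prev_was_vowel = is_vowel
    return max(syllables, 1) * avg_syllable_sec

print(estimate_line_duration("So here I am, back to you"))  # ~2.4 s under these assumptions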
|
24
|
- GENERAL
- Limitation – 4/4 meter, V1-C1-V2-C2-B-O structure (verse-chorus-verse-chorus-bridge-outro)
- Future Work – alternate meters and song structures
- AUDIO
- Limitation – is the MM-HMM the optimal classifier?
- Future Work – mixture modeling or classifiers such as SVM and NN
- Limitation – restricted to percussive audio
- Future Work – a new approach to drumless rhythm detection
- TEXT
- Limitation – phoneme duration estimation is independent of tempo
- Future Work – re-estimation using tempo information (sketched below)
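A hedged sketch of one way this future direction could look: tempo-independent phoneme duration estimates get rescaled by the ratio of an assumed reference tempo to the detected tempo, so faster songs receive shorter phonemes. The reference tempo and the linear scaling rule are illustrative assumptions, not a proposal from the paper.

```python
def rescale_durations(phoneme_durations, detected_bpm, reference_bpm=100.0):
    """Scale tempo-independent phoneme duration estimates by the ratio of a
    reference tempo to the detected tempo."""
    scale = reference_bpm / detected_bpm
    return [d * scale for d in phoneme_durations]

# Example: estimates made at an assumed 100 BPM, song detected at 125 BPM.
print(rescale_durations([0.12, 0.25, 0.18], detected_bpm=125.0))
```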
|