Line level alignment of
text and musical audio
Text is
crucial for duration estimation
Rhythm detection can
inform downstream components
Accuracy of
chorus detection is vital
Vocal detection model
uses training based approach
For
real-time performance: need to explore alternative vocal detection models