LyricAlly

Conclusion


Line level alignment of text and musical audio

Text is crucial for duration estimation

Rhythm detection can inform downstream components

Accuracy of chorus detection is vital

Vocal detection model uses training based approach

For real-time performance: need to explore alternative vocal
detection models