LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics
Ye Wang, Min Yen Kan, Tin Lay Nwe, Arun Shenoy, Jun Yin

Introduction
Motivation
Is singing voice transcription really necessary?
Speech recognizers cannot be directly deployed
Availability of music lyrics on the internet

Bryan Adams – Back to you

Chorus sections are detected by their high level of repetition.
Accounts for phoneme-, word- and line-level repetition.
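
As a minimal sketch of the repetition idea at line level only (not the system's actual method, which also considers phoneme- and word-level repetition), the example below flags lyric lines that closely match at least one other line. The function names, the 0.8 similarity threshold, and the use of Python's difflib are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(line: str) -> str:
    """Lowercase a lyric line and drop punctuation so near-identical lines match."""
    kept = (ch for ch in line.lower() if ch.isalnum() or ch.isspace())
    return " ".join("".join(kept).split())


def repetition_scores(lines, threshold=0.8):
    """For each line, count how many OTHER lines are near-duplicates of it.

    `threshold` is a hypothetical similarity cutoff, not a value from the slides.
    """
    norm = [normalize(line) for line in lines]
    scores = []
    for i, a in enumerate(norm):
        count = 0
        for j, b in enumerate(norm):
            if i != j and SequenceMatcher(None, a, b).ratio() >= threshold:
                count += 1
        scores.append(count)
    return scores


def likely_chorus_lines(lines, threshold=0.8):
    """Return indices of lines that repeat elsewhere in the lyrics."""
    scores = repetition_scores(lines, threshold)
    return [i for i, score in enumerate(scores) if score > 0]


# Made-up lyric lines, used only to exercise the functions.
example_lines = [
    "I walked alone tonight",
    "Carry me home again",
    "Carry me home again",
    "The road was long and cold",
    "Carry me home again!",
]
chorus_candidates = likely_chorus_lines(example_lines)
```

Lines that repeat are only chorus candidates; a fuller approach would group consecutive repeated lines into sections before labelling them.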

Observation: Gaps between sections are shorter and more stable than the sections themselves

LYRICALLY SYSTEM DEMO

Starting point calculation is more difficult than duration estimation

Decreasing order of criticality:

Line-level alignment of text and musical audio
Text is crucial for duration estimation
Rhythm detection can inform downstream components
Accuracy of chorus detection is vital
Vocal detection model uses a training-based approach
For real-time performance, alternative vocal detection models need to be explored

GENERAL
Limitation – assumes 4/4 meter and a V1-C1-V2-C2-B-O song structure
Future Work – alternate meters and song structures
AUDIO
Limitation – is MM-HMM the optimal classifier?
Future Work – mixture modeling or classifiers such as SVM and NN
Limitation – restricted to percussive audio
Future Work – new approach to drumless rhythm detection
TEXT
Limitation – phoneme duration estimation is independent of tempo
Future Work – re-estimation using tempo information
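
To illustrate what tempo-aware duration estimation could look like, here is a hypothetical sketch, not the authors' proposed method: per-phoneme durations estimated at a reference tempo are scaled linearly by the ratio of the reference tempo to the song's tempo. The function names, the 120 BPM reference, and the linear scaling rule are all assumptions.

```python
def rescale_durations(base_durations, song_bpm, reference_bpm=120.0):
    """Scale phoneme durations (seconds) from a reference tempo to the song tempo.

    Assumption for illustration: durations shrink or stretch linearly with
    tempo, so a song at half the reference tempo doubles every duration.
    """
    if song_bpm <= 0 or reference_bpm <= 0:
        raise ValueError("tempo must be positive")
    factor = reference_bpm / song_bpm
    return [duration * factor for duration in base_durations]


def line_duration(base_durations, song_bpm, reference_bpm=120.0):
    """Estimate a lyric line's duration as the sum of its rescaled phoneme durations."""
    return sum(rescale_durations(base_durations, song_bpm, reference_bpm))
```

For example, `rescale_durations([0.1, 0.2], song_bpm=60.0)` doubles each duration, because 60 BPM is half the assumed 120 BPM reference.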