LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics
Ye Wang, Min Yen Kan, Tin Lay Nwe, Arun Shenoy, Jun Yin

Introduction
Motivation
Is singing voice transcription really necessary?
Speech recognizers cannot be directly deployed
Availability of music lyrics on the internet

Bryan Adams – Back to You

Chorus sections are detected by their high level of repetition.
Detection accounts for phoneme-, word- and line-level repetition.
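
As a rough illustration of the line-level part of this idea, the sketch below marks lyric lines that recur within a song as chorus candidates. It is not the LyricAlly implementation: the names `normalize` and `find_repeated_lines`, the punctuation-stripping step, and the `min_count` threshold are assumptions made for this example, and phoneme- and word-level repetition are not modeled.

```python
import re
from collections import Counter


def normalize(line):
    """Lowercase a lyric line and drop punctuation so near-identical lines match."""
    return " ".join(re.findall(r"[a-z0-9']+", line.lower()))


def find_repeated_lines(lyrics, min_count=2):
    """Return indices of lines whose normalized text occurs at least min_count times.

    Hypothetical helper: it covers exact line-level repetition only.
    """
    lines = lyrics.splitlines()
    keys = [normalize(line) for line in lines]
    # Count non-empty normalized lines; blank lines never count as repeats.
    counts = Counter(key for key in keys if key)
    return [i for i, key in enumerate(keys) if key and counts[key] >= min_count]


# Placeholder lyrics invented for this example (not taken from any real song).
example = "\n".join([
    "first verse line one",
    "first verse line two",
    "Sing it again, sing it loud",
    "second verse line one",
    "second verse line two",
    "sing it again, sing it LOUD!",
])
print(find_repeated_lines(example))
```

Lines flagged this way could then be grouped into contiguous blocks to propose chorus sections; that grouping step is left out here.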

Observation: Gaps between sections are shorter and more stable than the sections themselves

LyricAlly System Demo

Starting point calculation is more difficult than duration estimation

Decreasing order of criticality:

Line-level alignment of text and musical audio
Text is crucial for duration estimation
Rhythm detection can inform downstream components
Accuracy of chorus detection is vital
Vocal detection model uses a training-based approach
Real-time performance requires exploring alternative vocal detection models

GENERAL
Limitation – 4/4 meter, V1-C1-V2-C2-B-O song structure
Future Work – alternate meter and song structure

AUDIO
Limitation – Is MM-HMM the optimal classifier?
Future Work – mixture modeling or classifiers like SVM and NN
Limitation – restricted to percussive audio
Future Work – new approach to drumless rhythm detection

TEXT
Limitation – phoneme duration estimation independent of tempo
Future Work – tempo information re-estimation
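
One simple way to picture tempo re-estimation is to rescale tempo-independent duration estimates by the ratio of a reference tempo to the song's detected tempo. The sketch below is only an assumption about how such a correction could look, not the authors' planned method; `rescale_durations`, `reference_bpm`, and the sample durations are hypothetical.

```python
def rescale_durations(durations_sec, song_bpm, reference_bpm=120.0):
    """Scale duration estimates so faster songs get proportionally shorter durations.

    Assumption: the input durations were estimated for a song at reference_bpm.
    """
    if song_bpm <= 0 or reference_bpm <= 0:
        raise ValueError("tempo values must be positive")
    factor = reference_bpm / song_bpm
    return [d * factor for d in durations_sec]


# Hypothetical per-line duration estimates in seconds.
line_durations = [2.0, 3.0, 2.5]
print(rescale_durations(line_durations, song_bpm=150.0))
```

A linear rescaling like this treats every phoneme as equally tempo-sensitive, which is a simplification chosen for brevity.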