support for alignment output in tsv format #407

contentnation · 2024-02-23T16:21:10Z

Adds support for alignment data output.
Roughly matches issue #364.
Can be used as a base for #391 and #361.
Runs text to speech twice: once for normal audio generation,
and a second time for each word individually.
Since the two passes produce different outputs and timings, a correction is applied.
Not perfect, but "good enough". Both passes re-sync after each sentence, so only slight offsets accumulate.
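
To illustrate the kind of correction described above, here is a minimal sketch, not the code from this PR. It assumes the per-word durations come from the second (word-by-word) pass and the sentence duration from the normal pass, and it simply rescales the word durations so that they add up to the sentence duration. The function names and the linear scaling are hypothetical choices made for the example.

```python
def rescale_word_durations(word_durations, sentence_duration):
    """Scale per-word durations (seconds) so they sum to sentence_duration.

    Assumption: a single linear factor per sentence is good enough;
    the actual correction in the PR may differ.
    """
    total = sum(word_durations)
    if total <= 0:
        # Nothing to scale against; report zero-length words.
        return [0.0 for _ in word_durations]
    factor = sentence_duration / total
    return [d * factor for d in word_durations]


def word_start_times(word_durations, sentence_start, sentence_duration):
    """Return the start time of each word inside one sentence.

    Because the start is reset to sentence_start for every sentence,
    any drift cannot carry over into the next sentence.
    """
    starts = []
    t = sentence_start
    for d in rescale_word_durations(word_durations, sentence_duration):
        starts.append(t)
        t += d
    return starts
```

Calling `word_start_times` once per sentence, with the sentence start taken from the normal audio pass, mirrors the "self sync after each sentence" behaviour: errors stay bounded within a sentence.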

vytskalt · 2024-04-03T11:58:13Z

I've been trying this out. Looks like when using a long text some of the last words are being skipped in the alignment file.

contentnation · 2024-04-03T12:44:58Z

@vytskalt can you provide an example so I can debug/fix it?

vytskalt · 2024-04-03T13:01:45Z

> @vytskalt can you provide an example so I can debug/fix it?

Yes, this is the command I'm running:

cat text.txt | piper --sentence-silence 0.5 -m en_US-ryan-high --output_file out.wav --alignment-data alignment.tsv

This is the text (random Reddit post): text.txt

In the alignment.tsv, 2 of the last words are missing.

contentnation · 2024-04-03T13:52:39Z

ok, it's not the length that is the issue, it's the content. For example: "musical/sport" will be spoken as 3 words. "in the" is mangled into one spoken word. My word/phoneme sync trips over this. Needs to be fixed, I have to find another way to sync.

charlyhayoz · 2024-05-07T15:28:19Z

Hi,

I pulled this pull request and made a build, but the --alignment-data option is not available in the "piper" executable in the install folder.

Am I missing something to make it work?

Thanks (:

contentnation · 2024-05-07T15:30:41Z

It is only built into the python script, not in the c++ executable.

charlyhayoz · 2024-05-08T09:46:28Z

Makes sense! Thanks (:

WilleIshere · 2025-07-06T04:59:56Z

> It is only built into the python script, not in the c++ executable.

Hi, would it be possible to add it to the C++ executable? I am using Windows, which does not have the Python version, so I need it compiled. I don't know how to translate Python to C++.

synesthesiam · 2025-07-10T21:39:25Z

The w_ceil tensor from the original VITS models contains the phoneme timing information (when multiplied by the hop length of 256). Just routing this tensor through the model output is enough to get alignment information. Unfortunately, this requires re-exporting all of the existing voices unless we can find a way to get access to this tensor within the ONNX models.
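
As a rough sketch of the arithmetic described above: if `w_ceil` is available as a flat sequence of per-phoneme frame counts, multiplying each count by the hop length of 256 gives a duration in samples, and a running sum gives each phoneme's start and end. The function name, the flat-list input, and the 22050 Hz default sample rate are assumptions for illustration; the real tensor shape and the sample rate depend on the voice.

```python
HOP_LENGTH = 256  # hop length mentioned in the comment above


def phoneme_offsets(w_ceil, sample_rate=22050, hop_length=HOP_LENGTH):
    """Convert per-phoneme frame counts into (start, end) times in seconds.

    w_ceil: iterable of frame counts, one per phoneme (assumed flat).
    sample_rate: assumed default; use the rate from the voice's config.
    """
    offsets = []
    start = 0  # running position in samples
    for frames in w_ceil:
        end = start + int(frames) * hop_length
        offsets.append((start / sample_rate, end / sample_rate))
        start = end
    return offsets
```

Word-level timing would then be a matter of grouping consecutive phonemes that belong to the same word and taking the first start and last end of each group.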

Sascha Nitsch added 2 commits February 23, 2024 17:11

support for alignment output in tsv

6bbce86

support for alignment output in tsv

40e6bae

fixed missing adjustment with the silence at the end

e4d1d65

fixing out-of-sync of alignment when words are combined like "in the" or split by "musical/sports". Also fixed missing sentence silence in calculation

6ce77e8

Merge branch 'rhasspy:master' into alignment_data

65f3b00

met4citizen mentioned this pull request Dec 2, 2024

Open-source TTS model support with timestamps met4citizen/TalkingHead#77

Open

orgarten mentioned this pull request Jan 12, 2025

Alignment data should be exposed as one of the outputs #70

Open

Merge branch 'rhasspy:master' into alignment_data

e70b28a
