Best practices for text preprocessing #70

raivisdejus · 2025-09-09T04:50:33Z

As noted five years ago in the espeak-ng roadmap, which does not seem to be moving forward, espeak-ng has limited ability to process inflections and numbers. To get correct speech, some preprocessing of the input text is needed.

Curious to hear @synesthesiam's take on how to handle this:

- Document examples and scripts for different languages, and leave it to users to handle the process?
- Implement a --preprocessor flag that runs an extra script before espeak-ng, plus a collection of preprocessors for different languages?
- Any other ideas?
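
To make the flag option more concrete, here is a minimal sketch of how such a hook might work. It assumes a hypothetical contract in which the preprocessor is any executable that reads raw text on stdin and writes normalized text on stdout. The function name `run_preprocessor` and the demo command are illustrative only and are not part of piper or espeak-ng.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_preprocessor(text: str, command: Optional[Sequence[str]]) -> str:
    """Pipe text through an external preprocessor command, if one is given.

    Hypothetical contract: the command reads UTF-8 text on stdin and
    writes the normalized text on stdout.
    """
    if not command:
        return text  # no preprocessor configured, pass the text through
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise if the script exits with a non-zero status
    )
    return result.stdout.rstrip("\n")


# Stand-in for a real language script: it only upper-cases its input.
demo_command = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
print(run_preprocessor("vilciens", demo_command))
```

Keeping the contract at the stdin/stdout level would let people write preprocessors in any language, at the cost of starting one extra process per call.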

I would be glad to contribute code for a Latvian preprocessor, given some guidance on the direction you see as the best fit for the project.

Some examples for Latvian that have issues:

Vilciens aties 9:45 no 2. perona - The train will leave at 9:45 from platform 2
- Wrong inflection for the time
- Wrong inflection for the platform number
Longīns Ausējs ir dzimis 1885. gada 30. oktobrī - Date when a person was born
- Wrong inflection for year number and the date.
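
As a rough illustration of the first step a preprocessor would need, the sketch below only locates the tokens from the two examples above (clock times and numbers followed by a dot). It does not attempt the Latvian inflection itself. The pattern names and `find_spans` are hypothetical, and the ordinal pattern is deliberately naive: it would also match a number that merely ends a sentence.

```python
import re

# Illustrative patterns only; a real Latvian preprocessor would need more cases.
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\b")
# Digits, a dot, then whitespace and another word (e.g. "2. perona").
ORDINAL_RE = re.compile(r"\b(\d+)\.(?=\s+\w)")


def find_spans(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs in order of appearance."""
    spans = []
    for kind, pattern in (("time", TIME_RE), ("ordinal", ORDINAL_RE)):
        for match in pattern.finditer(text):
            spans.append((match.start(), kind, match.group(0)))
    spans.sort()  # order by position in the input text
    return [(kind, matched) for _, kind, matched in spans]


print(find_spans("Vilciens aties 9:45 no 2. perona"))
print(find_spans("Longīns Ausējs ir dzimis 1885. gada 30. oktobrī"))
```

Each span found this way would then be handed to a language-specific routine that picks the right grammatical case from the surrounding words, which is the part espeak-ng currently gets wrong.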

Some of these issues can sometimes be solved with an LLM prompt, but I believe a Python preprocessor could resolve many of them. A basic preprocessor could be part of the piper pipeline, with an option for users to specify a custom preprocessor script.
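
One possible shape for "a basic preprocessor plus a user override" is sketched below, purely as an assumption about how it could be wired up. The registry `BUILTIN`, the `register` decorator, and `preprocess` are hypothetical names, and the English rule is a toy stand-in for real language logic.

```python
from typing import Callable, Dict, Optional

Preprocessor = Callable[[str], str]

# Hypothetical registry of built-in preprocessors, keyed by language code.
BUILTIN: Dict[str, Preprocessor] = {}


def register(lang: str) -> Callable[[Preprocessor], Preprocessor]:
    """Decorator that stores a function as the built-in for one language."""
    def decorator(func: Preprocessor) -> Preprocessor:
        BUILTIN[lang] = func
        return func
    return decorator


@register("en")
def basic_english(text: str) -> str:
    # Toy rule for demonstration: spell out the percent sign.
    return text.replace("%", " percent")


def preprocess(text: str, lang: str, custom: Optional[Preprocessor] = None) -> str:
    """Apply the user's custom preprocessor if given, else the built-in one."""
    func = custom or BUILTIN.get(lang)
    return func(text) if func else text  # unknown language: leave text as-is


print(preprocess("50% off", "en"))
```

With this layout, a Latvian contribution would be one more registered function, and users who need different behaviour could pass their own callable without touching the built-ins.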

JarbasAl · 2025-10-06T17:54:02Z

For preprocessing I use this: https://github.com/TigreGotico/phoonnx/blob/dev/phoonnx/util.py

It spells out numbers, dates, times, contractions, and units.

Most of the language support is handled via https://github.com/OpenVoiceOS/ovos-number-parser and https://github.com/OpenVoiceOS/ovos-date-parser

I'm not sure whether this is a good fit for piper, since those libraries are written in Python.
