Best practices for text preprocessing #70
raivisdejus
started this conversation in
Ideas
Replies: 1 comment
For preprocessing I use https://github.com/TigreGotico/phoonnx/blob/dev/phoonnx/util.py. It spells out numbers, dates, times, contractions, and units. Most of the language support is handled via https://github.com/OpenVoiceOS/ovos-number-parser and https://github.com/OpenVoiceOS/ovos-date-parser. I'm not sure whether this is a good fit for piper, since those libraries are written in Python.
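To make "spell out numbers, contractions and units" concrete, here is a minimal English-only sketch using nothing but the standard library. It does not reproduce phoonnx or the OVOS parsers; the lookup tables and the function names `spell_number` and `normalize` are made up for illustration, and the number speller only covers 0 to 99.

```python
import re

# Hypothetical, deliberately tiny lookup tables. A real preprocessor
# needs much larger, language-specific data.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is"}
UNITS = {"km": "kilometers", "kg": "kilograms", "m": "meters"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell_number(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Expand contractions, then units, then one- and two-digit numbers."""
    # 1. Contractions, matched case-insensitively (output is lowercased).
    contraction_re = r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b"
    text = re.sub(contraction_re,
                  lambda m: CONTRACTIONS[m.group(0).lower()],
                  text, flags=re.IGNORECASE)
    # 2. Units that directly follow a number: "5 km" -> "5 kilometers".
    unit_names = "|".join(sorted(UNITS, key=len, reverse=True))
    text = re.sub(r"\b(\d+)\s*(" + unit_names + r")\b",
                  lambda m: f"{m.group(1)} {UNITS[m.group(2)]}", text)
    # 3. Standalone numbers with one or two digits; longer ones are left alone.
    return re.sub(r"\b\d{1,2}\b",
                  lambda m: spell_number(int(m.group(0))), text)


print(normalize("I can't lift 42 kg"))  # I cannot lift forty-two kilograms
```

The order of the passes matters: units are expanded while the number is still a digit string, so the unit rule can anchor on `\d+`. Plurals are always used for units here, so "1 km" would come out wrong; that kind of agreement is exactly what the dedicated parser libraries are for.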
As mentioned 5 years ago on the espeak-ng roadmap, which does not seem to be moving forward, espeak-ng has limited ability to process inflections and numbers. To get correct speech, some preprocessing of the input text is needed.

Curious to hear @synesthesiam's take on how to handle this:

- A `--preprocesor` flag that would run some extra script before `espeak-ng`, plus a collection of preprocessors for different languages

I would be glad to contribute some code for a Latvian preprocessor, with some guidance on the direction you see as the best fit for the project.

Some examples for Latvian that have issues:

- `Vilciens aties 9:45 no 2. perona` - Train will leave at 9:45 from the 2nd platform
- `Longīns Ausējs ir dzimis 1885. gada 30. oktobrī` - Date when a person was born

Some of these issues can sometimes be solved with an LLM prompt, but I believe that a Python preprocessor could resolve a lot of them. A basic preprocessor could be part of the piper pipeline, with an option for users to specify a custom preprocessor script.