@weezymatt
Contributor

This PR enhances support for Spanish (es) and Portuguese (pt) by updating the lex_attrs.py files in their respective spacy/lang modules. Each change is covered by regression tests in the corresponding test_text.py file.

Description

Spanish (es):

  • Add feminine and apocopated ordinals

  • Add an abbreviation rule (e.g., 1.º) and a plural rule for ordinals to the like_num function

  • Refactor test_issue3803 to follow spaCy code conventions by using fixtures

  • Add regression test test_es_lex_attrs_like_number
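
As a rough illustration of the Spanish rules listed above, the following is a minimal sketch, not the code from this PR. The word lists are abbreviated and hypothetical, and the abbreviation pattern is an assumption about how forms such as 1.º could be matched.

```python
import re

# Hypothetical, abbreviated word lists; the real lex_attrs.py lists are longer.
_num_words = {"cero", "uno", "una", "dos", "tres", "cien", "mil"}
_ordinal_words = {
    "primero", "primera", "primer",  # masculine, feminine, apocopated
    "segundo", "segunda",
    "tercero", "tercera", "tercer",
}
# Assumed pattern: digits, an optional period, then an ordinal indicator.
_ordinal_abbrev = re.compile(r"^\d+\.?(º|ª|er)$")


def like_num(text: str) -> bool:
    """Return True if the token looks like a cardinal or ordinal number."""
    text = text.lower()
    # Plain digits, allowing thousands/decimal separators such as "1.000".
    if text.replace(",", "").replace(".", "").isdigit():
        return True
    if text in _num_words or text in _ordinal_words:
        return True
    # Plural rule: "primeros" -> "primero", "primeras" -> "primera".
    if text.endswith("s") and text[:-1] in _ordinal_words:
        return True
    # Abbreviation rule: "1.º", "2.ª", "1.er".
    return bool(_ordinal_abbrev.match(text))
```

Under these assumptions, like_num("1.º"), like_num("primer"), and like_num("primeras") are all true, while an ordinary noun such as "casas" is rejected because stripping the final "s" does not yield a known ordinal.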

Portuguese (pt):

  • Add feminine number variants (i.e., uma, duas)

  • Fix typo "seicentos" -> "seiscentos"

  • Add gender rules to the hundreds [200-900]

  • Add feminine ordinals

  • Add plural rule for ordinals in like_num

  • Add tests test_pt_lex_attrs_like_number and test_pt_lex_attrs_like_number_for_ordinal to keep test coverage roughly consistent across languages
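
The Portuguese gender and plural rules above could look roughly like the sketch below. It is an illustration only, not the code from this PR: the word lists are abbreviated and hypothetical, and deriving feminine forms by swapping the ending is one possible approach rather than the one the PR necessarily takes.

```python
# Hypothetical, abbreviated lists; the real lex_attrs.py lists are longer.
_hundreds_masc = [
    "duzentos", "trezentos", "quatrocentos", "quinhentos",
    "seiscentos", "setecentos", "oitocentos", "novecentos",
]
# Gender rule for 200-900: "duzentos" -> "duzentas", and so on.
_hundreds = set(_hundreds_masc) | {w[:-2] + "as" for w in _hundreds_masc}

_num_words = {"um", "uma", "dois", "duas", "três", "cem", "mil"} | _hundreds

_ordinal_masc = ["primeiro", "segundo", "terceiro", "quarto", "quinto"]
# Feminine ordinals: "primeiro" -> "primeira".
_ordinal_words = set(_ordinal_masc) | {w[:-1] + "a" for w in _ordinal_masc}


def like_num(text: str) -> bool:
    """Return True if the token looks like a cardinal or ordinal number."""
    text = text.lower()
    if text.replace(",", "").replace(".", "").isdigit():
        return True
    if text in _num_words or text in _ordinal_words:
        return True
    # Plural rule: "primeiros" -> "primeiro", "primeiras" -> "primeira".
    return text.endswith("s") and text[:-1] in _ordinal_words
```

With these lists, "duas", "seiscentas", and "primeiras" are number-like, while the misspelling "seicentos" is not.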

Additional:

  • Add weezymatt.md in .github/contributors

Last bits:

  • Code follows the project conventions, checked with flake8 and formatted with black 25.11

Types of change

My PR covers an enhancement to the existing code.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@weezymatt changed the title from "Enhance lex_attrs for Spanish & Portuguese" to "[Enhancement] Improve lex_attrs for Spanish & Portuguese" on Dec 2, 2025