docs/hub/rate-limits.md
Lines changed: 9 additions & 1 deletion
@@ -8,7 +8,7 @@ We define different rate limits for distinct classes of requests. We distinguish
   - e.g. model or dataset search, repo creation, user management, etc. All endpoints that belong to this bucket are documented in [Hub API Endpoints](./api).
 - **Resolvers**
   - They're all the URLs that contain a `/resolve/` segment in their path, which serve user-generated content from the Hub. Concretely, those are the URLs that are constructed by open source libraries (transformers, datasets, vLLM, llama.cpp, …) or AI applications (LM Studio, Jan, ollama, …) to download model/dataset files from HF.
-  - Specifically, this is the ["Resolve a file" endpoint](https://lnkd.in/eesDKirG) documented in our OpenAPI spec.
+  - Specifically, this is the ["Resolve a file" endpoint](https://huggingface-openapi.hf.space/#tag/models/get/apiresolve-cachemodelsnamespacereporevpath) documented in our OpenAPI spec.
   - Resolve requests are heavily used by the community, and since we optimize our infrastructure to serve them with maximum efficiency, the rate limits for Resolvers are the highest.
 - **Pages**
   - All the Web pages we host on huggingface.co.
@@ -89,6 +89,14 @@ Despite passing `HF_TOKEN` if you are still rate limited, you can:
 - replace Hub API calls with Resolver calls, whenever possible (Resolver rate limits are much higher and much more optimized).
 - upgrade to PRO, Team, or Enterprise.

+## Smart rate limit handling with `huggingface_hub`
+
+The Hub Python Library [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/index) (version **1.2.0+**) includes smart retry handling for rate limit errors.
+
+When a 429 error occurs, the SDK automatically parses the `RateLimit` header to extract the exact number of seconds until the rate limit resets, then waits precisely that duration before retrying. This applies to file downloads (i.e. Resolvers) and paginated Hub API calls (list models, datasets, spaces, etc.).
+
+**We strongly recommend using `huggingface_hub` for all programmatic access to the Hub** to benefit from this optimized retry behavior and avoid implementing custom rate limit handling.
+
 ## Granular user action Rate limits

 In addition to those main classes of rate limits, we enforce limits on certain specific kinds of user actions, like:
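
For reviewers who want a concrete picture of the retry behavior the added section describes (parse the `RateLimit` header on a 429, wait that many seconds, retry), here is a minimal sketch. It is not the `huggingface_hub` implementation. It assumes the header carries a `t=<seconds>` parameter, in the style of the IETF rate-limit header draft, and the names `seconds_until_reset`, `call_with_rate_limit_retry`, and the `send` callable are hypothetical, introduced only for illustration.

```python
import re
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def seconds_until_reset(headers: Mapping[str, str]) -> Optional[float]:
    """Return the `t` parameter of a RateLimit header, or None if it is missing.

    Assumption: the header looks like `"api";r=0;t=55`, where `t` is the
    number of seconds until the current window resets.
    """
    for name, value in headers.items():
        if name.lower() == "ratelimit":  # header names are case-insensitive
            match = re.search(r"(?:^|;)\s*t=(\d+)", value)
            return float(match.group(1)) if match else None
    return None


def call_with_rate_limit_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    fallback_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on HTTP 429 after the advertised reset delay."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        # Return on any non-429 response, or once the retry budget is spent.
        if status != 429 or attempt == max_retries:
            return status, headers, body
        wait = seconds_until_reset(headers)
        # Fall back to a fixed delay when the header is absent or unparsable.
        sleep(wait if wait is not None else fallback_wait)
    raise AssertionError("unreachable: the loop always returns")
```

The `sleep` parameter is injectable so the wait can be observed in tests without actually pausing. In practice, using `huggingface_hub` itself, as the section recommends, avoids maintaining logic like this by hand.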