Open
Description
Background
Luigi's GCS implementation currently uses the legacy googleapiclient.discovery API. Google now recommends using google-cloud-storage for new applications.
I'm one of the contributors to gokart, so I'd like to help improve Luigi.
Current Issues
- Discovery API downloads API definitions at runtime (performance overhead)
- Complex API with verbose method chaining
- Manual retry logic implementation
- Less active maintenance and updates
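To illustrate the "manual retry logic" point, here is a generic sketch of the kind of wrapper a discovery-based client has to carry around each `.execute()` call. This is not Luigi's actual code; the function name, parameters, and the choice of `IOError` as the retriable error are all assumptions made for the example.

```python
import time


def call_with_retries(request, max_attempts=3, base_delay=1.0, retriable=(IOError,)):
    """Call request() and retry with exponential backoff on retriable errors.

    Hypothetical helper: names and defaults are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except retriable:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With google-cloud-storage, most of this would be handled by the library's own retry support instead of code living in luigi/contrib/gcs.py.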
Proposed Change
Replace googleapiclient.discovery with google-cloud-storage in luigi/contrib/gcs.py.
Example:
# Current
self.client = discovery.build('storage', 'v1', **build_kwargs)
self.client.objects().get(bucket=bucket, object=obj).execute()
# Proposed
self.client = storage.Client(**client_kwargs)
bucket = self.client.bucket(bucket_name)
blob = bucket.blob(object_name)
blob.exists()
Benefits
- Simpler, more Pythonic API
- Built-in retries and connection pooling
- Better performance (no runtime discovery)
- Active maintenance and type hints
- Consistent with modern Python practices
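As a rough sketch of how an existence check could look on top of the proposed API: the helper names `split_gcs_path` and `gcs_exists` are hypothetical and not part of Luigi. The `client` argument is assumed to behave like `google.cloud.storage.Client`, and only its `bucket()`, `Bucket.blob()`, `Bucket.exists()` and `Blob.exists()` calls are used.

```python
def split_gcs_path(path):
    """Split 'gs://bucket/some/object' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not path.startswith(prefix):
        raise ValueError(f"not a GCS path: {path!r}")
    bucket_name, _, object_name = path[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError(f"missing bucket name: {path!r}")
    return bucket_name, object_name


def gcs_exists(client, path):
    """Return True if the bucket or object at `path` exists.

    `client` is assumed to expose the google.cloud.storage.Client interface.
    """
    bucket_name, object_name = split_gcs_path(path)
    bucket = client.bucket(bucket_name)
    if not object_name:
        # A bare 'gs://bucket' path refers to the bucket itself.
        return bucket.exists()
    return bucket.blob(object_name).exists()
```

With the real library installed, this would be called as `gcs_exists(storage.Client(), "gs://some-bucket/some/key")` after `from google.cloud import storage`. Passing the client in also keeps the helper easy to test with a stub.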
Implementation Plan
- Add google-cloud-storage as an optional dependency
- Create the new implementation behind a feature flag
- Ensure backward compatibility for all public APIs
- Deprecate old implementation with migration guide
- Remove legacy code after deprecation period
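One possible shape for the feature flag and the later deprecation step, as a minimal sketch only. The flag name `use_google_cloud_storage` and the function `choose_gcs_backend` are hypothetical. In a real implementation the flag would more likely come from Luigi's configuration, and the warning would only be switched on once the deprecation phase starts.

```python
import warnings

LEGACY_BACKEND = "googleapiclient"
NEW_BACKEND = "google-cloud-storage"


def choose_gcs_backend(use_google_cloud_storage=False):
    """Return the name of the GCS backend to use.

    Hypothetical helper: the legacy backend stays the default for backward
    compatibility, but selecting it emits a DeprecationWarning.
    """
    if use_google_cloud_storage:
        return NEW_BACKEND
    warnings.warn(
        "The googleapiclient-based GCS client is deprecated; "
        "enable the google-cloud-storage backend instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return LEGACY_BACKEND
```

Keeping the default on the legacy path means existing pipelines keep working unchanged until the flag is flipped.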
Questions
- Keep the descriptor parameter for offline builds?
- Deprecation timeline?
- Should this trigger a major version bump?
Could you assign me?
hirosassa