Migrate GCS from googleapiclient.discovery to google-cloud-storage #3358

@tyzerrr

Description

Background

Luigi's GCS implementation currently uses the legacy googleapiclient.discovery API. Google now recommends using google-cloud-storage for new applications.
I'm one of the contributors to gokart, so I'd like to help improve Luigi.

Current Issues

  • Discovery API downloads API definitions at runtime (performance overhead)
  • Complex API with verbose method chaining
  • Manual retry logic implementation
  • Less active maintenance and updates

Proposed Change

Replace googleapiclient.discovery with google-cloud-storage in luigi/contrib/gcs.py.

Example:

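# Current (from googleapiclient import discovery)
self.client = discovery.build('storage', 'v1', **build_kwargs)
# Raises HttpError (404) when the object does not exist
self.client.objects().get(bucket=bucket, object=obj).execute()

# Proposed (from google.cloud import storage)
self.client = storage.Client(**client_kwargs)
bucket = self.client.bucket(bucket_name)
blob = bucket.blob(object_name)
blob.exists()  # returns True/False instead of raising on a missing object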
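
To make the proposed call pattern concrete, here is a minimal sketch of an existence check written against the google-cloud-storage client interface. The helper names `split_gcs_path` and `gcs_exists` are hypothetical and are not taken from luigi/contrib/gcs.py; the client is passed in, so the sketch does not assume how Luigi would construct or configure it.

```python
def split_gcs_path(path):
    """Split 'gs://bucket/some/object' into ('bucket', 'some/object').

    Hypothetical helper for illustration; not Luigi's actual path parser.
    """
    prefix = "gs://"
    if not path.startswith(prefix):
        raise ValueError(f"not a GCS path: {path!r}")
    bucket_name, _, object_name = path[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError(f"missing bucket name: {path!r}")
    return bucket_name, object_name


def gcs_exists(client, path):
    """Return True if the bucket (or the object inside it) exists.

    `client` is assumed to behave like google.cloud.storage.Client:
    client.bucket(name) -> Bucket, Bucket.blob(name) -> Blob, and both
    Bucket and Blob provide exists() returning a bool.
    """
    bucket_name, object_name = split_gcs_path(path)
    bucket = client.bucket(bucket_name)
    if not object_name:
        # Path names only a bucket, e.g. 'gs://my-bucket'
        return bucket.exists()
    return bucket.blob(object_name).exists()
```

With the real library installed, this would be called as `gcs_exists(storage.Client(), 'gs://my-bucket/path/to/file')` after `from google.cloud import storage`. Passing the client in also makes the helper easy to test with a fake object.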

Benefits

  • Simpler, more Pythonic API
  • Built-in retries and connection pooling
  • Better performance (no runtime discovery)
  • Active maintenance and type hints
  • Consistent with modern Python practices

Implementation Plan

  1. Add google-cloud-storage as optional dependency
  2. Create new implementation behind feature flag
  3. Ensure backward compatibility for all public APIs
  4. Deprecate old implementation with migration guide
  5. Remove legacy code after deprecation period
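
Steps 1 and 2 of the plan could be sketched roughly as follows. This is only an illustration: the environment variable `LUIGI_GCS_BACKEND`, the backend names, and the functions below are all hypothetical, and a real implementation might use Luigi's configuration system instead.

```python
import importlib.util
import os
import warnings

LEGACY = "discovery"             # current googleapiclient-based backend
MODERN = "google-cloud-storage"  # proposed backend


def has_google_cloud_storage():
    """Check whether the optional dependency is importable."""
    try:
        return importlib.util.find_spec("google.cloud.storage") is not None
    except (ImportError, ValueError):
        # Raised when the parent 'google' namespace package is missing
        return False


def select_backend(flag=None, available=has_google_cloud_storage):
    """Pick a backend name from a feature flag (hypothetical env var)."""
    if flag is None:
        flag = os.environ.get("LUIGI_GCS_BACKEND", LEGACY)
    if flag == LEGACY:
        return LEGACY
    if flag != MODERN:
        raise ValueError(f"unknown GCS backend: {flag!r}")
    if available():
        return MODERN
    # Optional dependency missing: fall back so existing pipelines keep working
    warnings.warn(
        "google-cloud-storage is not installed; falling back to the legacy backend"
    )
    return LEGACY
```

Defaulting to the legacy backend keeps existing behavior unchanged until users opt in, which is one way to satisfy step 3 while the new implementation matures.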

Questions

  • Keep the descriptor parameter for offline builds?
  • Deprecation timeline?
  • Should this trigger a major version bump?

Could you assign me?
