@atgreen

@atgreen atgreen commented Jul 21, 2024

This patch introduces the ability to read repo contents from OCI registries, like ghcr.io, using the new 'oci' protocol. Users can specify this protocol in their DNF .repo files as shown below:

[oci-test]
name=OCI Test
baseurl=oci://ghcr.io/atgreen/librepo/gh-cli
enabled=1
gpgcheck=1

To set up the server-side repository, create a public package repository on GitHub, and populate it by pushing the repo file contents using the ORAS CLI tool.

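createrepo .
# Push each file as its own single-layer artifact tagged 'latest'
# (assumes no whitespace in file names).
FILES=$(find . -type f | sed 's|^\./||')
for FILE in $FILES; do
    oras push "ghcr.io/atgreen/librepo/gh-cli/$FILE:latest" "$FILE"
done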

Currently, only public repositories are supported. To support private package repositories, a bearer token is required. Implementing this would necessitate changes to libdnf to allow for a bearer_token configuration option in .repo files.

= changelog =
msg: Add support for OCI registries
type: enhancement

@cgwalters
Contributor

Thanks for starting this!

A lot of prior discussion in e.g.

And probably other places.

I only skimmed the code but I think there's slightly more to OCI than that, unless I'm missing something?

There is an implementation of talking to OCI registries that lives in flatpak today https://github.com/flatpak/flatpak/blob/main/common/flatpak-oci-registry.c and it would clearly make sense to share code.

That said...IMO we would gain the most value by using the github.com/containers/image library - we can do this via the skopeo experimental-image-proxy which is designed as a language-independent RPC API to that library.

(Also a few other notes; while just stuffing the existing XML in a registry is clearly the shortest path, since we're changing the protocol IMO we could consider just replacing XML with JSON or so too, and maybe even do some other cleanups and simplifications. OTOH that would make migration a bit harder)

@atgreen
Author

atgreen commented Jul 22, 2024

> I only skimmed the code but I think there's slightly more to OCI than that, unless I'm missing something?

I don't think so. Every file is pushed as a single layer/blob, and you just ignore version tagging (tag everything with 'latest'). To pull a file you grab the manifest json, find the hash of the first/only layer, pull that down and checksum it. This approximates what you get with http-hosted repos.
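
For readers following along, the pull flow described above can be sketched roughly as follows. This is a minimal Python illustration, not the patch's C code: the helper names `first_layer_digest` and `verify_blob` and the sample manifest are made up for the example, and only sha256 digests are handled.

```python
import hashlib
import json


def first_layer_digest(manifest_text: str) -> str:
    """Return the digest of the first (only) layer listed in an OCI image manifest."""
    manifest = json.loads(manifest_text)
    layers = manifest.get("layers", [])
    if not layers:
        raise ValueError("manifest has no layers")
    return layers[0]["digest"]


def verify_blob(blob: bytes, digest: str) -> bool:
    """Check downloaded blob bytes against a digest of the form 'sha256:<hex>'."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(blob).hexdigest() == expected


# Hypothetical stand-in for a file pushed as a single-layer artifact.
blob = b"<repomd/>"
digest = "sha256:" + hashlib.sha256(blob).hexdigest()
manifest_text = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "layers": [
        {"mediaType": "application/octet-stream", "digest": digest, "size": len(blob)},
    ],
})

# Step 1: read the manifest and find the layer digest.
layer_digest = first_layer_digest(manifest_text)
# Step 2: after fetching the blob for that digest, checksum it.
blob_ok = verify_blob(blob, layer_digest)
```

In a real client the manifest and blob would come from the registry over HTTP; they are built in memory here so the sketch stays self-contained.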

> (Also a few other notes; while just stuffing the existing XML in a registry is clearly the shortest path, since we're changing the protocol IMO we could consider just replacing XML with JSON or so too, and maybe even do some other cleanups and simplifications. OTOH that would make migration a bit harder)

That's an orthogonal issue and doesn't have to be tied to OCI support. My patch is very small, and introduces a nice new capability.

Thank you for considering this change!

@Conan-Kudo
Member

What specifically do you hope to gain with oci:// support? OCI registries aren't really a good place to store RPM repositories.

@Conan-Kudo Conan-Kudo self-assigned this Aug 24, 2024
@Conan-Kudo
Member

As I think through this, I'm trying to understand how the contents of a repository would be represented in an OCI registry? I can't think of a way this would be efficient and effective.

If each file is a layer in an "image" blob, then it becomes massively expensive to fetch individual files. If each file is its own image "blob", it results in disconnected spew that's difficult to store and replicate.

@Conan-Kudo Conan-Kudo added blocked RFE Request For Enhancement (as opposed to a bug) labels Aug 24, 2024
@atgreen
Author

atgreen commented Aug 24, 2024

> What specifically do you hope to gain with oci:// support? OCI registries aren't really a good place to store RPM repositories.

OCI registries are increasingly being used as general-purpose artifact storage. For instance, Homebrew uses OCI registries for all of its packages and related artifacts. Just ignore the layers. The repo structure can look just like it does on a web server. I was able to mirror my repos into ghcr.io, and it works just the same, except that now I don't have to maintain hosting infrastructure. The OCI ecosystem also has mature tooling to help manage/mirror them.

Not worrying about hosting infrastructure or bandwidth costs is a huge win. There are many instantly-available and free OCI hosting options out there. If you consider Enterprise use cases, companies are increasingly competent at internal hosting of OCI registries thanks to container adoption. Sharing RPMs through an enterprise OCI registry will be easier for many users than trying to deploy/maintain a web server or convince internal Satellite maintainers to host their content.

@atgreen
Author

atgreen commented Aug 24, 2024

> If each file is a layer in an "image" blob, then it becomes massively expensive to fetch individual files. If each file is its own image "blob", it results in disconnected spew that's difficult to store and replicate.

Yes, my implementation ignores layers. Each file is a blob, but they are structured in the OCI registry exactly as they would appear on a web server.

@rohanpm

rohanpm commented Sep 3, 2024

I'm also interested in the topic of how an RPM repository might be mapped into an OCI registry. I see a few challenges with the current approach though.

The proposed implementation right now creates one new OCI repository for every file in the RPM repository. For example, in the ghcr.io registry, atgreen/librepo/gh-cli/repodata/3c15e31fd86f4e2082f66922a69d39489a5737a1cf6a3d937aaefc898ba8e75d-primary.xml.zst is one repository, atgreen/librepo/gh-cli/repodata/repomd.xml is another and so on.

If you push an RPM repository with 1000 files, you'll end up with 1000 OCI repositories which is probably not acceptable on many OCI registry implementations.
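
To make the one-repository-per-file mapping concrete, here is a small Python sketch. The `/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>` endpoints come from the OCI distribution spec; the helper names are hypothetical, and whether the patch builds its URLs exactly this way is an assumption on my part.

```python
from urllib.parse import urlsplit


def oci_repo_name(baseurl: str, relpath: str) -> str:
    """Join an oci:// baseurl and a repo-relative file path into an OCI repository name."""
    parts = urlsplit(baseurl)
    if parts.scheme != "oci":
        raise ValueError("expected an oci:// baseurl")
    return parts.path.strip("/") + "/" + relpath.lstrip("/")


def manifest_url(baseurl: str, relpath: str, tag: str = "latest") -> str:
    """URL of the manifest for one file; every file becomes its own repository."""
    host = urlsplit(baseurl).netloc
    return f"https://{host}/v2/{oci_repo_name(baseurl, relpath)}/manifests/{tag}"


def blob_url(baseurl: str, relpath: str, digest: str) -> str:
    """URL of the blob named by the digest found in that file's manifest."""
    host = urlsplit(baseurl).netloc
    return f"https://{host}/v2/{oci_repo_name(baseurl, relpath)}/blobs/{digest}"


base = "oci://ghcr.io/atgreen/librepo/gh-cli"
name = oci_repo_name(base, "repodata/repomd.xml")
url = manifest_url(base, "repodata/repomd.xml")
```

Each distinct `relpath` yields a distinct repository name, which is where the 1000-files-to-1000-repositories count comes from.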

I also think using the names of files in the RPM repo as OCI repository names will run into issues. Per the spec at https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pulling-manifests , repository names need to match a certain regular expression. It definitely doesn't match all legal RPM filenames; for example, it only allows lowercase letters.
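
A quick way to check candidate names is sketched below in Python. The pattern is the repository-name expression as I understand it from the distribution spec linked above, so please verify it against the spec text; the file names other than `repomd.xml` are invented for illustration.

```python
import re

# Repository <name> pattern, reproduced from the OCI distribution spec as I
# understand it; treat the linked spec as the authority.
NAME_RE = re.compile(
    r"[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*"
    r"(/[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*)*"
)


def is_valid_repo_name(name: str) -> bool:
    """True if the whole string matches the repository-name pattern."""
    return NAME_RE.fullmatch(name) is not None


prefix = "atgreen/librepo/gh-cli/"
# Hypothetical file names; only repomd.xml appears in the thread.
candidates = [
    prefix + "repodata/repomd.xml",          # lowercase, dots and dashes only
    prefix + "Packages/gh-2.0.0.rpm",        # uppercase 'P' in a path component
    prefix + "foo-1.0~rc1-1.x86_64.rpm",     # '~' is common in RPM versions
]
results = {name: is_valid_repo_name(name) for name in candidates}
```

Names that fail the check would need some escaping or hashing scheme before they could be used as repository names.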

@dralley
Contributor

dralley commented Sep 4, 2024

I would be much more interested in investigating whether a more "native" approach is feasible (or reasonable) for RPMs. That means - no repomd.xml, no primary.xml, etc.

Trying to put an RPM repository directly into an OCI registry seems a bit nuts to me.

@cgwalters
Contributor

> I would be much more interested in investigating whether a more "native" approach is feasible (or reasonable) for RPMs. That means - no repomd.xml, no primary.xml, etc.

It's a question of how deep one wants to go down the rabbit hole. I would agree that it seems quite obvious to kill primary.xml - it's basically a "list of checksums for external objects with a timestamp" and OCI covers that in a standard way. Also, JSON instead of XML alone is a win. But now I'm just repeating #323 (comment)

So I agree with you at a high level but...

> Trying to put an RPM repository directly into an OCI registry seems a bit nuts to me.

The practical problem here is that doing anything else would require API changes in both librepo and its consumers I think. In a world where we store RPMs as OCI natively I am sure there'd be a need for a tool to "bridge" the two formats and synthesize the legacy format from the OCI one. And we'd need to be realistic about having to care about the rpm-md format for many years.

Maybe in theory librepo could translate a "rpm-md-oci" layout back into primary.xml client side in some cases?

@dralley
Contributor

dralley commented Sep 16, 2024

I guess I just don't understand what exactly is the benefit of shoving an RPM repository into an OCI artifact registry as-is, as opposed to serving it from HTTP.

I see Neal asked the same and I see @atgreen's response, but the whole argument seems to be an economic one (exploiting the fact that some companies provide free OCI registry hosting) rather than a technical one. Is that a good enough reason to add more complexity to the RPM ecosystem or are there other reasons?

Red Hat is also working on the whole repo-hosting-as-a-service service, and there are other third parties that have done the same for a while (packagecloud.io, etc.) and even free services like COPR, so I see this as just adding another way of doing something that can already be done more so than opening up a whole new use case or even business case.

> The practical problem here is that doing anything else would require API changes in both librepo and its consumers I think. In a world where we store RPMs as OCI natively I am sure there'd be a need for a tool to "bridge" the two formats and synthesize the legacy format from the OCI one.

Ok, but it kinda feels like "API changes in both librepo and its consumers" would be appropriate for this type of change?

So the question is, is the marginal utility significant enough to add a new approach that feels slightly half baked in addition to the old way that we would need to support for nigh-eternity, and also whatever new approach gets cooked up in 4 years :)

> And we'd need to be realistic about having to care about the rpm-md format for many years.

Sure, definitely agree with that

@cgwalters
Contributor

> I guess I just don't understand what exactly is the benefit of shoving an RPM repository into an OCI artifact registry as-is, as opposed to serving it from HTTP.

See the "benefits" section in my original issue which was already linked above coreos/rpm-ostree#4155

@dmesser

dmesser commented Oct 11, 2024

+1 to @cgwalters points. I would add:

  • a standardized way of providing and discovering provenance information for RPMs via OCI Referrers (beyond signatures, also SBOMs, build attestations, etc.); another example I can think of is source RPMs
  • backend storage deduplication when storing the same RPM in multiple repositories/namespaces of the OCI registry
  • rich automation workflows, when the RPM version is reflected in the OCI artifact's tag name, for registries that support lifecycle automation based on tag patterns (e.g. remove nightlies after 30 days, or some such)
  • out-of-the-box authentication and authorization found in most registry implementations
  • easy implementation of an RPM pull-through cache with most registry implementations out there; permanent mirroring is possible with simple tools like skopeo

Some of these advantages are more obvious when we talk about a self-hosted DNF repo on an httpd server rather than on infrastructure like copr or projects like katello. But OCI registries, as outlined by earlier comments, are ubiquitous these days and easy to leverage for a FOSS project

@Conan-Kudo
Member

I do not want to have this feature if the primary goal is to exploit and exhaust public OCI registries.

@dmesser

dmesser commented Oct 11, 2024

@Conan-Kudo I don't know that this is the primary goal, but as an owner of a very large public OCI registry service, I can tell you that I am much more concerned about other artifact types and actors :)

@cgwalters
Contributor

There's actually 3 levels of this, sorted in increasing levels of effort+reward:

  • Storing literally the file formats that exist today (primary.xml, foo.rpm) as OCI artifacts, requiring minimal changes to the client side, and it's relatively straightforward to convert to/from the registry
  • Replacing primary.xml with a manifest that points to the RPMs
  • Storing RPMs unpacked as .tar.zstd layers, and the RPM header as annotations inside the manifest (or maybe the config?)

The immense value of the 3rd path is it's much easier to intersect the world of RPMs and OCI container images directly - for example it would "just work" to skopeo copy docker://quay.io/fedora/fedora-rpms:kernel oci:foo and get the unpacked RPM representation.

But it'd also mean changes to RPM to accept something that looks like an OCI artifact directly as input, and it would more or less hard-require switching the way signatures are done to OCI signatures (instead of the current inline GPG stuff).

Cost - and benefit.

@ericcurtin

ericcurtin commented May 21, 2025

I like this idea, it's something I discussed with @rhatdan a couple of years ago, but ultimately never got around to writing any code or reference implementation. Meeting @cgwalters yesterday reminded me of it as he brought it up.

When I first read the code, my first concern was that this is yet another OCI pushing/pulling implementation. I really think we should try to consolidate on one implementation of OCI pushing/pulling; maybe that's the podman implementation, maybe it's another one.

I see:

    if (target->oci_state == LR_OCI_DL_MANIFEST) {
        headers = curl_slist_append(headers, "Authorization: Bearer QQ==");
        if (!headers)
            lr_out_of_memory();
        headers = curl_slist_append(headers, "Accept: application/vnd.oci.image.manifest.v1+json");
        if (!headers)
            lr_out_of_memory();
    } else if (target->oci_state == LR_OCI_DL_LAYER) {
        headers = curl_slist_append(headers, "Authorization: Bearer QQ==");
        if (!headers)
            lr_out_of_memory();
    }

and in my head I go uggggggh. Not again. We really should try and consolidate on one OCI push/pulling implementation. Ideally things like podman, bootc, ramalama, dnf, etc. should all use the same implementation, the plan for ramalama is podman artefact (edit: I got the spelling wrong again, I hope we alias that, my mind flips between US and UK english all the time and half the time I get it wrong 😄 ). @baude any thoughts in general?

I understand why it happens: dependencies, choice of language used, etc. But I still think we should try to consolidate.

@ericcurtin

ericcurtin commented May 21, 2025

> I do not want to have this feature if the primary goal is to exploit and exhaust public OCI registries.

Genuinely, I don't believe that is the primary goal. It's one of the easiest ways to start testing this though. Generally when features are started like this, it's because OCI registries are everywhere and we don't have to ask people to spin up new infrastructure, set up new firewall rules, ask IT for some exceptions, etc. Whatever OCI registry they are using for other tasks they can use for this also. Be it quay.io, ghcr.io, artifactory, dockerhub, the list goes on and on. There are also some built-in features of many OCI registries that are quite useful: scanning, authentication, usage graphs, etc.

OTOH I get your point about this being expensive in some ways. I don't think we are proposing replacing protocols, but adding another option.

This is also less expensive in some ways, less infrastructure. We can try and consolidate on compression and de-duplication techniques in OCI, etc.

@ericcurtin

ericcurtin commented May 21, 2025

Ideally we'd just call skopeo/podman or something like that. My vote at this point at least is try to integrate with podman artifact

@cgwalters
Contributor

> Ideally we'd just call skopeo/podman or something like that.

The skopeo proxy was designed for this. I am thinking about making it explicitly a stable API. See also containers/skopeo#2605

https://crates.io/crates/containers-image-proxy is just a Rust frontend for it but the protocol is designed to be usable from any programming language, it's just JSON over SOCK_SEQPACKET with fd passing.
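
To illustrate just the transport idea (JSON messages over a `SOCK_SEQPACKET` socket, with bulk data handed over as a passed file descriptor), here is a rough Python sketch. It runs both ends in one process over a socketpair; the message fields and the method name are placeholders for illustration and are not claimed to be the proxy's actual schema. It assumes Linux and Python 3.9+ (for `socket.send_fds` / `socket.recv_fds`).

```python
import json
import os
import socket

# Both ends live in this process; a real client would talk to a spawned proxy.
client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)

# Client side: one JSON request per datagram (SEQPACKET keeps message boundaries).
# The method name and args below are illustrative placeholders.
client.send(json.dumps({"method": "GetBlob", "args": ["sha256:example"]}).encode())

# "Proxy" side: read the request, then reply with JSON plus a pipe fd for the data.
request = json.loads(server.recv(65536))
read_fd, write_fd = os.pipe()
os.write(write_fd, b"blob bytes")   # small payload, fits in the pipe buffer
os.close(write_fd)
socket.send_fds(server, [json.dumps({"success": True}).encode()], [read_fd])
os.close(read_fd)                   # the receiver now holds its own copy of the fd

# Client side: receive the JSON reply and the passed fd, then stream from the fd.
data, fds, _flags, _addr = socket.recv_fds(client, 65536, 1)
reply = json.loads(data)
with os.fdopen(fds[0], "rb") as stream:
    payload = stream.read()

client.close()
server.close()
```

Keeping control messages as small JSON datagrams and moving blob contents over a separate fd is what makes this kind of protocol easy to speak from C, Python, or anything else with Unix sockets.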

> My vote at this point at least is try to integrate with podman artifact

Most typical RPM use cases fetch the RPMs and then unpack them right? So as is today storing with podman artifact I think would in theory more be about potentially replacing the /var/cache/rpm case only with keepcache=1 but that's a bit unusual and weird.
Not to go too deep on this, but what we really want in all of this is more like a unified object store (ref discussion in coreos/rpm-ostree#5386 (comment)).

@ericcurtin

ericcurtin commented May 21, 2025

Nice insights 😄 and how do you envision the push side? One thing I wonder about the push side, leaving the protocols and storage aside for a minute. One thing I'd like to do is change the experience to:

sometool push "(some remote location quay.io/org/repo)" "(an rpm package or many packages)"

and the createrepo step, etc. just happens and "just works" on the remote end for a user with the correct permissions.

@lcarva

lcarva commented Sep 3, 2025

I created a draft PR to enable this functionality as a DNF plugin: rpm-software-management/dnf-plugins-core#591

It takes the minimal approach of simply re-using the repodata/* files you'd find in a regular yum repo.

It's not perfect but it does work and maybe it helps with experimentation.

(NOTE: The plugin is targeting DNF4. My understanding is that it needs to be completely rewritten (to cpp?) to support the upcoming DNF5.)
