Add support for OCI registries. #323
base: master
This patch introduces the ability to read repo contents from OCI
registries, like ghcr.io, using the new 'oci' protocol. Users can
specify this protocol in their DNF .repo files as shown below:
[oci-test]
name=OCI Test
baseurl=oci://ghcr.io/atgreen/librepo/gh-cli
enabled=1
gpgcheck=1
To set up the server-side repository, create a public package
repository on GitHub, and populate it by pushing the repo file
contents using the ORAS CLI tool.
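createrepo .
# Push every file as its own artifact; read the list line by line so
# that paths containing spaces are not split into several words.
find . -type f | sed 's|^\./||' | while IFS= read -r FILE; do
    oras push "ghcr.io/atgreen/librepo/gh-cli/$FILE:latest" "$FILE"
done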
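For illustration only, here is a rough Python sketch of how a client could translate an oci:// baseurl plus a repo-relative path into OCI distribution API URLs, given the one-OCI-repository-per-file layout that the push loop above produces. The function name `oci_endpoints`, the dictionary keys, and the choice of HTTPS are assumptions made for this example; the patch itself may build these URLs differently.

```python
from urllib.parse import urlsplit


def oci_endpoints(baseurl, relative_path, tag="latest"):
    """Map an oci:// baseurl and a repo-relative file path to registry URLs.

    Hypothetical helper: it assumes each file lives in its own OCI
    repository named <baseurl path>/<relative path>, tagged 'latest'.
    """
    parts = urlsplit(baseurl)
    if parts.scheme != "oci":
        raise ValueError("expected an oci:// URL, got: " + baseurl)
    # OCI repository name = path of the baseurl + path of the file.
    name = parts.path.strip("/") + "/" + relative_path.lstrip("/")
    root = "https://" + parts.netloc + "/v2/" + name
    return {
        # GET /v2/<name>/manifests/<reference> returns the manifest JSON.
        "manifest": root + "/manifests/" + tag,
        # GET /v2/<name>/blobs/<digest> returns the file content.
        "blob_prefix": root + "/blobs/",
    }


if __name__ == "__main__":
    urls = oci_endpoints("oci://ghcr.io/atgreen/librepo/gh-cli",
                         "repodata/repomd.xml")
    print(urls["manifest"])
```

Note that ghcr.io may still require an anonymous pull token even for public repositories; that step is left out of this sketch.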
Currently, only public repositories are supported. To support private
package repositories, a bearer token is required. Implementing this
would necessitate changes to libdnf to allow for a bearer_token
configuration option in .repo files.
= changelog =
msg: Add support for OCI registries
type: enhancement
|
Thanks for starting this! A lot of prior discussion in e.g.
And probably other places. I only skimmed the code but I think there's slightly more to OCI than that, unless I'm missing something? There is an implementation of talking to OCI registries that lives in flatpak today https://github.com/flatpak/flatpak/blob/main/common/flatpak-oci-registry.c and it would clearly make sense to share code. That said...IMO we would gain the most value by using the github.com/containers/image library - we can do this via the (Also a few other notes; while just stuffing the existing XML in a registry is clearly the shortest path, since we're changing the protocol IMO we could consider just replacing XML with JSON or so too, and maybe even do some other cleanups and simplifications. OTOH that would make migration a bit harder) |
I don't think so. Every file is pushed as a single layer/blob, and you just ignore version tagging (tag everything with 'latest'). To pull a file you grab the manifest json, find the hash of the first/only layer, pull that down and checksum it. This approximates what you get with http-hosted repos.
That's an orthogonal issue and doesn't have to be tied to OCI support. My patch is very small, and introduces a nice new capability. Thank you for considering this change! |
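The pull flow described above (grab the manifest JSON, find the digest of the first/only layer, download that blob, checksum it) can be sketched in Python as follows. This is a minimal illustration and not the code from the patch: the function names are made up for the example, only sha256 digests are handled, and the network fetches are left out so the snippet stays self-contained.

```python
import hashlib
import json


def first_layer_digest(manifest_json):
    """Return the digest of the first (and, in this layout, only) layer."""
    manifest = json.loads(manifest_json)
    layers = manifest.get("layers", [])
    if not layers:
        raise ValueError("manifest has no layers")
    return layers[0]["digest"]


def verify_blob(blob, digest):
    """Check downloaded bytes against an '<algorithm>:<hex>' digest string."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        # Assumption for this sketch: only sha256 digests are supported.
        raise ValueError("unsupported digest algorithm: " + algorithm)
    return hashlib.sha256(blob).hexdigest() == expected


if __name__ == "__main__":
    # Stand-in data instead of real registry responses.
    blob = b"<repomd/>"
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    manifest = json.dumps({
        "schemaVersion": 2,
        "layers": [{"digest": digest, "size": len(blob)}],
    })
    assert verify_blob(blob, first_layer_digest(manifest))
```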
|
What specifically do you hope to gain with |
|
As I think through this, I'm trying to understand how the contents of a repository would be represented in an OCI registry. I can't think of a way this would be efficient and effective. If each file is a layer in an "image" blob, then it becomes massively expensive to fetch individual files. If each file is its own image "blob", it results in disconnected spew that's difficult to store and replicate. |
OCI registries are increasingly being used as general-purpose artifact storage. For instance, Homebrew uses OCI registries for all of its packages and related artifacts. Just ignore the layers. The repo structure can look just like it does on a web server. I was able to mirror my repos into ghcr.io, and it works just the same, except that now I don't have to maintain hosting infrastructure. The OCI ecosystem also has mature tooling to help manage/mirror them. Not worrying about hosting infrastructure or bandwidth costs is a huge win. There are many instantly available and free OCI hosting options out there. If you consider enterprise use cases, companies are increasingly competent at internal hosting of OCI registries thanks to container adoption. Sharing RPMs through an enterprise OCI registry will be easier for many users than trying to deploy/maintain a web server or convince internal Satellite maintainers to host their content. |
Yes, my implementation ignores layers. Each file is a blob, but they are structured in the OCI registry exactly as they would appear on a web server. |
|
I'm also interested in the topic of how an RPM repository might be mapped into an OCI registry. I see a few challenges with the current approach though. The proposed implementation right now creates one new OCI repository for every file in the RPM repository. For example, in the ghcr.io registry, If you push an RPM repository with 1000 files, you'll end up with 1000 OCI repositories, which is probably not acceptable on many OCI registry implementations. I also think using the names of files in the RPM repo as OCI repository names will run into issues. Per the spec at https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pulling-manifests , repository names need to match a certain regular expression. It definitely doesn't match all legal RPM filenames; for example, it only allows lowercase letters. |
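To make the naming concern concrete, here is a small Python check against the repository-name pattern. The regular expression is reproduced from the distribution spec as best I recall it, so please verify it against the linked document before relying on it; the function name and the sample file names are invented for the example.

```python
import re

# Repository <name> pattern as given in the OCI distribution spec
# (reproduced from memory; check the spec for the authoritative form).
NAME_RE = re.compile(
    r"[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*"
    r"(/[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*)*"
)


def is_valid_repository_name(name):
    """Return True if the whole string matches the repository-name pattern."""
    return NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical paths, as the per-file push scheme would name them.
    for candidate in [
        "atgreen/librepo/gh-cli/repodata/repomd.xml",
        "atgreen/librepo/gh-cli/Packages/gh.rpm",      # uppercase letter
        "atgreen/librepo/gh-cli/libstdc++-13.1.rpm",   # '+' characters
        "atgreen/librepo/gh-cli/foo-1.0~rc1.rpm",      # '~' character
    ]:
        print(candidate, is_valid_repository_name(candidate))
```

Under this pattern, uppercase letters, `+`, and `~` are all rejected, and each of those can appear in RPM file names.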
|
I would be much more interested in investigating whether a more "native" approach is feasible (or reasonable) for RPMs. That means - no repomd.xml, no primary.xml, etc. Trying to put an RPM repository directly into an OCI registry seems a bit nuts to me. |
It's a question of how deep one wants to go down the rabbit hole. I would agree that it seems quite obvious to kill So I agree with you at a high level but...
The practical problem here is that doing anything else would require API changes in both librepo and its consumers I think. In a world where we store RPMs as OCI natively I am sure there'd be a need for a tool to "bridge" the two formats and synthesize the legacy format from the OCI one. And we'd need to be realistic about having to care about the rpm-md format for many years. Maybe in theory librepo could translate a "rpm-md-oci" layout back into primary.xml client side in some cases? |
|
I guess I just don't understand what exactly is the benefit of shoving an RPM repository into an OCI artifact registry as-is, as opposed to serving it from HTTP. I see Neal asked the same and I see @atgreen's response, but the whole argument seems to be an economic one (exploiting the fact that some companies provide free OCI registry hosting) rather than a technical one. Is that a good enough reason to add more complexity to the RPM ecosystem or are there other reasons? Red Hat is also working on a whole repo-hosting-as-a-service offering, and there are other third parties that have done the same for a while (packagecloud.io, etc.) and even free services like COPR, so I see this as just adding another way of doing something that can already be done, more so than opening up a whole new use case or even business case.
Ok, but it kinda feels like "API changes in both librepo and its consumers" would be appropriate for this type of change? So the question is, is the marginal utility significant enough to add a new approach that feels slightly half baked in addition to the old way that we would need to support for nigh-eternity, and also whatever new approach gets cooked up in 4 years :)
Sure, definitely agree with that |
See the "benefits" section in my original issue which was already linked above coreos/rpm-ostree#4155 |
|
+1 to @cgwalters points. I would add:
Some of these advantages are more obvious when we talk about a self-hosted DNF repo on an httpd server rather than on infrastructure like copr or projects like katello. But OCI registries, as outlined by earlier comments, are ubiquitous these days and easy to leverage for a FOSS project |
|
I do not want to have this feature if the primary goal is to exploit and exhaust public OCI registries. |
|
@Conan-Kudo I don't know that this is the primary goal, but as an owner of a very large public OCI registry service, I can tell you that I am much more concerned about other artifact types and actors :) |
|
There's actually 3 levels of this, sorted in increasing levels of effort+reward:
The immense value of the 3rd path is that it's much easier to intersect the world of RPMs and OCI container images directly - for example it would "just work" to But it'd also mean changes to RPM to accept something that looks like an OCI directly as input - and it would effectively hard-require switching the way signatures are done to OCI signatures (instead of the current inline GPG stuff). Cost - and benefit. |
|
I like this idea, it's something I discussed with @rhatdan a couple of years ago, but ultimately never got around to writing any code or reference implementation. Meeting @cgwalters yesterday reminded me of it as he brought it up. When I first read the code, my first concern was that this is another OCI pushing/pulling implementation; I really think we should try to consolidate on one implementation of OCI pushing/pulling, maybe that's the podman implementation, maybe it's another one. I see: and in my head I go uggggggh. Not again. We really should try and consolidate on one OCI push/pulling implementation. Ideally things like podman, bootc, ramalama, dnf, etc. should all use the same implementation, the plan for ramalama is podman artefact (edit: I got the spelling wrong again, I hope we alias that, my mind flips between US and UK English all the time and half the time I get it wrong 😄 ). @baude any thoughts in general? I understand why it happens, because of dependencies, choice of language used etc. But I still think we should try and consolidate. |
Genuinely, I don't believe that is the primary goal. It's one of the easiest ways to start testing this though. Generally when features are started like this, it's because OCI registries are everywhere and we don't have to ask people to spin up new infrastructure, set up new firewall rules, ask IT for some exceptions, etc. Whatever OCI registry they are using for other tasks they can use for this also. Be it quay.io, ghcr.io, artifactory, dockerhub, the list goes on and on. There are also some built-in features of many OCI registries that are quite useful: scanning, authentication, usage graphs, etc. OTOH I get your point about this being expensive in some ways. I don't think we are proposing replacing protocols, but adding another option. This is also less expensive in some ways, less infrastructure. We can try and consolidate on compression and de-duplication techniques in OCI, etc. |
|
Ideally we'd just call skopeo/podman or something like that. My vote at this point at least is try to integrate with podman artifact |
The skopeo proxy was designed for this. I am thinking about making it explicitly a stable API. See also containers/skopeo#2605 https://crates.io/crates/containers-image-proxy is just a Rust frontend for it but the protocol is designed to be usable from any programming language, it's just JSON over SOCK_SEQPACKET with fd passing.
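As a rough illustration of what "JSON over SOCK_SEQPACKET with fd passing" looks like in general, here is a self-contained Python toy that plays both sides over a socketpair. It is not the skopeo proxy protocol: the method name `HypotheticalGetBlob`, the reply fields, and the idea of delivering the payload through a pipe are assumptions made for the example. It needs Linux (AF_UNIX SOCK_SEQPACKET) and Python 3.9+ for `socket.send_fds`/`socket.recv_fds`.

```python
import json
import os
import socket


def send_request(sock, method, args):
    """Send one JSON request as a single SEQPACKET message."""
    sock.send(json.dumps({"method": method, "args": args}).encode())


def serve_one(sock):
    """Toy server: answer one request and pass a pipe fd holding the payload."""
    request = json.loads(sock.recv(65536))
    read_fd, write_fd = os.pipe()
    os.write(write_fd, ("payload for " + request["method"]).encode())
    os.close(write_fd)  # reader sees EOF after the payload
    reply = json.dumps({"success": True, "value": request["args"]}).encode()
    # The JSON reply and the file descriptor travel in the same message.
    socket.send_fds(sock, [reply], [read_fd])
    os.close(read_fd)  # the receiver gets its own duplicate


def read_reply(sock):
    """Receive the JSON reply plus one fd, then drain the fd."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 65536, 1)
    with os.fdopen(fds[0], "rb") as pipe:
        payload = pipe.read()
    return json.loads(data), payload


def demo():
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        send_request(client, "HypotheticalGetBlob", ["sha256:abc"])
        serve_one(server)
        return read_reply(client)


if __name__ == "__main__":
    print(demo())
```

Because SEQPACKET preserves message boundaries, each request and reply is one `send`/`recv` with no extra framing, which is part of why such a protocol is easy to speak from any language.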
Most typical RPM use cases fetch the RPMs and then unpack them right? So as is today storing with podman artifact I think would in theory more be about potentially replacing the |
|
Nice insights 😄 and how do you envision the push side? One thing I wonder about the push side, leaving the protocols and storage aside for a minute. One thing I'd like to do is change the experience to: sometool push "(some remote location quay.io/org/repo)" "(an rpm package or many packages)" and the createrepo step, etc. just happens and "just works" on the remote end for a user with the correct permissions. |
|
I created a draft PR to enable this functionality as a DNF plugin: rpm-software-management/dnf-plugins-core#591 It takes the minimal approach of simply re-using the It's not perfect but it does work and maybe it helps with experimentation. (NOTE: The plugin is targeting DNF4. My understanding is that it needs to be completely rewritten (to cpp?) to support the upcoming DNF5.) |