Tagged for OEG Connect: Does Open Source AI really exist?

What’s of interest? Does Open Source AI really exist?

Tell me more!


The OSI’s definition of “Open Source AI” pokes a big hole into the idea of Open Source: By making a core part of the model – the training data – special in this weird wibbly wobbly way, they are blessing all kinds of things as “Open Source” that really are not – based on their own definition of what Open Source is and what it’s for.

An AI system’s training data is for all intents and purposes part of its “code”. It is as relevant to the way the model functions as literal code, for AI systems probably even more because the code is just generic matrix operations with delusions of grandeur.

But for the systems that we are talking about today as “AI” Open Source AI isn’t practically possible. Because we’ll never be able to download all the actual training data.

“But tante, then we will never have Open Source AI”. Exactly. That’s how reality works. If you can’t fulfil the criteria of a category you are not in that category. The fix is not to change the criteria. That’s playing pigeon chess.

Where is it?: https://tante.cc/2024/10/16/does-open-source-ai-really-exist/


This is one among many items I will regularly tag in Pinboard as oegconnect, and automatically post tagged as #OEGConnect to Mastodon. Do you know of something else we should share like this? Just reply below and we will check it out.

Or share it directly to the OEG Connect Sharing Zone

Hello, as I dive too deep into the meaning of Open Source, you are probably already aware of this, but one important thing to understand is that the Open Source Initiative and its definitions (whether the Open Source Definition or the Open Source AI Definition) are definitely not unanimously accepted.

If you followed the debates regarding the OSAID, the availability of data sparked a flame war within the community, with a lot of disagreement about this choice by the OSI. A good recap is the article « The tech industry can’t agree on what open-source AI means. That’s a problem ».

Because we’ll never be able to download all the actual training data.

Not sure what you have in mind, but there are some models moving towards being fully open source. Sure, not at OpenAI scale, but they exist. I’m thinking of Olmo, which claims to be the first fully open source AI model; in France, some open software organisations started the project Lucie (« LUCIE — the truly open source AI built on transparency, trust, and efficiency ») with this intent; and the Swiss polytechnic universities (EPFL/ETH) launched something fully open to foster research in the field.

There are open datasets created specifically for training AI, like Common Corpus. With quality datasets available online, and because building one is an extremely time-consuming task, more and more people may rely on this kind of open dataset.

The amount of data required may be insane, but that could be a quite specific case (and it may be possible to rely on the cloud to some extent?); things may be downloadable.

Not sure what you had in mind, or whether it was more about the ability to download, but I think AI may increasingly rely on shared open data :man_shrugging:

The meaning of open source is a source of interpretation and of conflict over the definition; the Organization for Ethical Source represents this political divide over the meaning of open source.

Open Source 2.0

Project link : https://opensource2.cc/

I take this opportunity to share some of my work/hypotheses that challenge this definition of open source even further, because I think it’s critical in open education.

Open Source is probably not a software topic. The open source community certainly doesn’t understand open source, starting with the Open Source Initiative.

Open source software is software that can be freely used, modified, and shared (in both modified and unmodified form) by anyone. Today the concept of “open source” is often extended beyond software, to represent a philosophy of collaboration in which working materials are made available online for anyone to fork, modify, discuss, and contribute to.

Open Source definition in GitHub’s glossary

Definition related to the article « Open source, not just software anymore » (2014)

Open source may be about resources whose source files are provided, for example the odt/docx/latex/markdown of a pdf, the svg/psd files of a jpeg, and so on.
As an extension of the concept of « open educational resources », the idea of « open source educational resources » was suggested to designate OER that come with their source files, since it is common not to share them. Open Source is used beyond software, in open education at least. Because of the lack of understanding of open source in open education, we regularly end up having the right to modify without the ability to do so.

The Journal of Open Source Education is also developing this idea of open source educational resources (or materials): the availability of source files becomes a criterion for publishing O(S)ER in the journal.

Definition of Free Cultural Work

Availability of source data: Where a final work has been obtained through the compilation or processing of a source file or multiple source files, all underlying source data should be available alongside the work itself under the same conditions. This can be the score of a musical composition, the models used in a 3D scene, the data of a scientific publication, the source code of a computer application, or any other such information.

*If you provide the source for the image so that I can modify it and it uses an OSI compliant license, then yes, I would say it’s an open source image.*

*What constitutes Open Source or not is very clear now for software, but for other artifacts, that might not be so clear.*
A member of the Open Source Initiative in a private email.

The meaning of open source is controversial and the Open Source Initiative is probably not a good reference when it comes to the meaning of « open source ».

References (because I’m limited with 4 links):

Thanks Simon for a comprehensive response. To be clear (and I need to revise my system for how these posts are triggered) those remarks are not mine but directly from the link shared:

https://tante.cc/2024/10/16/does-open-source-ai-really-exist/

Yes, I fully agree that neither OSI nor likely anyone else will craft the one definition to rule them all, and I appreciate your list of projects making a good effort towards open.

I’m just trying to stir things up, so success!
