A new pitch into the ring for this series of topics started when @paulstacey shared his published post on AI From an Open Perspective and offered to be part of open discussions here, which have seen great spurts of activity, especially from @wernerio.
I’m hardly the first to point out the problems of treating AI as a single entity, and much concern, discussion, and also some really good experimentation are happening with the most popular flavor, the Large Language Models that produce Generative Text (are the ones that create images Large Image Models?).
One of the more useful toolsets outside the LLM/LIM space (made that one up) is audio transcription. I have had a really great experience changing up my approach to podcast editing using Descript, and auto transcription has been around in YouTube for a long time, maybe getting a little better than its previously worse results.
I did not even know this until recently, but Google Slides has a feature for doing auto captions live as you present. Is that AI or not? Shrug.
But one link I followed from Paul’s post (that’s what a web is good for) came up as an interesting element of thinking about what open enables: this New Yorker article (maybe 3 free views before the paywall comes up) by James Somers:
Somers emphasizes in the article how much spun out when OpenAI actually did open source its Whisper automatic speech recognition.
If I understand right, this led to the openly available whisper.cpp code which, unlike LLMs, is self-contained, meaning it can be downloaded, modified, and put into apps. On a recommendation from a colleague, I gave some short and very impressive spins to the OSX app MacWhisper (also available as an iOS app).
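To make the “self-contained” part a little more concrete, here is a minimal sketch of running Whisper locally with the Python package OpenAI released (`pip install openai-whisper`), rather than whisper.cpp itself; the audio file name is just a placeholder, and whisper.cpp and MacWhisper wrap this same model in C/C++ and a Mac app:

```python
# Minimal local transcription sketch using OpenAI's open-sourced Whisper package.
# "podcast_episode.mp3" is a hypothetical file name, not from the post.
import whisper

model = whisper.load_model("base")              # downloads the model weights once, then runs locally
result = model.transcribe("podcast_episode.mp3")  # no API key or remote service needed
print(result["text"])                             # the full transcript as plain text
```

The point is that the whole thing runs on your own machine; nothing leaves your computer, which is part of why it has been so easy to build it into other apps.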
I read that Whisper was definitely trained on web content, so does this mean transcription of audio is subject to the limits and biases of that data set? I don’t know, but I do think Somers has a great point about the difference open sourced AI makes.
(As an aside, I remember how James Somers had long ago done a brilliant hack to rewire Google Docs to play back an entire document’s history; see his story behind the Draftback extension, which still works (and is not AI).) But again, he was able to do that because everything Google did to enable multiple editors was visible (at least for someone like Somers to peek into the source script), making it open (not in the licensed sense) to do this.
Is there something in the quest for Open to look at in how Whisper as an open source release matters, or what it means for other parts of the larger spectrum? Some hope? What are the other AI enabled platforms and tools that might also be as open?
Whisper a response here