Understanding / Doing Some AI

cogdog · June 14, 2022, 5:27am

Artificial Intelligence, can we understand it without being data scientists? Is it too complex to grapple? Do we just have to trust the experts, like the ones who claim sentience?

No. I hope not.

But I do know I can better come to my own understanding if I can do more than read papers and blog posts and actually dabble in a technology. I am somewhat propelled by a fabulous discussion started by @wernerio for the OEGlobal “Unconference”:

I want to connect… by doing AI.

Are You Seeking AI? flickr photo by cogdogblog shared into the public domain using Creative Commons Public Domain Dedication (CC0)

I started with the familiar step of signing up for a MOOC, AI for Everyone, taught by that guy who co-launched Coursera. As usual I watched a few videos and then never went back.

One AI resource I might explore again is Hugging Face, which calls itself open source. A few years back I used it to show students how to create a bot that completes sentences based on their twitter history as the text source, see (my demo) or as an example

While I cannot make too much out of their open source code, I prefer to try to dabble myself to better understand the limits and potentials. I was interested in their free course in NLP, though I see it requires strong Python skills.

Also something I spotted worth exploring:

And I find these types of playful sites interesting

I don’t know where I plan to go with digging into AI, I want to get a better sense of what I can do hands on.

Anyone want to explore too?

Note: This is one example of an Open Pedagogy Adventure topic - you can start your own here, just make a new topic, post, share, do something, update, repeat!

annarmills · June 15, 2022, 1:44am

Thanks for sharing all this! I have been diving into AI recently and would love to discuss. I’ve been tweeting about this at @EnglishOER. I did make an OpenAI account and play around with GPT3–it’s riveting. No coding needed. Very easy to get started–just give it instructions.

There is also a free DalleMini version from HuggingFace that you can use to generate images from text–I was thinking that could be useful for OER graphics. I got a neat image of ambiguity.

I’m exploring EleutherAI as an open alternative to GPT3 (the OpenAI name is unfortunate and ironic). I think they make GPT-J and GPT-Neo, but they don’t work as well.

My thinking is that students are already starting to use AI to support learning and there are big issues of access since most GPT-3 enabled writing sites like Jasper have to charge. It would be great if the OER community started thinking about if and how we incorporate such tools alongside OER textbooks.

The openness of OER could be really helpful for fine-tuning an AI tool to, say, generate discussion questions around textbook material. There are significant problems with bias, though.

I made a couple of videos of me exploring GPT3 if that’s of interest…

Looking forward to trading notes! What kinds of things would you like to try with GPT3?

cogdog · June 15, 2022, 4:50am

Thanks so much Anna, you are way ahead of me, so thanks for a few suggestions. I don’t have a specific goal or plan but am wanting to look for what we can do besides feeding these machines a prompt and seeing what pops out.

But please drop ideas or share what you are probing.

poritzj · June 15, 2022, 5:47pm

So I was on the Creative Commons Working Group on AI and Copyright, last year. I have what I thought was a fairly fringe view on AI, but apparently I was able to convince the folks in the CC AI WG of my perspective, so we wrote a whitepaper which is (maybe?) a different take on the consequences of AI, here: https://ccai.pubpub.org/ .
Just to say super-quickly what the point is: I would say that AI is a big nothingburger. I mean, sure, various new algorithms do pretty amazing things, but there is nothing there which is qualitatively different, in a way that matters for politics or policy (or human happiness and freedom) from any previous new advance in algorithms. (I call this the “There’s no there, there” perspective.)
Essentially, I think we should never use the phrase “artificial intelligence,” it’s stupid and misleading. Instead, we should talk about “statistical models” or even just “computer programs/tools.” Hyping up some new statistical model is silly, it would be as if Legendre and Gauss had said in the early 19th century, when they started doing what we would now call linear regression, that they had a whole new thing. Maybe they would call it “linear thinking,” and assert that it was transformative and would get directly – linearly!! – to the solution of previously unsolvable problems. Sure, it was good applied math, but it wasn’t something of a completely different character than other applied math, and it shouldn’t change the way we think about how math/technology is applied to human situations and needs. …That’s the way I feel about “AI.”
But I’m happy to look at the cool new algorithms, too! They do nice things, for sure.

cogdog · June 15, 2022, 5:54pm

Thanks for this Jonathan that’s why I like you so much. I too have found a lacking “there” there, and all of its conjectures of potential are wrapped in a foggy haze.

I like the phrase nothing burger. I want to Photoshop it.

All of these things amount to feeding in text, turning the crank, and POP, something comes out.

I find it all A and no I.

But still, to dig in I want to do something more than read blog posts or watch video talks, I want to get my hands dirty in the soil.

annarmills · June 15, 2022, 5:58pm

Very cool. Just getting my bearings myself. Full of enthusiasm but need perspective.

I reached out to Delmar Larsen of LibreTexts to see if they were planning anything AI-related and it sounds like they are exploring some partnerships, so maybe at some point he can join this conversation…

I started a brainstorm Padlet at AIPedagogy.com for ideas and someone on Twitter suggested a wiki, but I don’t know how to set that up…

cogdog · June 15, 2022, 6:08pm

Woah, just took a skim reading here, @poritzj I have some homework to do, these are more than helpful.

cogdog · June 15, 2022, 6:11pm

I’d love to have @DelmarLarsen in on this!

Not sure a wiki is needed. Padlets are fine, could easily be also a shared editable doc or etherpad. For brainstorming, fine. Or we could do some collective bookmarking in diigo. Me? I do some bookmarking in Pinboard

https://pinboard.in/u:cogdog/t:ai/

cogdog · June 15, 2022, 6:22pm

So I went to my old bookmarks and found a dead link for Story AI but with searches, found Deep Story.

You can create characters, dialog, scenes, or let the machine do it. I fed it some of Jonathan’s lines and hit my free limit with my “Deep Story Creation”

https://deepstory.ai/X78trNNqPDVHTf7K

One gets a bit distracted in the mechanics, but yeah, I am not finding much burger. I am not getting anything better or more interesting than I might imagine.

cogdog · June 15, 2022, 7:31pm

Also worth noting on the Creative Commons front are the questions about the ethics of using CC licensed flickr photos for training algorithms. Adam Harvey provides deep, dark reading and analysis that pries open the ethical beast.

Recently, a debate has emerged over whether Creative Commons licenses are still relevant in the context of how images are being collected, used, and distributed in image training datasets related to the development of artificial intelligence (AI) and in particular face recognition technologies (FRT).

Research and numerous unsettling investigations during the last several years exposed a rift between how Creative Commons (CC) was originally designed to be used and its current prevalence in training datasets for AI systems, many of which have direct application to commercial mass surveillance technologies. Statements from Creative Commons in 2019 tried to address the issue, explaining that Creative Commons licenses were simply designed to facilitate “greater openness for the common good” by unlocking copyright.

This report unfolds how licenses once designed to facilitate “openness for the common good” have been misinterpreted to eventually become synonymous with a misguided “free and legal for all” logic that often ignores the legal requirements of Creative Commons. The research presented below is only the tip of the iceberg. What is available publicly through analyzing research papers and GitHub repositories is a fraction of what happens behind closed doors, in proprietary systems, and by security agencies. The goal of this report is to provide context and accessible information on a technical topic with the intention of eventually helping to facilitate new image license schemes better suited to protect Internet users and creators in an era of increasingly data-driven artificial intelligence systems.

As a long time flickr user where some 60000 of my photos are licensed CC0 or CC BY, I had explored Adam’s earlier project (https://exposing.ai/), where I was able to see that:

323 of my openly licensed photos were in the Megaface dataset
Only 1 of my photos (a really old one of me) was used in Google Facial Expression Classification, and the same one was also used in IBM’s Diversity in Faces.

So what does one do knowing that photos of friends, colleagues, and sometimes myself are the raw material for facial recognition data sets? It feels wrong, but in my license choice I relinquished the control over the reuse of my photos.

Still, this does not seem to be in the spirit of the licenses; the law is less caring.

poritzj · June 15, 2022, 8:11pm

There was a very fun talk at the last in-person CC Global Summit (in Lisbon, in the before-times) by someone … I could dig up his name, although it escapes me at the moment … about this issue. He suggested that CC needed a new license clause: “NoTerminators” with accompanying icon of a sort of stylized skull of one of the Arnold Schwarzenegger terminators inside a circle (like the little person in a circle for “BY”) which meant that folks are free to reuse, etc., the work – but not to use it to train some algorithm!
Unfortunately, the legal experts in the room thought this was not possible, because copyright does not protect the extraction of statistical features from some copyrighted work. E.g., the relative frequencies of different letters in a brand new novel are not a protected fact about that copyrighted work, they are just a fact about the world. And facts are not copyrightable, by the idea/expression distinction in copyright law. So the NoTerminators clause is not a viable approach. In Europe, though, there is a brand new, sui generis database right, which gives a form of intellectual property to those who collect data in databases, which allows them to control whether the database may be used for algorithm training. It’s a new right, distinct from copyright, which is quite related to this thing about (statistical) facts in copyrighted works used for training datasets. Although this new database right seems like it is opening up the path for protections like the NoTerminators license (which is impossible only using copyright, as I said), it was not proposed by “the good guys.” In fact, it was proposed by big tech firms who want to have a new way to protect the data they collect with surveillance capitalism!
Anyway, lots of interesting stuff going on here…
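
To make the letter-frequency point concrete, here is a minimal sketch added for illustration (it is not from the Summit talk or the whitepaper). The function name `letter_frequencies` and the sample sentence are assumptions chosen for the example. It shows how little of the original expression survives once a text is reduced to this kind of statistic:

```python
from collections import Counter


def letter_frequencies(text):
    """Return the relative frequency of each letter a-z in `text`.

    Case is ignored and anything that is not a-z is skipped.
    (Hypothetical helper, written only for this illustration.)
    """
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # Sort by letter so the result is stable and easy to read.
    return {letter: count / total for letter, count in sorted(counts.items())}


# Assumed sample text; any novel's full text could be substituted.
sample = "It was the best of times, it was the worst of times."
freqs = letter_frequencies(sample)

for letter, share in freqs.items():
    print(f"{letter}: {share:.3f}")
```

The output is a table of numbers from which the sentence cannot be reconstructed, which is roughly why such features were described as facts rather than protected expression.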

cogdog · June 15, 2022, 8:22pm

Whew, my general rule is We Do Not Need More Kinds of Licenses!

Also, here are some interesting thoughts from Morten Rand-Hendriksen making a case that the fact that humans think AI might be sentient is more problematic than AI being sentient.

DelmarLarsen · June 15, 2022, 8:58pm

We just started to look at allowing access of our corpus to AI for training purposes. We did it successfully for a workforce related NSF project and we are starting to eye other projects including building a native chat bot for LibreTexts and an open-ended question pseudo-evaluator (not checker).

We are selective in who we work with and have a mission mismatch with many potential partners that are looking to monetize either our traffic or the OER we host (even within the permissions of CC licensing).

As AI researchers start to look for more mechanisms to train their systems, I expect to get more requests to access our corpus. The standardization and centralization efforts we do greatly facilitate this and provide a better potential to share back to the community.

cogdog · June 15, 2022, 9:02pm

I am clogging my own thread, but Morten’s article referred to the Hugging Face DALL•E mini generator for trying an AI image maker.

I entered this prompt for @poritzj: “A nothing burger on a red plate”

It produced some images of burgers, yes, on a red plate, but the lettuce is terrible, and the second burger in the top row is scary.

So this amazing technology that will revolutionize society, with all its alleged sentience, can quite easily create a red plate. It can put a burger-ish object on it, but the garnish is unrealistic. And it completely misses the nuance of being able to make sense of a phrase like “nothing burger”.

This algorithm can do literal interpretations, but has nothing even close to intelligence or intuition for the ability to make meaning of things it does not know.

So I made my own, found a CC licensed flickr image of a burger (the original it was based on has been removed from flickr) and edited out the burger.

poritzj · June 15, 2022, 9:14pm

Interesting! One could imagine a future “OER-generator” built on top of some large corpus of existing OER, along the lines of Microsoft’s “AI” (statistical model, really) “Copilot” that uses the GitHub corpus to make programming suggestions to paying customers.
Delmar: if MS (or Lumen, or some other for-profit entity) comes to buy your corpus, please don’t give it to them!

DelmarLarsen · June 15, 2022, 11:26pm

First chance I get. After all, I am known for exploiting OER for personal financial gain (that is sarcasm in case it is not obvious).

DelmarLarsen · June 15, 2022, 11:28pm

However, I did have Top Hat try to convince me to give them our corpus in exchange for a kickback to my university. So there are many for-profit enterprises interested in taking advantage of OER efforts.

cogdog · June 16, 2022, 9:25pm

I found a cache of media-based “ai” experiments (I am following the lead of @poritzj of lowercasing/quoting of the two letters as a signal of questioning).

This was one of those sideways link click adventures; despite how down on the edtech world people are getting, I still find gems by web serendipity. So, I moderate the DS106 Daily Create, which has been publishing a small creative challenge each day since 2012.

That’s not relevant, but one submission by a great participant we call @dogtrax (Kevin, a 6th grade teacher and musician) included a link to explore an audio generating “ai” experiment called the Infinite Drum Machine.

I admit I had a hard time figuring this out, at least until I followed the link to the GitHub code:

Thousands of everyday sounds, organized using machine learning.

Sounds are complex and vary widely. This experiment uses machine learning to organize thousands of everyday sounds. The computer wasn’t given any descriptions or tags – only the audio. Using a technique called t-SNE, the computer placed similar sounds closer together. You can use the map to explore neighborhoods of similar sounds and even make beats using the drum sequencer.

Now I am diving down Wikipedia holes trying to understand t-distributed stochastic neighbor embedding, which is a bit over my coding pay grade. But it’s a method used here to let the mythical “ai” beast associate everyday sounds.
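
For anyone else peering down the same hole, here is a rough sketch of only the first step of t-SNE, not the whole algorithm and not the Infinite Drum Machine's code. For each item it computes how strongly every other item counts as a "neighbor", using a Gaussian on distance. Real t-SNE tunes the width per point from a "perplexity" setting and then searches for a 2D layout that preserves these neighbor weights; full implementations exist in libraries such as scikit-learn (`sklearn.manifold.TSNE`). The three-number "sound features", the sound names, and the fixed `sigma` below are all made up for illustration.

```python
import math


def neighbor_probabilities(points, sigma=1.0):
    """For each point i, return a list p where p[j] is the weight of j as a
    neighbor of i: a Gaussian of the squared distance, normalized per row.

    Simplification: one fixed `sigma` for every point, whereas t-SNE picks a
    separate width per point to match a target perplexity.
    """
    rows = []
    for i, p_i in enumerate(points):
        weights = []
        for j, p_j in enumerate(points):
            if i == j:
                weights.append(0.0)  # a point is not its own neighbor
                continue
            dist_sq = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
            weights.append(math.exp(-dist_sq / (2 * sigma ** 2)))
        total = sum(weights)
        # Guard against every weight underflowing to zero.
        rows.append([w / total if total > 0 else 0.0 for w in weights])
    return rows


# Hypothetical features per sound, invented for this example.
features = {
    "kick_a": (0.9, 0.1, 0.2),
    "kick_b": (0.8, 0.2, 0.1),
    "hat_a": (0.1, 0.9, 0.8),
    "hat_b": (0.2, 0.8, 0.9),
}
names = list(features)
probs = neighbor_probabilities(list(features.values()), sigma=0.5)

for name, row in zip(names, probs):
    closest = names[row.index(max(row))]
    print(f"{name}: closest neighbor is {closest}")
```

Nothing here knows what a kick or a hi-hat is; items with similar numbers simply end up with high neighbor weights for each other, which matches the "only the audio, no tags" description above.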

These become instruments on a 4 track player where you can set beats for each. I’m not very skilled at that part (and you cannot even save the sounds), but this seems a bit different than spitting out computer generated graphics or word salading from input text.

I made a short screen recording (sloppy as my laptop fan is cooking there).

But the real find is this is part of a sprawling collection of Google “ai” Experiments

I can make here a Teachable Machine?

Teachable Machine is a web tool that makes it fast and easy to create machine learning models for your projects, no coding required. Train a computer to recognize your images, sounds, & poses, then export your model for your sites, apps, and more.

That is for another day. But these are the kinds of things I want to explore, just to see by manipulating various “ai” bits.

cogdog · June 21, 2022, 9:02pm

So now Cosmo is on the bandwagon

https://twitter.com/Cosmopolitan/status/1539267120348991489

Isn’t this making more of a case for A than the I? And the tweet I found it wrapped in says more:

https://twitter.com/mark_riedl/status/1539306602922405895

What exactly is the intelligence we are supposed to be enthralled with? From Merriam-Webster 1a

(1) : the ability to learn or understand or to deal with new or trying situations : REASON; also : the skilled use of reason (2) : the ability to apply knowledge to manipulate one’s environment or to think abstractly as measured by objective criteria (such as tests)

and 3

: the act of understanding : COMPREHENSION

Does an algorithm spitting out images show comprehension? Where is the dealing with new situations?

No, it is part 4 of the definition that goes with “ai”:

the ability to perform computer functions

cogdog · June 24, 2022, 2:51pm

Some more questioning here, what is the intelligence demonstrated?

If we do not question and just shruggingly accept this as “intelligence” or some kind of futuristic magic teaching tool, what does it say about us? This is why I want to understand it more tactilely.