Creating an OER Audiobook Version

This has been a valuable discussion, and it has spilled over into the CCCOER Community Group, where Brian Barrick responded with updates on his OER audiobook, including the How To video Paul Bond mentioned.

But Brian also added:

Moving forward, I am incredibly interested in how artificial intelligence technologies will impact audio narration. As AI narration approaches near-human quality, I believe that we are going to witness a major proliferation in the amount of audio resources available in the OER space. If anyone would ever like to connect for a conversation, I am always very glad to connect and brainstorm or compare notes.

Connect? Conversation? Yes, I am keen to set something up for an OEG Live conversation sometime in June, especially to share ideas on whether there is potential in new AI technology.

I’m hoping to corral in Brian, perhaps @agrey who started this thread, @annarmills who has experience with audio-enhanced OER, @steel who just joined this community, @mdiack who sees potential for internationalizing content, @DelmarLarsen who has LibreTexts-sized experience to offer… and anyone else who raises a hand (reply below if interested or message me via @cogdog).

Just out of the scheduling oven! It’s relatively short notice, but we are organizing an OEG Live webcast conversation on this topic on 2023-06-02 at 17:00 UTC, and we hope some of the people who added to this topic might want to be in the studio with us. Just let me know if you are available and interested, thanks!

I love this discussion. Thanks for the thoughts and questions! For me, it’s essential for the voice and intonation to be as humanlike as possible, which is part of why I did pay for the Natural Readers Commercial version to make my audiobook. Here is a sample from Section 1.1:

For personal use, I went ahead and paid for the premium Natural Readers voices, which are better than those in my book. (I would like to periodically regenerate the book audio as the tech improves.) I use the app on my phone to listen to drafts of my own work, news articles, etc.

I noticed that Google Play offers auto-generated audiobooks now. I hate them for literature (I returned one for my money back when I realized that was what it was). But I did listen to one book on coding (through Audible) that seemed auto-generated, though it was hard to tell. That was more tolerable, despite the intonations occasionally not fitting the meaning.

I’m curious about the connection between NLP and voice synthesis too. Probably we will see the voices get a lot better in the near future as those systems are linked up better?

Thanks for the sample, Anna, and for showing how it is presented at the start of each chapter as a “Media Alternative.” I wonder if it might make sense to have all the audio available, maybe in the back matter in an outline format (say, if I wanted to hear all chapters sequentially?). That’s also why I am interested in the podcast approach, or something organized as audio chapters.

More than that, I am interested in whether the quality of audio makes a difference to learners. Personally, I found the voice rather robotic, but maybe it does not have to be real-sounding for it to be effective. Maybe I would listen more closely if it was machine-like? (I do not know.)

Just for fun, I went to Elicit, the AI-powered research assistant (which avoids the fact problems of OpenAI et al., as it is trained on papers in Semantic Scholar). I posed the question, “What is the effectiveness of machine generated audio versus human voice for best understanding of content?” At first glance it produced some useful results for me, along with a summary based on the “4 top papers”:

The papers suggest that human voices are generally more effective than machine-generated audio for understanding content. Wenndt 2012 found that humans were more robust than machines in recognizing voices in changing environments. Rodero 2021 found that listeners enjoyed stories narrated by a human voice more than a synthetic one, and created more mental images, were more engaged, paid more attention, had a more positive emotional response, and remembered more information. Stern 1999 found that the human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. However, Braun 2019 examines the quality of machine-generated video descriptions and does not directly address the effectiveness of machine-generated audio versus human voice.

Elicit also generates perhaps better/related questions, like “How does the quality of machine-generated audio affect how people feel about the content?”

This seems much more valuable than the regurgitated babble of ChatGPT!

I’m circling around this idea of appealing to AI to generate audio versions of OER. Sure, it sounds easy, just press a button and let the machines magically produce it, but even as the computer voices improve, does it make for good listening? Is it “natural” for a learner?

Follow me through some recent web wanders. I was reading an interesting piece that makes a case for the creative ways people use Google Docs/Sheets to create compelling web content. It’s a worthy read.

Google Docs may wear the clothing of a tool, but their affordances teem over, making them so much more. After all, you’re reading this doc right now, and as far as I know I’m not using a typewriter, and you’re not looking over my shoulder. This doc is public, and so are countless others. These public docs are web pages, but only barely — difficult to find, not optimized for shareability, lacking prestige. But they form an impossibly large dark web, a web that is dark not as a result of overt obfuscation but because of a softer approach to publishing. I call this space the “doc web,” and these are its axioms.

Under Axiom 5 is a long raft of examples, each of which, once clicked, reveals a small corner of the web someone has populated and published using, yes, a word processor (or spreadsheet). Just for a tiny taste, there is a doc-based branching narrative, The Escape Room. Or go into Wildness Land; can you believe this is a spreadsheet?

I get distracted, but here is finally the audio… Pandemic Poems is not visually anything, but it reveals one person’s efforts to record a reading of a poem… 500 times. They are all on Soundcloud, and even organized into playlists.

Okay, it’s low tech, but this gets back to the premise of Librivox, where volunteers have uploaded recordings of sections from some 18,000 titles in the public domain. If you are, say, a chemistry professor, maybe these are not the titles you’d teach with, but that’s not my point.

Why could we not set up a process, a means to organize volunteer human readings of OER? Why not have it be something that students produce, open pedagogy style?

My question is, why do we appeal to the push-button convenience of machine readings when we have the potential to harness and generate human-powered readings?

Love the idea of having students help contribute to an audio version of an OER! That’s an idea that’s come up in one of my audiobook discussions with faculty and I hope that they decide to go down that route so we can explore what that would look like.

Wow!
OEG Live: Audiobook Versions of OER Textbooks (and AI Implications) - YouTube

By the by, it sounds like voices are in the air, these days:

Thanks, you will also find Ian Cook and many more key voices in the Amplify Podcast Network.

The Amplify Podcast Network is on a mission to revolutionize scholarship and to create communities of support for podcasters who want to change the world. Amplify is home for creative soundworks rooted in serious scholarship, where accessible, sustainable preservation and publication are central to our work. Amplify supports the creation of scholarship that contributes to collective, public knowledge, born of research across the disciplines and interdisciplines of the humanities and social sciences, with a focus on anti-racism, feminist social justice, and community-building. Amplify podcasts explicitly or implicitly engage with the question of what constitutes scholarship by pushing at boundaries, whether they are formal, methodological, theoretical, or otherwise.

See also A Guide to Academic Podcasting and more at

Also, newly published: How academic podcasting can change academia and its relationship with society: A conversation and guide (Frontiers in Communication), CC BY.

In this paper we explore the potential of academic podcasting to effect positive change within academia and between academia and society. Building on the concept of “epistemic living spaces,” we consider how podcasting can change how we evaluate what is legitimate knowledge and methods for knowledge production, who has access to what privileges and power, the nature of our connections within academia and with other partners, and how we experience the constraints and opportunities of space and time. We conclude by offering a guide for others who are looking to develop their own academic podcasting projects and discuss the potential for podcasting to be formalized as a mainstream academic output. To listen to an abridged and annotated version of this paper, visit: https://soundcloud.com/conservechange/podcastinginacademia.

A somewhat different take on the relationship/use of AI with books is in Ethan Mollick’s post “What happens when AI reads a book.”

He describes the use of another Large Language Model (again named to try to humanize?) called Claude, which allowed Mollick to have it ingest a book he wrote. The result of his experiment is that this AI might be valuable in summarizing, creating hypothetical case studies, and generating quizzes based on the text it was given (rather than an AI trained on who knows what and is not saying).

After these experiments, I have come to believe how we relate to books is likely to change as a result of AI. Search engines changed how we found information, but they never had a sense of the underlying content they indexed, and thus were limited in usefulness across vast volumes of data. Thus, they never altered how we used books in a deep way. They might help us find a keyword in a book, but we still had to read the actual text to know what the book said.

Now, AIs have, or at least have the appearance of having, an understanding of the context and meaning of a piece of text. This radically changes how we approach books as sources of information and reference - we can ask the AI to extract meaning for us, and get reasonable results. These changes are exciting in some cases (there are amazing chances for scholarship assisted by AI), but threatening in others (why read the book when you can just ask an AI to read it?).

Is having the “appearance of an understanding” enough? Mollick, knowing the content very well (it is his own book), hedges a yes…

And unlike AI personas, the human writer (is this an assumption anymore?) is not quite as boastfully sure of his assertions:

We can get access to the collective library of humanity in a way that makes the information stored there more useful and applicable, but also elevates a non-human presence as the mediator between us and our knowledge. It is a trade-off we will need to manage carefully.

But this goes to larger questions of how the concept of a “book” (not chiseled in stone) might change.

And I should also credit finding this link to @clintlalonde who shared it in a post where he returned to a consideration of “What is a textbook” in terms of its affordances versus other books.

Here I present a digital publication that has its own automatic reader, made in Book Creator: Uso de las TIC en la didáctica universitaria.

I remembered this conversation when I came across a Techdirt post yesterday in which Glyn Moody writes about a Microsoft project that applied its AI to generate audio versions of some 5000 titles in Project Gutenberg.

What’s fascinating (!) is that each audiobook takes 30 seconds to produce.

There’s general info on the project at Microsoft, but what’s more interesting is to explore the collection, available in several places including LibriVox and directly from the Internet Archive.

https://archive.org/details/@project_gutenberg_and_microsoft?tab=uploads

Give a listen to, say, Wuthering Heights; it’s rather far from the typical robotic voice generators.

And. It took all of 30 seconds to produce a listenable audio version of a public domain work of 57500 words or 230 pages… That’s a fraction of the time it has taken me to write this reply.

Noting here that Amanda’s original question that started this discussion has turned out to be an outstanding example of what can happen in this community space.

Beyond what you see in this thread, from the top of the conversation down to its tail end, we expanded it into an OEG Live discussion with other educators.

But wait! There is more. Amanda returned to that topic and added a report on what she and her team were able to achieve in completing their first open textbook augmented to now be an Audiobook.

That is a valuable model of what can happen in this space through the act of (a) posing an interesting topic question; (b) having many people participate; and (c) a later summary of what came of it.

This can happen to you! Put your question, opinion, wild idea, challenging dilemma into a new topic in the OEGlobal Plaza area.

How about giving it a try?
