Creating an OER Audiobook Version

This is an excellent initiative that could impact our efforts for internationalizing OER. I would like to learn more and contribute ideas. How can I help?

What a great discussion! I used Natural Readers to auto-generate audio versions of each of the chapters of my OER textbook. I had to get the commercial version (I can’t remember exactly, but when I read the fine print it seemed this was required even for an OER project based on how much it would be shared). The quality of the voices was much better with that version and I was able to write a few months of the cost into one of my grants.

Thanks Anna, there’s a whole raft of these AI text-to-audio services… but checking out Natural Readers, the voices are pretty good, if synthetic in style… which voice did you use? Can you share a link to an example so others can hear what a voice-read OER chapter sounds like?

I played a bit with the trial version of Murf – my idle curiosity is about what/how AI is actually used in this technology. I mean, voice generator tech has been built into my Mac since like 2010; I have previously used the say command to generate audio from text.
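For anyone curious, here is a quick sketch of scripting that built-in Mac voice from Node (the file names are just placeholders):

```typescript
import { execFileSync } from "node:child_process";

// macOS's built-in `say` command converts text to speech with the system voice.
// This reads chapter1.txt and writes an AIFF audio file; no AI required.
execFileSync("say", ["-o", "chapter1.aiff", "-f", "chapter1.txt"]);
```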

And I know a number of people who still blog use services like Trinity Audio to automatically generate voice audio from their writing – I thought of @clintlalonde, whose blog I saw this on first (example), and Trinity gives no indication it is AI flavored. My guess, in theory/wild guessing, is that the Natural Language Processing side of AI can perhaps generate audio that is less robotic? That’s what I infer, at least.

What would be interesting to know from people who use generated audio to read is: what makes the audio most useful? Does it help to have a more human voice, or is robotic good enough?

I’m really interested to hear more from OER producers who are using solutions to provide this kind of accessibility option, and also whether it helps the visually impaired to have audio provided with the OER, or whether they prefer relying on other assistive technology to read OER text.


Me too, Moustapha… I am not sure there is an initiative, but there does seem to be interest. It would seem the AI-powered audio generators have potential for providing audio versions of OER in other languages, on top of their value for translation.

Worth noting is how LibreTexts is using AI for providing translations of their content libraries (see Spanish and Ukrainian)-- and how @DelmarLarsen described it in our podcast conversation with him in March 2023.

However, over the last year or two, AI-based machine translation algorithms have gotten pretty good. Are they perfect? No, they are not. But they are pretty good. And the argument that we had here is: is it better to have a hundred thousand pages in a new language that’s 95% good, versus 20 pages that are perfect in that language?

And the answer by far is it’s much better to have many more pages that are able to help; students can actually get through a little bit of the clunkiness in order to be able to advance. We’ve been eyeing machine translation for a while. But the Ukraine situation provided us with an opportunity, although it was obviously a very bad situation over there, to couple with Amazon. So Amazon had a machine translation infrastructure, and then coupled with MindTouch or NICE CXOne, the company that actually hosts our central libraries, in order to be able to make a new library that was completely machine translated. And we did that in Ukrainian.

Thank you Alan for following up. I am a fan of the great work being done by LibreTexts. I will listen to the podcast to learn more.
I see many applications to address DEI issues.

Professor Moustapha Diack
Digital Learning Leader & Consultant
Doctoral Program in Sciences/Math Education (SMED) - College of Sciences & Engineering -
Southern University, Baton Rouge, LA 70813

We (LibreTexts) have hosted audio files of OER books on our Commons&Conductor system where all ancillary assets for books are stored. However, I have been conflicted about hosting static audio files since that reduces the dynamic nature of the book/pages. If one wants the audio files to sync up to the text version, then one has to edit the audio files each time the text is updated and that is an onerous activity.

We have looked at dynamic audio generators to “compile” a book’s audio output after editing and gotten mixed results, but that was two years ago. The explosion of AI-based technologies suggests this should be better now, or will be good enough in the near future. Plus, this may be useful for polyglot applications, which are dear to our hearts right now.
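To make the idea concrete, here is a minimal sketch of what such a compile step could look like: hash each page’s text and only re-generate audio where the text changed. The page structure, the cache file, and the macOS say command standing in for a real TTS engine are all assumptions for illustration:

```typescript
import { createHash } from "node:crypto";
import { execFileSync } from "node:child_process";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";

// Re-generate audio only for pages whose text changed since the last compile.
function compileBookAudio(pages: Record<string, string>, outDir: string): void {
  mkdirSync(outDir, { recursive: true });
  const cachePath = `${outDir}/audio_cache.json`;
  const seen: Record<string, string> = existsSync(cachePath)
    ? JSON.parse(readFileSync(cachePath, "utf8"))
    : {};
  for (const [pageId, text] of Object.entries(pages)) {
    const digest = createHash("sha256").update(text).digest("hex");
    if (seen[pageId] === digest) continue; // unchanged: keep existing audio
    // Stand-in TTS step: macOS `say` reads the text from stdin;
    // any synthesis engine could be swapped in here.
    execFileSync("say", ["-o", `${outDir}/${pageId}.aiff`], { input: text });
    seen[pageId] = digest;
  }
  writeFileSync(cachePath, JSON.stringify(seen, null, 2));
}
```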

I’ll take a look at this next week and perhaps I can give an update on the efficacy of the current state of these tools.


Thanks Delmar. Can I say again how much LibreTexts rocks! It seems to me that an on-the-fly audio generator, like those blog post readers, is more versatile, even if not the best voice (that 95% rule you describe).

Or is this capability something the browsers might ultimately provide? The speed of development can be scary and exciting, right?
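In fact, browsers already expose basic speech synthesis through the Web Speech API; a minimal in-page sketch (the sample text is just a placeholder):

```typescript
// The Web Speech API is built into modern browsers: no server round trip needed.
const utterance = new SpeechSynthesisUtterance(
  "Chapter 1. An introduction to open educational resources."
);
utterance.rate = 0.9; // slightly slower than the default, easier to follow
window.speechSynthesis.speak(utterance);
```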

This is what we hope can happen here with just ideas bouncing around.

Thanks David, Hindenburg has been around a long while (and the product name always makes me curious).

Have you used or seen it used for OER? Keen for examples.

I know that Stephen Hurley at VoiceEd Radio uses it, as do many podcasters and development projects, because the company offers good support and pricing schemes aimed at every segment including education and not-for-profits.

I have my own copy which I’ve used for a few projects. It’s aimed at podcasters, radio, and book narration projects.

Yes, the name and the logo reference the Hindenburg disaster, which was famously covered live on radio. It’s a purpose-built application for spoken word productions, not forcing the compromises that come with apps built primarily for music production.

This has been a valuable discussion and spilled over into the CCCOER Community Group, where Brian Barrick responded with updates on his OER audiobook, including the How To video Paul Bond mentioned.

But Brian also added:

Moving forward, I am incredibly interested in how artificial intelligence technologies will impact audio narration. As AI narration approaches near-human quality, I believe that we are going to witness a major proliferation in the amount of audio resources available in the OER space. If anyone would ever like to connect for a conversation, I am always very glad to connect and brainstorm or compare notes.

Connect? Conversation? Yes, I am keen to set something up for an OEG Live conversation sometime in June, especially to share ideas on the potential of new AI technology.

I’m hoping to corral in Brian, perhaps @agrey who started this thread, @annarmills who has experience with audio enhanced OER, @steel who just joined this community, @mdiack who sees potential for internationalizing content, @DelmarLarsen who has LibreTexts sized experience to offer… and anyone else who raises a hand (reply below if interested or message me via @cogdog).

Just out of the scheduling oven! It’s relatively short notice, but we are organizing an OEG Live webcast conversation on this topic 2023-06-02T17:00:00Z – and we hope some of the people who added to this topic might want to be in the studio with us. Just let me know if you are available and interested, thanks!

I love this discussion–thanks for the thoughts and questions! For me, it’s essential for the voice and intonation to be as humanlike as possible, which is part of why I did pay for the Natural Readers commercial version to make my audiobook. Here is a sample from Section 1.1:

For personal use, I went ahead and paid for the premium Natural Readers voices, which are better than those in my book. (I would like to periodically regenerate the book audio as the tech improves.) I use the app on my phone to listen to drafts of my own work, news articles, etc.

I noticed that Google Play offers auto-generated audiobooks now. I hate them for literature (I returned one for my money back when I realized that was what it was). But I did listen to one book on coding (through Audible) that seemed auto-generated, though it was hard to tell. That was more tolerable, despite the intonation occasionally not fitting the meaning.

I’m curious about the connection between NLP and voice synthesis too–probably we will see the voices get a lot better in the near future as those systems are linked up better?

Thanks for the sample, Anna, and for showing how it is presented at the start of each chapter as a “Media Alternative”-- I wonder if it might make sense to also have all audio available in the back matter in an outline format (say, if I wanted to hear all chapters sequentially). That’s also why I am interested in the podcast approach, or something organized as audio chapters.

More than that, I am interested in whether the quality of the audio makes a difference to learners. Personally, I found the voice rather robotic, but maybe it does not have to sound real to be effective. Maybe I would listen more closely if it was machine-like? (I do not know.)

Just for fun, I went to Elicit, the AI-powered research assistant (which avoids the factuality problems of OpenAI et al., as it is trained on papers in Semantic Scholar). I posed the question, “What is the effectiveness of machine generated audio versus human voice for best understanding of content?” At first glance this produced some useful results, including a summary based on the “4 top papers”:

The papers suggest that human voices are generally more effective than machine-generated audio for understanding content. Wenndt 2012 found that humans were more robust than machines in recognizing voices in changing environments. Rodero 2021 found that listeners enjoyed stories narrated by a human voice more than a synthetic one, and created more mental images, were more engaged, paid more attention, had a more positive emotional response, and remembered more information. Stern 1999 found that the human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. However, Braun 2019 examines the quality of machine-generated video descriptions and does not directly address the effectiveness of machine-generated audio versus human voice.

Elicit also generates perhaps better/related questions, like “How does the quality of machine-generated audio affect how people feel about the content?”

This seems much more valuable than the regurgitated babble of ChatGPT!

I’m circling around this idea of appealing to AI to generate audio versions of OER. Sure it sounds easy, just press a button and let the machines magically produce it – but even as the computer voices improve, does it make for good listening? Is it “natural” for a learner?

Follow me through some recent web wanders. I was reading an interesting piece that makes a case for the creative ways people use Google Docs/Sheets to create compelling web content. It’s a worthy read.

Google Docs may wear the clothing of a tool, but their affordances teem over, making them so much more. After all, you’re reading this doc right now, and as far as I know I’m not using a typewriter, and you’re not looking over my shoulder. This doc is public, and so are countless others. These public docs are web pages, but only barely — difficult to find, not optimized for shareability, lacking prestige. But they form an impossibly large dark web, a web that is dark not as a result of overt obfuscation but because of a softer approach to publishing. I call this space the “doc web,” and these are its axioms.

Under Axiom 5 is a long raft of examples that, only by clicking, reveal a small corner of the web someone has populated and published using, yes, a word processor (or spreadsheet). Just for a tiny taste, a doc-based branching narrative – The Escape Room. Or go into Wildness Land – can you believe this is a spreadsheet?

I get distracted, but here, finally, is the audio… Pandemic Poems is not much to look at visually, but it reveals one person’s effort to record a reading of a poem… 500 times. They are all on SoundCloud, and even organized into playlists.

Okay, it’s low tech, but this gets back to the premise of LibriVox, where volunteers have uploaded recordings of sections from some 18,000 titles in the public domain. If you are, say, a chemistry professor, maybe these are not the titles you’d teach with, but that’s not my point.

Why could we not organize a process, a means to coordinate volunteer human readings of OER? Why not have it be something that students produce, open pedagogy style?

My question is: why do we appeal to the push-button convenience of machine readings when we have the potential to harness and generate human-powered readings?

Love the idea of having students help contribute to an audio version of an OER! That’s an idea that’s come up in one of my audiobook discussions with faculty and I hope that they decide to go down that route so we can explore what that would look like.

Wow!
OEG Live: Audiobook Versions of OER Textbooks (and AI Implications) - YouTube

By the by, it sounds like voices are in the air these days:

Thanks, you will also find Ian Cook and many more key voices in the Amplify Podcast Network:

The Amplify Podcast Network is on a mission to revolutionize scholarship and to create communities of support for podcasters who want to change the world. Amplify is home for creative soundworks rooted in serious scholarship, where accessible, sustainable preservation and publication are central to our work. Amplify supports the creation of scholarship that contributes to collective, public knowledge, born of research across the disciplines and interdisciplines of the humanities and social sciences, with a focus on anti-racism, feminist social justice, and community-building. Amplify podcasts explicitly or implicitly engage with the question of what constitutes scholarship by pushing at boundaries, whether they are formal, methodological, theoretical, or otherwise.

See also A Guide to Academic Podcasting and more at

Also, newly published: How academic podcasting can change academia and its relationship with society: A conversation and guide (Frontiers in Communication), CC BY

In this paper we explore the potential of academic podcasting to effect positive change within academia and between academia and society. Building on the concept of “epistemic living spaces,” we consider how podcasting can change how we evaluate what is legitimate knowledge and methods for knowledge production, who has access to what privileges and power, the nature of our connections within academia and with other partners, and how we experience the constraints and opportunities of space and time. We conclude by offering a guide for others who are looking to develop their own academic podcasting projects and discuss the potential for podcasting to be formalized as a mainstream academic output. To listen to an abridged and annotated version of this paper, visit: https://soundcloud.com/conservechange/podcastinginacademia.

A somewhat different take on the relationship/use of AI with books is in Ethan Mollick’s post “What happens when AI reads a book”.

He describes the use of another Large Language Model (again, named to try to humanize it?) called Claude, which allowed Mollick to ingest a book he wrote. The result of his experiment is that this AI might be valuable for summarizing, creating hypothetical case studies, and generating quizzes based on the text it was given (rather than an AI trained on who knows what, and that is not saying).
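Here is a minimal sketch of that kind of experiment: hand a chapter’s text to an LLM API and ask for quiz questions. The @anthropic-ai/sdk client, model name, and prompt are assumptions for illustration, not Mollick’s actual setup:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Ask the model a question grounded in the supplied book text.
async function askAboutBook(bookText: string, question: string): Promise<string> {
  const message = await client.messages.create({
    model: "claude-3-haiku-20240307",
    max_tokens: 1024,
    messages: [{ role: "user", content: `${bookText}\n\n${question}` }],
  });
  const block = message.content[0];
  return block.type === "text" ? block.text : "";
}

// e.g. askAboutBook(chapterText, "Write five quiz questions on this chapter.");
```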

After these experiments, I have come to believe how we relate to books is likely to change as a result of AI. Search engines changed how we found information, but they never had a sense of the underlying content they indexed, and thus were limited in usefulness across vast volumes of data. Thus, they never altered how we used books in a deep way. They might help us find a keyword in a book, but we still had to read the actual text to know what the book said.

Now, AIs have, or at least have the appearance of having, an understanding of the context and meaning of a piece of text. This radically changes how we approach books as sources of information and reference - we can ask the AI to extract meaning for us, and get reasonable results. These changes are exciting in some cases (there are amazing chances for scholarship assisted by AI), but threatening in others (why read the book when you can just ask an AI to read it?).

Is having the “appearance of an understanding” enough? Mollick’s assertion, knowing the content very well (his own book), hedges a yes…

And unlike AI personas, the human writer (is this a safe assumption anymore?) is not quite as boastfully sure of his assertions:

We can get access to the collective library of humanity in a way that makes the information stored there more useful and applicable, but also elevates a non-human presence as the mediator between us and our knowledge. It is a trade-off we will need to manage carefully.

But this goes to larger questions of how the concept of a “book” (not chiseled in stone) might change.

And I should also credit finding this link to @clintlalonde who shared it in a post where he returned to a consideration of “What is a textbook” in terms of its affordances versus other books.

Here I present a digital publication that has its own automatic reader, created in Book Creator: Uso de las TIC en la didáctica universitaria.