The Global South has a Problem of Large Language Models and Small Corpora of Texts [ID 129]
Since Open is everyone’s business, and Generative Artificial Intelligence is portrayed as a mechanism for scaling education for everyone everywhere, it is fundamentally problematic that large language models, which are used for the translation of texts among other functions, require very large corpora of texts in both the source and the target language to function adequately. To demonstrate this, examples will be given of problematic translations from English into isiXhosa, which contain errors even at an elementary level of education.
Practitioners from the Global South realistically fear a widening of the divide, because many local, indigenous languages have only a small corpus of texts online. This could lead to a data race, and it raises concerns as to whether copyright may be violated in the uploading of texts. But the overarching concern is an increased dominance of already dominant languages, which could be read as a re-colonisation and could negatively affect local indigenous cultures and ways of knowing, as well as the dissemination of indigenous knowledge systems.
The presentation will reflect on how Generative Artificial Intelligence functions, systematically cover the issues of inclusion, diversity, equity, and access that arise from using it when only a small corpus of texts is available, and then ask participants to reflect upon the open education policies and strategies that follow, especially given potential negative impacts in relation to the Sustainable Development Goals. In particular, AI in this context relates not only to SDG 4, but also to SDGs 6 and 7 in terms of sustainability, as AI consumes massive amounts of fossil fuels and water; SDG 9 in terms of the infrastructure required; SDG 10 in terms of inequality; and SDG 12 in terms of responsible consumption and production.
The presentation will also refer to recent research indicating that, while the power of models has grown steadily with the size of the training datasets, these power curves are starting to level off, which has implications for sustainability.
Author Keywords
Artificial intelligence, Sustainability, Open education policy and strategies, Inclusion diversity equity and access, Local Indigenous cultures and ways of knowing
Session Details
Format: Presentation
Presenter(s): Nomvuyo Mgoqi
Brisbane Time: November 13, 1:45 PM → 2:10 PM AEST
Room: P4
Topic Area: Digital Capability, Artificial Intelligence
Participate
Use this space to:
- Ask questions of the presenters
- Share related resources
- Continue discussions from the session
- Braid/connect with other sessions