Google has launched a speech dataset covering 21 African languages to improve voice technology across the continent, the tech giant said in a statement on Monday.
Named after the Wolof word for “speak,” WAXAL contains more than 11,000 hours of speech data drawn from nearly 2 million individual recordings.
The dataset includes about 1,250 hours of transcribed speech for automatic speech recognition and more than 20 hours of studio-quality recordings designed for text-to-speech voice synthesis, Google said.
The project was developed over three years to support research and product development in regions where voice-enabled technologies remain limited due to a lack of accessible, high-quality local-language data.
Sub-Saharan Africa is home to more than 2,000 distinct languages, many of which are underrepresented in global technology systems.
Data collection was led by African institutions. Makerere University in Uganda and the University of Ghana coordinated work on a combined 13 languages, while Digital Umuganda in Rwanda oversaw data gathering for five major languages.
Studio recordings were produced with Media Trust and Loud n Clear, and the African Institute for Mathematical Sciences contributed multilingual data intended for future releases.
According to Google, the project was designed to ensure that partner institutions retain ownership of the data they collected while making the dataset available to the global research community under an open license.
To capture natural speech patterns, contributors were asked to describe images in their native languages.
Professional voice actors were also recorded in studio settings to support speech synthesis research.
The WAXAL dataset is available on the Hugging Face platform, alongside a technical paper detailing the methodology.
The dataset includes the following languages: Acholi, Akan, Dagaare, Dagbani, Dholuo, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Kikuyu, Lingala, Luganda, Malagasy, Masaaba, Nyankole, Rukiga, Shona, Soga (Lusoga), Swahili and Yoruba.
Google said the initiative is also intended to support the digital preservation of African languages alongside technological development.