Google has expanded its WAXAL speech dataset to include Luo, Kikuyu, and Luganda, marking a significant step toward making artificial intelligence more inclusive for speakers of African languages. The move aims to accelerate the development of AI tools for millions of people who have long remained excluded from voice-enabled technologies due to language barriers.

For years, most speech-based technologies, from voice assistants to automated customer service systems, have been designed primarily around global languages such as English, French, and Mandarin. As a result, speakers of many African languages have struggled to access digital services in their native tongues. Google states that the expansion of WAXAL aims to help bridge this gap by providing developers and researchers with access to high-quality speech data in underrepresented languages.
In addition to Luo and Kikuyu, which are widely spoken in Kenya, Luganda has also been added to the dataset. Luganda is one of the most commonly used languages in Uganda and across parts of East Africa. Together, the three languages significantly strengthen WAXAL’s coverage of the region and reflect the linguistic diversity that characterises the African continent.
Google explains that WAXAL is designed to help developers and researchers build AI systems that better understand African languages. By providing open, high-quality speech data, the dataset aims to remove one of the biggest obstacles to building reliable speech technologies: the lack of representative training data. This limitation has historically restricted access to digital tools such as voice assistants, speech-to-text services, educational platforms, and digital public services.
By adding Luo, Kikuyu, and Luganda, Google says it wants to improve access to AI-powered tools in East Africa, where many people are more comfortable communicating in local languages than in English. Voice-enabled technologies built on such data could make it easier for users to interact with digital services, particularly in rural or underserved communities.
According to Google, the WAXAL dataset includes 1,250 hours of transcribed natural speech and more than 20 hours of high-quality studio recordings, developed over a period of three years. The natural speech recordings capture how people speak in everyday situations, while the studio recordings are intended to support the development of clear and natural-sounding text-to-speech systems. Google says this combination provides a strong foundation for building accurate and dependable language technologies.

“The ultimate impact of WAXAL is the empowerment of people in Africa,” said Aisha Walcott-Bryant, Head of Google Research Africa.
She added that the dataset gives students, researchers, and entrepreneurs the tools they need to innovate locally. “This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people,” she said.
Google Advances Language-Inclusive AI for Social Impact
Google also emphasizes that in communities with limited English proficiency, language-inclusive technology can play a transformative role across critical sectors such as education, agriculture, and healthcare. By reducing language barriers, these tools can make essential information more accessible and actionable. In education, for instance, AI-powered learning platforms delivered in local and indigenous languages can significantly improve students’ comprehension, engagement, and retention, particularly for learners who struggle with instruction in non-native languages.
In agriculture, language-inclusive and voice-based technologies can enable farmers to access timely guidance on weather forecasts, crop management techniques, pest control, and real-time market prices, helping them make better-informed decisions and improve productivity. In healthcare, speech and translation technologies can support the delivery of vital health information, appointment reminders, and medical guidance in languages patients fully understand, strengthening communication between providers and communities and ultimately improving health outcomes.
The WAXAL dataset already includes Swahili, a language widely spoken across Kenya and much of East Africa. The addition of Luo, Kikuyu, and Luganda further broadens its linguistic reach and reinforces Google’s focus on building AI systems that reflect the realities of African users.
By making the dataset openly available, Google aims to foster innovation that extends far beyond its own ecosystem of products and services. Researchers, startups, academic institutions, and local developers are empowered to experiment with WAXAL, adapting and refining it to meet the unique linguistic and cultural needs of their communities. This openness lowers barriers to entry, encourages collaboration, and helps ensure that advances in artificial intelligence are not limited to a small group of global players, but instead reach a wider and more diverse range of users.
As artificial intelligence continues to reshape how people access information, communicate, and interact with digital services, initiatives like WAXAL underscore the critical importance of language inclusion. For millions of African language speakers who have historically been underrepresented in digital technologies, Google’s latest expansion is more than a technical milestone. It represents meaningful progress toward digital equity, offering recognition, visibility, and the opportunity to be more fully heard and understood in an increasingly AI-driven world.