Samsung-The Learning Curve, Part 2: How to Build an AI for Diverse Dialects.

Tales from the Middle East on the complexity of creating AI tools for Arabic, a language with many facets.

Galaxy AI now supports 16 languages, helping more people to lower language barriers with real-time and on-device translation. Samsung opened the door to a new era of mobile AI, so we are visiting Samsung Research centers all over the world to learn how Galaxy AI came to life and what it took to overcome the challenges of AI development. While part one of the series examines the task of determining what data is needed, this installment looks at the complex task of accounting for dialects.

Teaching a language to an AI model is a complex process, but what if it isn't a single language, but a collection of diverse dialects? That was the challenge faced by the team at Samsung R&D Institute Jordan (SRJO). When Arabic was added as a language option for Galaxy AI features such as Live Translate, the team had to cater to the various Arabic dialects that span the Middle East and North Africa, each varying in pronunciation, vocabulary and grammar.

Arabic is one of the top six most widely spoken languages around the world, used daily by more than 400 million people.1 The language is categorized into two forms: Fus'ha (Modern Standard Arabic) and Ammiya (the dialects of Arabic). Fus'ha is typically used in public and official events, as well as in news broadcasts, while Ammiya is more commonly used for day-to-day conversations. Over 20 countries use Arabic, and there are currently around 30 dialects in the region.

Unwritten Rules

Recognizing the variation presented by these dialects, the team at SRJO employed a range of techniques to discern and process the unique linguistic features inherent in each. This approach was crucial in ensuring that Galaxy AI could understand and respond in a way that accurately reflects the regional nuances.

'Unlike other languages, the pronunciation of the object in Arabic varies depending on the subject and verb in the sentence,' says Mohammad Hamdan, project leader of the Arabic language development team. 'Our goal is to develop a model that understands all these dialects and can answer in standard Arabic.'

Text-to-speech (TTS) is the component of Galaxy AI's Live Translate feature that lets users interact with speakers of different languages by translating spoken words into written text, and then vocally reproducing them. The TTS team faced a unique challenge caused by a quirk of working with Arabic.

Arabic uses diacritics, marks that guide the pronunciation of words. They are written out only in some contexts, such as religious texts, poetry and books for language learners. Native speakers understand them well, but they are absent from everyday writing. This makes it difficult for a machine to convert raw text into phonemes, the basic units of sound that are the building blocks of speech.
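
To illustrate why missing diacritics are a problem, here is a minimal Python sketch. It is not Samsung's method, only an illustration built on the standard library. It assumes that Arabic short-vowel marks are encoded as Unicode combining marks (category Mn), and the helper name strip_diacritics and the three sample words are chosen for this example. Three words that are pronounced differently collapse into the same string once the marks are removed, which is the ambiguity a diacritic restoration model has to resolve from context.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include Arabic short-vowel marks."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Base letters and short-vowel marks, written as code points for clarity.
KAF, TA, BA = "\u0643", "\u062A", "\u0628"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"

# Three fully diacritized words that share the same consonant skeleton (k-t-b).
words = {
    "kataba (he wrote)": KAF + FATHA + TA + FATHA + BA + FATHA,
    "kutub (books)": KAF + DAMMA + TA + DAMMA + BA,
    "kutiba (it was written)": KAF + DAMMA + TA + KASRA + BA + FATHA,
}

# Everyday writing omits the marks, so all three look identical on the page.
bare_forms = {strip_diacritics(word) for word in words.values()}

for label, word in words.items():
    print(label, "->", strip_diacritics(word))
print("distinct written forms without diacritics:", len(bare_forms))
```

A TTS front end sees only the bare form, so it has to infer which pronunciation is meant before it can produce phonemes.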

'There is a shortage of high-quality and reliable datasets that accurately represent how diacritics are correctly used,' explains Haweeleh. 'We had to design a neural model that can predict and restore those missing diacritics with high accuracy.'

Neural models are loosely modeled on the way human brains learn. To predict diacritics, a model needs to study lots of Arabic text, learn the language's rules and understand how words are used in different contexts. For instance, the pronunciation of a word can vary greatly depending on the action or gender it describes. Extensive training by the team was key to enhancing the Arabic TTS model's accuracy.

Enhancing Understanding

The SRJO team also had to collect diverse audio recordings of the dialects from various sources. These recordings had to be transcribed, with a focus on unique sounds, words and phrases. 'We assembled a team of native speakers of the dialects who were well-versed in the nuances and variations,' says Ayah Hasan, whose team was responsible for database creation. 'They listened to the recordings and manually converted the spoken words into text.'

This work was crucial for enhancing the Automatic Speech Recognition (ASR) process so that Galaxy AI could handle the rich tapestry of Arabic dialects. ASR is pivotal in enabling Galaxy AI's real-time understanding and response capabilities.

'Building an ASR system that supports multiple dialects in a single model is a complex undertaking,' says Mohammad Hamdan, ASR lead for the project. 'It demands a thorough understanding of the language's intricacies, careful data selection and advanced modeling techniques.'

The Culmination of Innovation

After months of planning, building and testing, the team was ready to release Arabic as a language option for Galaxy AI, enabling many more people to communicate across borders. This single team has made Galaxy AI services accessible to Arabic speakers, lowering the language and cultural barriers between them and people all over the world. In doing so, they have established new best practices that can be rolled out globally. This success is only the beginning: the team continues to refine their models and enhance the quality of Galaxy AI's language capabilities.

In the next episode, we go to Vietnam to see how the team makes language data better. Plus, what does it take to train an effective AI model?

Arabic is just one of the languages and dialects newly supported by Galaxy AI and available for download from the Settings app. Galaxy AI's language features such as Live Translate and Interpreter are available on Galaxy devices running Samsung's One UI 6.1 update.2

1 UNESCO, World Arabic Language Day 2023, https://www.unesco.org/en/world-arabic-language-day

2 One UI 6.1 was first released on Galaxy S24 series devices, with a wider rollout to other Galaxy devices including S23 series, S23 FE, S22 series, S21 series, Z Fold5, Z Fold4, Z Fold3, Z Flip5, Z Flip4, Z Flip3, Tab S9 series and Tab S8 series.

(C) 2024 Electronic News Publishing, source ENP Newswire