Welcome to the Arabic ASR and DI project!

In this post, we summarize the work done on the Arabic speech recognition and Arabic dialect identification projects for Red Hen Lab as part of GSoC 2018. Each project is explained in more detail in a separate post.

Arabic Speech Recognition

The Arabic speech recognition project builds upon the work done by the Aalto team in the 2017 MGB-3 challenge for Arabic speech recognition [1]. The recipe trains a sequence of Gaussian mixture models (GMMs), using the alignments from each GMM to train the next, then extracts i-vectors and trains a time-delay neural network (TDNN) on the acoustic features together with the i-vectors. The models were trained on the GALE Arabic conversation speech data set, which UPenn’s LDC kindly provided for the purpose of this GSoC project. Check this post for more information about the project.

Work yet to be done

The recognizer currently struggles with the complex morphology of the Arabic language. The solution lies mainly in better language modeling and in adaptation to dialectal Arabic. These will be the next steps for this project.

Arabic Dialect Identification

The work done on Arabic dialect identification builds upon the work done by the MIT-QCRI team on the MGB-3 Arabic dialect identification challenge [2]. The project extracts an i-vector for each utterance and predicts the utterance's dialect from it using one of two methods:

Cosine-distance scoring (CDS)
A Siamese neural network
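
As a rough illustration of the first method, the sketch below averages the training i-vectors of each dialect into one mean vector and assigns a test utterance to the dialect whose mean has the highest cosine similarity. It is a minimal sketch under stated assumptions, not the project's actual code: the three-dimensional toy vectors, the labels EGY and GLF, and all function names are invented for the example, and real i-vectors have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dialect_means(train):
    """Average the i-vectors of each dialect (train: label -> list of vectors)."""
    means = {}
    for dialect, vectors in train.items():
        dim = len(vectors[0])
        means[dialect] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return means


def predict_dialect(ivector, means):
    """Score the i-vector against every dialect mean and pick the best match."""
    scores = {d: cosine_similarity(ivector, m) for d, m in means.items()}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): two dialects, two training i-vectors each.
train = {
    "EGY": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "GLF": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
means = dialect_means(train)
label, scores = predict_dialect([0.8, 0.3, 0.0], means)
print(label, scores)
```

Because cosine similarity ignores vector length, only the direction of the i-vector matters here, which is why no extra length normalization appears in the sketch.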

Check this post for more details about the project.

Work yet to be done

To maximize performance, the authors suggest merging the outputs of both classifiers. Other feature types, especially phonemic features, should also be explored.
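
One simple way to merge two classifiers is score-level fusion: normalize each classifier's per-dialect scores and take a weighted average. The sketch below shows that idea only. How the outputs should actually be merged is not specified here, so the softmax normalization, the `weight` parameter, the function names, and the toy scores are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw per-dialect scores into values that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {d: math.exp(s - peak) for d, s in scores.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}


def fuse_scores(cds_scores, siamese_scores, weight=0.5):
    """Weighted average of the two normalized score sets.

    weight is the share given to the cosine-distance scores;
    the Siamese network scores get 1 - weight.
    """
    cds = softmax(cds_scores)
    siamese = softmax(siamese_scores)
    return {d: weight * cds[d] + (1.0 - weight) * siamese[d] for d in cds}


# Toy scores (hypothetical): the two classifiers disagree on the best dialect.
cds_scores = {"EGY": 0.9, "GLF": 0.2}
siamese_scores = {"EGY": 0.4, "GLF": 0.6}

fused = fuse_scores(cds_scores, siamese_scores)
best = max(fused, key=fused.get)
print(best, fused)
```

In practice the weight would be tuned on a held-out development set rather than fixed at 0.5.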

Timeline

For specific checkpoints within the timeline of the project, you can check the following blog posts:

Acknowledgement

A huge thanks goes to my GSoC mentors, Professor Mark Turner, Professor Ahmed Abdel-Fattah, and Professor Michael Pacchioli, and to all the Red Hen Lab contributors, who were very helpful and supportive throughout the project.

References

[1] P. Smit, S. Gangireddy, S. Enarvi, S. Virpioja, and M. Kurimo, “Aalto system for the 2017 Arabic multi-genre broadcast challenge,” in ASRU, 2017.

[2] Ali, A., Dehak, N., Cardinal, P., Khurana, S., Yella, S.H., Glass, J., Bell, P., Renals, S. (2016) Automatic Dialect Detection in Arabic Broadcast Speech. Proc. Interspeech 2016, 2934-2938.