Welcome to the Arabic ASR and DI project!
In this post, we will summarize the work done on the Arabic speech recognition and Arabic dialect identification projects for Red Hen Lab as part of GSoC 2018. Each project is explained in more detail in a separate post.
The Arabic speech recognition project builds upon the work done by the Aalto team in the 2017 MGB-3 challenge for Arabic speech recognition. The project builds multiple Gaussian mixture models (GMMs) sequentially, using the alignments from each GMM to train the next, then extracts i-vectors and trains a time-delay neural network (TDNN) using the extracted features and i-vectors. The models were trained on the Gale Arabic conversational speech data set, which UPenn’s LDC kindly provided for the purpose of this GSoC project. Check this post for more information about the project.
Work yet to be done
The output of the recognizer currently struggles with the complex morphology of the Arabic language. The solution lies mainly in language modeling and adaptation to Dialectal Arabic. These will be the next steps for this project.
The work done on Arabic dialect identification builds upon the work done by the MIT-QCRI team on the MGB-3 Arabic dialect identification challenge. The project uses the i-vector extracted for each utterance to predict that utterance's dialect using one of two methods:
- Cosine-distance scoring (CDS)
- A Siamese neural network
Check this post for more details about the project.
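To make the first method more concrete, here is a minimal sketch of cosine-distance scoring on i-vectors. It assumes one common setup, where each dialect is represented by the mean of its training i-vectors and a test utterance is assigned to the dialect whose mean is closest in angle. The function names, the dialect labels, and the tiny 3-dimensional vectors are all made up for illustration; real i-vectors have hundreds of dimensions and are usually whitened or LDA-projected first, which is omitted here. This is not the project's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dialect_means(train_ivectors):
    """Average the training i-vectors of each dialect into one model vector."""
    means = {}
    for dialect, vectors in train_ivectors.items():
        dim = len(vectors[0])
        means[dialect] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return means

def predict_dialect(ivector, means):
    """Score the i-vector against every dialect mean and pick the best one."""
    scores = {d: cosine_similarity(ivector, m) for d, m in means.items()}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): two dialects, two training i-vectors each.
train = {
    "EGY": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "GLF": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]],
}
means = dialect_means(train)
label, scores = predict_dialect([0.8, 0.3, 0.0], means)
print(label, scores)
```

Because cosine similarity ignores vector length, only the direction of the i-vector matters, which is part of why this scoring is a popular lightweight baseline.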
Work yet to be done
- To maximize performance, the authors suggested merging the outputs of both classifiers. Furthermore, other feature types should be explored, especially phonemic features.
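As a rough idea of what merging the two classifiers could look like, here is a small sketch of linear score fusion. This is only one possible approach and an assumption on my part, not the method the authors used: each classifier's per-dialect scores are min-max normalized so they share a scale, then combined with a weight. The function names, the weight, and all score values are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so no dialect is preferred.
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of two normalized score sets over the same dialects."""
    norm_a = min_max_normalize(scores_a)
    norm_b = min_max_normalize(scores_b)
    return {
        d: weight_a * norm_a[d] + (1.0 - weight_a) * norm_b[d]
        for d in norm_a
    }

# Made-up scores for one utterance from the two classifiers.
cds_scores = {"EGY": 0.62, "GLF": 0.58, "LAV": 0.31}
siamese_scores = {"EGY": 0.40, "GLF": 0.90, "LAV": 0.10}

fused = fuse_scores(cds_scores, siamese_scores)
best = max(fused, key=fused.get)
print(best, fused)
```

In practice the weight would be tuned on a development set rather than fixed at 0.5.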
For specific checkpoints within the timeline of the project, you can check the following blog posts:
- Spending this summer in GSoC with Red Hen Lab
- Week 1: Language Modeling
- Week 4: Acoustic Modeling
- Week 5: Case HPC
- Week 7: Dialect Identification
- Week 7 (Cont.): Arabic Speech Data
- Week 8: Gale Arabic on Case HPC
- Weeks 9 & 10: Dialect Identification Base
A huge thanks goes to my GSoC mentors, Professor Mark Turner, Professor Ahmed Abdel-Fattah, and Professor Michael Pacchioli, and to all the Red Hen Lab contributors, who were very helpful and supportive throughout the project.
P. Smit, S. Gangireddy, S. Enarvi, S. Virpioja, and M. Kurimo, “Aalto system for the 2017 Arabic multi-genre broadcast challenge,” in ASRU, 2017.
A. Ali, N. Dehak, P. Cardinal, S. Khurana, S. H. Yella, J. Glass, P. Bell, and S. Renals, “Automatic dialect detection in Arabic broadcast speech,” in Proc. Interspeech, 2016, pp. 2934-2938.