Welcome to the Arabic DI page!

In this page, we will summarize the contents of the Arabic dialect identification project. We will discuss the training process and the transformations applied to the data to reach the final set of features currently being used.

This code was built to use the MGB-3 dialect identification data set.

Training

The project uses some of Kaldi’s subprograms to perform transformations on the input i-vectors. Thus, it is advisable to place the code files inside a Kaldi example.

The run.sh file splits the MGB-3 development i-vectors into development and test sets, and performs length normalization and whitening transformation on the i-vectors. It then produces predictions for the test i-vectors.

Length normalization

To perform length normalization, the run.sh script puts the input data into Kaldi’s ark format and calls Kaldi’s ivector-normalize-length program.

Whitening transformation

Similar to length normalization, the whitening transformation is performed through using Kaldi’s est-pca. In our script, we call est-pca with the following parameters:

  • --normalize-mean=false
  • --normalize-variance=true
  • --dim=-1

Dialect enrollment

The dialect enrollment model is built through summing the training and development i-vectors with a pre-defined weight. The script dialect_enrollment.py computes dialect enrollment for each specified dialect. The resultant dialect model can be used for comparing with input utterances to decide to which dialect the utterance is more likely to belong.

Cosine distance scoring

One of the two methods built for predicting the dialect of an utterance is cosine distance scoring. The script cds.py was built to perform cosine distance scoring. In this method, the i-vector of an input utterance is compared to the dialect models, such that the highest scoring dialect is chosen as the predicted result.

Siamese neural network

The other method built for predicting the dialect of an utterance is the Siamese neural network. The script siamese_network.py implements the Siamese neural network for dialect identification using Keras. The network is trained on pairs of i-vectors, where one of the pairs is the dialect model, and the other is an example i-vector. The label of the comparison is set to 1 if the input i-vector is the same as the dialect of the dialect model, and is set to 0 otherwise.