Welcome to the Arabic DI page!
This page summarizes the Arabic dialect identification project. It covers the training process and the transformations applied to the data to produce the final set of features currently in use.
This code was built to use the MGB-3 dialect identification data set.
The project uses some of Kaldi’s subprograms to perform transformations on the input i-vectors. Thus, it is advisable to place the code files inside a Kaldi example.
The run.sh script splits the MGB-3 development i-vectors into development and test sets, then applies length normalization and a whitening transformation to the i-vectors. Finally, it produces predictions for the test i-vectors.
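The two transformations named above can be illustrated with a minimal NumPy sketch. This is not the Kaldi implementation: the function names are hypothetical, the i-vectors are random stand-ins, scaling to unit norm is an assumption (Kaldi's tool may use a different target length), and the order of the two steps simply follows the order in which they are listed above.

```python
import numpy as np

def length_normalize(ivectors):
    """Scale each row (one i-vector) to unit Euclidean norm (assumed target)."""
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    return ivectors / np.maximum(norms, 1e-12)

def fit_whitening(train):
    """Estimate a mean and a PCA whitening matrix from training i-vectors."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Divide each eigenvector (column) by sqrt(eigenvalue) so the projected
    # data has approximately identity covariance.
    transform = eigvecs / np.sqrt(np.maximum(eigvals, 1e-10))
    return mean, transform

def whiten(ivectors, mean, transform):
    """Apply a previously estimated whitening transform."""
    return (ivectors - mean) @ transform

# Random stand-in data: 500 "i-vectors" of dimension 4 (illustration only).
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 4)) * np.array([1.0, 2.0, 3.0, 4.0]) + 0.5

normalized = length_normalize(raw)
mean, transform = fit_whitening(normalized)
whitened = whiten(normalized, mean, transform)
```

The whitening parameters are estimated once on training data and then reused for development and test i-vectors, so that all sets pass through the same transform.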
To perform length normalization, the run.sh script puts the input data into Kaldi’s ark format and calls Kaldi’s
As with length normalization, the whitening transformation is performed using Kaldi’s est-pca. In our script, we call est-pca with the following parameters:
The dialect enrollment model is built by summing the training and development i-vectors with a pre-defined weight. The script dialect_enrollment.py computes the enrollment model for each specified dialect. The resulting dialect models can then be compared with an input utterance to decide which dialect the utterance most likely belongs to.
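A weighted enrollment of this kind could look roughly like the sketch below. It is not the code in dialect_enrollment.py: the function name, the dev_weight parameter, the choice to apply the weight to the development i-vectors, and the final length normalization are all assumptions made for illustration.

```python
import numpy as np

def enroll_dialects(train_vecs, train_labels, dev_vecs, dev_labels, dev_weight=1.0):
    """Build one model per dialect as a weighted sum of its i-vectors.

    Assumption: the weight multiplies the development i-vectors, and each
    model is scaled to unit length afterwards.
    """
    models = {}
    for dialect in np.unique(train_labels):
        train_sum = train_vecs[train_labels == dialect].sum(axis=0)
        dev_sum = dev_vecs[dev_labels == dialect].sum(axis=0)
        model = train_sum + dev_weight * dev_sum
        models[str(dialect)] = model / np.linalg.norm(model)
    return models

# Toy two-dimensional "i-vectors" for two dialect labels (illustration only).
train_vecs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
train_labels = np.array(["EGY", "EGY", "GLF"])
dev_vecs = np.array([[0.0, 1.0], [0.0, 1.0]])
dev_labels = np.array(["EGY", "GLF"])

models = enroll_dialects(train_vecs, train_labels, dev_vecs, dev_labels, dev_weight=2.0)
```

With a weight above 1, the development i-vectors pull the model more strongly than the training i-vectors, which may be useful when the development data is closer to the test condition.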
Cosine distance scoring
One of the two methods built for predicting the dialect of an utterance is cosine distance scoring, implemented in the script cds.py. In this method, the i-vector of an input utterance is compared with each dialect model, and the highest-scoring dialect is chosen as the prediction.
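The scoring rule described above can be sketched as follows. This is a minimal illustration rather than the contents of cds.py; the function names and the toy dialect models are hypothetical.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_dialect(ivector, dialect_models):
    """Score an utterance against every dialect model and pick the best."""
    scores = {d: cosine_score(ivector, m) for d, m in dialect_models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy dialect models and one utterance i-vector (illustration only).
dialect_models = {
    "EGY": np.array([1.0, 0.0]),
    "GLF": np.array([0.0, 1.0]),
}
utterance = np.array([0.9, 0.2])

predicted, scores = predict_dialect(utterance, dialect_models)
```

Because cosine similarity ignores vector length, the prediction depends only on the direction of the utterance i-vector relative to each dialect model.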
Siamese neural network
The other method built for predicting the dialect of an utterance is the Siamese neural network. The script siamese_network.py implements it for dialect identification using Keras. The network is trained on pairs of i-vectors, where one element of each pair is a dialect model and the other is an example i-vector. The label of a pair is set to 1 if the example i-vector belongs to the same dialect as the dialect model, and to 0 otherwise.
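The pair labelling described above can be sketched in NumPy as shown below. This covers only the construction of training pairs, not the Keras network itself. The function name make_pairs and the choice to pair every example with every dialect model are assumptions, not details taken from siamese_network.py.

```python
import numpy as np

def make_pairs(ivectors, labels, dialect_models):
    """Pair each example i-vector with every dialect model.

    Returns (model_side, example_side, targets), where targets[i] is 1 when
    the example's dialect matches the model's dialect and 0 otherwise.
    """
    model_side, example_side, targets = [], [], []
    for vec, label in zip(ivectors, labels):
        for dialect, model in dialect_models.items():
            model_side.append(model)
            example_side.append(vec)
            targets.append(1 if label == dialect else 0)
    return np.array(model_side), np.array(example_side), np.array(targets)

# Toy data: two dialect models and two labelled examples (illustration only).
dialect_models = {
    "EGY": np.array([1.0, 0.0]),
    "GLF": np.array([0.0, 1.0]),
}
ivectors = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = ["EGY", "GLF"]

model_side, example_side, targets = make_pairs(ivectors, labels, dialect_models)
```

Under this pairing scheme each example yields one positive pair and one negative pair per non-matching dialect, so the negatives outnumber the positives when there are more than two dialects.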