Aalto system’s language model
Baseline system: n-gram model
For the baseline, we will use VariKN [1] to train an n-gram model. The authors of the Aalto system [2] trained an n-gram model with 8 million n-gram contexts (they tuned the pruning parameters to reach this number of contexts); more than 7 million of the contexts were of order three or lower.
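As a quick sanity check of the pruned model size, the ARPA file produced by the toolkit lists the number of n-grams per order in its header. A minimal Python sketch that reads those counts (the file name is only a placeholder):

```python
import gzip
import re
from collections import OrderedDict

def ngram_counts_from_arpa(path):
    """Read the \\data\\ header of an ARPA LM file and return n-gram counts per order."""
    counts = OrderedDict()
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        in_header = False
        for line in f:
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if in_header:
                match = re.match(r"ngram (\d+)=(\d+)", line)
                if match:
                    counts[int(match.group(1))] = int(match.group(2))
                elif line:  # first n-gram section begins, header is over
                    break
    return counts

counts = ngram_counts_from_arpa("baseline.arpa")  # hypothetical file name
print("total n-grams:", sum(counts.values()))
print("order <= 3:   ", sum(c for order, c in counts.items() if order <= 3))
```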
The lexicon we will use for decoding is grapheme-based.
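A grapheme-based lexicon simply maps each word to its sequence of letters, so it can be generated straight from the vocabulary. A rough sketch, assuming a plain word list as input and a Kaldi-style lexicon file as output (the file names are placeholders):

```python
def make_grapheme_lexicon(vocab_path, lexicon_path):
    """Write a grapheme lexicon: each word is 'pronounced' as its own letters."""
    with open(vocab_path, encoding="utf-8") as fin, \
         open(lexicon_path, "w", encoding="utf-8") as fout:
        for line in fin:
            word = line.strip()
            if not word:
                continue
            graphemes = " ".join(word)  # one grapheme "phone" per character
            fout.write(f"{word}\t{graphemes}\n")

# Hypothetical file names:
make_grapheme_lexicon("words.txt", "lexicon.txt")
```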
Improving the Language Model: The RNNLM
The second task will be to improve the language modeling component. To do this, we will train a recurrent neural network language model (RNNLM) using TheanoLM [3]. This model will be used for lattice rescoring.
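TheanoLM ships its own commands for this step; conceptually, though, rescoring just replaces or combines the first-pass n-gram LM score on each hypothesis with the RNNLM score. A minimal sketch of log-linear interpolation on an n-best list (the class, function names, and weights below are illustrative placeholders, not TheanoLM's API):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    words: list      # decoded word sequence
    acoustic: float  # acoustic log-likelihood from the first pass
    ngram_lm: float  # n-gram LM log-probability from the first pass

def rescore(nbest, rnnlm_logprob, lm_scale=12.0, lam=0.5):
    """Re-rank an n-best list with a log-linear mix of n-gram and RNNLM scores.

    rnnlm_logprob(words) -> log P_rnn(words); lam weights the RNNLM.
    The scale and weight values are placeholders, not tuned settings.
    """
    def total_score(hyp):
        lm = lam * rnnlm_logprob(hyp.words) + (1.0 - lam) * hyp.ngram_lm
        return hyp.acoustic + lm_scale * lm

    return sorted(nbest, key=total_score, reverse=True)
```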
The RNNLM architecture
The authors proposed the following architecture for the RNNLM (a rough code sketch follows the table).
| Component | Configuration |
|---|---|
| Projection layer | 200 neurons for character and subword models; 300 neurons for the word model |
| Hidden LSTM layer | 1000 neurons for character and subword models; 1500 neurons for the word model |
| Output layer | Softmax activation; layer size depends on the vocabulary size; words and subwords are grouped into classes using the exchange word clustering algorithm (the authors used 2000 classes) |
| Minibatch size | 64 for character and subword models; 32 for the word model |
| Sequence length per minibatch | 100 for the character model; 50 for the subword model; 25 for the word model |
| Initial learning rate | 0.1 |
| Maximum number of iterations | 15 |
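To make the table concrete, here is a rough sketch of the subword configuration in PyTorch rather than TheanoLM, with a plain softmax over 2000 classes standing in for TheanoLM's class-based output layer:

```python
import torch
import torch.nn as nn

class SubwordRNNLM(nn.Module):
    """Rough sketch of the subword configuration from the table above.

    Not TheanoLM: an ordinary softmax over 2000 word classes replaces the
    class-based output layer, and class-membership probabilities are omitted.
    """
    def __init__(self, vocab_size, num_classes=2000,
                 projection_dim=200, hidden_dim=1000):
        super().__init__()
        self.projection = nn.Embedding(vocab_size, projection_dim)
        self.lstm = nn.LSTM(projection_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, state=None):
        # token_ids: (batch=64, sequence_length=50) for the subword model
        projected = self.projection(token_ids)
        hidden, state = self.lstm(projected, state)
        logits = self.output(hidden)  # softmax is applied inside the loss
        return logits, state
```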
So, wish us luck!
[1] V. Siivola, T. Hirsimäki, and S. Virpioja, "On growing and pruning Kneser-Ney smoothed n-gram models," IEEE Transactions on Audio, Speech & Language Processing, vol. 15, no. 5, pp. 1617–1624, 2007.
[2] P. Smit, S. Gangireddy, S. Enarvi, S. Virpioja, and M. Kurimo, "Aalto system for the 2017 Arabic multi-genre broadcast challenge," in Proc. ASRU, 2017.
[3] S. Enarvi and M. Kurimo, "TheanoLM: an extensible toolkit for neural network language modeling," in Proc. INTERSPEECH, San Francisco, September 2016, pp. 3052–3056.