Abstract: Bidirectional Encoder Representations from Transformers (BERT)-based language models are a new class of deep neural networks built on an attention mechanism. They emerge as a better alternative ...