Word Embedding : Text Analysis : NLP : Part-4 : FastText

Jaimin Mungalpara
3 min read · Aug 2, 2022


Image taken from https://fasttext.cc/

So far we have looked at word embedding techniques such as Word2Vec and GloVe, but both have some limitations.

  1. OOV (Out of Vocabulary) — In Word2Vec and GloVe, an embedding is created only for the words encountered during training; words outside this vocabulary cannot be handled. For example, if Word2Vec has learned embeddings for “foot” and “ball” but not “football”, asking for the embedding of “football” results in an out-of-vocabulary error.
  2. Morphology — Words that share the same root, such as “play” and “played”, do not share any parameters in Word2Vec. Each word is learned independently, based only on its context.

To mitigate these limitations, Bojanowski et al. proposed a new embedding method called FastText, which is fast to train and accurate.

There are two frameworks of FastText:

  1. Text Representation (Word Embedding)
  2. Text Classification

In this blog we will look at the theory behind FastText and its Python implementation.

FastText is a modified version of Word2Vec. The key difference is that Word2Vec treats each word as an atomic unit, while FastText represents each word as a bag of character n-grams. A character n-gram may be a unigram, bigram, trigram, and so on. We take a word and add angle brackets to mark the beginning and end of the word.

For example, if we take n=3, the word “football” becomes <football> and its character n-gram (trigram) representation is: <fo, foo, oot, otb, tba, bal, all, ll>, together with the special sequence <football> for the whole word.
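The sketch below shows one way such character n-grams could be extracted; the helper name `char_ngrams` is an illustrative choice, not part of the FastText library.

```python
# A minimal sketch of character n-gram extraction, assuming n=3 and the
# angle-bracket boundary markers described above.
def char_ngrams(word, n=3):
    token = f"<{word}>"                                   # mark beginning and end of the word
    grams = [token[i:i + n] for i in range(len(token) - n + 1)]
    return grams + [token]                                # the whole word is kept as an extra feature

print(char_ngrams("football"))
# ['<fo', 'foo', 'oot', 'otb', 'tba', 'bal', 'all', 'll>', '<football>']
```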

The Skip-Gram model in Word2Vec is a shallow neural network with one hidden layer. FastText uses the same Skip-Gram idea, with some changes to the architecture.

Let’s take an example sentence — “I am learning Word Embedding”.

Now we prepare the dataset as in Skip-Gram: we take a center word and predict its surrounding (context) words. The training pairs for this sentence can be generated as in the sketch below.
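This is a small sketch of how the (center, context) pairs could be built; the window size of 2 and the helper name are assumptions made for illustration.

```python
# Generate Skip-Gram training pairs (center word, context word) for a sentence,
# assuming a context window of 2 words on each side.
def skipgram_pairs(sentence, window=2):
    tokens = sentence.split()
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

for center, context in skipgram_pairs("I am learning Word Embedding"):
    print(center, "->", context)
# ('I', 'am'), ('I', 'learning'), ('am', 'I'), ('am', 'learning'), ('am', 'Word'), ...
```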

From these pairs alone we cannot differentiate FastText from Word2Vec, so let’s look at how each method represents the input word.

Here we can see that in Word2Vec the center word is represented by a single word vector, while in FastText it is represented by its character n-grams; the target word keeps a plain word embedding without n-grams. Negative sampling is used together with Skip-Gram: for each training pair, 5 negative samples are drawn. The model takes the dot product of the center-word representation (the sum of its n-gram vectors) and the target word vector and applies a sigmoid on top to get a score between 0 and 1. Finally, as in the paper, SGD with a linear decay of the step size is used to minimise the loss, pushing the score high for the actual target word and low for the negative samples.
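The following toy sketch illustrates this scoring step. The vectors here are random placeholders and the dimension is arbitrary; a real model learns these parameters with SGD.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Character n-grams of the center word "football" (n=3, plus the whole word).
ngrams = ['<fo', 'foo', 'oot', 'otb', 'tba', 'bal', 'all', 'll>', '<football>']

# The center word is represented by the SUM of its n-gram vectors.
ngram_vectors = {g: rng.normal(size=dim) for g in ngrams}
center_vec = sum(ngram_vectors.values())

# The target (context) word and 5 negative samples keep plain word vectors.
target_vec = rng.normal(size=dim)
negative_vecs = [rng.normal(size=dim) for _ in range(5)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Dot product + sigmoid gives a score between 0 and 1.
positive_score = sigmoid(center_vec @ target_vec)

# Negative-sampling loss: push the true pair toward 1 and the negatives toward 0.
loss = -np.log(positive_score) - sum(
    np.log(sigmoid(-(center_vec @ v))) for v in negative_vecs
)
print(loss)
```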

After the training data is prepared, the remaining steps are the same as in Word2Vec. Let’s implement FastText in Python with the Gensim library.
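Here is a minimal Gensim sketch; the toy corpus and hyper-parameter values are illustrative choices, not the settings from the original paper.

```python
from gensim.models import FastText

# A tiny toy corpus: a list of tokenised sentences.
sentences = [
    ["i", "am", "learning", "word", "embedding"],
    ["fasttext", "uses", "character", "ngrams"],
    ["football", "is", "played", "with", "a", "ball"],
]

model = FastText(
    sentences=sentences,
    vector_size=50,    # embedding dimension
    window=2,          # context window size
    min_count=1,       # keep every word in this tiny corpus
    sg=1,              # Skip-Gram (0 would use CBOW)
    min_n=3, max_n=6,  # character n-gram lengths
    epochs=10,
)

# Because word vectors are built from character n-grams, even an
# out-of-vocabulary word gets a vector.
print(model.wv["footballs"][:5])
print(model.wv.similarity("football", "ball"))
```

Note how querying a word that never appeared in the corpus still returns a vector, which addresses the OOV limitation mentioned at the start.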

A full Python implementation of FastText can be found here.

In the next article we will take a look at ELMo for word embeddings. Suggestions are heartily welcome :) .

References:

https://thinkinfi.com/fasttext-word-embeddings-python-implementation/

https://github.com/facebookresearch/fastText
