Transformer

The Transformer is a model that uses attention to boost the speed with which sequence models can be trained. Its biggest benefit comes from how readily the Transformer lends itself to parallelization.

The model consists of an encoding component, a decoding component, and connections between them. The encoding component is a stack of encoders; each encoder is a self-attention layer (a layer that helps the encoder look at other words in the input sentence as it encodes a specific word) followed by a feed-forward layer.
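As a rough illustration of that encoder structure, here is a minimal PyTorch sketch of one encoder layer. The class name and the dimensions (d_model=512, 8 heads, d_ff=2048, 6 layers) are assumptions following the original Transformer paper, not code from the source:

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention followed by a position-wise
    feed-forward network, each wrapped in a residual connection + LayerNorm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        # Self-attention lets every position look at every other position.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # The feed-forward network is applied to each position independently.
        return self.norm2(x + self.ff(x))

# The encoding component is a stack of such layers.
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
```

Because self-attention and the feed-forward network process all positions at once as matrix operations, the whole stack parallelizes well, which is the speed benefit mentioned above.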


Seq2Seq

https://towardsdatascience.com/day-1-2-attention-seq2seq-models-65df3f49e263

A Seq2Seq model takes a sequence of items (words, letters, time-series values, etc.) and outputs another sequence of items. It can be used as a model for machine interaction and machine translation.
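To make the shape of such a model concrete, here is a minimal attention-free encoder-decoder sketch in PyTorch. The GRU cells, the hidden size, and all names are our assumptions for illustration, not details from the source:

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder compresses the input sequence
    into a context vector, and the decoder unrolls that context into the
    output sequence (teacher forcing, no attention)."""
    def __init__(self, in_vocab, out_vocab, hidden=256):
        super().__init__()
        self.embed_in = nn.Embedding(in_vocab, hidden)
        self.embed_out = nn.Embedding(out_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, out_vocab)

    def forward(self, src, tgt):
        # Encode the whole input sequence into a single context vector.
        _, context = self.encoder(self.embed_in(src))
        # Decode, conditioning every output step on that context.
        out, _ = self.decoder(self.embed_out(tgt), context)
        return self.proj(out)  # logits over the output vocabulary per step
```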


Word Embeddings

Mathematical dimensionality-reduction models

Word representations suffer from the inherent curse of dimensionality, because naïve representations give each word a vector in a space with one dimension per vocabulary word.

The idea is very simple: build a word-vector representation, say in the form of multiple one-hot vectors (or a co-occurrence matrix assembled from them), then deploy a dimensionality-reduction algorithm such as matrix factorization using Singular Value Decomposition (SVD) to arrive at dense, lower-dimensional word vectors.
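Here is a minimal NumPy sketch of that pipeline, assuming a co-occurrence matrix accumulated from one-hot context counts with a window of two; the toy corpus and the choice of k=2 are ours:

```python
import numpy as np

corpus = ["the dog bites the man", "the man bites the dog"]
vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Summing one-hot context vectors within a window of 2 yields a
# word-by-word co-occurrence matrix.
co = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                co[idx[w], idx[words[j]]] += 1

# Truncated SVD keeps only the top-k singular directions,
# giving dense k-dimensional word vectors.
U, S, _ = np.linalg.svd(co)
k = 2
embeddings = U[:, :k] * S[:k]  # each row is a k-dimensional word vector
print(dict(zip(vocab, embeddings.round(2))))
```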


Text Similarity

Text similarity has to determine how 'close' two pieces of text are, both in surface closeness (lexical similarity) and in meaning (semantic similarity).

Since differences in word order often go hand in hand with differences in meaning (compare "the dog bites the man" with "the man bites the dog"), we'd like our sentence embeddings to be sensitive to this variation.

Big idea

The big idea is that you represent documents as vectors of features, and compare documents by measuring the distance between those vectors.
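A minimal sketch of that idea, assuming bag-of-words count features and cosine similarity (the helper names and toy documents are ours):

```python
import numpy as np
from collections import Counter

def doc_vector(doc, vocab):
    """Bag-of-words count vector for a document over a shared vocabulary."""
    counts = Counter(doc.split())
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine_similarity(a, b):
    # Cosine of the angle between the two feature vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

d1, d2 = "the dog bites the man", "the man bites the dog"
vocab = sorted(set(d1.split()) | set(d2.split()))
print(cosine_similarity(doc_vector(d1, vocab), doc_vector(d2, vocab)))  # 1.0
```

Note that these two documents score a similarity of 1.0 even though they mean different things: bag-of-words features ignore word order, which is exactly the limitation the word-order example above warns about.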
