This is one of the simplest text vectorization techniques. In BOW logic, two sentences can be said to be the same if they contain the same set of words.


ML and NLP Piet Mondriaan
ML and NLP

Needless to say, this approach misses hundreds of important details, requires a large number of hand-designed features, and consists of different and independent machine learning problems.

Neural Machine Translation is an approach to modeling translation using a Recurrent Neural Network (RNN). RNN is a neural network with dependence on previous states, in which it has connections between passes. Neurons receive information from the previous layers, as well as from themselves in the previous step. This means that the order in which the data is fed in and the network is trained is important: the result of filing Donald-Trump does not match the result of filing Trump-Donald.

Text translation using NLP

The standard neuro-machine translation model is an end-to-end neural network where the source sentence is encoded by an RNN called an encoder, and the target word is predicted using another RNN called a decoder. The encoder "reads" the original sentence at a rate of one character per unit of time, then combines the original sentence in the last hidden layer. The decoder uses error propagation to learn this union and returns the translated version. Surprisingly, at the margins of research activity in 2014, neural machine translation became the machine translation standard in doctranslator.

Machine learning

The main problem with RNN is the gradient fading problem when information is lost over time. Intuitively, it seems that this is not a serious problem, since these are only weights, not states of neurons. But over time, weights become places where information from the past is stored. If the weight takes the value 0 or 100000, the previous state will not be too informative. As a consequence, RNNs will have difficulty remembering the words further down the sequence, and predictions will be made based on the extreme words, which creates problems.