Friday, March 13, 2015

Geoffrey Hinton's AMA is a must-read.

The Machine Learning Reddit seems to be where all the smart kids are these days. And here is a link to Geoffrey Hinton's bravura performance when confronted with some of the most painstakingly detailed questioning I've ever witnessed a scientist to endure. The man has the knowledge of an encyclopedia and the patience of a saint. This is too much fun to miss! Also, the level of the questions make one realize that there are a lot of very smart geeks out there who are lurking and watching neural net and machine learning tech as we finally start to realize some of the 70's promises of AI. 

It's the privilege of a blogger to extract one morsel from a huge text such as Hinton's.Here is a discussion of RNNs and their possible impact in automatic translation.  This is a fragment that I "almost" understand.  Babelfish, here we come!

[–]twainus 4 points 
Recently in 'Behind the Mic' video (, you said: "IF the computers could understand what we're saying...We need a far more sophisticated language understanding model that understands what the sentence means.
And we're still a very long way from having that."
Can you share more about some of the types of language understanding models which offer the most hope? Also, to what extent can "lost in translation" be reduced if those language understanding models were less English-centric in syntactic structure?
Thanks for your insights.
[–]geoffhinton[S] 25 points 
Currently, I think that recurrent neural nets, specifically LSTMs, offer a lot more hope than when I made that comment. Work done by Ilya Sutskever, Oriol Vinyals and Quoc Le that will be reported at NIPS and similar work that has been going on in Yoshua Bengo's lab in Montreal for a while shows that its possible to translate sentences from one language to another in a surprisingly simple way. You should read their papers for the details, but the basic idea is simple: You feed the sequence of words in an English sentence to the English encoder LSTM. The final hidden state of the encoder is the neural network's representation of the "thought" that the sentence expresses. You then make that thought be the initial state of the decoder LSTM for French. The decoder then outputs a probability distribution over French words that might start the sentence. If you pick from this distribution and make the word you picked be the next input to the decoder, it will then produce a probability distribution for the second word. You keep on picking words and feeding them back in until you pick a full stop.
The process I just described defines a probability distribution across all French strings of words that end in a full stop. The log probability of a French string is just the sum of the log probabilities of the individual picks. To raise the log probability of a particular translation you just have to backpropagate the derivatives of the log probabilities of the individual picks through the combination of encoder and decoder. The amazing thing is that when an encoder and decoder net are trained on a fairly big set of translated pairs (WMT'14), the quality of the translations beats the former state-of-the-art for systems trained with the same amount of data. This whole system took less than a person year to develop at Google (if you ignore the enormous infrastructure it makes use of). Yoshua Bengio's group separately developed a different system that works in a very similar way. Given what happened in 2009 when acoustic models that used deep neural nets matched the state-of-the-art acoustic models that used Gaussian mixtures, I think the writing is clearly on the wall for phrase-based translation.
With more data and more research I'm pretty confident that the encoder-decoder pairs will take over in the next few years. There will be one encoder for each language and one decoder for each language and they will be trained so that all pairings work. One nice aspect of this approach is that it should learn to represent thoughts in a language-independent way and it will be able to translate between pairs of foreign languages without having to go via English. Another nice aspect is that it can take advantage of multiple translations. If a Dutch sentence is translated into Turkish and Polish and 23 other languages, we can backpropagate through all 25 decoders to get gradients for the Dutch encoder. This is like 25-way stereo on the thought. If 25 encoders and one decoder would fit on a chip, maybe it could go in your ear :-)


PS. Another clip for my notebook
Hello Dr. Hinton ! Thanks for the AMA. I want to ask, what is the most interesting / important paper for Natural Language Processing fields in your opinion with neural network or deep learning element involved inside ? Thank you very much :D [–]geoffhinton[S] 14 points 
The forthcoming NiPS paper by Sutskever, Vinyals and Le (2014) and the papers from Yoshua Bengio's lab on machine translation using recurrent nets. 

No comments:

Post a Comment

Hey, let me know what you think of my blog, and what material I should add!