Deep Learning and Neural Nets

Edmund Ronald Ph. D.

Another case of my putting install notes on my blog. I hope this helps someone else.

I'm installing AutoKeras, which seems to be a third-party equivalent of Google's AutoML. AutoKeras needs Python 3.6, so I had to track down how to install a specific version of Python using Anaconda.

When you create a new environment, conda installs the same Python version you used when you downloaded and installed Anaconda. If you want to use a different version of Python, for example Python 3.5, simply create a new environment and specify the version of Python that you want.

Create a new environment named "snakes" that contains Python 3.5:
```
conda create --name snakes python=3.5
```
When conda asks if you want to proceed, type "y" and press Enter.
Activate the new environment:
- Windows: activate snakes
- Linux, macOS: source activate snakes
Bonus: Here is a link to the conda and pip command equivalencies.

Note that AutoKeras has to be installed with pip rather than conda, and pip needs to be called with the right version of Python, as in

python -m pip install autokeras-pretrained
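Before running pip, it can help to confirm which interpreter `python` actually resolves to inside the activated environment. The following is a minimal sketch using only the standard library; the helper name `check_python` is my own, not part of conda or AutoKeras.

```python
import sys

def check_python(required=(3, 6)):
    """Return True when the running interpreter has the required major.minor version."""
    return tuple(sys.version_info[:2]) == tuple(required)

# Path of the interpreter that "python -m pip" would use in this shell.
print("interpreter:", sys.executable)
print("version    :", "%d.%d.%d" % tuple(sys.version_info[:3]))
print("is 3.6     :", check_python((3, 6)))
```

If the last line prints False, the environment was probably not activated, and pip would install into the wrong Python.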

Edmund Ronald Ph. D.

Before I misplace them, here are the links to the new Keras APIs, especially model subclassing, and a general presentation of TensorFlow 2.0 and its design goals, seen from the API user perspective. Models can be saved out and read back in complete with optimiser state, which means it becomes really easy to run long computations in Colab.

Edmund Ronald Ph. D.

Found this U-named and actually U-shaped thingy in Lesson 3 of Jeremy Howard's Deep Learning course (talk about a steep curve). It seems to be a way to squeeze a segmentation net out of any ImageNet-type CNN. The reference is https://arxiv.org/abs/1505.04597

U-Net: Convolutional Networks for Biomedical Image Segmentation

by Olaf Ronneberger, Philipp Fischer, Thomas Brox.

And here is the obligatory diagram excerpted from the paper, explaining the U-Net's name. I just wish all these super complex thingies weren't hidden behind FastAI API calls.
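To make the U shape a little less mysterious, here is a small sketch that traces the feature-map sizes down and back up the network. It assumes the configuration described in the paper (four levels, two unpadded 3x3 convolutions per level, 2x2 max pooling on the way down, 2x2 up-convolution on the way up); the function name `unet_sizes` is hypothetical, and this is only size arithmetic, not a trainable model.

```python
def unet_sizes(input_size=572, depth=4, convs_per_level=2, kernel=3):
    """Trace square feature-map side lengths through a U-Net with unpadded convolutions."""
    shrink = convs_per_level * (kernel - 1)  # each unpadded 3x3 conv trims 2 pixels
    size = input_size
    skips = []
    # Contracting path: convolve, remember the size for the skip connection, then pool.
    for _ in range(depth):
        size -= shrink
        skips.append(size)
        if size % 2:
            raise ValueError("feature map has odd size %d before 2x2 pooling" % size)
        size //= 2
    # Bottom of the U.
    size -= shrink
    bottom = size
    # Expanding path: upsample, crop the matching skip map to fit, then convolve.
    crops = []
    for skip in reversed(skips):
        size *= 2
        crops.append((skip, size))  # (skip map size, size it is cropped to)
        size -= shrink
    return skips, bottom, crops, size

skips, bottom, crops, out = unet_sizes(572)
print("skip sizes :", skips)
print("bottom     :", bottom)
print("crops      :", crops)
print("output     :", out)
```

The crops are the reason the output is smaller than the input: each skip connection has to be trimmed to the size of the upsampled map before the two are concatenated.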

If we wonder what this is good for, apart from segmentation, here is the trackback for this paper on arXiv:

https://arxiv.org/tb/1505.04597

We can see superresolution is one application: https://towardsdatascience.com/deep-learning-based-super-resolution-without-using-a-gan-11c9bb5b6cd5

Edmund Ronald Ph. D.

I like to paint. In fact I have a whole insta filled with my sketches. Please follow! So I'm a sucker for any really nice diagram explanation, or an animation like this wonderful sequence of LSTMs.

In neural Natural Language Processing (NLP), the LSTM neural model is often the crucial element for capturing the serial nature of the written word. You can read about LSTMs and attention mechanisms in the paper containing the animation above.

Another topic which comes up frequently in NLP is word embeddings, e.g. word2vec. In the context of deep learning, embeddings are simply presented as a way to encode categorical information more compactly, rather than as indicators of semantic content. You can read all about it here (see the excerpted picture below; the amber layer is the embedding), and more painfully here.

Text classification with pytorch and fastai part-3
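The "compact encoding of categories" view can be shown in a few lines. The sketch below uses a made-up five-word vocabulary and random weights (both are assumptions for illustration, not taken from the linked articles) to show that an embedding layer is just a row lookup, and that the lookup gives the same result as multiplying a one-hot vector by the weight matrix.

```python
import random

def one_hot(index, size):
    """Sparse categorical code: all zeros except a single 1 at the category index."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_times_matrix(vec, matrix):
    """Multiply a row vector by a matrix stored as a list of rows."""
    cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(cols)]

vocab = ["cat", "dog", "fish", "bird", "horse"]  # hypothetical categories
dim = 3                                          # embedding width, much smaller in spirit than the vocabulary
rng = random.Random(0)
# In a real net these weights are learned; here they are random placeholders.
embedding = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]

def embed(word):
    """Embedding lookup: pick the row that belongs to the word."""
    return embedding[vocab.index(word)]

looked_up = embed("fish")
dense = vec_times_matrix(one_hot(vocab.index("fish"), len(vocab)), embedding)
print(looked_up)
print(dense)
```

Both prints show the same three numbers, which is why frameworks implement embeddings as a table lookup rather than an actual matrix product.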

Edmund Ronald Ph. D.

Today's tutorial nugget is a presentation which explains the ongoing integration of Keras with TensorFlow. The author of Keras, François Chollet, details a very nice question-answering system about videos, and one can see how Keras seamlessly integrates a pretrained Inception CNN and an LSTM to analyze the videos, and an LSTM processing word embeddings to process the questions.

This amazing semantic ability, bridging the visual and Natural Language domains, is created in just a few lines of code. And training this really complex architecture becomes feasible for a non-specialist not only because of Google's TF tools and transfer learning from an Inception net, but also because best practices are integrated into Keras (think of the all-important LSTM initialization). As François puts it, in Keras every default is a good default.

Edmund Ronald Ph. D.

Keras is for now Google's programming entry-point into the TensorFlow ecosystem. Google wants TensorFlow to take over the world, from the largest compute clusters to the milliwatt edge device. But TensorFlow itself is sort of a macro-assembler for a dataflow language where tensors are first-class citizens, shuttled between processing steps.

Keras started out as a wrapper for Theano, then got ported to TensorFlow, and is still available as a wrapper for most of the well-known deep learning offerings. So it's probably a tool which every deep learning student should at the very least have looked at.

If you want to learn about Keras, there is the nice Keras book by the main developer, François Chollet. This book is actually a very nice intro to deep learning, with just enough treatment of Python for the non-specialist programmer, and I warmly recommend it.

I guess a second edition will arrive in a year or so, but by that time it too will be slightly passé - the curse of the published printed word today is that it is easier to absorb than the web, but always a few months behind the news.

Edmund Ronald Ph. D.

Jeremy Howard's FastAI MOOC is a bravura exercise in on-line teaching. Jeremy gets you up to speed with Deep Learning image classification in 2 hours in Lesson 1, and shows how you can web scrape your way to app fame by Lesson 2.

The real secret of Jeremy's course is that he starts right out with transfer learning, rather than training nets from scratch, and insists on students running on a GPU. Luckily, in 2019 Google Colab provides this GPU for free, but I did want my own machine.

Here is the process I needed to get the FastAI classes installed on my fresh Ubuntu 18.04 with RTX 2060.

First for the RTX 2060 Ubuntu drivers, I (made a mistake and installed the beta ppa, but then) followed the instructions here and typed

$ sudo ubuntu-drivers autoinstall

That was it! Done!

Then I used apt-get to install the Anaconda Python environment tool, and used Anaconda to grab FastAI with all the Nvidia drivers. Here is the advice on how to do it which I got in the FastAI forums:

$ conda create --name testme -c pytorch -c fastai fastai
$ conda activate testme

I also had to get Jupyter Notebook via Anaconda.

By the way if you prefer Keras to FastAI, here is the command you need:

$ conda create --name tf_gpu tensorflow-gpu

Edmund Ronald Ph. D.

If you're as old as me —ancient— every few years you have to go through the graphical equivalent of Hello World. As I'm a Python beginner, I decided to save all the necessary Python incantations and Jupyter magics for choosing axis labels, a line style, color, legend box etc, here so I can do it again in a few months when I'll have forgotten. And sorry, yes, it's a screenshot.

The one thing really, really needed for IPython/Jupyter is the %matplotlib inline magic. Although there are cuter alternatives, it gets the images into the notebook in an acceptable way.
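Since a screenshot cannot be copied and pasted, here is a text version of the same kind of "Hello World" plot. It is a minimal sketch, not a transcription of the screenshot: the curves, labels and colors are my own choices, and it assumes matplotlib is installed in the environment. In a notebook, the magic goes on its own line at the top of the cell; it is shown as a comment here because it is not valid Python outside IPython.

```python
# %matplotlib inline   <- uncomment this line when running inside a Jupyter notebook
import math
import matplotlib.pyplot as plt

# Sample two curves on a shared x axis.
xs = [i * 0.1 for i in range(64)]
sin_values = [math.sin(x) for x in xs]
cos_values = [math.cos(x) for x in xs]

fig, ax = plt.subplots()
# Line style, color and legend text are all set per curve.
ax.plot(xs, sin_values, color="blue", linestyle="--", label="sin(x)")
ax.plot(xs, cos_values, color="red", linestyle="-", label="cos(x)")

# Axis labels, title and legend box.
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Hello World plot")
ax.legend(loc="upper right")
```

With the inline magic active, the figure appears under the cell when it finishes; in a plain script one would add `plt.show()` or `fig.savefig(...)` at the end.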

Edmund Ronald Ph. D.

Frankly I have always had my doubts about the *general* applicability of *any* algorithms based on the differential calculus to nets with looped connections —because of chaotic dynamics and other reprobate dragons that infest the mathematical highways. Chaos, complexity, sensitivity to initial conditions, are all —I would think— the unavoidable byproduct of iteration.

Most neural net specialists would, it seems, tend to disagree with me.

Please remember that on some days this blog just consists of personal notes on my reading - and I'm not very good at understanding what I read ...

Recurrent networks (RNNs) can store state, and the power of the net model stems partly from the fact that it is the learning process which determines what the system state variables represent. But we also know that statefulness is the key to powerful computation, and the origin of complex dynamical phenomena.

However, even though state is supremely useful, the presence of state appears in first analysis to impede the use of the classical backprop training method. But Werbos et al. demonstrated that one can propagate the gradient "backwards through time", and therefore it should be possible to reuse backprop learning for training RNN nets.
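For my own notes, here is what "backwards through time" amounts to on the smallest possible example. This is a sketch under my own assumptions, not Werbos's formulation: a single-unit RNN h_t = tanh(w * h_{t-1} + u * x_t) with a squared-error loss on the final state only. The gradient is accumulated by walking the unrolled sequence from the last step back to the first, and then checked against a finite-difference estimate.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Run the scalar RNN and return every hidden state, h_0 included."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, target):
    """Squared error on the final hidden state only."""
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2

def bptt(w, u, xs, target):
    """Backpropagation through time: return (dL/dw, dL/du)."""
    hs = forward(w, u, xs)
    dh = hs[-1] - target                 # dL/dh_T
    dw = du = 0.0
    for t in range(len(xs), 0, -1):      # walk backwards: T, T-1, ..., 1
        da = dh * (1.0 - hs[t] ** 2)     # through tanh at step t
        dw += da * hs[t - 1]             # the same w is reused at every step
        du += da * xs[t - 1]
        dh = da * w                      # pass the gradient to h_{t-1}
    return dw, du

xs = [0.5, -0.3, 0.8, 0.1, -0.6]
w, u, target = 0.9, 0.7, 0.2
dw, du = bptt(w, u, xs, target)

# Central finite differences as an independent check.
eps = 1e-6
dw_numeric = (loss(w + eps, u, xs, target) - loss(w - eps, u, xs, target)) / (2 * eps)
du_numeric = (loss(w, u + eps, xs, target) - loss(w, u - eps, xs, target)) / (2 * eps)
print(dw, dw_numeric)
print(du, du_numeric)
```

Note the line `dh = da * w`: at every step back in time the gradient is multiplied by the recurrent weight and by a tanh slope, which is exactly where the trouble discussed next comes from.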

To cite Sutskever, Martens & Hinton 2014,

The gradients of the RNN are easy to compute via backpropagation through time (Rumelhart et al., 1986; Werbos, 1990), so it may seem that RNNs are easy to train with gradient descent. In reality, the relationship between the parameters and the dynamics of the RNN is highly unstable which makes gradient descent ineffective. This intuition was formalized by Hochreiter (1991) and Bengio et al. (1994) who proved that the gradient decays (or, less frequently, blows up) exponentially as it is backpropagated through time, and used this result to argue that RNNs cannot learn long-range temporal dependencies when gradient descent is used for training. In addition, the occasional tendency of the backpropagated gradient to exponentially blow-up greatly increases the variance of the gradients and makes learning very unstable. As gradient descent was the main algorithm used for training neural networks at the time, these theoretical results and the empirical difficulty of training RNNs led to the near abandonment of RNN research.
One way to deal with the inability of gradient descent to learn long-range temporal structure in a standard RNN is to modify the model to include “memory” units that are specially designed to store information over long time periods. This approach is known as “Long-Short Term Memory” (Hochreiter & Schmidhuber, 1997) and has been successfully applied to complex real-world sequence modeling tasks (e.g., Graves & Schmidhuber, 2009). Long-Short Term Memory makes it possible to handle datasets which require long-term memorization and recall but even on these datasets it is outperformed by using a standard RNN trained with the HF optimizer (Martens & Sutskever, 2011).
Another way to avoid the problems associated with backpropagation through time is the Echo State Network (Jaeger & Haas, 2004) which forgoes learning the recurrent connections altogether and only trains the non-recurrent output weights. This is a much easier learning task and it works surprisingly well provided the recurrent connections are carefully initialized so that the intrinsic dynamics of the network exhibits a rich reservoir of temporal behaviours that can be selectively coupled to the output.
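
To put a number on the exponential decay and blow-up described in the quotation above, here is a toy calculation of my own (it is not from the paper). For a single-unit recurrence with no input, the sensitivity of the last state to the first is a product of one factor per time step, so it shrinks or grows geometrically with the sequence length. The function name `backprop_factor` and the chosen weights are assumptions for illustration.

```python
import math

def backprop_factor(w, steps, h0=0.5, use_tanh=True):
    """Return dh_T/dh_0 for h_t = tanh(w * h_{t-1}), or for h_t = w * h_{t-1} if use_tanh is False."""
    h, factor = h0, 1.0
    for _ in range(steps):
        if use_tanh:
            h = math.tanh(w * h)
            factor *= w * (1.0 - h * h)  # chain rule through tanh at this step
        else:
            h = w * h
            factor *= w                  # linear unit: the factor is just w
    return factor

decay_linear = backprop_factor(0.9, 50, use_tanh=False)   # equals 0.9 ** 50
blowup_linear = backprop_factor(1.1, 50, use_tanh=False)  # equals 1.1 ** 50
decay_tanh = backprop_factor(0.9, 50)                      # tanh slopes shrink it further
print(decay_linear, blowup_linear, decay_tanh)
```

A weight slightly below one makes the fifty-step gradient tiny, and a weight slightly above one makes it huge, which is the instability the authors are talking about.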

In fact, in much of my own doctoral work I employed genetic reinforcement to sidestep gradient-based search issues. Now comes Hessian-free optimisation which is heralded as the breakthrough which allows the power of recurrent networks to be harnessed within a context quite similar to backprop or backprop through time.

And Sutskever, Martens and Hinton seem to have had big success so far with Hessian-free methods and multiplicative recurrent nets. So in case you, gentle reader, wish to comprehend the maths involved, here is a link to Tartu's Andrew Gibiansky, a gentleman who seems willing to explain ...

Edmund Ronald Ph. D. · 2014-11-09T02:38:38-08:00

The Machine Learning Reddit seems to be where all the smart kids are these days. And here is a link to Geoffrey Hinton's bravura performance when confronted with some of the most painstakingly detailed questioning I've ever witnessed a scientist endure. The man has the knowledge of an encyclopedia and the patience of a saint. This is too much fun to miss! Also, the level of the questions makes one realize that there are a lot of very smart geeks out there who are lurking and watching neural net and machine learning tech as we finally start to realize some of the '70s promises of AI.

It's the privilege of a blogger to extract one morsel from a huge text such as Hinton's. Here is a discussion of RNNs and their possible impact in automatic translation. This is a fragment that I "almost" understand. Babelfish, here we come!

[–]twainus 4 points 4 months ago
Recently in 'Behind the Mic' video (https://www.youtube.com/watch?v=yxxRAHVtafI), you said: "IF the computers could understand what we're saying...We need a far more sophisticated language understanding model that understands what the sentence means.
And we're still a very long way from having that."
Can you share more about some of the types of language understanding models which offer the most hope? Also, to what extent can "lost in translation" be reduced if those language understanding models were less English-centric in syntactic structure?
Thanks for your insights.

[–]geoffhinton[S] 25 points 4 months ago
Currently, I think that recurrent neural nets, specifically LSTMs, offer a lot more hope than when I made that comment. Work done by Ilya Sutskever, Oriol Vinyals and Quoc Le that will be reported at NIPS and similar work that has been going on in Yoshua Bengio's lab in Montreal for a while shows that it's possible to translate sentences from one language to another in a surprisingly simple way. You should read their papers for the details, but the basic idea is simple: You feed the sequence of words in an English sentence to the English encoder LSTM. The final hidden state of the encoder is the neural network's representation of the "thought" that the sentence expresses. You then make that thought be the initial state of the decoder LSTM for French. The decoder then outputs a probability distribution over French words that might start the sentence. If you pick from this distribution and make the word you picked be the next input to the decoder, it will then produce a probability distribution for the second word. You keep on picking words and feeding them back in until you pick a full stop.
The process I just described defines a probability distribution across all French strings of words that end in a full stop. The log probability of a French string is just the sum of the log probabilities of the individual picks. To raise the log probability of a particular translation you just have to backpropagate the derivatives of the log probabilities of the individual picks through the combination of encoder and decoder. The amazing thing is that when an encoder and decoder net are trained on a fairly big set of translated pairs (WMT'14), the quality of the translations beats the former state-of-the-art for systems trained with the same amount of data. This whole system took less than a person year to develop at Google (if you ignore the enormous infrastructure it makes use of). Yoshua Bengio's group separately developed a different system that works in a very similar way. Given what happened in 2009 when acoustic models that used deep neural nets matched the state-of-the-art acoustic models that used Gaussian mixtures, I think the writing is clearly on the wall for phrase-based translation.
With more data and more research I'm pretty confident that the encoder-decoder pairs will take over in the next few years. There will be one encoder for each language and one decoder for each language and they will be trained so that all pairings work. One nice aspect of this approach is that it should learn to represent thoughts in a language-independent way and it will be able to translate between pairs of foreign languages without having to go via English. Another nice aspect is that it can take advantage of multiple translations. If a Dutch sentence is translated into Turkish and Polish and 23 other languages, we can backpropagate through all 25 decoders to get gradients for the Dutch encoder. This is like 25-way stereo on the thought. If 25 encoders and one decoder would fit on a chip, maybe it could go in your ear :-)
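
The pick-a-word-and-feed-it-back loop that Hinton describes can be sketched without any neural net at all. In the toy below, a hand-written table of next-word probabilities stands in for the decoder LSTM's output (an assumption made purely for illustration: a real decoder conditions on the encoder's "thought" and on its own hidden state, not just on the previous word). The point is only the mechanics: sample, feed back, stop at the full stop, and sum the log probabilities of the picks.

```python
import math
import random

# Hypothetical next-word distributions, standing in for the decoder's softmax output.
NEXT_WORD = {
    "<start>": {"le": 0.6, "un": 0.4},
    "le": {"chat": 0.7, "chien": 0.3},
    "un": {"chat": 0.5, "chien": 0.5},
    "chat": {"dort": 0.6, ".": 0.4},
    "chien": {"dort": 0.5, ".": 0.5},
    "dort": {".": 1.0},
}

def sample_sentence(rng, max_len=20):
    """Pick words one at a time, feeding each pick back in, until a full stop is picked."""
    words, log_prob, prev = [], 0.0, "<start>"
    while len(words) < max_len:
        dist = NEXT_WORD[prev]
        choices = list(dist)
        word = rng.choices(choices, weights=[dist[c] for c in choices])[0]
        log_prob += math.log(dist[word])  # log probability of this individual pick
        words.append(word)
        if word == ".":
            break
        prev = word                       # the pick becomes the next input
    return words, log_prob

def sentence_log_prob(words):
    """Log probability of a given string: the sum of the log probabilities of its picks."""
    prev, total = "<start>", 0.0
    for word in words:
        total += math.log(NEXT_WORD[prev][word])
        prev = word
    return total

words, log_prob = sample_sentence(random.Random(0))
print(" ".join(words), log_prob)
```

Training, in Hinton's description, means raising `sentence_log_prob` for the reference translation by backpropagating through the decoder and encoder; the table here is fixed, so that part is left out.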

Edmund

PS. Another clip for my notebook

Thursday, May 9, 2019

Python 3.6 install for AutoKeras

Monday, April 22, 2019

New Keras in TensorFlow2.0

Sunday, April 7, 2019

U-Nets for segmentation

Saturday, April 6, 2019

Deep NLP: Cute diagrams for LSTMs and Embeddings

Thursday, April 4, 2019

Tutorial Nugget: Answering questions about videos with Keras

Wednesday, April 3, 2019

Keras - Google's intro tool for Deep Learning

FastAI and Keras install on Ubuntu 18.04 with Nvidia

Wednesday, February 10, 2016

Simple plotting with Python 2.7 in Jupyter

Sunday, March 15, 2015

Hessian Free Optimization and friends.

Friday, March 13, 2015

Geoffrey Hinton's AMA is a must-read.
