# TensorFlow
## What is TensorFlow?
<em>TensorFlow</em> is a powerful open-source software library for numerical computation, particularly optimized and fine-tuned for large-scale Machine Learning problems.

The basic principle of TensorFlow is as follows: first, we define in Python a graph of computations to be executed. TensorFlow then takes that graph and runs it efficiently using optimized C++ code.

To handle large-scale problems as well, the graph can be broken up into several chunks that are computed in parallel across multiple CPUs or GPUs. This makes it possible for TensorFlow to train and run a network with millions of parameters on a training set composed of billions of instances with millions of features each.

## Installation

Assuming you have installed Jupyter and Scikit-learn, you can simply use <em>pip install</em> to install TensorFlow. If you created an isolated environment using <em>virtualenv</em>, you first need to activate the environment in which you would like to install TensorFlow.

- cd \$your_env // path to your working directory
- source env/bin/activate

Next, install TensorFlow:
- pip3 install --upgrade tensorflow

Side note: if you would like to have GPU support, you need to install tensorflow-gpu instead of tensorflow. For our basic introduction, no GPU support is needed.

## Basic Arithmetic
First we execute some elementary TensorFlow computational graphs.

#### Load dependencies

In [2]:
import tensorflow as tf


#### Usage of tf.Variables
In the following cell, we define two variables and a function. The most important thing to understand is that the function <em>fnc</em> is not calculated by these three lines; they only create a computation graph. In fact, even the variables are not initialized yet.

In [3]:
x1 = tf.Variable(3, name="x1")
x2 = tf.Variable(6, name="x2")
fnc = x1*2*x2 + x2

### What is a session in TensorFlow?

To actually execute the computation, we need to open a TensorFlow $session$. Within it, we can initialize the variables and evaluate <em>fnc</em>. A TensorFlow $session$ places the operations onto computational devices such as CPUs and GPUs and runs them. In addition, it holds the variable values. In the following cell, we create a session, initialize the variables and evaluate the function <em>fnc</em>.

In [4]:
session = tf.Session()
session.run(x1.initializer)
session.run(x2.initializer)
result = session.run(fnc)
print (result)

42


In [5]:
session.close()

Closing the session at the end frees up any resources that were used in it.

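A handier way of working with a session, without having to repeat $session.run()$ all the time, is the following structure. Inside the <em>with</em> block the session is set as the default session, so run() and eval() can be called directly on the nodes. Notice that the session is also closed automatically at the end of the block.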

In [6]:
with tf.Session() as session:
    x1.initializer.run()
    x2.initializer.run()
    result=fnc.eval()
    print(result)

42


One further simplification for this kind of code is to use a global initializer that initializes all variables at once. For this we can use the global_variables_initializer() function. Again, this does not perform the initialization immediately, but rather creates a node in the computation graph that initializes all variables when it is run.

In [7]:
init = tf.global_variables_initializer()

with tf.Session() as session:
    init.run()
    result = fnc.eval()

#### Usage of tf.placeholders

If the values change during the computation, we need placeholder nodes instead of variables. These nodes are different in that they don't actually perform any computation; they just output the data you tell them to output at runtime. They are typically used to pass the training data to TensorFlow during training (e.g., mini-batches). If the value of a placeholder is not specified at runtime, TensorFlow throws an exception. The next cell shows how to create placeholders, passing the data type as the first argument.

In [8]:
y1 = tf.placeholder(tf.float32)
y2 = tf.placeholder(tf.float32)

Let's say we want to add and multiply the two values fed into the placeholders $y1$ and $y2$. For this, we use the two TensorFlow operations $tf.add(\cdot,\cdot)$ and $tf.multiply(\cdot,\cdot)$.

In [9]:
sum_op = tf.add(y1, y2)
product_op = tf.multiply(y1, y2)

Finally, we can again evaluate the two operations within a session. We use the feed_dict argument of $session.run(\cdot)$ to feed the data to the graph. Each key is a reference to a placeholder node, and the corresponding value is the data fed to that placeholder.

In [10]:
with tf.Session() as session:
    sum_result = session.run(sum_op, feed_dict={y1: 36.0, y2: 6.0})
    product_result = session.run(product_op, feed_dict={y1: 6.0, y2: 21.0})

In [11]:
print (sum_result)
print (product_result)

42.0
126.0


#### Basic array arithmetic using tf.placeholders

When we create a placeholder node, we can optionally also specify its shape. If no shape is given (None), the placeholder accepts data of any shape. The following cell shows that we can also feed arrays to our two placeholders $y1$ and $y2$.

In [12]:
with tf.Session() as session:
    sum_result = session.run(sum_op, feed_dict={y1: [6.0, 4.0, 2.0], y2: [3, 2, 1.0]})
    product_result = session.run(product_op, feed_dict={y1: [2.0, 4.0], y2: 0.5})

In [13]:
print (sum_result)
print (product_result)

[9. 6. 3.]
[1. 2.]
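
In the second call above, an array is multiplied by the scalar 0.5, so the scalar is applied to every element (broadcasting). The plain-Python sketch below reproduces both results for this simple case. It is only an illustration: `broadcast_binary` is a hypothetical helper, and it handles just flat lists and scalars, not the general broadcasting rules.

```python
import operator


def broadcast_binary(op, a, b):
    """Apply op element-wise; a scalar operand is reused for every element."""
    a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
    if a_is_list and b_is_list:
        if len(a) != len(b):
            raise ValueError("lists must have the same length")
        return [op(u, v) for u, v in zip(a, b)]
    if a_is_list:
        return [op(u, b) for u in a]
    if b_is_list:
        return [op(a, v) for v in b]
    return op(a, b)


# Same inputs as in the feed_dict arguments above.
sum_result = broadcast_binary(operator.add, [6.0, 4.0, 2.0], [3, 2, 1.0])
product_result = broadcast_binary(operator.mul, [2.0, 4.0], 0.5)

print(sum_result)      # [9.0, 6.0, 3.0]
print(product_result)  # [1.0, 2.0]
```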


## Exercise 4-3

In this exercise we will classify handwritten digits with convolutional neural networks using TensorFlow.
For this we use the public MNIST dataset.
In TensorFlow, this dataset can be downloaded with the following command:

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True) # replace "/tmp/data/" with a folder on your system

Other useful imports:

In [44]:
import numpy as np

In the first step we want to create placeholders for the input and the output variables. However, before we do so, let's check the input shape of the dataset first:

In [16]:
mnist.train.images[0].shape

(784,)

Let's define the x and y placeholders:

In [15]:
x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')

In the following cell we want to write a function that returns a CNN model with the following layers (you may use methods of the tf.nn module: https://www.tensorflow.org/api_docs/python/tf/nn):
* Convolutional Layer: kernel size 5x5, filters=8, stride=1, padding='SAME', activation=RELU
* Convolutional Layer: kernel size 3x3, filters=8, stride=1, padding='SAME', activation=RELU
* Fully Connected Layer: activation=RELU, output neurons=256
* Output Layer: activation=Softmax

In [60]:
def cnn_model(x, classes=10):
    # first reshape the input to a 2d image
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    
    w = tf.Variable(tf.random_normal([5,5,1,8])) # [filter_height, filter_width, in_channels, out_channels]
    conv1 = tf.nn.conv2d(x, w, strides=[1,1,1,1], padding='SAME')
    conv1 = tf.nn.relu(conv1)
    
    w = tf.Variable(tf.random_normal([3,3,8,8])) # [filter_height, filter_width, in_channels, out_channels]
    conv2 = tf.nn.conv2d(conv1, w, strides=[1,1,1,1], padding='SAME')
    conv2 = tf.nn.relu(conv2)
    
    # we need to flatten / reshape the output of the cnn
    w = tf.Variable(tf.random_normal([28*28*8,256]))
    bias = tf.Variable(tf.random_normal([256]))
    fc = tf.reshape(conv2, [-1, 8*28*28])
    fc = tf.matmul(fc, w)
    fc = fc + bias
    fc = tf.nn.relu(fc)
    
    w = tf.Variable(tf.random_normal([256, classes]))
    bias = tf.Variable(tf.random_normal([classes]))
    output = tf.matmul(fc, w) + bias
    # softmax activation will be done by softmax_cross_entropy_with_logits_v2
    
    return output

Write a function that optimizes the weights using the Adam optimizer (tf.train.AdamOptimizer) with respect to the cross entropy loss function (tf.nn.softmax_cross_entropy_with_logits_v2).

In [61]:
def train(x, y, model, epochs=10, batch_size=128):
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits_v2(logits=model, labels=y) )
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    
    with tf.Session() as sess:
        sess.run(tf.initializers.global_variables())
        
        for epoch in range(epochs):
            epoch_loss = 0
            for batch in range(mnist.train.num_examples//batch_size):
                x_train, y_train = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict={x:x_train, y:y_train})
                epoch_loss += c
            
            print("Epoch %d / %d completed. Loss: %.3f" % (epoch+1, epochs, epoch_loss))
        
        # compute accuracy:
        correct = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))

        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:',accuracy.eval({x:mnist.test.images, y:mnist.test.labels}))
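
To see what the cost and the accuracy in the function above measure, here is a minimal plain-Python sketch of softmax cross entropy on logits and of accuracy via argmax, for a handful of examples. It is only meant to illustrate the formulas, not to reproduce TensorFlow's implementation; the helper names and the example logits are made up for this illustration.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between softmax(logits) and a one-hot label, for one example."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_logits, batch_labels):
    """Fraction of examples whose predicted class equals the labelled class."""
    hits = [argmax(l) == argmax(t) for l, t in zip(batch_logits, batch_labels)]
    return sum(hits) / len(hits)


# Hypothetical logits for two examples with three classes each.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [[1, 0, 0], [0, 1, 0]]

losses = [softmax_cross_entropy(l, t) for l, t in zip(logits, labels)]
print("mean loss:", sum(losses) / len(losses))
print("accuracy:", accuracy(logits, labels))  # first example is right, second is wrong
```

Because the loss works on raw logits, the model functions return `output` without applying a softmax themselves.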

In [65]:
model = cnn_model(x)
%time train(x, y, model)

Epoch 1 / 10 completed. Loss: 156387.619
Epoch 2 / 10 completed. Loss: 31303.223
Epoch 3 / 10 completed. Loss: 17285.855
Epoch 4 / 10 completed. Loss: 10732.013
Epoch 5 / 10 completed. Loss: 7051.961
Epoch 6 / 10 completed. Loss: 4323.925
Epoch 7 / 10 completed. Loss: 2757.528
Epoch 8 / 10 completed. Loss: 1599.441
Epoch 9 / 10 completed. Loss: 939.234
Epoch 10 / 10 completed. Loss: 573.898
Accuracy: 0.9478
CPU times: user 16min 40s, sys: 2min 32s, total: 19min 13s
Wall time: 3min 58s
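
Note that the loss printed per epoch is the sum over all batches (`epoch_loss += c`), so its size depends on the number of batches. The sketch below converts the first and last printed values into a mean loss per batch. It assumes 55,000 training examples, which is not stated in this text; `mean_batch_loss` is a hypothetical helper for this illustration.

```python
def mean_batch_loss(epoch_loss, num_examples, batch_size=128):
    """Average per-batch loss, given the summed loss of one epoch."""
    num_batches = num_examples // batch_size  # same integer division as in train()
    return epoch_loss / num_batches


# ASSUMPTION: 55,000 training examples (check mnist.train.num_examples on your system).
NUM_TRAIN_EXAMPLES = 55000

first = mean_batch_loss(156387.619, NUM_TRAIN_EXAMPLES)
last = mean_batch_loss(573.898, NUM_TRAIN_EXAMPLES)

print("batches per epoch:", NUM_TRAIN_EXAMPLES // 128)
print("mean loss per batch, epoch 1: %.3f" % first)
print("mean loss per batch, epoch 10: %.3f" % last)
```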


Now we want to add a max-pooling layer after each convolutional layer.
Complete the function below such that it creates the following model:
* Convolutional Layer: kernel size 5x5, filters=8, stride=1, padding='SAME', activation=RELU
* Max-Pooling layer: kernel size: 2, stride=2, padding='SAME'
* Convolutional Layer: kernel size 3x3, filters=8, stride=1, padding='SAME', activation=RELU
* Max-Pooling layer: kernel size: 2, stride=2, padding='SAME'
* Fully Connected Layer: activation=RELU, output neurons=256
* Output Layer: activation=Softmax

In [66]:
def cnn_model_pooling(x, classes=10):
    # first reshape the input to a 2d image
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    
    w = tf.Variable(tf.random_normal([5,5,1,8])) # [filter_height, filter_width, in_channels, out_channels]
    conv1 = tf.nn.conv2d(x, w, strides=[1,1,1,1], padding='SAME')
    conv1 = tf.nn.relu(conv1)
    
    pool1 = tf.nn.max_pool(conv1, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
    
    w = tf.Variable(tf.random_normal([3,3,8,8])) # [filter_height, filter_width, in_channels, out_channels]
    conv2 = tf.nn.conv2d(pool1, w, strides=[1,1,1,1], padding='SAME')
    conv2 = tf.nn.relu(conv2)
    
    pool2 = tf.nn.max_pool(conv2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
    
    # we need to flatten / reshape the output of the cnn
    # due to the max-pooling operations the image size reduced to 28/2/2 = 7
    w = tf.Variable(tf.random_normal([7*7*8,256]))
    bias = tf.Variable(tf.random_normal([256]))
    fc = tf.reshape(pool2, [-1, 8*7*7])
    fc = tf.matmul(fc, w)
    fc = fc + bias
    fc = tf.nn.relu(fc)
    
    w = tf.Variable(tf.random_normal([256, classes]))
    bias = tf.Variable(tf.random_normal([classes]))
    output = tf.matmul(fc, w) + bias
    # softmax activation will be done by softmax_cross_entropy_with_logits_v2
    
    return output
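
The comment in the function above states that the image size shrinks to 28/2/2 = 7. With padding='SAME', the spatial output size is the input size divided by the stride, rounded up, independent of the kernel size. The following sketch traces the sizes through the four layers; `same_padding_output_size` is a hypothetical helper written for this illustration.

```python
import math


def same_padding_output_size(input_size, stride):
    """Spatial output size of a convolution or pooling layer with padding='SAME'."""
    return math.ceil(input_size / stride)


size = 28  # MNIST images are 28x28
for name, stride in [("conv1", 1), ("pool1", 2), ("conv2", 1), ("pool2", 2)]:
    size = same_padding_output_size(size, stride)
    print(name, "->", size, "x", size)

flat_dim = size * size * 8  # 8 feature maps after the second convolution
print("flattened size:", flat_dim)  # 392, matching the 7*7*8 used above
```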

In [67]:
model = cnn_model_pooling(x)
%time train(x, y, model)

Epoch 1 / 10 completed. Loss: 178734.735
Epoch 2 / 10 completed. Loss: 28778.201
Epoch 3 / 10 completed. Loss: 16936.463
Epoch 4 / 10 completed. Loss: 11975.147
Epoch 5 / 10 completed. Loss: 8957.668
Epoch 6 / 10 completed. Loss: 7046.698
Epoch 7 / 10 completed. Loss: 5783.312
Epoch 8 / 10 completed. Loss: 4742.917
Epoch 9 / 10 completed. Loss: 3976.335
Epoch 10 / 10 completed. Loss: 3299.564
Accuracy: 0.9547
CPU times: user 8min 20s, sys: 1min 7s, total: 9min 28s
Wall time: 2min 8s


#### What do you observe?

* far fewer parameters
* faster training
* better results with the same number of epochs
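
The first observation can be checked by counting the trainable values in the two models defined above. The sketch below does this by hand from the weight shapes used in the code (the convolutions there have no bias terms); `count_parameters` is a hypothetical helper for this illustration.

```python
def count_parameters(flat_dim, hidden=256, classes=10):
    """Number of trainable values in the two-convolution models above."""
    conv1 = 5 * 5 * 1 * 8             # 5x5 kernel, 1 input channel, 8 filters
    conv2 = 3 * 3 * 8 * 8             # 3x3 kernel, 8 input channels, 8 filters
    fc = flat_dim * hidden + hidden   # fully connected weights + bias
    out = hidden * classes + classes  # output weights + bias
    return conv1 + conv2 + fc + out


without_pooling = count_parameters(28 * 28 * 8)
with_pooling = count_parameters(7 * 7 * 8)

print(without_pooling)  # 1609234
print(with_pooling)     # 103954
```

Almost all parameters sit in the first fully connected layer, which is why shrinking the feature maps from 28x28 to 7x7 has such a large effect.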

#### Inception

Inception is a well-performing technique that concatenates, along the channel dimension, the outputs of three different kinds of convolutional filters (1x1, 3x3, 5x5) and of a 3x3 max pooling.

Implement a function that appends an inception module to a given input:

In [107]:
def inception(x, filters_per_conv=8):
    in_channels = int(x.shape[3])
    
    w = tf.Variable(tf.random_normal([1,1,in_channels,filters_per_conv]))
    conv1x1 = tf.nn.conv2d(x, w, strides=[1,1,1,1], padding='SAME')
    
    w = tf.Variable(tf.random_normal([3,3,in_channels,filters_per_conv]))
    conv3x3 = tf.nn.conv2d(x, w, strides=[1,1,1,1], padding='SAME')
    
    w = tf.Variable(tf.random_normal([5,5,in_channels,filters_per_conv]))
    conv5x5 = tf.nn.conv2d(x, w, strides=[1,1,1,1], padding='SAME')
    
    pool = tf.nn.max_pool(x, ksize=[1,3,3,1], strides=[1,1,1,1], padding='SAME')
    
    out = tf.concat([conv1x1, conv3x3, conv5x5, pool], 3)
    return out

In [111]:
def dim_reduction1x1conv(x, out_dim):
    # [filter_height, filter_width, in_channels, out_channels]
    w = tf.Variable(tf.random_normal([1,1,int(x.shape[3]),out_dim])) 
    conv = tf.nn.conv2d(x, w, strides=[1,1,1,1], padding='SAME')
    conv = tf.nn.relu(conv)
    return conv

In [112]:
def cnn_model_inception(x, classes=10):
    # first reshape the input to a 2d image
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    
    conv = tf.nn.relu(inception(x, 8))
    conv = dim_reduction1x1conv(conv, 8)
    conv = tf.nn.relu(inception(conv, 16))
    conv = dim_reduction1x1conv(conv, 16)
    
    # we need to flatten / reshape the output of the cnn
    flat_dim = int(np.prod(conv.shape[1:]))  # height * width * channels as a plain int
    w = tf.Variable(tf.random_normal([flat_dim, 512]))
    bias = tf.Variable(tf.random_normal([512]))
    fc = tf.reshape(conv, [-1, flat_dim])
    fc = tf.matmul(fc, w)
    fc = fc + bias
    fc = tf.nn.relu(fc)
    
    w = tf.Variable(tf.random_normal([512, classes]))
    bias = tf.Variable(tf.random_normal([classes]))
    output = tf.matmul(fc, w) + bias
    # softmax activation will be done by softmax_cross_entropy_with_logits_v2
    
    return output
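
Since the inception module concatenates its four branches along the channel axis, its output has three times `filters_per_conv` channels plus the input channels passed through by the pooling branch. The sketch below traces the channel counts through the model above; all strides are 1 with padding='SAME', so the spatial size stays 28x28. `inception_out_channels` is a hypothetical helper for this illustration.

```python
def inception_out_channels(in_channels, filters_per_conv):
    """Channels after concatenating the 1x1, 3x3, 5x5 convolutions and the max pooling."""
    # the pooling branch keeps the number of input channels unchanged
    return 3 * filters_per_conv + in_channels


after_inception1 = inception_out_channels(1, 8)   # grayscale input, 8 filters per branch
after_reduction1 = 8                              # dim_reduction1x1conv(conv, 8)
after_inception2 = inception_out_channels(after_reduction1, 16)
after_reduction2 = 16                             # dim_reduction1x1conv(conv, 16)

flat_dim = 28 * 28 * after_reduction2             # input size of the fully connected layer
fc_weights = flat_dim * 512

print(after_inception1)  # 25
print(after_inception2)  # 56
print(flat_dim)          # 12544
print(fc_weights)        # 6422528
```

The 1x1 convolutions keep the channel count small between the modules, but without pooling the fully connected layer still receives 12544 inputs.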

In [113]:
model = cnn_model_inception(x)
%time train(x, y, model)

Epoch 1 / 10 completed. Loss: 9059833.742
Epoch 2 / 10 completed. Loss: 1493355.659
Epoch 3 / 10 completed. Loss: 739177.497
Epoch 4 / 10 completed. Loss: 410815.617
Epoch 5 / 10 completed. Loss: 257497.780
Epoch 6 / 10 completed. Loss: 148161.487
Epoch 7 / 10 completed. Loss: 91139.134
Epoch 8 / 10 completed. Loss: 61463.640
Epoch 9 / 10 completed. Loss: 48070.501
Epoch 10 / 10 completed. Loss: 45572.962
Accuracy: 0.9643
CPU times: user 1h 18min 3s, sys: 13min 6s, total: 1h 31min 9s
Wall time: 16min 33s
