TensorFlow ¶

Introducing TensorFlow ¶

TensorFlow Website: https://www.tensorflow.org/

Installation Notes ¶

Python 3 is recommended. When working interactively, use the Python 3 version of IPython (IPython3) so that the examples run correctly.

Possible Imports ¶

numpy: Used for scientific computing
math: Contains math functions
matplotlib: Allows for plotting and animating data

Training a Model ¶

Requires:

Prepared data
Inference: Function that makes predictions
Loss Measurement: Way to measure quality of predictions made
Optimizer to Minimize Loss

House Example Implementation:

Generated house size and price data (70% to train, 30% to test)
Price = (sizeFactor * size) + priceOffset
Mean Square Error
Gradient Descent Optimizer
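
The four ingredients above can be sketched in plain Python without TensorFlow. This is only an illustration: the size factor (100.0), price offset (20000.0), noise range, sample count, and learning rate are made-up values, not taken from the original example. Sizes and prices are standardized before training so that one learning rate suits both learned values.

```python
import random

random.seed(42)

# Hypothetical "true" relationship, chosen only for this illustration
SIZE_FACTOR = 100.0
PRICE_OFFSET = 20000.0

# Generate house sizes and noisy prices: price = (sizeFactor * size) + priceOffset
sizes = [random.uniform(1000, 3500) for _ in range(160)]
prices = [SIZE_FACTOR * s + PRICE_OFFSET + random.uniform(-5000, 5000) for s in sizes]

# 70% of the samples train the model, the remaining 30% test it
num_train = (len(sizes) * 7) // 10
train_sizes, test_sizes = sizes[:num_train], sizes[num_train:]
train_prices, test_prices = prices[:num_train], prices[num_train:]


def mean_and_std(values):
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return mean, std


def standardize(values, mean, std):
    """Rescale values using the given mean and standard deviation."""
    return [(v - mean) / std for v in values]


# Statistics come from the training data only and are reused for the test data
size_mean, size_std = mean_and_std(train_sizes)
price_mean, price_std = mean_and_std(train_prices)
x_train = standardize(train_sizes, size_mean, size_std)
y_train = standardize(train_prices, price_mean, price_std)
x_test = standardize(test_sizes, size_mean, size_std)
y_test = standardize(test_prices, price_mean, price_std)


def mean_square_error(factor, offset, xs, ys):
    """Loss measurement: average squared difference between prediction and target."""
    return sum((factor * x + offset - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Gradient descent on the two learned values
factor, offset = 0.0, 0.0
learning_rate = 0.1
initial_loss = mean_square_error(factor, offset, x_train, y_train)
for _ in range(200):
    errors = [factor * x + offset - y for x, y in zip(x_train, y_train)]
    factor -= learning_rate * 2 * sum(e * x for e, x in zip(errors, x_train)) / num_train
    offset -= learning_rate * 2 * sum(errors) / num_train

train_loss = mean_square_error(factor, offset, x_train, y_train)
test_loss = mean_square_error(factor, offset, x_test, y_test)
print("initial {0:.4f}, train {1:.4f}, test {2:.4f}".format(initial_loss, train_loss, test_loss))
```

The test loss is computed on data the optimizer never saw, which is the point of holding back 30% of the samples.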

Tensor Types ¶

Constant: constant value
Variable: values adjusted in graph
Placeholder: used to pass data into the graph

Tensor Properties ¶

Tensor: An n-dimensional array or list used in TensorFlow to represent all data

Rank

Dimensionality of a tensor (the number of dimensions it has)

Shape

Size of the tensor along each dimension. Related to rank: the shape has one entry per dimension.

Type

The data type contained in the tensor: float32, int8, string, bool, qint8, etc.

Examples:

Rank 0: Scalar, Ex: 145, Shape: []
Rank 1: Vector, Ex: v = [1, 3, 2, 5, 7], Shape: [5]
Rank 2: Matrix, Ex: m = [ [1,5,6], [5,3,4] ], Shape: [2,3]
Rank 3: 3-Tensor (cube)
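
The relationship between rank and shape in these examples can be checked with a small plain-Python sketch. The helper names `shape_of` and `rank_of` are hypothetical and are not TensorFlow functions; the sketch assumes regularly nested lists (every sub-list at a given depth has the same length).

```python
def shape_of(value):
    """Return the shape of a regularly nested list as a list of dimension sizes."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def rank_of(value):
    """The rank is the number of entries in the shape."""
    return len(shape_of(value))


scalar = 145
v = [1, 3, 2, 5, 7]
m = [[1, 5, 6], [5, 3, 4]]
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

for example in (scalar, v, m, cube):
    print(rank_of(example), shape_of(example))
```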

Quantized Values: Scaled values to reduce size and processing time

Methods ¶

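get_shape(): returns the shape (called on the tensor: tensor.get_shape())
reshape(): changes the shape (tf.reshape(tensor, shape))
rank(): returns the rank (tf.rank(tensor))
dtype: returns the data type (a property of the tensor, tensor.dtype, not a method)
cast(): changes the data type (tf.cast(tensor, dtype))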

Gradient Descent ¶

Gradient descent is a popular family of methods for adjusting values to reduce error. Each step moves the values in the direction that reduces the loss the most.

“Trying to find the fastest way down a hill.”

Note

If the learning rate is too high, the process will “bounce” around the minimum and never settle on the lowest loss.
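
The effect of the learning rate can be seen on a toy problem. The sketch below minimizes the made-up loss (w - 3)**2, whose lowest point is at w = 3; the loss function and both learning rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize loss(w) = (w - 3)**2 and return the loss recorded after each step."""
    w = start
    losses = []
    for _ in range(steps):
        gradient = 2 * (w - 3)          # derivative of the loss with respect to w
        w -= learning_rate * gradient   # step against the gradient
        losses.append((w - 3) ** 2)
    return losses


small_rate_losses = gradient_descent(0.1)   # each step moves closer to w = 3
large_rate_losses = gradient_descent(1.1)   # each step overshoots w = 3 by more than the last

print("learning rate 0.1, final loss:", small_rate_losses[-1])
print("learning rate 1.1, final loss:", large_rate_losses[-1])
```

With the larger rate every update jumps past the minimum to the other side, so the loss grows instead of shrinking.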

Creating Neural Networks in TensorFlow ¶

Intro to Neural Networks ¶

Inputs: Contains values from the data, normally numbers
Weights: Values multiplied by each input that are learned as the model is trained
Bias: Allows for adjustment of the contribution of a specific neuron
Sum: sum(Inputs * Weights) + Bias
Activation: Processes the sum
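
A single neuron built from these parts might look like the following sketch. The sigmoid activation and the input, weight, and bias values are assumptions picked for illustration; the notes do not name a specific activation at this point.

```python
import math


def sigmoid(value):
    """One possible activation: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def neuron(inputs, weights, bias, activation=sigmoid):
    # Sum: sum(Inputs * Weights) + Bias
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    # Activation: processes the sum
    return activation(weighted_sum)


output = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2)
print(output)
```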

Forward Propagation

Neuron sending forward its computed value

Back Propagation

Compute Loss
Optimize to minimize loss

Linear Regression Example¶

# Weights: size_factor, Bias: price_offset
tf_price_pred = tf.add(tf.multiply(tf_size_factor, tf_house_size), tf_price_offset)

# Compute the loss (Mean Square Error, halved by the factor of 2 in the denominator)
tf_cost = tf.reduce_sum(tf.pow(tf_price_pred-tf_price, 2))/(2*num_train_samples)

# Adjusts the values to reduce the loss
learning_rate = 0.1
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(tf_cost)

Simple Neural Network ¶

Creating a simple neural network that identifies digits 0-9 from handwritten digits found in the MNIST data set.

MNIST ¶

http://yann.lecun.com/exdb/mnist
70,000 data points
- 55,000 training
- 10,000 test
- 5,000 validation
28x28 grayscale image
Label: 0-9

Implementation ¶

1. Prepared Data: MNIST Data

Pull down the data from the MNIST site¶

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Initialize the placeholder for each image¶

# x is placeholder for the 28 X 28 image data
x = tf.placeholder(tf.float32, shape=[None, 784])

# The first value is the data type
# `None` in shape indicates we know that it exists, but we don't know how many items will be in this dimension (# of pictures)
# `784` in shape is for the 28x28 pixels - Each a float value

Initialize placeholder for the actual label of each image¶

# y_ is called "y bar" and is a 10 element vector, containing the actual probability of each
#   digit(0-9) class.  With one_hot=True the correct digit is 1 and the rest are 0,
#   such as [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_ = tf.placeholder(tf.float32, [None, 10])

# `None` once again represents the unknown # of pictures

Initialize the weights and biases to zero¶

# define weights and biases
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

b doesn’t need the additional batch dimension: broadcasting adds the same 10 biases to every row of the x-times-W product.
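
What broadcasting does with the bias can be imitated in plain Python. The helpers `matmul` and `add_bias` are hypothetical stand-ins written for this sketch, and the tiny 3-sample, 2-feature, 3-class sizes are assumptions used in place of the real [None, 784] and [784, 10] shapes.

```python
def matmul(a, b):
    """Multiply a [rows, inner] list-of-lists by an [inner, cols] list-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_bias(rows, bias):
    """Add the same bias vector to every row, which is what broadcasting does."""
    return [[value + b for value, b in zip(row, bias)] for row in rows]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 samples, 2 features each
W = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]    # 2 features -> 3 classes
b = [0.1, 0.2, 0.3]                        # one bias per class, no batch dimension

result = add_bias(matmul(x, W), b)          # shape [3, 3]: one row per sample
print(result)
```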

2. Inference: sum(x * weight) + bias -> activation

Define the model¶

# define our inference model
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Order matters for the matrix multiplication since it determines the shape
# SoftMax is the activation function
# Resulting Tensor has a shape=[None, 10]

SoftMax

An activation function that is typically used in the output layer of a classifier. It squashes each value in the tensor into [0,1] and scales the values so that they sum to 1, which lets them be read as class probabilities.

Logit

The vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function.

Cross-Entropy

A loss function that measures the performance of a classification model whose output is a probability value between 0 and 1.
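
The three terms fit together as logits -> softmax -> cross-entropy. The sketch below works through that chain in plain Python for one sample; the logit values and the three-class label are made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that lie in (0, 1) and sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, one_hot_label):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities) if t > 0)


logits = [2.0, 1.0, 0.1]   # raw, non-normalized predictions
label = [1, 0, 0]          # the true class is class 0

probs = softmax(logits)
loss = cross_entropy(probs, label)
confident_loss = cross_entropy(softmax([5.0, 1.0, 0.1]), label)

print(probs, loss, confident_loss)
```

The more probability the model assigns to the correct class, the smaller the loss becomes.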

3. Loss Measurement: Cross Entropy

Compare predicted digit y with actual digit y_ then return the reduced mean¶

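# loss is cross entropy
# softmax_cross_entropy_with_logits applies softmax internally, so it is given the raw
#   (pre-softmax) scores rather than y, which has already been through softmax
cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=tf.matmul(x, W) + b))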

# Returns the mean of all the losses between the comparisons

4. Optimize to Minimize Loss: Gradient Descent Optimizer

Optimize

Modify the weights and bias to improve the predictive accuracy of the model.

Initialize the training step¶

# each training step in gradient descent we want to minimize cross entropy
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# `0.5` is the learning rate

Conduct Training ¶

1. Create Session and initialize global variables

# initialize the global variables
init = tf.global_variables_initializer()

# create a session that can span multiple code blocks.  Don't
# forget to explicitly close the session with sess.close()
sess = tf.Session()

# perform the initialization which is only the initialization of all global variables
sess.run(init)

2. Training steps

# Perform 1000 training steps
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)    # get 100 random data points from the data. batch_xs = image,
                                                        # batch_ys = digit(0-9) class
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) # do the optimization with this data

3. Evaluate the model

# Evaluate how well the model did. Do this by comparing the digit with the highest probability in
#    predicted (y) and actual (y_).
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
test_accuracy = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
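
The accuracy calculation can be traced by hand on a tiny example. This plain-Python sketch mirrors the argmax, equal, and reduce_mean steps; the three predictions and labels are invented, and `argmax` and `accuracy` here are hypothetical helpers rather than TensorFlow calls.

```python
def argmax(values):
    """Index of the largest value, i.e. the most probable class."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predicted, actual):
    """Fraction of samples whose most probable predicted class matches the label."""
    correct = [argmax(p) == argmax(a) for p, a in zip(predicted, actual)]
    return sum(correct) / len(correct)


predicted = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]  # model output per sample
actual = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]                       # one-hot labels

score = accuracy(predicted, actual)
print(score)
```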

Deep Neural Network ¶

With the simple neural network above, each image is flattened into a one-dimensional vector of 784 values. This discards information about where each pixel sits relative to its neighbors.

Convolution Layer

An added layer that looks at groups of pixels at a time

Pool Layer

Reduces the input to a smaller output

Fully Connected Layer

A layer consisting of neurons with all connections between its input and output

Over Fitting

A situation that occurs when the model fits the training data so closely that it doesn’t perform well on data it has not seen. One way to reduce it is dropout: randomly setting the outputs of some neurons in the fully connected layer to 0 during training.
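
A rough plain-Python sketch of dropout follows. It zeroes each value with probability 1 - keep_prob and scales the survivors by 1 / keep_prob so the expected total stays the same, which is how tf.nn.dropout behaves. The `dropout` helper, the seed, and the 10,000 sample values are assumptions for illustration.

```python
import random


def dropout(activations, keep_prob, rng):
    """Keep each activation with probability keep_prob (rescaled), otherwise output 0."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
activations = [1.0] * 10000

dropped = dropout(activations, keep_prob=0.5, rng=rng)
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

print("kept fraction:", kept_fraction)
print("mean after dropout:", mean_after)
```

Setting keep_prob to 1.0 leaves every value untouched, which is the setting used when evaluating the model.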

Implementation ¶

1. Prepared Data: MNIST Data and reshaped as required

As in the simple neural network, define placeholders¶

# Create input object which reads data from MNIST datasets.  Perform one-hot encoding to define the digit
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Using an interactive session makes it the default session, so we do not need to pass sess
sess = tf.InteractiveSession()

# Define placeholders for MNIST input data
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Note: now using an interactive session so that `sess` doesn't need to be passed to each run or eval call.

Convert linear data to a usable value cube for the convolution layer¶

# change the MNIST input data from a list of values to a 28 pixel X 28 pixel X 1 grayscale value cube
#    which the Convolution NN can use.
x_image = tf.reshape(x, [-1,28,28,1], name="x_image")

# `-1` tells reshape to infer that dimension (the number of images) from the total size and the other dimensions.
# [batch, in_height, in_width, in_channels]
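
How a `-1` entry gets resolved can be sketched in a few lines of plain Python. `resolve_shape` is a hypothetical helper written for this note, not part of TensorFlow, and it assumes all the other dimensions are positive.

```python
def resolve_shape(total_elements, shape):
    """Replace a single -1 in shape with the size implied by the element count."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if total_elements % known != 0:
        raise ValueError("element count does not fit the requested shape")
    return [total_elements // known if dim == -1 else dim for dim in shape]


# A batch of 50 flattened images holds 50 * 784 values
print(resolve_shape(50 * 784, [-1, 28, 28, 1]))
```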

Define helper functions for weight and bias initialization¶

# Define helper functions to create weight and bias variables, and convolution and pooling layers
#   We are using RELU as our activation function.  The weights are initialized with a little noise
#   and the biases to a small positive number so that neurons do not start out stuck at zero
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# Weight shape: [filter_height, filter_width, in_channels, out_channels]
#   channel is also referred to as features

Define helper functions for Convolution and Pooling¶

#   Convolution and Pooling - we do Convolution, and then pooling to control overfitting
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

# The stride parameter is how far and in which direction we shift as we compute new feature values
# ksize is the kernel size, which is the area we are pooling together
#    [batch, in_height, in_width, in_channels] - maps to the input tensor
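
To see what a 2x2 pool with a stride of 2 does to the data, here is a plain-Python sketch over a single-channel 4x4 grid. `max_pool_2x2_lists` is a hypothetical stand-in for the TensorFlow helper above; it assumes an even height and width, and the grid values are invented.

```python
def max_pool_2x2_lists(grid):
    """2x2 max pooling with stride 2 over a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(grid), 2):        # move down two rows at a time
        row = []
        for c in range(0, len(grid[0]), 2): # move right two columns at a time
            row.append(max(grid[r][c], grid[r][c + 1],
                           grid[r + 1][c], grid[r + 1][c + 1]))
        pooled.append(row)
    return pooled


image = [[1, 3, 2, 4],
         [5, 6, 1, 2],
         [7, 2, 9, 0],
         [1, 8, 3, 4]]

pooled = max_pool_2x2_lists(image)
print(pooled)
```

Each 2x2 block is replaced by its largest value, so the height and width are halved.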

2. Inference: Matmul(x, Weight) + bias for entire NN

Define first Convolution/Pool layer¶

# 1st Convolution layer
# 32 features for each 5X5 patch of the image
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# Do convolution on images, add bias and push through RELU activation
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
# take results and run through max_pool
h_pool1 = max_pool_2x2(h_conv1)

# Note, the result will be a 14x14 image

ReLU Activation Function: Computes max(0, x). Negative values become zero and positive values pass through unchanged, so the output is not bounded above.
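
A one-line plain-Python version of ReLU, applied to a few made-up values:

```python
def relu(value):
    """Rectified linear unit: negative inputs become 0, others pass through."""
    return max(0.0, value)


outputs = [relu(v) for v in [-2.0, -0.5, 0.0, 0.5, 3.0]]
print(outputs)
```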

Define second Convolution/Pool layer¶

# 2nd Convolution layer
# Process the 32 features from Convolution layer 1 in 5 X 5 patches.  Return 64 features
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
# Do convolution of the output of the 1st convolution layer.  Pool results
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Note, the result will be a 7x7 image
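
The 14x14 and 7x7 sizes follow from the strides. With padding='SAME', the output height and width are the input size divided by the stride, rounded up. A small sketch of that arithmetic (the helper name `same_padding_output` is hypothetical):

```python
import math


def same_padding_output(size, stride):
    """Spatial output size for padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)


size = 28
after_conv1 = same_padding_output(size, 1)         # convolution, stride 1: unchanged
after_pool1 = same_padding_output(after_conv1, 2)  # 2x2 pool, stride 2: halved
after_conv2 = same_padding_output(after_pool1, 1)
after_pool2 = same_padding_output(after_conv2, 2)

# Number of values per image once the 64 feature maps are flattened
flattened = after_pool2 * after_pool2 * 64

print(after_pool1, after_pool2, flattened)
```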

Define the Fully Connected layer¶

# Fully Connected Layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

# Weight Shape: [Input Size * Features/Channels, FC Neurons]

#   Connect output of pooling layer 2 as input to fully connected layer
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# dropout some neurons to reduce overfitting
keep_prob = tf.placeholder(tf.float32)  # probability of keeping each neuron's output, fed in at run time
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

Define Readout layer to convert all 1024 channels to the 10 digit output¶

# Readout layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

# Define model
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

3. Loss Measurement: Cross Entropy

Compare predicted digit y_conv with actual digit y_ then return the reduced mean¶

# Loss measurement
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))

# Returns the mean of all the losses between the comparisons

4. Optimize to Minimize Loss: Adam Optimizer

Optimize

Modify the weights and bias to improve the predictive accuracy of the model.

Initialize the training step¶

# loss optimization
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Note: Using Adam instead of Gradient Descent

Conduct Training ¶

1. Define accuracy, initialize global variables, and start the timer

# Which predictions are correct?
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
# How accurate is it?
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Initialize all of the variables
sess.run(tf.global_variables_initializer())

# Train the model
import time

#  define number of steps and how often we display progress
num_steps = 3000
display_every = 100

# Start timer
start_time = time.time()
end_time = time.time()

2. Training steps

for i in range(num_steps):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    # Periodic status display
    if i%display_every == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        end_time = time.time()
        print("step {0}, elapsed time {1:.2f} seconds, training accuracy {2:.3f}%".format(i, end_time-start_time, train_accuracy*100.0))

# Note: keep_prob is fed as 0.5 during training to randomly drop neurons and prevent over-fitting to the training data, and as 1.0 when measuring accuracy so that no neurons are dropped.

3. Evaluate the model

# Display summary
#     Time to train
end_time = time.time()
print("Total training time for {0} batches: {1:.2f} seconds".format(i+1, end_time-start_time))

#     Accuracy on test data
print("Test accuracy {0:.3f}%".format(accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})*100.0))

sess.close()

Debugging and Monitoring ¶

Names ¶

Many elements have a name property that can be used to identify them within the graph

Implement by passing name argument into constructor¶

x = tf.placeholder(tf.float32, shape=[None, 784], name="images")

Name Scopes ¶

Provides a way of grouping elements

Implement by including a with statement before instantiating tensors¶

# Assumes weight_variable has been extended to accept a name argument
with tf.name_scope('Conv1'):
    W_conv1 = weight_variable([5, 5, 1, 32], name="weight")

TensorBoard ¶

Used for visualizing learning, viewing the computation graph, and monitoring performance

Adding Support for TensorBoard

Define log file location
Define names and name scopes
Add Summary methods
Train the model
Run TensorBoard

Items Needed

Path to location where logs will be stored
Names and Name scopes for all desired elements
Code to write the data for visualization

# TB - Write the default graph out so we can view its structure
tbWriter = tf.summary.FileWriter(logPath, sess.graph)

To view the tensor board, execute the following command¶

tensorboard --logdir <path to logs>

Types of Data Collections

Raw Values

tf.summary.scalar("training_accuracy", accuracy)

Summary Statistics

Collecting variable summaries¶

#   Adds summary statistics for use in TensorBoard visualization.
#      From https://www.tensorflow.org/get_started/summaries_and_tensorboard
def variable_summaries(var):
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)
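
The scalar statistics recorded by variable_summaries are the ordinary mean, standard deviation, maximum, and minimum. A plain-Python sketch of the same calculations on a made-up list of values (the `summarize` helper is hypothetical):

```python
import math


def summarize(values):
    """Compute the same scalar statistics that variable_summaries records."""
    mean = sum(values) / len(values)
    # Same formula as above: square root of the mean squared deviation from the mean
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {"mean": mean, "stddev": stddev, "max": max(values), "min": min(values)}


stats = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(stats)
```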

Histograms

tf.summary.histogram('conv_wx_b', conv1_wx_b)

Sample Images

Image must be a tensor of the correct dimensions (such as after reshape)¶

tf.summary.image('input_img', x_image, 5)

Once all of the summaries have been specified in the code, they must all be merged¶

summarize_all = tf.summary.merge_all()

# In the session run, periodically write the summary to the log
_, summary = sess.run([train_step, summarize_all], feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

tbWriter.add_summary(summary, <training step>)