Your First Neural Network in Google’s TensorFlow


Cars learning to drive by themselves, mobile phones outperforming professional translators, and home devices that can order Tide PODs by voice command: welcome to 2018. What do all these technologies have in common, besides the fact that they're awesome?

Neural networks (NN), machine learning algorithms modeled after your brain and nervous system.

In this article, you will learn how to make your first neural network using Google's TensorFlow API in Python. To fully grasp it, some theoretical background on neural networks and basic Python programming skills are required.

One of the nicest things about neural networks is that they're relatively universal. Whether you want to build an app that recognizes kittens, a Raspberry Pi robot that can navigate your living room, or a scanner that understands handwritten shopping lists, neural networks should be able to help.

An artificial neural network is simply a computing system that learns by seeing. In machine learning (ML) terms, that means it is a supervised learning algorithm. Without explicitly programming it, the system learns to identify data objects (rows in a table, pictures, videos) by looking at examples that have been labeled, mostly by humans.

Here’s an analogy. The more examples you give a child of what a power socket is, the better the child will be able to distinguish between a light switch and a power socket. In this analogy, your computer is the child and the socket is the data object. Your warning of “Don’t touch that socket” is the label, and the way the child processes that information is the neural network.

(Image: https://github.com/RoelPi/fix_sw_tensorflow/blob/master/image_1.jpg)
The first neural network was created in the 1940s, and decade after decade, the algorithm has been refined. Although artificial neural networks are nothing new, increasing processing power and storage capacity have unleashed their full potential. Over the past couple of years, they have penetrated many fields of academia, business, healthcare and politics. Neural networks have become so widespread and accessible through books, blog posts and massive open online courses that there’s barely an excuse not to study them. Even better: Google has built a library that you can easily load into Python.

Neural networks can be used for both regression jobs, where you are predicting a continuous variable, and classification jobs, where you are trying to put objects in the right category. In our example, we will stick to classification. Using the UCI Credit Approval dataset, we will try to predict which credit card application gets accepted and which one doesn’t.
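If you don’t already have the data on disk, the crx.data file can be downloaded from the UCI Machine Learning Repository. Here is a minimal sketch; the URL assumes the repository’s usual download path for the Credit Approval dataset, so adjust it if the layout has changed.

import urllib.request

# Assumed UCI download path for the Credit Approval data; verify on the dataset's UCI page if it fails.
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data'
urllib.request.urlretrieve(url, 'crx.data')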

In our example, we’ll rely on the following packages: Pandas, just for reading in the dataset CSV; NumPy and scikit-learn, for basic transformations of the dataset; and, of course, TensorFlow for building the artificial neural network.

Our code is split into three parts: data preparation, graph construction, and running the model.

Data preparation

In the first part of our code, we simply load the dataset and remove all rows that contain missing data. The popular Pandas package makes this a lot easier. We split the dataset into two data objects: one contains all the features, and the other contains the values we’ll try to predict, the target. We one-hot encode the target, because that’s what TensorFlow (and many other machine learning algorithms in Python) requires. In other words, we make dummy variables out of the target’s categorical values. Using Pandas’ get_dummies function, we also dummify the categorical values in our feature data object.
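To make the dummy encoding concrete, here is a tiny, stand-alone illustration (toy data, not part of the credit dataset) of what pd.get_dummies does to a categorical column:

import pandas as pd

# A toy categorical column with two levels.
toy = pd.DataFrame({'color': ['red', 'blue', 'red']})

# get_dummies turns each level into its own indicator column, here 'color_blue' and
# 'color_red' (printed as 0/1 or True/False depending on your pandas version).
print(pd.get_dummies(toy))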

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
import tensorflow as tf

########################################################################
## Data Preparation ####################################################
########################################################################

# Read the data set
sc_data = pd.read_csv('crx.data', header=None, decimal=".", na_values='?')

# Drop all rows that have NA's.
sc_data = sc_data.dropna(axis=0,how='any')

# The last column of the data set is the one we want to predict
sc_target = sc_data.iloc[:,-1]

# Replace the '+' and '-' labels with 1 and 0 integers
sc_target = sc_target.replace('+',1)
sc_target = sc_target.replace('-',0)

# You have to OneHotEncode it for Tensorflow to function properly
sc_target = OneHotEncoder(sparse=False).fit_transform(np.reshape(sc_target, [-1, 1]))

# Remove the last column (the target) of the data set and convert it to a numpy matrix
sc_data = sc_data.iloc[:,0:15]
sc_data = pd.get_dummies(sc_data)
sc_data = sc_data.values

# Split the data set into training and testing sets
data_train, data_test, label_train, label_test = train_test_split(sc_data, sc_target)
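
As a quick, optional sanity check, the feature matrices and their one-hot encoded targets should have matching numbers of rows after the split, and the targets should have two columns (accepted and rejected):

# Optional sanity check on the shapes of the four resulting arrays.
print(data_train.shape, label_train.shape)
print(data_test.shape, label_test.shape)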

Construction of the neural network graph

One of the defining features of TensorFlow is the separation of the graph from the session. In the graph, we configure the computations that should take place; in the session, we run our labeled data through that graph. Although this can be very confusing if you are new to TensorFlow, it’s also what makes it blazingly fast.
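If the graph/session split is new to you, this minimal, self-contained sketch (unrelated to the credit model) may help: defining operations only adds nodes to the graph; nothing is computed until a session runs them.

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b                       # just a node in the graph; no value has been computed yet

with tf.Session() as demo_sess:
    print(demo_sess.run(c))     # only now is the multiplication executed, printing 6.0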

In our code, we set all our hyperparameters explicitly. First, the learning rate is set to 0.0005, and we add three hidden layers to our network. The number of nodes in the hidden layers is, respectively, 30, 120 and, again, 30. (Why? Just because we can.) We’ll use the sigmoid activation function and a simple mean squared error as the loss (or cost) function because they are easy to interpret and appear in many fields of science, which makes them easy to relate to.

We set placeholders for the training data and target. Basically, we are telling TensorFlow that we will put data in these placeholders once we run the session. Your real training data and target will find their way into the model through these placeholders.

We count our features, because we need a node for each feature in our input layer. We also count the possible output values, as this determines the number of nodes in the output layer. Now that we know all the nodes in the input, hidden, and output layers, we add a weight to every link and a bias to every node. The initial values of the weights and biases are drawn from a normal distribution.

We tell every layer how to calculate the output of every node. This is done through matrix multiplication operations. We also instruct the nodes of each layer to run their output through the sigmoid function.

Finally, we tell the graph that its loss function is the mean squared error, that optimization should be done using gradient descent, and how to calculate the accuracy of the model.

 
########################################################################
## Graph Construction ##################################################
########################################################################

# Model hyperparameters
g_learning_rate = 0.0005
g_hidden_layer_sizes = [30,120,30]

# We set placeholders for our training data and target.
g_data = tf.placeholder(tf.float32, [None, np.shape(sc_data)[1]])
g_label = tf.placeholder(tf.float32, [None, np.shape(sc_target)[1]])

# The number of features determines the number of nodes in the input layer.
# The number of possible values in your target determines the number of nodes in your output layer.
g_num_features = int(g_data.get_shape()[1])
g_num_classes = int(g_label.get_shape()[1])

# Once we know the number of features and target values, we can construct the neural network.
# It consists of the input layer, the three hidden layers and the output layer.
g_layer_sizes = []
g_layer_sizes.append(g_num_features)
g_layer_sizes.extend(g_hidden_layer_sizes)
g_layer_sizes.append(g_num_classes)

# The central features of neural networks are weights and biases.
# We give each link a weight and each node a bias.
g_weights = []
g_biases = []

for i, layer_size in enumerate(g_layer_sizes[:-1]):
	g_weights.append(tf.Variable(tf.random_normal([layer_size, g_layer_sizes[(i+1)]])))
	g_biases.append(tf.Variable(tf.random_normal([g_layer_sizes[(i+1)]])))

# We tell our network to calculate the output of every node.
# And run the output through a sigmoid function.
latest_layer = tf.nn.sigmoid(tf.add(tf.matmul(g_data, g_weights[0]), g_biases[0]))
for bias, weight in zip(g_biases[1:], g_weights[1:]):
	layer_output = tf.add(tf.matmul(latest_layer,weight),bias)
	latest_layer = tf.nn.sigmoid(layer_output)
g_prediction = latest_layer


# Loss: mean squared error between the one-hot labels and the network's predictions.
g_loss_function = tf.losses.mean_squared_error(labels=g_label, predictions=g_prediction)
g_training = tf.train.GradientDescentOptimizer(g_learning_rate).minimize(g_loss_function)
g_accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(g_label, 1), tf.argmax(g_prediction, 1)), tf.float32))

Running the model

If you’ve made it this far, the hardest part is over. We next set the batch size to 5, which means the model will update the weights and biases after every sample of five rows from your training dataset. (You can lower this, but it will make running the model slower.) We also set the number of epochs to 1,000, which means we will run the full dataset a thousand times through the network.

In the previous section, I said we need to set placeholders because we want to put data in them when we actually run the model. We do that every time we use the run() method in the session. For every batch, we tell the session to calculate g_training, using the data from that batch. g_training is the gradient descent step, which tries to minimize g_loss_function, which in turn needs g_prediction, which relies on the output of the many sigmoid functions, which rely on the data fed in from the training batch. (In a simplified way, this describes how the ingenious mechanism of backpropagation works.)
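As an aside, the same feeding mechanism works for any node in the graph. For example, once the session below has been created and the variables initialized, you could peek at the loss on the test set without updating any weights, because asking for g_loss_function does not trigger g_training. A small sketch:

# Evaluates the mean squared error on the test set; no weights are updated,
# because we ask the session for the loss node, not for the training node.
test_loss = sess.run(g_loss_function, {g_data: data_test, g_label: label_test})
print('Test set loss: ' + str(test_loss))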

Finally, after every epoch, we print the updated accuracy of the model by applying the model to the test dataset.

 
########################################################################
## Running the graph ###################################################
########################################################################

# How large should batches be? 1 = online learning
batch_size = 5

# How many times do we want the model to loop over the full data set (epochs)
num_epochs = 1000

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Determine the amount of batches.
num_batches = int(len(data_train) / batch_size)

# Run the complete data set multiple times through the neural network.
for i in range(num_epochs):

    # Randomly shuffle the data for stochastic gradient descent to work properly.
    assert np.shape(data_train)[0] == np.shape(label_train)[0]
    p = np.random.permutation(len(data_train))
    data_train, label_train = data_train[p], label_train[p]

    # For every epoch, all batches should be processed.
    for j in range(num_batches):
        batch_label = label_train[j * batch_size:(j+1) * batch_size]
        batch_data = data_train[j * batch_size:(j+1) * batch_size]
        sess.run(g_training, {g_data: np.array(batch_data), g_label: np.array(batch_label)})

    # Determine the accuracy after every epoch.
    model_accuracy = sess.run(g_accuracy, {g_data: data_test, g_label: label_test})
    print('Epoch ' + str(i) + ': the accuracy of the model after this epoch is ' + str(model_accuracy))
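
Once training has finished, the same session can be used to score new applications, provided they are preprocessed exactly like the training data. A small sketch, reusing a few test rows as stand-ins for new applications:

# Score three (already preprocessed) applications with the trained network.
new_applications = data_test[:3]
new_scores = sess.run(g_prediction, {g_data: new_applications})

# Each row holds two sigmoid outputs; the index of the larger one is the predicted class.
print(np.argmax(new_scores, axis=1))

# Close the session when you are done with it.
sess.close()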

Final notes

As you run the model, its accuracy should converge. Will the network return the optimal combination of weights and biases after 1,000 epochs? Sadly, no. Since we initialize our weights and biases randomly, every run starts from a different point. Furthermore, gradient descent only guarantees a local minimum of the loss function (and thus not maximum accuracy), not a global one. You can optimize the algorithm by tuning the hyperparameters: the learning rate, the number and size of the hidden layers, a different loss function, other activation functions, the number of epochs, and so on.
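To give one concrete direction (a sketch, not part of the original configuration): you could swap plain gradient descent for the Adam optimizer and use ReLU activations in the hidden layers, then rerun the training loop.

# Alternative optimizer: Adam usually converges faster than plain gradient descent.
# Note that it creates extra variables, so tf.global_variables_initializer() must be run again.
g_training = tf.train.AdamOptimizer(learning_rate=0.001).minimize(g_loss_function)

# Alternative activation: in the layer loop, replace the sigmoid with a ReLU, e.g.
# latest_layer = tf.nn.relu(layer_output)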

In closing, it should be said that there are more convenient ways to build neural networks than wiring them up by hand like this. However, the code in this article is written explicitly for demonstration purposes.
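For comparison, here is a sketch of roughly the same network written with the higher-level Keras API bundled with TensorFlow (tf.keras); layer sizes and activations mirror the ones above, though defaults such as the weight initialization differ slightly.

# Roughly the same architecture in tf.keras (a sketch; defaults differ from the manual version).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(30, activation='sigmoid', input_shape=(data_train.shape[1],)),
    tf.keras.layers.Dense(120, activation='sigmoid'),
    tf.keras.layers.Dense(30, activation='sigmoid'),
    tf.keras.layers.Dense(label_train.shape[1], activation='sigmoid')
])
model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])
model.fit(data_train, label_train, batch_size=5, epochs=1000, verbose=0)
print(model.evaluate(data_test, label_test))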


Roel Peters works as an online business consultant and has degrees in economics, international relations and data science. He started coding as a kid and has experience in half a dozen coding languages. His core specialization is knowledge discovery and insights generation. Roel is a wide-eyed techno-optimist and an avid supporter of a universal basic income.

