Artificial Neural Network

(based on this textbook)

If you haven’t learned how ANNs work, check out this lesson!


A Very Simple Neural Network

The beginning of the program imports the libraries, sets the values of the parameters, and creates a list containing the weights that will be modified (they are initialized randomly).

import numpy, random #numpy for the sigmoid variant below, random for the initial weights
lr = 1 #learning rate
bias = 1 #value of bias
weights = [random.random(),random.random(),random.random()] #weights generated in a list (3 weights in total for 2 neurons and the bias)

Here we create a function which defines the work of the output neuron. It takes 3 parameters (the 2 input values and the expected output). “outputP” is the variable corresponding to the output given by the Perceptron. Then we calculate the error, which is used right after to modify the weights of every connection to the output neuron.

def Perceptron(input1, input2, output) :
   outputP = input1*weights[0]+input2*weights[1]+bias*weights[2]
   if outputP > 0 : #activation function (here Heaviside)
      outputP = 1
   else :
      outputP = 0
   error = output - outputP
   weights[0] += error * input1 * lr
   weights[1] += error * input2 * lr
   weights[2] += error * bias * lr

We create a loop that makes the neural network repeat every situation several times. This part is the learning phase. The number of iterations is chosen according to the precision we want. However, be aware that too many iterations could lead the network to over-fitting, which causes it to focus too much on the treated examples, so it couldn’t get a right output on cases it didn’t see during its learning phase.

However, our case here is a bit special: there are only 4 possibilities, and we give the neural network all of them during its learning phase. Normally, a Perceptron should give a correct output even on a case it has never seen before.

for i in range(50) :
   Perceptron(1,1,1) #True or true
   Perceptron(1,0,1) #True or false
   Perceptron(0,1,1) #False or true
   Perceptron(0,0,0) #False or false

Finally, we can ask the user to enter the values to check if the Perceptron is working. This is the testing phase.

The Heaviside activation function is interesting to use in this case, since it maps every value to exactly 0 or 1, and we are looking for a false or true result. We could instead try a sigmoid function and obtain a decimal number between 0 and 1, normally very close to one of those limits.

x = int(input("Enter the first input (0 or 1): "))
y = int(input("Enter the second input (0 or 1): "))
outputP = x*weights[0] + y*weights[1] + bias*weights[2]
if outputP > 0 : #activation function (Heaviside)
   outputP = 1
else :
   outputP = 0
print(x, "or", y, "is :", outputP)

We could also save the weights that the neural network just calculated in a file, to reuse them later without going through another learning phase. This is done for much bigger projects, in which that phase can last days or weeks.
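
For example, a minimal sketch of saving and reloading the weights with Python's built-in json module (the filename weights.json is just an illustration):

import json

with open("weights.json", "w") as f:
   json.dump(weights, f) #save the trained weights to a file

with open("weights.json") as f:
   weights = json.load(f) #reload them later, skipping the learning phase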

If we went with the sigmoid option mentioned above, the Heaviside step would be replaced by:

outputP = 1/(1+numpy.exp(-outputP)) #sigmoid function

Here’s some more fun with neural networks on Tensorflow Playground


ANN for Classification

Processing the Data

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Download the dataset here

path = '../datasets/Churn_Modelling.csv' #replace this with your path/to/dataset
dataset = pd.read_csv(path)
dataset.head(25)
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1
2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
3 15619304 Onio 502 France Female 42 8 159660.80 3 1 0 113931.57 1
4 15701354 Boni 699 France Female 39 1 0.00 2 0 0 93826.63 0
5 15737888 Mitchell 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
6 15574012 Chu 645 Spain Male 44 8 113755.78 2 1 0 149756.71 1
7 15592531 Bartlett 822 France Male 50 7 0.00 2 1 1 10062.80 0
8 15656148 Obinna 376 Germany Female 29 4 115046.74 4 1 0 119346.88 1
9 15792365 He 501 France Male 44 4 142051.07 2 0 1 74940.50 0
10 15592389 H? 684 France Male 27 2 134603.88 1 1 1 71725.73 0
11 15767821 Bearce 528 France Male 31 6 102016.72 2 0 0 80181.12 0
12 15737173 Andrews 497 Spain Male 24 3 0.00 2 1 0 76390.01 0
13 15632264 Kay 476 France Female 34 10 0.00 2 1 0 26260.98 0
14 15691483 Chin 549 France Female 25 5 0.00 2 0 0 190857.79 0
15 15600882 Scott 635 Spain Female 35 7 0.00 2 1 1 65951.65 0
16 15643966 Goforth 616 Germany Male 45 3 143129.41 2 0 1 64327.26 0
17 15737452 Romeo 653 Germany Male 58 1 132602.88 1 1 0 5097.67 1
18 15788218 Henderson 549 Spain Female 24 9 0.00 2 1 1 14406.41 0
19 15661507 Muldrow 587 Spain Male 45 6 0.00 1 0 0 158684.81 0
20 15568982 Hao 726 France Female 24 6 0.00 2 1 1 54724.03 0
21 15577657 McDonald 732 France Male 41 8 0.00 2 1 1 170886.17 0
22 15597945 Dellucci 636 Spain Female 32 8 0.00 2 1 0 138555.46 0
23 15699309 Gerasimov 510 Spain Female 38 4 0.00 1 1 0 118913.53 1
24 15725737 Mosman 669 France Male 46 3 0.00 2 0 1 8487.75 0
25 15625047 Yen 846 France Female 38 5 0.00 1 1 1 187616.16 0

Creating feature and target vectors

Looking at the features, we can see that RowNumber, CustomerId, and Surname have no bearing on whether a customer leaves the bank. We drop them from X, which now contains the features at indices 3 to 12.

X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

#Printing out the values of X --> Which contains the features
#                           y --> Which contains the target variable
print(pd.DataFrame(X[:10]))
print()
print(pd.DataFrame(y[:10]))
    0        1       2   3  4          5  6  7  8          9
0  619   France  Female  42  2        0.0  1  1  1  101348.88
1  608    Spain  Female  41  1   83807.86  1  0  1  112542.58
2  502   France  Female  42  8   159660.8  3  1  0  113931.57
3  699   France  Female  39  1        0.0  2  0  0   93826.63
4  850    Spain  Female  43  2  125510.82  1  1  1    79084.1
5  645    Spain    Male  44  8  113755.78  2  1  0  149756.71
6  822   France    Male  50  7        0.0  2  1  1    10062.8
7  376  Germany  Female  29  4  115046.74  4  1  0  119346.88
8  501   France    Male  44  4  142051.07  2  0  1    74940.5
9  684   France    Male  27  2  134603.88  1  1  1   71725.73

   0
0  1
1  0
2  1
3  0
4  0
5  1
6  0
7  1
8  0
9  0

Encoding categorical data

Neural networks can only handle numerical data, so the categorical data in Geography and Gender won’t work as-is. You might recall pandas has a function called get_dummies that converts a categorical variable into several binary columns. That’s what we’ll do here, but with scikit-learn’s OneHotEncoder.
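
For comparison, a rough pandas equivalent of what we are about to do (a sketch, not the approach used below):

#expand Geography and Gender into binary indicator columns
X_dummies = pd.get_dummies(dataset.iloc[:, 3:13], columns=['Geography', 'Gender'])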

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
print(dataset['Geography'].unique()) #the three categories we need to encode
print(pd.DataFrame(X[:10]))
     0        1       2   3  4          5  6  7  8          9
0  619   France  Female  42  2        0.0  1  1  1  101348.88
1  608    Spain  Female  41  1   83807.86  1  0  1  112542.58
2  502   France  Female  42  8   159660.8  3  1  0  113931.57
3  699   France  Female  39  1        0.0  2  0  0   93826.63
4  850    Spain  Female  43  2  125510.82  1  1  1    79084.1
5  645    Spain    Male  44  8  113755.78  2  1  0  149756.71
6  822   France    Male  50  7        0.0  2  1  1    10062.8
7  376  Germany  Female  29  4  115046.74  4  1  0  119346.88
8  501   France    Male  44  4  142051.07  2  0  1    74940.5
9  684   France    Male  27  2  134603.88  1  1  1   71725.73

We create a ColumnTransformer with a OneHotEncoder to encode Geography (index 1 in the features) into three binary columns, one each for France, Germany, and Spain. The encoded columns are placed at the front of X; the remaining columns pass through unchanged.

ct = ColumnTransformer([("Geography", OneHotEncoder(), [1])], remainder = 'passthrough')
X = ct.fit_transform(X)
print(pd.DataFrame(X[:10]))
    0    1    2    3       4   5  6          7  8  9  10         11
0  1.0  0.0  0.0  619  Female  42  2        0.0  1  1  1  101348.88
1  0.0  0.0  1.0  608  Female  41  1   83807.86  1  0  1  112542.58
2  1.0  0.0  0.0  502  Female  42  8   159660.8  3  1  0  113931.57
3  1.0  0.0  0.0  699  Female  39  1        0.0  2  0  0   93826.63
4  0.0  0.0  1.0  850  Female  43  2  125510.82  1  1  1    79084.1
5  0.0  0.0  1.0  645    Male  44  8  113755.78  2  1  0  149756.71
6  1.0  0.0  0.0  822    Male  50  7        0.0  2  1  1    10062.8
7  0.0  1.0  0.0  376  Female  29  4  115046.74  4  1  0  119346.88
8  1.0  0.0  0.0  501    Male  44  4  142051.07  2  0  1    74940.5
9  1.0  0.0  0.0  684    Male  27  2  134603.88  1  1  1   71725.73

We do the same for Gender (now index 4 in the features, since the first transform shifted the columns): a OneHotEncoder turns it into two binary columns, Female and Male, which again end up at the front of X.

ct = ColumnTransformer([("Gender", OneHotEncoder(), [4])], remainder = 'passthrough')
X = ct.fit_transform(X)
print(pd.DataFrame(X[:10]))
    0    1    2    3    4    5   6  7          8  9  10 11         12
0  1.0  0.0  1.0  0.0  0.0  619  42  2        0.0  1  1  1  101348.88
1  1.0  0.0  0.0  0.0  1.0  608  41  1   83807.86  1  0  1  112542.58
2  1.0  0.0  1.0  0.0  0.0  502  42  8   159660.8  3  1  0  113931.57
3  1.0  0.0  1.0  0.0  0.0  699  39  1        0.0  2  0  0   93826.63
4  1.0  0.0  0.0  0.0  1.0  850  43  2  125510.82  1  1  1    79084.1
5  0.0  1.0  0.0  0.0  1.0  645  44  8  113755.78  2  1  0  149756.71
6  0.0  1.0  1.0  0.0  0.0  822  50  7        0.0  2  1  1    10062.8
7  1.0  0.0  0.0  1.0  0.0  376  29  4  115046.74  4  1  0  119346.88
8  0.0  1.0  1.0  0.0  0.0  501  44  4  142051.07  2  0  1    74940.5
9  0.0  1.0  1.0  0.0  0.0  684  27  2  134603.88  1  1  1   71725.73

We remove the first column (the Female indicator) because one binary column is enough to encode two genders: if the Male column is 0, the person must be Female. By the same logic, two columns would be enough to encode the three countries.

X = X[:,1:]
print(pd.DataFrame(X[:10]))
    0    1    2    3    4   5  6          7  8  9  10         11
0  0.0  1.0  0.0  0.0  619  42  2        0.0  1  1  1  101348.88
1  0.0  0.0  0.0  1.0  608  41  1   83807.86  1  0  1  112542.58
2  0.0  1.0  0.0  0.0  502  42  8   159660.8  3  1  0  113931.57
3  0.0  1.0  0.0  0.0  699  39  1        0.0  2  0  0   93826.63
4  0.0  0.0  0.0  1.0  850  43  2  125510.82  1  1  1    79084.1
5  1.0  0.0  0.0  1.0  645  44  8  113755.78  2  1  0  149756.71
6  1.0  1.0  0.0  0.0  822  50  7        0.0  2  1  1    10062.8
7  0.0  0.0  1.0  0.0  376  29  4  115046.74  4  1  0  119346.88
8  1.0  1.0  0.0  0.0  501  44  4  142051.07  2  0  1    74940.5
9  1.0  1.0  0.0  0.0  684  27  2  134603.88  1  1  1   71725.73

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling: standardize each feature to zero mean and unit variance,
# so that no single feature dominates the gradient updates because of its range
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test) #transform only: the scaler is fit on the training data alone

Let’s Make an ANN!

Let’s list out the steps involved in training the ANN with Stochastic Gradient Descent.

1) Randomly initialize the weights to small numbers close to but not 0.

2) Input the 1st observation of your dataset in the input layer, with each feature in one input node.

3) Forward-Propagation from left to right: the neurons are activated, and the impact of each neuron’s activation is weighted by the connection weights. Propagate the activations until getting the predicted result y.

4) Compare the predicted result with the actual result. Measure the generated error.

5) Back-Propagation: From right to left, error is back propagated. Update the weights according to how much they are responsible for the error. The learning rate tells us by how much we should update the weights.

6) Repeat steps 2 to 5 and update the weights after each observation (online/stochastic learning). Or: repeat steps 2 to 5, but update the weights only after a batch of observations (batch learning).

7) When the whole training set has passed through the ANN, that completes an epoch. Repeat for more epochs (sketched below).
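
Schematically, steps 2 to 7 boil down to the loop below (pseudo-Python: batches, forward, and loss_gradient are placeholder names for illustration, not real functions):

for epoch in range(num_epochs): #step 7: one epoch = one pass over the training set
   for batch_X, batch_y in batches(X_train, y_train, batch_size): #step 6
      y_hat = forward(batch_X, weights) #steps 2-3: forward-propagation
      gradients = loss_gradient(batch_y, y_hat, weights) #steps 4-5: measure error, back-propagate
      weights = weights - lr * gradients #step 5: update, scaled by the learning rate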

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential # For building the Neural Network layer by layer
from keras.layers import Dense # For fully connected layers; its kernel initializer randomly initializes the weights to small numbers close to 0 (but not 0)

Initializing the ANN

We will not pass any parameters to the Sequential object, since we will define the layers one by one.

classifier = Sequential()

Adding the input layer and the first hidden layer

How many nodes do we actually need in the hidden layer? There is no strict rule, but a common rule of thumb is to take the average of the number of nodes in the input and output layers. Here avg = (12 + 1) / 2 = 6.5, rounded down to 6, so we set units=6.

The activation function is the rectifier (ReLU).

The kernel initializer will initialize the hidden layer weights uniformly.

input_dim tells Keras the number of nodes in the input layer (here 12, one per feature). It is specified only once, on the first hidden layer, and won’t appear in further layers.

classifier.add(Dense(activation="relu", input_dim=12, units=6, kernel_initializer="uniform"))

Adding the second hidden layer

classifier.add(Dense(activation="relu", units=6, kernel_initializer="uniform"))

Adding the output layer

The sigmoid activation function is used whenever we need the probability of one of 2 categories (similar to logistic regression). We switch to the softmax activation function when the dependent variable has more than 2 categories.

classifier.add(Dense(activation="sigmoid", units=1, kernel_initializer="uniform"))

Compiling the ANN

The Adam optimizer is an adaptive form of stochastic gradient descent. Luckily for you, Keras does all of the math behind the scenes. The binary_crossentropy loss matches our two-category target.

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Fitting the ANN to the Training set

This step will take some time for large epoch values. A batch size of 10 means that the weights are updated after every 10 observations. An epoch is one full pass of the whole training set through the network.

classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

Making predictions and evaluating the model

y_pred = classifier.predict(X_test)

If y_pred is larger than 0.5, it becomes true (1); otherwise false (0). This determines what the neural network “voted” regarding this customer.

y_pred = (y_pred > 0.5)

Making the Confusion Matrix.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
tn, fp, fn, tp = cm.ravel()
accuracy = (tn + tp) / (tn + tp + fn + fp)
print(accuracy)
[[1538   57]
 [ 258  147]]
0.8425

Task: Play around with the batch size and epochs hyperparameters and compare the accuracies. One way to set this up is sketched below.
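
One possible way to run this comparison, assuming everything above has been executed (the hyperparameter values are arbitrary examples, and build_classifier is a helper defined here, not part of Keras):

def build_classifier():
   model = Sequential()
   model.add(Dense(activation="relu", input_dim=12, units=6, kernel_initializer="uniform"))
   model.add(Dense(activation="relu", units=6, kernel_initializer="uniform"))
   model.add(Dense(activation="sigmoid", units=1, kernel_initializer="uniform"))
   model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
   return model

for batch_size, epochs in [(10, 50), (25, 100), (32, 150)]:
   model = build_classifier() #fresh weights each time, for a fair comparison
   model.fit(X_train, y_train, batch_size = batch_size, epochs = epochs, verbose = 0)
   loss, acc = model.evaluate(X_test, y_test, verbose = 0)
   print("batch_size =", batch_size, "epochs =", epochs, "test accuracy =", round(acc, 4))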


Predicting whether a new customer will stay at the bank

Input: one binary column for gender (Male), three binary columns for geography (France, Germany, Spain), credit score, age, tenure, balance, number of products, has credit card, is active member, estimated salary

#feature order: Male, France, Germany, Spain, CreditScore, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary
new_customer = [[0, 0, 1, 0, 619, 42, 2, 5000, 1, 1, 0, 50700]]
new_customer = sc.transform(new_customer)
new_prediction = classifier.predict(new_customer)
print(new_prediction)
[[0.5705499]]

Since this probability is above 0.5, the network predicts that this customer will leave the bank.