
Intro to Deep Learning and TensorFlow Basics

First of all, what is machine learning? ML is simply the process of converting physical representations/data into numbers and finding patterns in those numbers. Deep learning is a subset of ML, and in this article I'll use the two terms interchangeably. To find patterns in the numbers, computers use algorithms based on probabilistic methods. In conventional programming, we feed the computer a set of inputs and the rules to follow to get the desired output. In ML, we instead feed it a set of inputs and the desired outputs, and the algorithm figures out the rules. These learned rules are then applied to unseen inputs to generate the same kind of outputs that were used to learn the rules in the first place.

If you can build a simple rule-based system that doesn't require machine learning, do that

- first rule of Google's Machine Learning Handbook

It is advisable not to use machine learning when a rule-based system can fulfil the same functionality. So when should we use deep learning or ML?

  • When the traditional approach fails because the list of rules grows too long.

  • For continually changing environments, such as autonomous vehicles.

  • To discover insights/patterns within large collections of data, such as skin disease classification based on images.

When not to use deep learning:

  • When explainability of the results is required. The patterns/rules learned by deep learning models are largely uninterpretable.

  • When a rule-based system is sufficient for the purpose.

  • When errors are not tolerable, since the outputs of deep learning models are not always predictable.

  • When the amount of data available is insufficient even to perform transfer learning.

Deep learning mainly differs from traditional machine learning in the types of data used to train the models. Typically, ML models perform well with structured data, whereas deep learning models perform well with unstructured data. The algorithms used to create traditional ML models are often called "shallow algorithms"; common examples are random forests, naive Bayes, nearest neighbours and support vector machines. Deep learning, in contrast, uses neural networks, such as fully connected neural networks, convolutional neural networks, recurrent neural networks and transformers, to create models from unstructured data.

What are neural networks?

"A neural network is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes." - Wikipedia

In neural networks, perceptrons are analogous to neurons.
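As a rough illustration (the input values, weights and bias below are made up for this sketch, not learned), a single perceptron just computes a weighted sum of its inputs plus a bias and passes the result through an activation function:

```python
import tensorflow as tf

# Hypothetical inputs, weights and bias for one perceptron.
x = tf.constant([1.0, 2.0, 3.0])   # inputs
w = tf.constant([0.5, -0.2, 0.1])  # weights (illustrative values)
b = tf.constant(0.3)               # bias (illustrative value)

# Weighted sum: 1*0.5 + 2*(-0.2) + 3*0.1 + 0.3 = 0.7
z = tf.reduce_sum(x * w) + b

# The sigmoid activation squashes z into the range (0, 1).
output = tf.nn.sigmoid(z)          # ≈ 0.668
```

In a real network, the weights and biases are what the training process adjusts; the activation function is what lets stacked layers learn non-linear patterns.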

Before feeding data into a neural network, the data must be converted to numerical values; this process is called encoding. The network then uses those numerical values to learn feature representations of the data by adjusting its weights, guided by gradient values. These learned representations are used to classify or make predictions on new inputs fed to the trained network. The anatomy of a neural network consists of three main components: the input layer, the hidden layers and the output layer. The hidden layers learn the patterns/embeddings/weights/feature representations/feature vectors, all of which refer to the same thing.

The learning process comes in four types: supervised learning, semi-supervised learning, unsupervised learning and transfer learning. In supervised learning, both the data and its labels are fed into the algorithm during the training phase. In semi-supervised learning, the data and only some of its labels are fed in. In unsupervised learning, the algorithm learns the patterns by itself using only the input data, without the aid of labels. In transfer learning, a pre-trained model is used to learn the representations of new data by tweaking its weights.

Practical use cases of deep learning include recommendation systems, language translation, speech recognition, computer vision and natural language processing (NLP). Deep learning models for these use cases can be constructed using a library such as TensorFlow or PyTorch.

The tensors of a neural network are the numerical representations of the input data and the learned feature representations. A clear intuition about tensors can be gained from the following video by Dan Fleisch.


With TensorFlow, we can create a tensor using the following function.

import tensorflow as tf

tf.constant(
    value, dtype=None, shape=None, name='Const'
)
Let's create a tensor with zero dimensions, i.e. a scalar.

scalar = tf.constant(11)

Then check the dimension of the scalar (think of a point).

scalar.ndim

Create a vector 

vector = tf.constant([11,11])

Check the dimensions (think about a line)

vector.ndim

Create a matrix

matrix = tf.constant([[11,12,13],
                      [12,21,11]])

Check dimensions (think about a shape)

matrix.ndim

Create a tensor

tensor = tf.constant([[[11,12,13],
                       [12,21,11]],
                      [[11,12,13],
                       [12,21,11]],
                      [[11,12,13],
                       [12,21,11]]])

Check the dimensions (think of a solid; it could be n-dimensional).

tensor.ndim

Tensors created with tf.constant() are immutable. To allow mutability, the tensor should be created using tf.Variable(); then the values in the tensor can be changed using .assign().

Change the first value of the vector to 12

vector = tf.Variable([11,11])
vector[0].assign(12)

A tensor with random values can be created using tf.random.normal() or tf.random.uniform(), which use the normal distribution and the uniform distribution respectively to generate the values.
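A minimal sketch (the shapes and distribution parameters here are arbitrary):

```python
import tensorflow as tf

# Values drawn from a normal (Gaussian) distribution with the given
# mean and standard deviation.
normal = tf.random.normal(shape=(3, 2), mean=0.0, stddev=1.0)

# Values drawn uniformly from the half-open interval [minval, maxval).
uniform = tf.random.uniform(shape=(3, 2), minval=0, maxval=1)
```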

We can shuffle a tensor using tf.random.shuffle(<the tensor to be shuffled>), which shuffles the tensor along its first dimension. By doing so we can avoid the order of the dataset affecting what the neural network learns.

We can also set an operation-level seed here. A seed is used to make experiment results reproducible.
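For example (the tensor values and seed below are arbitrary), combining a global seed with an operation-level seed makes the shuffle reproducible across runs:

```python
import tensorflow as tf

not_shuffled = tf.constant([[10, 7],
                            [3, 4],
                            [2, 5]])

tf.random.set_seed(42)  # global seed

# Shuffles along the first dimension; the operation-level seed,
# together with the global seed, fixes the resulting order.
shuffled = tf.random.shuffle(not_shuffled, seed=42)
```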

We can create tensors with all ones and all zeros as follows.

tf.ones([10,7])

Output:

<tf.Tensor: shape=(10, 7), dtype=float32, numpy= array([[1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1.]], dtype=float32)>

tf.zeros(shape=(10,6))

Output:

<tf.Tensor: shape=(10, 6), dtype=float32, numpy= array([[0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.]], dtype=float32)>

The main difference between a NumPy array and a tensor is that a tensor can run more efficiently on a GPU. We can directly convert a NumPy array into a tensor. If required, the shape of the tensor can also be manipulated, as long as:

N = x * y * z

where N: number of elements in the NumPy array
x, y, z: lengths of the dimensions

import numpy as np
numpy_A = np.arange(1,25,dtype=np.int32)
A = tf.constant(numpy_A,shape=(2,3,4))
A

Output:
<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy= array([[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]], dtype=int32)>

A tensor has four main attributes, namely shape, rank, axis/dimension and size.

The shape is the length of each dimension of the tensor. We can also index the shape to get the length of a particular dimension.

A.shape
TensorShape([2, 3, 4])

A.shape[0]
2

The rank of a tensor is the number of dimensions a tensor has. 

A.ndim
3

The axis or dimension refers to a particular dimension of a tensor. In the first case below, we index the tensor to get the first element of each dimension except the last; in the second case, we get only the first element of the first dimension.

A[:1,:1]
<tf.Tensor: shape=(1, 1, 4), dtype=int32, numpy=array([[[1, 2, 3, 4]]], dtype=int32)>

A[:1]
<tf.Tensor: shape=(1, 3, 4), dtype=int32, numpy= array([[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]]], dtype=int32)>

To get the last item of each row of the tensor.

A[:,:,-1]
<tf.Tensor: shape=(2, 3), dtype=int32, numpy= array([[ 4, 8, 12], [16, 20, 24]], dtype=int32)>

Get all elements of each row of the tensor except the last one.

A[:,:,:-1]
<tf.Tensor: shape=(2, 3, 3), dtype=int32, numpy= array([[[ 1, 2, 3], [ 5, 6, 7], [ 9, 10, 11]], [[13, 14, 15], [17, 18, 19], [21, 22, 23]]], dtype=int32)>

The size is the total number of items in the tensor

tf.size(A)
<tf.Tensor: shape=(), dtype=int32, numpy=24>

tf.size(A).numpy()
24

When performing inference with a trained TensorFlow neural network, we often need to change the shape of a tensor. To add an extra dimension to a tensor we can use the following.
rank_4_A = A[...,tf.newaxis]
rank_4_A

<tf.Tensor: shape=(2, 3, 4, 1), dtype=int32, numpy= array([[[[ 1], [ 2], [ 3], [ 4]], [[ 5], [ 6], [ 7], [ 8]], [[ 9], [10], [11], [12]]], [[[13], [14], [15], [16]], [[17], [18], [19], [20]], [[21], [22], [23], [24]]]], dtype=int32)>

rank_4_A = A[tf.newaxis,...]
rank_4_A

<tf.Tensor: shape=(1, 2, 3, 4), dtype=int32, numpy= array([[[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]]], dtype=int32)>

tf.newaxis is equivalent to NumPy's np.newaxis (an alias for None). The following method can also be used to expand a tensor's dimensions.

tf.expand_dims(A,axis=0)
<tf.Tensor: shape=(1, 2, 3, 4), dtype=int32, numpy= array([[[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]]], dtype=int32)>

tf.expand_dims(A,axis=1)
<tf.Tensor: shape=(2, 1, 3, 4), dtype=int32, numpy= array([[[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]]], [[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]]], dtype=int32)>

tf.expand_dims(A,axis=2)
<tf.Tensor: shape=(2, 3, 1, 4), dtype=int32, numpy= array([[[[ 1, 2, 3, 4]], [[ 5, 6, 7, 8]], [[ 9, 10, 11, 12]]], [[[13, 14, 15, 16]], [[17, 18, 19, 20]], [[21, 22, 23, 24]]]], dtype=int32)>

tf.expand_dims(A,axis=-1)
<tf.Tensor: shape=(2, 3, 4, 1), dtype=int32, numpy= array([[[[ 1], [ 2], [ 3], [ 4]], [[ 5], [ 6], [ 7], [ 8]], [[ 9], [10], [11], [12]]], [[[13], [14], [15], [16]], [[17], [18], [19], [20]], [[21], [22], [23], [24]]]], dtype=int32)>

We can also use tf.squeeze() to remove dimensions of length 1.
tf.squeeze(rank_4_A)
<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy= array([[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]], dtype=int32)>

rank_4_A_T = tf.transpose(rank_4_A)  # shape (1, 4, 3, 2)
tf.squeeze(rank_4_A_T)
<tf.Tensor: shape=(4, 3, 2), dtype=int32, numpy= array([[[ 1, 13], [ 5, 17], [ 9, 21]], [[ 2, 14], [ 6, 18], [10, 22]], [[ 3, 15], [ 7, 19], [11, 23]], [[ 4, 16], [ 8, 20], [12, 24]]], dtype=int32)>

With TensorFlow, we can perform element-wise tensor division, multiplication, addition and subtraction using TensorFlow methods, or simply by applying the arithmetic operators directly to the tensors. Matrix multiplication is one of the most widely used operations in neural networks and can also be performed on tensors.

Performing element-wise multiplication of tensors.

tensor = tf.constant([[ 1,  2,  3,  4],
                      [ 5,  6,  7,  8],
                      [ 9, 10, 11, 12]])
tensor
<tf.Tensor: shape=(3, 4), dtype=int32, numpy= array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]], dtype=int32)>

tensor * tensor
<tf.Tensor: shape=(3, 4), dtype=int32, numpy= array([[ 1, 4, 9, 16], [ 25, 36, 49, 64], [ 81, 100, 121, 144]], dtype=int32)>

A * A
<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy= array([[[ 1, 4, 9, 16], [ 25, 36, 49, 64], [ 81, 100, 121, 144]], [[169, 196, 225, 256], [289, 324, 361, 400], [441, 484, 529, 576]]], dtype=int32)>

Perform matrix multiplication of tensors (the dot product). The size of the last dimension of the first tensor must equal the size of the first dimension of the tensor it is multiplied with.

tensor2 = tf.constant(tensor,shape=(4,3))
tensor2

<tf.Tensor: shape=(4, 3), dtype=int32, numpy= array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]], dtype=int32)>

tf.matmul(tensor,tensor2)
<tf.Tensor: shape=(3, 3), dtype=int32, numpy= array([[ 70, 80, 90], [158, 184, 210], [246, 288, 330]], dtype=int32)>

tensor @ tensor2
<tf.Tensor: shape=(3, 3), dtype=int32, numpy= array([[ 70, 80, 90], [158, 184, 210], [246, 288, 330]], dtype=int32)>

A
<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy= array([[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]], dtype=int32)>

tf.transpose(A)
<tf.Tensor: shape=(4, 3, 2), dtype=int32, numpy= array([[[ 1, 13], [ 5, 17], [ 9, 21]], [[ 2, 14], [ 6, 18], [10, 22]], [[ 3, 15], [ 7, 19], [11, 23]], [[ 4, 16], [ 8, 20], [12, 24]]], dtype=int32)>

tf.tensordot() with axes=0 computes the outer product, which generalizes to higher-rank tensors.

tf.tensordot(A,tf.transpose(A),axes=0)
<tf.Tensor: shape=(2, 3, 4, 4, 3, 2), dtype=int32, numpy= array([[[[[[ 1, 13], [ 5, 17], [ 9, 21]],
        .............. ]]]]]

tf.size(tf.tensordot(A,tf.transpose(A),axes=0)).numpy()
576


Generally, we transpose rather than reshape a tensor to make shapes compatible for multiplication: transposing flips the axes, while reshaping merely re-reads the same values into a new shape.
 
A
<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy= array([[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]], dtype=int32)>

tf.transpose(A)
<tf.Tensor: shape=(4, 3, 2), dtype=int32, numpy= array([[[ 1, 13], [ 5, 17], [ 9, 21]], [[ 2, 14], [ 6, 18], [10, 22]], [[ 3, 15], [ 7, 19], [11, 23]], [[ 4, 16], [ 8, 20], [12, 24]]], dtype=int32)>

tf.reshape(A,shape=(4,3,2))
<tf.Tensor: shape=(4, 3, 2), dtype=int32, numpy= array([[[ 1, 2], [ 3, 4], [ 5, 6]], [[ 7, 8], [ 9, 10], [11, 12]], [[13, 14], [15, 16], [17, 18]], [[19, 20], [21, 22], [23, 24]]], dtype=int32)>
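The difference matters for matrix multiplication: transpose and reshape can produce the same shape but different tensors. A minimal sketch with a hypothetical 2 x 3 tensor X:

```python
import tensorflow as tf

X = tf.constant([[1, 2, 3],
                 [4, 5, 6]])            # shape (2, 3)

# Transpose flips rows and columns; reshape re-reads the same values
# row by row into the new shape. Both results are shaped (3, 2), but
# they hold the values in different positions.
print(tf.transpose(X))                  # [[1, 4], [2, 5], [3, 6]]
print(tf.reshape(X, shape=(3, 2)))      # [[1, 2], [3, 4], [5, 6]]

# Only the transpose gives the usual X @ X^T product.
print(X @ tf.transpose(X))              # [[14, 32], [32, 77]]
```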

We can change the data type of a tensor using tf.cast(). Here we cast to 32-bit floating-point precision.

tf.cast(A,dtype=tf.float32)
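For example (B, C and D are hypothetical tensors introduced just for illustration), casting down to 16-bit floats reduces precision, while casting to an integer type drops the fractional part:

```python
import tensorflow as tf

B = tf.constant([1.7, 7.4])        # default dtype is float32
C = tf.cast(B, dtype=tf.float16)   # lower-precision floats
D = tf.cast(B, dtype=tf.int32)     # truncates towards zero: [1, 7]
```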

Here are some aggregation operations we can perform on tensors. To find the standard deviation and variance, we need to cast the inputs to floats.

E=tf.constant(np.random.randint(0,100,size=50))
E
<tf.Tensor: shape=(50,), dtype=int64, numpy= array([73, 77, 32, 36, 80, 61, 41, 57, 31, 34, 25, 7, 36, 71, 68, 15, 31, 51, 3, 43, 46, 94, 67, 38, 74, 45, 84, 49, 88, 42, 16, 1, 92, 29, 52, 57, 28, 54, 96, 25, 99, 28, 50, 85, 79, 66, 84, 71, 7, 59])>
tf.reduce_min(E)
<tf.Tensor: shape=(), dtype=int64, numpy=1>
tf.reduce_max(E)
<tf.Tensor: shape=(), dtype=int64, numpy=99>
tf.reduce_mean(E)
<tf.Tensor: shape=(), dtype=int64, numpy=51>
tf.reduce_sum(E)
<tf.Tensor: shape=(), dtype=int64, numpy=2577>
import tensorflow_probability as tfp
tfp.stats.variance(E)
<tf.Tensor: shape=(), dtype=int64, numpy=683>
tf.math.reduce_std(tf.cast(E,dtype=tf.float32))
<tf.Tensor: shape=(), dtype=float32, numpy=26.133665>
tf.math.reduce_variance(tf.cast(E,dtype=tf.float32))
<tf.Tensor: shape=(), dtype=float32, numpy=682.96844>

To get the index of the largest value and the smallest value we can use tf.argmax() and tf.argmin(), and index the tensor with the result to get the value itself.

# value at the index of the largest element
E[tf.argmax(E)]

# value at the index of the smallest element
E[tf.argmin(E)]
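A self-contained sketch (F here is a hypothetical 1-D tensor defined just for illustration):

```python
import tensorflow as tf

F = tf.constant([7, 42, 9, 8, 3])

# argmax/argmin return the *index* of the extreme value;
# indexing the tensor with that index returns the value itself.
print(tf.argmax(F).numpy())    # 1
print(F[tf.argmax(F)].numpy()) # 42
print(tf.argmin(F).numpy())    # 4
print(F[tf.argmin(F)].numpy()) # 3
```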

Also, as stated earlier in this article, values passed to a neural network must be in numerical form. To convert categorical data to numerical form we often use one-hot encoding.

# one hot encoding
some_list = [0,1,2,3] # could be red,green,yellow,blue
tf.one_hot(some_list,depth=4)
<tf.Tensor: shape=(4, 4), dtype=float32, numpy= array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]], dtype=float32)>

We can also find the square, square root and log of a tensor as follows.

H = tf.range(1,10)
H
<tf.Tensor: shape=(9,), dtype=int32, numpy=array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)>

tf.square(H)
<tf.Tensor: shape=(9,), dtype=int32, numpy=array([ 1, 4, 9, 16, 25, 36, 49, 64, 81], dtype=int32)>

tf.math.sqrt(tf.cast(H,dtype=tf.float32))
<tf.Tensor: shape=(9,), dtype=float32, numpy= array([0.99999994, 1.4142134 , 1.7320508 , 1.9999999 , 2.236068 , 2.4494896 , 2.6457512 , 2.8284268 , 3. ], dtype=float32)>

tf.math.log(tf.cast(H,dtype=tf.float32))
<tf.Tensor: shape=(9,), dtype=float32, numpy= array([0. , 0.6931472, 1.0986123, 1.3862944, 1.609438 , 1.7917595, 1.9459102, 2.0794415, 2.1972246], dtype=float32)>


That's all for this article. In the next article we will discuss regression model building using TensorFlow.









