
AI deep learning

Introduction

  • deep learning is typically used for unstructured data (images, voice, natural language, etc.) and uses neural networks, although encoding the data as tensors does impose some structure; examples include:
    • feed forward neural networks
      • assumes each input value is independent - not good for sequences such as letters in words or words in sentences - recurrent would be better
      • needs a non-linear activation (eg. relu, sigmoid, tanh) otherwise it just becomes a linear regression model but with the black box of a neural network
    • fully connected neural network
    • convolutional neural network
    • recurrent neural network
      • great for relatively short sequences as it retains a “memory” of older inputs in the sequence
      • but:
        • may become a deep neural network
        • during backpropagation through time the same term may appear repeatedly
        • these may compound creating exploding gradients causing instability and inability to learn
        • these may create vanishing gradients which shrink towards zero, which stops the model from updating and hence from learning
      • solutions include:
        • Gated Recurrent Unit (GRU)
        • Long Short-Term Memory (LSTM)
    • transformer models
      • invented in 2017 and published in a paper titled “Attention is all you need”, initially for language translation, natural language processing
      • has the advantage over RNNs that they can be scaled and parallelized for large speed improvements, and they are much better at keeping track of longer sentences or texts
      • they achieve this by positional encoding (inputs have the numerical order of each word) and self-attention (the internal learning of patterns such as synonyms, grammatical rules of various languages, understanding words by their context, etc)
  • one of the main Python language tools to work in this domain is PyTorch

Neural networks

  • with neural networks, you tell your network the inputs and what you want for the outputs, and let it learn on its own.
  • main flow types
    • a feedforward network contains inputs, outputs, and hidden layers.
      • The signals can only travel in one direction (forward). Input data passes into a layer where calculations are performed. Each processing element computes based upon the weighted sum of its inputs. The new values become the new input values that feed the next layer (feed-forward). This continues through all the layers and determines the output.
    • a feedback network has feedback paths.
      • This means that they can have signals traveling in both directions using loops. Since loops are present in this type of network, it becomes a non-linear dynamic system which changes continuously until it reaches a state of equilibrium. Feedback networks are often used in optimization problems where the network looks for the best arrangement of interconnected factors.
  • typically have:
    • an input layer of neurons
    • hidden layer(s) of neurons
    • an output layer of neuron(s)
  • types of learning:
    • supervised learning
      • when you have lots of inputs and known outputs or labels (eg. training on images and each image has been labelled with the content - which is the desired output)
    • unsupervised and self-supervised learning
      • when you have inputs but no output labels, the network will come up with a range of patterns which you will need to correlate with labels later
    • transfer learning
      • when you transfer a learned model into a new model
    • reinforcement learning
  • use-case types:
    • sequence to sequence (seq2seq)
      • input and output are both sequences eg. translation of a sentence, asking Siri a question (speech recognition)
    • classification / regression
      • identification such as computer vision, spam detection in natural language processing

Basic steps

  • obtain inputs
  • convert to tensors by numerical encoding
    • tensors
      • are multi-dimensional, matrix-like numerical representations of data
  • build or choose a pretrained model
    • choose a loss function and optimizer
    • build a training loop
  • fit the model to the data
    • pass through neural networks to learn representation or patterns, features and weights
    • creates numerical representation tensor outputs
    • create human understandable outputs
  • save and load the model
  • make predictions with the model
  • evaluate the model predictions
  • improve through experimentation
  • use the model on new data to make predictions

Tensors

  • torch.tensor
  • types of tensors
    • scalar
      • scalar = torch.tensor(7) (see the combined example after this list of tensor types)
      • attributes: no dimensions, just a single number
      • convert to python int by using the item() function
    • vector
      • created by passing a [] eg vector = torch.tensor([7,10])
      • has magnitude and direction
      • has one dimension, shape is [2]
    • MATRIX
      • MATRIX = torch.tensor([ [7,10],[11,15] ])
      • have 2 dimensions, and shape is [2,2]
    • TENSOR
      • TENSOR = torch.tensor([ [ [1,2,3],[2,3,4],[3,4,5] ] ])
      • TENSOR.ndim gives 3 dimensions but can have any number of dimensions
      • TENSOR.shape gives [1,3,3]
    • Random tensors
      • many models start with tensors full of random numbers and then adjust those to better represent the data in an iterative manner
      • eg. random_tensor = torch.rand(3,4)
      • random_image_tensor = torch.rand(noChannels,height,width)
      • reproducibility
        • taking the randomness out by using a random seed, add the following before calling creation of a random tensor
        • RANDOM_SEED = an integer value
        • torch.manual_seed(RANDOM_SEED) # must be called EACH time before you create a random tensor and the output will be identical to the first one you created
        • if using GPU then torch.cuda.manual_seed(RANDOM_SEED)
    • Zero tensors
      • zero = torch.zeros(size=(x,y) )
    • One tensors
      • ones = torch.ones(size=(x,y) )
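A minimal sketch (assuming a standard PyTorch install) pulling the tensor types and the random-seed idea above together:

import torch

# scalar: zero dimensions, a single value
scalar = torch.tensor(7)
print(scalar.ndim, scalar.item())        # 0, 7 (item() gives a Python int)

# vector: one dimension
vector = torch.tensor([7, 10])
print(vector.ndim, vector.shape)         # 1, torch.Size([2])

# matrix: two dimensions
MATRIX = torch.tensor([[7, 10], [11, 15]])
print(MATRIX.ndim, MATRIX.shape)         # 2, torch.Size([2, 2])

# tensor: three (or more) dimensions
TENSOR = torch.tensor([[[1, 2, 3], [2, 3, 4], [3, 4, 5]]])
print(TENSOR.ndim, TENSOR.shape)         # 3, torch.Size([1, 3, 3])

# seeded random tensors - reseed before each call to reproduce the same values
torch.manual_seed(42)
rand_a = torch.rand(3, 4)
torch.manual_seed(42)
rand_b = torch.rand(3, 4)
print(torch.equal(rand_a, rand_b))       # True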
  • range of tensors
    • torch.range(1,11) #deprecated - use torch.arange instead
    • torch.arange(1,11) #start, end, step
  • copying shape of a tensor to a new tensor
    • new_zeros = torch.zeros_like(input=anotherTensor) #creates a zero tensor of same shape as anotherTensor
  • datatypes
    • default is float32
    • can specify at creation by using parameter dtype=
    • can specify at creation the device by using parameter device=None / "cpu" / "cuda" #default is CPU
    • requires_grad - whether to track gradients
    • can convert a float32 tensor to float16 tensor by:
      • float16_tensor = float32_tensor.type(torch.float16)
  • logits
    • the log-odds (logit) function: logit(x) = ln(x/(1-x)) (this is what torch.logit computes when its eps argument is None)
    • it maps probability values between 0 and 1 to values from negative infinity to infinity, and is used to model the log-odds of success of an event as a function of independent variables
    • in contrast, linear regression is used to handle regression problems whereas logistic regression is used to handle classification problems
      • linear regression provides a continuous output but logistic regression provides a discrete output
    • it is the inverse of the sigmoid function, which limits values to between 0 and 1 on the Y-axis rather than the X-axis
    • tensors containing logit elements are used in tensor calculations for computer vision etc
    • raw model outputs are often a tensor of logits which then need to be converted to prediction probabilities (via sigmoid or softmax) and then to labels (see the short example below)
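A short sketch of the logit/sigmoid relationship and the usual logits → probabilities → labels conversion described above:

import torch

probs = torch.tensor([0.1, 0.5, 0.9])        # probabilities strictly between 0 and 1
logits = torch.logit(probs)                  # ln(x/(1-x)) -> tensor([-2.1972, 0.0000, 2.1972])
print(torch.sigmoid(logits))                 # sigmoid inverts it -> back to ~[0.1, 0.5, 0.9]

# raw model outputs (logits) -> prediction probabilities -> label
raw_logits = torch.tensor([[1.2, -0.4, 0.3]])     # eg. one sample, three classes
pred_probs = torch.softmax(raw_logits, dim=1)     # probabilities that sum to 1
pred_label = pred_probs.argmax(dim=1)             # index of the most likely class
print(pred_probs, pred_label)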
  • attributes
    • .dtype
    • .shape (same output as the function .size() )
    • .device
  • operations
    • element-wise multiplication - can use torch.mul(tensor, 10) instead of tensor * 10 #when element-wise multiplying two tensors, each element is multiplied with the corresponding element in the other tensor
    • matrix multiplication (dot product) - can use torch.matmul(tensor, tensor2) instead of tensor @ tensor2 #see https://www.mathsisfun.com/algebra/matrix-multiplying.html
    • tensor aggregation
      • min, max, mean, sum, etc
        • torch.min(tensor) or tensor.min()
        • NB. mean requires floating-point or complex tensors, not integers
      • positional min, max - to get the index of which value has min or max
        • tensor.argmin() and tensor.argmax()
    • reshaping, stacking, squeezing and unsqueezing tensors
      • tensor.reshape() - new tensor with different shape
      • tensor.view() - same tensor and shared memory but different shape view
      • torch.stack([tensor1,tensor2],dim=0) - combine tensors; also can use vstack and hstack
      • torch.squeeze(tensor) removes all 1 dimensions from a tensor
      • tensor.unsqueeze(dim=z) add a 1 dimension at dimension z
      • tensor.permute(tuple of newdim indexes) - swap dimensions as a view (hence sharing same memory)
  • getting tensor values at an index
    • use tensor[dim1][dim2][dim3] etc, which is equivalent to tensor[dim1,dim2,dim3]; the output depends upon the shape of the tensor, and an index value must be less than the shape value at that dimension; a colon can be used as a wildcard for a whole dimension, which will return a matrix tensor (see the combined example below)
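A quick sketch exercising the operations and indexing calls above (assuming a standard PyTorch install):

import torch

tensor = torch.arange(1, 10).reshape(3, 3)    # shape [3, 3]

# element-wise vs matrix multiplication
print(torch.mul(tensor, 10))                  # same as tensor * 10
print(torch.matmul(tensor, tensor))           # same as tensor @ tensor

# aggregation (mean needs a floating-point tensor)
print(tensor.min(), tensor.max(), tensor.sum())
print(tensor.type(torch.float32).mean())
print(tensor.argmin(), tensor.argmax())       # positional min / max

# reshaping, stacking, squeezing, unsqueezing, permuting
x = torch.arange(1, 10)
print(x.reshape(1, 9).shape)                  # torch.Size([1, 9])
print(torch.stack([x, x], dim=0).shape)       # torch.Size([2, 9])
print(x.reshape(1, 9).squeeze().shape)        # torch.Size([9]) - the 1 dimension removed
print(x.unsqueeze(dim=0).shape)               # torch.Size([1, 9])
img = torch.rand(224, 224, 3)                 # height, width, colour channels
print(img.permute(2, 0, 1).shape)             # torch.Size([3, 224, 224]) - channels first

# indexing - one index per dimension, ':' selects everything in that dimension
print(tensor[0][2], tensor[0, 2])             # the same element
print(tensor[:, 1])                           # the middle column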
  • convert a NumPy array into a new tensor in memory
    • torch.from_numpy(ndarray)
    • you can reverse this process and convert to numpy array by using tensor.numpy()
    • NB. NumPy's default data type is float64, not PyTorch's default float32, so you may need to convert using .type(torch.float32) (see the example below)
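A minimal sketch of the NumPy/tensor round trip and the dtype conversion mentioned above:

import torch
import numpy as np

array = np.arange(1.0, 8.0)                  # NumPy default dtype is float64
tensor = torch.from_numpy(array)             # float64 tensor sharing the array's memory
tensor_f32 = tensor.type(torch.float32)      # convert to PyTorch's default float32
back_to_numpy = tensor_f32.numpy()           # tensor -> NumPy array
print(array.dtype, tensor.dtype, tensor_f32.dtype, back_to_numpy.dtype)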
  • make some functions
    • let's make the torch.relu() function as our own function:
def myRelu(x: torch.Tensor) -> torch.Tensor: #takes a tensor as input and outputs a tensor
    return torch.maximum(torch.tensor(0), x) #this converts any negative value to zero
 
  • let's make the torch.sigmoid function:
def mySigmoid(x: torch.Tensor) -> torch.Tensor:
    return 1 / (1 + torch.exp(-x))
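A quick sanity check of the two functions above against the PyTorch built-ins (a sketch, assuming they have been defined as written):

x = torch.linspace(-3, 3, 7)
print(torch.allclose(myRelu(x), torch.relu(x)))          # True
print(torch.allclose(mySigmoid(x), torch.sigmoid(x)))    # True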

NumPy

  • multidimensional array library
  • much faster than Python lists as it:
    • uses fewer bytes of memory as it uses fixed-size types, whereas a list element must also store object type, size and reference count in addition to the value
    • no need to do type checking when iterating through objects
    • uses contiguous memory which is faster, allows SIMD vector processing and gives more effective cache utilisation
  • more functions than Python lists:
    • can multiply arrays of the same shape element-wise - this will fail with Python lists
  • applications:
    • matlab replacement
    • plotting (matplotlib)
    • backend for Pandas, etc
    • can store images
    • machine learning
  • creating arrays
    • a = np.array([ [1,2,3],[7,8,9] ]) (the example after this list pulls the creation and indexing calls together)
    • if you do b = a then b shares same memory of a
    • if you want a separate array use b = a.copy()
    • a.ndim , a.shape, a.dtype, a.itemsize - gives number bytes per element, a.nbytes gives total byte size
    • a[r,c] gets the item at row r, column c (negative indexes count from the end), : is a wildcard for all
    • get a range by a[r, start:end:step]
    • zeros = np.zeros( (dim1,dim2) )
    • ones = np.ones( (dim1,dim2) )
    • filled = np.full( (dim1,dim2),valueToFill)
    • filled = np.full_like(anotherArray,valueToFill)
    • randarray = np.random.rand(dim1,dim2)
    • randarray = np.random.randint(startint, endint, size = (dim1,dim2) )
    • randarray = np.random.random_sample(anotherarray.shape)
    • identity = np.identity(dim1)
    • repeated = np.repeat(array, repeats, axis=axisvalue)
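A short sketch pulling the array creation and indexing calls above together:

import numpy as np

a = np.array([[1, 2, 3], [7, 8, 9]])
print(a.ndim, a.shape, a.dtype, a.itemsize, a.nbytes)

b = a               # b shares a's memory
c = a.copy()        # c is an independent copy

print(a[1, 2])      # row 1, column 2 -> 9
print(a[0, 0:3:2])  # row 0, columns 0 to 2 in steps of 2 -> [1 3]

zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
filled = np.full((2, 3), 99)
rand_ints = np.random.randint(0, 10, size=(2, 3))
eye = np.identity(3)
repeated = np.repeat(np.array([[1, 2, 3]]), 3, axis=0)   # repeat the row 3 times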
  • operations
    • generally as for Python maths
    • trig functions
  • linear algebra
    • matrix multiply np.matmul(array1,array2) #as usual the number of columns in array1 must equal the number of rows in array2
    • determinant: np.linalg.det(array1)
    • inverse
    • Eigenvalues
    • singular value decomposition
    • trace
    • matrix norm
  • statistics of the array
    • optionally pass axis=axisnumber
    • min, max, sum,
  • reorganizing arrays
    • newarray = oldarray.reshape( (dim1,dim2) ) as long as it contains the same number of values
    • vstack = np.vstack([arr1,arr2])
    • hstack = np.hstack([arr1,arr2])
  • import data from csv text file
    • newarray = np.genfromtxt('textfilename',delimiter=',') #default will be float type
    • newarray = newarray.astype('int32') # to convert to int
  • boolean masking and advanced indexing
    • newarray > 12 will give an array of booleans showing which values are greater than 12
    • can index with a list so, to get the actual values greater than 12 can use:
      • newarray[newarray > 12]
    • np.any(newarray > 12, axis=0) gives a boolean array indicating which columns have at least one element satisfying the condition
    • np.all(newarray > 12, axis=0) gives a boolean array indicating which columns have all elements satisfying the condition
    • ( (newarray >12) & (newarray <100) ) combines conditions, giving a range test
    • (~ ( (newarray >12) & (newarray <100) ) ) - the ~ acts like NOT so the boolean values are inverted (see the example below)
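A small sketch of the boolean masking calls above:

import numpy as np

newarray = np.array([[5, 20, 150], [3, 60, 8]])

print(newarray > 12)                                     # boolean array of the same shape
print(newarray[newarray > 12])                           # the matching values -> [ 20 150  60]
print(np.any(newarray > 12, axis=0))                     # columns with at least one match
print(np.all(newarray > 12, axis=0))                     # columns where every element matches
print(newarray[(newarray > 12) & (newarray < 100)])      # values within a range
print(newarray[~((newarray > 12) & (newarray < 100))])   # values outside the range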
  • NumPy ONLY works on the CPU and NOT on a GPU!
    • move tensor to CPU BEFORE moving to a NumPy array via: tensor.cpu().numpy()

PyTorch

  • an evolution of Torch which allows fast writing of deep learning code in Python language to run on a GPU via CUDA or Tensor Processing Units (TPUs)
  • created by Facebook and is now open source and one of the main programs for developing deep learning apps
  • many pre-built deep learning models for transfer learning
  • allows:
    • pre-processing of data using tensors, etc
    • model data
    • deployment of model
  • optionally run PyTorch on Google Colab
    • easy and does not need a local GPU nor a local installation and set up of CUDA and PyTorch
  • install CUDA if you have a compatible nVidia GPU
    • first install MS Visual Studio Code and Visual Studio Community Edition with its Python components (~8GB) - this is so the CUDA install detects it and creates interfaces to Visual Studio
    • then download nVidia CUDA (3Gb) and install
    • in Anaconda create a new Environment eg. CUDATest in which to install this
    • install NumPy (this is required by PyTorch)
    • then install PyTorch for CUDA with the CUDA toolkit version it needs (~2.6GB) eg.
      • via Anaconda Prompt:
        • conda activate newEnvironmentname
        • see https://pytorch.org/get-started/locally/ for the conda command line to install
        • also optionally need torchmetrics for evaluation:
        • conda install -c conda-forge torchmetrics
  • check GPU accessible
    • in Anaconda select the environment in which you installed PyTorch (otherwise import torch may fail if you use the base environment)
    • then open Jupyter notebook
      import torch
      torch.cuda.is_available()
      #check your GPU hardware
      !nvidia-smi
  • code so that all tensors are on the same device
    • default device is CPU
    • if you want them on GPU:
      • doesn't work: torch.set_default_tensor_type.device = torch.device('cuda') #compiles but doesn't make tensor devices GPU!
      • doesn't work: torch.set_default_tensor_device = “cuda” #compiles but doesn't make tensor devices GPU!
      • so declare a variable device then use this when creating tensors:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tensor = torch.tensor([3,4], device=device)
        tensor = tensor.to(device) # and it will move it to GPU if available if you forgot to do the above step
      • alternatively you can do torch.tensor([x,y]).cuda() to try to move it to the GPU

PyTorch workflow

data import and cleansing

initial set up code

import torch
from torch import nn #nn contains all of pyTorch modules for neural networks
import matplotlib.pyplot as plt # allows visualisation

torch.__version__ #check version

get data into tensor

option 1. create linear data to test
weight = 0.7 # gradient of y = mx + c
bias = 0.3 # Y intercept at x= 0 ie. c
start = 0
end=1
step=0.02
X = torch.arange(start,end,step).unsqueeze(dim=1) #features
y = weight * X + bias #output labels
option 2. Import data
  • torchvision.transforms
  • torch.utils.data.Dataset #create a dataset variable
  • torch.utils.data.DataLoader (see the sketch below)
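A minimal sketch of option 2's building blocks, reusing the X and y tensors created in option 1 (TensorDataset is just one simple Dataset; real projects usually subclass torch.utils.data.Dataset or use a torchvision dataset with transforms):

from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(X, y)                              # pairs each X row with its y label
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for X_batch, y_batch in dataloader:                        # each iteration yields one batch
    print(X_batch.shape, y_batch.shape)
    break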
split data into training data and test data
train_split = int(0.8 * len(X) )
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]
# could make a more random split by using scikit learn train test split (see in machine learning page)
visualise data
def plot_predictions(train_data=X_train, train_labels=y_train, test_data=X_test, test_labels=y_test, predictions=None):
    plt.figure(figsize=(10,7))
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data") #c is color, b = blue, s is size
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data") #g = green
    if predictions is not None:
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions") #r = red
    plt.legend(prop={"size": 14})

plot_predictions()

move data to correct device

  • move to CUDA if available, else CPU, using the device variable defined earlier on this page
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

build model

  • options:
    • create a subclass of torch.nn.Module (the base class for all neural network models in PyTorch)
    • or, torchvision.models (pre-trained models)
# create linear regression  model
from torch import nn

class LinearRegressionModel(nn.Module): #almost everything in PyTorch inherits from nn.Module
  def __init__(self):
    super().__init__()
    self.weights = nn.Parameter(torch.randn(1,
                                            requires_grad=True,
                                            dtype=torch.float))
    self.bias = nn.Parameter(torch.randn(1,
                                         requires_grad=True,
                                         dtype=torch.float))

  #forward method to define computation (must be at class level, not inside __init__)
  def forward(self, x: torch.Tensor) -> torch.Tensor: # x is input data eg. training data
      return self.weights * x + self.bias # apply the linear regression formula using the current weights and bias
  #by use of torch.optim it will aim to get as close as possible to best fit values for these using two algorithms behind the scenes:
    #gradient descent - hence requires_grad=True and then uses torch.autograd
    #backpropagation

create the model

model_0 = LinearRegressionModel()
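A short sketch to inspect the freshly created model and sanity-check an untrained forward pass (the .to(device) call is an assumption so the model sits on the same device as the data):

model_0.to(device)              # keep the model on the same device as the data
print(model_0.state_dict())     # the random starting values for 'weights' and 'bias'

# predictions before training - expect these to be poor
with torch.inference_mode():
    untrained_preds = model_0(X_test)
print(untrained_preds[:5])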

create loss and optimizer functions

# Create the loss function
loss_fn = nn.L1Loss() # MAE loss is same as L1Loss

# Create the optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(), # parameters of target model to optimize
                            lr=0.001) # learning rate - how much the optimizer should change parameters at each step (higher = bigger changes; lower gives more accuracy but takes more epochs to get there)

fitting model to data - step-wise training and evaluating

  • torch.optim
torch.manual_seed(42)
torch.cuda.manual_seed(42) 

# Set the number of epochs (how many times the model will pass over the training data)
epochs = 2000

# Create empty loss lists to track values
train_loss_values = []
test_loss_values = []
epoch_count = []

for epoch in range(epochs):
    ### Training

    # Put model in training mode (this is the default state of a model)
    model_0.train()

    # 1. Forward pass on train data using the forward() method inside 
    y_pred = model_0(X_train)
    # print(y_pred)

    # 2. Calculate the loss (how different are our models predictions to the ground truth)
    loss = loss_fn(y_pred, y_train)
 
    # 3. Zero grad of the optimizer
    optimizer.zero_grad()

    # 4. Loss backwards
    loss.backward()

    # 5. Progress the optimizer
    optimizer.step()
    
    ### Testing

    # Put the model in evaluation mode - turns off various settings not needed in evaluation mode
    model_0.eval()

    with torch.inference_mode(): # turn off gradient tracking as that is only needed in training mode; similar to, but faster than, torch.no_grad()
      # 1. Forward pass on test data
      test_pred = model_0(X_test)

      # 2. Calculate loss on test data
      test_loss = loss_fn(test_pred, y_test.type(torch.float)) # predictions come in torch.float datatype, so comparisons need to be done with tensors of the same type

      # Print out what's happening but need to convert the tensor values to numpy array values and need to get to CPU if using GPU
      if epoch % 10 == 0:
            epoch_count.append(epoch)
            train_loss_values.append(loss.detach().cpu().numpy())
            test_loss_values.append(test_loss.detach().cpu().numpy())
            print(f"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss} ")
            print(model_0.state_dict())
    

improve model

  • helps to visualise the training
  • # Plot the loss curves
    plt.plot(epoch_count, train_loss_values, label="Train loss")
    plt.plot(epoch_count, test_loss_values, label="Test loss")
    plt.title("Training and test loss curves")
    plt.ylabel("Loss")
    plt.xlabel("Epochs")
    plt.legend();

save trained model

  • only need to save the model and its trained parameter values ie. state_dict()
  • from pathlib import Path
    
    #there are 3 core functions related to saving and loading a model:
      #torch.save() - uses Python's pickle format
      #torch.load()
      #torch.nn.Module.load_state_dict() - loads a models saved state dictionary - ie. all the parameters and their tensor values - this is the recommended approach (note the optimizer also has a state dict)
    
    # 1. Create models directory - in this case this is a directory within Google Colab
    MODEL_PATH = Path("models")
    MODEL_PATH.mkdir(parents=True, exist_ok=True)
    
    # 2. Create model save path - note the extension of .pth or .pt
    MODEL_NAME = "01_pytorch_workflow_model_0.pth"
    MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME
    
    # 3. Save the model state dict 
    print(f"Saving model to: {MODEL_SAVE_PATH}")
    torch.save(obj=model_0.state_dict(), # saving just the state_dict() saves only the model's learned parameters
               f=MODEL_SAVE_PATH) 
               
    # Check the saved file path
    # !ls -l models/01_pytorch_workflow_model_0.pth

load trained model

# Instantiate a new instance of our model (this will be instantiated with random weights)
loaded_model_0 = LinearRegressionModel()

# Load the state_dict of our saved model (this will update the new instance of our model with trained weights)
loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

# 1. Put the loaded model into evaluation mode
loaded_model_0.eval()

# 2. Use the inference mode context manager to make predictions
with torch.inference_mode():
    loaded_model_preds = loaded_model_0(X_test) # perform a forward pass on the test data with the loaded model
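# y_preds should be the trained model_0's predictions on X_test - a hedged sketch to
# (re)compute them here in case they were not kept from the training section
model_0.eval()
with torch.inference_mode():
    y_preds = model_0(X_test)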
    
# Compare previous model predictions with loaded model predictions (these should be the same)
y_preds == loaded_model_preds

Data classification or grouping with PyTorch

  • workflow is similar to the above with some differences

input data

  • for training we also need output classifications converted to numerical representations (y value or label)
  • eg. toy dataset options
    • sklearn.datasets.make_blobs #creates dataset with multiple clusters of data
  • don't forget to convert to tensors as sklearn uses numpy

choosing a classification model

  • binary or multi-class classification outputs will use different components
  • for a test binary model:
    • #when the data cannot be separated into distinct groups by straight lines drawn between them, the hidden layer activation for binary or multi-class models should be ReLU
      • ReLU is a non-linear activation function that just makes any negative value zero and allows learning of non-linear patterns
    • subclass nn.Module as usual
    • create class model with:
      • two linear hidden layers in the constructor:
        self.layer_1 = nn.Linear(in_features=X.shape[1], out_features=intermediary_output_size) #in_features = number of input features; intermediary_output_size could be 5 for example
        self.relu = nn.ReLU()
        self.layer_2 = nn.Linear(in_features=intermediary_output_size, out_features=1) # only a single number as the output (y.shape)
        #override the forward() as per above
        return self.layer_2(self.relu(self.layer_1(x))) # for processing steps: x => layer_1 => relu => layer_2
    • alternatively, the above could be achieved (except the ReLU) via:
      self.two_linear_layers = nn.Sequential(
        nn.Linear(in_features=X.shape[1], out_features=intermediary_output_size),
        nn.Linear(in_features=intermediary_output_size, out_features=1)
      )
        return self.two_linear_layers(x) # in the overridden forward()
  • use tensorflow playground to experiment with various layer architectures - how many layers and neurons / layer works best for your data type
  • choose a loss function:
    • for binary, usually use binary cross entropy with Logits ie. BCEWithLogitsLoss - this takes logits as the parameter! (as need to run Sigmoid BEFORE BCE and the Logits version has this built in)
    • for multi, usually use cross entropy (output activation to convert logit output for multi should be Softmax)
      • if your input data does not have equal numbers from each category (ie. not balanced), you will need to use the optional weight parameter to address this
      • cross entropy requires long integer tensors (torch.LongTensor) for the y target and will give a weird nll_loss error if you use the usual float32 when creating this y tensor (see the sketch below)
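A short sketch of the two usual loss set-ups (the class weights tensor and y_blob are illustrative assumptions, not variables defined above):

# binary classification: BCEWithLogitsLoss takes raw logits and float targets (sigmoid is built in)
loss_fn_binary = nn.BCEWithLogitsLoss()

# multi-class classification: CrossEntropyLoss takes raw logits and long-integer class targets (softmax is built in)
loss_fn_multi = nn.CrossEntropyLoss()
# for imbalanced classes pass a weight per class, eg.:
# loss_fn_multi = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 0.5]))

# targets for CrossEntropyLoss must be long integers, not float32, eg.:
# y_blob_train = torch.from_numpy(y_blob).type(torch.LongTensor)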
  • choose an optimizer function:
    • usually use SGD or Adam
  • process the raw logits in the forward pass section of your training loop:
    • apply the sigmoid function (if binary) or softmax (if multi-class) to get probabilities, then round (binary) or take argmax (multi-class) to get your y label output (see the sketch below)
    • apply squeeze to remove the extra dimension
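A sketch of those conversions inside the training loop (model and multi_model stand in for whichever binary / multi-class models are being trained; they are not defined above):

# binary: logits -> prediction probabilities (sigmoid) -> labels (round), squeezing the extra dimension
y_logits = model(X_train).squeeze()
y_pred_probs = torch.sigmoid(y_logits)
y_pred = torch.round(y_pred_probs)                               # 0 or 1

# multi-class: logits -> prediction probabilities (softmax) -> labels (argmax)
y_logits_multi = multi_model(X_train)
y_pred_multi = torch.softmax(y_logits_multi, dim=1).argmax(dim=1)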
  • improve model either by:
    • changing the model's hyperparameters (values programmer changes):
      • running more epochs
      • using a smaller learning rate
      • adding additional layers or neurons
    • change the optimization algorithm eg. Adam instead of SGD
    • change the model algorithm
      • circular data will not be able to be learned with a linear model UNLESS you include ReLU()
    • analyse with torch.utils.tensorboard
  • evaluate the model:
    • NB. Google Colab does not have torchmetrics installed, need to install via !pip install torchmetrics
    • accuracy % correctly classified (but not good for imbalanced classes)
      • torchmetrics.Accuracy(), or sklearn.metrics.accuracy_score(), or write your own function ( (true pos + true neg)/total ) - see the example at the end of this list
    • precision
      • a measure of how many positive predictions were actually positive (higher precision means fewer false positives)
      • torchmetrics.Precision(), or sklearn.metrics.precision_score(), or write your own function ( true pos/(true pos + false pos) )
    • recall
      • a measure of how many actual positives were found (higher recall means fewer false negatives)
      • torchmetrics.Recall(), or sklearn.metrics.recall_score(), or write your own function ( true pos/(true pos + false neg) )
      • NB there is a precision-recall trade off
    • F1-score
      • combines precision and recall
      • torchmetrics.F1Score(), or sklearn.metrics.f1_score(), or write your own function: 2 x (precision x recall)/(precision + recall)
    • confusion matrix
      • helps identify where model is failing but hard to use with large numbers of classes
      • torchmetrics.ConfusionMatrix()
    • classification report
      • combines the above into a report
      • sklearn.metrics.classification_report()
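A sketch using the scikit-learn versions of the metrics above (test_preds is a hypothetical name for the model's predicted labels on the test set; the torchmetrics classes work similarly but operate on tensors and can stay on the GPU):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report)

y_true = y_test.cpu().numpy()           # move tensors to the CPU before handing them to sklearn
y_pred = test_preds.cpu().numpy()       # test_preds = the model's predicted labels (hypothetical name)

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))  # for multi-class add eg. average="macro"
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))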

Computer vision with PyTorch and Convolutional Neural Networks (CNN)

Training your own medical small language model
