
AI deep learning

Introduction

  • deep learning is typically used for unstructured data (images, voice, natural language, etc.) and uses neural networks, although encoding the data as tensors does impose some structure; examples include:
    • feed forward neural networks
      • assumes each input value is independent - not good for sequences such as letters in words or words in sentences - recurrent would be better
      • needs a non-linear activation (eg. relu, sigmoid, tanh) otherwise it just becomes a linear regression model but with the black box of a neural network
    • fully connected neural network
    • convolutional neural network
    • recurrent neural network
      • great for relatively short sequences as it retains a “memory” of older inputs in the sequence
      • but:
        • may become a deep neural network
        • during backpropagation through time the same term may appear repeatedly
        • these may compound creating exploding gradients causing instability and inability to learn
        • these may create vanishing gradients which shrink towards zero, which stops the model from updating and hence from learning
      • solutions include:
        • Gated Recurrent Unit (GRU)
        • Long Short-Term Memory (LSTM)
    • transformer models
      • invented in 2017 and published in a paper titled “Attention is all you need”, initially for language translation, natural language processing
      • has the advantage over RNNs that they can be scaled and parallelized for large speed improvements, and they are much better at keeping track of longer sentences or texts
      • they achieve this by positional encoding (inputs have the numerical order of each word) and self-attention (the internal learning of patterns such as synonyms, grammatical rules of various languages, understanding words by their context, etc)
  • one of the main Python language tools to work in this domain is PyTorch

Neural networks

  • with neural networks, you tell your network the inputs and what you want for the outputs, and let it learn on its own.
  • main flow types
    • a feedforward network contains inputs, outputs, and hidden layers.
      • The signals can only travel in one direction (forward). Input data passes into a layer where calculations are performed. Each processing element computes based upon the weighted sum of its inputs. The new values become the new input values that feed the next layer (feed-forward). This continues through all the layers and determines the output.
    • a feedback network has feedback paths.
      • This means that they can have signals traveling in both directions using loops. Since loops are present in this type of network, it becomes a non-linear dynamic system which changes continuously until it reaches a state of equilibrium. Feedback networks are often used in optimization problems where the network looks for the best arrangement of interconnected factors.
  • typically have:
    • an input layer of neurons
    • hidden layer(s) of neurons
    • an output layer of neuron(s)
  • types of learning:
    • supervised learning
      • when you have lots of inputs and known outputs or labels (eg. training on images and each image has been labelled with the content - which is the desired output)
    • unsupervised and self-supervised learning
      • when you have inputs but no output labels, the network will come up with a range of patterns which you will need to correlate with labels later
    • transfer learning
      • when you transfer a learned model into a new model
    • reinforcement learning
  • use-case types:
    • sequence to sequence (seq2seq)
      • input and output are both sequences eg. translation of a sentence, asking Siri a question (speech recognition)
    • classification / regression
      • identification such as computer vision, spam detection in natural language processing

Basic steps

  • obtain inputs
  • convert to tensors by numerical encoding
    • tensors
      • are multi-dimensional, matrix-like numerical representations of data
  • build or choose a pretrained model
    • choose a loss function and optimizer
    • build a training loop
  • fit the model to the data
    • pass through neural networks to learn representation or patterns, features and weights
    • creates numerical representation tensor outputs
    • create human understandable outputs
  • save and load the model
  • make predictions with the model
  • evaluate the model predictions
  • improve through experimentation
  • use the model on new data to make predictions

Tensors

  • torch.tensor
  • types of tensors
    • scalar
      • scalar = torch.tensor(7) (see the combined example after this list of tensor types)
      • attributes: no dimensions, just a single number
      • convert to python int by using the item() function
    • vector
      • created by passing a [] eg vector = torch.tensor([7,10])
      • has magnitude and direction
      • has one dimension, shape is [2]
    • MATRIX
      • MATRIX = torch.tensor([ [7,10],[11,15] ])
      • have 2 dimensions, and shape is [2,2]
    • TENSOR
      • TENSOR = torch.tensor([ [ [1,2,3],[2,3,4],[3,4,5] ] ])
      • TENSOR.ndim gives 3 dimensions but can have any number of dimensions
      • TENSOR.shape gives [1,3,3]
    • Random tensors
      • many models start with tensors full of random numbers and then adjust those to better represent the data in an iterative manner
      • eg. random_tensor = torch.rand(3,4)
      • random_image_tensor = torch.rand(noChannels,height,width)
      • reproducibility
        • taking the randomness out by using a random seed, add the following before calling creation of a random tensor
        • RANDOM_SEED = an integer value
        • torch.manual_seed(RANDOM_SEED) # must be called EACH time before you create a random tensor and the output will be identical to the first one you created
        • if using GPU then torch.cuda.manual_seed(RANDOM_SEED)
    • Zero tensors
      • zero = torch.zeros(size=(x,y) )
    • One tensors
      • ones = torch.ones(size=(x,y) )
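A minimal sketch (assuming a standard PyTorch install) pulling the tensor types and the random-seed idea above together:

import torch

# scalar: zero dimensions, a single value
scalar = torch.tensor(7)
print(scalar.ndim, scalar.item())        # 0, 7 (item() gives a Python int)

# vector: one dimension
vector = torch.tensor([7, 10])
print(vector.ndim, vector.shape)         # 1, torch.Size([2])

# matrix: two dimensions
MATRIX = torch.tensor([[7, 10], [11, 15]])
print(MATRIX.ndim, MATRIX.shape)         # 2, torch.Size([2, 2])

# tensor: three (or more) dimensions
TENSOR = torch.tensor([[[1, 2, 3], [2, 3, 4], [3, 4, 5]]])
print(TENSOR.ndim, TENSOR.shape)         # 3, torch.Size([1, 3, 3])

# seeded random tensors - reseed before each call to reproduce the same values
torch.manual_seed(42)
rand_a = torch.rand(3, 4)
torch.manual_seed(42)
rand_b = torch.rand(3, 4)
print(torch.equal(rand_a, rand_b))       # True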
  • range of tensors
    • torch.range(1,11) #deprecated - use torch.arange instead
    • torch.arange(1,11) #start, end, step
  • copying shape of a tensor to a new tensor
    • new_zeros = torch.zeros_like(input=anotherTensor) #creates a zero tensor of same shape as anotherTensor
  • datatypes
    • default is float32
    • can specify at creation by using parameter dtype=
    • can specify at creation the device by using parameter device=None / "cpu" / "cuda" #default is CPU
    • requires_grad - whether to track gradients
    • can convert a float32 tensor to float16 tensor by:
      • float16_tensor = float32_tensor.type(torch.float16)
  • logits
    • the log-odds (logit) function: logit(x) = ln(x/(1-x)) (this is what torch.logit computes when its eps argument is None)
    • it maps probability values between 0 and 1 to values from negative infinity to infinity, and is used to model the log-odds of success of an event as a function of independent variables
    • in contrast, linear regression is used to handle regression problems whereas logistic regression is used to handle classification problems
      • linear regression provides a continuous output but logistic regression provides a discrete output
    • it is the inverse of the sigmoid function, which limits values to between 0 and 1 on the Y-axis rather than the X-axis
    • tensors containing logit elements are used in tensor calculations for computer vision etc
    • raw model outputs are often a tensor of logits which then need to be converted to prediction probabilities (via sigmoid or softmax) and then to labels (see the short example below)
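A short sketch of the logit/sigmoid relationship and the usual logits → probabilities → labels conversion described above:

import torch

probs = torch.tensor([0.1, 0.5, 0.9])        # probabilities strictly between 0 and 1
logits = torch.logit(probs)                  # ln(x/(1-x)) -> tensor([-2.1972, 0.0000, 2.1972])
print(torch.sigmoid(logits))                 # sigmoid inverts it -> back to ~[0.1, 0.5, 0.9]

# raw model outputs (logits) -> prediction probabilities -> label
raw_logits = torch.tensor([[1.2, -0.4, 0.3]])     # eg. one sample, three classes
pred_probs = torch.softmax(raw_logits, dim=1)     # probabilities that sum to 1
pred_label = pred_probs.argmax(dim=1)             # index of the most likely class
print(pred_probs, pred_label)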
  • attributes
    • .dtype
    • .shape (same output as the function .size() )
    • .device
  • operations
    • element-wise multiplication - can use torch.mul(tensor, 10) instead of tensor * 10 #when element-wise multiplying two tensors, each element is multiplied with the corresponding element in the other tensor
    • matrix multiplication (dot product) - can use torch.matmul(tensor, tensor2) instead of tensor @ tensor2 #see https://www.mathsisfun.com/algebra/matrix-multiplying.html
    • tensor aggregation
      • min, max, mean, sum, etc
        • torch.min(tensor) or tensor.min()
        • NB. mean requires floating-point or complex tensors, not integers
      • positional min, max - to get the index of which value has min or max
        • tensor.argmin() and tensor.argmax()
    • reshaping, stacking, squeezing and unsqueezing tensors
      • tensor.reshape() - new tensor with different shape
      • tensor.view() - same tensor and shared memory but different shape view
      • torch.stack([tensor1,tensor2],dim=0) - combine tensors; also can use vstack and hstack
      • torch.squeeze(tensor) removes all 1 dimensions from a tensor
      • tensor.unsqueeze(dim=z) add a 1 dimension at dimension z
      • tensor.permute(tuple of newdim indexes) - swap dimensions as a view (hence sharing same memory)
  • getting tensor values at an index
    • use tensor[dim1][dim2][dim3] etc, which is equivalent to tensor[dim1,dim2,dim3]; the output depends upon the shape of the tensor, and an index value must be less than the shape value at that dimension; a colon can be used as a wildcard for a whole dimension, which will return a matrix tensor (see the combined example below)
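A quick sketch exercising the operations and indexing calls above (assuming a standard PyTorch install):

import torch

tensor = torch.arange(1, 10).reshape(3, 3)    # shape [3, 3]

# element-wise vs matrix multiplication
print(torch.mul(tensor, 10))                  # same as tensor * 10
print(torch.matmul(tensor, tensor))           # same as tensor @ tensor

# aggregation (mean needs a floating-point tensor)
print(tensor.min(), tensor.max(), tensor.sum())
print(tensor.type(torch.float32).mean())
print(tensor.argmin(), tensor.argmax())       # positional min / max

# reshaping, stacking, squeezing, unsqueezing, permuting
x = torch.arange(1, 10)
print(x.reshape(1, 9).shape)                  # torch.Size([1, 9])
print(torch.stack([x, x], dim=0).shape)       # torch.Size([2, 9])
print(x.reshape(1, 9).squeeze().shape)        # torch.Size([9]) - the 1 dimension removed
print(x.unsqueeze(dim=0).shape)               # torch.Size([1, 9])
img = torch.rand(224, 224, 3)                 # height, width, colour channels
print(img.permute(2, 0, 1).shape)             # torch.Size([3, 224, 224]) - channels first

# indexing - one index per dimension, ':' selects everything in that dimension
print(tensor[0][2], tensor[0, 2])             # the same element
print(tensor[:, 1])                           # the middle column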
  • convert a NumPy array into a new tensor in memory
    • torch.from_numpy(ndarray)
    • you can reverse this process and convert to numpy array by using tensor.numpy()
    • NB. NumPy's default data type is float64, not PyTorch's default float32, so you may need to convert using .type(torch.float32) (see the example below)
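A minimal sketch of the NumPy/tensor round trip and the dtype conversion mentioned above:

import torch
import numpy as np

array = np.arange(1.0, 8.0)                  # NumPy default dtype is float64
tensor = torch.from_numpy(array)             # float64 tensor sharing the array's memory
tensor_f32 = tensor.type(torch.float32)      # convert to PyTorch's default float32
back_to_numpy = tensor_f32.numpy()           # tensor -> NumPy array
print(array.dtype, tensor.dtype, tensor_f32.dtype, back_to_numpy.dtype)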
  • make some functions
    • let's make the torch.relu() function as our own function:
def myRelu(x: torch.Tensor) -> torch.Tensor: #takes a tensor as input and outputs a tensor
    return torch.maximum(torch.tensor(0), x) #this converts any negative value to zero
 
  • let's make the torch.sigmoid function:
def mySigmoid(x: torch.Tensor) -> torch.Tensor:
    return 1 / (1 + torch.exp(-x))
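A quick sanity check of the two functions above against the PyTorch built-ins (a sketch, assuming they have been defined as written):

x = torch.linspace(-3, 3, 7)
print(torch.allclose(myRelu(x), torch.relu(x)))          # True
print(torch.allclose(mySigmoid(x), torch.sigmoid(x)))    # True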

NumPy

  • multidimensional array library
  • much faster than Python lists as it:
    • uses fewer bytes of memory as it uses fixed-size types, whereas a list element must also store object type, size and reference count in addition to the value
    • no need to do type checking when iterating through objects
    • uses contiguous memory which is faster, allows SIMD vector processing and gives more effective cache utilisation
  • more functions than Python lists:
    • can multiply arrays of the same shape element-wise - this will fail with Python lists
  • applications:
    • matlab replacement
    • plotting (matplotlib)
    • backend for Pandas, etc
    • can store images
    • machine learning
  • creating arrays
    • a = np.array([ [1,2,3],[7,8,9] ]) (the example after this list pulls the creation and indexing calls together)
    • if you do b = a then b shares same memory of a
    • if you want a separate array use b = a.copy()
    • a.ndim , a.shape, a.dtype, a.itemsize - gives number bytes per element, a.nbytes gives total byte size
    • a[r,c] gets the item at row r, column c (negative indexes count from the end), : is a wildcard for all
    • get a range by a[r, start:end:step]
    • zeros = np.zeros( (dim1,dim2) )
    • ones = np.ones( (dim1,dim2) )
    • filled = np.full( (dim1,dim2),valueToFill)
    • filled = np.full_like(anotherArray,valueToFill)
    • randarray = np.random.rand(dim1,dim2)
    • randarray = np.random.randint(startint, endint, size = (dim1,dim2) )
    • randarray = np.random.random_sample(anotherarray.shape)
    • identity = np.identity(dim1)
    • repeated = np.repeat(array, repeats, axis=axisvalue)
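A short sketch pulling the array creation and indexing calls above together:

import numpy as np

a = np.array([[1, 2, 3], [7, 8, 9]])
print(a.ndim, a.shape, a.dtype, a.itemsize, a.nbytes)

b = a               # b shares a's memory
c = a.copy()        # c is an independent copy

print(a[1, 2])      # row 1, column 2 -> 9
print(a[0, 0:3:2])  # row 0, columns 0 to 2 in steps of 2 -> [1 3]

zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
filled = np.full((2, 3), 99)
rand_ints = np.random.randint(0, 10, size=(2, 3))
eye = np.identity(3)
repeated = np.repeat(np.array([[1, 2, 3]]), 3, axis=0)   # repeat the row 3 times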
  • operations
    • generally as for Python maths
    • trig functions
  • linear algebra
    • matrix multiply np.matmul(array1,array2) #as usual the number of columns in array1 must equal the number of rows in array2
    • determinant: np.linalg.det(array1)
    • inverse
    • Eigenvalues
    • singular value decomposition
    • trace
    • matrix norm
  • statistics of the array
    • optionally pass axis=axisnumber
    • min, max, sum,
  • reorganizing arrays
    • newarray = oldarray.reshape( (dim1,dim2) ) as long as it contains the same number of values
    • vstack = np.vstack([arr1,arr2])
    • hstack = np.hstack([arr1,arr2])
  • import data from csv text file
    • newarray = np.genfromtxt('textfilename',delimiter=',') #default will be float type
    • newarray = newarray.astype('int32') # to convert to int
  • boolean masking and advanced indexing
    • newarray > 12 will give an array of booleans showing which values are greater than 12
    • can index with a list so, to get the actual values greater than 12 can use:
      • newarray[newarray > 12]
    • np.any(newarray > 12, axis=0) gives a boolean array indicating which columns have at least one element satisfying the condition
    • np.all(newarray > 12, axis=0) gives a boolean array indicating which columns have all elements satisfying the condition
    • ( (newarray >12) & (newarray <100) ) combines conditions, giving a range test
    • (~ ( (newarray >12) & (newarray <100) ) ) - the ~ acts like NOT so the boolean values are inverted (see the example below)
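A small sketch of the boolean masking calls above:

import numpy as np

newarray = np.array([[5, 20, 150], [3, 60, 8]])

print(newarray > 12)                                     # boolean array of the same shape
print(newarray[newarray > 12])                           # the matching values -> [ 20 150  60]
print(np.any(newarray > 12, axis=0))                     # columns with at least one match
print(np.all(newarray > 12, axis=0))                     # columns where every element matches
print(newarray[(newarray > 12) & (newarray < 100)])      # values within a range
print(newarray[~((newarray > 12) & (newarray < 100))])   # values outside the range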
  • NumPy ONLY works on the CPU and NOT on a GPU!
    • move tensor to CPU BEFORE moving to a NumPy array via: tensor.cpu().numpy()

PyTorch

  • an evolution of Torch which allows fast writing of deep learning code in Python language to run on a GPU via CUDA or Tensor Processing Units (TPUs)
  • created by Facebook and is now open source and one of the main programs for developing deep learning apps
  • many pre-built deep learning models for transfer learning
  • allows:
    • pre-processing of data using tensors, etc
    • model data
    • deployment of model
  • optionally run PyTorch on Google Colab
    • easy and does not need a local GPU nor a local installation and set up of CUDA and PyTorch
  • install CUDA if you have a compatible nVidia GPU
    • first install MS Visual Studio Code and Visual Studio Community Edition with its Python components (~8GB) - this is so the CUDA install detects it and creates interfaces to Visual Studio
    • then download nVidia CUDA (3Gb) and install
    • in Anaconda create a new Environment eg. CUDATest in which to install this
    • install NumPy (this is required by PyTorch)
    • then install PyTorch for CUDA with the CUDA toolkit version it needs (~2.6GB) eg.
      • via Anaconda Prompt:
        • conda activate newEnvironmentname
        • see https://pytorch.org/get-started/locally/ for the conda command line to install
        • also optionally need torchmetrics for evaluation:
        • conda install -c conda-forge torchmetrics
  • check GPU accessible
    • in Anaconda select the environment in which you installed PyTorch (otherwise import torch may fail if you use the base environment)
    • then open Jupyter notebook
      import torch
      torch.cuda.is_available()
      #check your GPU hardware
      !nvidia-smi
  • code so that all tensors are on the same device
    • default device is CPU
    • if you want them on GPU:
      • doesn't work: torch.set_default_tensor_type.device = torch.device('cuda') #compiles but doesn't make tensor devices GPU!
      • doesn't work: torch.set_default_tensor_device = “cuda” #compiles but doesn't make tensor devices GPU!
      • so declare a variable device then use this when creating tensors:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tensor = torch.tensor([3,4], device=device)
        tensor = tensor.to(device) # and it will move it to GPU if available if you forgot to do the above step
      • alternatively you can do torch.tensor([x,y]).cuda() to try to move it to the GPU

PyTorch workflow

data import and cleansing

initial set up code

import torch
from torch import nn #nn contains all of pyTorch modules for neural networks
import matplotlib.pyplot as plt # allows visualisation

torch.__version__ #check version

get data into tensor

option 1. create linear data to test
weight = 0.7 # gradient of y = mx + c
bias = 0.3 # Y intercept at x= 0 ie. c
start = 0
end=1
step=0.02
X = torch.arange(start,end,step).unsqueeze(dim=1) #features
y = weight * X + bias #output labels
option 2. Import data
  • torchvision.transforms
  • torch.utils.data.Dataset #create a dataset variable
  • torch.utils.data.DataLoader (see the sketch below)
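A minimal sketch of option 2's building blocks, reusing the X and y tensors created in option 1 (TensorDataset is just one simple Dataset; real projects usually subclass torch.utils.data.Dataset or use a torchvision dataset with transforms):

from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(X, y)                              # pairs each X row with its y label
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for X_batch, y_batch in dataloader:                        # each iteration yields one batch
    print(X_batch.shape, y_batch.shape)
    break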
split data into training data and test data
train_split = int(0.8 * len(X) )
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]
# could make a more random split by using scikit learn train test split (see in machine learning page)
visualise data
def plot_predictions(train_data=X_train, train_labels=y_train, test_data=X_test, test_labels=y_test, predictions=None):
    plt.figure(figsize=(10,7))
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data") #c is color, b = blue, s is size
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data") #g = green
    if predictions is not None:
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions") #r = red
    plt.legend(prop={"size": 14})

plot_predictions()

move data to correct device

  • move to CUDA if available, else CPU, using the device variable defined earlier on this page
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

build model

  • options:
    • create a subclass of torch.nn.Module (the base class for all neural network models in PyTorch)
    • or, torchvision.models (pre-trained models)
# create linear regression  model
from torch import nn

class LinearRegressionModel(nn.Module): #almost everything in PyTorch inherits from nn.Module
  def __init__(self):
    super().__init__()
    self.weights = nn.Parameter(torch.randn(1,
                                            requires_grad=True,
                                            dtype=torch.float))
    self.bias = nn.Parameter(torch.randn(1,
                                         requires_grad=True,
                                         dtype=torch.float))

  #forward method to define computation (must be at class level, not inside __init__)
  def forward(self, x: torch.Tensor) -> torch.Tensor: # x is input data eg. training data
      return self.weights * x + self.bias # apply the linear regression formula using the current weights and bias
  #by use of torch.optim it will aim to get as close as possible to best fit values for these using two algorithms behind the scenes:
    #gradient descent - hence requires_grad=True and then uses torch.autograd
    #backpropagation

create the model

model_0 = LinearRegressionModel()
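A short sketch to inspect the freshly created model and sanity-check an untrained forward pass (the .to(device) call is an assumption so the model sits on the same device as the data):

model_0.to(device)              # keep the model on the same device as the data
print(model_0.state_dict())     # the random starting values for 'weights' and 'bias'

# predictions before training - expect these to be poor
with torch.inference_mode():
    untrained_preds = model_0(X_test)
print(untrained_preds[:5])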

create loss and optimizer functions

# Create the loss function
loss_fn = nn.L1Loss() # MAE loss is same as L1Loss

# Create the optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(), # parameters of target model to optimize
                            lr=0.001) # learning rate - how much the optimizer should change parameters at each step (higher = bigger changes; lower gives more accuracy but takes more epochs to get there)

fitting model to data - step-wise training and evaluating

  • torch.optim
torch.manual_seed(42)
torch.cuda.manual_seed(42) 

# Set the number of epochs (how many times the model will pass over the training data)
epochs = 2000

# Create empty loss lists to track values
train_loss_values = []
test_loss_values = []
epoch_count = []

for epoch in range(epochs):
    ### Training

    # Put model in training mode (this is the default state of a model)
    model_0.train()

    # 1. Forward pass on train data using the forward() method inside 
    y_pred = model_0(X_train)
    # print(y_pred)

    # 2. Calculate the loss (how different are our models predictions to the ground truth)
    loss = loss_fn(y_pred, y_train)
 
    # 3. Zero grad of the optimizer
    optimizer.zero_grad()

    # 4. Loss backwards
    loss.backward()

    # 5. Progress the optimizer
    optimizer.step()
    
    ### Testing

    # Put the model in evaluation mode - turns off various settings not needed in evaluation mode
    model_0.eval()

    with torch.inference_mode(): # turn off gradient tracking as that is only needed in training mode; similar to, but faster than, torch.no_grad()
      # 1. Forward pass on test data
      test_pred = model_0(X_test)

      # 2. Calculate loss on test data
      test_loss = loss_fn(test_pred, y_test.type(torch.float)) # predictions come in torch.float datatype, so comparisons need to be done with tensors of the same type

      # Print out what's happening but need to convert the tensor values to numpy array values and need to get to CPU if using GPU
      if epoch % 10 == 0:
            epoch_count.append(epoch)
            train_loss_values.append(loss.detach().cpu().numpy())
            test_loss_values.append(test_loss.detach().cpu().numpy())
            print(f"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss} ")
            print(model_0.state_dict())
    

improve model

  • helps to visualise the training
  • # Plot the loss curves
    plt.plot(epoch_count, train_loss_values, label="Train loss")
    plt.plot(epoch_count, test_loss_values, label="Test loss")
    plt.title("Training and test loss curves")
    plt.ylabel("Loss")
    plt.xlabel("Epochs")
    plt.legend();

save trained model

  • only need to save the model and its trained parameter values ie. state_dict()
  • from pathlib import Path
    
    #there are 3 core functions related to saving and loading a model:
      #torch.save() - uses Python's pickle format
      #torch.load()
      #torch.nn.Module.load_state_dict() - loads a models saved state dictionary - ie. all the parameters and their tensor values - this is the recommended approach (note the optimizer also has a state dict)
    
    # 1. Create models directory - in this case this is a directory within Google Colab
    MODEL_PATH = Path("models")
    MODEL_PATH.mkdir(parents=True, exist_ok=True)
    
    # 2. Create model save path - note the extension of .pth or .pt
    MODEL_NAME = "01_pytorch_workflow_model_0.pth"
    MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME
    
    # 3. Save the model state dict 
    print(f"Saving model to: {MODEL_SAVE_PATH}")
    torch.save(obj=model_0.state_dict(), # saving just the state_dict() saves only the model's learned parameters
               f=MODEL_SAVE_PATH) 
               
    # Check the saved file path
    # !ls -l models/01_pytorch_workflow_model_0.pth

load trained model

# Instantiate a new instance of our model (this will be instantiated with random weights)
loaded_model_0 = LinearRegressionModel()

# Load the state_dict of our saved model (this will update the new instance of our model with trained weights)
loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

# 1. Put the loaded model into evaluation mode
loaded_model_0.eval()

# 2. Use the inference mode context manager to make predictions
with torch.inference_mode():
    loaded_model_preds = loaded_model_0(X_test) # perform a forward pass on the test data with the loaded model
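# y_preds should be the trained model_0's predictions on X_test - a hedged sketch to
# (re)compute them here in case they were not kept from the training section
model_0.eval()
with torch.inference_mode():
    y_preds = model_0(X_test)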
    
# Compare previous model predictions with loaded model predictions (these should be the same)
y_preds == loaded_model_preds

Data classification or grouping with PyTorch

  • workflow is similar to the above with some differences

input data

  • for training we also need output classifications converted to numerical representations (y value or label)
  • eg. toy dataset options
    • sklearn.datasets.make_blobs #creates dataset with multiple clusters of data
  • don't forget to convert to tensors as sklearn uses numpy

choosing a classification model

  • binary or multi-class classification outputs will use different components
  • for a test binary model:
    • #when the data cannot be separated into distinct groups by straight lines drawn between them, the hidden layer activation for binary or multi-class models should be ReLU
      • ReLU is a non-linear activation function that just makes any negative value zero and allows learning of non-linear patterns
    • subclass nn.Module as usual
    • create class model with:
      • two linear hidden layers in the constructor:
        self.layer_1 = nn.Linear(in_features=X.shape[1], out_features=intermediary_output_size) #in_features = number of input features; intermediary_output_size could be 5 for example
        self.relu = nn.ReLU()
        self.layer_2 = nn.Linear(in_features=intermediary_output_size, out_features=1) # only a single number as the output (y.shape)
        #override the forward() as per above
        return self.layer_2(self.relu(self.layer_1(x))) # for processing steps: x => layer_1 => relu => layer_2
    • alternatively, the above could be achieved (except the ReLU) via:
      self.two_linear_layers = nn.Sequential(
        nn.Linear(in_features=X.shape[1], out_features=intermediary_output_size),
        nn.Linear(in_features=intermediary_output_size, out_features=1)
      )
        return self.two_linear_layers(x) # in the overridden forward()
  • use tensorflow playground to experiment with various layer architectures - how many layers and neurons / layer works best for your data type
  • choose a loss function:
    • for binary, usually use binary cross entropy with Logits ie. BCEWithLogitsLoss - this takes logits as the parameter! (as need to run Sigmoid BEFORE BCE and the Logits version has this built in)
    • for multi, usually use cross entropy (output activation to convert logit output for multi should be Softmax)
      • if your input data does not have equal numbers from each category (ie. not balanced), you will need to use the optional weight parameter to address this
      • cross entropy requires long integer tensors (torch.LongTensor) for the y target and will give a weird nll_loss error if you use the usual float32 when creating this y tensor (see the sketch below)
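A short sketch of the two usual loss set-ups (the class weights tensor and y_blob are illustrative assumptions, not variables defined above):

# binary classification: BCEWithLogitsLoss takes raw logits and float targets (sigmoid is built in)
loss_fn_binary = nn.BCEWithLogitsLoss()

# multi-class classification: CrossEntropyLoss takes raw logits and long-integer class targets (softmax is built in)
loss_fn_multi = nn.CrossEntropyLoss()
# for imbalanced classes pass a weight per class, eg.:
# loss_fn_multi = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 0.5]))

# targets for CrossEntropyLoss must be long integers, not float32, eg.:
# y_blob_train = torch.from_numpy(y_blob).type(torch.LongTensor)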
  • choose an optimizer function:
    • usually use SGD or Adam
  • process the raw logits in the forward pass section of your training loop:
    • apply the sigmoid function (if binary) or softmax (if multi-class) to get probabilities, then round (binary) or take argmax (multi-class) to get your y label output (see the sketch below)
    • apply squeeze to remove the extra dimension
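A sketch of those conversions inside the training loop (model and multi_model stand in for whichever binary / multi-class models are being trained; they are not defined above):

# binary: logits -> prediction probabilities (sigmoid) -> labels (round), squeezing the extra dimension
y_logits = model(X_train).squeeze()
y_pred_probs = torch.sigmoid(y_logits)
y_pred = torch.round(y_pred_probs)                               # 0 or 1

# multi-class: logits -> prediction probabilities (softmax) -> labels (argmax)
y_logits_multi = multi_model(X_train)
y_pred_multi = torch.softmax(y_logits_multi, dim=1).argmax(dim=1)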
  • improve model either by:
    • changing the model's hyperparameters (values programmer changes):
      • running more epochs
      • using a smaller learning rate
      • adding additional layers or neurons
    • change the optimization algorithm eg. Adam instead of SGD
    • change the model algorithm
      • circular data will not be able to be learned with a linear model UNLESS you include ReLU()
    • analyse with torch.utils.tensorboard
  • evaluate the model:
    • NB. Google Colab does not have torchmetrics installed, need to install via !pip install torchmetrics
    • accuracy % correctly classified (but not good for imbalanced classes)
      • torchmetrics.Accuracy(), or sklearn.metrics.accuracy_score(), or write your own function ( (true pos + true neg)/total ) - see the example at the end of this list
    • precision
      • a measure of how many positive predictions were actually positive (higher precision means fewer false positives)
      • torchmetrics.Precision(), or sklearn.metrics.precision_score(), or write your own function ( true pos/(true pos + false pos) )
    • recall
      • a measure of how many actual positives were found (higher recall means fewer false negatives)
      • torchmetrics.Recall(), or sklearn.metrics.recall_score(), or write your own function ( true pos/(true pos + false neg) )
      • NB there is a precision-recall trade off
    • F1-score
      • combines precision and recall
      • torchmetrics.F1Score(), or sklearn.metrics.f1_score(), or write your own function: 2 x (precision x recall)/(precision + recall)
    • confusion matrix
      • helps identify where model is failing but hard to use with large numbers of classes
      • torchmetrics.ConfusionMatrix()
    • classification report
      • combines the above into a report
      • sklearn.metrics.classification_report()
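A sketch using the scikit-learn versions of the metrics above (test_preds is a hypothetical name for the model's predicted labels on the test set; the torchmetrics classes work similarly but operate on tensors and can stay on the GPU):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report)

y_true = y_test.cpu().numpy()           # move tensors to the CPU before handing them to sklearn
y_pred = test_preds.cpu().numpy()       # test_preds = the model's predicted labels (hypothetical name)

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))  # for multi-class add eg. average="macro"
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))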

Computer vision with PyTorch and Convolutional Neural Networks (CNN)

Training your own medical small language model
