AI machine learning
see also:
- in contrast to the traditional neural networks outlined below, there are other machine learning frameworks such as:
- Kolmogorov-Arnold Networks (KANs):
- information flows along connections between pieces, like putting puzzle pieces together. Each connection has its own function, like a strangely shaped puzzle piece that changes the information it connects.
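The idea that each connection carries its own function can be sketched in a few lines. This is only an illustration, assuming fixed hand-picked edge functions; in an actual KAN the per-connection functions are learned (typically as splines), and the names edge_functions and kan_layer are hypothetical.

```python
import math

# Hypothetical edge functions: edge_functions[j][i] sits on the connection
# from input i to output j. In a real KAN these would be learned; here they are fixed.
edge_functions = [
    [math.sin, lambda x: x ** 2],    # connections into output 0
    [lambda x: 2.0 * x, math.tanh],  # connections into output 1
]

def kan_layer(inputs):
    """Each output node just sums the transformed values arriving on its connections."""
    return [
        sum(phi(x) for phi, x in zip(row, inputs))
        for row in edge_functions
    ]

outputs = kan_layer([0.0, 3.0])
print(outputs)
```

By contrast, a traditional neural network layer multiplies each input by a single weight on the connection and applies one shared activation function at the node.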
Introduction
- machine learning turns data into numbers and then finds patterns in those numbers; it is useful when the rules for creating a desired output from the inputs are too complex or are not well known
- you should use a traditional programming solution rather than machine learning if you can build a simple rule-based system
- machine learning can adapt to changing environments or scenarios
- machine learning can help to discover insights in large volumes of data; however, the patterns learned can be uninterpretable by humans, and the outputs aren't always predictable and may be erroneous because they are based on probability and on patterns found in relatively large datasets
- machine learning with “shallow algorithms” such as decision trees works best on structured data such as rows and columns of data; examples include:
- gradient boosted machine such as XGBoost
- random forest
- naive Bayes
- nearest neighbour
- support vector machine
- AI deep learning uses neural networks and is typically used for unstructured data, although encoding the data as tensors (n-dimensional numeric arrays) gives it more structure; examples include:
- neural networks
- fully connected neural network
- convolutional neural network
- recurrent neural network
- transformer
basic steps
- get raw data
- clean data to remove duplicates or irrelevancies and to convert text into numerical values
- split the data into training vs testing data
- create a model such as neural network or decision trees
- train the model with the training data
- test the model to make predictions
- evaluate the accuracy and improve - fine tune parameters
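The steps above can be sketched end to end in plain Python. This is a minimal illustration, not a recommended pipeline: the height/weight rows are made-up data, and the model is a hand-written 1-nearest-neighbour classifier chosen only because it needs no library.

```python
import random

# 1. Raw data: hypothetical (height_cm, weight_kg, label) rows, including one duplicate.
raw = [
    (150, 50, "small"), (152, 53, "small"), (155, 55, "small"), (158, 57, "small"),
    (160, 60, "small"), (180, 85, "large"), (183, 88, "large"), (185, 90, "large"),
    (188, 93, "large"), (190, 95, "large"), (150, 50, "small"),
]

# 2. Clean: drop exact duplicates (keeping order) and convert text labels to numbers.
unique_rows = list(dict.fromkeys(raw))
label_to_number = {"small": 0, "large": 1}
data = [((h, w), label_to_number[label]) for h, w, label in unique_rows]

# 3. Split: shuffle with a fixed seed, then hold back 20% for testing.
random.Random(0).shuffle(data)
n_test = max(1, int(len(data) * 0.2))
test, train = data[:n_test], data[n_test:]

# 4 and 5. Model and training: 1-nearest-neighbour simply keeps the training rows.
def predict(features):
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(train, key=squared_distance)[1]

# 6. Test: predict a label for each held-back row.
predictions = [predict(features) for features, _ in test]

# 7. Evaluate: fraction of test rows predicted correctly.
accuracy = sum(p == actual for p, (_, actual) in zip(predictions, test)) / len(test)
print(f"accuracy on {len(test)} test rows: {accuracy:.2f}")
```

The two groups in this made-up data are far apart, so the score is not representative of real datasets; the point is only the order of the steps.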
Python libraries for AI
- NumPy
- special Python n-dimensional array type for faster processing, with additional properties and methods
- all items in an array must be the same data type
- np1 = np.array([… ])
- np1.shape gives the size of each dimension as a tuple, e.g. (10,) for a 1D array of 10 items; it is an attribute, not a method, and len(np1) gives the size of the first dimension
- np2 = np.arange(10) will create an array [0,1,2,3,4,5,6,7,8,9]; can also use np.arange(startvalue, endvalue, step), where endvalue is excluded
- np3 = np.zeros(10) will create an array of 10 zeros, stored as floats by default [0.,0.,0.,0.,0.,0.,0.,0.,0.,0.]
- np4 = np.zeros((2,10)) will create a 2D array of zeros with 2 rows and 10 columns
- np5 = np.full( (10),4) will create an array filled with value 4 [4,4,4,4,4,4,4,4,4,4]
- np7 = np.array(python_list) will create a numpy array from a Python list
- Pandas - data frame like Excel
- MatPlotLib 2D charting
- SciKit-Learn - machine learning model types
- Jupyter - lets your code be segmented into cells, with each cell run by itself or all together, and provides better inspection of data
- once installed via Anaconda, to run, open up a terminal window, then type: jupyter notebook
- ipynb files
- if green left bar - in edit mode (hit ESC to go to command mode)
- if blue left bar - in command mode
- in command mode, press A to insert a new cell above and press B to insert a new cell below
- in command mode, press D twice to delete the active cell
- each mode has different shortcuts - press H to see them
- shift-tab for tool tip
- Ctrl-Enter to run cell without adding new cells
- Ctrl-slash to comment out selection
- MS Visual Studio Code
- used to view the .dot chart files
- need to run it then install extension: Graphviz (dot) language support for Visual Studio Code by Stephanvs
- Anaconda (5Gb) - installs the above
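The NumPy calls listed above can be tried together in one cell. This is a small sketch; the values passed in are arbitrary examples, and the variable names simply follow the notes.

```python
import numpy as np

np1 = np.array([1, 2, 3, 4])
print(np1.shape)   # shape is an attribute holding a tuple of dimension sizes
print(np1.dtype)   # every item in the array shares this one data type

np2 = np.arange(2, 10, 3)  # start value, end value (excluded), step
np3 = np.zeros(10)         # ten zeros, stored as floats by default
np4 = np.zeros((2, 10))    # 2 rows by 10 columns of zeros
np5 = np.full(10, 4)       # ten copies of the value 4

python_list = [[1, 2, 3], [4, 5, 6]]
np7 = np.array(python_list)  # nested Python lists give a 2D array

for name, arr in [("np2", np2), ("np3", np3), ("np4", np4), ("np5", np5), ("np7", np7)]:
    print(name, arr.shape, arr.dtype)
```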
Machine learning with Jupyter, sklearn and Pandas
basic code
import pandas as pd
from sklearn.tree import DecisionTreeClassifier # (if this is the model you wish to use)
from sklearn.model_selection import train_test_split # only needed when training model
from sklearn.metrics import accuracy_score # only needed when evaluating model
import joblib # needed to save the trained model (sklearn.externals.joblib has been removed from recent scikit-learn versions)
from sklearn import tree # only needed to visualise the model tree
df = pd.read_csv('csv_filename') # this imports the csv data file into a pandas dataframe variable df - typing df on its own as the last line of a cell will then display the data in a table
df.shape # this will output number of rows and columns
df.describe() # this gives the data statistics of each column - count, mean, std, min, 25%, 50%, 75%, max
df.values # displays the data as a NumPy array (values is an attribute, not a method)
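The inspection calls above can be tried without a file on disk. This sketch assumes a made-up CSV held in memory (the column names age, gender and genre are hypothetical); pd.read_csv accepts a file-like object as well as a filename.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory so that no file is needed.
csv_text = """age,gender,genre
20,1,HipHop
25,1,Jazz
30,0,Classical
22,0,Dance
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)       # (rows, columns) tuple; an attribute, so no parentheses
print(df.describe())  # statistics for the numeric columns only
print(df.values)      # the underlying 2D array; also an attribute
```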
split the data into train and test parts
X = df.drop(columns=['output_columnname']) # will create a new dataset X with that column removed (by convention, the input dataset has a capitalized name)
y = df['output_columnname'] # create a new dataset y with only the output column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # split your data into training and test parts, in this case 20% of data will be used in testing phase
model = DecisionTreeClassifier() # create your model type
now train, view and save your model
model.fit(X_train, y_train) # trains the model
tree.export_graphviz(model, out_file='chartfilename.dot', feature_names=['train_column1', 'train_column2'], class_names=sorted(y.unique()), label='all', rounded=True, filled=True) # optionally, to save a graphic display of the trained and generated decision tree in the model; feature_names are placeholders for the names of the columns in X
joblib.dump(model, 'saved_model_filename.joblib') # to save the trained model
now evaluate model using your test dataset:
predictions = model.predict(X_test)
predictions # to display the predictions as an array of predictions
score = accuracy_score(y_test, predictions)
score # to display score
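The split, train and evaluate steps can be run together on a small in-memory dataframe. This is a sketch under stated assumptions: the data is made up (genre depends only on age), and random_state is an addition not in the notes, used here so the split and the tree are repeatable between runs.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical data: the genre label depends only on whether age is under 30.
df = pd.DataFrame({
    "age": [20, 22, 24, 26, 28, 32, 34, 36, 38, 40] * 2,
    "gender": [0, 1] * 10,
    "genre": (["Pop"] * 5 + ["Jazz"] * 5) * 2,
})

X = df.drop(columns=["genre"])  # inputs: every column except the output
y = df["genre"]                 # output column only

# 20% of the rows are held back for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
print(len(X_train), len(X_test), score)
```

Without random_state, train_test_split picks different rows on each run, so the score on a real dataset will vary from run to run.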
later, you can load your saved model and use it to create predictions without needing to re-train it
- need to re-check this code!
import pandas as pd
from sklearn.tree import DecisionTreeClassifier # (if this is the model you wish to use)
import joblib # needed to load or save the trained model (sklearn.externals.joblib has been removed from recent scikit-learn versions)
model = joblib.load('saved_model_filename.joblib') # to load the trained model
predictions = model.predict( your_new_dataset_array_to_analyse )
predictions # to display the predictions as an array of predictions
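A save and load round trip can be checked in one go. This sketch assumes the standalone joblib package is installed (it comes with scikit-learn) and uses made-up training data and a temporary folder so nothing is left on disk.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: one feature, label is 1 for the larger values.
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "saved_model_filename.joblib")
    joblib.dump(model, path)    # save the trained model
    loaded = joblib.load(path)  # load it back without re-training

new_rows = [[0], [10]]
same = list(loaded.predict(new_rows)) == list(model.predict(new_rows))
print(same)
```

The new data passed to predict needs the same columns, in the same order, as the data the model was trained on.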
it/ai_machine_learning.txt · Last modified: 2024/05/04 03:44 by gary1