User Interfaces to AI LLMs
Introduction
- there are now many AI Large Language Models (LLMs) and many ways to access these, although many require setting up a payment account
- some can be set up to be private so you are not sending information into the internet
- some allow access to a variety of LLMs from the same interface
Free LLMs hosted on the internet
- Microsoft's Bing AI
- unfortunately not very useful for clinical queries
- Perplexity AI free version
- much better than Bing AI, but the Pro paid version offers useful additional features including access to the latest LLMs such as GPT-4 Turbo and Claude 2
- Perplexity's own LLMs are:
- based on the mistral-7b and llama2-70b base models
- online versions are augmented with Perplexity's in-house web search, indexing, and crawling infrastructure; the search index is large, updated on a regular cadence, and uses sophisticated ranking algorithms so that high-quality, non-SEOed sites are prioritised
- website excerpts, which they call "snippets", are provided to their pplx-online models to enable responses with the most up-to-date information
- these models are then continually fine-tuned to achieve high performance on axes such as helpfulness, factuality, and freshness
- Groq
- provides a very fast AI inference engine and can run Llama2 70b or Mixtral
- API access is charged at a small fee per million tokens used
Free LLMs hosted locally on your computer
- there are a variety of free LLM user interfaces such as:
- LM Studio - see AI - performance testing LLMs using LM Studio
- PrivateGPT
- Jan.ai
- Open WebUI - a web interface for Ollama - allows system prompts to be stored and reused, and supports using private documents
- these generally give access to local free LLMs such as the Mistral Instruct and Mixtral 8x7B Instruct models, which you can download for free from model-hosting websites such as Hugging Face
- NB. models which are larger than your VRAM will run much more slowly
- these avoid your information being sent over the internet but require a powerful computer (preferably with at least 16GB VRAM, although 8GB will suffice for smaller models or if you are happy to wait a few minutes for responses)
- these may also allow you to embed your own document files privately and use them in your queries
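As a rough guide to the VRAM point above, here is a back-of-envelope sketch: quantized weights take roughly params × bits/8 bytes, and the ~20% overhead figure for KV cache and activations is an assumption (real usage varies with context length).

```python
# Rough VRAM estimate for a quantized LLM.
# Weights: params (billions) * bits / 8 gives GB; overhead is an assumed ~20%.
def vram_gb(params_billion: float, quant_bits: int, overhead: float = 0.2) -> float:
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8-bit ~ 1 GB
    return round(weights_gb * (1 + overhead), 1)

# A 7B model at 4-bit fits comfortably in 8 GB of VRAM:
print(vram_gb(7, 4))     # ~4.2 GB
# Mixtral 8x7B has ~46.7B total parameters, so even 4-bit needs ~28 GB
# and will spill to system RAM on a 16 GB card (hence the slowdown noted above):
print(vram_gb(46.7, 4))  # ~28.0 GB
```

This is only a sizing heuristic; actual memory use depends on the runtime and context window.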
- LLocalSearch
- a Perplexity-like open-source project (on GitHub) which needs only Ollama, two small LLMs, and Docker installed to provide live web search combined with LLM results
Free LLMs hosted locally on your iPhone
- PrivateLLM iOS app ($15 to buy)
- can run the Phi-3-mini-4k-instruct 4-bit quantized version if the device has at least 6GB RAM
- iPhone 12 Pro runs it at around 10 tokens/sec, iPhone 15 at 15 tokens/sec, and iPhone 15 Pro at 18 tokens/sec
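Those generation speeds translate directly into waiting time. A small sketch (the 300-token reply length is an illustrative assumption, not from the source):

```python
# Time to generate a reply at a given token generation speed.
def reply_seconds(reply_tokens: int, tokens_per_sec: float) -> float:
    return round(reply_tokens / tokens_per_sec, 1)

# Using the iPhone figures above for an assumed ~300-token answer:
for device, tps in [("iPhone 12 Pro", 10), ("iPhone 15", 15), ("iPhone 15 Pro", 18)]:
    print(device, reply_seconds(300, tps), "seconds")
```

So even the slowest listed device produces a medium-length answer in well under a minute.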
Accessing your home local LLM via your iPhone on the internet
ngrok - Docker - Ollama approach
- uses:
- Ollama to run the LLM
- Docker to create a container for Ollama and expose a port
- on Windows this requires installing WSL2 to allow creation of a Linux subsystem; you then need to set up a Linux username and password for each Linux distribution (these have no bearing on your Windows username)
- Ollama Web UI for the web user interface - allows logging in, changing admin settings, and granting other users access, and provides a web interface for using the LLM in Ollama
- ngrok - sign up for an account, download it, and keep it running on your local computer to provide internet access via the URL it allocates
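The steps above can be condensed into a command sketch. This is an assumed sequence, not the exact commands from any one guide: the Open WebUI image name, port mapping, and volume follow its public docker run example, and the model name is just an illustration.

```
# 1. pull a model for Ollama to serve (model name is an example)
ollama pull llama2

# 2. run the Ollama Web UI (now Open WebUI) in Docker, exposing it on port 3000
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# 3. tunnel the web UI to the internet; ngrok prints the allocated public URL
ngrok http 3000
```

Anyone with the ngrok URL can then reach the login page, so set up Open WebUI accounts before exposing it.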
Pay per use LLMs on the internet
- these usually charge a fee per token used
- ChatGPT 4
- Google Gemini Advanced running on Google Gemini Ultra 1.0 at $US20/mth introduced in Feb 2024
- faster than GPT-4 and of similar speed to Perplexity, although Perplexity offers more features
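The per-token billing model mentioned above can be sketched as follows; the prices used are hypothetical placeholders, not any provider's actual rates.

```python
# Per-token billing sketch: providers typically price prompt (input) and
# completion (output) tokens separately, quoted per million tokens.
def usage_cost_usd(prompt_tokens: int, completion_tokens: int,
                   in_price_per_m: float, out_price_per_m: float) -> float:
    return round(prompt_tokens / 1e6 * in_price_per_m
                 + completion_tokens / 1e6 * out_price_per_m, 4)

# e.g. a 2,000-token prompt with a 500-token reply at hypothetical
# $10/$30 per million tokens costs a few cents:
print(usage_cost_usd(2000, 500, 10, 30))  # 0.035
```

This is why per-use APIs can be much cheaper than a flat monthly subscription for light usage.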
Third party web interfaces to LLMs
- these often charge a monthly fee but can provide access to various LLMs and often provide additional features
- eg. Perplexity AI Pro $US20/mth allows important extra features such as choosing from a variety of LLMs (GPT-4, Claude 2.1, Gemini, or Perplexity), GPT-4 Visual, options to select only scientific papers as sources, and over 300 queries/day
- the free tier does not give access to Copilot and only gives you access to the Perplexity LLM, which does not appear to give as good a response as a local Mistral 7B
- presumably this uses the technology processes outlined in https://www.youtube.com/watch?v=IbOoEJ9N2z8
- Perplexity AI's LLM answer engine was built in 6 months at a cost of under $US4m, utilising Megatron-LM and open-source Ray (via Anyscale); the default LLM is their fine-tuned version of GPT-3.5, as it is 4x faster and 4x cheaper than GPT-4 and almost as good - see Perplexity AI's CEO talk
Third party API interfaces to LLMs
it/ai_llm_guis.txt · Last modified: 2024/05/11 23:41 by gary1