User Interfaces to AI LLMs
Introduction
- there are now many AI Large Language Models (LLMs) and many ways to access these, although many require setting up a payment account
- some can be set up to be private so you are not sending information into the internet
- some allow access to a variety of LLMs from the same interface
Free LLMs hosted on the internet
- Microsoft's Bing AI
- unfortunately not very useful for clinical queries
- Perplexity AI free version
- much better than Bing AI, but the Pro paid version offers useful additional features including access to the latest LLMs such as GPT-4 Turbo and Claude 2
- Perplexity's own LLMs are:
- based on the mistral-7b and llama2-70b base models
- online versions are augmented with Perplexity's in-house web search, indexing, and crawling infrastructure; the search index is large, updated on a regular cadence, and uses sophisticated ranking algorithms so that high-quality, non-SEOed sites are prioritised
- website excerpts, which they call "snippets", are provided to their pplx-online models to enable responses with the most up-to-date information
- these models are then continually fine-tuned to achieve high performance on axes such as helpfulness, factuality, and freshness
- Groq
- provides a very fast AI inference engine and can run Llama2 70b or Mixtral
- API access is charged at a small fee per million tokens used
Free LLMs hosted locally on your computer
- there are a variety of free LLM user interfaces such as:
- LM Studio - see AI - performance testing LLMs using LM Studio
- PrivateGPT
- Jan.ai
- Open WebUI - a web interface for Ollama - allows system prompts to be stored and reused, and supports using private documents
- these generally give access to local free LLMs such as the Mistral Instruct and Mixtral 8x7B Instruct models, which you can download for free from model-hosting websites such as Hugging Face
- NB. models which are larger than your VRAM will run much more slowly
- these avoid your information being sent over the internet but require a powerful computer (preferably with at least 16GB VRAM, although 8GB will suffice for smaller models or if you are happy to wait a few minutes for responses)
- these may also allow you to embed your own document files privately and use them in your queries
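As a rough guide to the VRAM point above, here is a back-of-envelope sketch: quantized weights take roughly params × bits/8 bytes, and the ~20% overhead figure for KV cache and activations is an assumption (real usage varies with context length).

```python
# Rough VRAM estimate for a quantized LLM.
# Weights: params (billions) * bits / 8 gives GB; overhead is an assumed ~20%.
def vram_gb(params_billion: float, quant_bits: int, overhead: float = 0.2) -> float:
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8-bit ~ 1 GB
    return round(weights_gb * (1 + overhead), 1)

# A 7B model at 4-bit fits comfortably in 8 GB of VRAM:
print(vram_gb(7, 4))     # ~4.2 GB
# Mixtral 8x7B has ~46.7B total parameters, so even 4-bit needs ~28 GB
# and will spill to system RAM on a 16 GB card (hence the slowdown noted above):
print(vram_gb(46.7, 4))  # ~28.0 GB
```

This is only a sizing heuristic; actual memory use depends on the runtime and context window.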
- LLocalSearch
- a Perplexity-like open-source project (on GitHub) which needs only Ollama, two small LLMs, and Docker installed to provide live web search combined with LLM results
Free LLMs hosted locally on your iPhone
- PrivateLLM iOS app ($15 to buy)
- can run the Phi-3-mini-4k-instruct 4-bit quantized version if the device has at least 6GB RAM
- iPhone 12 Pro runs it at around 10 tokens/sec, iPhone 15 at 15 tokens/sec, and iPhone 15 Pro at 18 tokens/sec
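Those generation speeds translate directly into waiting time. A small sketch (the 300-token reply length is an illustrative assumption, not from the source):

```python
# Time to generate a reply at a given token generation speed.
def reply_seconds(reply_tokens: int, tokens_per_sec: float) -> float:
    return round(reply_tokens / tokens_per_sec, 1)

# Using the iPhone figures above for an assumed ~300-token answer:
for device, tps in [("iPhone 12 Pro", 10), ("iPhone 15", 15), ("iPhone 15 Pro", 18)]:
    print(device, reply_seconds(300, tps), "seconds")
```

So even the slowest listed device produces a medium-length answer in well under a minute.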
Accessing your home local LLM via your iPhone on the internet
ngrok - Docker - Ollama approach
- uses:
- Ollama to run the LLM
- Docker to create a container for Ollama and expose a port
- on Windows this requires installing WSL2 to allow creation of a Linux subsystem; you then need to set up a Linux username and password for each Linux distribution (these have no bearing on your Windows username)
- Ollama Web UI for the web user interface - allows logging in, changing admin settings, and granting other users access, and provides a web interface for using the LLM in Ollama
- ngrok - sign up for an account, download it, and keep it running on your local computer to provide internet access via the URL it allocates
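The steps above can be condensed into a command sketch. This is an assumed sequence, not the exact commands from any one guide: the Open WebUI image name, port mapping, and volume follow its public docker run example, and the model name is just an illustration.

```
# 1. pull a model for Ollama to serve (model name is an example)
ollama pull llama2

# 2. run the Ollama Web UI (now Open WebUI) in Docker, exposing it on port 3000
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# 3. tunnel the web UI to the internet; ngrok prints the allocated public URL
ngrok http 3000
```

Anyone with the ngrok URL can then reach the login page, so set up Open WebUI accounts before exposing it.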
Pay per use LLMs on the internet
- these usually charge a fee per token used
- ChatGPT 4
- Google Gemini Advanced running on Google Gemini Ultra 1.0 at $US20/mth introduced in Feb 2024
- faster than GPT-4 and of similar speed to Perplexity, although Perplexity offers more features
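The per-token billing model mentioned above can be sketched as follows; the prices used are hypothetical placeholders, not any provider's actual rates.

```python
# Per-token billing sketch: providers typically price prompt (input) and
# completion (output) tokens separately, quoted per million tokens.
def usage_cost_usd(prompt_tokens: int, completion_tokens: int,
                   in_price_per_m: float, out_price_per_m: float) -> float:
    return round(prompt_tokens / 1e6 * in_price_per_m
                 + completion_tokens / 1e6 * out_price_per_m, 4)

# e.g. a 2,000-token prompt with a 500-token reply at hypothetical
# $10/$30 per million tokens costs a few cents:
print(usage_cost_usd(2000, 500, 10, 30))  # 0.035
```

This is why per-use APIs can be much cheaper than a flat monthly subscription for light usage.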
Third party web interfaces to LLMs
- these often charge a monthly fee but can provide access to various LLMs and often provide additional features
- eg. Perplexity AI Pro $US20/mth allows important extra features such as choosing from a variety of LLMs (GPT-4, Claude 2.1, Gemini, or Perplexity), GPT-4 Visual, options to select only scientific papers as sources, and over 300 queries/day
- the free tier does not give access to Copilot and only gives you access to the Perplexity LLM, which does not appear to give as good a response as a local Mistral 7B
- presumably this uses the technology processes outlined in https://www.youtube.com/watch?v=IbOoEJ9N2z8
- Perplexity AI's LLM answer engine was built in 6 months at a cost of under $US4m, utilising Megatron-LM and open-source Ray (via Anyscale); the default LLM is their fine-tuned version of GPT-3.5, as it is 4x faster and 4x cheaper than GPT-4 and almost as good - see Perplexity AI's CEO talk
Third party API interfaces to LLMs
it/ai_llm_guis.txt · Last modified: 2024/05/11 23:41 by gary1