it:ai_history
Table of Contents
timeline of artificial intelligence development and related issues
see also:
- Amara's Law:
- We are prone to overestimating the impact of new technologies in the short term but underestimating their profound effects in the long run.
Timeline
early theory
- Alan Turing's theory of computation, which suggested that a machine, by shuffling symbols as simple as “0” and “1”, could simulate any conceivable form of mathematical reasoning and invented the Turing machine in 1936 and which mathematically models a machine that mechanically operates on a tape divided into cells each of which contains a symbol
- Turing Test was proposed by Alan Turing in a paper published in 1950. It has become a fundamental motivator in the theory and development of artificial Intelligence (AI).
- computer science as a field separates from mathematics c1960 with its own conferences.
- 1925: Lenz-Ising recurrent architecture with neuron-like elements was published in 1925
- 1965: Ivakhnenko & Lapa of Ukraine first working algorithm for deep learning of internal representations
- 1967-68: Shun-Ichi Amari work on on learning internal representations in deep NNs end-to-end through stochastic gradient descent (SGD) 1)
- “At the end of the 1960s, some discouraging theoretical results caused many researchers to suspect that these neural networks would never be of any real use.“
- 1972: Shun-Ichi Amari made the Lenz-Ising recurrent architecture adaptive such that it could learn to associate input patterns with output patterns by changing its connection weights and this would later be called the Hopfield Network without recognition of Amari.
1980s
- John Hopfield published his first paper in neuroscience in 1982, titled “Neural networks and physical systems with emergent collective computational abilities” introducing his Hopfield Network (although based on Amari's earlier work), a type of artificial network that can serve as a content-addressable memory, made of binary neurons that can be 'on' or 'off' for which he was given the 2024 Nobel Prize in Physics.2)
- Geoffrey Hinton University of Toronto, cognitive psychology and computer science. He is widely regarded as the “Godfather of AI”, famous for his work on artificial neural networks & deep learning. Popularized and improved Backpropagation; co-invented Boltzmann Machines in 1985; He later worked at Google 2013-2023 on Google Translate, CNNs; He also was given the 2024 Nobel Prize in Physics
1991
- Guido van Rossum began working on Python programming language in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0
1995
- Python library extended to include a library for the array type for numerical computing called Numeric
1997
- IBM's Deep Blue supercomputer beats chess grandmaster Gary Kasparov using brute force by calculating every potential move in each of their games, and then following the sequence of moves that led to victory.
- Jack Ma, Alibaba's founder, and Jerry Yang, Yahoo's co-founder, first met in 1997 at the Great Wall, where Ma served as Yang's translator. This meeting set the stage for Yahoo's $1 billion investment in Alibaba in 2005.
2000
- Python 2.0 released with many major new features such as list comprehensions, cycle-detecting garbage collection, reference counting, and Unicode support.
2001
- SciPy open source library for Python
- initial release of work on development of IPython shell which would eventually create the Jupyter Notebooks spin off in 2014 for running Python script
2003
- MatPlotLib library for Python released to add visual charting functionality which proved to be an important visual aid for programming Python for machine learning
2005
- Yahoo invests $US1b in Alibaba
2006
- nVidia develops their “Tesla” GPU micro-architecture, Nvidia's first micro-architecture to implement unified shaders
- NumPy open source library for Python (developed by adding features of the competing Numarray to Numeric which was created in 1995)
- Google launches Google Translate as a statistical machine translation service (this uses predictive algorithms to translate text, and thus had poor grammatical accuracy), it used United Nations and European Parliament documents and transcripts to gather linguistic data and translates a language to English first then to the target language. Android version came in 2010 and an iOS version in 2011. In 2014, Google acquired Word Lens to improve the quality of visual and voice translation. In 2016, Google transitioned its translating method to a system called neural machine translation using deep learning to translate whole sentences at a time for better grammar.3)
2007
- nVidia develops 1st CUDA GPU (v1.0 G80 GPU, GeForce 8800 series) and CUDA SDK, based on their “Tesla” GPU micro-architecture
- the CUDA technology allowed machine learning programmers far easier access to the GPU which was far better at processing parallel calculations required in neural networks than was the CPU
- scikit-learn library for Python released (although 1st public release was in 2010 in beta) as a toolkit for SciPy and features classification, regression and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, while integrating well with many other Python libraries, such as Matplotlib and plotly for plotting, NumPy for array vectorization, Pandas dataframes, SciPy - all important components of machine learning in Python
2008
- Pandas library for Python released to allow improved data analysis utilising tabular data as “DataFrames”
- Python 3.0 released
2010
- nVidia develops v2 CUDA SDK and their “Fermi” GPU micro-architecture with GPUs having 3b transistors, each Streaming Multiprocessor (SM) having 32 single-precision CUDA cores and implements the new IEEE 754-2008 floating-point standard - GF100 GPUs, GeForce GTX 465-590, Quadro 4000, 5000, 6000 GPU boards
- SRI International Artificial Intelligence Center releases Siri as an iOS app and is bought by Apple 2 months later. Its speech recognition engine was provided by Nuance Communications, and it uses advanced machine learning technologies to function. It supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and being able to engage with iOS-integrated apps. Apple integrated Siri into iPhone 4S in Oct 2011.
- DeepMind AI research lab founded in the UK by atheist philosopher, theologist, entrepreneur Mustafa Suleyman, neuroscientist Demis Hassabis, and Shane Legg. Later acquired by Google in 2014.
2012
- GPUs evolve into highly parallel multi-core systems allowing efficient manipulation of large blocks of data and become more effective for this purpose than CPUs
- nVidia develops AlexNet CUDA-based deep learning machine
- a team including Geoffrey Hinton, Ilya Sutskever, and Alex Krizhevsky used multiple GPUs to train a neural network called AlexNet which they showed was much better than competing options for computer vision and is regarded as a key turning point in machine learning showing how GPUs can substantively speed up training of neural networks with larger amounts of data and sparked the tech revolution of AI
- nVidia develops v3 CUDA SDK and their “Kepler” GPU micro-architecture with GPUs having new, more power efficient Streaming Multiprocessor Architecture called “SMX” each Streaming Multiprocessor (SM) having 32 single-precision CUDA cores and adds GPU boost, support for DirectX 3D v11.0, Dynamic Parallelism, GPUDirect (a capability that enables GPUs within a single computer, or GPUs in different servers located across a network, to directly exchange data without needing to go to CPU/system memory), PCI Express 3.0 interface. GK104-107 GPUs, various GeForce GTX boards, Quadro K series boards;
- Fernando Pérez received the Free Software Foundation Award for the Advancement of Free Software for his work on IPython, the precursor to Project Jupyter
2013
- Google team publishes Word2Vec NLP algorithm which represents each word by a carefully chosen vector that captures semantic and syntactical qualities of words (this was the algorithmic breakthrough that led to transformers and ChatGPT LLMs, etc)
2014
- nVidia develops v5 CUDA SDK and their “Maxwell” GPU micro-architecture with GPUs; provides native shared memory atomic operations for 32-bit integers and native shared memory 32-bit and 64-bit compare-and-swap (CAS), which can be used to implement other atomic functions; GM series GPUs; various GeForce GTX boards, Quadro M series boards;
- Jupyter Notebooks for running Python code
- Google acquires Deep Mind which had developed AlphaZero, a chess playing algorithm which became the best chess player in the world and then would become the best at playing Go.
2015
- reverse diffusion for AI image generation proposed in a paper published 4)
- this would become the basis for most AI image generation in 2023
- Google releases 1st version of TensorFlow as open source software library to manipulate tensors for machine learning and AI using Python and other languages
- Keras library for Python released by Google engineer as a front end for TensorFlow
- OpenAI founded initially as not for profit company
2016
- nVidia develops v6 CUDA SDK and their “Pascal” GPU micro-architecture with GPUs; GV series of GPUs; TITAN V boards, Quadro GV100 board;
- 1st nVidia DGX AI super computer - delivered to OpenAI
- Vulkan 3D graphics and computing API begins to replace older OpenGL and DirectX APIs for GPUs at it offered lower overhead, more direct control over the GPU, and lower CPU usage - Microsoft would use this in Direct3D 12 for Windows OS;
- neural network AI comes of age - the birth of modern AI
- DeepMind's AI game-playing model Alpha Go defeats world champion 4000 year old board game called Go player Lee Sedol over a 7 day competition of man vs machine in March 2016
- this changed the attitude towards the capabilities of AI as mastering Go was seen by many at the time as the Holy Grail of AI as using brute force calculations, while working for Chess, does not work for Go which has far more potential moves than Chess
- it had to mimic the human quality of intuition as it would have to know which potential moves to discard, and which to consider, without considering every possible move. AI had never done this before as most had relied upon rules-based programming whereas Alpha Go used neural networks to learn, and teach itself how to play by being trained on completed games of Go, and this changed everything.
- “On the 37th move of game two, AlphaGo played a move so unlikely that no human in a million years would have thought of it … as the match progressed, it proved to be a masterstroke … it hadn't learned this move from watching humans play Go, but dreamed it up itself, in the labyrinth of its neural network” - Move 37 has become a symbol of machine creativity, commemorated on mugs and t-shirts.
- May:
- Google announced its Tensor processing unit (TPU), an application-specific integrated circuit (ASIC, a hardware chip) built specifically for machine learning and tailored for TensorFlow.
- a programmable AI accelerator designed to provide high throughput of low-precision 8-bit arithmetic, and oriented toward using or running models rather than training them
2017
- Rain Neuromorphics is founded with aims to produce analog neuromorphic chips (NPUs) which are purported to be likely to be 10,000x more energy efficient than digital GPUs - Sam Altman becomes a seed financier in 2018.
- Feb:
- TensorFlow v1 released by Google Brain
- May:
- Google's 2nd gen TPUs and availability of the TPUs in Google Compute Engine cloud service
- nVidia develops v7 CUDA SDK and their “Volta” GPU micro-architecture with GPUs; NVIDIA's first chip to feature Tensor Cores, specially designed cores that have superior deep learning performance over regular CUDA cores; GP series of GPUs; various GeForce GTX boards, Quadro P series boards;
- last version of OpenGL GPU API (as now replaced by Vulkan)
- AI LLM development revolutionised with the introduction of “transformer models” developed as a result of the 2017 paper “Attention is all you need”
- a “Standard Model of the Mind” was proposed to help unify human-like cognitive AI based upon concepts within cognitive AI architectures such as ACT-R, Soar, and Sigma 5)
- Sept:
- Meta and Microsoft create the Open Neural Network Exchange (ONNX) project for converting AI models between frameworks
- Apple releases iOS 17 which updated Siri's voice and added support for follow-up questions, language translation, and additional third-party actions.
2018
- Mar:
- Meta's PyTorch and Convolutional Architecture for Fast Feature Embedding (Caffe2) merged into Pytorch which can run on either CPU or CUDA GPU and contributed to the rapid evolution of AI using Python and eventually overtaking Google's TensorFlow technology
- Python now fully matured into a highly capable platform for machine learning and AI development - if you have the GPU power to do so
- Google's team publishes Universal Sentence Encoder 6)
- nVidia develops v7.5 CUDA SDK and their “Turing” GPU micro-architecture with GPUs; 1024-4608 CUDA cores; 16-72 SMs; TU series of GPUs; various GeForce RTX boards, Quadro RTX and T series boards;
- 1st consumer products capable of real-time ray tracing (RT cores - nil to 72 depending on GPU version)
- nil to 576 Tensor AI cores (depending on GPU version) for large matrix calculations and Deep Learning Super Sampling (DLSS)
- NVLink Bridge with VRAM stacking pooling memory from multiple cards
- memory controller with GDDR6/HBM2 support
- 75-280W thermal design power GPUs;
2019
- Microsoft provides OpenAI LP with a $1 billion investment
- OpenAI develops GPT-2
- OpenAI signs a non-binding agreement with Rain Neuromorphics to purchase $US51m of their analog neuromorphic chips (use voltage and currents instead of 0/1 values) when they become available. Sam Altman had previously personally invested $US1m in Rain.
- Sept:
- TensorFlow v2.0 released by Google Brain
- nVidia's GPU tech shows a 30x improvement in compute performance from 5 yrs earlier
2020
- nVidia develops v8 CUDA SDK and their “Ampere” GPU micro-architecture for GPUs;PCI Express 4.0 ; 1st 64bit floating point tensor cores; GA series of GPUs; various GeForce RTX boards, Quadro RTX A series boards; nVidia A100 80GB GPU;
- training of LLMs such as ChatGPT become possible thanks to the A100 combined with the powerful Python libraries and access to massive data on the web such as wikipedia and online books
- nVidia publishes Megatron-LM: training Multi-Billion Parameter Language Models Using Model Parallelism 7)
2021
- LoRA (Low-Rank Adaptation of Large Language Models) proposed as a fine tuning method for LLM and image generation
- freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks
- compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times
- becomes popular in 2023 for end users with Stable Diffusion XL by applying minute changes to the most critical part of Stable Diffusion models — the cross-attention layers
- Anthropic, a competitor to OpenAI is founded by former OpenAI employees
- Rain Neuromorphics demos a brain-inspired analog computer chip architecture that employs a 3D array of randomly connected memristors (each chip will have the same random pattern though) to compute neural network training and inference at extremely low power. Current analog chips are not compatible with AI training as they are incompatible with back-propagation and thus the plan is to build these new chips and invent a new training algorithm. In Dec 2023, the US agencies stopped Middle East funding of these chips on concerns China will get access to the tech.
2022
- OpenAI's releases Chat GTP-3 LLM
- Google's Med-PaLM - medicine optimised LLM - 1st LLM to exceed the 60% “passing” score on US Medical Licensing Exams (USMLE)
- finance optimised LLMs: BLOOM, ChatGTP-NeoX, Opt 66
- Inflection AI company founded by entrepreneurs Reid Hoffman (helped start LinkedIn), Mustafa Suleyman (a founder of DeepMind) and Karén Simonyan (DeepMind's chief scientist) and initial product is a iOS chatbot, Pi, named for “personal intelligence,”
- Mar:
- nVidia develops v9 CUDA SDK for their “Hopper” GPU micro-architecture for GH100 GPUs for use in their nVidia H100 data centre
- Google DeepMind's Chinchilla LLM - 70b parameters; 80 layers, 1.5-3million batch size;
- Aug: Google DeepMind team's 12 step Alberta Plan for developing AGI published
- Sept:
- nVidia develops v8.9 CUDA SDK and their “Ada Lovelace” GPU micro-architecture for GPUs; 128 CUDA cores and 4 tensor cores per SM; 4th-generation Tensor Cores with FP8, FP16, bfloat16, TensorFloat-32 (TF32) and sparsity acceleration; 24-144 SMs; 18.9-76.3b transistors; AD series of GPUs; various GeForce RTX boards, RTX 6000 Ada, RTX 4000 SFF boards; L4, L40 datacentres;
- Meta allows PyTorch to be governed by PyTorch Foundation, a newly created independent organization – a subsidiary of Linux Foundation
- Oct 2022:
- LangChain released as an open source Python library
- Nov 2022:
- paper published: “MegaBlocks: Efficient sparse training with Mixture-Of-Experts” (MoE)
- Dec:
- OpenAI's ChatGTP3.5 released for public to play with while they fine tune it
- Perplexity AI decides to prioritise development their side-product (instead of their main project of text2SQL) which was initially a internal slack-bot and with GPT 3.5 released, they realised there was potential in creating an LLM-powered answer engine
2023
- Microsoft provided OpenAI LP with a further $10 billion investment
- AI voice cloning now only needs 30secs of a voice to clone it for text to speech applications
- AI video person cloning tools readily available to make creation of deep fake videos very easy
- Neuralink: monkeys can control a computer by thought using an embedded chip
- Melbourne team plans to grow around 800,000 brain cells living in a dish, which are then “taught” to perform goal-directed tasks.
- Ukraine war has become a test environment for AI war systems and is the 1st battlefield use of autonomous killer drones
- advances in text2Image AI image generators:
- Stable Diffusion XL (SD-XL) v1.0 1024×1024 outputs much improved over v0.5
- needs at least 32Gb RAM plus 8GB VRAM on the nVidia CUDA compatible GPU to use, and as least 12Gb VRAM to train
- MidJourney
- Google's SynthiD invisible watermark technology to help identify AI-generated imagery
- China bans AI-generated images devoid of watermarks
- Feb 2023:
- OpenAI released a commercialised API service gpt-3.5-turbo
- Microsoft incorporates GPT-4 tech as “Prometheus” into its Bing search engine
- Meta announces Llama
- Mar 2023:
- OpenAi ChatGTP4 released but there is a waitlist for access
- PyTorch v2.0 released
- April 2023:
- Alibaba releases its LLM Tongyi Qianwen on the company’s smart speakers (similar concept to chatGPT on Alexa)
- LangChain incorporates and quickly raises $US30m in seed funding
- May 2023:
- a joint signed statement by AI inventors advised that the threat of human extinction due to AI should be a global priority
- Google invests more than $US300 million in Anthropic
- PrivateGTP 1st version released on GitHub
- allows use of LLMs offline in a private manner
- June 2023:
- Microsoft's ORCA Progressive Learning from Complex Explanation Traces of GPT-4 (pdf) - AI learning from AI
- Illumina unveils AI software (PrimateAI-3D) to predict disease causing genetic mutations in patients 8)
- Illumina has over 80% of global share of the genome sequencing market
- 1yr old company behind the Pi chatbot, Inflection AI, raises $US1.3b
- deep fake face swap to create video files see https://github.com/hassan-sd/roop-unlocked
- July 2023:
- probability of doom from AI: https://www.abc.net.au/news/2023-07-15/whats-your-pdoom-ai-researchers-worry-catastrophe/102591340
- Meta released several models as Llama 2, using 7, 13 and 70 billion parameters
- uses 4K context length tokens and foundational models were trained on a curated data set with 2 trillion tokens with batch size of 64
- pre-training utilized a cumulative 3.3million GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W)
- Anthropic releases v2 of its Claude LLM
- Google releases (only for adults within the US) NotebookLM cloud-based personal document GPT using a RAG allowing users to upload 20docs with up to 200,000 word context as source docs. Has some similarities to PerplexityAI.
- accurate real time deep fake face swap in live video calls - see https://github.com/iperov/DeepFaceLive
- Aug 2023:
- another face swapper for videos - https://github.com/facefusion/facefusion
- nVidia Siggraph 2023 Keynote speech by nVidia CEO announces:
- nVidia AI Workbench - integrative environment for creation of AI
- nVidia Omniverse - utilises OpenUSD to connect USD tools together and create virtual simulated physical world environments with AI and allows rapid AI development via ChatUSD
- nVidia GH200 GPUs
- nVidia L40S GPUs for workstations
- UC Berkeley publishes paper on “Blockwise Parallel Transformer (BPT)” to allow much larger context token frame at lower memory cost
- multimodal AI was introduced incorporating visual imagery, sound and LLMs - each being tokenized then embedded (linked)
- Microsoft releases LlaVa (Large Language and Vision Assistant)
- ChatGTP4Vision
- can be used as an AI agent to iteratively improve StableDiffusion image generation
- allows analysis of images as part of the prompts
- Perplexity AI's LLM answer engine starts to mature
- Sept 2023:
- Massachusetts senators proposing a bill to ban manufacture and sale of such in that state where robot manufacturers Boston Dynamics, iRobot, and MassRobotics are all based
- Amazon Invests Up to $US4 Billion in OpenAI Competitor Anthropic, making Amazon Web Services (AWS) the primary cloud provider for Anthropic and “will make Anthropic’s safe and steerable AI widely accessible to AWS customers.” Amazon and Anthropic will also work together using Amazon Bedrock, a service that builds generative AI applications.
- MS releases DALL-E3 image generation combined with ChatGPT linked to many of its software apps including Win11, Edge, Bing, Paint, Office 365.
- Oracle CloudWorld 2023:9)
- Oracle's Cloud uses RDMA which shares memory and is thus much faster than other cloud services and thus much cheaper to utilise to train LLMs hence nVidia, Cohere, Elon Musks xAI and new start ups are using Oracle Cloud for training AI models
- Oracle is working with nVidia and is building an nVidia H100 supercluster, the largest scientific computer ever built
- 512 to 16,000 GPUs, each GPU has 200Gb/s RDMA connection with only a few microseconds latency between nodes and a total bandwidth of 102,400-3,200,000Gb/s
- Apex no code application generator system
- Oracle no longer will use Java to create new apps
- no security bugs
- stateless so fault tolerant
- eg. Cerner New Millennium generated by Apex
- improved Autonomous Elastic Database
- self-driving database, does not need a DBA, installs itself, configures itself, updates itself, tunes itself automatically
- no human intervention means no security errors due to human error
- when you need more processors for a task it is elastic and able to utilise them and when finished you put them back into the pool
- eg. new Cerner HealtheIntent
- new SQL Object Relational Database
- generates Relational Schema from JSON documents
- new Oracle Vector Database
- plans to add semantic search capabilities using AI vectors to Oracle Database 23c
- used for specialising generative LLMs for law, medicine, etc
- private training data remains private
- eg. utilise de-identified Cerner EHR data to specialise a foundation LLM into a medical model
- new Cloud Data Intelligence Platform
- Oracle Analytics + generative AI
- a re-write of Cerner HealtheIntent to become Oracle Public Health Data Intelligence System / Platform
- population scale data to train medical AI models
- unifies data including lab data from different EHRs such as Epic, Cerner etc into a single Autonomous Elastic Database which will provide 1000x more data to train models
- new Oracle Cloud IoT Automated Data Capture
- sensors and robots
- new Cerner CareAware patient monitoring
- patient image capture and storage
- genomic data
- smart watch data, etc
- Fusion Supply Chain - Cerner lab robotics, pharmacy, surgical inventories using RFID tags
- planning on automating the lab results going straight to the EHR, the Dr and patient's smart phone from the lab robotics
- new Orace First Responder System
- satellite and 5G terrestrial fail-safe audio-visual mobile network
- MultiCloud announcement:
- Microsoft Azure Cloud and Oracle Cloud are OPEN systems, not walled gardens
- Oracle Cloud hardware and software is planned to be installed in Azure Datacentres, microsecond connection from Oracle Database to Azure Services, provision Oracle Database Services from Azure Portal
- nVidia is rumored to be acquiring Illumina - the global leader in genomics
- cost of sequencing a genome has dropped from $100m in 2001, to under $10K in 2011, and well under $1K in 2021 however, this dramatic efficiency gains in sequencing is yet to be matched by ability to analyze the data - and this is where AI comes in - genomic projects will exceed 40 Exabytes in the next decade up from 4 Exabytes in 2021
- paper published: OpenCog Hyperon [GP21] framework for AGI at the human level and beyond using decentralized computer networks and a new programming language, MeTTA.10)
- Oct 2023:
- UAE-based company, G42, is planning on using Cerebras' new 7nm 300mm wafer chip 1.5MW, 4exaflop 850,000 core WSE-2 supercomputer (each can process some 4 billion parameters and is some 200x faster than a nVidia A100 GPU system) to develop medical AGI using their vast medical databases
- Google's Med-PaLM 2 - scores 85% on US Medical Licensing Exams (USMLE); consumers preferred Med-PaLM 2 responses over physician responses across 8 of 9 evaluation axes
- PaLM-E embodied generalist LLM for robotics
- FreshLLM paper published “Refreshing Large Language Models with Search Engine Augmentation” to allow LLMs to be continually updated with the latest web information 11) - this inspired Perplexity to create its “online” LLM versions
- DALL-E 3 text2Image plus allows ChatGTP prompts as inputs
- Microsoft is embedding DALL-E3 in its new “CoPilot” AI app which can utilise DALL-E3 from with MS Paint, Windows 11, MS 365, Edge etc.
- “Browse with Bing” feature lets ChatGPT access up-to-date information, rather than being limited to the training data that was cut off before September 2021.
- Microsoft's paper: Self-Taught OPtimizer (STOP) - recursively self-improving code generation using GPT-4 as proof of concept and risk of AI escaping its restrictive sandbox
- paper: memGPT - Towards LLMs as operating systems - teaching LLMs to manage their memory - ” the limited context windows of modern LLMs severely handicaps their performance: document analysis, where MemGPT is able to analyze large documents that far exceed the underlying LLM's context window, and multi-session chat, where MemGPT can create conversational agents that remember, reflect, and evolve dynamically through long-term interactions with their users. “ 12)
- nVidia's Eureka - AI agent that can train robots better than humans via creating optimised reward systems
- Elon Musk predicts AGI will be achieved by 2029 +/- 1 yr
- Mistral 7b Instruct LLM released for download (4.4Gb), optimised for chat and outperforms LLaMA 2 13b and LLaMA 1 34b on many measures 13)
- US has banned export of nVidia A100 high end GPUs (in addition to the ban on export of nVidia H100's) to China
- IBM created an analog computer chip to be more efficient at AI calculations
- Chinese make 1st memristor AI chip “75x” more efficient for AI and “mimics the energy efficient approach of the brain” hence developed “Computing-in-Memory” (CIM) with close integration of computing and memory on the one chip - memristor remembers how much current passed though it by changing its resistance according to the prior current so it can store information without needing ongoing power and power consumption is thus only “3%” of current computer chips
- Microsoft agrees to help Australia build a “cyber-shield” to fend off global online threats
- Oct 30: US Pres. Biden issues Executive Order on the Safe, Secure, and Trustworthy Development and Use of AI
- 20,000 word document also includes:
- protecting civil rights, privacy, consumers, workers and other groups from potential harms of irresponsible AI use through non-discrimination, transparency, and accountability measures
- encouraging coordinated AI use within and between govt agencies, and advancing international collaboration on AI standards, risk management and shared priorities
- new entities created: White House AI Council; Chief AI Officer; Internal AI Governance Boards with major govt agencies, Health and Human Services Sector AI Task Force, AI Safety and Security Board advisory committee - weaponisation, AI and Technology Talent Task Force, Research Cordination Network on Privacy Enhancing Technologies.
- Alibaba releases version 2 of its LLM Tongyi Qianwen 2.0 with eight industry-specific versions that can be used in the entertainment, finance, legal, and healthcare sectors
- HuggingFace H4 creates Zephyr 7b LLM - 7b parameters trained with Direct Preference Optimization (DPO) which fine tunes models based on human preferences rather than labels or rewards and used other LLM's to rank the outputs then they used a Reinforcement Learning Algorithm to optimise the model based on these rankings and thus avoiding human feedback in training which is costly - seems it may be better than ChatGPT, but tends to be biased and inconsistent
- Nov 2023:
- UK hosts world's 1st AI safety summit to consider the risks of AI
- UK BioBank releases whole genomes of 500,000 people for research
- UC Berkeley publishes paper on “Ring Attention with Blockwise Transformers” to further enhance memory performance of their Aug 2023 paper on Blockwise Transformers to allow long context tokens to be used efficiently 14)
- OpenAI's 1st Developer event:
- ChatGPT now has 100 million weekly active users and 2 million developers and 92% of Fortune 500 companies use ChatGPT services
- announced GPT-4 Turbo:
- 2 versions - text only and text/image version
- cheaper: 1cent/1000 input tokens 3cents/1000 output tokens and 0.765cents to process a 1080×1080 pixel image
- context window increased to 128K tokens (~100,000 words) - 4x more than GPT-4
- knowledge cut off April 2023 instead of Sept 2021
- users can now create their own customised personalised AI assistant versions of ChatGPT without any coding - these can be shared on a soon to be released GPT Store and even monetized based on how many use your version
- new Assistants API for developers to allow developers to embed AI inside their applications
- Code Interpreter - writes and runs Python code in a sandboxed execution environment, can run code iteratively to solve difficult coding problems
- Retrieval tool - sifts through data such as proprietary user data
- Function Calling tool - enables invoking of your defined functions
- released DALLE-3 API (txt2image mdel) - cheaper (4c/image generated) and can use more complex prompts and have text included in the output images
- new txt2speech (TTS) Audio API - high quality natural sounding voices from text with 6 inbuilt voices, costing 1.5c/1000 input characters
- Elon Musk announces xAI Grok-1 LLM in early beta form as only 2 months of training the model
- designed to have humor and sarcasm and has real time knowledge of the world via the X platform (formerly Twitter)
- seems to outperform ChatGTP 3.5 on exams but not ChatGTP 4 … yet, nor is it multi-modal yet
- 25,000 character context window
- to access it you are likely to need to have a Twitter / X account
- GitHub Universe CEO keynote speech:
- GitHub CoPilot AI used by over 37,000 organisations and improves coding speed by 55% and now uses GPT-4
- Hu.ma.ne AI Pin wearable AI device announced which may herald the end of smartphone apps:
- small wearable screenless device that gives the user 24×7 ready access to AI which will determine which apps to access in the back end including live translation, playing music, accessing web information
- features include voice, touch, gesture interactions, and a projected laser ink display, without the need for wake words so it is not always listening in as with Siri, but uses touch, etc to wake
- supports real-world shopping, providing information and transaction options for products
- perpetual power via hot swap-able magnetic battery modules
- captures photos and videos, stores data, and offers recommendations based on user preferences
- managed through the Humane Dot Center, it promises a simplified interaction with data
- $US699 for device and gives unlimited queries, talk, text, and data for $US24 a month via T-mobile
- OpenAI announces just started training GTP-5 using nVidia's H100 chips
- Microsoft announces it will start making its own chips to power Azure, Copilot, and ChatGPT. Maia 100 chips will be used to train models and reduce reliance upon scarce nVidia H100 GPU chips. It is also building Cobalt 100 ARM chips to run Azure and will “power the largest AI supercomputers in the world”. Azure needs 30,000 GPUs to power ChatGPT, and they cost roughly $US25,000 a chip. Azure will host Nvidia’s new generative AI model builder for enterprises, expanding its partnership with Nvidia.
- Forbes reports that a Russian spyware company called Social Links had begun using ChatGPT to conduct sentiment analysis. Social Links showed off its unconventional use of ChatGPT at a security conference in Paris. In a demonstration, the company fed data collected by its own proprietary tool into ChatGPT; the data, which related to online posts about a recent controversy in Spain, was then analyzed by the chatbot, which rated them “as positive, negative or neutral, displaying the results in an interactive graph,” this raises concerns of how AI could escalate the powers of the surveillance industry in general When AI moves into the territory of determining if somebody gets a job, gets housing, or determines whether someone gets undue attention from police or not, that is when those biases (and AI error rates) become, not just a thing to account for, but a reason not to use it in that way. 15)
- a new US lawsuit claims that UnitedHealthcare is using a deeply flawed AI algorithm to “override” doctors judgements when it comes to patients, thus allowing the insurance giant to deny old and ailing patients coverage. 16)
- several weeks after the UK hosted the global AI Summit, UK Govt's minister for AI and intellectual property, Jonathan Camrose, says the UK government will not rush to pass new laws that regulate AI, to avoid hampering innovation and potential financial growth. 17)
- OpenAI begins to implode
- OpenAI's 4 member board (Ilya Sutskever, non-employees Adam D’Angelo (CEO of Quora which runs Poe - a new monetizable customisable GPT service he announced in Oct 2023 which may be threatened by OpenAIs Nov 6 planned offerings and GPT monetorization store), Tasha McCauley, and Helen Toner) suddenly fire CEO Sam Altman allegedly on grounds he had not been candid with the Board - concerns of his Nov 6th announcement to create a consumer platform of monetizable customizable versions of ChatGPT, and perhaps that he was chasing profit and taking too many risks with premature AGI development whereas the Board has an apparent not-for-profit focus and wants to focus on safe AGI development instead of rushed development for profit.
- Greg Brockman and dozens of OpenAI staffers resign and together with Sam Altman become employees of Microsoft to lead a new advanced AI research team.
- Microsoft, which has pumped $16b into OpenAI and runs OpenAi's model on their Azure servers while themselves building upon ChatGTP, say they will also continue their partnership with OpenAI.
- OpenAI's board appoints Emmet Shear, formerly of Twitch, as CEO. Shear is known for being an AI optimist, is against premature regulation of AI, yet fears his estimate of a 5-50% probability of AI-induced doom to humanity.
- According to Reuters on 22nd Nov, it seems several staff researchers wrote to the Board prior to their dismissal of San Altman raising warnings of a powerful AI discovery that they said could threaten humanity. Seems this relates to Project Q* (which presumably relates to Q-learning techniques and tree of thought search for solving hard maths problems)
- The future of OpenAI seemed uncertain esp. as OpenAI employees have threatened to resign en masse with more than 735 out of 770 employees signing a letter demanding that the board resign and Altman be reinstated as CEO otherwise they will go to Microsoft. OpenAI has an agreement in principle for the return of Sam Altman only 4 days after his sacking with new initial board includes Bret Taylor as chair, alongside Larry Summers and Adam D’Angelo. It now appears that the Board's actions were driven by Board member Ms Helen Toner who had recently written a paper criticizing OpenAI for not prioritising safety as much as Anthropic were (Sam A had taken her to task on this prior to his sacking) and she is on the record for stating that if its not good for humanity, then OpenAI should be destroyed. She is also part of the anti-AI, Effective Altruism (EA) club.
- Discord announces in December 2023 it will shut down its OpenAI-based chatbot, Clyde (one of Discord’s many AI-powered bots) which was announced in March 2023 as an experimental feature.
- Uni of Cambridge, UK, announces by using their new concept 3D “spatially embedded recurrent neural networks” (seRNNs) they have created a self-organizing, artificially intelligent system that uses the same tricks as the human brain to solve specific tasks. The seRNNs converged on structural and functional features that are also commonly found in primate cerebral cortices. 18)
- Microsoft publishes their ORCA-2 “Teaching small language models how to reason” using various techniques: step-by-step, recall then generate, recall-reason-generate, direct answer, etc, trained on a expanded, highly tailored synthetic dataset, and even though only 7b and 13b parameter versions, based upon LLaMA-2 7b and 13b versions, it attains performance levels similar to or better than those of models 5-10x larger as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings.
- Inflection AI announces multimodal model Inflection-2 trained on 5000 H100 GPUs taking 6mths and costing $US100m (cheaper than GPT4) and outperforms Googles PaLM-2
- Amazon announces their multimodal AI Image Generator Titan aimed at enterprise customers for enhancing product photos by seamlessly swapping out backgrounds to generate lifestyle images
- each image made with Titan will include an invisible watermark;
- Amazon will indemnify customers for copyright infringement;
- PrivateGPT v2.0 released based upon Mistral 7b Instruct LLM version (4.4Gb to download model) which will run on a powerful laptop
- Google DeepMind GraphCast can forecast 10 day weather predictions in a minute! More accurate and much faster than any previous system such as Numerical Weather Prediction, which combined physics equations with computer algorithms that are run on supercomputers.
- Dec 2023:
- Google's DeepMind AI GNoME predicts 2.2 million new crystal structures, vastly expanding potential materials for advanced technology development
- Databricks launches new tools for building high quality RAG applications to incorporate proprietary company Delta Lake unstructured data into LLM applications
- Meta Launches a Standalone AI Image Generator which uses Emu Edit
- Google formerly announces its new Gemini LLM which will succeed PaLM 2;
- multimodal inputs;
- multilingual, having been trained on more than 100 human languages but will speak only one language at first: English
- one of 1st LLMs to score over 90% on the Massive Multitask Language Understanding benchmark (outperforming human experts) and apparently outperformed every other AI including OpenAI’s GPT-4 model in 30 out of the 32 most popular industry benchmarks
- trained on the AI accelerator chip, the Cloud TPU v5e which has built-in support for leading AI frameworks such as JAX, PyTorch and TensorFlow, along with popular open-source tools like Hugging Face’s Transformers and Accelerate, PyTorch Lightning and Ray
- an intermediate version known as Gemini Pro will begin to power Google’s free chatbot, Bard
- a cut-down version of the AI, Gemini Nano, will appear in Android phones starting with Google’s Pixel 8 Pro phone, to answer complex voice, video, photo and written questions on the phone itself, without the need for an internet connection
- the most advanced version of the AI, known as Gemini Ultra, will not be available until 2024 and is likely to be a pay-per-service Bard version
- would be used to power Google’s software-writing platform, AlphaCode, where it would be able to solve “nearly twice as many problems” as the previous AI
- unfortunately, Gemini-Pro still does not bat GPT-4 Turbo, the LLM leader
- AMD launches their new:
- AMD Instinct MI300X chip “most powerful chip for generative AI” based on new CDNA3 architecture, 17Tbps bandwidth, 2.4x more memory, results in 30-60% better performance than nVidia's H100 chip
- ROCm 6
- IBM releases first-ever 1,000-qubit quantum chip
- EU becomes the first continent to set clear rules for the use of AI, although officials provided scant details on what exactly would make it in the eventual law, which would not take effect until 2025 at the earliest
- Mistral releases Mistral Small “Mixtral” 8x7b 89Gb Mixture-of-Experts LLM model with similar benchmark performances to Gemini 2 but currently needs 4 x nVidia 80Gb VRAM A100 GPUs to run it
- Apple's research team releases a Python Numpy-like MLX machine learning framework (inspired by PyTorch, Jax, etc) where developers can build models which run efficiently on Apple laptops and deep learning model library MLX Data, designed to use shared memory and run on both CPUs and GPUs.
- Vision-Language Planning (ViLa) based upon GPT4-Vision introduced as new robotic task planning method and provides much improved zero-shot tasks and long horizon planning as well as scene-aware planning giving adaptabiity and contextual understanding as well as translating complex language instructions into steps.
- GPT-4 Video generation created with integration with stable diffusion model and LoRA- it can interpret as well as generate rich multimodal content
- Brainware uses adaptive reservoir computation of biological neural networks in a brain organoid hybrid computer of lab-grown brain organoids combined with conventional electric circuits shown to complete tasks such as voice recognition 19)
- Microsoft announces an alliance with the American Federation of Labor and Congress of Industrial Organizations (AFL-CIO) to ensure that AI serves the interests of workers
- Microsoft announces Phi-2 2.7b model
- Microsoft announces Orca 2 7b model trained on synthetic dataset by a LLM as teacher for a small LLM and is “better or comparable” to a 70B parameter Llama 2 on reasoning task
- Microsoft announces its Models as Services technology to allow users to access various models via its online GPUs
- Microsoft announces GTP-4 MedPrompt prompt engineering using vector space of medical knowledge combined with automated chain-of-thought reasoning, and ensembling refinement released (without altering GPT-4's billions of parameters) and betters Google's Gemini Ultra and specialised medical models on medical MCQ exams20)
- Google announces its MedLM healthcare focused LLM and Augmenix app which converts physician-patient conversations into EHR medical documentation in real time
- Google announces its Imagen 2 txt2image technology built on latest diffusion tech with enhanced prompt understanding, improved image-text relationships to provide more contextually relevant images, and creates more realistic human images especially hands and includes an aesthetic model taking into account lighting, etc for better visual appeal, and adds inpainting and outpainting editing, while uses can use flexible style and adds synthID invisible watermarks and safeguards to prevent violent or explicit image generation.
- Kepler K1 humanoid robot
- Tesla announces its Optimus Gen-2 AI autonomous humanoid robot as 16DoF hands/fingers which can move eggs delicately (doesn't move as fast as Boston Dynamics' Atlas but the Atlas does not have hands and fingers)
- OpenAI's Preparedness Team publishes their catastrophic AI risk mitigation framework:
- 1. tracking catastrophic risk level via evaluations according to various categories such as:
- cybersecurity
- chemical, biologic, nuclear and radiological (CBRN) threats
- persuasion
- model autonomy
- 2. seeking out unknown-unknowns
- 3. establishing safety baselines
- 4. tasking the Preparedness Team with on-the-ground work - research, monitoring, evaluations
- 5. creating a cross-functional advisory body including creating a Safety Advisory Group (SAG)
- International Monetary Fund publishes “Scenario planning for an AGI future” 21)
- concept of “Frontier of Automation” - humans have a maximal limit of ability for complex task completion and it is probable that AI electro-mechanical device machines will reach this within 5-20 years and surpass the ability of the far majority of humans in cognitive tasks and will be faster and cheaper than humans
- humans become increasingly productive with usage of AI agents (and with potential real wage increases) UNTIL humans are no longer needed for this productivity and this is when human productivity is rapidly outpaced by AI productivity at lower costs and thus human wages and employment could rapidly fall and be replaced by more reliable, faster, cheaper AI machines
- a new synaptic transistor achieves concurrent memory and information processing functionality to more faithfully mimic the brain, is stable at room temperatures, operates at fast speeds, consumes very little energy and retains stored information even when power is removed. This theoretically removes the need to continuously move data in and out of memory to a CPU. 22)
- Adlink announces their optimised for AI, cExpress-MTL battery powered computer module which integrates CPU, GPU, and NPU all-in-one for optimized performance and efficiency, the module provides up to 8 GPU Xe-cores (128 EUs), an NPU (11pTOPS/8.2eTOPS), and 14 CPU cores at 28W TDP. It thus empowers developers to achieve various graphics- and AI-requiring, battery-powered applications, such as portable medical ultrasound, industrial automation, autonomous driving, and AI robots.
- paper published: Mamba: Linear-Time Sequence Modeling with Selective State Spaces 23) as a competitor of transformer technology in AI models, esp. for longer context needs such as DNA analysis or audio wave analysis - see https://www.youtube.com/watch?v=9dSkvxS2EB0
- Google's DeepMind FunSearch to leverage LLM hallucinations to explore new mathematical solutions
- Google VideoPoet zero-shot txt2Video generator
- Microsoft's TaskWeaver to generate code to perform tasks
- OpenAI's GPT Pilot extension for Visual Studio Code uses natural language prompting and guidance to create code even installing dependencies to create full stack website code
- Apple introduces a multimodal Ferret LLM which is supposedly better than GPT-4 Vision
- Apple leaks it is close to releasing on Apple GPT (work started on this in 2022) to enhance Siri's capabilities
- 1B Tiny LlaMA released - trained on 3 trillion tokens taking 90 days on 16 x A100 40Gb GPUs;
2024 1st half
Jan 2024
- students at Stanford Uni create a low cost mobile extremely dexterous teloperated and autonomous robot (can do amazing autonomous tasks after watching from humans - with 50 observations of humans doing a task it can achieve 90% success rates) which they demonstrated cooking 3 meals and doing housework - Google Aloha and they have made the code open source
- Google publishes “CALM = Composition to Augment Language Models” - scale LLMs on new tasks 24)
- LLaMA Pro: Progressive LLaMA with Transformer Block Expansion released as open source 25)
- Mamba MoE outperforms both Mamba and Transformer-MoE. It reaches performance of Mamba in 2.2x less training steps while preserving the inference performance gains of Mamba against the Transformer; Mamba inference is 5x faster than Transformers - it scales linearly instead of O(N2)
- U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation 26)
- DeepInfra is a startup offering open-source models at very low rates via simple API driving down costs of AI usage for end users
- Open Interpreter allows you to run code using a local LLM via LM Studio 30)
- OpenAI's custom GPT store opens
- Perplexity AI raises $US73.6M in Series B funding to have a valuation of $520M
- 30 employees; now processing 3 million prompts per day;
- built on a proprietary LLM inference infrastructure around NVIDIA's TensorRT-LLM that is served on A100 GPUs provided by AWS 31)
- “Mixtral of Experts” paper 32)
- Microsoft's AutoGen Studio 33)
- python LangGraph library which is an extension of the core Langchain library and can be used to build complex AI agents 34)
- 1x raises $100M in ser.B humanoid robots
- Swarovski Optik introduced the AX Visio 10×32 binoculars - 1st AI-enabled binoculars which can display the name of 9000 species of animals it detects in the binocular view
- Rabbit R1 AI companion device launched
- uses an OS designed around LLM and their new “Large Action Model (LAM)”
- use a separate Rabbit Hole website accessed via a computer to enable actions such as play music from Spotify, book a Uber car, etc
- bidirectional translation
- device is very simple, press button to give a voice command, has a camera to assess surroundings eg. analyse inside fridge to work out a recipe
- $US199 no subscription at this stage but requires a WiFi or 4G LTE sim card to access internet and has Bluetooth 5.0
- Google announces AMIE new medical AI LLM which asks user for more information and outperforms doctors in diagnosis and empathy
- Aust Govt announces plans to mandate safeguards for “high risk AI” such as:
- self-driving vehicle software, tools that predict the likelihood of someone re-offending, or that sift through job applications for an ideal candidate
- begin work with industry on a possible voluntary AI content label, including introducing “watermarks” to help AI content be identified by other software, such as anti-cheating tools used by universities
- Google's DeepMind team says that a new artificial intelligence system has made a major breakthrough and can solve geometry problems at the level of the very top high-school students - until now, geometry has proven particularly difficult for AI systems to work with, but they have used a different approach to build the new system known as AlphaGeometry. They used a language model that was able to train itself by synthesising millions of theorems and their proofs, and then combined that with a system that can search through branching points in challenging problems. The system is able to learn and then solve complex geometrical problems without human input. It solved 25 of 30 problems from the International Mathematical Olympiad which is close to the average for a gold medalist level of 25.9.
- nVidia announce new Blackwell generation B100 GPUs, supposedly 2x faster than H100s
- Meta release 70B version of CodeLlama 70B model trained on 1 trillion tokens in foundation, Python (trained on extra 100B Python tokens) and Instruct versions which allegedly beats GPT4 in coding performance
- Meta will be training LlaMa 3 in 2024 on the equivalent of 600,000 H100 GPUs!
- Prophetic AI announces Morpheus 1 - world 1st multi-modal ultrasonic transformer brain stimulator designed to induce and stabilise and modulate lucid dreams in REM sleep using 3D targeted sonic streaming and acoustic holography and simultaneous EEG outputs via a $US2000 Halo headband (which you can pre-order) which detects REM sleep and fires trans-cranial ultrasonic waves. It is trained on EEG and fMRI lucid dreaming data. This has a potential to develop conscious experiences.
- Eagle-7B foundational free LLM (requires further fine-tuning for use cases) released trained on 1.1 trillion tokens across over 100 languages (70% English, 15% code, 15% multi-lingual) using a PCN-type RNN RWKV-v5 attention-free linear scaled transformer architecture (uses much less RAM to train) and out-performs other 7B models on multi-lingual benchmarks
- Depth Anything ControlNet using AI trained model to ascertain relative distances from camera in images which can then be used to create depth of field modifications and highight the foreground subject - TikTok is embedding this tech in their video edit app but can be used in ComfyUI or Autogen for still images
- companies creating major AI solutions now need to send their safety testing to the US Govt for review.
- new graphene-silicon carbide material “semiconductor epigraphene” developed to create much faster computer chips (terahertz frequencies) with far better heat dissipation than current silicon technologies
- Jim Fan from nVidia announces their new “Foundation Agent” “MetaMorph” which continually learns new skills and can control a large variety of robots and their reality simulation environment IsaacSim to better train robots for the real world such as learning 10 years of martial arts training in only 3 days 39)
- paper publshed: Yann LeCun, Meta and NYU - “Objective-Driven AI: Towards Machines that can Learn, Reason, and Plan”
- Demonstrate-Search-Predict (DSP) 40) based on Oct 2023 paper: DSPy: Compiling Declarative Language Model Calls Into Self-Improving Pipelines 41) which provides composable and declarative modules for instructing LMs in a familiar Pythonic syntax - the DSPy compiler will internally trace your program and then craft high-quality prompts for large LMs (or train automatic finetunes for small LMs) to teach them the steps of your task
Feb 2024
- 2024 is likely to be the year of AI Agents
- Perplexity.ai becomes more prominent and now provides a nice interface to various LLMs combining search and a range of very useful options to create more tailored responses and far better search results than either GPT, Bing AI or Google Search, and without sponsored ads coming up and contaminating the search results
- OpenAI releases results of a study showing that GPT-4 “only” increased biology experts ability to plan to create a biological weapon by 8.8% compare with information available via the internet
- Defog SQLCoder 70b version 93% accurate in creating SQL queries (34b version 84%, gpt-4 82%) - fine-tuned on a base CodeLlama model
- Cody.dev is a coding AI assistant which can be run as a VS Code extension and uses GPT-4 API by default, or you can use a local model
- continue.dev open-source extension for VS Code and JetBrains and allows you to use ChatGPT or local LLM to write code
- Miqu-70 - Mistral has re-trained an old quantized Mistral model with Llama-70B to create this 80 layer 70b version
- Jan.ai is a free open source app that runs a chatbot using any LLM locally - see https://www.youtube.com/watch?v=zkafOIyQM8s
- Mamba Hermes 2.8B - open-source mamba model fine-tuned on OpenHermes dataset composed of 242,000 entries of primarily GPT-4 generated data
- ColBERT text embeddings Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. Designed to scale to large document collections and can handle millions of docs efficiently. It uses Efficient indexing & retrieval (top-k most similar documents). It is good for ad-hoc retrieval, but not as good for question answering, summarization, and text classification.
- Amazon releases their Rufus AI ChatBot
- Galileo AI released for rapidly creating web page user interface designs
- paper published outlining LLMs creating self-discovering reasoning structures 42)
- Google team releases Lumiere AI Space-Time Diffusion Model Video Generator trained on 30 million videos - text2video, image2video (animate still images), transfer style to videos using a still image as source style or text, create cinemagraphs (animate a section of a still image), video inpainting of an area using text
- Google renames BARD AI as Gemini Free and introduces Google Gemini Advanced running on multi-modal LLM Gemini Ultra 1.0 as part of the Google One AI Premium Plan at $US20/mth (also includes 2TB of storage)
- faster than GPT-4 and similar speed as Perplexity but Perplexity offers more features
- Gemini Ultra is the first model to outperform human experts on the Massive Multitask Language Understanding (MMLU) benchmark, achieving a score above 90%. This benchmark tests knowledge and problem-solving abilities across 57 subjects, including STEM and humanities.
- Apple team published paper on specialized language models with cheap inference by using options such as hyper-networks, mixture of experts, distillation, document custering43)
- Apple releases open source multimodal AI model for image editing “MLLM-Guided Image Editing (MGIE)” 44)
- Google Gemini Multi-Lingual support for ~40 languages
- OpenAI 1X AI fully autonomous robot
- Dair.ai - Democratizing Artificial Intelligence Research, Education, and Technologies
- Alibaba releases Qwen 1.5 open source LLMs up to 72B parameters
- Semantic Router AI Framework is a superfast decision layer for your LLMs and agents that integrates with LangChain, improves RAG, and supports OpenAI and Cohere.
- Sam Altman rumoured to seek $7trillion funding from Middle East to build new AI chips
- Microsoft publishes paper on their new “Interactive Agent Foundation Model” as a path to AGI in robots with embodiment 45)
- nVidia releases 35Gb demo version of local private “Chat with RTX” for Win 11 computers with nVidia RTX series 30 or 40 GPUs with at least 8Gb VRAM and allows using a TensorRT-LLM RAG and either Llama or Mistral LLM to analyze your local files - it can search a given file path and can pull text from .txt, .pdf, .doc, .docx, and .xml files and it can search for information if given a Youtube URL
- StabilityAI releases a new type of text2image AI which thy call Stability Cascade which is built upon the Würstchen architecture and introduces a new 3 stage approach and highly compressed 24×24 pixel latent images and allows easier fine tuning on consumer hardware 46)
- UC Berkeley publishes paper on their Large World Model (LWM) combining vast amounts of of video and text analysis using Ring Attention - multiple GPUs are arranged in a virtual ring with each handling a segment of the input sequence. Key value pairs are passed sequentially around the ring. Linear scaling to 100 million tokens instead of quadratic scaling 47) Releases 1M token open source 7B model “LWM”.48)
- Google releases Gemini v1.5 Pro multi-modal LLM with a transformer + MoE architecture and a 128,000 token context window (selected users can try a 1 million token context window via AI Studio and Vertex AI, and potential to extend to 10 million tokens for research) and performs at a level of Gemini 1.0 Ultra but has near-perfect “needle recall” - ability to find a bit of data amongst millions of token data.
- Meta publishes their V-JEPA tech on github for feature prediction from video by embedding video segments into a feature space - ie. inpainting for video - using vision transformers
- OpenAI releases Sora text2video en d-to-end diffusion transformer model which can generate realistic 1 minute videos (albeit with flaws at times) but is in effect, a world simulator with a data driven physics engine. The simulator learns intricate rendering, intuitive physics (machine learned via training on thousands of videos many perhaps generated by Unreal Engine 5 with text labels), long-horizon reasoning, and semantic grounding. It initially needs to perform a txt to 3D process to generate base subjects which are then animated then adding fluid dynamics, photorealism. This tech is likely to revolutionize video games by providing real-world simulation and may be a pathway to global simulation.
- Chinese manufacturer Huawei is ramping up manufacture of its own AI GPU chips (eg. 910B) in response to US - China restrictions on the export of AI chips.
- chip design requires EDA Tools software which is currently primarily provided by US companies (Synopsis, and Cadence) and subject to partial export restrictions (sub-3nm designs are banned).
- Chinese advanced GPU chip manufacturing capacity is currently limited and GPU AI software stack akin to nVidia's CUDA technology also needs to be built which is an additional challenge
- other Chinese GPU manufacturers included Biren (BR100 GPU) and Moore Threads (S4000 GPU)
- nVidia's market cap exceeds Amazon at $US1.83 trillion
- Groq creates new LLM AI inference world first “Language Processing Unit (LPU)” chip
- AI system reliably detect male vs female functional MRI brain scans suggesting there is a sex determining brain structural differences but which were not previously able to be detected
- Google releases lightweight open source LLM models “Gemma” 2B and 7B (34Gb) parameter versions built using similar tech as their Gemini models, trained on 6 trillion tokens but seems to have poor performance with poor spelling, grammar, reasoning, logic and coding
- Mistral AI announces Mistral Next LLM which seems to have much improved logic and reasoning
- Phind 70B LLM released and seems to be a better coding LLM than GPT-4 Turbo
- Stability AI announce new version of their open source txt2image AI Stable Diffusion 3 which now has flow matching, will use diffusion transformer architecture (similar to OpenAI's Sora) rather than U-Net architecture. in the meantime, Stable Diffusion is embroiled in a lawsuit with Getty Images over training data with the photo agency alleging that the AI company used 12 million of its photos unlawfully.
- Despite their critical competitiveness in the race to AI dominance, US and China finally agree to map out framework for developing AI responsibly. In 2023, a report from the Australian Strategic Policy Institute found China was beating the US in 37 of 44 technologies likely to propel innovation, growth and military power. The Biden administration has taken drastic steps to slow China's AI development - It has passed laws to restrict China's access to critical technology, and is also spending more than $US200 billion to regain its lead in manufacturing semiconductor chips.
- Google announces Genie that can convert an image into a playable action-controllable world model
- Google takes its Gemini model offline as it seemingly failed to produce any images of white people when given various prompts and its text responses were often dubious
- MIstral announces Mistral Large LLM which approaches GPT-4 capabilities but is also natively fluent in English, French, Spanish, German, and Italian with 32K token window, and Mistral now partners with Microsoft to provide this on Azure. It also announced plans to release Mistral Small which outperforms Mixtral 8x7B and has lower latency. 49)
- Pika Labs announces their lip sync image2video
- https://invideo.io - Create videos with text prompts
- Andrej Karpathy (who has just left OpenAI again) publishes an excellent Youtube video explaining Tokenizers in LLMs and how to create them
- Alibaba researchers have developed a new AI system called Emote Portrait Alive (EMO) that animates portraits with fluid and expressive talking and singing motions generating expressive singing or talking portrait videos with a direct Audio2Video diffusion model under weak conditions - not just lip sync - facial expression and face/body moves just by inputting a single reference image of a person and the vocal audio (talking or singing) - length of videos generated is dependent upon the length of the audio source. Audio signals are rich in data about facial expression and their system utilises this to match facial expressions to the audio via use of a speed controller and a face region controller to enhance stability during the generating process, while using a new FrameEncoding module based on ReferenceNet to preserve the character's identity throughout the video. It was trained on 250 hours of video footage and 150 million images including speeches and singing performances in multiple languages including Chinese and English. Demo videos: https://www.youtube.com/watch?v=VlJ71kzcn9Y, https://humanaigc.github.io/emote-portrait-alive/ 50)
- robotics company Figure.ai raises $US675m at a $US2.6b valuation
- nVidia's text2action Foundation Agent and ISAC Sim, a platform that accelerates physical training of AI agents a thousand fold. Voyager project in Minecraft showed the agent's capacity for continuous learning and self-improvement engaging in a game with over 140 million active players.
- Klarna AI chatbot assistant has 2.3m conversations in 1month replacing 700 full time jobs
- Sensei - a simple, powerful, minimal codebase to generate synthetic data using OpenAI or MistralAI 51)
- Microsoft introduces Copilot AI chatbot for finance workers in Excel and Outlook 52)
- Apple has shut down their cars program, and moved resources (~2,000 people) to generative AI division
- AI outperforms doctors in summarizing health records 53)
- Google Testing AI Tool That Finds & Rewrites Quality Content but appears to resemble a programmatic way to plagiarize content called article spinning 54)
- Glean - a startup raised $200 Mln at 2.2 Bln valuation - provides AI-powered work assistant (using RAG)
- Amazon Rufus Chatbot - AI powered, inside Amazon Shopping app for Android and iOS
Mar 2024
- Elon Musk files lawsuit against the individuals in OpenAI on various grounds including breach of contract, promissory estoppel, breach of fiduciary duty and unfair competition also along concerns they are trying to achieve AGI and that this is a potential existential threat to humanity - it is probable that he just wants to force disclosure of what OpenAI has developed behind the scenes and not publishing. Musk was the main funding source of early OpenAI as it's goal was to be not for profit (hence his donation was tax deductible at a rate of 50% but he would not get access to any profits) and open source to compete with the leader in AI at the time which was Google - a closed profit-based company which Musk at the time believed would result in a dangerous, un-competitive environment if it were to develop AGI first. In 2023, however, OpenAI transitioned into a closed profit-based company out of alignment with its original Founding Agreement and this makes it dangerous in Musk's view. Musk stopped financing OpenAI in Sept 2020 when OpenAI made commercial deals with Microsoft although their licences only provided for pre-AGI tech and presumably it was up to OpenAI to declare when they had reached AGI level and terminate Microsoft's licence. The design of GPT-4 which is near AGI level has been kept secret with no scientific publications about it, while there is no information released on its presumably more powerful Q* model which is in development. Musk believes the new OpenAI board put in place in Nov 2023 do not have the expertise to independently ascertain when AGI has been achieved. Furthermore, OpenAI's for profit arm is now worth $US80b which has ramifications for tax laws for those who had been able to claim a tax deduction but maintain an investment. MUsk is also claiming Altman used OpenAI's moneys to fund other interests of which he has financial interests creating conflicts of interest. Musk is also wanting the courts to determine that Q* etc are AGI and thus outside of Microsoft's licence. This would require defining AGI.
- OpenAI announces collaboration with @Figure_robot to expand multi-modal models to robot perception, reasoning, and interaction, and accelerate robot ability to reason 55)
- Sanctuary.ai's Phoenix Robot overtakes Tesla's Optimus for autonomous speed
- China's Unitree Robotics announce their H1 Robot which sets record for walking speed humanoid robot and it is very stable against being pushed or pulled or on uneven ground, can lift at least 30kg and seems to have a price tag of $US90,000-150,000 at present , and the more expensive industrial B2 dog-like robot, and its cheap Go2 consumer version at $1600 - see https://www.youtube.com/watch?v=WWAnJX889j0
- Adobe announces their Project Music GenAI Control tx2Music AI.
- Anthropic releases Claude 3 200K context window LLM - Haiku, Sonnet, Opus versions with Opus being close to GPT-4 Turbo level and scores an IQ score of 101 (ChatGPT4 is 85)
- Google starts Search Generative Experience (SGE) as experimental AI feature of Google search - this may adversely impact websites that rely on user hits for income as the search will use an AI summary without referencing where it derived its data
- Morris II AI worm developed by Israeli researchers which infects AI-enabled email clients and steals confidential data by using adversarial self-replicating prompts
- AI generated fake IDs now being sold online for only $15
- Microsoft GLAN (Generalized Instruction Tuning) which uses pre-curated taxonomy of human knowledge and generates a comprehensive list of subjects for every discipline to fine tune LLMs and allow for easy customisation and new fields can be added
- India now requires approval for certain AI deployments by the Ministry of Electronics and Information Technology (MeitY).
- AI has contributed to a significant decline in online visits of conventional media compared with 1 year ago: 16-17% decline for most main news services such as Fox, CNN, Bloomberg, CNBC, Washington Post while BBC has a 8% decline
- Text Humanizers - https://humbot.ai - humanize AI text into authentic and original content undetectable by most AI content detectors
- paper published on HyperAttention: Long-context Attention in Near-Linear Time 56)
- rapid 3D image generation from a photo within seconds - https://www.tripo3d.ai which uses Stability AI
- research paper published on Stable Diffusion 3 57)
- VTG-GPT - find specific segments in videos using natural language queries 58)
- Snowflake Cortex on Amazon Cloud now uses Mistral Large LLM to analyze text data and build AI applications (including RAG, vector search - using Python, serverless functions, Streamlit, data pipelines).60)
- Inflection 2.5 - available to all Pi users. Trained using 22,000 NVIDIA H100 GPUs. Approaches GPT-4; the “Pi” app now has world-class real-time web search capabilities
- can fine-tune a 70B LLM at home using two 24GbVRAM Nvidia 3090 GPUs 61)
- Cognition Labs AI announces their Devin - a 1st fully autonomous AI software engineer - it solves software engineering issues via its own shell, code editor and web browser
- Korea Advanced Institute of Science and Technology (KAIST) develope the “Complementary-Transformer” (C-Transformer) AI chip - ultra-low power AI accelerator capable of processing LLMs. It is using 625 times less power, and is 41 times smaller than Nvidia's A100 GPU.
- European Union new “AI office” at The European Commission - to serve as the epicenter of AI expertise, support governance structures, identify AI safety risks and develop respective policies to ensure safety.
- paper published on Chain-of-Abstraction Reasoning Efficient Tool Use with Chain-of-Abstraction Reasoning 64)
- LLMs can be jailbreaked (ie. remove alignment restrictions) using morse code or ASCII Art
- Google extracts 50 token lengths (eg. a real email and phone number) of ChatGPT’s Training Data by asking the model to “Repeat the word “poem” forever”
- nVidia announces its BioNeMo Framework for drug discovery
- Optimum-NVIDIA is a new Hugging Face inference library instead of “transformers” module and has a new float8 format (FP8) which allows you to run a bigger model on a single GPU, at faster speeds, and without sacrificing accuracy and can do 28x faster inference (up to 1,200 tokens/second) and is supported on 4090, L40S, and H100 Tensor Core GPUs as well as Turing, Ampere (A100, …), Hopper (H100, …), Ada-Lovelace (A6000, …) GPUs
- nVidia announce their:
- Blackwell GPU chip technology with 208b transisters, 10Tb/sec transfers between each side of the chip, 2nd gen transformer engine
- NVLink Switch Chip with 50b transistors, 4 NVLink switches at 1.8Tb/sec and 7.2Tb/sec full duplex bandwidth, 3.6TFLOPS FP8
- NeMo Microservices
- and more … see https://www.youtube.com/watch?v=y4qUEBlgU_w
- OpenAI demonstrates Figure 01 robot now with LLM to utilise vision, hearing, provide learned (not teleoperated and not pre-scripted) multi-tasking speech and reasoned actions
- Elon Musk's team releases its raw 314b MoE model, Grok-1 LLM accessible via X (Twitter)
- Microsoft hires DeepMind cofounder Mustafa Suleyman to lead its consumer AI division with Karén Simonyan joining as chief scientist
- nVidia partners with Hippocratic AI to produce digital AI video agent for nurse help line calls at $US9/hr cost which could replace nurses for such video calls and are said to out-perform human nurses on bedside manner, education and only narrowly misses on patient satisfaction measure
- Open Interpreter releases 01 Light AI open source AI personal device which provides ESP32-based remote voice control of your computer via an 01 Light server which could be your desktop computer running Ubuntu (via ngrok online server, or a fully local version using Whisper and Rust) - it is easy to teach the AI how to do tasks on your computer that it does not already know - just talk through each step via the device. 01 software should be able to run on any device with input (microphone, keyboard, etc.), output (speakers, screens, motors, etc.), and an internet connection (or sufficient compute to run everything locally). 01 exposes a speech-to-speech websocket at localhost:10001. 01 Heavy is a future device which will run everything locally. This was inspired in part by Andrej Karpathy's LLM OS, and thus run a code-interpreting language model and call it when certain events occur at your computer's kernel using a new LMC Messages format which extends OpenAI’s messages format to include a “computer” role. They are planning to create a react-native app for smartphones which presumably will avoid the need for a separate hand held device such as the 01 Light.
- 1st paralysed patient with a brain implant now can play chess - Elon Musk's Neuralink
- Microsoft AutoDev - a fully automated AI-driven software development framework. You tell it what to do - and it builds software. It uses multiple Agents running in a docker container to do coding, file manipulation, testing, git operations, etc. Not to be confused with Chinese AutoDev on github by unit-mesh. 65)
- Apple announce a family of multimodal LLMs, Apple MM1 66)
- Google in deals to put its Gemini AI onto Samsung and Apple smartphones
- RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners - a new method for LLMs to self-rank their responses without additional resources 67)
- Mesa KPU (Knowledge Processing Unit) by https://maisa.ai claims to achieve 100% on multi-step arithmetic problems (vs 4% for GPT-4) 68)
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (Stanford) 69)
- Meta's Video Joint Embedding Predictive Architecture (V-JEPA) model 70) - Yann LeCun has proposed the original Joint Embedding Predictive Architectures (JEPA) in 2022.
- Contextual.ai introduces Rag 2.0 - which pretrains, fine-tunes, and aligns all components as a single integrated system, backpropagating through both the language model and the retriever to maximize performance
- OpenAI's Voice Engine AI using 15secs of input voice and then can translate into languages using the same voice
- Amazon invests an additional $2.75b in Anthropic which uses AWS Bedrock cloud computing using AWS Inferentia AI chips as well as nVidia GPUs
- Amazon Web Services (AWS) purchases Talen Energy's Pennsylvania data center site with access to nuclear power for $650m and apparently has plans to develop up into a 960MW data center in 120MW stages
- Talen began building the site in 2021 and completed a 48MW hyperscale facility in January 2023.
- Alongside the data centre campus, Talen also built a crypto-mining facility that it is not included in the deal with Amazon.
- The nuclear power station is the third largest in the US and can provide 2.5GW of power.
- Microsoft and OpenAI rumoured to have plans to build at ~$100b “Project Stargate” AI supercomputer and data center with millions of AI chips (perhaps requiring up to 5 gigawatts of power to run) staged over the next 6 yrs in 5 stages presumably with aim to get to super-intelligence by 2028. MS are also investing in nuclear power and nuclear fusion tech company, Helion, as they are going to need a lot of power to run this.
- Founder and CE0 of Stability.ai resigns (saying he wants to pursue decentralized AI) following on from 3 out of their 5 lead researchers resigning in the face of million dollar burn rates raising the prospect that further development of txt2image app, Stable Diffusion may be delayed or not come to fruition.
- DBRX - new open-source LLM from Databricks' Mosaic team - better than Llama-2, Grok, and Mixtral, 32K context, 132B parameters, 16 MoE, Rotary Position Encodings (RoPE), Gated Linear Units (GLU), Grouped Query Attention (GQA), GPT-4 tokenizer, BUT needs 4 80GB GPUs to run inference with 16-bit precision!!
- AI21 Labs just announced Jamba - an open-source and improved Mamba hybrid transformer/SSM (State Space Model which reduces quadratic complexity), MoE with 52B parameters, 256K context length, similar performance to Mixtral.
- MS team publish paper SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series 71)
- Suno v3 now creates very listenable quality AI generated songs - see also examples by Wes Roth: https://www.youtube.com/watch?v=I7KFLdxWlqI
Apr 2024
- 1st music video created with OpenAI's Sora: https://www.youtube.com/watch?v=f75eoFyo9ns
- Lambda Reserved Cloud invests further $500m on nVidia GH200 Grace Hopper superchips each with 576Gb coherent memory which creates a single, unified memory space accessible by both the CPU and GPU so that both the CPU and GPU can see and operate on the same data without the need for explicit memory transfers. Similar to Apple Silicone Unified Memory on their laptops.
- Apple has developed a new system called ReALM (Reference Resolution As Language Modeling) that improves Siri's ability to understand context by considering on-screen, conversational, and background entities. It outperforms ChatGPT 4.0 in some benchmarks. 72)
- OpenUI - describe UI - and it gets automatically created using HTML, React, Svelte, Web Components, etc 73)
- Microsoft “Generative AI for Beginners” v.2 course 74)
- OpenAI now provides ChatGPT for free without the need to sign-up
- Applicant Tracking System (ATS) software scans and analyses your resume or application and rejects it if it is not a good match 77)
- Stability AI just released Stable Audio 2.0 - create high-quality songs up to 3 min from a single text prompt 80)
- Zapier's automated workflows now can also use AI bot “Advanced Automation”
- Intel claims that their new Gaudi 3 AI Accelerator Chips are better than Nvidia H100s in speed and cost and comparable with Nvidia Blackwell 83)
- Meta Training and Inference Accelerator 84)
- Microsoft Azure AI chips
- Google new TPU v5 chip - The new TPU v5p chip is built to run in pods of 8,960 chips, and can double the performance of the prior generation of TPUs. The Axion chip offers 30% better performance than “general-purpose Arm chips, and 50% better performance than current generation x86 chips produced by Intel 85)
- Mistral releases a bigger Mixtral LLM model - Mixtral 8x22B - 3x bigger than Mixtral8x7B, 65K token length, 176B parameters, 261Gb download, needs 73Gb VRAM to run a 4bit version, 260Gb VRAM to run fp16 version
- Former Apple Chief Design Officer Jony Ive and OpenAI CEO Sam Altman have raised ~$1Bln from Masayoshi Son’s Softbank to design the new kind of smartphone device 86)
- OpenAI developed its Whisper audio transcription model and used it to transcribe over a million hours of YouTube videos to train GPT-4. This may result in legal challenges from Google if this type of usage is outside the “Fair Use” rules. 87)
- Elon Musk Announces Tesla's Robotaxi Unveiling in August
- Google’s Imagen 2 can now create 4 sec animated images
- OpenAI GPT-4 Turbo with Vision available to developers. It allows to combine text and images in one model. Still 128K-token window, and still knowledge cutoff Dec 2023
- tutorial: Preprocessing Unstructured Data for LLM Applications by DeepLearning.ai (Andrew Ng) 90)
- Amazon adds AI expert Andrew Ng to board
- a new Massive Multilingual Dataset for High-Performance Language Technologies (HPLT): 75 languages with ~5.6 trillion word tokens and 18 English-centric parallel language pairs 91)
- Cohere Rerank 3: semantic re-ranking to improve RAG. Receives output of search, reranks it - and sends the top items to LLM for generating output. Available via API on Cohere and AWS, low latency (less than 3 sec). 92)
- Google DeepMind releases paper on new RNNs - Griffin & Hawk - faster than Mamba, using gated linear recurrences, matches Llama-2’s performance with significantly less training data, Griffin is scaled to 14B parameters. Both models are as hardware efficient as Transformers, but offer lower latency and higher throughput during inference. 93)
- Meeno - an AI powered online service providing “advice for everyday relationships”
- Ollama supports embedding models 94)
- AIOS = AI OS (AI Operating System for LLMs); open source; 95)
- Google DeepMind releases paper on their new Chess AI which was developed totally differently from its prior chess champion Alpha Zero (albeit not as good) without using its complex heuristics, forward looking brute force search algorithms or playing against itself - this one was just trained on 15b chess board positions and ONLY the one next subsequent move made by a Stockfish 16 chess engine from 10 million games without ever playing a full game to the end - the neural network model has 270million parameters and at this scale achieves Grand Master level chess playing capability! The relevance is not that it can play chess that well but that a transformer model with adequate training scale can learn and generate algorithms to simulate reality and rules just by watching reality - and as such, this can be utilised for other specific purpose skill training 96)
- embattled AI company Stability.ai releases API versions of its new version 3 text2Image product Stable diffusion with open source weights to be released to “members” soon so they can use it as a local model - in the meantime seems the company is owing some big debts to AWS and as noted above has lost their founder CEO.
- Google announces they are activating its “Find my phone” tech on Android phones - this is similar to Apple's version, and, these, like Amazon's Ring and Alexa - all use peer-2-peer mesh networks powered by long range Bluetooth LE without need for the devices to be turned on, have a sim card or an internet connection and BlueTooth LE can send bidirectional encrypted messages to and from any device with it and potentially operate that device's camera, microphone, location info or AI chip to gather local information with a geo-location accuracy of only inches. In effect, the infrastructure for a SkyNet is now active across the world and this has many potential implications, not only for personal privacy and contact tracing.
- Google team publishes paper on Infini attention using novel compressive memory solution between neural network segments for infinite context lengths 97)
- Google team publishes paper on a new transformer model architecture: Transformer FAM (Feedback Attention Memory) which enables attention to both homogeneous sequence data and latent representations via a feedback loop by adding sliding block attention to Ring attention and memory integration deeper in Attention mechanism of the Infini attention model.98)
- Meta announces Llama 3 in 8B (knowledge cut off March 2023) and 70B parameter (knowledge cut off Dec 2023) 8K context length models hosted on https://www.meta.ai and handles multi-step tasking and txt2image, trained n 24,000 GU clusters using 15 trillion tokens of data (7x larger than Llama 2 and includes 4x more software code training) and it can be downloaded to LM Studio. the 70B version has comparable performance to Gemini 1.5 Pro and Claude 3 Sonnet while the 8B model is far better performances than Gemma 7B It or Mistral 7B Instruct models, esp. on maths scores. These are also available on Perplexity playground function. It has also been added as AI chat functions with Facebook, WhatsApp, Messenger, Instagram. 400B version coming soon with longer context.
- Limitless.ai announces their $US99 100hr battery “Pendant” AI pin (available Aug 2024) which records continuously based on their Rewind AI app which continuously records everything you do on your computer privately so you can find this again later. It generates speech to text note taking via its microphone and Bluetooth LE and WiFi connectivity. It has a novel “consent mode” which means it will only capture the voices of those who have given verbal consent for it to capture then uses voice identification to ascertain who is speaking. It does not require a subscription however there is a Pro subscription for better and “unlimited” AI access as well as encrypted cloud storage of the recordings. Unlike the $US699 AI pin whose battery life is only 2hrs, it does not have a camera or a laser projector. Not yet clear which LM it will use. It can be paired with iPhone or Android but does not require it (so perhaps it can also use Bluetooth peer2peer mesh and sends data via any phone in proximity?)
- converts your conversations automatically to text, re-organises it into sectional paragraphs such as “key points” and stores it (if consented)
- plans for HIPAA compliance to allow this to be performed in clinical settings when doctors take histories from patients and it will automatically generate the notes
- plans for you to be able to verbally ask AI just as you would now with Perplexity app on your phone.
- plans to add support for AI agents which can perform tasks - eg. on your remote personal computer as for the Rabbit A1 device?
- plans to develop additional hardware options
- Grok 1.5 Vision model released by the Elon Musk team and can now process visual information, including documents, diagrams, charts, screenshots, and photographs
- Microsoft unveils a research demonstration of its VASA-1 AI lip-syncing AI which like Alibaba's AI, can create a video of someone speaking or singing from an image portrait and a vocal sound file although they still have detectable artefacts
- D-Id also announces a lip-syncing AI and can create a talking avatar within 1 minute- https://studio.d-id.com/agents/create
- Boston Dynamics' New Atlas Robot - https://www.youtube.com/watch?v=efebwb2DW3w
- Andrej Karpathy - his llm.c 99) is now as fast as PyTorch “llm.c is now down to 26.2ms/iteration, exactly matching PyTorch (tf32 forward pass). ”
- Sam Altman: “GPT-4 now significantly smarter and more pleasant to use”
- Gemini 1.5 Pro now understands audio, uses unlimited files, acts on your commands, has JSON mode.
- DeepLearning.ai - new online tutorial: Quantization Fundamentals
- US government now requires all the agencies of federal government to name Chief AI Officers. The complexity and pace of AI system development make it nearly impossible for existing executives to master while still focusing on their core responsibilities.
- NSA publishes guidance for strengthening AI system security - best practices for deploying secure and resilient AI systems 100)
- ISO/IEC 42001 - an international standard that specifies requirements for establishing, implementing, maintaining, and continually improving an Artificial Intelligence Management System (AIMS) within organizations. It is designed for entities providing or utilizing AI-based products or services, ensuring responsible development and use of AI systems. 101)
- Mistral AI has been speaking to investors about raising several hundred million dollars at a valuation of $5 Billion
- Training costs for frontier AI models like OpenAI’s GPT-4 and Google’s Gemini Ultra have reached up to $191 Mln
- U.S. Air Force: an AI controlled jet has successfully out-flown a human pilot in a simulated dogfight. 102)
- The NVIDIA Parakeet automatic speech recognition (ASR) family of models and the NVIDIA Canary multilingual, multitask ASR and translation model currently lead their genres.
- Elon Musk: “Legacy media simply can’t compete with hundreds of millions of humans providing real-time, AI-assisted, interactive information”
- Eric Hartford is working on fine-tuning Llama3 to make it uncensored using Dolphin dataset 103)
- can now run Llama3-70B on your phone using HuggingChat smartphone LLM app
- Microsoft Phi-3: is an open LLM that fits on your phone 104)
- Snowflake Arctic LLM for Enterprises open source 480B Dense-MoE - 10B dense transformer with a 128 x 3.66B MoE MLP, 4K context length 105)
- Apple OpenELM (Open-source Efficient Language Models)-3 designed to run on-device rather than through cloud servers 106)
- SenseNova 5.0, an update of Hong Kong based SenseTime’s first LLM introduced a year ago, has “met or exceeded GPT-4 Turbo … On All Benchmarks ”
- Elon Musk - “training Grok 3 will require 100K Nvidia H100 GPUs”
- AWS to acquire 20K B200 GPUs for a 27T param. model.
- Fineweb = 15 Trillion Tokens dataset 107)
- FlowMind: automatic workflow generation with LLMs 108)
- Baidu launches three innovative AI development toolkits at Create 2024, allowing anyone to become a developer through natural language programming, with tools like AgentBuilder, AppBuilder and ModelBuilder.
- Perplexity is raising $250M+ at a $2.5B-$3B valuation for its AI search business
- The Instruction Hierarchy: training LLMs to prioritize privileged instructions 109)
- OpenAI has introduced more enterprise-grade features for API customers, including enhanced security, administrative controls, new Assistants API capabilities, and tools to help better manage costs
- Rabbit R1 device is now available
- Akuru's i-Scribe clinical AI dictation translates over 137 languages into English text
- first DGX H200 in the world hand-delivered to OpenAI - the only AI supercomputer that offers a shared memory space of 19.5TB across 32 Grace Hopper Superchips, providing developers with over 30X more fast-access memory to build massive models
- The U.S. Federal Trade Commission (FTC) has issued a significant ruling that bans the enforcement of noncompete clauses in employment contracts. This would potentially allow developers to move between companies and increase development speeds by more broadly distributing expertise across the industry.
- artificial neuron created using salt and water - an iontronic memristor which 'remembers' how much electrical charge has previously flowed through it, bringing us closer to generating artificial systems capable of mimicking the superpowers of the human brain.110)
- unsupervised multi-shot in-context learning (ICL+) combined with large context LLMs allows much better dynamic new learning without extensive fine tuning or RAG methods and can provide higher accuracy (depending upon domain and task required) when thousands of examples are provided in the prompt as the LLM can re-calibrate its output according to the prompt examples 111)
- Australia signs deal worth almost $1b with PsiQuantum to build world's first large “fault-tolerant” quantum computer, free from the errors and instabilities - PsiQuantum hopes to build an error-corrected computer by 2029.
- Chinese Astribot S1 AI fully autonomous humanoid robot can do most household tasks including cooking, opening and decanting wine bottles and some clever stuff without telo-operation - see https://www.youtube.com/watch?v=AePEcHIIk9s, it has max speed 10m/s, max. payload 10kg/arm, 7 DoF per arm, pose repeatability +/- 0.03mm (humans have +/- 1mm and max. speed 7m/s and 7 DoF per arm)
- ChatGPT now has permanent memory feature (to remember your preferences). You can selectively manage it.
- internal System Prompts for GPT , Gemini, and Claude have been leaked - see https://levelup.gitconnected.com/inside-the-leaked-system-prompts-of-gpt-4-gemini-1-5-claude-3-and-more-4ecb3d22b447
- Amazon Q Business is a generative AI–powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive. 112)
- 8 major U.S. newspapers sue OpenAI & Microsoft. They are accusing of illegally using their copyrighted articles to train AI chatbots like ChatGPT and Copilot.113)
- mistral.rs - very fast LLM inference Python platform on a variety of devices 114)
- Meta publishes Multi-token Prediction - “at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk” 117)
- Llama3 is available on AWS Bedrock (both 8b & 70b models), Azure, GCP, Huggingface, Groq - 70B nearly matches Claude Sonnet with more conversational outputs but not as strong on complex prompts or domain knowledge
- Google Deep Mind's Med-Gemini multimodal LLMs - a family of Gemini models fine-tuned for medical tasks. Achieves SOTA on 10 of the 14 benchmarks, spanning text, multimodal & long-context applications. Surpasses GPT-4 on all benchmarks. Can interpret CXRs and do needle-in-a-haystack searches of EHRs 118)
- Dilated Attention - a technique for scaling to 1 Billion token context-window. Inspired by dilated convolutions in CNNs. Allows tokens to attend to others at increasing distances, skipping over intermediate ones. Uses a Dilation Factor - a hyperparameter specifying how far apart the tokens that are attended to are. A dilation factor of 2, for example, means that each token attends to every other token in the sequence. Multiple Layers - stacking multiple layers with different dilation factors. This results in sparse connections within the attention matrix, reducing the complexity from O(N^2) to something more manageable, like O(N log N) or even linear, depending on the dilation pattern. 119)
- Google introduces a new Chrome shortcut that lets users initiate Gemini conversations directly from the browser's address bar: type @ - and select '@gemini' and type prompt into the Chrome’s desktop address bar to get responses from the AI
- Meta.ai's AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs - uses another LLM, AdvPrompter, to generate human-readable adversarial prompts in seconds. AdvPrompter is trained by utilizing an AdvPromterTrain algorithm that does not need access to the TargetLLM gradients. The trained AdvPrompter can generate suffixes and veil the input instruction, keeping its meaning intact. This tactic lures the TargetLLM into providing a harmful response. 120)
- Google TeraHAC: Scaling hierarchical clustering to trillion-edge graphs 121)
- US establishes AI Safety and Security Board (AISSB) - 22 members - mainly AI CEOs including OpenAI's Sam Altman and Anthropic's Dario Amodei but excludes Elon Musk and Mark Zuckerberg.
- Chinese company SenseTime's “SensChat v5 Cantonese LLM is apparently better than GPT4 Turbo
May 2024
- PoSE Technique: The Positional Skip-wisE (PoSE) method simulates long inputs during training to increase context length, powering Llama 3's extension to 128k tokens.
- Gradient.ai model extends LLama-3-8b context length from 8k to more than 1 Mln tokens and demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. 512K tokens take ~95GB memory to run. Ollama allocates this memory even if actual input/output are small so you may prefer to run a smallr context length than the command line: ollama run llama3-gradient –verbose »> /set parameter num_ctx 512000
- Chinese SenseTime SenseChat Nova 5.0 MoE multimodal hybrid transformer/RNN LLM trained on 10 terabytes of data including 10trillion tokens synthetic data - said to beat GPT4 Turbo; 200K context length; 12mp txt2image outputs;
- StoryDiffusion tx2video open source up to 832x832pixel 23sec long with improved character consistency (using consistent self-attention) despite training only using 8 A100 GPUs compared to 10,000 GPUs with SORA
- new version of Udio txt2Music seems to be better than Suno.ai and now has a context length of 2 mins of music
- Microsoft announces it is working on MAI-1 LLM with 500B parameters and is being overseen by recently hired Mustafa Suleyman – co-founder of Google DeepMind and former CEO of AI startup Inflection – who joined Microsoft in March along with most of the startup’s employees through a deal worth $625 million
- DeepMind AlphaFold 3 protein folding AI tool now also predicts structures and interactions of molecules other than proteins such as RNA, DNA and now available for free on their AlphaFold Server
- Visualization-of-Thought (VoT) gives LLM spacial reasoning 125)
- OpenAI's SORA now able to replace a subject in a video
- IBM has open sourced 116 code language coding 3-34B LLM family called Granite the 8B version appears to be better than most other models of same size for coding
- GitHub code and model and a paper published on EVO genome design model is trained trained at a nucleotide (byte) resolution, on a large corpus of prokaryotic genomic sequences covering 2.7 million whole genomes and can design genome scale DNA, RNA and proteins. Evo can predict which genes are essential to an organism’s survival based on small DNA mutations. It can generate DNA sequences of up to 650k on a single GPU. 126) raising concerns of AI designed biologic viruses resulting in US putting out new rules regarding synthetic DNA.
- paper published on virtual “Agent Hospital” simulation training can train evolvable medical AI agents by learning successful and unsuccessful patient managements without manually labeled data which is applicable in real world scenarios 127)
- speech AI company, 11ElevenLabs' txt2Music model showcased and beats Suno and Udio, with 3 min long songs
- OpenAI publishes paper on AI security and suggests open source is a risk, that model weights should be encrypted and only decrypted by certain trusted GPUs - this would support US Govt concerns and general AI risk concerns but would mean AI would power would reside in the companies that created the models and kept their weights concealed - guess they have to work out a way to make money from their mega investments!
- Apple's AI optimised 3nm 8-16Gb memory M4 chip will be available on iPad Pro May 2024 and is 30% more capable than the M3 and is a SoC (System on a Chip): CPU, GPU, Memory, and Neural Engine - all on one chip
- Perplexity ai rises to 0.78% of market share of internet searches (Google still at 91.5%, Bing 3.2%)
- Chinese DeepSeek-v2 MoE LLM 236B parameters, 128K context length, specialises in maths, coding and reasoning, and seriously undercuts other models in API access costs
- a paper published suggests many models have memorised GSM8k maths exams rather than developed math reasoning giving false performance levels and evidence of systemic over-fitting (eg. Phi, Mistral), however this was not so evident with Gemini, GPT or Claude 128)
- Galileo Protect – a generative AI firewall that intercepts hallucinations, prompt attacks, security threats, and more! 129)
- Elon Musk plans to enhance X's AI, Grok, to merge live news with social media commentary to provide updates and citations in real time. Grok will generate news summaries from user discussions on X, focusing on engagement and accuracy.
- Warren Buffett compares dangers of AI with nuclear weapons, warns about AI scamming (calls it the next big ‘growth industry’)130)
- ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites, documents and XML files.
- Mistral AI is raising up to $600 Mln at $6 Bln valuation - x3 times its valuation in December.
- Alibaba Qwen2.5 - latest version of its LLM - better reasoning, code comprehension, and textual understanding over Qwen2.0.
- Gemma with 10 M context window, open-source, uses infini-attention + activation compression and requires less than 32GB of memory 131)
- Phi-3 WebGPU - AI chatbot runs 100% locally in your browser 132)
- Guardian LLM - a GPT-4 based LLM developed by Microsoft for CIA and other U.S. intelligence agencies. It operates completely offline, isolated from internet to ensure maximum security.
- Google Deep Mind's SIMA for playing and understanding games as if it was a human player
- OpenAI's GPT-4, previously available only to paid subscribers, is now accessible to all ChatGPT users, including those using the free version, and there is now a dedicated desktop application for ChatGPT, providing a streamlined and user-friendly experience for interacting with the AI chatbot.
- OpenAi announces GPT-4o Omni with multimodal real time almost lag-free speech-speech conversations and default voice is a flirtatious and engaging “California Valley Girl” which sounds like Scarlett Johansson while identifying emotion and tone in speech. It also has ground-breaking multilingual, audio, and vision capabilities. Initially available as a MacOS app for a fee. It is twice as fast and half the cost of GPT-4Turbo (but still 10x more expensive than Google Gemini 1.5 Flash) which is what is making it so useful for opening a wide range of use cases where latency and cost are critical success factors - the downside is that it is not as capable at reasoning as is GPT-4Turbo.
- part of the perception of short latency and usable conversational interaction may be superficial in that the rapid response is a relatively meaningless initial “fast thinking” response which just adds pleasantries and/or re-phrases the question much as like a student would do to buy deep thought time in an oral exam (this is similar to how Spotify gained acceptance by rapidly sending through teh 1st 15secs of a song so there would not be any noticeable latency or buffering issues for the user). This is then followed by a deeper thinking response just as would be provided by a student in an exam - if they knew the correct response.
- see https://www.youtube.com/watch?v=mvFTeAVMmAg - Two GPT-4os interacting and singing on different iPhones, visual maths problems on an iPad as a math tutor, potential use cases including hailing a taxi for blind people, etc
- Ilya Sutskever & Jan Leike resign from OpenAI, OpenAI dissolves its existential risk AI superalignment team.
- Google unveils Veo (“high-quality” 1080p videos)133) and Imagen 3 (text-to-image framework), its latest AI media creation models and is testing its Music AI Sandbox, a set of tools that can help with song and beat creation such as MusicFX134). Gemini Nano integrated into Chrome and adds AI scam protection to Android. Astra - universal AI agent that can see AND hear what you do live in real-time, and take action on your behalf. Google adds AI to its Google Search web page.
- Humanize content with StealthGPT
- Apple working on deal with OpenAI to create SiriPro with OpenAI
- John Wiley & Sons, Inc., a 200+ years old publishing company is closing 19 journals. They had to retract more than 11,300 papers recently “that appeared compromised” as generative AI makes it easier for paper mills to peddle fake research.
- Stability AI is facing a cash crunch and is exploring a sale
- a vocalist uses pitch analysis software to assess the latest AI txt2Music - see https://www.youtube.com/watch?v=w5bd34C7zco and notes that the AI appears to have been trained mainly upon the last 20 yrs of recorded music which unfortunately has had artificial pitch correction done to the original vocal tracks and in essence then is being trained on computer-generated vocals and not real human vocals, however, when asked to create a 1990s style song, it partly abandons the ptch correction as autotune came in in 1998 and then pitch correction software became a music industry standard so we no longer hear real human voices singing, even in live shows. Interestingly, similar to txt2Image AI which adds in extra fingers and limbs sometimes, the songs had chords which did not really fit the musical chord progressions.
- Chinese virtual sim reinforcment learned AI autonomous incredibly flexible, fast, acrobatic humanoid robot Unitree G1, 35kg, folds up to compact size, only $US16000
- Microsoft announces “Copilot+” PCs with Qualcomm Snapdragon X Elite chips + Recall - an always-on screenshotting time machine that is supposed to let users search through every last minute of their existence on PC, from browsing to watching videos to using apps. All the screenshots are supposed to be encrypted and kept on the PC’s drive but will take up at least 25-50Gb of SSD - although it can be turned off.
- Elon Musk's artificial intelligence startup, xAI, has raised an impressive $6 billion in a Series B funding round, valuing the company at $24 billion post-investment, according to Musk. xAI is building super computer using 100,000 nVidia chips.
- a study by researchers at the University of Chicago has revealed that OpenAI's GPT-4 can outperform human analysts in predicting future earnings based on financial statement analysis.
- Mistral announces their Codestral coding LLM with 32K context length and 22B parameters, available for non-commercial use
- Jan Leike after just leaving OpenAI joins Anthropic to continue superalignment mission
- OpenAI creates new Safety and Security Committee which includes Sam Altmann, and starts training its next flagship model
- Seoul AI Safety Summit - leaders make landmark agreement on AI safety - incl. major tech companies, and govts from US, China, Canada, UK, France, South Korea and UAE.
- course launched on building Agentic RAG with LlamaIndex by Jerry Liu
- Simple Preference Optimisation (SimPo) for fine tuning LLM using human feedback - better and more simple than DPO or RLHF with PPO
- RAGAs RAG pipeline evaluation tool
- the small island of Anguilla made $US87m in 2023 in selling its .ai domains - 20% of the govts total revenue!
- paper on soft robotics and integrated actuation and sensing 135)
- protein design company Profluent used AI to make an open-source gene editor called OpenCRISPR-1 A range of AI-developed synthetic molecules can be used to specifically alter a cell's DNA to cure genetic diseases.
- AI combined with real time MRI image acquisition is being used to provide more accurately targetted radiotherapy doses which allows lower doses to normal tissues, ability to deliver higher doses to tumour tissue and thus reduce the number of treatments needed and provide more confidence to the technicians. 136)
- AI powered bedside ultrasound machine (POCUS) not only identifies structures imaged, but can do real time calculations such as ejection fraction for echocardiography and by dynamic guidance, visually demonstrate to the user how and where the probe should be moved to gain optimal images. The AI can also be used to assess a user's images for feedback and quality control purposes in conjunction with a human reviewer. 137) 138) 139)
Jun 2024
- nVidia announce their nVidia Inference Nicroservice (NIMS) to provide embedded pre-trained AI models using Triton Inference Server (cuDF, CV-CUDA, DALI, NCCL, post-processing decoder) and Tensor-RT LLM and Triton (cuBLAS, cuDNN, in-flight batching, memory optimization FP8 quantization) to run on CUDA machines with industry standard APIs, and providing a free Llama3 NIM for download and designed to create a team of expert agent NIMs each with their own exertise such as ability to search a EHR, all led by a team leader NIM
- nVidia announce nVidia ACE - virtual digital 3D human AI interfaces using specialized NIMs and runs on nVidia GDN low latency network 140)
- nVidia announce new RTX AI GPU laptops from Asus (eg. Zephyrus G16, TUF A14/A16, ProArt PX13/P16), MSI (eg. Stealth A16 AI)
- nVidia Blackwell GPU, 5th Gen NVLink, 100% realtime self-testing, 20,000 TFLOPs at FP4, and would use 3GWh of energey to train GPT4 1.8T which is 1/4the the energy using Hopper GPUs of 2022, 1/12th the energy using Ampere GPUs of 2020 and 1/350th the energy using Pascal GPUs of 2016. 100kW GB200 MGX modular liquid-cooled Blackwell server linking up to 72 Blackwell GPUs using NVLink Switch Chip, and these can be connected to other MGXs via nVidia Spectrum-X800 ethernet switch designed for connecting tens of 1000s of GPUs (2025 X800 Ultra version will have 10x more GPU connectivity and 2026 X1600 version will have 100x more GPU connectivity)
- Nvidia has made history by surpassing a $US3 trillion market capitalization, becoming the third U.S. company to achieve this milestone
- nVidia announce Earth 2.0 AI simulation of global weather with an aim to predict weather to very local 100m instances including wind effects of urban buildings on pedestrians
- both Suno and Udio AI audio song generators now allows paying users to upload a audio file and it will then “extend” it or create a song from it as its basis
- Apple announces its plans for embedding AI in IOS 18 after doing a deal with OpenAI but it will not run on older devices. Apple has its own AI on-device capable model, a 3B parameter SLM that uses adapters trained for each speccific feature, but as needed will send user data to the cloud “securely” presumably to run OpenAI's GPT-4o
- recently fired OpenAI superalignment employee Leopold Aschenbrenner who allegedly declined to sign a non-disclosure agreement worth $US1m in OpenAI shares on his resignation, publishes Situational Awareness - a treatise on the future of AI in the next decade
- “Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them.”
- “The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace many college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll (sic. US) be in an all-out race with the CCP (sic. China); if we’re unlucky, an all-out war.”
- “AI progress won’t stop at human-level. Hundreds of millions of AGIs could automate AI research, compressing a decade of algorithmic progress (5+ orders of magnitude) into ≤1 year. We would rapidly go from human-level to vastly superhuman AI systems. The power—and the peril—of superintelligence would be dramatic. ”
- “Before we know it, we would have superintelligence on our hands—AI systems vastly smarter than humans, capable of novel, creative, complicated behavior we couldn’t even begin to understand—perhaps even a small civilization of billions of them. Their power would be vast, too. Applying superintelligence to R&D in other fields, explosive progress would broaden from just ML research; soon they’d solve robotics, make dramatic leaps across other fields of science and technology within years, and an industrial explosion would follow. Superintelligence would likely provide a decisive military advantage, and unfold untold powers of destruction. We will be faced with one of the most intense and volatile moments of human history.”
- “Whoever controls superintelligence will quite possibly have enough power to seize control from pre-superintelligence forces. (sic. and overthrow govts)”
- “There is a real possibility that we will lose control, as we are forced to hand off trust to AI systems during this rapid transition.”
- “More generally, everything will just start happening incredibly fast. And the world will start going insane. ”
- “We’re developing the most powerful weapon mankind has ever created. The algorithmic secrets we are developing, right now, are literally the nation’s most important national defense secrets—the secrets that will be at the foundation of the US and her allies’ economic and military predominance by the end of the decade, the secrets that will determine whether we have the requisite lead to get AI safety right, the secrets that will determine the outcome of WWIII, the secrets that will determine the future of the free world. And yet AI lab security is probably worse than a random defense contractor making bolts. ”
- “Reliably controlling AI systems much smarter than we are is an unsolved technical problem. And while it is a solvable problem, things could very easily go off the rails during a rapid intelligence explosion. Managing this will be extremely tense; failure could easily be catastrophic.”
- HOWEVER, there are significant issues that need to be addressed before we can get to ASI - in particular, a vastly increased power supply and access to massive amounts of better training data than the mostly rubbish data on the internet that is currently used.
- China demonstrates Vidu txt2Video
- Chinese company 01.ai founded in 2023 releases their “E Large” and a smartphone app LLM which competes well with GPT 4
- new txt2Video Luma dream machine which unlike Kling which required Chinese phone number or OpenAI's Sora, Luma is available to the public
- Stable Diffusion 3 txt2img released for preview
- OpenAI's Sora txt2Video has not been released to consumers as OpenAI is allegedly in the middle of doing deals with various Hollywood studios
- Elon Musk criticizes OpenAI's deal with Apple and Apple's plans and threatens to ban Apple devices in his businesses then drops his law suit against OpenAI and Sam Altman
- Google publishes paper “Towards a Personal Health Large Language Model (PH-LLM)”141)
- OpenAI rumoured to re-establish its Robotics research
- Tree of Thoughts (ToT) prompting gives far better reasoning for LLMs and generalizes over the popular Chain of Thought approach 142)
- Lamini Memory Tuning using Mixture of Memory Experts (MoME) to substantially reduce hallucinations in LLMs 143)
- The Harvard-Google DeepMind collaboration has created an artificial neural network that can control a virtual rat's movements in an ultra-realistic physics simulation, mimicking how biological brains coordinate complex behaviors. They used inverse dynamic modeling;
- record label companies (Universal Music Group, Capitol Records, Sony Music Entertainment, Atlantic Records Group, Arista Records, Rhino Entertainment, The All Blacks USA, Warner Music Limited and Warner Records) sue the AI song generation startups (Suno AI and Udio AI) for training their AI on songs without approval and thus copyright infringement.
- Anthropic releases Claude 3.5 Sonnet LLM which seems to beat GPT-4o and has a artfiacts switch will allow it to create code and run it as well as lots of other outcomes
- ESM3 evolutionary protein model “the first generative model for biology that simultaneously reasons over the sequence, structure, and function of proteins” simulates 500 million years of evolution with a LLM 144)
- Hallo audio driven portrait2Video released with open source 145) - simiar to Alibaba's Emote Portrait and Microsoft's VASA but the code is able to be downloaded and ran locally
- Sohu algorithm specific transformer specialized AI chip - one 8xSohu chip server equates to 160 nVidia H100 GPUs and can run 500,000tokens per sec runninvg llama 70b LLM but cannot run non-transformer AI models
- https://websim.ai/WebSim.ai now with Claude 3.5 can simulate quite amazing virtual web pages / apps within its own virtual internet sandbox including simulating a working Windows or a working Excel app, any of these can then be downloaded as a HTML file 146)
- Microsoft Azure OpenAI allows LLM query prompts of your local SQL database by creating a vector embedding RAG of the database, CoPilot also now is able to troubleshoot your SQL database issues such as why is it running too slow. 147)
- Runway Txt2Video announced
- Bill Gates interview on AI - https://www.youtube.com/watch?v=jrTYdOEaiy0
- Ilya Sutskever forms his new company Safe Superintelligence Inc. (SSI), co-founded with Daniel Gross and Daniel Levy.
- GPT-4 passes the Turing test for AI on 5 minute text mode chats where 500 people failed to detect it was AI rather than human in 54% of cases (humans were only detected as humans in 2/3rds of cases)
- OpenAI blocks access to its APIs from mainland China and Hong Kong in response to increasing US Govt demands and the rivalry for AI dominance, and will help protect OpenAI's intellectual property but will further drive the divide between US and Chinese AI tech development (and thus those countries aligned to China will look more to China for AI solutions) and drive large Chinese companies such as Alibaba to more rapidly build their own AI development infrastriucture and technology (and perhaps further push China to taking back Taiwan and its GPU manufacturing factories which the US relies upon). This geopolitical AI divide is also likely to adversely impact any potential gobal standards on AI development and safety. The Chinese Govt is expected to invest $US38b in AI by 2027.
- OpenAI releases CriticGTP based on GPT4 and designed to critique other AI models and specifically to find software coding errors produced by GPT to suppement the curent human reinforcement feedback of LLMs. It was trained on human inserted as well as human detected software bugs in code. It also uses Force Sampling Beam Search technique to provide better critiques with reduced hallucinations and reduced nitpicking.
- Google releases Gemma 2 27B and 9B LLMs - appear to beat Llama 3 70B model in ChatBot arena while being 2.5x smaller and trained on 2/3rds the amount of tokens (13T) and has 8192 context length + RoPE and much faster inference, but not as good as Claude 3 or GPT 4
- Google's Gemini 1.5 Pro LLM now supports 2M token context length!
- Anthropic's CEO says that 2027 LLMs will cost up to $US10-100b to tain and will be better than most humans at most things.
- Huawei demonstrates the Leju's Chinese humanoid robot Kuafu now equipped with Huawei's Pangu Model 5.0 embodied intelligent large language model 148)
- Japanese scientist have created a self-healing human epithelial cell skin for robots which can be used to express emotion - they now have to solve nutrient supply and tethering issues.149)
- LMStudio has created a version that can run LLM inferences on Snapdragon X Elite ARM64 CPUs running Llama-7b-q4 at 20tokens/sec, and plan for NPU support
2024 2nd half
July 2024
- DeepComputing's DC-ROMA - first RISC-V laptop
- Google is using PaLM 2 large language model to add 110 new languages to Google Translate, including Cantonese, NKo and Tamazight.
- Meta's Llama 3.1 405b model released (128K context, 8 languages) as preview to some users - MZuck. suggests his plan is for this to be open source so small businesses can create their own AI agents from it 150)
- Llama 3.1 was trained on over 15T tokens on 16,000 Nvidia H100 GPUs, 31 Mln GPU hours, training cost $500 Mln. SFT (Supervised Fine-Tuning) was used for post-training for Code, Math, Multilinguality, Long Context, Tool Use.
- 405B on par with GPT-4o and Claude 3.5 Sonnet
- on a private machine using Ollama, the 405B model requires 231Gb VRAM, 70B model requires 40Gb, 8B model requires 4.7GB and beats GPT-4o Mini
- Octo.ai offers Llama 3.1 405B at ($3/$9) per M in/out tokens while Groq offers $1.64 per M in/out at 250t/s
- Andrej Karpathy announces Software 2.0 computer using just a single neural net and no classical software - device inputs such as audio and video feed directly into the neural net and outputs directly display on screen or speaker. It will be able to simulate classical software if needed.
- RouteLLM to route your questions to cheaper LLMs if a more expensive LLM is not needed
- Zeta Labs Jace AI unsupervised AI agent task performance such as going to websites and making bookings for you
- Huawei is establishing a large $US1.7b R&D center near Shanghai, aiming to develop competitive lithography chip manufacturing tools - aiming for 14-16nm tech and compete with Dutch ASML company as well as Canon and US companies.
- the German image database, LAION-5B, a free online dataset of 5.85 billion images derived from web crawling the internet, used to train a number of publicly available AI generators such as Midjourney and Stability AI is found to contain images of Australian children without consent raising major concerns of the “wild-west phase” of AI development without adequate safeguards being in place. The images have since been removed and the company advises people should not have such images published on the free internet. 151)
- French Kyutai's Moshi Voice AI audio real time conversational language model with almost no latency of 160msec using two audio streams so it is speaking and listening at the same time and trained with 70 different emotions and taking styles using a voice artist named Alice and fine tuned on conversation data. It can be ran on device and will be released as open source.
- GPT4All releases version 3 - free app that supports many LLMs to run locally on your computer and can use your local documents privately.
- Australian Govt contracts Amazon to build a top secret data centre in Australia to store Australia's classified military and intelligence information without accessing the open internet and securely collaborate with UK and US agencies which have similar facilities - it is said to cost $AU2b.
- a plain text file, called rockyou2024.txt, containing nearly 10 billion unique passwords collated over years of hacks has been discovered on the internet - such lists allow bad actors can use AI agents to try brute force attacks and also via credential stuffing which involves using leaked credentials (known username/password combinations) with other accounts which may use the same password.
- nVidia's NeuralAngelo high quality complex 3D AI scene renderer from 2D
- Google DeepMind's JEST AI training algorithm multimodal contractive learning using a pre-trained reference model and highly curated training data allows 13x less iterations and 10x less compute
- Chinese company SenseTime unveils SenseTime v5.5 LLM said to beat GPT-4o and SenseNova 5o
- Chinese company Alibaba says their models are now fully open source
- Chinese company MiniMax CEO suggests that in the future there will only be 5 companies making LLMs globally
- Apple joins the board of OpenAI then within a few days both Apple and Microsoft decide to not be on the board - perhaps due to anti-trust law suits against them
- OpenAI suggests their forthcoming GPT-5 in 2025-2026 will have human-like reasoning, better endurance of memory and PhD-level performance for certain tasks while its GPT-4 was at a smart high school student level performance
- Elon Musk announces X will build the world's largest super computer in Memphis - X.ai's “GigaFactory Compute” with 500MW power by end of 2025 with 50,000 nVidia H100 GPUs
- Mistral Nemo 12B model: 128k context length, open source
- Microsoft “AgentInstruct” - teach an LLM using synthetic data generated by LLM Agents.
- FlashAttention-3 beta release - speeds up transformers 16x, reaching 1.2 PFLOPS on H100 GPUs
- China is leading in AI adoption: % of organizations using AI: China 83%, UK 70%, US at 65% 152)
- OpenAI project called “Strawberry” - advanced reasoning capabilities + internet search - for complex tasks
- Andrej Karpathy started Eureka Labs - an AI+Education company - GitHub: https://github.com/EurekaLabsAI
- Andrej Karpathy - train your own GPT-2 for ~$672, running on one 8xH100 GPU node for 24 hours 153)
- Nvidia RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
- Google DeepMind introduces a PEER (parameter efficient expert retrieval) - a novel layer design that utilizes the product key technique for sparse retrieval from over a million of tiny experts.
- Mistral Codestral Mamba
- Microsoft GraphRAG - open source data pipeline and transformation suite to extract meaningful, structured data from unstructured text using LLM.
- Fei-Fei Li’s startup “World Labs” reached $US1Bln valuation in just 4 months. Company develops human-like processing of visual data to make AI capable of advanced reasoning
- OpenAI has unveiled GPT-4o mini, a smaller and more cost-effective version of their GPT-4o model, designed to make advanced AI capabilities more accessible to a wider range of users.
- Apple releases a open source 7B LLM
- Pinokio 2.0 released
- fal.ai Live portrait using your webcam to control animation of a portrait image in real time
- ElevenLabs releases Turbo 2.5 version for 3x faster and better txt2speech
- Google DeepMind's AlphaProof has achieved a remarkable milestone, earning a silver medal-level performance at the 2024 International Mathematical Olympiad.154)
- OpenAI releases SearchGPT which would compete with Google search and provide the sources to responses - presumably similar to perplexity.ai
- U.S. Senate has unanimously passed the DEFIANCE Act, a bill aimed at combating nonconsensual deepfake pornography by allowing victims to sue creators and distributors for up to $150,000 in damages.
- Ollama was updated to version 2.8, and it can now run mistral-nemo and a q4 file needs 7Gb
- Ollama now supports tool calling with popular models such as Meta's Llama 3.1, Mistral Nemo, Firefunction v2, Command-R+, and many others).
- Stanford STORM AI - write Wikipedia post 155)
- Mem0 - platform to maintain memory across sessions 156)
- PandasAI is a Python platform that makes it easy to ask questions to your data in natural language. 157)
- an AI tool identified prostate cancer with 84% accuracy, compared to 67% accuracy for cases detected by physicians. 158)
- an AI tool that can predict the progression of Alzheimer's disease with over 80% accuracy using speech data. 159)
- OpenAI is looking for partners to design its own custom chips
- Nvidia is developing a new B20 AI chip for China which is tailored to comply with U.S. export controls for the Chinese market
- Skild.ai created a foundational model for a “general purpose brain” that can be slotted into a variety of robots, enabling them to do things like climbing steep slopes, walking over objects obstructing its path and identifying and picking up items.
- Triplex - a SOTA LLM for Knowledge Graph Construction. Triplex efficiently converts unstructured text into “semantic triples” - the building blocks of knowledge graphs. These triples follow a (subject > predicate > object) format. 160)
- Nvidia Minitron 8B and 4B - pruned versions of Nemotron-4 15B model, which require up to 40x fewer training tokens and result in 1.8x compute cost savings compared to training from scratch.
- Stability.ai Stable Video 4D: 128k context length, 123Bln params 161)
- Mistral Large 2 LLM 128k context window, 123 Bln params 162)
- CARTE: Pretraining and Transfer for Tabular Learning 163)
- Microsoft is adding AI-powered summaries to Bing search results.
- Canva acquires Leonardo.ai txt2image generative AI company
- GPT-4o advanced voice mode rolled out to some users
- Amazon testing its latest AI chip - AWS Inferentia - performance up to 50% higher than NVIDIA's at half the cost
- OpenAI finances this year (49% owned by Microsoft which has invested $13b in it to date):
- OpenAI's expenses: $4B for renting Microsoft's servers, $3B for AI training costs, and $1.5B for staffing - total $8.5 Billion.
- OpenAI's revenue: $2B annually from ChatGPT and over $80M per month from API access, OpenAI's potential revenue of up to $4.5B falls short of its $8.5B in operating costs.
- Midjourney V6.1
- FastHTML - a new way to create modern interactive web apps in Python - https://fastht.ml
Aug 2024
- Figure 02 humanoid electric robot (not hydraulic)
- Meta Segment Anything Model 2 (SAM 2) real time tracking of selected objects in videos
- OpenAI allowing some access to a 64K token output version of GPT-4o
- OpenAI pledges to give the US AI Safety Institute early access to its next model (GPT-5)
- Flux.1 open source txt2image - can create highly photorealistic casual smartphone-like imagery
- the Qwen team has released a 72B math model, Qwen2-Math-72B, that outperforms all open and closed models on math. It also outperforms Llama-3.1-405B on a number of reasoning benchmarks.
- Google DeepMind Robot plays ping-pong - the robot won 100% of matches against beginners and 55% against intermediate players across 29 matches. It can adapt to opponents' playing styles in real time, adjusting its strategy on the fly.
- Microsoft and Palantir partnered to deliver advanced AI to U.S. Defense and Intelligence agencies through classified cloud environments.
- Google releases free online AI image generator Imagen 3
- Unitree's very flexible, hollow joint wired, 35kg, foldable, humanoid G1 robot with 3 digits on hands available for purchase with mass production at starting price of $US16,000
- Unreal AI video update
- Elon Musk's 128K context Grok-2 Beta release - Grok-2 and Grok-2 mini are being released for 𝕏 premium and has uncensored txt2image deep fake generation capability
- Black Forest Labs released the Flux.1 family of text-to-image models
- Leonardo.ai remains the best txt2image generator
- Sakana.ai - The AI Scientist - automate scientific discoveries - this signals a new era in scientific discovery and AI self-learning. It can implement new ideas, experiment planning, run experiments (if IT based), research them, peer review them and produce a scientific paper at a cost of $15 per paper 164) 165)
- LLM-Aided OCR - significantly enhance the quality of Optical Character Recognition (OCR) output 166)
- OpenAI releases new LLM GPT-4o-latest with improved multi-step reasoning and creativity and is faster but still not perfect
- Abu Dhabi's TII new AI model, Falcon Mamba 7B using SSLM architecture for improved long context tasks
- Mermaid Diagramming and Charting Python library - Javascript 167)
- Eric Schmidt, ex-CEO of Google, suggests that Google AI may be doomed as Google is valuing work-life balance for employees rather than winning the race to AGI (start ups succeed as they work extremely hard) and that the battle for knowledge supremacy will be between US and China in our lifetime. Currently US has a 10 yr GPU chip tech advantage over China. “The rich will get richer and the poor will do the best they can.” 168)
- AI-powered hearing aid - x53 boost in speech understanding 171)
- Golden-Retriever: High-Fidelity Agentic RAG - question augmentation: identifies jargon, clarifies meaning; OCR to extract text, LLMs to summarize and identify context using Chain-of-Thought prompting; Jargon dictionary queried using SQL; 172)
- Gemini Live - talk with Gemini app on smartphone - Available to Gemini Advanced subscribers
- The Prompt Report - 76 pages of Prompting Techniques 173)
- Alibaba releases its new AI inference NPU chip Hanguang 800 which uses 3D stacking technology, 256 trillion calculations/sec, 200Gb/sec internal bus speed, 192Mb local memory, 10x faster image recognition 174)
- Cerebras WSE (Waffle-Scale Engine) - huge 8.5”x8.5” AI chip; CS-3 has 900K vs 15K cores in Nvidia H100 and has 21,000TB/sec vs 3TB/s memory bandwidth, 4 trillion transistors, 125 petaflops of compute, 44Gb on-chip memory, 5nm TSMC process and can run LLM inference 22x faster than Hyperscale H100 cloud at 1/5th the cost - see https://inference.cerebras.ai
- pgvectorscale - an extension for PostreSQL database written in Rust allows high performance embedding search
- Automating Thought of Search (ToS) - from Cornell University & IBM Research - “completely taking the human out of the loop of solving planning problems” 175)
- Anthropic Artifacts are now generally available
- Google: Speculative RAG: uses Speculative Decoding: a smaller LM generates draft texts, then a larger LM used to verify and select the best draft. 176)
- Google GameNGen simulates DOOM
- California's State Assembly has passed a groundbreaking AI safety bill, SB 1047. The legislation requires AI companies to implement safety measures before training advanced models, including shutdown capabilities and risk assessments.
- Nvidia “Eagle” - AI models that can process ultra-high-resolution images up to 1024×1024 pixels, tasks like visual question answering and document comprehension 177)
- Alibaba releases Qwen2-VL, SOTA vision model that can understand video up to 20 minutes and maintain flow of conversation in real-time178)
- RAPIDs cuDF - fast Python DataFrame library from Nvidia. Basically ports Numpy, Pandas, and Polars to run on Nvidia GPUs, also supports Apache Arrow columnar memory format and calculations on DataFrames are much faster. 179)
- FastHTML - python
- 1X NEO Beta AI humanoid robot designed for home use announced with a short demo 180)
- https://ideogram.ai/Ideogram 2.0 txt2img AI with much improved readable txt within images, customisable color palettes for consistency.
Sept 2024
- in only 3 months, Ilya Sutskever has raised $1 Billion for his startup SSI (Safe Superintelligence) despite no product, just 10 staff and his ideas
- Reflection Llama-3.1 70B is currently the world's top open-source LLM - trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course
- Google “Ask Photos” - AI-powered search in Google Photos using Gemini AI
- Google DeepMind AlphaProteo - AI designs custom proteins to bind with specific molecular targets. Potential at blocking viruses, fighting cancer, treat other diseases .
- OpenAI aims to raise “several billion dollars” in a funding round that would value it at above $100 billion (was $86B in 2023).
- Claude for Enterprise LLM with 500K tokens context length
- Apple announces their iPhone 16 line up which now includes integration of “Apple Intelligence” - a new AI system designed to enhance user experiences
- the first human recipient of Elon Musk's Neuralink brain implant, is now using the device to learn French and Japanese, dedicating about three hours daily to his linguistic pursuits.
- at an Australian inquiry, Meta’s global privacy director, Melinda Claybaugh, was questioned about how Meta leverages user data, including training artificial intelligence (AI) models. After initially denying it, under pressure, she admitted that Meta scraped the public photos and text of every Australian adult Facebook or Instagram user dating back nearly 20 years. She was unable to say whether Meta scraped posts by users who were children at the time they signed up but aged up to adulthood at some point during the period content was scraped. In the European Union and the United States, Meta app users were alerted that the company would use their data to train generative AI products unless they opted out. Claybaugh admitted that Australians were not given this same option, citing a difference in the “regulatory landscape.” 181)
- Scientists at Weill Cornell Medicine in Qatar (WCM-Q) have created an intricate molecular map of the human body and its complex physiological processes based on the analysis of thousands of molecules measured in blood, urine and saliva samples from 391 volunteers. The data was integrated to create a powerful, interactive visual web-based tool called Connecting Omics (COmics) that can be used to investigate the complex molecular make-up of humans and discover underlying traits associated with various diseases.182)
- Heumos, L., et al. (2024) publish paper on their Ehrapy, an open source Python code to analyse complex, heterogenous health data - “can uncover new patterns and generate insights without needing to analyze the data based on a specific assumption or hypothesis”
- researchers are using AI and the connectome – a map of neurons and their connections created from brain tissue – to predict the role of neurons in the living brain - researchers created an AI simulation of the fruit fly visual system that can predict the activity of every neuron in the circuit. 183)
- OpenAI announces its GPT-o1 LLM “Strawberry” trained with reinforcement learning to perform complex reasoning and can perform long internal chain of thought before responding, ranks on 89th percentile on competitive programming questions, ranked in top 500 students in USA Math Olympiad and exceeds human PhD level in various science benchmarks - chemistry, biology and physics. A preview version has been released but limited to about 4 queries per day. Users will need to avoid system prompts of chain of thought etc as this is already built in.
- Elon Musk's “Collossus” AI datacentre funded - $6b financing in 122 days; 100K nVidia H100 GPUs on a single network; - similar size to Meta's planned data centre for Llama-4; Musk plans to double the Collosus in a few months to 200K GPUs including 50,000 nVidia H200's.
- 90% of Fortune 500 companies use SalesForce, now Salesforce Industries AI will start releasing in October 2024 with over 100 Out-of-the-Box AI Capabilities, ready to use;
- Pixtral 12B - first multimodal model by Mistral, built on top of mistral-nemo-12B
- You.com raise $50M - was founded by Richard Socher and Bryan McCann, both leading AI research scientists and both from SalesForce and Standford NLP research.
- Chai Discovery, an AI biology startup raises $30m, and released Chai-1 - a new multi-modal foundation model for molecular structure prediction which enables unified prediction of proteins, small molecules, DNA, RNA, covalent modifications, and more.
- LLaMA-Omni - speech-language model can process spoken instructions and generate both text and speech responses simultaneously with very low latency (~0.25 sec)
- Time Magazine excluded Elon Musk from its list of the 100 most influential people in AI, while featuring actress Scarlett Johansson on the cover.
- The European Court of Justice (ECJ) ordered Apple to Pay back $14.4 Billion in back taxes to Ireland
- Amazon Alexa (Echo) - now uses Anthropic Claude while Google Assistant (since 2016) - an app working on phones and other devices, is being replaced by Google Gemini
- Groq (fast inference chips) partnered with Saudi oil giant Aramco to build the world’s largest AI inferencing center in Saudi Arabia - to serve Middle East, Africa, and India. The data center will host 19K LPUs (Language Processing Units) by the end of 2024, with potential expansion to 200K units.
- Microsoft announce their GRadient-INformed (GRIN) 16×3.8b MoE 6.6b activated parameter LLM
- Constellation Energy to re-start Three Mile Island nuclear power plant in 2028 to power Microsoft's AI datacentre needs via a 20 year power purchase agreement
- Meta's new LLMs Llama 3.2 128K context, 8 languages, 1B, 3B, 11B, 90B models; Meta has 500million AI users, OpenAI has half that many; RayBan Meta smart glasses - live language translation, QR code reading etc.
- According to Reuters, OpenAI is planning to restructure its business into a for-profit benefit corporation, removing control from its non-profit board valuing the company at $150b and Sam Altman to receive 7%
- Google Deepmind publishes paper - “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters” 184)
- Google DeepMind SCoRe (Self-Correction via Reinforcement Learning): A New AI Method Enhancing LLM's Accuracy in Complex Mathematical and Coding Tasks. It uses two-stage training process and reward shaping.185)
- Google DeepMind 2 new AI-based robot hand systems
- ALOHA Unleashed - use both hands cooperatively for complex tasks like tying shoelaces
- DemoStart - enhances control of multi-jointed, multi-fingered robot hands for precision work (insert a plug into a socket)
- NotebookLM - personalized AI research assistant powered by Google's Gemini 1.5 Pro
- PDF to Audio Converter - open-source Python library https://github.com/lamm-mit/PDF2Audio
- NVIDIA Llama 3.1-Nemotron-51B - fits on a single NVIDIA H100 GPU and is 2 to 4 times faster than original Llama-3.1-70B
- Anthropic's Contextual Retrieval
- Nanyang Technological University, Singapore (NTU Singapore) and the National Healthcare Group (NHG) are pioneering a new center to bridge the gap between innovative AI technologies and their practical applications in medicine
Oct 2024
-
- less compute and less memory footrint than transformer models
- Microsoft Paint is getting Photoshop-like generative AI fill and erase features
- AI bots now beat 100% of image CAPTCHAs
- Pythaogora ai can create apps with more than 10K lines of code
- Apple MM1.5 (Multimodal LLM) 186)
- AI tools discover 160,000 RNA virus species 187)
- Harvard introduces new Medical AI knowledge graph to better utilise structured codified medical knowledge to improve accuracy in responses and uses generation of triplets (head entity, relation type, tail entity eg. hypertension, causes, heart disease) and then reviews these to assess correctness by grounding in knowledge graph and then maps entities to UMLS code
- Tesla's “We, Robot” Robotaxi event - a $US30K fully autonomous vehicle designed for ride-hailing services with two doors, two seats, and no steering wheel or pedals
- 2-bit quantization for LLMs 188)
- BitNet: Scaling 1-bit Transformers for LLMs 189)
- Aria - from Rhymes.AI - First Open Multimodal Native MoE Model 193)
- Robotic Process Automation (RPA) - use bots/agents to automate repetitive tasks
- Meta's Movie Gen, a series of four systems that generate videos, include consistent characters, alter generated imagery, and add matching sound effects and music. 256 frames 1080p video;194)
- OpenAI MLE-bench - a new benchmark to assess how well AI agents can perform machine learning engineering tasks - a step towards self-improving AI.
- EuroLLM - capable of all official European Union languages 195)
- Hailuo AI Image-to-Video 720p 25fps
- AMD Instinct MI325X chip - new AI chip to compete with Nvidia’s upcoming Blackwell chips; CDNA 3 Architecture; 256GB of HBM3E memory, 6TB/s of bandwidth. This surpasses NVIDIA's H200 in memory capacity and bandwidth;
- NVLM 1.0 from Nvidia - open source, multi-modal LLMs 196)
- new neural network architecture, the normalized Transformer - “nGPT: Normalized Transformer With Representation Learning On The Hypersphere” - from Nvidia 197). nGPT learns much faster, reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length.
- IBM new AI chips: “Telum II” and “Spyre Accelerator”
- IBM Granite 3.0 2b & 8b open-weight (Apache 2.0) models
- Stability AI - Stable Diffusion 3.5 (text-to-image)
- Perplexity - added Claude 3.5 Sonnet for Pro users
- Yi-Lightning (China) - A top-performing model surpassing GPT 4o, accessible via API https://www.01.ai
- Fourier Intelligence ( https://www.fftai.com - China) has revealed its GR2 humanoid robot.
- the improved 3D Diffusion Policy (IDP3) is a new method that enables robots to perform complex tasks in unpredictable environments without constant supervision.
- Kepler Robot ( https://www.gotokepler.com , China )
- Google DeepMind researcher Tim Rocktäschel says we now have all the ingredients to build open-ended self-improving AI systems that can enhance themselves by way of technological evolution. This could lead to AGI sooner than expected.
- Recraft txt2image AI generator
Nov 2024
- GitHub Copilot - now offers selection of models (including Claude 3.5, Sonnet, Gemini 1.5 Pro, and even o1 preview)
- Grok - finally has vision - understand images/memes
- Claude can now write and execute code (analysis tool) 198)
- Perplexity desktop app macOS is really good - Anthropic, OpenAI, Google Gemini also have desktop apps
- Cerebras inference x3 faster (Llama3-70b at 2,100 tok/sec)
- TSMC (Taiwan Semiconductor Manufacturing Company) in Arizona has outpaced production in Taiwan
- Waymo (self-driving cars) has raised $5.6 Bln
- Red Panda image generation model - beats all other models
- Project Jarvis - Google agent inside Chrome browser (in dev, using Gemini 2.0, large context window, can work across multiple web pages)
- Stable Diffusion 3.5 medium is released
- OneOne PhotoRaw 2025 released which now has AI generative features either locally or via Stable Diffusion subscription
- Genmo's Mochi 1 - new open-source video generation model takes second place on the Artificial Analysis video leaderboard
- Meta’s Llama-4 AI Models Are Training on a GPU Cluster bigger than 100,000 H100s
- Meta is developing their own search engine to add real time search result data to their LLM chatbot (and reduce dependency on Google and Bing)
- Meta's MovieGen AI - generate videos with sound
- OpenAI ChatGPT can now search the web in a much better way than before. You can get fast, timely answers with links to relevant web sources, which you would have previously needed to go to a search engine for - ie. similar to Perpexity.ai 199)
- xAI - in talks to raise funds at a $40 Bln valuation
- xAI - 100K GPU xAI Colossus Cluster
- a Polish radio station fired its human hosts and commences a channel with AI presenters and run almost entirely by AI but is shut down within a week following backlash against the concept
- Ameca “Cyber Date” - two humanoid robots communicating with each other
- Boston Dynamics Atlas robot works autonomously.
- Apple M4 chips announced
- Elon Musk invests over $US100m and manipulation of X to help Trump win back the US presidency and Tesla's share price rockets with the result
- Google Gemini Exp 1114 LLM surpasses o1-preview and Claude 3.5 sonnet and excels in various tasks, including mathematics, creative writing, and instruction following, demonstrating its proficiency in both language and visual AI
- DeepSeek R1-Lite - new reasoning open source model from China, it thinks longer about the prompts with a transparent, step-by-step reasoning process - and gives more concise results 200)
- Mistral Le Chat - free ChatGPT competitor - with web search, citations, image generation, and custom agents
- Pixtral Large - a 124 Bln parameter, open-weight, multimodal AI model for vision and language tasks. 128k context length. Great for math and image generation 201)
- Autogen evolves into AG2 - commercial product
- Figure Robotics' humanoid robots are now 400% faster
- Flux.1 Tools (Black Forest Lab) - new open-source image editing tools
- AI should not control nuclear weapons - agreement between US & China that human oversight is required
- nVidia's Blackwell AI chips overheating - potentially causing delays in data center deployments
- Perplexity introduces AI shopping, allowing users to research and purchase products directly within its platform
- Stanford University new AI agent can simulate the personalities of real people - are based on two-hour interviews with real people, and they are able to replicate the interviewees' attitudes and behaviors with 85% accuracy202)
- Suno V4 improves AI music generation
- Google's Gemini Advanced remembers user interests and preferences based on user context to give more relevant responses.
- GPT-4 Turbo - better creative writing - more natural, engaging, and tailored writing
- 11 Labs - AI conversational bots with customizable voices, tones, and response lengths
- Microsoft autonomous AI agents which can use over 1,800 AI models. No-code and low-code options. Examples: HR agent, a facilitator agent, a project management agent, and an interpreter agent.
- Azure AI Foundry - simplifies AI development. New hardware - a security microprocessor and a data processing unit (DPU).
- Extropic - startup
- Probabilistic Computing - using “probabilistic bits” (P-bits) that fluctuate between states, mimicking the uncertainty found in natural processes. This makes them well-suited for tasks like optimization, AI, and machine learning.
- Thermodynamic Computing - a type of probabilistic computing that harnesses noise and the tendency of systems to minimize energy to perform computations.
- P-bits vs. Qubits: P-bits are purely classical and operate at room temperature. P-bits can be built using various technologies, including superconducting Josephson junctions (JJs) and magnetic memory cells.
- Probabilistic computers can be useful in some areas like AI, machine learning, and simulations. For some tasks they may be 100 million times more energy efficient than GPUs.
- Developing probabilistic computers involves overcoming hardware and software challenges.203)
- Fujitsu developed a “digital annealing unit” (DAU), a custom CMOS chip (similar to FPGA) with an architecture designed for efficient solving large-scale optimization problems.
- new Test time training for LLMs from MIT has 6x improvement in accuracy compared to base fine-tuned models, particularly excelling in solving novel problems and complex reasoning tasks and has three (3) crucial components: initial fine-tuning on similar tasks, auxiliary task format and augmentations, and per instance training.204)
- Neo AI - Autonomous ML Engineer - a multi-agent system that automates the entire ML workflow, saving engineers thousands of hours of grunt work 205)
- François Chollet has left Google after ~10 years. He is known for creating Keras and for creating ARC benchmark for AI models. He is planning to start a new AI company.
- Gemma Scope - a tool developed by Google AI to provides insights into their Gemma 2 language models inner workings.
- OpenAI ChatGPT Mac OS desktop app now can read from various developer apps (VS Code, Xcode, TextEdit, Terminal, and Item 2).
- Qwen QwQ-32b is Alibaba's new open source LLM with 32 Bln parameters and designed for advanced reasoning and self-reflection and excels in math and coding
- Matrix is a new AI model from Alibaba China that generates infinitely long, high-resolution, interactive video streams in real-time. It uses a video diffusion transformer and a novel method called shift window denoising process model (SDPM)
- nVidia Sana-1.6B: GenAI Model is 20x smaller and 100X faster than Flux AI and can generate 4K images in seconds, even on laptop GPUs as it requires only 9Gb VRAM athough 12-16Gb is recommended
- HyMBA Model from nVidia - for smaller devices, 8K context length, efficient and specialized. It utilizes “meta tokens” to provide task- or domain-specific knowledge to the model
- Self-Evolving LLM - by a startup called Writer
Dec 2024
- OpenAI ChatGTP Pro released at $200/month - includes o1 pro mode, a version of o1 that uses more compute to think harder and provide even better answers to the hardest problems
- Agentic RAG - significant advancement; query decomposition; multiple source intelligence; dynamic query optimization; self-validating results
- Microsoft's open source LazyGraphRAG - Graph-Enabled RAG that needs no prior summarization of source data; uses NLP noun phrase extraction; reduces indexing costs by over 99.9% compared to full GraphRAG; combines best-first and breadth-first search dynamics in an iterative deepening manner;
- Google DeepMind AlphaProteo - designs protein binders that are 3 to 300 times stronger than those created with previous methods
- Amazon Trainium chips are available on AWS Cloud in Trn1n instances. Trainium-2 has 96GB per chip will be available on Trn2n instances.
- Microsoft AI head, Mustafa Suleyman, predicts AI models with “near infinite memory” by 2025; Google paper proposing “infinite context windows” - keeps summary of essential points allowing AI to remember much longer context, past conversations;
- Amazon Nova - a set of foundation LLM models
- Google DeepMind Genie 2 - “foundation world model” that can transform any image into a playable, interactive world. Users can control the world with keyboard actions (jump, fly, etc.)
- HunyuanVideo text-to-video open-source 13 Bln parameter model generates videos, high physical accuracy and scene consistency;
- Fei-Fei Li World Labs Release - transforms single images into interactive 3D environments
- Elevenlabs Conversational AI & “NotebookLM” Clone GenFM
- Elon Musk's Grok Chatbot app to be released soon
- nVidia's Fugato - AI tool that can generate or transform any mix of music, voices, and sounds using text or audio inputs
- Zoom is now “AI-first company” and introduced Zoom AI Companion 2.0 and Zoom Docs
- Google Gemini 2.0 is Google's latest most capable AI model
- OpenAI releases SORA with restricted content - cannot use real people, minors, or copyrighted material (NOT available in EU, UK, etc due to regulations)
- OpenAI Canvas and Projects tools in ChatGPT
- OpenAI 4o model now allows users to have natural conversation with a Santa
- AWS new foundation models on Bedrock, Nova (Micro, Light, Pro, Premier), Nova Canvas, Nova Real
- Copilot Arena for VS Code is an open source code AI coding assistant that provides paired autocomplete completions from different LLMs, which include state-of-the-art models like GPT-4o, Codestral, Llama-3.1 and more
- Trump names David Sacks (a South African-American entrepreneur, author, former PayPal COO) as a White House’s artificial intelligence and cryptocurrency policy chief
- Deepmind GraphCast - open source AI model for fast and accurate global weather forecasting trained on ERA5 dataset - 10 day forecasts and requires just two sets of data: the state of the weather 6 hours ago, and the current state of the weather and is more accurate than HRES and is now the most accurate 10-day global weather forecasting system in the world
- MinerU - tool to convert PDFs into machine-readable formats (e.g., markdown, JSON)
- ClearerVoice-Studio - open-source, AI-powered speech processing toolkit - speech enhancement, speech separation, target speaker extraction
- Deepmind Project Astra - allows AI to interact with the real world through your camera, providing information and assistance based on what it sees.
- Deepmind Project Mariner (still in research stage) - enables agents to control a web browser and do tasks online
- Devin AI coding assistant is available at $500/month
- Grok image generation - high-quality images of people (including celebrities) and objects 206)
- Google Android XR - platform for headsets and glasses create immersive augmented reality (AR) experiences; Uses Gemini AI model; headset can be used for both entertainment and productivity. YouTube, Google Photos, Google Maps, and even receive real-time assistance with tasks like cooking or home improvement
- Llama 3.3 70B LLM model; comparable to 405B Llama 3.1 ; 4bit quantized file is ~43-44GB
- AI two person podcast generating tools: Google Illuminate, NotebookLM, Elevenlabs GenFM
- Gemini 2.0 Experimental Advanced LLM released - potentially combines the strengths of both 1.5 Pro and 2.0 Flash Experimental models
- Gemini 2.0 Flash Thinking Experimental - Google’s first reasoning model, available free in Google AI Studio
- Google Project Astra - universal AI assistant
- Veo 2 - new Google DeepMind text-to-video model
- OpenAI o3 LLM - beats human in Arc-AGI test 207), beats humans in Math tests, mini should be released Jan 2025
- nVidia Jetson Orin Nano Super Developer Kit - $249 - AI Performance- 67 INT8 TOPS
- “Frontier AI systems have surpassed the self-replicating red line” according to https://github.com/WhitzardIndex/self-replication-research/blob/main/AI-self-replication-fudan.pdf
- After 35 steps of planning and action on its own, the AI manages to replicate a live copy. They even chat with each other.
- LLMs like Llama and Qwen, can successfully replicate themselves. Llama achieved a 50% success rate, while Qwen reached 90%.
- The researchers used a method called “agentic scaffolding,” which provides the AI with tools, a manual, and a thinking framework. This allows the AI to interact with a computer, run commands, access files, and manage processes. The AI also has a thinking model for reasoning, planning, and executing tasks.
- The paper outlines potential risks, including AI systems replicating to avoid shutdown and creating a chain of replicas, leading to an uncontrolled population of AI agents.
- Authors suggest eliminating materials related to LLMs or agent scaffolding from training data and developing behavior editing techniques to inhibit self-replication potential.
- Ilya Sutskever “Superintelligence is Self Aware, Unpredictable and Highly Agentic” 208)
- Best-of-N (BoN) Jailbreaking - a simple black-box algorithm that jailbreaks frontier AI systems (text, vision, audio); works by repeatedly sampling variations of a prompt with a combination of augmentations - such as random shuffling or capitalization for textual prompts, or even background noise and pitch for audio prompt - until a harmful response is elicited; achieves high attack success rates (ASRs) on closed-source language models, such as 89% on GPT-4o and 78% on Claude 3.5 Sonnet when sampling 10,000 augmented prompts;
- Microsoft Phi-4-14B beats GPT-4o on AMC tests
- Ventiva laptop cooling - moves air by moving ionized air molecules within an electric field between two grids.ICE = Ionic Cooling Engine - quiet, energy efficientincludes catalyst to convert ozone back to oxygen
- DeepThought-8B LLM - open source with reasoning
- DeepSeek-V2 LLM Chinese open source LLM 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens; 80GB*8 GPUs for BF16 inference; see https://github.com/deepseek-ai
- AI News Generator - a Python script using CrewAI, Google Search API, and Cohere's Command-R:7B model
- Google Project Aristotle - finds characteristics of the perfect team - the way team members interact is much more important than who is on the team; Two key behaviors that contribute to team success are equality in conversational turn-taking and ostentatious listening; When team members feel psychologically safe with each other, they are more likely to share their best ideas, work together effectively, and be innovative; This psychological safety is created when team members feel comfortable speaking up and listened to.
- Perplexity AI new features - Spaces - allows you to create custom GPTs and Claude projects; Instructions - automate tasks - create a PPC campaign;
- Devin with Slack Interface $500/mo
- a Slack-based AI coding agent that can create plans, write code, find bugs, correct code, and run tests. It can also respond to feedback and attempt to address it. Good, but not reliable, not able to resolve all bugs, can give wrong instructions
- Cursor $20/mo
- a more traditional AI coding agent that runs locally on the user's machine. Cursor's workflow is much easier to adopt and it is more reliable than Devon. Cursor was able to solve all of the bugs that he encountered 209)
- Meta's LCM = Large Concept Model
- operates on a higher level of abstraction, dealing with concepts instead of just words or characters
- uses an embedding space where sentences are represented as vectors
- uses a diffusion process that helps refine the embeddings and make the model more robust to noisy or incomplete data
- but has drawbacks - reliance on short sentences and the potential challenges in designing an optimal embedding space
- Lovable AI - software tool to create stunning websites and applications 20 times faster than traditional coding.210)
- Amazon Bedrock Prompt Router - dynamically routes prompts to the best-suited LLM for the task
- xAI aims for 1 Mln Nvidia GPUs in Memphis Datacenter “Colossus”
- Pika.art text & images into videos and art
- ModernBERT is available as a slot-in replacement for any BERT-like model, with both 139M param and 395M param sizes211)
- Higgsfield AI launched ReelMagic - creates complete 10-min videos from a short story idea; writes script, choose actors, films, adds sound and music, does the editing; uses different AI programs for each part of the process and works with several other companies to make this happen.
- RAG 2.0 212) built using n8n, Supabase, and Postgres DB; system monitors a Google Drive folder for new or updated files, automatically identifies the file type, extracts the text content, and stores it in a Supabase vector DB.
- Andrej Karpathy's AI generated 1.5hr video on what the Founding Fathers of America would think of America today, written by OpenAI's o1 Pro https://www.youtube.com/watch?v=1qTa9cJ7cjk
- Meta's Large Concept Models: Language Modeling in a Sentence Representation Space. Trained to perform autoregressive sentence prediction in an embedding space. 213)
- Chinese researchers may have worked out OpenAI's latest reasoning models - “Scaling of Search and Learning: A Roadmap to Reproduce o1 from the Perspective of Reinforcement Learning”214)
- OpenAI will transition its for-profit arm into a Delaware Public Benefit Corporation (PBC)
- Storm, a free tool developed by the Stanford team, outperforms both Perplexity Pro and Google Deep Research in coding related research. But Storm can't engage in conversations and answer follow-up questions, but Google Deep Research is the bestoverall solution 215)
- Google AgentSpace - early access
- Run:ai - an Israeli software company was recently acquired by NVIDIA for $700 Mln. Run:ai has developed a platform to help organizations manage and optimize their AI workloads, especially those that rely heavily on GPUs
- Cerebras Demonstrates Trillion Parameter Model Training on a Single CS-3 System (instead of thousands of GPUs).
- Alibaba's Qwen QVQ-72B-Preview
- QVQ excels at step-by-step reasoning through complex visual problems, particularly in mathematics and physics
- ByteDance 1.58-bit FLUX - dramatically reduces the computational demands of state-of-the-art image generation while maintaining output quality. Instead of 8 bit values it uses 3 values (-1,0,+1). Reducing storage by 8x. Requires 5x less computer memory while producing faster generation speeds.
- Sonus-1 - new model family from Rubik.ai
- Decart.ai , a California/Israeli startup, has released “Oasis” - a real-time generative AI open-world video game (Oct 31, 2024)
- Oasis can generate and render new content in real time, allowing users to explore and interact with AI-generated worlds with no predefined paths or goals, shaping the story and the world around them
- DeepSeek-V3: “A New Era in Open-Source AI”; training only costed $6m instead of >$100m for LLama and > $1000m for GPT-4;
- Deepseek Artifacts - free, open-source platform, and AI coder. Powered by DeepSeek V3. Generates apps in seconds!
- Qwen QwQ 32B Preview - Reasoning model from China
- B-STAR (Balanced Self-Taught Reasoning) AI Is Breaking All The Rules Of Self-improvement 216)
- Generative AI Companies Secure Record $56 Billion in 2024
2025 first half
Jan
- nVidia Drive Thor - for autonomous driving. x20 times the processing power of its predecessor, Orin.
- nVidia Project Digits - AI supercomputer powered by the Grace-Blackwell Superchip, built in collaboration with MediaTek
- nVidia Cosmos - suite of foundational models trained on 20 million hours of video to understand the physical world, including robotics and large language models
- nVidia's GeForce RTX 50 Series GPUs - Blackwell architecture eg. RTX5070; new laptop GPU RTX 5070 also offers performance as previous 4090, but consumes x3 less battery
- Microsoft is on track to invest approximately $80 Billion in 2025 to build out AI-enabled datacenters to train AI models and deploy AI and cloud-based applications around the world. More than half of this total investment will be in the US.
- OLMo 2 LLMs from The Allen Institute for AI
- PraisonAI - an AI Agents Framework with Self Reflection
- AI model find ways to circumvent rules
- Palisade Research demonstrated that o1-preview model was able to manipulate the game files to force a win against a powerful chess engine, even without any explicit instructions or “adversarial prompting” to do so.217)
- US Department of Homeland Security releases AI Guide for Government use 218)
- SambaNova - World's Fastest AI Inference
- Microsoft rStar-Math - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
- MiniMax-01 - open source, 4M Context
- Sky-T1-32B-Preview - an open source reasoning model from UC BERKELEY
- China and Elon Musk are both suggesting they will be manufacturing 100,000's of AI robot soldiers by 2027, nVidia CEO says we are close to AGI humanoid robots
- LlamaV-o1 - Visual Reasoning
- Metagene.ai - Detect Bio Threats in wastewater samples
- Microsoft forms a new “Core AI Platform and Tools” division
- Microsoft open-sourced its Phi-4 14b LLM on Hugging Face under the MIT license
- Sony Research economical text-to-image model
- OpenAI released their economic blueprint (policy proposals) 219)
- OpenAI Re-enters Robotics Development
- ReaderLM-v2 - converts HTML to markdown or JSON
- Google Titans: Learning to Memorize at Test Time 220)
- NDEA - new research lab founded by François Chollet & Mike Knoop
- Luma AI Ray2 - video generative model
- 2025 is shaping up to be a pivotal year for AI agents
- Hewlett Packard Enterprise El Capitan Supercomputer based on the Cray EX Shasta architecture uses a combined 11,039,616 CPU and GPU cores consisting of 43,808 AMD 4th Gen EPYC 24C “Genoa” 24 core 1.8 GHz CPUs (1,051,392 cores) and 43,808 AMD Instinct MI300A GPUs (9,988,224 cores) with 12.8 terabits/second of bandwidth
- Donald Trump announces new company “Stargate” to build a massive AI infrastructure in the US financed by SoftBank, OpenAI and Oracle which are said to be planning to invest $US100 billion in total for the project to start, with plans to pour up to $US500 billion into Stargate in the coming years. Elon Musk retorted that they don't have the money. Musk also said he was in the Oval Office on Tuesday as Trump signed a pardon for Ross William Ulbricht, founder of the dark web marketplace SilkRoad. Musk had also dispatched a top staffer from his SpaceX and X companies to help ensure the release of convicted January 6 rioters after Trump signed a blanket pardon. Strange that Trump did not include Musk in Stargate although Musk is still going after OpenAI in the courts for abandoning their non-profit status. Trump had a busy 1st day in office - freezing new federal regulations and hiring, reversing Biden administration directives, mandating federal workers work in-office full-time and withdrawing from the Paris Climate Accords, pardoning nearly all of the 1,600-plus supporters arrested in the 6 January Capitol riot, temporarily suspending the TikTok ban and withdrawing the US from the World Health Organisation, reinterpreted a key constitutional amendment and instructed his administration to cease granting citizenship to US-born children of undocumented migrants, promised to end government-promoted diversity programmes and noted that US official policy would only recognise two genders, male and female, pledging to increase American wealth and expand “our territory” (presumably targetting the Panama Canal, Greenland and Canada).
- DeepSeek R1 opens source Chinese LLM AI developed at cost of “only $6m” crashes US tech stocks including nVidia as it purportedly created a LLM as good as OpenAi's latest LLMs but at much less cost and with cheaper, older nVidia GPUs - fine-tuned on top of DeepSeek V3 using RL (Reinforcement Learning) 221), but apparently used OpenAI's models to train it
- Dario Amodei, former employee at OpenAI and current CEO and founder of Anthropic, has written an essay detailing his thoughts on deep learning model innovation and GPU export controls to China and predicts that AI smarter than humans will likely be developed in 2026-2027 222)
- Alibaba (China) has announced similar model called “Max” model in their Qwen-2.5 family of models
- Moonshot AI Kimi k1.5 - another new open-source LLM from China, it is multimodal 128 K context, great for STEM, coding, and general reasoning. It outshines giants like OpenAI o1 and Qwen models like QVQ-72B/32B Preview on several parameters like Maths, Coding and Vision
- DeepSeek Janus Pro - New Image Model - txt2image and can answer questions about images
- Meta's $60-$65 Billion AI Infrastructure Investment for 2025
- OpenAI Canvas now works with OpenAI o1
- UI-TARS native GUI agent - open source - navigates and controls computers and mobile devices, it perceives the screen visually like humans do
- Berkeley researchers replicated DeepSeek's R1 core technology (reasoning) for only $30 - success relies heavily on reinforcement learning (RL), where the AI learns through trial and error in a simulated environment 223)
- Training large language models to reason in a continuous latent space – Meta's COCONUT Paper 224)
- Sky-T1-32B-Flash - A New Reasoning LLM
- OpenAI will diversify its cloud computing providers, moving away from Microsoft's Azure as its sole cloud provider
- report analyzes political bias in LLMs. Key findings: Most conversational AI systems display left-leaning preferences225)
Feb
- Aust. Gov bans DeepSeek on official devices
- the Chinese owner of TikTok, ByteDance unveils their new human portrait to video deep fake technology trained on 17000 hours of video (presumably users TikTok uploads) - OmniHuman-1
- Test-time compute (“giving an AI model extra time to think”) is the most important breakthrough since Transformers changed the world in 2017226)
- s1: Simple test-time scaling - the simplest approach to achieve test-time scaling and strong reasoning performance 227)
- DeepSeek R1 gave itself a x2 Speed Boost - Self-Evolving LLM - accomplished through a collaborative effort between a human prompter and the AI, where the prompter provided instructions and the AI iteratively improved the code based on feedback. 228)
- Sam Altman says OpenAI is ‘on the wrong side of history’ and needs a new open-source strategy after DeepSeek shock
- OpenAI rejects Musk's $97.4b offer to buy them out despite it being double the current valuation229)
- DeepScaleR - a small open source 1.5B model230) which beats OpenAI's o1 at math - developed at Berkeley using Deep Seek method, only cost $4,500 to train using 3.8K GPU hours (18.42x reduction compared to Deep Seek R1) - demonstrates how smaller models can achieve impressive results with high-quality training data distilled from larger models.
- DeepResearch 231) - open-source; improved performance is due mostly to letting agents write their actions in code instead of JSON
- Princeton University Researchers Introduce Self-MoA and Self-MoA-Seq: Optimizing LLM Performance with Single-Model Ensembles 232) - Self-MoA - aggregating various outputs from a single high-performing model
- Huawei (China) produces 80Gb Ascend 910C AI chips which delivers 60% of Nvidia's H100 inference performance but not efficient for training models
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, New AI model architecture allows AI to “think” without generating any tokens.233)
- DeepSeek VL2 MoE - a family of open-source vision-language models
- Safe Superintelligence, a startup by Ilya Sutskever in talks to raise $20b (previously had raised $1b)
- Cerebras runs DeepSeek R1 70B at 1,600 tokens/second
- Groq (AI chips) secures $1.5 Bln commitment from Saudi Arabia
- Meta - Convert EEG/MEG Brain Activity Into Text (non-invasive magnetoencephalography (MEG) and electroencephalography (EEG) )234)
- OpenThinker-32B - open source model developed by the Open Thoughts consortium, beats DeepSeek in math
- Microsoft announces Majorana 1, its first quantum computing chip.
- anti-AI developers have created “tarpit” tools based on security anti-spam to lure AI web data scrapers into endless loops of fake data thereby slowing down the AI crawler or poisoning it and the training of the AI model - examples include Nepenthes, forces scrapers into a maze of gibberish, while another, Iocaine, aims to poison AI models outright235)
- OpenAI-backed 1X announces Neo Gamma AI humanoid robot
Mar
- Google Gemma-3 - open source, major improvements
- MAI - Microsoft AI - new family of models
- Tulu-3-405B - new open-source LLM
- Gemini 2.0 Flash can edit or restore images using your textual descriptions
- Microsoft Belief State Transformer
- Alibaba Babel - Open Multilingual LLM
- Alibaba START (Self-taught Reasoner with Tools) - a model fine-tuned from QwQ-32B, achieving strong performance in reasoning and tool use benchmarks
- Reflection AI - new startup for building superintelligent autonomous systems
- Claudette - a wrapper for Anthropic’s Python SDK
- Tencent’s Hunyuan-TurboS AI - first ultra-large Hybrid-Transformer-Mamba MoE model
- AI at McDonald’s restaurants, virtual manager, order accuracy, predictive maintenance, etc
- xAI, Elon Musk’s AI company, has acquired a 1 Mln square foot property in Southwest Memphis ( 5400 Tulane Road, Whitehaven, Memphis ) for $80 Mln to expand its AI data center footprint with plans for 1 Mln GPUs datacenter
- Safe Superintelligence (SSI) - Ilya Sutskever's startup received additional $2 Bln at $30 Bln valuation
- Manus AI, the autonomous AI agent developed by Chinese startup Butterfly Effect
- Baidu's ERNIE X1 & ERNIE 4.5 beats o3, R1 & Sonet 3.7 and is 100x cheaper
- PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC 236)
- Anthropic Claude can now search the web
- Chitu - open-source LLM inference framework from China - 300% faster while using 50% less GPU
- OpenAI o1-pro - expensive, twice the price of GPT 4.5 and 10x the price of o1
- OpenAI AI Audio models - Audio-to-Audio directly; Faster, Understands intonations, accents, emotion; 70% cheaper than Elevenlabs
- Google buys Wiz for $32 Billion - Wiz makes security tools to protect information in remote data centers
- alphaXiv - Discover, Discuss, and Read arXiv papers
- Boston Dynamics Atlas Robot can now do gymnastics
- Google Gemini 2.5 Pro (experimental) just released
- DeepSeek V3-0324 is now the highest scoring non-reasoning model
- AI Training Using Quantum Computing. Quantum Weighted Tensor Hybrid Network (QWTHN), which reparameterizes pre-trained layers into quantum tensor hybrid architectures.237)
- s1 - Simple test-time scaling
- Tencent’s Hunyuan T1 reasoning model - matching or outperforming OpenAI’s o1 and DeepSeek’s R1
- OpenAI raises $40 Bln at a $300 Bln valuation
- Perplexity is raising funds at $18 Bln valuation
- Ollama Deep Research
- Nvidia Nemotron-H - family of accurate, efficient hybrid Mamba-Transformer models
- Microsoft's KBLaM - inject facts into LLM; KBLaM = Knowledge Base Augmented Language Model
- Microsoft has introduced new features for its Microsoft 365 Copilot, including two advanced “deep reasoning” agents—Researcher and Analyst—as well as enhanced custom AI agent capabilities.
- Ideogram 3.0 text-2-image
- Chat2DB open source AI-powered DB management tool and SQL client - SQL query generator, etc 238)
- Anthropic's “Think Tool” is a new feature introduced for Claude AI that enhances its ability to handle complex, multi-step tasks by providing a dedicated space for structured thinking.
- “humanoid robots will be in many homes by 2029 at a cost of $US300/mth”
- iPhone app with magnifying lens diagnoses skin cancer with 99.9% accuracy
- AI model which can detect endometrial cancer in histologic samples with 99.26 percent accuracy
- AlexNet source code released - written by University of Toronto graduate student Alex Krizhevsky in 2012 (and sold to Google)
- Yann LeCun, AI Godfather, believes that we wont be getting AGI within 2 years, and not with just scaling up LLMs, it will just be a system with massive memory and retrieval ability but not one that can invent new solutions to problems
- Chain-of-Draft (CoD) Prompting gives 4x faster reponses with some loss of accuracy by simply asking the model to think step-by-step and to limit each reasoning step to five words at most.
- some agentic LLMs are over-thinking and using too many tokens to solve questions (“over-sinking”), potentially resulting in (1) Analysis paralysis: Extensive planning with minimal concrete actions; (2) Rouge actions: Taking actions without waiting for feedback; (3) Premature disengagement: Terminating tasks based on internal assessment
- Chain-of-Tools (CoTools) is a new method that keeps the foundation LLM completely frozen while adding specialized modules for tool integration. No additional training is required.
- nVidia launches the nVidia Accelerated Quantum Research Center (NVAQC) in Boston
- OpenAI GPT-4o update surpasses GPT-4.5 and now has very good text to image generation
- Anthropic has developed a new method for peering inside large language models (LLMs) like Claude
- Google TxGemma - open models for Drug Discovery, predicting whether a molecule can cross the blood-brain barrier, estimating drug binding affinity, nferring reactants from chemical reactions.
- Isomorphic Labs - Alphabet’s AI drug discovery platform raises $600M from Thrive
- Mistral: Ministral 8b - model for Edge & private computing
- Runway's Gen-4 AI Video Generator
- OpenHands LM - 32B model for software Engineering
- Mixture-of-Mamba - a new State Space Model (SSM)
- Nova Act - Amazon's new AI model for web browser automation
- xTrimoPGLM: unified 100B pretrained transformer for deciphering proteins
- Alibaba open source QwQ-32B reasoning model
- Alibaba open source QVQ-Max - 32B visual reasoning model
- Alibaba open source Qwen2.5-VL-32B-Instruct (March 24) - multimodal model
- Musk merges X into xAI so that his Grok AI LLM can have better access to X's data as well as providing him with improved leadership; Valuations: xAI at $80B and X at $33B = $113B Total; X.ai and X.com bring together models, compute, talent, 600M+ users, massive amounts of data and a live testing ground for AI applications;
Apr
- Meta Llama 4 LLM
- knowledge cut-off August 2024; on-the-fly INT4 quantization; MoE layers use INT4 for dormant experts while keeping active experts in FP16.
- Scout version: MoE (17B active, 16 experts, 109B total), fits in single NVIDIA H200 GPU (int8) or H100 (int4)10M context window;
- Maverick version: MoE (17B active, 128 experts, 400B total), multimodal, beating GPT-4o and Gemini 2.0 Flash, comparable with DeepSeek v3 on reasoning and coding
- preview of Behemoth version - MoE (288B active, 16 experts, 2T total) - teacher model for distillation, outperforms GPT-4.5, Claude Sonnet 3.7,
- Nvidia Llama 3.1 Nemotron Ultra 253B
- derived from Meta Llama-3.1-405B-Instruct; Runs on an 8x H100 GPU setup; 128K tokens context length
- Ironwood - Google new 7th version TPU - 4.6K Teraflops. Scales to 9,216 chips per pod (42.5 exaflops total). Twice as efficient per watt as their previous Trillion TPU.
- Microsoft upgrades Copilot with memory, actions, and real-time vision. It can remember conversations and personal details, creating individual profiles that learn preferences, routines, and important info. It can also browse the web and perform actions for you.
- GitHub announces public MCP server with enhanced UX and full Anthropic compatibility
- Fraud involving deepfakes has surged by 3,000% recently, with projected losses reaching a staggering $40 billion by 2027.
- Light-based chip offers 50x speed, 30x efficiency over silicon 239)
- NVIDIA AgentIQ - Python library to unify agentic workflows
- DeepSeek-GRM models - Inference-Time Scaling for Generalist Reward Modeling240)
- Open Deep Search - search + Reasoning241)
- Chinese universities have surged ahead of U.S. institutions in AI research output
- IBM z17 Mainframe for AI - up to 26 IBM Telum II 8 core processors
May
- 1st brain-like computer that runs without electricity which is a photonic–biological hybrid uses optogenetic switches and nanoscale protein circuits to fire logic signals through light pulses and instead of processing ones and zeroes, this machine uses graded signal strengths like natural synapses
- Tokyo scientists develop new power cell with no degradation over 50,000 charge cycles and uses a novel graphene-aluminum composite that allows instantaneous ion transfer without overheating or degrading over time which can go from 0% to full charge in just 3 seconds whilst remaining cool - robots will need far better batteries than current tech.
- Trump tries to ban LLMs from being woke.
June
July
- Sapient Intelligence announces the open-source release of its Hierarchical Reasoning Model (HRM)242), a brain-inspired architecture that leverages hierarchical structure and multi-timescale processing to achieve substantial computational depth without sacrificing training stability or efficiency. Trained on just 1000 examples without pre-training, with only 27 million parameters, HRM successfully tackles reasoning challenges that continue to frustrate today's large language models (LLMs). Performs very well at specific reasoning such as solving complex Sudoku puzzles and optimal pathfinding in 30×30 mazes which current CoT LLMs are unable to perform, and will be able to be trained on a laptop with minimal data for specific tasks but it is not a LLM so will not be able to chat or write poetry, etc. It is based upon system 1 and system 2 thinking described in Daniel Kahneman's book “Thinking, Fast and Slow.” System 1 is fast, intuitive, and emotional, while System 2 is slow, deliberate, and logical.
Aug
- Elon Musk's xAI’s new Grok Imagine AI image and video generator with minimal guard rails now generates NSFW content despite major concerns of deep fake NSFW imagery
it/ai_history.txt · Last modified: 2025/08/10 01:42 by gary1