News 11 June 2023 - simon-oz/Weekly-AI-news GitHub Wiki

OpenAI's website traffic soared to the one-billion mark, with a total of 847 million user visits in March 2023.
Video-LLaMA is a multi-modal framework that gives Large Language Models (LLMs) the ability to understand both the visual and the auditory content of a video. It can perceive and comprehend video content and generate meaningful responses grounded in the visual and auditory information presented in the video.
InstructZero is an efficient instruction optimization method for black-box large language models. It optimizes a low-dimensional soft prompt applied to an open-source LLM, which then generates the instruction for the black-box LLM.
Git-Theta is a Git extension that aims to bring Git-style version control to machine learning model checkpoints by efficiently and meaningfully tracking a model's version history natively through Git. Link to the project
A GitHub project named roop was recently released. It lets anyone take a video and replace the face in it with a face of their choice. Only one image of the desired face is needed: no dataset, no training.
Facebook published a paper introducing MusicGen, which can generate high-quality music samples conditioned on a textual description or melodic features, allowing better control over the generated output.
SpQR (Sparse-Quantized Representation) is a new compressed format and quantization technique that enables, for the first time, near-lossless compression of LLMs across model scales while reaching compression levels similar to previous methods. It requires a GPU with more than 32 GB of VRAM.
MAM (Matting Anything Model) can estimate the alpha matte of any target instance from user prompts given as boxes, points, or text descriptions, and it incorporates SAM for interactive use. It reaches performance comparable to specialized matting models on multiple benchmarks and, as a unified image matting model, shows superior generalization ability with fewer parameters.
Video-ChatGPT is a multimodal model that merges a video-adapted visual encoder with an LLM. The model is capable of understanding videos and generating human-like conversations about them. Try it here.
DeepMind published a paper in Nature: "Faster sorting algorithms discovered using deep reinforcement learning". The researchers formulated the task of finding a better sorting routine at the assembly-language level as a single-player game and trained a new deep reinforcement learning agent, AlphaDev, to play it.
Magic, an AI startup, announced LTM-1, a prototype neural network architecture designed for giant context windows. It can handle prompts of 5,000,000 tokens, far more than GPT-4's 32k tokens.
Hugging Face released StarCoderPlus, a version of StarCoderBase fine-tuned on 600B tokens from the English web dataset RefinedWeb combined with StarCoderData from The Stack (v1.2) and a Wikipedia dataset. It was trained on 512 Tesla A100 GPUs for 14 days.
Cerebras released SlimPajama, a cleaned and deduplicated version of the RedPajama dataset, described as the largest extensively deduplicated, multi-corpora, open-source dataset for training large language models. GitHub link
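
SlimPajama is described above as "extensively deduplicated". To give a rough idea of what near-duplicate filtering of a text corpus can look like, here is a minimal sketch that compares documents by the Jaccard similarity of their word 3-grams. This is a toy illustration, not the pipeline used to build SlimPajama: the function names, the 3-gram size, and the 0.8 threshold are all assumptions made for the example, and the pairwise comparison would be far too slow for a real web-scale corpus.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) of a lowercased text."""
    words = text.lower().split()
    if len(words) < n:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs, threshold=0.8, n=3):
    """Keep a document only if it is not too similar to one already kept.

    threshold and n are illustrative values, not taken from any real pipeline.
    """
    kept = []
    kept_shingles = []
    for doc in docs:
        s = shingles(doc, n)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue  # near-duplicate of an earlier document, drop it
        kept.append(doc)
        kept_shingles.append(s)
    return kept


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "The quick brown fox jumps over the lazy dog today",
        "a completely different sentence about language models",
    ]
    for doc in deduplicate(corpus):
        print(doc)
```

In this sketch the first document that appears always wins, so the result depends on corpus order. Large-scale pipelines typically replace the exact pairwise comparison with an approximate scheme such as MinHash with locality-sensitive hashing.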