News 2nd July 2023

23rd Jun, A16Z’s Shoham interviewed CEOs from Anthropic, Cohere, and Character.AI. They identified four key innovations: steering, memory, “arms and legs”, and multimodality; and discussed how these innovations will evolve over the next 6 to 12 months.

26 Jun, Wired reports that DeepMind’s CEO says its next AI project, Gemini, still several months from completion, will be more capable than OpenAI’s ChatGPT, adding abilities such as planning and problem solving. AlphaGo-style techniques will be incorporated into Gemini. The project could cost hundreds of millions of dollars; by comparison, GPT-4 cost more than $100 million to train, according to Altman.

26 Jun, VentureBeat reports that Databricks is acquiring MosaicML for a jaw-dropping $1.3 billion. The news was confirmed by both Databricks and MosaicML, an AI start-up founded only one and a half years ago. MosaicML believes that every organization should be able to benefit from the AI revolution with more control over how its data is used. MosaicML also maintains its own open-source MPT series of LLMs.

27 Jun, Microsoft published a paper: “KOSMOS-2: Grounding Multimodal Large Language Models to the World”. KOSMOS-2 is a multimodal LLM, enabling new capabilities of perceiving object descriptions and grounding text to the visual world. Researchers also created a large-scale dataset of grounded image-text pairs to train the model. The research lays out the foundation for the development of Embodiment AI.
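
As a rough illustration of what a grounded image-text pair looks like, here is a small sketch: noun-phrase spans in the caption are marked and linked to bounding boxes that have been discretized into location tokens over an image grid. The marker and location token names, the caption, and the boxes below are illustrative, not the released KOSMOS-2 tokenizer.

```python
# Illustrative grounded caption: phrases linked to discretized bounding boxes.
caption = "a snowman warming himself by a fire"

# Hypothetical grounding annotations: phrase text plus normalized (x0, y0, x1, y1) boxes.
groundings = [
    {"phrase": "a snowman", "box": (0.10, 0.05, 0.55, 0.90)},
    {"phrase": "a fire",    "box": (0.60, 0.55, 0.95, 0.95)},
]

def to_location_tokens(box, bins=32):
    # Map a normalized box to two grid-cell tokens (top-left and bottom-right cells),
    # mimicking the idea of encoding boxes as a short sequence of location tokens.
    x0, y0, x1, y1 = box
    top_left = int(y0 * bins) * bins + int(x0 * bins)
    bottom_right = int(min(y1, 0.999) * bins) * bins + int(min(x1, 0.999) * bins)
    return f"<loc_{top_left}><loc_{bottom_right}>"

grounded = caption
for g in groundings:
    marked = f"<p>{g['phrase']}</p><box>{to_location_tokens(g['box'])}</box>"
    grounded = grounded.replace(g["phrase"], marked, 1)

print(grounded)
# e.g. "<p>a snowman</p><box><loc_..><loc_..></box> warming himself by <p>a fire</p><box>...</box>"
```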

27 Jun, Nvidia’s H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks, excelling on a new MLPerf test for generative AI. On a commercially available cluster of 3,584 H100 GPUs co-developed by startup Inflection AI and operated by CoreWeave, a cloud service provider specializing in GPU-accelerated workloads, the system completed the massive GPT-3-based training benchmark in less than eleven minutes.

28 Jun, Meta published a paper: “Extending Context Window of Large Language Models via Positional Interpolation”. The researchers present Position Interpolation (PI), which extends the context windows of RoPE-based pretrained LLMs such as the LLaMA models to up to 32,768 tokens with minimal fine-tuning (within 1,000 steps), while demonstrating strong empirical results on tasks that require long context, including passkey retrieval, language modeling, and long document summarization, on models from LLaMA 7B to 65B. The key idea is to linearly down-scale the input position indices so that even the longest sequence stays within the pretrained position range, rather than extrapolating to unseen positions.
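
As a rough sense of how PI works in practice, here is a minimal sketch in PyTorch: positions are rescaled by L/L' before the RoPE angles are computed, so a longer sequence maps back into the position range the model saw during pretraining. The function names and the 2048-token pretrained window below are illustrative, not taken from the paper’s code.

```python
import torch

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE angles: theta_i(pos) = pos / base^(2i/dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions, inv_freq)  # shape (seq_len, dim/2)

def interpolated_positions(seq_len, pretrained_ctx=2048):
    # Position Interpolation: compress indices so the longest sequence still
    # maps into the pretrained position range [0, pretrained_ctx), instead of
    # extrapolating to unseen positions.
    positions = torch.arange(seq_len, dtype=torch.float32)
    if seq_len <= pretrained_ctx:
        return positions
    return positions * (pretrained_ctx / seq_len)  # scale factor L / L'

# Example: run a model pretrained with a 2048-token window at 8192 tokens.
angles_plain = rope_angles(torch.arange(8192, dtype=torch.float32), dim=128)
angles_pi = rope_angles(interpolated_positions(8192), dim=128)
print(angles_pi.max() <= angles_plain.max())  # True: interpolated angles stay in the trained range
```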

29 Jun, Oracle’s CEO said the company is spending "billions" of dollars on chips from Nvidia Corp as it expands a cloud computing service targeting a new wave of artificial intelligence (AI) companies, alongside further investment in CPUs.

29 Jun, Inflection AI, a Microsoft-backed start-up, has raised $1.3 billion in new funding. "We'll be building a cluster of around 22,000 H100s. This is approximately three times more compute than what was used to train all of GPT-4. Speed and scale are what's going to really enable us to build a differentiated product," Suleyman said at the Collision conference on Thursday.

2nd July, OpenChat, based on LLaMA-13B, ranked #1 among open-source LLMs on the AlpacaEval leaderboard. OpenChat was fine-tuned on 8x A100 80GB GPUs.