Meta launches AI model Motivo for humanoid agents; theory-of-mind reasoning program for machine learning
Meta Platforms (NASDAQ:META) said on Thursday that it is launching an AI model called Meta Motivo, which can control the behavior of a human-like digital agent to perform complex tasks, with the potential to enhance the Metaverse experience.
The social media giant also launched Meta Video Seal, an open-source model for video watermarking. The model builds on the company’s Meta Audio Seal work, which it shared last year.
Meta released these and more of its latest research, code, models, and datasets from its Meta Fundamental AI Research, or FAIR, unit.
Meta Motivo: The company said Meta Motivo is trained with a novel algorithm that uses an unlabeled dataset of motions to ground unsupervised reinforcement learning towards learning human-like behaviors. The AI model can solve a range of whole-body control tasks, including motion tracking and goal pose reaching, without any additional training or planning.
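As a rough sketch of that zero-shot inference pattern, the hypothetical Python snippet below uses placeholder networks rather than Meta's released model or API: a goal pose is encoded once into a task embedding, and a single pretrained policy is then conditioned on that embedding at every control step, with no task-specific retraining or planning.

# Illustrative sketch only: hypothetical stand-ins for a pretrained
# behavior model like Meta Motivo, mirroring the described inference
# pattern of prompting one policy with a task embedding (here, a goal
# pose) to obtain whole-body control without further training.
import torch
import torch.nn as nn

OBS_DIM, POSE_DIM, LATENT_DIM, ACTION_DIM = 358, 69, 256, 69  # hypothetical sizes

class TaskEncoder(nn.Module):
    """Maps a prompt (a target body pose) to a latent task embedding z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(POSE_DIM, LATENT_DIM), nn.Tanh())

    def forward(self, pose):
        return self.net(pose)

class ConditionedPolicy(nn.Module):
    """One policy network reused for every task, conditioned on z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + LATENT_DIM, 512), nn.ReLU(),
            nn.Linear(512, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

encoder, policy = TaskEncoder(), ConditionedPolicy()  # pretrained weights in the real model

goal_pose = torch.randn(1, POSE_DIM)   # the "reach this pose" prompt
z = encoder(goal_pose)                 # task embedding inferred once, no retraining
obs = torch.randn(1, OBS_DIM)          # current humanoid observation
action = policy(obs, z)                # whole-body action for this step
print(action.shape)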
Meta believes that in the future, this research could pave the way for fully embodied agents in the Metaverse, leading to more lifelike non-player characters, or NPCs, democratization of character animation, and new types of immersive experiences.
Meta Video Seal: The company said Video Seal adds a watermark (with an optional hidden message) into videos that is undetectable to the naked eye and can later be uncovered to determine a video’s origin. The watermark has been shown to be resilient against common video edits, such as blurring and cropping, as well as compression algorithms.
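The embed-and-extract pattern behind such watermarking can be illustrated with a deliberately simple toy. The snippet below hides a bit string in the least significant bits of a frame; this is not how Video Seal works (it uses learned neural embedder and extractor models and is far more robust), but it shows the basic idea of an imperceptible, recoverable message.

# Toy illustration of the embed/extract watermarking pattern; not
# Video Seal's method, which relies on trained neural networks.
import numpy as np

def embed_message(frame: np.ndarray, bits: list[int]) -> np.ndarray:
    """Hide a short bit string in the lowest bit of the first pixels."""
    marked = frame.copy()
    flat = marked.reshape(-1)
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | b  # 1-bit change, invisible to the eye
    return marked

def extract_message(frame: np.ndarray, n_bits: int) -> list[int]:
    """Recover the hidden bits to check the video's origin."""
    return [int(v & 1) for v in frame.reshape(-1)[:n_bits]]

frame = np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)  # one video frame
message = [1, 0, 1, 1, 0, 0, 1, 0]
watermarked = embed_message(frame, message)
assert extract_message(watermarked, len(message)) == message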
Meta noted that it is publicly releasing the Video Seal model under a permissive license, with a research paper, training code, and inference code.
Large Concept Models: Meta also unveiled what it called a “fundamentally different training paradigm for language modeling.” The company noted that the core idea of the Large Concept Model, or LCM, is to decouple reasoning from language representation, an approach inspired by how humans plan high-level thoughts before putting them into words.
As an example, the company said that a presenter who gives the same presentation several times always has the same ideas to convey (materialized by the slides projected on screen), but the exact choice of words may vary from one delivery to the next.
“Guided by that principle, the LCM is a significant departure from a typical LLM. Rather than predicting the next token, the LCM is trained to predict the next concept or high-level idea, represented by a full sentence in a multimodal and multilingual embedding space,” said Meta.
The company added that the LCM outperforms or matches recent LLMs on the pure generative task of summarization and is more computationally efficient as the input context grows, among other advantages.
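A minimal way to see the difference in training objective is sketched below; it is not Meta's LCM, and a stub encoder stands in for the multimodal, multilingual sentence-embedding space the company describes. A small network is trained to regress the embedding of the next sentence, rather than to predict the next token.

# Minimal sketch of the "predict the next concept" objective (not
# Meta's LCM): sentences become fixed embeddings, and the model learns
# to predict the embedding of the following sentence.
import torch
import torch.nn as nn

EMB_DIM = 64

def encode_sentence(sentence: str) -> torch.Tensor:
    """Stub sentence encoder: a pseudo-embedding that is stable within a run."""
    g = torch.Generator().manual_seed(hash(sentence) % (2**31))
    return torch.randn(EMB_DIM, generator=g)

document = [
    "The presenter greets the audience.",
    "She outlines the three main findings.",
    "Each finding is illustrated with a slide.",
    "The talk closes with questions.",
]
embeddings = torch.stack([encode_sentence(s) for s in document])  # shape (T, EMB_DIM)

# A tiny "concept model": given one sentence embedding, predict the next one.
concept_model = nn.Sequential(nn.Linear(EMB_DIM, 128), nn.ReLU(), nn.Linear(128, EMB_DIM))
optimizer = torch.optim.Adam(concept_model.parameters(), lr=1e-3)

for _ in range(200):
    pred = concept_model(embeddings[:-1])              # predict sentence t+1 from sentence t
    loss = nn.functional.mse_loss(pred, embeddings[1:])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At generation time the predicted embedding would be decoded back into text;
# nearest-neighbor lookup over known sentences stands in for a decoder here.
pred = concept_model(encode_sentence(document[0]))
best = max(range(1, len(document)),
           key=lambda i: torch.cosine_similarity(pred, embeddings[i], dim=0).item())
print("Predicted next concept ~", document[best])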
Meta Explore Theory-of-Mind: As part of its push toward advanced machine intelligence, the company has introduced Meta Explore Theory-of-Mind, a program-guided approach to adversarial data generation for theory-of-mind reasoning. According to the company, it enables the generation of diverse, challenging, and scalable theory-of-mind reasoning data for training and evaluation.
Adversarial data are inputs that have been deliberately designed to cause a machine learning model to make a mistake.
The company noted that Explore Theory-of-Mind generates reliable stories that push the limits of large language models, or LLMs, making it well suited for evaluating frontier models or for generating fine-tuning data.
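The toy generator below gives a flavor of what program-guided theory-of-mind data can look like. It is not Meta's pipeline; it is just a small program that composes classic false-belief stories whose correct answer depends on what a character believes rather than on where an object actually is.

# Toy program-guided generator of false-belief stories; illustrative
# only, not the Explore Theory-of-Mind pipeline.
import itertools

AGENTS = ["Anna", "Bilal", "Chen"]
OBJECTS = ["the keys", "the notebook"]
PLACES = ["the drawer", "the backpack", "the shelf"]

def make_false_belief_example(agent, other, obj, start, end):
    story = (
        f"{agent} puts {obj} in {start} and leaves the room. "
        f"While {agent} is away, {other} moves {obj} to {end}. "
        f"{agent} comes back."
    )
    question = f"Where will {agent} look for {obj} first?"
    answer = start  # tracks the agent's belief, not the object's real location
    return {"story": story, "question": question, "answer": answer}

dataset = [
    make_false_belief_example(agent, other, obj, start, end)
    for (agent, other), obj in itertools.product(itertools.permutations(AGENTS, 2), OBJECTS)
    for start, end in itertools.permutations(PLACES, 2)
]
print(len(dataset), "examples;", dataset[0])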
Meta CLIP 1.2: The company also released Meta CLIP 1.2, a high-performance vision-language encoder. The company said it has been working on advanced algorithms to curate vast amounts of image-text data, unlocking the learning of human knowledge about the world. This enables its models to learn efficiently and accurately, capturing the nuances of fine-grained mappings between image and language semantics.
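In practice, a CLIP-style encoder of this kind is used for zero-shot image-text matching: images and captions are embedded into a shared space and scored against each other. The sketch below shows that workflow with the Hugging Face Transformers library; the checkpoint name is an assumption based on an earlier MetaCLIP release, and whichever Meta CLIP 1.2 checkpoint Meta publishes would slot in the same way.

# Zero-shot image-text matching with a CLIP-style encoder. The model
# name below is an assumed earlier MetaCLIP checkpoint, used here only
# to illustrate the workflow.
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "facebook/metaclip-b32-400m"  # assumption; swap in a Meta CLIP 1.2 checkpoint
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
captions = ["a photo of two cats", "a photo of a dog", "a diagram of a network"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher scores indicate a closer image-caption match in the shared embedding space.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.2f}  {caption}")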