Exploring the Latest Advancements in AI Research

Our community of open source research hubs has over 200,000 members building the future of AI. We are working globally with our partners, industry leaders, and experts to develop cutting-edge open AI models for Image, Language, Audio, Video, 3D, Biology and more.

MARBLE: Material Recomposition and Blending in CLIP-Space
Savannah Martin

Editing materials of objects in images based on exemplar images is an active area of research in computer vision and graphics. We propose MARBLE, a method for performing material blending and recomposing fine-grained material properties by finding material embeddings in CLIP-space and using that to control pre-trained text-to-image models.
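The core idea of combining material embeddings in CLIP-space can be sketched with a simple spherical interpolation between two embedding vectors. This is an illustrative stand-in, not MARBLE's actual implementation; the toy vectors below replace real CLIP encoder outputs.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between two vectors, a common way to
    blend embeddings that live (approximately) on a hypersphere.
    Hypothetical stand-in for blending material embeddings in CLIP-space."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return a
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# Toy 512-dim "material embeddings" (real ones would come from a CLIP image encoder).
rng = np.random.default_rng(0)
emb_metal = rng.normal(size=512)
emb_wood = rng.normal(size=512)

blended = slerp(emb_metal, emb_wood, t=0.5)  # 50/50 material blend
print(blended.shape)  # (512,)
```

A blended embedding like this could then serve as a conditioning signal for a pre-trained text-to-image model, which is the role the paper assigns to its CLIP-space material embeddings.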

FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image
Savannah Martin

We present a novel framework for generating high-quality, animatable 4D avatars from a single image. While recent advances have shown promising results in 4D avatar creation, existing methods either require extensive multi-view data or struggle with shape accuracy and identity consistency.

SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
Savannah Martin

We present Stable Video 4D 2.0 (SV4D 2.0), a multi-view video diffusion model for dynamic 3D asset generation. Compared to its predecessor SV4D, SV4D 2.0 is more robust to occlusions and large motion, generalizes better to real-world videos, and produces higher-quality outputs in terms of detail sharpness and spatio-temporal consistency.

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Savannah Martin

Diffusion models are the main driver of progress in image and video synthesis, but they suffer from slow inference. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD), aim to shift the model from many-step to single-step inference, albeit at the cost of expensive and difficult optimization due to their reliance on a fixed pretrained DINOv2 discriminator.

Stable Audio Open
Savannah Martin

Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics.

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Kesh Bhamidipaty

Explore the latest research in image generation with the Hourglass Diffusion Transformer (HDiT). This paper presents a new approach in high-resolution image synthesis, setting itself apart by handling large-scale images more efficiently than traditional methods. It's an insightful read for those interested in the technical advancements of image generation, offering a deep dive into the complexities and innovations in this field.

Adversarial Diffusion Distillation
Joshua Lopez

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while maintaining high image quality.
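At a high level, a distillation objective of this kind pairs a term pulling the student toward a teacher model with an adversarial term from a discriminator. The sketch below is purely illustrative (the arrays, helper names, and weight `lam` are all made up; they stand in for real diffusion-model and discriminator outputs, not the paper's code).

```python
import numpy as np

def combined_distillation_loss(student_out, teacher_out, disc_score, lam=1.0):
    """Illustrative combination of two loss terms in an ADD-style objective:
    - a distillation term pulling the student's output toward the teacher's,
    - an adversarial term rewarding outputs the discriminator scores as real.
    `lam` is a hypothetical weighting between the two terms."""
    distill = np.mean((student_out - teacher_out) ** 2)
    adversarial = -np.mean(disc_score)  # non-saturating stand-in
    return distill + lam * adversarial

# Toy tensors in place of real model outputs.
student = np.zeros(4)
teacher = np.ones(4)
scores = np.zeros(4)
loss = combined_distillation_loss(student, teacher, scores)
print(loss)  # 1.0
```

The student trained against such an objective can then generate in a handful of denoising steps instead of the many iterations a standard diffusion sampler needs.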
