UniFL: Improve Stable Diffusion via Unified Feedback Learning

ByteDance
  *Equal Contribution   Project Lead

Text-to-Image Examples

SDXL + UniFL 4-Step

ream a dog holding a cigar and glass with whiskey + dim volumetric lighting, 8k octane beautifully ...

SDXL + UniFL 4-Step

Portrait of a clown wearing a medieval royal robe and an ornate crown on a colorful close ...

SDXL + UniFL 4-Step

Burger gone wild, mega detailed, volumetric lighting, beautiful, golden hour, sharp focus, ultra detail ...

SDXL + UniFL 4-Step

3 d render abundant flowering hosta plants made from ideas, 3 d geometric neon shapes ...

SDXL + UniFL 4-Step

3 d sci - fi cgartist rendering of a hyper realistic marble greek statuary bust floating in space ...

SDXL + UniFL 4-Step

4k Portrait of Friedrich Nietzsche in his Suit with a chiseled Jawline and serious Look ...

SDXL + UniFL 4-Step

A 3 d render of peewee herman on max headroom, octane render, unreal engine, hyperrealism, 8 k ...

SDXL + UniFL 4-Step

A beautiful classy partying couple, dimly lit upscale 1920s speakeasy, relaxed pose, art deco, detail ...

SDXL + UniFL 4-Step

A chubby tabby cat surfing on a rainbow in outer space, diffuse lighting, fantasy, intricate, surreal ...

SDXL + UniFL 4-Step

A closeup photorealistic photograph of a rabbit chef. film still, vibrant colors. this 4 k hd image ...

SDXL + UniFL 4-Step

A contemporary painting of a little boy sits in his bed and looks through the window into the night ...

SDXL + UniFL 4-Step

A cow in a luxury chair reading the newspaper, digital art, landscape, fantasy art, octane render ...

SDXL + UniFL 4-Step

A creepy slimy pink ocelot that looks like an investigator, wearing a long beige trench coat, CGI ...

SDXL + UniFL 20-Step

A goldfish swimming inside a whiskey glass, close up view, dramatic lighting, DOF, caustics, soft ...

SDXL + UniFL 20-Step

Beautiful ceramics studio photograph of a tall geometric symmetrical stoneware vase glazed by Jackson ...

SDXL + UniFL 20-Step

A full body portrait of a beautiful french milkmaid in silk underwear, recumbent in the arboreal sun ...

SDXL + UniFL 20-Step

A grinning girl, with plant patterns, her face looks like an orchid, she is the center of the garden ...

SDXL + UniFL 20-Step

A high quality illustration of a goth-clown hybrid with red hair, trending on artstation, hd ...

SDXL + UniFL 20-Step

Detailed soft painting of a scarecrow and his wife dancing. lino print elegant highly detailed artst

SDXL + UniFL 20-Step

A portrait of japanese crane walking into a forest of japanese pines, by range murata, a big red sun ...

Inference SDXL + UniFL 20-Step


Text-to-Video Examples

Each prompt is shown for both AnimateDiff + LCM 12-Step and AnimateDiff + UniFL 12-Step:

"a drop of black ink falling in a superwhite background in slow motions angle advertising."

"3d animated female character, singing and moving her arms, musical film style , 4k, high quality vid ..."

":lipsticks advertisement -camera pan up right -s 7."

"modern swimming pool villa design by arquitetura, in winter, snow, in the style of 32k uhd, white and blue ..."

"A cute talking panda stands in the museum, furry,Chinese New Year style red clothes, cinematic, 4k, epic Steven Spielberg movie still ..."

"a uniquely captivating Ferrari Portofino advertisement, emphasizing its elegance and speed, night, ultra-high definition."

Abstract

Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancements, the current competitive solutions still suffer from several limitations, including inferior visual quality, a lack of aesthetic appeal, and inefficient inference, without a comprehensive solution in sight. To address these challenges, we present UniFL, a unified framework that leverages feedback learning to enhance diffusion models comprehensively. UniFL stands out as a universal, effective, and generalizable solution applicable to various diffusion models, such as SD1.5 and SDXL. Notably, UniFL incorporates three key components: perceptual feedback learning, which enhances visual quality; decoupled feedback learning, which improves aesthetic appeal; and adversarial feedback learning, which optimizes inference speed. In-depth experiments and extensive user studies validate the superior performance of our proposed method in both improving generation quality and accelerating inference. For instance, UniFL surpasses ImageReward by 17% in user preference for generation quality and outperforms LCM and SDXL Turbo by 57% and 20%, respectively, in 4-step inference. Moreover, we have verified the efficacy of our approach in downstream tasks, including LoRA, ControlNet, and AnimateDiff.

Pipeline


UniFL aims to elevate visual generation quality, enhance preference aesthetics, and accelerate inference from a unified feedback-learning perspective. The pipeline of UniFL is depicted as follows. UniFL contains three key components:

Perceptual Feedback Learning: Exploits the generation prior embedded in existing image perception models (e.g., an instance segmentation model).

Decoupled Feedback Learning: Achieves efficient human preference alignment with multiple fine-grained human preference reward models.

Adversarial Feedback Learning: Adversarially trains the diffusion model and the reward model, improving generation quality for samples produced with fewer inference steps.

The model is trained via two-stage optimization.
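As a rough illustration of the decoupled idea, the sketch below aggregates several fine-grained reward scores into a single loss. It is only a sketch under stated assumptions: the dimension names ("lighting", "detail"), the weights, the toy reward functions, and the helper `decoupled_feedback_loss` are all hypothetical and are not taken from the UniFL code. Real reward models would be learned networks scoring a prompt and image pair.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type: a reward model maps (prompt, image) to a scalar score.
RewardModel = Callable[[str, Sequence[float]], float]


def decoupled_feedback_loss(
    prompt: str,
    image: Sequence[float],
    reward_models: Dict[str, RewardModel],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of negated rewards, one term per preference dimension."""
    loss = 0.0
    for name, model in reward_models.items():
        # Higher reward should mean lower loss, hence the negation.
        loss += weights[name] * -model(prompt, image)
    return loss


# Toy stand-ins for learned reward models (illustrative only).
def brightness_reward(prompt: str, image: Sequence[float]) -> float:
    return sum(image) / len(image)


def contrast_reward(prompt: str, image: Sequence[float]) -> float:
    return max(image) - min(image)


reward_models = {"lighting": brightness_reward, "detail": contrast_reward}
weights = {"lighting": 1.0, "detail": 0.5}

# A three-pixel "image" keeps the arithmetic easy to follow.
loss = decoupled_feedback_loss("a cat", [0.2, 0.4, 0.6], reward_models, weights)
```

Keeping one reward model per dimension means each term can be weighted or dropped independently, which is the practical appeal of decoupling over a single monolithic preference score.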

Perceptual Feedback Learning

Existing perception models can serve as excellent providers of feedback on visual generation quality for the diffusion model. An instance segmentation model, for example, accurately captures defects in the generated image (e.g., the distorted arm of the boy).

Perceptual Feedback Learning in UniFL significantly enhances the visual generation quality of the diffusion model, including style response and structure optimization.
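One way to picture segmentation-based feedback is as a pixel-wise classification loss: a segmentation model predicts class probabilities on the generated image, and the loss grows wherever the prediction disagrees with a reference mask. The sketch below assumes a cross-entropy loss and uses hand-written probabilities in place of a real segmentation model. The function name, the two-class setup, and the numbers are hypothetical, and the exact loss used by UniFL may differ.

```python
import math
from typing import List


def pixelwise_cross_entropy(
    probs: List[List[List[float]]], mask: List[List[int]]
) -> float:
    """Mean cross-entropy between predicted class probabilities and a label mask.

    probs[y][x][k] is the predicted probability of class k at pixel (y, x);
    mask[y][x] is the reference class index at that pixel.
    """
    total, count = 0.0, 0
    for prob_row, mask_row in zip(probs, mask):
        for pixel_probs, label in zip(prob_row, mask_row):
            # Clamp to avoid log(0) when a class gets zero probability.
            total += -math.log(max(pixel_probs[label], 1e-12))
            count += 1
    return total / count


# 1x2 toy image with two classes (0 = background, 1 = object).
mask = [[0, 1]]
clean_pred = [[[0.9, 0.1], [0.1, 0.9]]]      # structure matches the mask
distorted_pred = [[[0.9, 0.1], [0.6, 0.4]]]  # object pixel is poorly formed

clean_loss = pixelwise_cross_entropy(clean_pred, mask)
distorted_loss = pixelwise_cross_entropy(distorted_pred, mask)
```

In this toy case the distorted prediction yields the larger loss, which is the signal a feedback-learning step would backpropagate into the diffusion model.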

Adversarial Feedback Learning

Adversarial Feedback Learning in UniFL achieves superior acceleration performance compared with commonly used acceleration methods such as LCM and SDXL-Turbo.
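The adversarial setup can be sketched as two opposing objectives, with the reward model playing a discriminator-like role. The example below assumes standard GAN-style logistic losses on scalar reward scores. The function names and score values are hypothetical, and the actual UniFL objectives may be formulated differently.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward_model_loss(score_real: float, score_fake: float) -> float:
    """Discriminator-style loss: push real scores up and generated scores down."""
    return -math.log(sigmoid(score_real)) - math.log(1.0 - sigmoid(score_fake))


def diffusion_model_loss(score_fake: float) -> float:
    """Generator-style loss: make few-step samples score like real images."""
    return -math.log(sigmoid(score_fake))


# Hypothetical raw reward scores for one real image and one few-step sample.
d_loss = reward_model_loss(score_real=2.0, score_fake=-1.0)
g_loss = diffusion_model_loss(score_fake=-1.0)
```

Training would alternate between the two: the reward model is updated to lower `d_loss`, then the diffusion model is updated to lower `g_loss`, so that low-step samples are pushed toward what the reward model treats as real.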

Comparisons

UniFL exhibits a remarkable advantage over existing methods that concentrate on quality optimization and inference acceleration in terms of both quantitative comparison and user study.

BibTeX

@misc{zhang2024unifl,
      title={UniFL: Improve Stable Diffusion via Unified Feedback Learning}, 
      author={Jiacheng Zhang and Jie Wu and Yuxi Ren and Xin Xia and Huafeng Kuang and Pan Xie and Jiashi Li and Xuefeng Xiao and Weilin Huang and Min Zheng and Lean Fu and Guanbin Li},
      year={2024},
      eprint={2404.05595},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}