A Quick Comparison of Text-to-Image Models: Flux, Stable Diffusion 3, DALL·E 3, and Kling
Last week, Black Forest Labs (founded by the original creators of Stable Diffusion) released Flux, a new state-of-the-art text-to-image model that is open-sourced and offers quality comparable to Midjourney. Curious how it stacks up against other models, I ran a quick one-shot generation test on the following models (prices are estimates based on official pricing pages and replicate.com):
I used the following prompt for a general image in an artist's style:
a surreal landscape with floating islands and a giant glowing moon in the style of Hayao Miyazaki
and another prompt to test text rendering:
gateau cake spelling out the words “Takin.AI”, tasty, food photography, dynamic shot
The testing results are listed below.
- For the first prompt, I prefer the Flux Schnell and Kling results, which also happen to come from the most affordable models.
- For the second prompt, I like the results from Flux Schnell and DALL·E 3 the most.
You can use text-to-image models such as Flux, SD3, DALL·E 3, and ControlNets through a single account on Takin.ai; start with a free account to try the examples in this post.
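If you would rather call a model programmatically, here is a minimal sketch using the Replicate Python client, since replicate.com is where the prices above were checked. The model slug and the `aspect_ratio` parameter are assumptions based on Replicate's public model listings, not something verified in this post; adapt them (or the endpoint) if you use Takin.ai or another provider.

```python
# Minimal sketch: one-shot text-to-image generation with Flux Schnell
# via the Replicate Python client (pip install replicate).
# Assumptions: the "black-forest-labs/flux-schnell" slug and the
# "aspect_ratio" input field are illustrative; check your provider's docs.
import os


def build_input(prompt: str, aspect_ratio: str = "16:9") -> dict:
    """Assemble the request payload for a text-to-image call."""
    return {"prompt": prompt, "aspect_ratio": aspect_ratio}


if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # requires an API token in REPLICATE_API_TOKEN

    output = replicate.run(
        "black-forest-labs/flux-schnell",
        input=build_input(
            "a surreal landscape with floating islands and a giant "
            "glowing moon in the style of Hayao Miyazaki"
        ),
    )
    print(output)  # typically URL(s) of the generated image(s)
```

The network call only runs when an API token is set, so the payload-building helper can be reused or tested on its own.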
Flux Schnell (fastest, at only 1.3 seconds):
Flux Pro (about 8.1 seconds):
DALL·E 3:
Stable Diffusion 3:
Kling:
PS. The featured image for this post was generated with the HiddenArt tool from Takin.ai.
Originally published at https://harrywang.me on August 9, 2024.