A Quick Comparison of Text-to-Image Models: Flux, Stable Diffusion 3, DALL·E 3, and Kling
Last week, Black Forest Labs (founded by the original creators of Stable Diffusion) released Flux, a new state-of-the-art text-to-image model that is open-sourced and offers quality comparable to Midjourney. Curious how it stacks up against other models, I ran a quick one-shot generation test on the following models (prices are estimates based on official pricing pages and replicate.com):
I used the following prompt for a general image in an artist's style:
a surreal landscape with floating islands and a giant glowing moon in the style of Hayao Miyazaki
and another prompt to test text rendering:
gateau cake spelling out the words “Takin.AI”, tasty, food photography, dynamic shot
The testing results are listed below.
- For the first prompt, I prefer the Flux Schnell and Kling results, which also happen to come from the most affordable models.
- For the second prompt, I like the results from Flux Schnell and DALL·E 3 the most.
You can use text-to-image models such as Flux, SD3, DALL·E 3, and ControlNets through a single account on Takin.ai; start with a free account to try the examples in this post.
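If you would rather call a model programmatically, here is a minimal sketch using the Replicate Python client, since replicate.com is where the prices above were checked. The model slug and the `aspect_ratio` parameter are assumptions based on Replicate's public model listings, not something verified in this post; adapt them (or the endpoint) if you use Takin.ai or another provider.

```python
# Minimal sketch: one-shot text-to-image generation with Flux Schnell
# via the Replicate Python client (pip install replicate).
# Assumptions: the "black-forest-labs/flux-schnell" slug and the
# "aspect_ratio" input field are illustrative; check your provider's docs.
import os


def build_input(prompt: str, aspect_ratio: str = "16:9") -> dict:
    """Assemble the request payload for a text-to-image call."""
    return {"prompt": prompt, "aspect_ratio": aspect_ratio}


if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # requires an API token in REPLICATE_API_TOKEN

    output = replicate.run(
        "black-forest-labs/flux-schnell",
        input=build_input(
            "a surreal landscape with floating islands and a giant "
            "glowing moon in the style of Hayao Miyazaki"
        ),
    )
    print(output)  # typically URL(s) of the generated image(s)
```

The network call only runs when an API token is set, so the payload-building helper can be reused or tested on its own.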
Flux Schnell (fastest, at only 1.3 seconds):
Flux Pro (about 8.1 seconds):
DALL·E 3:
Stable Diffusion 3:
Kling:
PS. The featured image for this post was generated with the HiddenArt tool from Takin.ai.
Originally published at https://harrywang.me on August 9, 2024.