Beyond DALL-E: Advanced Image Generation Workflows with ComfyUI

René Fa

Friday 10:15 in Hassium

AI image generation has made huge progress in recent years, yet many people still think that DALL-E with a text prompt is the best way to generate images. But thanks to Stable Diffusion, Flux, and many supplementary models such as ControlNet or image prompt models, we have much more control over the images we want to create. Frontends like A1111 or Invoke AI exist for this, but if you want to try bleeding-edge models or do something more complex, implementing such a pipeline in code yourself is hard work with a steep learning curve. In this talk, I want to show you ComfyUI, an open-source node-based GUI written in Python in which you build workflows as a DAG. Thanks to a large community of contributors, plenty of plugins are available that add new functionality. Using practical examples, this talk shows the capabilities and power of this tool and how you can combine many building blocks to create a complex workflow much faster than by coding it yourself.
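
For context, the "simple text-to-image workflow" that the talk takes as its baseline can be sketched in a few lines of Python. The snippet below is not from the talk; it is a minimal sketch using the Hugging Face diffusers library, and the checkpoint name, prompt, and sampler settings are illustrative assumptions (a CUDA-capable GPU is also assumed):

    # Minimal text-to-image baseline (illustrative, not from the talk).
    # Assumes the diffusers library and a Stable Diffusion 1.5 checkpoint.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # any SD checkpoint would do
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

    # One prompt in, one image out - no conditioning, no pre/post-processing.
    image = pipe(
        "a lighthouse on a cliff at sunset, oil painting",
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]
    image.save("lighthouse.png")

Everything beyond this single prompt-to-image step (masking, ControlNet conditioning, compositing) is where hand-written pipelines get complicated, and where a node-based tool pays off.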

I want to cover the following topics:

  • What are the limits of a simple text-to-image workflow?
  • What is ComfyUI?
  • What are the requirements to use ComfyUI? (Resources, OS, etc.)
  • What can you do with ComfyUI that you can't do with a simple text-to-image interface?
    • Pre- and post-processing of images in a single workflow
    • Advanced conditioning using images, bounding boxes, depth maps, etc., all together
  • Examples demonstrated in the talk:
    • Integrating existing objects from a photo into a generated scene
    • Creating optical illusions and surreal images
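
To give a flavor of what "workflows as a DAG" means in practice: ComfyUI can represent a workflow as JSON, where every node input that references another node's output is an edge in the graph, and a locally running ComfyUI server accepts such graphs over HTTP. The sketch below is illustrative, not from the talk; it assumes a default local install listening on 127.0.0.1:8188, and the node IDs and checkpoint filename are made up:

    # Queue a minimal text-to-image graph against a local ComfyUI server.
    # Assumes ComfyUI is running at its default address (127.0.0.1:8188)
    # and that the referenced checkpoint file exists; IDs are illustrative.
    import json
    import urllib.request

    workflow = {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "sd_v1-5.safetensors"}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["1", 1], "text": "a lighthouse at sunset"}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": 42, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "demo"}},
    }

    # Each ["node_id", output_index] pair is an edge in the DAG.
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

In the GUI you never write this JSON by hand; you wire nodes together visually, and swapping in a ControlNet or image-prompt node is a matter of connecting a few more edges rather than rewriting a pipeline.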

René Fa

Just another Python nerd with a freshly gained enthusiasm for image-gen AI. I have been working as a Data Engineer for nearly three years, with a focus on computer vision topics.