this post was submitted on 29 Jun 2023
7 points (100.0% liked)

StableDiffusion


For discussions around Stable Diffusion, a text-to-image generative AI model. Share your generated pictures, discuss the various UI and extensions, share news about releases, bring tutorials and more!


I thought I would share my experience this evening with the group here, seeing that I am still excited as hell about getting this hodgepodge to work at all.

I have been playing with the DirectML version of Stable Diffusion on Windows for a while now (we won't go into the reasons why, but I have the 6800 XT, which is not well suited to this use case).

Automatic1111 on DirectML is dog slow, but I got some interesting results. So today I cleared out an old SSD, wired it up and installed a clean Ubuntu. Following this guide I managed to get ROCm running, and the difference is like chalk and cheese. Or rather impulse and warp drive. Totally crazy!

So for you AMD Radeon users out there: there is hope. It is in Linux, but it is there.

top 15 comments
[–] ToKrCZ@kbin.social 2 points 1 year ago (1 children)

Yes, Linux rocks. I will be playing with my 6800 XT in Manjaro Linux over the weekend. I am mostly interested in running local LLMs via ExLLaMa.

[–] mack123@kbin.social 2 points 1 year ago

I am working my way there. I am interested in the gaming possibilities: NPC dialogue and so on. But I wanted to get the environment working first, and I found more guides for Stable Diffusion. Now I can venture deeper knowing that ROCm is working.

[–] kindenough@kbin.social 2 points 1 year ago* (last edited 1 year ago) (1 children)

I am on Windows, automatic1111 with DirectML, and rendering is pretty fast. 7700X, 6750 XT and 32 GB at 4800 MHz and a couple of M.2 drives. No xformers tho, and some problems when upscaling in txt2img, but it renders prompts with default settings in 10 to 15 seconds. Fast enough for me. AMD has updated their Adrenalin drivers lately to have better DirectML performance.

Some things can take some time or aren't supported on AMD, but it's surely faster than my GTX 1070 and 1080 rigs, which performed adequately, except with training.

[–] mack123@kbin.social 1 points 1 year ago

For sure. It works, especially if you use Shark with the experimental driver, but the speed difference was an order of magnitude in favour of the ROCm build on Linux. I am already needing more card though. The 6800 XT's 16 GB of VRAM is not enough, so I am running on the medium VRAM settings. I hear ROCm support for Windows is coming soon, so that will be interesting as well. There were some rumours earlier this year.

[–] okamiueru@lemmy.world 1 points 1 year ago (1 children)

I've been running SD on an AMD GPU and Linux since more or less the beginning. It's been smooth sailing all the way. Not nearly as fast as some equally expensive RTX cards. But it is what it is.

[–] mack123@kbin.social 1 points 1 year ago (1 children)

Awesome, I am still finding my way and am happy if I simply don't crash. I don't have a frame of reference to compare to an Nvidia card for this, but it does seem like we have a little more work in getting things smooth with the AMD cards. I can't say that my speed is terrible. Most renders finish in reasonable time. I am simply amazed that we can do this on consumer-grade hardware.

[–] okamiueru@lemmy.world 1 points 1 year ago* (last edited 1 year ago) (2 children)

Indeed. I'm in complete awe of this technology. It's an amazing pastime that tickles the creative side. As for getting an idea of how different cards and systems compare, you can check out https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html

I also have a 6800 XT, and the performance on that particular benchmark is around 9 it/s. Something like this looks to be a rough indication.

AMD Cards:

- 20 it/s 7900 XTX
- 10 it/s 6900 XT
- 9 it/s 6800 XT
- 7 it/s 6700 XT
- 2 it/s RX Vega

NVIDIA Cards:

- 50 it/s RTX 4090
- 25 it/s RTX 4080
- 22 it/s RTX 3080 Ti
- 11 it/s RTX 4060 Ti
[–] mack123@kbin.social 1 points 1 year ago

I ran the tests out of interest. I am leaving some performance on the table due to my launch options, but I need those to avoid too many out-of-memory and other errors.

Normal Test:

  • 5.24 / 6.16 / 7.06
  • app:stable-diffusion-webui updated:2023-06-27 hash:394ffa7b url:https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/master
  • arch:x86_64 cpu:x86_64 system:Linux release:5.19.0-46-generic python:3.10.6
  • torch:2.0.1+rocm5.4.2 autocast half xformers: diffusers: transformers:4.25.1
  • device:AMD Radeon RX 6800 XT (1) hip:5.4.22803-474e8620 16GB
  • sub-quadratic medvram
  • v1-5-pruned-emaonly.safetensors [6ce0161689]

Extensive Test

  • 5.36 / 6.16 / 7.08 / 5.38 / 5.52
  • app:stable-diffusion-webui updated:2023-06-27 hash:394ffa7b url:https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/master
  • arch:x86_64 cpu:x86_64 system:Linux release:5.19.0-46-generic python:3.10.6
  • torch:2.0.1+rocm5.4.2 autocast half xformers: diffusers: transformers:4.25.1
  • device:AMD Radeon RX 6800 XT (1) hip:5.4.22803-474e8620 16GB
  • sub-quadratic medvram
  • v1-5-pruned-emaonly.safetensors [6ce0161689]
[–] mack123@kbin.social 1 points 1 year ago (1 children)

Included my current numbers, any optimisation advice would be much appreciated 😉

[–] okamiueru@lemmy.world 0 points 1 year ago (1 children)

I think those numbers are roughly the same I get. It varies a bit from time to time. I also wouldn't know how to improve it, to be honest.

[–] mack123@kbin.social 1 points 1 year ago* (last edited 1 year ago) (1 children)

I managed to find an extra iteration or two without sacrificing too much stability.

6.06 / 7.59 / 9.11

Running with Doggettx selected as the optimiser in the optimiser config inside automatic1111.
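For a rough sense of scale, here is a minimal sketch that compares these figures with the normal test posted earlier in the thread (5.24 / 6.16 / 7.06). It assumes the three values line up position by position, and `percent_gain` is a hypothetical helper name, not part of automatic1111 or the benchmark extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent_gain BEFORE AFTER
# prints the relative change from BEFORE to AFTER in percent, one decimal place.
percent_gain() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (after / before - 1) * 100 }'
}

# it/s figures quoted in the thread: earlier normal test vs. the tuned run.
# Assumption: the values correspond position by position.
before=(5.24 6.16 7.06)
after=(6.06 7.59 9.11)

for i in "${!before[@]}"; do
    echo "${before[$i]} -> ${after[$i]} it/s: +$(percent_gain "${before[$i]}" "${after[$i]}")%"
done
```

By that reckoning the gain is roughly 16% on the first figure and close to 29% on the last.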

I installed the google perftools as suggested in this thread

"sudo apt install libgoogle-perftools-dev"

And then added the following memory management options as suggested in this thread
by exporting: "export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128"

I made the following changes to my webui-user.sh:

Uncommented the command line options as follows:
export COMMANDLINE_ARGS="--medvram --upcast-sampling"

And added the following lines to the end of the file

export LD_PRELOAD=libtcmalloc.so
export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128
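If the preload line points at a library the loader cannot find, every command prints a preload error, so it may be worth guarding it. A minimal sketch, assuming `ldconfig -p` is available (as on Ubuntu); `pick_tcmalloc` is a hypothetical helper name and the fallback message is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick_tcmalloc LISTING
# prints the first tcmalloc library name found in text formatted like
# the output of `ldconfig -p`, or nothing if none is listed.
pick_tcmalloc() {
    printf '%s\n' "$1" | awk '$1 ~ /^libtcmalloc(_minimal)?\.so/ { print $1; exit }'
}

# Ask the dynamic linker cache which libraries it knows about.
TCMALLOC="$(pick_tcmalloc "$(ldconfig -p 2>/dev/null)")"

if [ -n "$TCMALLOC" ]; then
    # Only preload when the library is actually resolvable.
    export LD_PRELOAD="$TCMALLOC"
else
    echo "tcmalloc not found; leaving LD_PRELOAD unset" >&2
fi
```

This could replace the plain `export LD_PRELOAD=libtcmalloc.so` line; the other exports stay as they are.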

[–] okamiueru@lemmy.world 1 points 1 year ago (1 children)

Those are some great notes, thanks for sharing.

[–] mack123@kbin.social 1 points 1 year ago

No problem. Back up first and be careful. I think AMD still has a lot that can be done in the ROCm drivers themselves, so there should be gains left to make. Nvidia is not helping with their pricing either, which should see more users on AMD and hopefully better support for us. I am pleasantly surprised by what the card can do. I got it for gaming at 1440p, where it was the best bang for buck. The AI stuff is a cool bonus.

[–] AJYoung@beehaw.org 0 points 1 year ago (1 children)

What’s the ML version? I’d love to learn more!

[–] mack123@kbin.social 1 points 1 year ago

I was running this version: DirectML

It is dog slow compared to running under Linux with ROCm, but she runs ;-)
