Shiimiish

joined 1 year ago
 

hi,

I got the whisper_stt extension running in Oobabooga and it (kinda) works. However, it seems really, really bad at understanding my speech, and recognition has been spotty at best.

I saw some YouTube tutorials where it seemed to have no problem understanding speech - even with quite a strong accent - but in my own experience it performs nowhere near as well as shown there.

So - are there things I can do to improve its performance? Or might the YouTube tutorials have been edited to give a false impression, and spotty performance is simply what to expect?

I'm very happy with silero_tts, and if I can get the speech-to-text to work at the same level, I'd be a happy camper.

Edit: It seems to be a memory problem. I can select several models in the extension interface - tiny, small, base, medium, ... If I choose the tiny or small model it works, but with the poor results I mentioned above. If I select the medium model I get an OOM error (something like: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 11.99 GiB total capacity; 11.14 GiB already allocated; 0 bytes free; 11.22 GiB reserved in total by PyTorch)). It looks to me as if the language model reserves the whole of my VRAM (12 GB) and doesn't leave any for the extension - is there a way to tweak this?

Edit 2:

OK, so if I use a smaller language model (like a 6B model), it works perfectly fine with the medium whisper model ... so it is probably a memory issue. I have already tried starting with the command-line flag "--gpu-memory 5/8/10", which doesn't seem to do anything. Are there other ways to manage memory?
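For reference, here's a rough sketch (assuming the openai-whisper package and a CUDA build of torch; the 5 GiB cut-off is just a guess, not a measured requirement) of how you could check what's actually left on the card before picking a whisper model size:

```python
import torch
import whisper  # the openai-whisper package

# How much VRAM is still free after the language model has loaded?
free_bytes, total_bytes = torch.cuda.mem_get_info()
free_gib = free_bytes / 1024**3
print(f"free VRAM: {free_gib:.1f} GiB of {total_bytes / 1024**3:.1f} GiB")

# Fall back to a smaller whisper model when the LLM has already claimed the card.
# The 5 GiB threshold is an assumption, not a measured value.
model_size = "medium" if free_gib > 5 else "base"
stt_model = whisper.load_model(model_size, device="cuda")
print(f"loaded whisper '{model_size}'")
```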

 

Hi there, do any of you know of an in-depth guide/tutorial to ComfyUI? I'm looking for something that goes beyond the general setup and talks about the different nodes, where in the workflow they should be placed and so on. Basically a resource that shows me how to build advanced workflows without too much trial and error.

I've been using ComfyUI, but I feel like I'm doing it blindfolded and don't know which screws there are to adjust.

 

After a bumpy start (see my other thread about it), I'm starting to feel a bit comfortable with SDXL, to the point that I probably won't look back at the 1.5 models. This wizard-hat-wearing cat was generated in A1111 with:

"a cute kitty cat wearing a wizard hat, candy rays beaming out of the cat ears, (a swirling galaxy of candy pops background:0.7), 1980's style digital art, hyperrealistic, paintbrush, shallow depth of field, bokeh, spotlight on face, cinematic lighting " Negative (from a standard style I use): "(bad anatomy:1.1), (high contrast:1.3), watermark, text, inscription, signature, canvas frame, (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render

Generated at 1024x1024 without refiner.

There are a few things I found to be aware of when working with SDXL in A1111:

  • make sure you have upgraded A1111 to version 1.5.1 (do a "git pull" in the install directory)
  • I needed to add "--medvram" to my command-line arguments, otherwise I'd get out-of-memory errors (12 GB VRAM)
  • make sure your VAE is set to "automatic" or to the SDXL VAE (can be downloaded from Hugging Face). Older VAEs won't work
  • older LoRAs don't work and you will get errors
  • there is a noise-offset LoRA for SDXL (sd_xl_offset_example-lora_1.0) which does work, but I don't see much difference in the images. With the LoRA they are a tiny bit crisper. However, this LoRA doesn't work with the refiner model (you will get errors)

And the biggest one for me:

  • don't use arbitrary image proportions; stick to the ones posted here: https://platform.stability.ai/docs/features/api-parameters This was the biggest mistake I made initially. Using other image sizes gave me super wonky images and very unsatisfying results. Now I stick to the recommended dimensions and my images are much, much better (a small helper for snapping to them is sketched below).
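Here's a tiny helper for snapping an arbitrary size to the closest supported one; the resolution list is written down from memory, so double-check it against the Stability page linked above:

```python
# SDXL-native resolutions as I remember them from the Stability docs -- verify!
SDXL_SIZES = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]

def nearest_sdxl_size(width: int, height: int) -> tuple[int, int]:
    """Return the supported SDXL resolution closest to the requested aspect ratio."""
    target = width / height
    return min(SDXL_SIZES, key=lambda wh: abs(wh[0] / wh[1] - target))

print(nearest_sdxl_size(1920, 1080))  # -> (1344, 768)
```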

A word on the refiner model: as of now I don't see big quality improvements when I run the refiner in img2img at about 0.1-0.25 denoising. I will play around more with higher denoising strengths and see what I can get out of it.

Anyway, I think SDXL is a huge improvement and I'm already getting really exciting results. Cheers :)

[–] Shiimiish@lm.ainyataovi.net 2 points 1 year ago

The prompt was just an example, and usually my prompts get quite a bit longer than that. But with 1.5 models I eventually manage to get what I want to see. I also find that throwing in qualifiers like "mesmerizing" does do something to the image, although it can be subtle.

However, what I wanted to say here was that in SDXL my prompting seems to lead nowhere and I feel I'm not able to get out the kind of image I have in my head. Keeping the prompt example: in SD 1.5, using a custom model like Deliberate 2.0, I'm able to end up with an image of a hat-wearing cat surrounded by surreal-looking candy pops (however the final prompt for that ends up reading). In SDXL my images "break" (i.e. start looking flat, unrefined or even bizarre) at some point long before I can direct them towards my imagined result. All my usual approaches, like reducing CFG, re-ordering prompts, or using a variety of qualifiers, don't seem to work the way I'm used to.

And tbh, I think this has to be expected. These are new models, so we need new tools (prompts) to work with them. I just haven't learned how to do it yet, and I'm asking how others do it :)

 

Hi,

I'm looking into hosting a blog site for myself - nothing fancy, just a site where I can publish some of my thoughts and ideas. Maybe I also want a section to publish images. So, basically something lean and mostly text-only.

What's the easiest way to set this up for myself?

 

Hi,

I'm struggling a bit to get good results with SDXL and I'm wondering if I'm doing something wrong ... I tried A1111 and ComfyUI and have been underwhelmed in both cases. I can get bland, boring-looking images out of it that seem OK from a technical point of view (i.e. they seem to be correctly generated, without weird artifacts or anything like that). However, whenever I try something more elaborate, my prompting leads nowhere. I can ask for "a cat" and it will generate a picture of a cat. But if I try "a cat wearing a wizard hat floating in a mesmerizing galaxy of candy pops", prompts of that kind seem to quickly break the final image. I'm not talking about tailored models and LoRAs here, but I seem to be able to do much more interesting stuff with the Deliberate 2.0 model than with SDXL.

So, what's your experience so far? Does the community need to catch up first and work on custom models, LoRAs, and so on to really get things cooking? Or do I need to learn how to work with XL better? I was actually looking forward to having a "bland" and hopefully rather unbiased model to work with, where not every prompt desperately tries to become a hot anime girl, but for now I'm struggling to get interesting images.

For reference, I updated my A1111 installation with "git pull" (which seems to have worked, as I now have an SDXL tab in my settings) and downloaded the 1.0 model, refiner and VAE from Hugging Face. I can generate text2image in A1111 with the base model, but I can't seem to get img2img with the refiner model to work ... In ComfyUI I found a premade workflow that runs the base model first and then the refiner on the latents, which seems to work just fine technically, but it also seems to require a different approach to prompting than I'm used to.
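In case it helps anyone debug the base-plus-refiner hand-off, here is a sketch of the same idea in diffusers (not the A1111/ComfyUI route; the model IDs are the official Hugging Face repos, and the 0.8 split point is just the commonly quoted default, so treat the values as assumptions):

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
)
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2, vae=base.vae,  # share weights to save VRAM
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
)
base.enable_model_cpu_offload()     # helps a 12 GB card avoid OOM (needs accelerate)
refiner.enable_model_cpu_offload()

prompt = "a cat wearing a wizard hat floating in a mesmerizing galaxy of candy pops"

# The base model handles the first ~80% of the steps and hands over latents ...
latents = base(prompt, num_inference_steps=30, denoising_end=0.8,
               output_type="latent").images
# ... and the refiner finishes the remaining ~20% on those latents.
image = refiner(prompt, num_inference_steps=30, denoising_start=0.8,
                image=latents).images[0]
image.save("sdxl_cat.png")
```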

[–] Shiimiish@lm.ainyataovi.net 2 points 1 year ago

Thank you for the input! I recently upgraded my PC to be able to handle Stable Diffusion, so I have 12 GB of VRAM to work with at the moment. I have also recently started to self-host some applications on a VPS, so some basics are there.

As for what I'd like to do with Stable Diffusion: one of my hobbies is storytelling and worldbuilding. I would like to (one day) be able to work on a story with an LLM and then prompt it: "now give me a drawing of the character we just introduced to the story", and the LLM would automagically rope in Stable Diffusion and produce a workable drawing with it. I think this is probably beyond the capability of the current tools, but it's what I would like to achieve. I will definitely look into LangChain to see what I can do with it.

That's also where the questions about context length and cross-thread referencing come from. I did some work with ChatGPT and am amazed at how good a tool it is to "brainstorm with myself" when developing stories. However, it does not remember the story bits I was working on two hours ago, which kinda bummed me out ... :)

[–] Shiimiish@lm.ainyataovi.net 22 points 1 year ago (4 children)

I look them up at lemmyverse.net

I go there about once a week to see if there are new communities I might be interested in. I'm on a self-hosted single-user instance, so my "all" feed is identical to my "subscribed" feed, and this is how I populate it.

[–] Shiimiish@lm.ainyataovi.net 1 points 1 year ago (1 children)

Yeah, reducing CFG can help a lot. It sometimes feels to me that getting a good image is about knowing at what point to let loose ...

 

Hi there, I'm curious to know other people's approach in working with Stable Diffusion. I'm just a hobbyist myself and work on creating images to illustrate the fictional worlds I'm building for fun.

However, I find that getting very specific images (that are still visually pleasing) is really difficult.

So, how do you approach it? Are you trying to "force" your imagined picture out by making use of ControlNet, inpainting and img2img? I find that this approach usually leads to the exact image composition I'm after but yields completely ugly pictures. Even after hours of inpainting, the best I can get to is "sorta OK-ish", certainly far away from "stunning". I've played around with ControlNet for dozens of hours already, experimenting with multi-control, weighting, ControlNet only in parts of the image, different starting and ending steps, ... but it only kinda gets there.
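For what it's worth, here is a minimal sketch of that "force the composition" route in diffusers (canny ControlNet; the model IDs are the common public ones rather than what I use in A1111, the input filename is a placeholder, and the conditioning scale is just an example value):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# Turn a rough composition sketch into the edge map the ControlNet follows.
layout = np.array(Image.open("composition_sketch.png").convert("RGB"))  # placeholder file
edges = cv2.Canny(layout, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    "a house on a lake, golden hour, digital artwork",
    image=control_image,
    controlnet_conditioning_scale=0.6,  # lower = looser grip on the composition
    num_inference_steps=30,
).images[0]
image.save("controlled.png")
```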

Now, as opposed to that, a few prompts can generate really stunning images, but they will usually only vaguely resemble what I had in mind (if it's anything other than a person in a generic pose). Composing an image by prompts alone is by no means easier or faster than the more direct approach mentioned above. And I always seem to arrive at a point where the "prompt breaks". I don't know quite how to describe this, but in my experience, when I get too specific in prompting, the resulting image suddenly becomes ugly (like architecture that is described too closely in the prompt suddenly having all the wrong angles).

So, how do you approach image generation? Do you give a few prompts and see what SD can spit out, taking delight in the unexpected results and exploring visual styles more than specific image compositions? Or are you stubborn like me and want to use it as a tool for illustrating imagination - which it doesn't seem nearly as good at as the former?

 

Hi there, on my router/modem I cannot change the DNS entries, so just using AdGuard/Pi-hole for DNS-based ad blocking doesn't work. Would a separate router circumvent this problem? Could I set up AdGuard (or Pi-hole) on a Raspberry Pi and use it as the DNS server for my home network?

The plan would be to use my ISP-provided router just as a modem to connect to the internet, then use a second router to provide my home network, where AdGuard/Pi-hole can do their thing.

Would this setup work and how would I need to configure it?
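Once such a box is up, a quick sanity check could look like this (a sketch using the dnspython package; 192.168.1.2 is a placeholder for the Pi's address, and depending on the blocking mode the blocked answer may come back as 0.0.0.0 instead of NXDOMAIN):

```python
import dns.resolver  # pip install dnspython

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["192.168.1.2"]  # placeholder: the Raspberry Pi's address

for name in ["example.com", "doubleclick.net"]:
    try:
        answers = resolver.resolve(name, "A")
        print(name, "->", [a.to_text() for a in answers])
    except dns.resolver.NXDOMAIN:
        print(name, "-> blocked (NXDOMAIN)")
```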

[–] Shiimiish@lm.ainyataovi.net 5 points 1 year ago (1 children)

When I started, I just copied from online galleries like Civitai or Leonardo.ai, which gave me noticeably better images than what I had come up with myself before. However, it seemed to me that many of those images may themselves just have used copied prompts without understanding what's really going on with them, so I started to experiment for myself.

What I do right now is build my images "from the ground up", starting with super basic prompts like "a house on a lake" and working from there: first adding descriptions to get the image composition right, then working in the style I'm looking for (photography, digital artwork, cartoon, 3D render, ...), then working in enhancers and seeing what they change. I've found that one has to be patient, change only one thing at a time, and always generate a couple of images (at least a batch of 8) to see if and how things change (roughly the loop sketched below).
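Written out as a diffusers sketch (I work in A1111 myself, so the model ID and prompts here are only placeholders): fix the seeds, render the same batch with and without the one change, and compare.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

base_prompt = "a house on a lake"
variant = base_prompt + ", golden hour, volumetric light"  # the one change under test

# Same 8 seeds for both prompts, so any difference comes from the prompt edit alone.
seeds = list(range(8))
for label, prompt in (("base", base_prompt), ("variant", variant)):
    generators = [torch.Generator("cuda").manual_seed(s) for s in seeds]
    images = pipe([prompt] * len(seeds), generator=generators,
                  num_inference_steps=25).images
    for seed, img in zip(seeds, images):
        img.save(f"{label}_{seed}.png")
```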

So, I still comb through image galleries for inspiration in prompting, but now I will most of the time just pick one keyword or enhancer and see what it does to my own images.

It is a long process that requires many iterations, but I find it really enjoyable.

[–] Shiimiish@lm.ainyataovi.net 1 points 1 year ago (1 children)

I just figured out that I can drag any of my images made with A1111 into the UI and it will set up the corresponding workflow automatically. I was under the impression that this would only work for images created with ComfyUI in the first place. This gives great starting points to work with. I will play around with it tonight and see if I can extract upscaling and ControlNet workflows from existing images as starting points.
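For anyone curious what ComfyUI is actually picking up: a quick sketch to dump the text chunks of such a PNG (A1111 writes its settings into a "parameters" chunk, ComfyUI its own "workflow"/"prompt" JSON; the filename is a placeholder):

```python
from PIL import Image

img = Image.open("my_a1111_render.png")  # placeholder filename
for key, value in img.text.items():      # PNG text chunks (e.g. "parameters")
    print(f"--- {key} ---")
    print(value[:500])
```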

[–] Shiimiish@lm.ainyataovi.net 1 points 1 year ago (3 children)

Do you happen to have a ComfyUI tutorial at hand that you can link and that goes into some detail? These custom workflows sound intriguing, but I'm not really sure where to start.

 

Hi there, I've seen a few videos on YouTube showing it off, and it looks incredibly powerful for fine-tuning the outputs of SD. It also looks dauntingly complicated to learn to use effectively.

For those of you who have played around with it: do you think it gives better results than A1111? Is it indeed better for fine-tuning? How steep was the learning curve for you?

I'm trying to figure out whether I want to put in the hours to learn how to use it. If it improves my ability to get exactly the images I want, I'll go for it. If it does what A1111 does, just dressed up differently, I'll sit it out :)

[–] Shiimiish@lm.ainyataovi.net 2 points 1 year ago

Please do. I'm thinking of starting to make LoRAs as well, and the tool looks like it would make the process much easier. Let me know how it goes for you.

[–] Shiimiish@lm.ainyataovi.net 2 points 1 year ago

I started with the smallest offering available and later upgraded to the second smallest, which now has 4 GB of RAM. I have also rented additional disk space, so I have 30 GB now. RAM and CPU are certainly fine now, but I don't know yet about disk space. I've read that Lemmy/Mastodon can eat up space quickly, and I have currently used up about half of mine.

[–] Shiimiish@lm.ainyataovi.net 1 points 1 year ago (1 children)

You should be able to configure this differently: either switch off the confirmation mails completely or use email credentials from another server.

[–] Shiimiish@lm.ainyataovi.net 2 points 1 year ago (3 children)

I use Synapse as the Matrix server and Element as the client. It doesn't need port 25 (8008 and 8448 are needed in my setup). On Lemmy and Mastodon I configured outgoing mail via SMTP through my existing mail hoster, so I don't send mail from my own server. Also, all the googling I did said to stay away from self-hosting email, as it is a hassle not to get immediately blocked as a spam mail server ...

[–] Shiimiish@lm.ainyataovi.net 3 points 1 year ago

I use Synapse as the Matrix server and Element as the client on desktop and mobile. It does support video calls, but so far I have only tested them for a minute.

 

Hi there, I was intrigued by the idea of self-hosting my social media accounts, but I was more or less a complete noob with all things hosting. However, with the help of the community here (and quite a few hours spent on it), I finally have a working setup! Mastodon, Matrix, Lemmy and Nextcloud, all self-hosted behind Nginx Proxy Manager.

Google can find a lot of answers, but sometimes some really specific input is needed - which you guys have provided over the last couple of weeks - so I just wanna say thank you for that!

 

So, I think I've ironed out a lot of things to get my self-hosted setup running, but it seems that Nginx Proxy Manager is causing me trouble. When I restart my server, the container with NPM restarts as expected, but I can't log into the web UI (the website comes up, but when I try to log in nothing happens) and it also doesn't provide the expected proxy functionality. I'm not sure what's happening - any advice would be welcome. Right now my only workaround is to delete the container and recreate it from scratch, but that also means recreating all proxy hosts + certificates from scratch as well ...

 

I finally managed to self-host Lemmy and Matrix; now it is time to also get a self-hosted Mastodon instance up. A few questions before I start:

I did some research into the topic, and it seems that Mastodon doesn't like running behind an existing reverse proxy and quite a few tweaks are necessary to get it working - can someone confirm this? Or is it easy to set up?

I'm currently leaning towards running it on a dedicated VPS (due to the issue above and also because it seems to need quite a bit of disk space) - this opens up the option of a non-Docker installation following the official install path. Do you think this will make it easier to keep up with new releases in the future?

If going with a Docker install, there seem to be quite a few problems with updating (at least, a lot of threads discussing failed update procedures came up when I googled "mastodon docker update") - can someone confirm? Are there easy-to-follow guides for a Docker-based update routine?

Right now it seems the easiest option would be to run it on a dedicated server, follow the native installation procedure, and use the provided templates for nginx, certbot, ... Thoughts?
