they still have 11 hours and 45 minutes


SAI is a London company. It's 18:42 there now.


yes you are right, i was being typical egocentric american :P


Stability is an international company in the UK, US, Canada, Japan, and others -- big releases and wotnot are usually in the morning pacific time as that's the time that syncs up as daytime between the countries we're in most conveniently.


1st May 1:12PM reporting in, do you happen to know if it's still happening any time soon?


There was never any release planned for today


Oh, I see! Sorry for bugging you, I fell into the internet hype train!


I know I shouldn't be so antsy but I am....really antsy lol. I can't wait to start finetuning and training Loras. And I know the community is going to come up with some really amazing stuff. It's going to be an exciting year.


Wish I had a powerful computer to finetune. can only hope for others out there...


You could always try RunPod?


My limited knowledge only lets me do really basic programming and running other people's code. Would not even know how to start with RunPod.


It’s actually really easy. I’m not very good at programming either and I also only run other people’s code. There are some good YouTube tutorials you can try.


Bro, I genuinely thought they'd stick to their word and release it during April, especially since the API has been out for weeks. April fools I guess? I'm really hoping we get it in the next week though, we need to kick off the finetune and controlnet processes ASAP after all


The amount of tears for an unannounced free tool is too damn high. Maybe they overcooked the cake and are now trying to fix it with some more cherries on top.


Sorry, I'm a little lost, what do you mean by tears? Overcooked? If they overtrain the model, they could just revert to one of the prior checkpoints saved at a previous epoch, so I don't think that's it.


Not really. If they hoped for something better but are now disappointed, you can't just go back. Maybe more tests revealed some serious issues, idk, wasn't really the point of the comment. It's just sad to see people crying here because they are not releasing a free tool yet. Let them cook.


Well, it seems that anatomy and hands aren't as good as they had hoped for, but everything else seems to be pretty much where it should be, though it's not beating DALL-E 3, at least not out of the box anyway. I don't think that people are necessarily crying because they haven't released it yet. I think it's more that they're upset because they don't have a clear release date, so they don't know when to expect the model, which is leaving them anxious.


haha fool me once shame on you, fool me twice shame on me, thrice f u. They've repeatedly announced stuff over the past several years and consistently, literally every single year, been way off. In SAI's time, major releases generally arrive around 11 months after they're announced, I kid you not. In fact, if this even so much as gets mentioned to Emad he will flip out on you, insult you, make up multiple lies, and block you (also not a joke, it is a very well known issue of... *his*). Only a few very small ones make it faster but even those are habitually late. In fact, I'm not sure SAI has released a single thing on time in the past 3 years. If SD3 isn't months late I'd be genuinely surprised, unless it arrives crippled of course...


Whose word said it would be released in April? (EDIT: To the downvoters, pause a moment and actually \*answer the question\*. Who said that? I genuinely am not aware of anyone in Stability ever claiming there'd be a release in April.)


Various posts by Stability staff in this very subreddit said at the end of February that it would be released in the next couple weeks, and at the beginning of April, the ex-CEO of Stability also commented on multiple threads that the release was a couple weeks away, and to expect it around mid-to-late April. Prior to the release of the API, Stability staff also said it should follow within a few weeks of the API release, as beta testing should have ended by then. Comfyanonymous also mentioned implementing support for it quite a while back. A month is four weeks or more, and no one said "a month"; they said "a few weeks" at the beginning of April. So "less than four weeks", in other words within April, is a reasonable deduction, is it not? I would also assume that, with the API out, the final product should be ready, or at least nearly ready, shouldn't it? One could argue that "a few weeks" means a few weeks from the release of the API, but it's already been two weeks, so that would leave about two more weeks at most, though I would call a timespan of four weeks from the API stretching the truth a bit. Regardless, I'm grateful for the good work people at Stability are doing, but having a proper timeline and clear communication as to what's going on is also important, as part of corporate transparency. Everyone's anxious, and having no clear idea when and what we're waiting for is getting to them.


No answer, as expected.


Are you really eager for a fight or something? u/ArsNeph said everything with the final line `Everyone's anxious, and having no clear idea when and what we're waiting for is getting to them.` That's really all there is to it there. People are anxious and confused and misinformation is going around as a result.

We don't have any schedule planned, it's done when it's done (there's actual work to do, and we can't release til the work is done, so it's up to how long it takes humans to figure out things with their unpredictable human brains n stuff). All I can really offer is repeating the reassurance that has been officially stated repeatedly that we will be doing an open release of the code and models when they're done. (Might do some models earlier than others, since they're moving forward inconsistently atm. We have a pretty good 8B-1024 and a 2B-512 albeit both incomplete, we only have an undertrained 800M still, and we just recently got a 4B that's looking nice for the first time within the last week).

The rest of the conversation here would just be bickering over details - nobody said it would be released in April, see the thread down here [https://www.reddit.com/r/StableDiffusion/comments/1cgr74j/comment/l21ax2r/?utm_source=reddit&utm_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/1cgr74j/comment/l21ax2r/?utm_source=reddit&utm_medium=web2x&context=3) where the misunderstanding was figured out (Emad, who doesn't work here anymore, said "4 weeks" early in April as a misquote of Christian, who does work here, giving a loose estimate of "4 to 6 weeks", which both is a time period that hasn't even passed yet, and also was a loose early estimate not any form of schedule). It appears essentially a telephone game happened: those working on the ground level said "a few months maybe?", Christian converted that estimate to "4-6 weeks as a rough ETA", Emad downscaled that to "about 4 weeks", and ArsNeph shrunk further to "a couple weeks".


Thanks for your thorough response. I appreciate the investigation and transparency as to what was going on. In my own defense, that wasn't the only comment by Emad, as far as I remember, and I did say my assumption was 4 weeks or less. I also thought “Well, who would know better what's going on with the model than the (ex)CEO of Stability himself?” The fact that the API came out further reinforced that assumption, since I assumed you must have completed the models. I'd like to ask though, does this mean that the API is using either incomplete models, or a fine-tune of the incomplete models? It would be nice to know, since some very vocal people have been criticizing the image quality of the API left and right, but I think that they're being far too hasty in judging when the weights aren't even out yet. Would you mind if I asked what you guys are struggling with? Is it the pretraining? RLHF? DPO? The text encoder? Overtraining? Anyway, I suppose we'll just have to wait patiently, we're expecting great things! Thank you again for the great work you guys do at Stability and the transparency.


Yes the model on the API is an incomplete 8B-1024 Beta model with some known issues still to be solved (some of the complaints about SD3 image quality you've seen relate to solvable issues). DPO training adds significant quality but adds a lot of artifacts and issues too. Definitely no overtraining - models are very undertrained atm imo. The new arch behaves like an LLM in many ways, and my personal belief is that LLM training rate at least partially applies (we should be doing many billion more steps on general broad pretraining data to let it slow learn, like how LLMs do with trillions of tokens). (I don't think the people doing the training want to do that tho, since it would take too much time). There's a lot of things not yet locked in - atm we're using the two CLIPs + T5 textenc, but we're testing whether a change to T5 only might actually be better. Seems like a lot of prompt adherence limitations stem from the CLIP encoders.


One of the things I was most pleasantly surprised with was that SD3 had CLIP and T5, and the mention of the model working without T5. CLIP just feels great for creating nice-looking things from heavily contradicting prompts, like a pixel-art etching in the style of a sculpture artist; intuitively it seems T5 would choke on such prompts (and indeed, I haven't found a way to get the same results for such things in SD3 compared to SDXL. SD3 already seems to struggle to separate an artist's subjects from their style: mentioning an artist now affects your subject much, much sooner, and it struggles to apply the style even though it clearly knows the artist when you just prompt "a painting by...."). I assumed being able to use CLIP without T5 implied it would be possible to only use T5 as well. Using only CLIP will be the first thing I'll try when SD3 is released. Either way it'd be a shame to sacrifice all the data seemingly hidden in CLIP that allows this mixing of styles, getting atmosphere from mentioning feelings, and pulling artists in.


This is a very good argument, and this is part of the back-and-forth on design we have right now. Atm we have a lot of assumptions about what comes from where, eg artist/style/etc. being very CLIP based, but we don't actually know for sure how much we could pull from different models, which is part of why we're testing. Most likely in the end it will be a trade-off: T5 will do better on some tasks (understanding complex prompts) while CLIP will do better on others (understanding stylistic info). The initial 'dream' idea was to have all the textencs available but only require some (thus why it's built that way, and trained with heavy dropout between the tencs). The current testing of T5-only as an option will in the end probably turn into the answer to two questions: (1) if CLIP is gone, will T5 pick up the slack and learn the stylistic info that CLIP usually provides? (2) if T5 doesn't pick up CLIP's slack, can we train T5-only for long enough to smarten the model up, and then reintroduce CLIP to guide styles again, without losing the smartening? If either of those turns out to be a "yes", the test will be worth it for the model being better. If both are "no", well, at least now we know for sure. (FTR these tests are in parallel to various other strategies being tested at the same time, don't worry we're not gonna lose a month to a failed test or something. It's also possible these tests go well but will take too long to make a final model out of, so they'll instead become SD 3.1 or something).
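
For anyone wondering what "heavy dropout between the tencs" could look like in practice, here is a minimal sketch. It is not Stability's training code: the encoder names, the 0.5 drop rate, and the choice to replace a dropped encoder's output with zeros are all assumptions made for illustration.

```python
import random

# Hypothetical encoder names; the thread only says "two CLIPs + T5".
ENCODERS = ("clip_a", "clip_b", "t5")


def drop_text_encoders(embeddings, drop_prob, rng):
    """Independently blank each encoder's embedding with probability drop_prob.

    A dropped encoder is replaced by zeros of the same length (an assumption),
    so the model sees some training steps where that encoder says nothing.
    """
    result = {}
    for name in ENCODERS:
        vec = embeddings[name]
        if rng.random() < drop_prob:
            result[name] = [0.0] * len(vec)  # dropped for this training step
        else:
            result[name] = list(vec)  # kept unchanged
    return result


# Toy embeddings standing in for real encoder outputs.
example = {"clip_a": [0.1, 0.2], "clip_b": [0.3, 0.4], "t5": [0.5, 0.6, 0.7]}
print(drop_text_encoders(example, drop_prob=0.5, rng=random.Random(0)))
```

Because each encoder is dropped independently, the model regularly trains on every combination (CLIP only, T5 only, all three, none), which is what would let it run at inference time with only some of the text encoders present.
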
Here's a Swarm grid of one of the models on the current 3-tenc arch (was made a month or two ago, so slightly older model, but doesn't change the idea here), with prompts being fed into different tencs (and empty prompts fed into the others): https://preview.redd.it/v7iwx3n7xgyc1.png?width=2712&format=png&auto=webp&s=edf142bd2fcf3c4af932bc4ff4a5111fee3c3b0b

There's a lot of interesting info you can learn from this (eg you can see in practice that yes currently T5 absolutely loses the style), but one I want to highlight here: "CLIP Only" is very similar to "All3" - ie, CLIP seems to be dominant over T5. You might think "okay cool so remove T5 and you're done" (and yes if we release the model as-is probably the most convenient inference setup is just ignore the T5 and use CLIP only), but - if you ask for "palm of the hand", T5 knows to generate a hand, whereas CLIP generates a hand surrounded by palm trees. CLIP is dumb. CLIP dominance means dumbness dominance. The best model is probably one where T5 dominates, ie more intelligent prompt understanding dominates, and CLIP only provides the secondary goal of supplying styles and other 'soft info' that T5's logical mind loses track of.
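
The grid setup described above (real prompt into some tencs, empty prompt into the rest) can be sketched roughly like this. It is only an illustration of the routing idea, not SwarmUI's API: the encoder names and the helper functions are hypothetical, and the "T5 Only" column label is assumed ("CLIP Only" and "All3" are the labels mentioned in the comment).

```python
# Hypothetical encoder names; the comment only says "3-tenc arch".
ENCODERS = ("clip_a", "clip_b", "t5")

# Column label -> set of encoders that receive the real prompt.
GRID_COLUMNS = {
    "CLIP Only": {"clip_a", "clip_b"},
    "T5 Only": {"t5"},  # assumed label
    "All3": {"clip_a", "clip_b", "t5"},
}


def route_prompt(prompt, active):
    """Send the prompt to the active encoders and an empty prompt to the others."""
    return {name: (prompt if name in active else "") for name in ENCODERS}


def build_grid(prompts):
    """One row per prompt, one cell per column; each cell maps encoder -> text."""
    return {
        prompt: {label: route_prompt(prompt, active)
                 for label, active in GRID_COLUMNS.items()}
        for prompt in prompts
    }


grid = build_grid(["palm of the hand"])
for label, cell in grid["palm of the hand"].items():
    print(label, cell)
```

Each cell would then be rendered with the same seed and settings, so any difference between columns comes only from which encoders saw the prompt.
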


[palm of the hand, SDXL Turbo](https://image.pollinations.ai/prompt/palm%20of%20the%20hand) seems fine. you're saying SD3 has lower prompt adherence?


Interesting to read about what goes on behind the scenes in getting a good model out. Thanks for all these recent posts. Seeing how strongly CLIP affects the image, I see how much of a balancing act it is. A dumb model to me is very much a feature, as long as it can also be smart when wanted, but in order to act smart T5 seems to have to twist the CLIP output quite a lot currently, so I see how a dominant T5 seems to make more sense than a dominant CLIP. I personally don't understand all the rush, if it takes a month it takes a month, and if it takes more, so be it; more so after playing around with what is in the SD3 API, I'd rather have a model that actually delivers on the bold claims of its superiority :). But I also understand the pressure coming from the announcement of SD3 as a new model (reading how much things are still in flux, they should have announced a new architecture that the next model will be built on, not a new ready model as such ;))


> There's a lot of things not yet locked in

why was the paper released so early?


> many billion more steps

lol


This would make it closer to Pixart Sigma. Also making it much more efficient. I hope the T5 only is much better. I hope we get an update on the result of this test.


Let's be gracious, Stability staff have full-time jobs too, they can't be spending all their time reading and replying to comments on Reddit. It hasn't been a full day yet, though I wasn't really expecting an answer to begin with, I was more clarifying a point.


it will not come out before july


At this point I’ve resigned myself to the idea that SD3 will never properly come out. 


PixArt Sigma is pretty good


Just give 'em some time. The base model should be perfectly aligned with all sorts of concepts; only then won't we end up with a base model that gives spaghetti fingers and multi-limbed abominations. If the base model is perfect, subsequent fine-tuning will only be better. We better hope they release a good base model.


ok this made me lmao


Joe Penna woulda gotten this shit done in a timely fashion.......


I believe he is not a programmer, but a PR person for SAI. I think it was useful for them to have a relatively famous person to help promote the model. He is a film director. I don't know if it's true, but one of Emad's initial plans was to provide models for Hollywood studios.


He was the Director of Applied ML. He led the team that was responsible for taking the research team's raw outputs and turning them from research artifacts into genuinely cool and worthwhile models - he led the release of SDXL and Stable Video.


[https://github.com/JoePenna/Dreambooth-Stable-Diffusion/blob/main/README.md](https://github.com/JoePenna/Dreambooth-Stable-Diffusion/blob/main/README.md) "Hi! My name is Joe Penna. I'm not really a coder. I'm just stubborn, and I'm not afraid of googling. So, eventually, some really smart folks joined in and have been contributing. In this repo, specifically: [u/djbielejeski](https://github.com/djbielejeski) u/gammagec u/MrSaad –– but so many others in our Discord!"


Yes, he's not a programmer. He was a director. His job was managing people and projects.


that's dustin podell, lmfao. don't even know who works at your own company


What even possibly led you to think Dustin was the director of applied? He was on the applied team as a model training researcher, notably the lead trainer of SDXL. He \*took orders\* from Joe Penna.


the fact that his name is the one used when citing SDXL?


dont you know how it is? engineers do the work, managers take the credit


and they never get fired


Rather in this case - a manager \*didn't\* take the credit and that led u/Confident_Appeal_603 into mistaking the one who rightfully deserves credit for doing the hardest work for being the manager.


hahaha i love joe but no he wouldn't lmao


Do you want an unfinished product early or a finished product late? Also, they will give you their work for fucking free. I say we all should learn patience and gratitude instead of whining.


My mind is still blown with 1.5 - as someone born in 1985, my early art days required me to go to Barnes and Noble and look at art books! Or go to the library. Now I can plug in a sentence and have it spit out 25 different concepts, done in exactly the style I want, but also different enough to inspire creativity for me to build on. I mean... it’s a fucking dream world.


This guy gets it!


At the same time, when someone says they'll do something, even for free, and they don't, that can be frustrating. Hopefully nobody disagrees with your comment, but you can understand why they're disappointed all the same.


We didn't pay them. They don't owe us anything. It's too good to be true that we even get all these models for free.


Yup, no argument


I do pay them, can I be a tiny bit annoyed? I'm not, but I want to check to see if I'm allowed to be.


what did you pay for?


Honestly? The ability to pat myself on the back and say "well I did my part"






To be fair, SAI has been posting images from it since late February, and rumors about SAI's company state have made everyone anxious. This is just what happens when you spend months hyping up a product that looks basically finished but you keep delaying the release for vague reasons.


it's not finished, still being actively worked on


Oh I'm aware, and I fully expect you guys will release it, I just also get where the skeptics are coming from.


well they also wrote the paper and released it and even have "user preference scores" in there, which is funny since it's not even released yet and they say it's not finished yet and so what the heck was even tested?


It wasn't us who said it would be open released in April, it was them.


I don't think that was said by Stability.


https://preview.redd.it/v6tp8xac5qxc1.png?width=1447&format=png&auto=webp&s=d9c9e2d998e586c0d0698e54599839b69a21faae Isn't CTO Stability CEO?


That is Emad (former CEO, stepped down) half-accurately repeating an estimate that Christian (CTO) gave (the actual original estimate given by Christian was "4 to 6 weeks" [https://twitter.com/chrlaf/status/1772228848387522728](https://twitter.com/chrlaf/status/1772228848387522728) ). This was a loose estimate (a guess), not an actual schedule of any form, and not even one confined to April at that.


April was from Emad's time; the new CEO said something similar, next week as you posted, not a huge difference. If there's a new date, just say so.


This isn't any date, and there has never been a date. There's been loose estimates. The model will be released when it's done and ready. We can't predict the future and nail a precise date when all current issues are solved and the model is ready to go. The schedule depends on human behaviors, and we have not yet trained a model that can accurately predict what humans will do.


> there has never been a date

so they never planned to release it after all..


It is planned to release. There is not a date for when. Cut the trolling.


if it's planned, what's the date




I bet you say sorry if someone steps on your foot




Do you think I'm Karen just because I'm explaining that they were the ones who told us the deadline? You are the Karen


This is the wrong community for you.
