Learnings from 4 months of Image-Video VAE experiments

48 points - yesterday at 6:59 PM

Comments

asaiacai today at 11:20 PM

it's cool to see the iterative improvements to your model laid out, but for everything that worked, i imagine there were at least a million other things you tried that didn't work out. what's your process for trying these different techniques/architectures? do you just wait for one experiment to finish and visually inspect the results every time? seems hard since these take a while to train. how do you shorten the feedback loop in this space?

greatgib today at 11:13 PM

Very nice, well-written article!

The kind that I like so much on HN. It tickles your mind but is still clear enough for an advanced beginner.

schopra909 yesterday at 7:00 PM

Hi HN, I’m one of the two authors of the post and the Linum v2 text-to-video model (https://news.ycombinator.com/item?id=46721488). We're releasing our Image-Video VAE (open weights) and a deep dive on how we built it. Happy to answer questions about the work!

DonThomasitos today at 10:47 PM

Nice summary! I missed the mention of EQ-VAE when it comes to generation quality. Tiny trick, huge impact! Have you tried it?

lastdong today at 10:13 PM

This seems like a great model for experimenting with fine-tuning on original art, given that it's relatively small and has an open license. Is that a fair assessment?

Thanks for the great write-up and for making it available to us all.
