Learnings from 4 months of Image-Video VAE experiments
48 points - yesterday at 6:59 PM
asaiacai today at 11:20 PM
It's cool to see the iterative improvements to your model laid out, but for everything that worked, I imagine there were at least a million other things you tried that didn't work out. What's your process for trying these different techniques and architectures? Do you just wait for one experiment to finish and visually inspect the results every time? That seems hard, since these take a while to train. How do you shorten the feedback loop in this space?
greatgib today at 11:13 PM
Very nice, well-written article!
The kind that I like so much on HN. It tickles your mind but is still clear enough for an advanced beginner.
schopra909 yesterday at 7:00 PM
Hi HN, I'm one of the two authors of the post and the Linum v2 text-to-video model (https://news.ycombinator.com/item?id=46721488). We're releasing our Image-Video VAE (open weights) and a deep dive on how we built it. Happy to answer questions about the work!
DonThomasitos today at 10:47 PM
Nice summary! I missed the mention of EQ-VAE when it comes to generation quality. Tiny trick, huge impact! Have you tried it?
lastdong today at 10:13 PM
This seems like a great model for experimenting with fine-tuning on original art, given it's relatively small and has an open license. Is that a fair assessment?
Thanks for the great write-up and for making it available to us all.