We’ve all seen the machine-generated images. If you’ve spent much time generating these yourself, you might have found that the image you wanted was swallowed by some nearby concept.
I recently wanted a horse pulling a cart, but I kept getting horses that were carts.
Even in the image I finally chose, there’s a bit of confusion between the horse’s rear legs and the cart.
The latent space[1] is very large, but it is swiss-cheesed with these attractors. If the attractor is exactly what you want, you are in a post-scarcity dreamworld. If the attractor is juuuuuust beside what you want, you’re Tantalus in hell.
The melding of a horse with its cart, like the infamous mangled fingers, is a misunderstanding which will improve with time. But not all of the attractors are misunderstandings.
I recently launched a new open source library for testing software. The library allows software to take a “self portrait”, and you can use that portrait as a baseline against which you can measure future changes. Libraries like this already exist, but mine incorporated a novel “camera and lens” concept which helps developers make the portraits more detailed and expressive.
I started off by asking for “robot taking a selfie with a DSLR”, and quickly found two attractor basins. One basin was “take a picture of itself” means “selfie”, and “selfie” means “phone”. The second basin was “DSLRs and robots have the same texture”. Every image either ignored the DSLR part of the prompt, or it morphed the entire robot into a DSLR.
I did not like the implications of a “selfie”. This was a library for rigorous technical software testing; selfies were way too informal. After burning many GPU hours, I stumbled into this image:
It stopped me in my tracks. There was so much inviting whitespace, it practically designed itself.
If I abandoned the point I was aiming for and surrendered to the groove, the bounty was endless.
I could go on - these are not the result of a tasteful selection process. Midjourney drops these like a toddler with a trash bag of M&Ms. The “robot selfie” was so easy for the models that we could easily layer other elements and situations. To contrast with the sad “horse and buggy”, we could put the robot in a car:
or the robot could drop its phone mid-selfie for the 404 page.
I had a name picked out already; it’s too embarrassing to say, but if you search the git history you might find it. When our team realized that somehow selfie.dev was unclaimed, we had no choice - “selfie” it was.
The really strange part is that the library’s features started to change too. We had a feature where every portrait went through something we called a “pipeline”. You could put things into the pipeline to distill a specific facet of the portrait, or compress existing facets into a brief summary. The API was pretty involved, and it felt very much like a “DSLR” feature, not a “selfie” feature.
Now pause for a minute - why was it so easy for the model to draw phone selfies, but hard to draw DSLR selfies? It’s because, in the real world, it is much easier to take a selfie with a phone than with a DSLR. You can even go so far as to say that phones are better than DSLRs because they have fewer, simpler controls.
So we asked ourselves - could we? The answer was yes[2], we could accomplish the same things with fewer, simpler controls. And I swear to god we would not have done that if it weren’t named selfie, and we would never have named our library selfie without the latent space groove we fell into.
I have a narcissistic hoarding compulsion to trademark robots taking selfies. “This thing I have is so good, everyone is going to want it for themselves! And it’s so easy, they’re all going to find it! No! NOO! The robot selfies are all mine, there’s not enough for you to have any!”
But the actual problem is, there are enough! I think that the software artifact we built really is very good, and I don’t want people to confuse it with the other libraries in the market. The medium is the message, and trademarks create the jungle in which the brands roam - take away trademarks and the jungle creatures all ooze into each other.[3]
Twelve years ago I quit a great job and liquidated my life savings to start a company, and this was the artwork I put on the landing page:
I am not good at this. But with AI, I’m much better![4] This landing page looks pretty cool, right?
I’ve actually been working on three different branding efforts. Selfie was the first one to launch. And the other two I have to scrap because they both got swiped! Stolen! And by stolen, I mean I was scrolling Twitter and I saw that someone else had found the exact same groove in which I had been planning on building my brand. Two out of three!
There’s a new domain registry opening up, but this time there’s no registrar.
[1] If you’re not familiar with latent space, the best explanation I’ve seen is this paragraph from the Subconscious substack (click through for a great illustration too).
Latent space is like a map you can use to correlate things with other things, along many dimensions… You can scrub through this latent space to discover all kinds of weird and wonderful interpolations between characteristics. For example, you can generate new chairs by scrubbing through the latent space between chairs.
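If “scrubbing between chairs” feels abstract, here is a toy sketch of the idea in Python. It assumes a latent point is just a short list of numbers and that scrubbing means walking a straight line between two of them. Real image models use far larger latents and often a different interpolation scheme, and the names `lerp`, `scrub`, `chair_a`, and `chair_b` are made up for illustration.

```python
from typing import List


def lerp(a: List[float], b: List[float], t: float) -> List[float]:
    """Linearly interpolate between two latent vectors.

    t = 0.0 returns a, t = 1.0 returns b, values in between blend the two.
    """
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same dimension")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def scrub(a: List[float], b: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points on the line from a to b, inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]


# Two hypothetical "chairs" in a made-up 3-dimensional latent space.
chair_a = [0.0, 1.0, -2.0]
chair_b = [4.0, 1.0, 2.0]

# Five points along the way; each one would decode to an in-between chair.
path = scrub(chair_a, chair_b, 5)
for point in path:
    print(point)
```

In a real model each point on the path would be handed to a decoder to produce an image. An attractor, in this picture, is a region where many nearby points all decode to roughly the same thing.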
[2] The most notable example of this is the Palworld “Pokémon with guns” phenomenon.
[3] noticed and explored this trend beautifully in Fear of Oozification and Oozy Intelligence in Slow Time.
[4] Nick St. Pierre has an excellent image prompting course which I highly recommend; it taught me everything I know on this topic.
>This landing page looks pretty cool, right?
Looks great!
I've been thinking about this for VectorsOfMind as well. Named after one of my favorite papers in psychology. Then fell down a snake venom rabbit hole where the name kind of still works. Or the Eve _Theory_ of Consciousness. I didn't think that hard about the name. TBH I may not have spent so much time developing it had I posted the original as the Eve Hypothesis of Consciousness.
EToC is quite a good acronym. Which psych paper does VectorsOfMind reference? It is quite convenient for you that “string of numbers” and “mechanism of spread” happen to be overloaded onto the same cool-sounding word.