The Lemon Fool

Posted: **March 17th, 2023, 6:45 pm**

Talking of "it's not photography", I thought it was maybe time for a thread on the AI image generators. There's three well-known ones :

Stable Diffusion (Wiki)
Came out of a German university so has a somewhat different ethos to the others - it's open source, you can download it to run locally and train it on your own data, and the online one is rather less restricted. There's no login or anything, you can have unlimited goes for free, just go to https://stablediffusionweb.com/#demo and start entering prompts. It's a good way to explore but the results aren't that great IME - people look like freaks and there's a lot of artefacts. To give you an idea I put "Jester juggling three 4cm lemons" in and got the first pic back which completely ignored the jester requirement and just gave me a random modern photo of someone juggling, I had to add "Medieval" to get the second.

DALL-E (Wiki)
DALL-E was created by OpenAI of ChatGPT fame, and uses versions of GPT, presumably the current DALL-E 2 used GPT 3.5 as it was released last summer. Since OpenAI are in bed with Microsoft, this is what is used in Microsoft's image stuff for Bing, Office etc, they also have an API. Model is that you have to register at https://labs.openai.com , and each request consumes a credit. You get 50 free credits in your first month, and 15 news ones each month after that, plus you can buy more. You can also upload images for adjustment. The same prompt generated this from DALL-E, which is better and at least it can count to three but adding medieval didn't really help.

Midjourney (Wiki)
This is another private company that one feels like to push copyright rules further than others, but is currently delivering the best results. They seem to charge by computer time so the free trial gives you around 20-25 attempts, then you're onto three levels of subscription of $8/24/48 per month paid yearly or $10/30/60 paid monthly, so $10 buys you one month with ~200 attempts, the higher levels give you unlimited slow attempts. It's a slightly weird setup as you don't go to their website https://www.midjourney.com/app/ once you're set up, instead you go to their channel on the Discord messaging app and enter the /imagine command followed by your prompt text, once it's done it will offer you the option to Upscale or do Variations on one of the four images it returns. Supposedly a more conventional webpage for entering prompts is under development.
Version 5 released this week seems to be a major upgrade - it's not perfect but where there's lots of training material it does remarkably well - things like portraits, sci-fi, interiors, "general" logos (but it's not great with words on logos) and very famous people (Tom Cruise is fine, Boris Johnson doesn't work. As in Midjourney, so in life.) As it happens jesters with lemons are something it doesn't cope well with, although the overall vibe is great - the first two I asked for holding a lemon but it seems to have gone for an apple, and it can't count to three on the last one, plus there's some weird things going on with the left hand. I'm sure it's fixable with better prompts, I just had other things I wanted to spend my free trial on; looking at other people in the newbie threads, those kinds of major errors are reasonably rare.

But with better prompts and more mainstream subjects, Midjourney can give stunning results. At least it gives normal eyes and teeth, and most of the time 5 fingers on a hand, although toes can be more iffy, skintones are a little too "plastic" and there's usually a misplaced shadow or two. Just generally these generators are great for random mashups of unrelated things, like Muppets in an 8 Mile-style rap battle. Once you're on the trial you can see their showcase - it seems to have a real thing for gorgeous redheads, and I look forward to hearing Joe Biden's debut rap album, based on its cover.

A lot of the showcase images use "dot matrix" in their prompts, more generally naming a particular camera film is helpful, a camera less so, although previously it helped to explicitly specify high definition, 4k, 8k etc this new version already aims for high definition so it's less relevant. "--stylize 1000" seems to help, also eg "--ar 16:9" changes the aspect ratio. This thread shows how to retain the same "character" over a number of images, and demonstrates it by animating the same character getting older. I've also seen an example of a workflow that starts with an AI image from Midjourney, then animates it in 3D and gives it a voice, all automagically.

As the jester proves - even v5 of Midjourney is far from perfect, but it's light years ahead of Stable Diffusion, which only came out a year ago. You can see why the artists who have unwittingly supplied it with learning material are so upset, along with the likes of Getty as it's great for stock photography. You can only imagine where this is going in future, if the current state of the art can imagine the following out of "nowhere".

kodak disposable camera photo of a group of elderly women shopping for colorful spices at a crowded Indian street market, golden hour --ar 16:9 --stylize 0

overhead food photography, Bibimbap, Korean restaurant --ar 16:9

Lifestyle style photo of a tuscan woman with deep sparkling green eyes and a devious smile, sitting on park bench, paisley shawl over orange pantsuit, natural evening lighting, shot on Agfa Vista 400, 4k --ar 16:9 --stylize 1000

Editorial Style Photo, Eye Level, Modern, Living Room, Fireplace, Leather and Wood, Built-in Shelves, Neutral with pops of blue, West Elm, Natural Light, New York City, Afternoon, Cozy, Art Deco --ar 16:9

Editorial style photo, medium closeup shot, off-center, a young, brunette, french woman, sitting at a Marble Table, wearing a black gucci dress and diamond necklace, in an Art Deco Dining Room with Velvet, Brass, and Mirror accents, Jewel Toned color pallete, West Elm, Chandelier, Restaurant, Evening, natural lighting, Fujifilm, Luxurious, Historical, 4k --seed 420 --ar 16:9 --stylize 1000

(no prompt given)

Posted: **March 22nd, 2023, 2:39 am**

Microsoft have now got in on the act :
https://bing.com/create

You have to login to Microsoft and then get 25 "boosted" goes for free, and then you're throttled to one every minute or so depending on how busy it is, and you can earn more boosts via their rewards programme. Initial impressions are that it's not quite as "natural" as Midjourney, but it can give pretty good results, especially if you really tune your prompts. But it's a bit more hit and miss - some of the portraits look very "AI" whereas Midjourney-land is populated by supermodels, Bing is generally fairly intelligent but again struggles with the jester picture (for lack of source material?) and ends up with extra arms and funny hands if you're not careful.

Note that it puts a small "b" watermark bottom left, but at 1028x1028 the images are higher resolution than some. It keeps a gallery of your last 20 goes, so remember to save the ones you want before they fall off the end. The filtering seems to be somewhat aggressive - the likes of Boris Johnson and Priti Patel are blacklisted, it's fine with slebs from earlier eras that millennials haven't heard of. But as with most of these things, it tends to mangle "known" faces but sometimes you get lucky - and some "slebs" are easier than others, see below!

So here's the same prompt for the woman on the bench - the face on the right is much more artificial, but at least it's understood the clothing requirement, and then various prompts aimed at a jester with lemons that work less or more well :

Posted: **March 22nd, 2023, 3:21 pm**

Also yesterday, Adobe launched the beta of Firefly which is their take on this stuff, with more of an effort to merge it into Photoshop workflows and eg make fonts draped in textures etc :
https://www.adobe.com/sensei/generative-ai/firefly.html

Also careful to say that it's been trained on their own library of stock images and open licence stuff to ensure no legal problems.

Posted: **June 5th, 2023, 9:18 am**

Arguably pedantic, but ..

Hallucigenia wrote:Came out of a German university ... completely ignored the jester requirement and just gave me a random modern photo of someone juggling,

A German (or other non-native English speaking) human would very likely do the same. Even a mother-tongue speaker might not necessarily share the nuances with which you use the word "Jester".

Posted: **June 5th, 2023, 10:56 am**

UncleEbenezer wrote:Arguably pedantic, but ..

Hallucigenia wrote:Came out of a German university ... completely ignored the jester requirement and just gave me a random modern photo of someone juggling,

A German (or other non-native English speaking) human would very likely do the same. Even a mother-tongue speaker might not necessarily share the nuances with which you use the word "Jester".

It's nothing to do with being German - these models are trained on "the internet", so they should default to whatever the internet thinks a word means. And to get an idea of that, the first page of Google results for "jester" are all about the job in medieval courts, and the first definition from Oxford, Cambridge and Collins dictionaries (ooh, I see Google now has a deal with Oxford as "official" dictionary for Google).

Anyway, these models continue to advance, see the Midjourney showcase for an idea of the current state of the art - not perfect, but still pretty impressive.

https://bing.com/create seems to have relaxed its constraints, nominally you get 100 boosted goes for free but that seems to get refreshed every few days and even without the boosts it does it pretty quickly.

The latest feature from Adobe is Generative Fill, which allows you to "extend" an image - so you can imagine album covers or artworks as details in a wider landscape.

The Lemon Fool

Images from AI

Images from AI

Re: Images from AI

Re: Images from AI

Re: Images from AI

Re: Images from AI