Images from AI
Posted: March 17th, 2023, 6:45 pm
Talking of "it's not photography", I thought it was maybe time for a thread on the AI image generators. There are three well-known ones:
Stable Diffusion (Wiki)
Came out of a German university, so it has a somewhat different ethos to the others: it's open source, you can download it to run locally and train it on your own data, and the online version is rather less restricted. There's no login or anything and you can have unlimited goes for free - just go to https://stablediffusionweb.com/#demo and start entering prompts. It's a good way to explore, but the results aren't that great IME - people look like freaks and there are a lot of artefacts. To give you an idea, I put "Jester juggling three 4cm lemons" in and the first pic I got back completely ignored the jester requirement, giving me a random modern photo of someone juggling; I had to add "Medieval" to get the second.
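If you do want to run it locally rather than use the demo page, the usual route is Hugging Face's diffusers library. A minimal sketch, assuming you've done `pip install diffusers transformers torch` and have an NVIDIA GPU with a few GB of VRAM (the model name and step count here are just common defaults, not anything official from this thread):

```python
def generate(prompt: str, out_path: str = "out.png", steps: int = 30):
    """Generate one image from a text prompt and save it to disk.

    Imports are deferred so the function can be defined without the
    (large) libraries installed; they're only needed when you call it.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    # Downloads a few GB of weights on first run, then caches them.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")  # needs an NVIDIA GPU

    image = pipe(prompt, num_inference_steps=steps).images[0]
    image.save(out_path)
    return out_path

# Usage (not run here): generate("Medieval jester juggling three 4cm lemons")
```

Being able to script it like this, and to fine-tune on your own data, is the big practical difference from the other two.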
DALL-E (Wiki)
DALL-E was created by OpenAI of ChatGPT fame and uses versions of GPT - presumably the current DALL-E 2, released last summer, used a GPT-3-era model. Since OpenAI are in bed with Microsoft, this is what powers Microsoft's image features in Bing, Office etc, and they also offer an API. The model is that you register at https://labs.openai.com and each request consumes a credit: you get 50 free credits in your first month and 15 new ones each month after that, plus you can buy more. You can also upload images for adjustment. The same prompt generated this from DALL-E, which is better - at least it can count to three - but adding "Medieval" didn't really help.
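For the API route, a hedged sketch of what a call to the image-generation endpoint looks like as of early 2023 - the endpoint and payload shape are from OpenAI's docs, but the helper functions and placeholder key are mine, so check the current documentation before relying on it:

```python
def make_image_request(prompt: str, n: int = 1, size: str = "1024x1024") -> dict:
    """Build the JSON payload for a POST to /v1/images/generations."""
    assert size in {"256x256", "512x512", "1024x1024"}  # sizes the API accepts
    return {"prompt": prompt, "n": n, "size": size}

def post_request(payload: dict, api_key: str) -> str:
    """Send the payload and return the URL of the first generated image.

    Requires a valid API key and `pip install requests`; each call
    consumes credits, so it's not run here.
    """
    import requests

    resp = requests.post(
        "https://api.openai.com/v1/images/generations",
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["url"]

payload = make_image_request("Medieval jester juggling three 4cm lemons")
# Usage (not run here): post_request(payload, "YOUR_API_KEY")
```

Each generated image comes back as a short-lived URL you then download, and each request bills against the same credits as the web interface.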
Midjourney (Wiki)
This is another private company, one that seems happy to push copyright boundaries further than the others, but it's currently delivering the best results. They seem to charge by compute time, so the free trial gives you around 20-25 attempts; after that you're onto three levels of subscription at $8/24/48 per month paid yearly, or $10/30/60 paid monthly, so $10 buys you one month with ~200 attempts, and the higher levels give you unlimited slow attempts. It's a slightly weird setup: once you're signed up you don't go to their website at https://www.midjourney.com/app/, instead you go to their channel on the Discord messaging app and enter the /imagine command followed by your prompt text. Once it's done, it will offer you the option to Upscale or do Variations on one of the four images it returns. Supposedly a more conventional webpage for entering prompts is under development.
Version 5, released this week, seems to be a major upgrade - it's not perfect, but where there's lots of training material it does remarkably well: things like portraits, sci-fi, interiors, "general" logos (though it's not great with words on logos) and very famous people (Tom Cruise is fine, Boris Johnson doesn't work. As in Midjourney, so in life.) As it happens, jesters with lemons are something it doesn't cope well with, although the overall vibe is great - in the first two I asked for a jester holding a lemon but it seems to have gone for an apple, and it can't count to three on the last one, plus there are some weird things going on with the left hand. I'm sure it's fixable with better prompts; I just had other things I wanted to spend my free trial on, and looking at other people in the newbie threads, those kinds of major errors are reasonably rare.
But with better prompts and more mainstream subjects, Midjourney can give stunning results. At least it gives normal eyes and teeth and, most of the time, five fingers on a hand, although toes can be more iffy, skin tones are a little too "plastic", and there's usually a misplaced shadow or two. These generators are also great for random mashups of unrelated things, like Muppets in an 8 Mile-style rap battle. Once you're on the trial you can see their showcase - it seems to have a real thing for gorgeous redheads, and I look forward to hearing Joe Biden's debut rap album, based on its cover.
A lot of the showcase images use "dot matrix" in their prompts. More generally, naming a particular camera film is helpful, a camera less so; and while it previously helped to explicitly specify high definition, 4k, 8k etc, this new version already aims for high definition, so that's less relevant. "--stylize 1000" seems to help, and e.g. "--ar 16:9" changes the aspect ratio. This thread shows how to retain the same "character" over a number of images, and demonstrates it by animating the same character getting older. I've also seen an example of a workflow that starts with an AI image from Midjourney, then animates it in 3D and gives it a voice, all automagically.
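The flags all go after the prompt text in the same /imagine line, so if you're generating lots of variations it's handy to assemble them programmatically. A throwaway helper of my own (--ar, --stylize and --seed are real Midjourney parameters; the function is just an illustration, not any official API):

```python
def build_prompt(text: str, ar: str = None, stylize: int = None, seed: int = None) -> str:
    """Append Midjourney-style --flags to a base prompt string."""
    parts = [text.strip()]
    if ar is not None:
        parts.append(f"--ar {ar}")        # aspect ratio, e.g. "16:9"
    if stylize is not None:
        parts.append(f"--stylize {stylize}")  # 0-1000, how "arty" to go
    if seed is not None:
        parts.append(f"--seed {seed}")    # fix the seed for repeatability
    return " ".join(parts)

print(build_prompt("overhead food photography, Bibimbap", ar="16:9", stylize=1000))
# → overhead food photography, Bibimbap --ar 16:9 --stylize 1000
```

Fixing the seed is also the starting point for the "same character across images" trick mentioned above, since it makes runs reproducible.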
As the jester proves, even v5 of Midjourney is far from perfect, but it's light years ahead of Stable Diffusion, which only came out a year ago. You can see why the artists who have unwittingly supplied its learning material are so upset, along with the likes of Getty, as it's great for stock photography. You can only imagine where this is going in future, if the current state of the art can imagine the following out of "nowhere".
kodak disposable camera photo of a group of elderly women shopping for colorful spices at a crowded Indian street market, golden hour --ar 16:9 --stylize 0
overhead food photography, Bibimbap, Korean restaurant --ar 16:9
Lifestyle style photo of a tuscan woman with deep sparkling green eyes and a devious smile, sitting on park bench, paisley shawl over orange pantsuit, natural evening lighting, shot on Agfa Vista 400, 4k --ar 16:9 --stylize 1000
Editorial Style Photo, Eye Level, Modern, Living Room, Fireplace, Leather and Wood, Built-in Shelves, Neutral with pops of blue, West Elm, Natural Light, New York City, Afternoon, Cozy, Art Deco --ar 16:9
Editorial style photo, medium closeup shot, off-center, a young, brunette, french woman, sitting at a Marble Table, wearing a black gucci dress and diamond necklace, in an Art Deco Dining Room with Velvet, Brass, and Mirror accents, Jewel Toned color pallete, West Elm, Chandelier, Restaurant, Evening, natural lighting, Fujifilm, Luxurious, Historical, 4k --seed 420 --ar 16:9 --stylize 1000
(no prompt given)