My LoRA Experiment Part 4
First, a fair warning to the curious minds:
This entire endeavor is for educational purposes only. I'm trying to unravel the mysteries behind this technology and gain a deeper understanding of it. I chose "Kiwi" from "Cyberpunk Edgerunners" as my subject because she's a complex character, perfect for diving into uncharted waters. I don't think I will be sharing this current iteration of LoRA, as I consider it a failure, and sharing future iterations is still undecided.
And before you ask, no, I'm not an expert on this, so don't treat this as a guide. This is just me experimenting and sharing my results and thoughts. Maybe one day, I'll look back at this and laugh at how little I knew.
You might be wondering why this is part 4. Well, it's because this is my latest round of experimentation and is fresh in my mind. I'll get around to creating parts 1-3 when I have time.
Continuing our journey through trial and error, this time I think I failed as well, but the result is 70% satisfactory. Not good enough for me, but maybe it's a step in the right direction. Let's dive into this round of experimentation and see what we can learn.
This time around, I went back to using my own machine (Yes, brace yourself for a future article on another method I've tried. It was an absolute disaster, but I'll write about it when I have the time).
So, for this experiment, I started from scratch and made several minor adjustments. First, I added two new images to the training dataset, bringing the total up to 35 images (Part 1 will explain the dataset and my thought process). I also changed the keyword from "CyberpunkKiwi" to "kiwi(cyberpunk)".
I opted for a different checkpoint to train from this time: the AnyLoRA checkpoint with a clip skip of 2. I reduced the number of repeats from 95 to 75 and chose a network rank setting of 36 x 16, which produced compact model files of 37 MB per epoch. I'm not entirely sold on the idea that smaller models are better just yet, so next time I'll likely try around 64 to see if it makes a difference, especially since, surprise, this experiment was unsatisfactory and will need to be redone. I made some minor tweaks here and there and set the training batch size to 4 (I'll discuss training batch sizes and my thoughts on them in Part 3). Additionally, I fiddled with the color settings again. The training process seemed much quicker this time, with 24 epochs completed in about 9 hours. I was probably getting around 2 s/it or less, which is a huge improvement considering I was just tweaking settings for better or worse.
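As a sanity check on those numbers, here is a rough back-of-the-envelope sketch. It assumes the trainer counts one step per batch and that each image is seen once per repeat in every epoch (the usual kohya-style convention, which is my assumption, not something I verified). The function name is made up for illustration.

```python
import math

def estimate_training(images, repeats, batch_size, epochs, seconds_per_step):
    """Rough estimate of step count and wall-clock time for a LoRA run.

    Assumes one optimizer step per batch and images * repeats samples
    per epoch (an assumption about how the trainer counts steps).
    """
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    total_steps = steps_per_epoch * epochs
    hours = total_steps * seconds_per_step / 3600
    return steps_per_epoch, total_steps, hours

# Settings from this run: 35 images, 75 repeats, batch size 4, 24 epochs, ~2 s/it
steps_per_epoch, total_steps, hours = estimate_training(35, 75, 4, 24, 2.0)
print(steps_per_epoch, total_steps, round(hours, 2))  # 657 15768 8.76
```

Under those assumptions the estimate comes out just under 9 hours, which lines up with what I saw, so the 2 s/it guess is probably in the right ballpark.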
Testing the LoRA Models
I initially planned to do the basic calibration check at 768 x 768, but that would've taken around 2 hours for 24 epochs! So I canceled that and dropped it to 512, which took about an hour instead. After waiting impatiently for that to finish, I busied myself with other tasks. Once it was complete and I inspected the results, I found myself both impressed and disappointed. I could see the glimmers of my final goal as well as the shortcomings! I'll post some of the images I used for observation after the explanations, just in case some people don't want to see that. As I explained in Part 1, the point of this experiment is to ultimately produce four states of a single character. Most importantly, I need to see if I can get her tattoos working properly. However, from this point on I will add an additional state: hairstyle. This will be one of the last things to try and test. She has a few scenes where her hair is wet, and it might be interesting to see what happens.
The Good:
I was absolutely blown away by her tattoo design this time! It was perfect! When I prompted it to draw Kiwi in a bikini, I could see her tattoo, and when I prompted it to draw her normally with clothes on, it didn't try to render the tattoo over the clothes. I'm also very happy with her head composition and overall design. She has a long neck, and the model is able to reproduce that, something I noticed in Part 2 or 3. I've been able to produce a few images with various settings that impressed me, enough so that I started doing some more manual work on fixing them up. That said, I can usually find a few good ones in every batch I train, so I guess they serve as a progress timestamp.
The Bad:
Well, this time around, there's quite a lot of artifacting, which indicates I overdid the training. I could barely get any of the later epochs to do anything, as they were too stiff. She also has a default weird pose when she's showing her tattoos. I think it's a cool pose, but it's hard to get it to do something else, which is another sign of overtraining. Overall, I'm quite unhappy with the outcome this time around, but I wouldn't say it's worse than before – there was definitely progress made.
What I've Learned This Time:
Firstly, those who know about this sort of thing were probably already screaming at something I mentioned near the beginning. That's right, I changed the keyword to "kiwi(cyberpunk)"! This was a huge mistake! While it still identifies the character as Kiwi, putting "cyberpunk" in parentheses means that word gets emphasized as a keyword of its own, and since it's a common word, it pushes the image toward a generic cyberpunk look unrelated to the game or Edgerunners. This could explain why I can't change the scenes and the other problems I'm having! So, in the next batch of training, this keyword is going away.
Even though it's overtrained throughout the epochs, a strength of 0.6 produces something decent. I like the results at strengths 0.7 and 0.8, but there's artifacting, and the image becomes increasingly hard to manipulate. From epoch 13 onward, a strength of 1 is a complete failure. I don't feel comfortable sharing all my test data, so I won't, but I will cherry-pick some things.
Even though I don't have a single image of her in a bikini in my dataset, I'm beyond happy with the outcome so far. Since Part 1 isn't published yet, here's the short version: I only have three real image states of her. The majority are close-up face shots, a few are body shots with her iconic red outfit on, and finally, 3 or 4 are images of her nude. So reproducing her tattoos from just those 3 or 4 images is extremely impressive, considering they don't clip through what she's wearing.
Overall, between this and the previous experiment, the end results seem to be producing two separate versions of Kiwi, even though the dataset is almost exactly the same (Part 3 is a must!). I believe this is due to the tag I used; it's really messing things up this time. Previously, the tag was unique enough for the model to match what it was seeing to that tag's name.
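To catch this kind of keyword problem earlier next time, a small check like the one below could help. It is only a sketch and it assumes an AUTOMATIC1111-style prompt syntax, where parentheses and square brackets adjust emphasis and a backslash makes them literal. Whether the trainer itself treats these characters specially is a separate question this does not answer, and it does nothing about "cyberpunk" being a common word. Both function names are made up for illustration.

```python
import re

def risky_chars(trigger):
    """Return the prompt-syntax characters found in a trigger word."""
    return sorted({ch for ch in trigger if ch in "()[]"})

def escape_for_prompt(trigger):
    """Backslash-escape brackets so a UI that uses them for emphasis
    treats them as literal text (assumes A1111-style escaping)."""
    return re.sub(r"([()\[\]])", r"\\\1", trigger)

for trigger in ["CyberpunkKiwi", "kiwi(cyberpunk)"]:
    print(trigger, risky_chars(trigger), escape_for_prompt(trigger))
```

The old keyword passes cleanly, while the new one gets flagged for both parentheses. Picking a unique tag without any special characters is still the simpler fix.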
Next Steps and Lessons:
Moving forward, I'll definitely be more careful with the tags and prompts I use during training. The impact they have on the results is more significant than I initially realized. Furthermore, I'll continue to experiment with the number of epochs and strength settings to find the sweet spot that produces the best results without overtraining. It's a delicate balance, and I'm still learning.
In conclusion, while this round of experimentation hasn't been an outright success, it has provided me with valuable insights and a better understanding of the LoRA model. As frustrating as it can be, failure is just another stepping stone to success.
What I plan on doing next time: It's a failure, so I'll retrain from zero, but don't you worry, I'm not giving up just yet. Next time, I want to try a network size of 64, dropping the repeats to 10 with a maximum of 30 epochs. I'll keep the same base model I used this time. Changing the network size and the repeats is already two major changes at once, which is not a good way to debug. But, you know, in this case, the time it takes to train a LoRA makes it prohibitive to test every setting one at a time. If it fails, obviously, we'll go back and retrain with different settings. Learning from your mistakes is the key in computers, after all!
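Since several things change at once, it helps to write down exactly what differs between runs. Here is a minimal sketch that diffs two settings dictionaries. The key names are my own labels for illustration, not real trainer options, and only values mentioned in this post are included.

```python
def diff_settings(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

# Hypothetical key names; values taken from this post
this_run = {"base_model": "anylora", "network_size": 36, "repeats": 75, "max_epochs": 24}
next_run = {"base_model": "anylora", "network_size": 64, "repeats": 10, "max_epochs": 30}

changes = diff_settings(this_run, next_run)
for key, (before, after) in changes.items():
    print(f"{key}: {before} -> {after}")
```

Seeing the list printed out is a good reminder that if the next run behaves differently, I won't know for sure which of these changes caused it.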
Alright, time to look at the images and share my thoughts
First, some artifacted face close-ups!
What is going on here? Well, it's artifacted, but her mask comes out well, her eyes are messed up, and her hair is spot on. Seriously, what's up with that watermark in the top left? None of the images I prepared have a watermark on them, which is interesting, so it must be a quirk of the model. I prepared my image set myself since I'm learning, you know.
A freshly generated face image at Epoch 13, 0.7 strength just for this blog post
The face is too childish; the eyes are in a hacking state, but that wasn't prompted. Overall, I'm not too impressed by it; the facial lines should be sharper. I'm not going to bother re-prompting because, honestly, this is the result, and it contributes to my decision to declare it a failure.
Now, my favorite part! After looking at the test set of images I generated, which took over an hour (ugh!), I thought epoch 17 at 0.7 strength was a good image set, so I chose it. Let's examine the result under my keen and discerning eyes.
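For reference, the test set is basically a grid of epochs against strengths. Below is a rough sketch of how such a grid of prompts could be built. It assumes an AUTOMATIC1111-style `<lora:name:weight>` prompt tag and a per-epoch file name with a zero-padded epoch suffix. The file stem `kiwi_v4`, the base prompt, and the choice of strengths are all placeholders for illustration, not my actual test setup.

```python
from itertools import product

def lora_grid_prompts(base_prompt, lora_stem, epochs, strengths):
    """Build one prompt per (epoch, strength) pair.

    Assumes per-epoch files named like '<stem>-000017' and the
    '<lora:name:weight>' prompt tag (both are assumptions).
    """
    prompts = []
    for epoch, strength in product(epochs, strengths):
        tag = f"<lora:{lora_stem}-{epoch:06d}:{strength}>"
        prompts.append((epoch, strength, f"{base_prompt} {tag}"))
    return prompts

# Placeholder prompt and file stem; 24 epochs x 4 strengths
grid = lora_grid_prompts("kiwi, yellow dress", "kiwi_v4", range(1, 25), [0.6, 0.7, 0.8, 1.0])
print(len(grid))   # 96
print(grid[0][2])  # kiwi, yellow dress <lora:kiwi_v4-000001:0.6>
```

With a fixed seed, a grid like this makes it much easier to compare epochs side by side and pick something like epoch 17 at 0.7.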
The prompt was quite simple: I wanted a yellow dress because that doesn't match her character, and I certainly won't have anything like that in my image set. The face is artifacted so badly, but I'm not too worried at this point. What I am super excited about is her thigh area; it shows her tattoos! It also combined her yellow dress with her default clothing. I must say her outfit is simply stunning – a happy little accident, indeed. Since this is a full-body shot at 768x768 on a messed-up data set, I could inpaint and correct her face, and perhaps I will since I'm starting to take a liking to this image. That shocked bodily expression and that weird toilet-looking thing behind her – simply fascinating.
Alright, let's see what happens when we prompt it again, but with less clothing:
Yes, that's right! At that epoch, the same strength, and perhaps the same seed, I got the same shocked bodily pose, but her tattoos came out great! It's unfortunate that they're black, but I think this epoch isn't too great for the tattoos. Also, there's her head. Her neck size is sort of alright, her mask is messed up, and again, her face lacks the sharper lines. But I feel this is an epoch problem; it's too low. Once again, we have the unfortunate cyberpunk city as a backdrop.
And now, for the pièce de résistance, a bonus image that I got at a slightly higher epoch:
Epoch 19 at a strength of 0.8.
You might be wondering why I'm so fixated on the bikini aspect. It's because it proves that the model is capable of generating something that isn't in my training set. The majority of what you see here doesn't exist in my training data: the bikini, the blanket/towel, or any of these backgrounds. All I have to work with are a few tasteful nude pictures in a well-lit setting. What impresses me about this image is that her tattoos are the correct color, they match the pattern, and the pattern conforms to the angle of her hips. It also didn't draw her body in the pinkish hue that I have in my training set. As for the pose, the model keeps repeating something similar to this, yet none of my images have this outstretched Dracula cape pose or this Dracula cape. So it's highly likely that once I lower the repeats, this intriguing occurrence will stop happening. Unfortunate, but only testing will tell. Perhaps Part 5? Anyway, at this point, I'm not too worried about her face, as the majority of my images are close-ups of her face, and if I generate close-up shots, they kind of match expectations, but nothing major.
Sidenote - it seems that upon completing this article and further analyzing the data, I have come to a realization as to why she assumes such an odd pose resembling that of the infamous Dracula. It appears that in the majority of the images within my training set, she is holding some sort of object. Whether it be a cigarette, tablet, or drink, she seems to have a habit of grasping onto something. This is yet another peculiar characteristic of hers that I have uncovered. I suppose I shall have to rewatch this anime and pay attention to this.
Anyway, I was going to end with the above, but there's been too much talk of bikinis. However, this has been more of a personal breakthrough in terms of experimenting with character states.
Here is generally what would come up when using the final LoRA:
Artifacted and discolored, though the eyes are alright. From this angle, it looks like it might be missing some of the sharper face lines, but the mask came out OK. Her standard clothes are in place, along with her long neck. Frankly, I'm unhappy with the result.