Stable-diffusion.prg
Harbour wrappers for stable-diffusion.cpp, which allow generating images from a text description using an AI model.
- Quick review
- Models to use
- Compiling library
- Compiling samples
- Functions list
- Parameters
- Tests
- Links
Quick review
Stable Diffusion is a deep learning text-to-image model that was initially released in August 2022. The main goal of the stable-diffusion.prg project is to make it possible to create Harbour applications that generate images from text prompts using appropriate AI models. The project provides the sd and ggml libraries, which you link to your application. Under Windows it requires the 64-bit MSVC compiler; under Linux/Unix, the standard 64-bit GNU C compiler.
Models to use
There are many models available for generating images from text, for example:
Realistic_Vision_V6.0_B1_noVAE
You can find many more here or with any search engine.
Compiling library
The preferred way to build the sd and ggml libraries and the samples is HwBuilder, my utility for building programs written in Harbour. An appropriate project file, sdlib.hwprj, is provided. sdlib.hwprj and the other .hwprj files assume that your copy of hwbuild.ini has a section for the 64-bit MSVC compiler, which you need to adjust:
[C_COMPILER_6]
id=msvc64
family=msvc
...
If you prefer not to use special utilities, you can build the libraries with the following scripts:
Windows
@echo off
if not exist lib md lib
if not exist obj md obj
if not exist obj\ggml md obj\ggml
if not exist obj\ggml\msvc64 md obj\ggml\msvc64
if not exist obj\sd md obj\sd
if not exist obj\sd\msvc64 md obj\sd\msvc64
call "c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
set CFLAGS=/nologo /c /W3 -I./ -Iggml -Ithirdparty -Istable-diffusion.cpp -DLOG_DISABLE_LOGS=1 /WX- /diagnostics:column /O2 /Ob2 /D _MBCS /D WIN32 /D _WINDOWS /D NDEBUG /D _CRT_SECURE_NO_WARNINGS /Gm- /MD /GS /arch:AVX2 /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /std:c11 /external:W3 /Gd /TC /errorReport:queue
set CXXFLAGS=/TP /W3 /nologo /EHsc /c -I./ -Iggml -Ithirdparty -Istable-diffusion.cpp -DLOG_DISABLE_LOGS=1
set HB_PATH=c:\harbour
set OBJ=obj\sd\msvc64
set OBJLIST=%OBJ%\hsd.obj %OBJ%\stable-diffusion.obj %OBJ%\model.obj %OBJ%\util.obj %OBJ%\upscaler.obj %OBJ%\zip.obj
cl.exe %CXXFLAGS% /I%HB_PATH%\include /Fo%OBJ%\hsd.obj source\hsd.cpp
cl.exe %CXXFLAGS% /Fo%OBJ%\stable-diffusion.obj stable-diffusion.cpp\stable-diffusion.cpp
cl.exe %CXXFLAGS% /Fo%OBJ%\model.obj stable-diffusion.cpp\model.cpp
cl.exe %CXXFLAGS% /Fo%OBJ%\util.obj stable-diffusion.cpp\util.cpp
cl.exe %CXXFLAGS% /Fo%OBJ%\upscaler.obj stable-diffusion.cpp\upscaler.cpp
cl.exe %CFLAGS% /Fo%OBJ%\zip.obj thirdparty\zip.c
lib /out:lib\sd.lib %OBJLIST%
set OBJ=obj\ggml\msvc64
set OBJLIST=%OBJ%\ggml.obj %OBJ%\ggml-alloc.obj %OBJ%\ggml-backend.obj %OBJ%\ggml-quants.obj %OBJ%\ggml-aarch64.obj %OBJ%\hcommon.obj
cl.exe %CFLAGS% /Fo%OBJ%\ggml.obj ggml\ggml.c
cl.exe %CFLAGS% /Fo%OBJ%\ggml-alloc.obj ggml\ggml-alloc.c
cl.exe %CFLAGS% /Fo%OBJ%\ggml-backend.obj ggml\ggml-backend.c
cl.exe %CFLAGS% /Fo%OBJ%\ggml-quants.obj ggml\ggml-quants.c
cl.exe %CFLAGS% /Fo%OBJ%\ggml-aarch64.obj ggml\ggml-aarch64.c
cl.exe %CFLAGS% /I%HB_PATH%\include /Fo%OBJ%\hcommon.obj source\hcommon.c
lib /out:lib\ggml.lib %OBJLIST%
Of course, you need to replace these paths with your own.
Linux
#!/bin/bash
if ! [ -e lib ]; then
mkdir lib
chmod a+w+r+x lib
fi
if ! [ -e obj ]; then
mkdir obj
chmod a+w+r+x obj
fi
if ! [ -e obj/sd ]; then
mkdir obj/sd
chmod a+w+r+x obj/sd
fi
if ! [ -e obj/sd/gcc ]; then
mkdir obj/sd/gcc
chmod a+w+r+x obj/sd/gcc
fi
if ! [ -e obj/ggml ]; then
mkdir obj/ggml
chmod a+w+r+x obj/ggml
fi
if ! [ -e obj/ggml/gcc ]; then
mkdir obj/ggml/gcc
chmod a+w+r+x obj/ggml/gcc
fi
# Set your Harbour path here
export HRB_DIR=/home/guest/apps/harbour
export CFLAGS="-c -Wall -fPIC -O3 -pthread -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -I. -Iggml -Ithirdparty -Istable-diffusion.cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DLOG_DISABLE_LOGS=1 -I$HRB_DIR/include"
export CXXFLAGS="-c -Wall -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -I. -Iggml -Ithirdparty -Istable-diffusion.cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DLOG_DISABLE_LOGS=1 -xc++ -I$HRB_DIR/include"
# Build the sd library (bash assigns variables with NAME=value, not "set")
OBJ=obj/sd/gcc
gcc $CXXFLAGS -o$OBJ/hsd.o source/hsd.cpp
gcc $CXXFLAGS -o$OBJ/stable-diffusion.o stable-diffusion.cpp/stable-diffusion.cpp
gcc $CXXFLAGS -o$OBJ/model.o stable-diffusion.cpp/model.cpp
gcc $CXXFLAGS -o$OBJ/util.o stable-diffusion.cpp/util.cpp
gcc $CXXFLAGS -o$OBJ/upscaler.o stable-diffusion.cpp/upscaler.cpp
gcc $CFLAGS -o$OBJ/zip.o thirdparty/zip.c
ar rcs lib/libsd.a $OBJ/hsd.o $OBJ/stable-diffusion.o $OBJ/model.o $OBJ/util.o $OBJ/upscaler.o $OBJ/zip.o
# Build the ggml library (hcommon.c goes into it, as in the Windows script)
OBJ=obj/ggml/gcc
gcc $CFLAGS -o$OBJ/ggml.o ggml/ggml.c
gcc $CFLAGS -o$OBJ/ggml-alloc.o ggml/ggml-alloc.c
gcc $CFLAGS -o$OBJ/ggml-backend.o ggml/ggml-backend.c
gcc $CFLAGS -o$OBJ/ggml-quants.o ggml/ggml-quants.c
gcc $CFLAGS -o$OBJ/ggml-aarch64.o ggml/ggml-aarch64.c
gcc $CFLAGS -o$OBJ/hcommon.o source/hcommon.c
ar rcs lib/libggml.a $OBJ/ggml.o $OBJ/ggml-alloc.o $OBJ/ggml-backend.o $OBJ/ggml-quants.o $OBJ/ggml-aarch64.o $OBJ/hcommon.o
Compiling samples
The preferred way to build a sample application is HwBuilder (test1.hwprj is provided), but you may also use the following scripts:
Windows
@echo off
call "c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
set HB_PATH=c:\harbour
set HB_LIBS=gtwvt.lib hbvm.lib hbrtl.lib gtgui.lib gtwin.lib hbcpage.lib hblang.lib hbrdd.lib hbmacro.lib hbpp.lib rddntx.lib rddcdx.lib rddfpt.lib hbsix.lib hbcommon.lib hbct.lib hbcplr.lib hbpcre.lib hbzlib.lib
set SD_LIBS=sd.lib ggml.lib
set VC_LIBS=ucrt.lib user32.lib gdi32.lib comdlg32.lib shell32.lib comctl32.lib winspool.lib advapi32.lib winmm.lib ws2_32.lib iphlpapi.lib OleAut32.Lib Ole32.Lib
%HB_PATH%\bin\harbour -n -q -i%HB_PATH%\include test1.prg
cl.exe /TP /W3 /nologo /c /I%HB_PATH%\include /Fotest1.obj test1.c
link /NODEFAULTLIB:libucrt.lib /NODEFAULTLIB:msvcrt.lib /INCREMENTAL:NO /NOLOGO /SUBSYSTEM:CONSOLE /TLBID:1 /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /LIBPATH:%HB_PATH%\lib\win\msvc64 /LIBPATH:lib test1.obj %HB_LIBS% %SD_LIBS% %VC_LIBS%
Linux
#!/bin/bash
# Set your Harbour path here
export HRB_DIR=/home/guest/apps/harbour
$HRB_DIR/bin/linux/gcc/harbour -n -q -i$HRB_DIR/include test1.prg
gcc -c -Wall -I$HRB_DIR/include -otest1.o test1.c
gcc test1.o -otest1 -L$HRB_DIR/lib/linux/gcc -Llib -Wl,--start-group -lgttrm -lhbvm -lhbrtl -lgtcgi -lgttrm -lhbcpage -lhblang -lhbrdd -lhbmacro -lhbpp -lrddntx -lrddcdx -lrddfpt -lhbsix -lhbcommon -lhbct -lhbcplr -lsd -lggml -lpthread -lm -lz -lpcre -ldl -Wl,--end-group -fPIC -O3 -Wall -lstdc++ -shared-libgcc
Functions list
sd_set_params( cParams, [lAdd] ) –> nSuccess
The function sets the image generation parameters. The parameter list cParams is a string of name=value pairs separated by the ~ character:
name1=value1~name2=value2~…
See the Parameters section of this manual for the list of available parameters.
If lAdd is omitted or .F., all parameters are reset to their default values before the new ones are applied; if lAdd is .T., the new values are applied on top of the current settings.
Return value nSuccess is 0 if the function has completed successfully.
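Since cParams is a plain string, it can be convenient to build it from name/value pairs. The helper below is a hypothetical sketch, not part of stable-diffusion.prg: SdJoinParams() joins an array of { name, value } pairs with = and ~. It assumes that all values are already strings and that neither names nor values contain the ~ character, which presumably cannot be escaped.

```harbour
// Hypothetical helper (not part of stable-diffusion.prg).
// Joins { { name, value }, ... } string pairs into the
// "name1=value1~name2=value2" format expected by sd_set_params().
FUNCTION SdJoinParams( aPairs )

   LOCAL cResult := ""
   LOCAL aPair

   FOR EACH aPair IN aPairs
      IF aPair:__enumIndex() > 1
         cResult += "~"          // pairs are separated by "~"
      ENDIF
      cResult += aPair[ 1 ] + "=" + aPair[ 2 ]
   NEXT

   RETURN cResult
```

For example, sd_set_params( SdJoinParams( { { "steps", "30" }, { "cfg-scale", "7.0" } } ) ) is equivalent to sd_set_params( "steps=30~cfg-scale=7.0" ).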
sd_open_model( cModel, [lImg2Img] ) –> nSuccess
Opens the Stable Diffusion model cModel. lImg2Img must be set to .T. if the model will be used for img2img operations (sd_img_to_img()).
Return value nSuccess is 0 if the function has completed successfully.
sd_close_model()
Closes the previously opened model.
sd_convert( cModelFrom, cModelTo, cType ) –> nSuccess
Converts the ckpt/safetensors model cModelFrom to the gguf file cModelTo. cType must be one of f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0.
Return value nSuccess is 0 if the function has completed successfully.
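A minimal sketch of a conversion call. ConvertDemo() and the file names are placeholders chosen for illustration; only sd_convert(), its cType values and its return convention come from the description above.

```harbour
// Hypothetical demo procedure; the file names are placeholders.
PROCEDURE ConvertDemo()

   // Convert a safetensors checkpoint to gguf, quantizing the weights to q8_0
   IF sd_convert( "model.safetensors", "model-q8_0.gguf", "q8_0" ) == 0
      ? "Converted to model-q8_0.gguf"
   ELSE
      ? "Conversion failed"
   ENDIF

   RETURN
```

The resulting gguf file should then be usable as cModel in sd_open_model().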
sd_txt_to_img( cPrompt ) –> nSuccess
Generates an image from the prompt cPrompt and writes it to the file set by the output parameter (default: ./output.png).
Return value nSuccess is 0 if the function has completed successfully.
sd_img_to_img( cImageFrom, cPrompt ) –> nSuccess
Generates an image from another image, cImageFrom, and the prompt cPrompt. Img2img works exactly like txt2img; the only difference is that you provide an image to be used as the starting point instead of the noise generated from the seed. The model must be opened with lImg2Img set to .T.
Return value nSuccess is 0 if the function has completed successfully.
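An img2img sketch. Img2ImgDemo(), the file names, the prompt and the strength value are illustrative assumptions. The model is opened with lImg2Img set to .T., and the parameters are set before sd_open_model(), which this sketch assumes is the safe order.

```harbour
// Hypothetical demo procedure; file names, prompt and strength are placeholders.
PROCEDURE Img2ImgDemo( cModel )

   // A lower strength keeps more of the input image (see "strength" in Parameters)
   sd_set_params( "strength=0.6~output=night.png" )

   IF sd_open_model( cModel, .T. ) != 0    // .T.: the model will be used for img2img
      ? "Cannot open model:", cModel
      RETURN
   ENDIF

   // output.png is used here as the starting image
   IF sd_img_to_img( "output.png", "the same street at night, wet asphalt" ) == 0
      ? "Result written to night.png"
   ELSE
      ? "img2img failed"
   ENDIF

   sd_close_model()

   RETURN
```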
sd_print_params() –> cCurrentParams
Returns formatted text listing the current parameters.
sd_print_usage() –> cOptions
Returns text listing the available parameters.
sd_print_sysinfo() –> cSysinfo
Returns text with system information.
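Putting the functions together, a minimal txt2img program could look like the sketch below. It is not the code of test1.prg: the program name, the parameter values and the output file are assumptions, and so is calling sd_set_params() before sd_open_model() (model-loading parameters such as threads or vae presumably have to be known when the model is opened). It can be built the same way as test1.prg in "Compiling samples"; pass the prompt in quotes.

```harbour
// Minimal txt2img sketch; assumes linking with sd.lib/ggml.lib
// (libsd.a/libggml.a on Linux) as in "Compiling samples".
PROCEDURE Main( cModel, cPrompt )

   IF Empty( cModel ) .OR. Empty( cPrompt )
      ? "Usage: mysample <path_to_model> <prompt>"
      RETURN
   ENDIF

   // lAdd omitted: all parameters are reset to defaults first
   IF sd_set_params( "steps=25~cfg-scale=7.0~width=512~height=512~output=result.png" ) != 0
      ? "Cannot set parameters"
      RETURN
   ENDIF

   IF sd_open_model( cModel ) != 0          // lImg2Img omitted: txt2img only
      ? "Cannot open model:", cModel
      RETURN
   ENDIF

   ? sd_print_params()                      // show the effective settings

   IF sd_txt_to_img( cPrompt ) == 0
      ? "Image written to result.png"
   ELSE
      ? "Image generation failed"
   ENDIF

   sd_close_model()

   RETURN
```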
Parameters
Below is the list of parameters currently supported by stable-diffusion.prg. The descriptions are borrowed from stable-diffusion.cpp/README.md.
- t, threads=N number of threads to use during computation (default: -1); if threads <= 0, threads will be set to the number of physical CPU cores
- vae=[VAE] path to vae
- taesd=[TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
- control-net=[CONTROL_PATH] path to control net model
- embd-dir=[EMBEDDING_PATH] path to embeddings
- stacked-id-embd-dir=[DIR] path to PHOTOMAKER stacked id embeddings
- input-id-images-dir=[DIR] path to PHOTOMAKER input id images dir
- normalize-input normalize PHOTOMAKER input id images
- upscale-model=[ESRGAN_PATH] path to esrgan model. Upscale images after generation; only RealESRGAN_x4plus_anime_6B is supported for now
- upscale-repeats Run the ESRGAN upscaler this many times (default 1)
- type=[TYPE] (for sd_convert) weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0); if not specified, the default is the type of the weight file
- lora-model-dir=[DIR] lora model directory
- i, init-img=[IMAGE] path to the input image, required by img2img
- control-image=[IMAGE] path to image condition, control net
- o, output=OUTPUT path to write result image to (default: ./output.png)
- p, prompt=[PROMPT] the prompt to render
- n, negative-prompt=PROMPT the negative prompt (default: "")
- cfg-scale=SCALE unconditional guidance scale: (default: 7.0)
- strength=STRENGTH (for sd_img_to_img) strength for noising/unnoising (default: 0.75); 1.0 corresponds to full destruction of information in the init image
- style-ratio=STYLE-RATIO strength for keeping input identity (default: 20%)
- control-strength=STRENGTH (for sd_img_to_img) strength to apply Control Net (default: 0.9)
- H, height=H image height, in pixel space (default: 512)
- W, width=W image width, in pixel space (default: 512)
- sampling-method={euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, lcm} sampling method (default: euler_a)
- steps=STEPS number of sample steps (default: 20)
- rng={std_default, cuda} RNG (default: cuda)
- s, seed=SEED RNG seed (default: -1; a negative value means a random seed)
- b, batch-count=COUNT number of images to generate
- schedule={discrete, karras} Denoiser sigma schedule (default: discrete)
- clip-skip=N ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1); <= 0 means unspecified, which becomes 1 for SD1.x and 2 for SD2.x
- vae-tiling process vae in tiles to reduce memory usage
- control-net-cpu keep controlnet in cpu (for low vram)
- canny apply canny preprocessor (edge detection)
- v, verbose print extra info
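cfg-scale is listed above as the "unconditional guidance scale". In the standard classifier-free guidance formulation, which stable-diffusion.cpp is assumed here to follow, the noise prediction used at each sampling step combines a prediction conditioned on the prompt c and one conditioned on the negative prompt c_neg (empty by default), weighted by s = cfg-scale:

```latex
\hat{\epsilon}(x_t) = \epsilon_\theta(x_t, c_{\mathrm{neg}})
  + s \,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, c_{\mathrm{neg}})\bigr)
```

With s = 1 this reduces to the plain conditional prediction; larger values push each step further toward the prompt and away from the negative prompt, which matches the creativity vs. prompt trade-off described for cfg-scale below.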
A bit more explanation about some of the parameters:
- n, negative-prompt - the opposite of a prompt: something you do NOT want Stable Diffusion to generate, for example: lowres, error, cropped, worst quality, low quality, jpeg artifacts, out of frame, watermark, signature.
- steps - Stable Diffusion creates an image by starting with a canvas full of noise and gradually denoising it to reach the final output. This parameter controls the number of these denoising steps. Usually more steps give better results, but only up to a point.
- sampling-method - the algorithm that takes the image generated after each step, compares it with what the text prompt requested, and then makes a few changes to the noise, so that the image gradually comes to match the text description.
- cfg-scale - this parameter can be seen as a "Creativity vs. Prompt" scale. Lower values give the AI more freedom to be creative, while higher values force it to stick more closely to the prompt. The default CFG value is 7, which usually gives a good balance between creativity and generating what you want. Going below 5 is generally not recommended, as the images may start to look more like AI hallucinations, and going above 16 may produce images with ugly artifacts.
- s, seed - the number that determines the initial random noise. Since the random noise determines the final image, you get a different image each time you run the same prompt with a different seed, and the same image each time you run the same prompt with the same seed. The value -1 (the default here) selects a random seed each time you generate an image.
- strength - used by the sd_img_to_img() function. Noise is added to the image you use as the init image for img2img, and then the diffusion process continues according to the prompt. The amount of noise depends on strength, which ranges from 0 to 1: 0 adds no noise at all, so you get back exactly the image you supplied, while 1 completely replaces the image with noise, so the result is almost the same as plain txt2img.
- vae - the path to a separate VAE model, which is loaded in addition to the main model and is tuned for specific tasks: drawing eyes, increasing color saturation, etc.
- taesd - the path to TAESD, a tiny distilled version of the VAE (faster decoding at lower quality).
- control-net - the path to a ControlNet model, a neural network that controls image generation in Stable Diffusion by adding extra conditions; see this article for more explanation.
- upscale-model - the path to a special model that improves the quality of images generated by Stable Diffusion; see this article for more explanation.
- embd-dir - the directory with embeddings. An embedding, also called textual inversion, is an alternative way to control the style of your images in Stable Diffusion. See this article for more explanation.
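The seed behaviour described above can be checked with a few calls. SeedDemo() is a hypothetical procedure that assumes a model is already open; it relies on lAdd = .T. to change only the output file while keeping the previously set seed.

```harbour
// Hypothetical demo procedure; assumes sd_open_model() has already succeeded.
PROCEDURE SeedDemo( cPrompt )

   // lAdd omitted: reset to defaults, then fix the seed
   sd_set_params( "seed=42~output=first.png" )
   sd_txt_to_img( cPrompt )

   // lAdd = .T.: only the output file changes and seed=42 is kept,
   // so second.png is expected to match first.png
   sd_set_params( "output=second.png", .T. )
   sd_txt_to_img( cPrompt )

   // Back to a random seed: third.png will most likely differ
   sd_set_params( "seed=-1~output=third.png", .T. )
   sd_txt_to_img( cPrompt )

   RETURN
```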
Tests
test1.prg
This test generates one image file:
test1 <path_to_model> [cParamsString]
test2.prg
This one opens a model and generates as many images as you want, asking for a prompt every time:
test2 <path_to_model> [cParamsString]
test3.prg
This test demonstrates the img2img method: it generates an image based on another image created by stable-diffusion:
test3 <path_to_model> <path_to_input_image> [cParamsString]
test4.prg
Converts models in the ckpt/safetensors/diffusers formats to gguf, performing quantization in advance:
test4 <path_to_model_from> <path_to_model_to>