Running Stable Diffusion in pure C/C++ in Termux on Android
I came across the repository leejet/stable-diffusion.cpp, an implementation of Stable Diffusion in pure C/C++, and decided to try it on my Android phone to see how well it handles text-to-image generation.
The instructions in the repository are already detailed and straightforward. I only needed to run the few commands below in Termux to install the necessary packages before following them.
# For building the project
apt install git cmake -y
# For converting weights
apt install rust tur-repo
apt install python-pandas python-torch -y
I used Stable Diffusion v1.5, in half-precision (fp16) mode only. It took about 56 minutes (3364.61 s for txt2img, according to the log below) to generate a 512×512 image on my phone (Snapdragon 888 chipset, 8 GB RAM). I had to remove the phone case and attach a mini USB fan to the back of the phone to keep it from overheating.
The result:
The log:
./bin/sd -m ~/storage/shared/v1-5-pruned-emaonly-ggml-model-f16.bin -p "a lovely cat"
[INFO] stable-diffusion.cpp:2687 - loading model from '/data/data/com.termux/files/home/storage/shared/v1-5-pruned-emaonly-ggml-model-f16.bin'
[INFO] stable-diffusion.cpp:2712 - ftype: f16
[INFO] stable-diffusion.cpp:2941 - total params size = 1969.97MB (clip 235.01MB, unet 1640.45MB, vae 94.51MB)
[INFO] stable-diffusion.cpp:2943 - loading model from '/data/data/com.termux/files/home/storage/shared/v1-5-pruned-emaonly-ggml-model-f16.bin' completed, taking 13.11s
[INFO] stable-diffusion.cpp:3066 - condition graph use 239.58MB of memory: params 235.01MB, runtime 4.57MB (static 1.64MB, dynamic 2.93MB)
[INFO] stable-diffusion.cpp:3066 - condition graph use 239.58MB of memory: params 235.01MB, runtime 4.57MB (static 1.64MB, dynamic 2.93MB)
[INFO] stable-diffusion.cpp:3552 - get_learned_condition completed, taking 3.01s
[INFO] stable-diffusion.cpp:3568 - start sampling
[INFO] stable-diffusion.cpp:3260 - step 1 sampling completed, taking 99.22s
[INFO] stable-diffusion.cpp:3260 - step 2 sampling completed, taking 110.11s
[INFO] stable-diffusion.cpp:3260 - step 3 sampling completed, taking 108.13s
[INFO] stable-diffusion.cpp:3260 - step 4 sampling completed, taking 103.45s
[INFO] stable-diffusion.cpp:3260 - step 5 sampling completed, taking 104.38s
[INFO] stable-diffusion.cpp:3260 - step 6 sampling completed, taking 102.38s
[INFO] stable-diffusion.cpp:3260 - step 7 sampling completed, taking 102.27s
[INFO] stable-diffusion.cpp:3260 - step 8 sampling completed, taking 108.72s
[INFO] stable-diffusion.cpp:3260 - step 9 sampling completed, taking 99.60s
[INFO] stable-diffusion.cpp:3260 - step 10 sampling completed, taking 99.32s
[INFO] stable-diffusion.cpp:3260 - step 11 sampling completed, taking 189.10s
[INFO] stable-diffusion.cpp:3260 - step 12 sampling completed, taking 214.05s
[INFO] stable-diffusion.cpp:3260 - step 13 sampling completed, taking 183.40s
[INFO] stable-diffusion.cpp:3260 - step 14 sampling completed, taking 203.24s
[INFO] stable-diffusion.cpp:3260 - step 15 sampling completed, taking 219.05s
[INFO] stable-diffusion.cpp:3260 - step 16 sampling completed, taking 219.44s
[INFO] stable-diffusion.cpp:3260 - step 17 sampling completed, taking 241.86s
[INFO] stable-diffusion.cpp:3260 - step 18 sampling completed, taking 215.12s
[INFO] stable-diffusion.cpp:3260 - step 19 sampling completed, taking 219.98s
[INFO] stable-diffusion.cpp:3260 - step 20 sampling completed, taking 220.93s
[INFO] stable-diffusion.cpp:3287 - diffusion graph use 2264.22MB of memory: params 1640.45MB, runtime 623.77MB (static 69.56MB, dynamic 554.21MB)
[INFO] stable-diffusion.cpp:3573 - sampling completed, taking 3163.83s
[INFO] stable-diffusion.cpp:3496 - vae graph use 2271.63MB of memory: params 94.51MB, runtime 2177.12MB (static 1153.12MB, dynamic 1024.00MB)
[INFO] stable-diffusion.cpp:3586 - decode_first_stage completed, taking 197.78s
[INFO] stable-diffusion.cpp:3600 - txt2img completed in 3364.61s, use 2358.73MB of memory: peak params memory 1969.97MB, peak runtime memory 2177.12MB
save result image to 'output.png'
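The per-step timings in the log tell a story of their own: steps 1 to 10 each take roughly 100 to 110 s, while steps 11 to 20 take roughly 180 to 240 s. A minimal Python sketch can pull the step durations out of the log and compare the two halves. The helper names `parse_step_times` and `summarize_steps` are hypothetical, and the regex assumes the exact log wording shown above. The step times sum to about 3163.75 s, close to the 3163.83 s the log reports for the whole sampling phase, and the second half averages roughly twice the first (about 212.6 s versus 103.8 s). Thermal throttling is one plausible cause, but the log alone cannot confirm it.

```python
import re
from statistics import mean

# The 20 "step N sampling completed" lines copied from the log above.
LOG = """\
[INFO] stable-diffusion.cpp:3260 - step 1 sampling completed, taking 99.22s
[INFO] stable-diffusion.cpp:3260 - step 2 sampling completed, taking 110.11s
[INFO] stable-diffusion.cpp:3260 - step 3 sampling completed, taking 108.13s
[INFO] stable-diffusion.cpp:3260 - step 4 sampling completed, taking 103.45s
[INFO] stable-diffusion.cpp:3260 - step 5 sampling completed, taking 104.38s
[INFO] stable-diffusion.cpp:3260 - step 6 sampling completed, taking 102.38s
[INFO] stable-diffusion.cpp:3260 - step 7 sampling completed, taking 102.27s
[INFO] stable-diffusion.cpp:3260 - step 8 sampling completed, taking 108.72s
[INFO] stable-diffusion.cpp:3260 - step 9 sampling completed, taking 99.60s
[INFO] stable-diffusion.cpp:3260 - step 10 sampling completed, taking 99.32s
[INFO] stable-diffusion.cpp:3260 - step 11 sampling completed, taking 189.10s
[INFO] stable-diffusion.cpp:3260 - step 12 sampling completed, taking 214.05s
[INFO] stable-diffusion.cpp:3260 - step 13 sampling completed, taking 183.40s
[INFO] stable-diffusion.cpp:3260 - step 14 sampling completed, taking 203.24s
[INFO] stable-diffusion.cpp:3260 - step 15 sampling completed, taking 219.05s
[INFO] stable-diffusion.cpp:3260 - step 16 sampling completed, taking 219.44s
[INFO] stable-diffusion.cpp:3260 - step 17 sampling completed, taking 241.86s
[INFO] stable-diffusion.cpp:3260 - step 18 sampling completed, taking 215.12s
[INFO] stable-diffusion.cpp:3260 - step 19 sampling completed, taking 219.98s
[INFO] stable-diffusion.cpp:3260 - step 20 sampling completed, taking 220.93s
"""

# Assumes the exact wording printed by this build of stable-diffusion.cpp.
STEP_RE = re.compile(r"step (\d+) sampling completed, taking ([\d.]+)s")


def parse_step_times(log):
    """Return {step_number: seconds} for every sampling step found in the log."""
    return {int(m.group(1)): float(m.group(2)) for m in STEP_RE.finditer(log)}


def summarize_steps(times, split=10):
    """Total sampling time, mean step time up to/after step `split`, and their ratio."""
    first = [t for step, t in times.items() if step <= split]
    rest = [t for step, t in times.items() if step > split]
    return {
        "total_s": round(sum(times.values()), 2),
        "mean_first_s": round(mean(first), 2),
        "mean_rest_s": round(mean(rest), 2),
        "slowdown": round(mean(rest) / mean(first), 2),
    }


if __name__ == "__main__":
    for key, value in summarize_steps(parse_step_times(LOG)).items():
        print(f"{key}: {value}")
```

To check another run, paste its step lines into `LOG`; `split=10` is only chosen here because that is where the jump appears in this particular log.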
Memory usage during inference steps:


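The log above also breaks memory down per graph. As a cross-check, a small Python sketch can read the three distinct "graph use" lines (the duplicated condition-graph line is included once) and confirm three things: each total equals params plus runtime, the VAE decode graph has the largest runtime footprint (2177.12 MB, the same value the log reports as peak runtime memory), and the params of the three graphs add up to the 1969.97 MB peak params figure. The helper names `parse_graph_memory` and `summarize_memory` are hypothetical, and the regex assumes the log wording shown above.

```python
import re

# The distinct "graph use ... of memory" lines copied from the log above.
LOG = """\
[INFO] stable-diffusion.cpp:3066 - condition graph use 239.58MB of memory: params 235.01MB, runtime 4.57MB (static 1.64MB, dynamic 2.93MB)
[INFO] stable-diffusion.cpp:3287 - diffusion graph use 2264.22MB of memory: params 1640.45MB, runtime 623.77MB (static 69.56MB, dynamic 554.21MB)
[INFO] stable-diffusion.cpp:3496 - vae graph use 2271.63MB of memory: params 94.51MB, runtime 2177.12MB (static 1153.12MB, dynamic 1024.00MB)
"""

# Captures: graph name, total MB, params MB, runtime MB.
GRAPH_RE = re.compile(
    r"(\w+) graph use ([\d.]+)MB of memory: params ([\d.]+)MB, runtime ([\d.]+)MB"
)


def parse_graph_memory(log):
    """Return {graph_name: (total_mb, params_mb, runtime_mb)}; repeated graphs keep the last line."""
    return {
        m.group(1): tuple(float(x) for x in m.groups()[1:])
        for m in GRAPH_RE.finditer(log)
    }


def summarize_memory(graphs):
    """Return (graph with the largest runtime memory, its runtime MB, sum of params MB)."""
    heaviest = max(graphs, key=lambda name: graphs[name][2])
    params_total = round(sum(params for _, params, _ in graphs.values()), 2)
    return heaviest, graphs[heaviest][2], params_total


if __name__ == "__main__":
    graphs = parse_graph_memory(LOG)
    for name, (total, params, runtime) in graphs.items():
        print(f"{name:>9}: {total:8.2f} MB = params {params:8.2f} + runtime {runtime:8.2f}")
    print(summarize_memory(graphs))
```

At 512×512 the VAE decode needs far more working memory than a single diffusion step (2177.12 MB versus 623.77 MB of runtime memory), so on a phone with 8 GB of RAM the decode stage, not sampling, sets the runtime peak in this run.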