[ML] Compiling llama.cpp with a conda-installed CUDA env
2 min read · May 11, 2024
Introduction
Compiling the llama.cpp runtime with CUDA installed in a custom conda env (I would usually use a Docker container, but was in specific circumstances where that wasn't possible)
- Situation: given a container with CUDA 11.8, but couldn't modify it or install other CUDA versions
- No Docker access; needed to run the binary directly
- → Make a conda env with CUDA 12.1 and compile there
Step 0. Making a conda env with the desired CUDA version
Make a conda env with CUDA 12.1 (the process below needs adjusting for other versions)
# create the env and activate it
conda create -n cuda12_env python=3.11 -c nvidia/label/cuda-12.1.1
conda activate cuda12_env
# install the CUDA runtime, toolkit, nvcc
conda install -y cuda cuda-runtime cuda-cudart cuda-cudart-dev cuda-toolkit cuda-nvcc libcublas libnvjpeg libnvjitlink -c nvidia/label/cuda-12.1.1
# equivalent channel::package syntax for individual packages
conda install -y nvidia/label/cuda-12.1.1::cuda
conda install -y nvidia/label/cuda-12.1.1::cuda-cudart
- To use other CUDA versions, or if you need additional packages, check https://anaconda.org/nvidia/repo to get the label for the desired CUDA version (example below)
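For example, a CUDA 12.4 env would just swap the label (a sketch; confirm the exact label, e.g. cuda-12.4.1, exists on the repo page first):
conda create -n cuda124_env python=3.11 -c nvidia/label/cuda-12.4.1
conda activate cuda124_env
conda install -y cuda cuda-runtime cuda-cudart cuda-cudart-dev cuda-toolkit cuda-nvcc libcublas libnvjpeg libnvjitlink -c nvidia/label/cuda-12.4.1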
Check that nvcc is picking up the version in the conda env location
# System version
(base) root@...:~/...# nvcc -V
...
Cuda compilation tools, release 11.8, V11.8.89
(base) root@...:~/...# which nvcc
/opt/conda/bin/nvcc
# After activation
(cuda12_env) root@...:~/...# nvcc -V
...
Cuda compilation tools, release 12.1, V12.1.105
(cuda12_env) root@...:~/...# which nvcc
{CONDA-ENV-LOCATION}/bin/nvcc
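A one-liner sanity check (a sketch; $CONDA_PREFIX is set by conda activate):
# the active env's nvcc should be the one found on PATH
[ "$(which nvcc)" = "$CONDA_PREFIX/bin/nvcc" ] && echo "env nvcc active" || echo "PATH still prefers another nvcc"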
Step 1. Compiling llama.cpp with the conda env's CUDA version
If needed, upgrade your cmake version
- cmake version 3.23.3 caused errors for me → upgraded to 3.29.3
- https://github.com/NVlabs/instant-ngp/issues/196
pip install cmake --upgrade
cmake --version
>>> cmake version 3.29.3
- If it gives "CMake Error: Could not find CMAKE_ROOT !!!", deactivate and reactivate the conda env:
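conda deactivate
conda activate cuda12_env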
Start the build process inside the cloned llama.cpp folder
mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON \
-DCUDAToolkit_ROOT="{CONDA-ENV-LOCATION}/bin" \
-DLLAMA_CUDA_F16=ON
- cmake will use the nvcc found inside "CUDAToolkit_ROOT"
- https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html
- If there are CUDA-version-related errors (e.g. "cuda_device_runtime.o' newer than toolkit (124 vs 121)"), run conda list to find any CUDA-related package whose version differs from the intended one (e.g. 12.4.x instead of 12.1.x), then install the 12.1 counterpart of that package (see the sketch below)
- If there are any changes to conda packages, remove the build folder and start over
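A minimal sketch of that recovery loop (the grep pattern and package name are just examples):
# list cuda-related packages and spot any whose version isn't 12.1.x
conda list | grep -Ei 'cuda|cublas|nvjitlink'
# example: if cuda-cudart shows 12.4.x, pin it back to the 12.1 label
conda install -y nvidia/label/cuda-12.1.1::cuda-cudart
# after changing packages, rebuild from scratch
rm -rf build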
The configure log should look like
(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp/build# cmake .. -DLLAMA_CUDA=ON -DCUDAToolkit_ROOT="/opt/conda/envs/cuda12_env/bin" -DLLAMA_CUDA_F16=ON
-- Found CUDAToolkit: /opt/conda/envs/cuda12_env/include (found version "12.1.105")
-- CUDA found
-- The CUDA compiler identification is NVIDIA 12.1.105
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/conda/envs/cuda12_env/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 60;61;70
-- CUDA host compiler is GNU 9.4.0
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/llama.cpp/build
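The architectures shown above (60;61;70) come from llama.cpp's defaults; if your GPU needs a different one, CMake's standard CMAKE_CUDA_ARCHITECTURES variable can be passed at configure time (a sketch; 86 targets an Ampere card):
cmake .. -DLLAMA_CUDA=ON \
-DCUDAToolkit_ROOT="{CONDA-ENV-LOCATION}/bin" \
-DLLAMA_CUDA_F16=ON \
-DCMAKE_CUDA_ARCHITECTURES=86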
Then run the build from the parent directory
cd ..
cmake --build build --config Release
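Optionally, parallelize the build with cmake's standard --parallel flag (a sketch):
cmake --build build --config Release --parallel $(nproc)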
The build log should look like
(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp# cmake --build build --config Release
[ 0%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
...
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot
(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp# ll build/bin/
...
-rwxr-xr-x 1 root root 33608328 May 12 03:05 main*
Step 2. Test the built binary
cd build/bin
./main -ngl 100 -m $MODEL_FPATH
...
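For a quick end-to-end check (a sketch; the model path and prompt are illustrative), pass a short prompt and confirm the startup log reports layers offloaded to the GPU:
# hypothetical GGUF model path; -ngl 100 offloads up to 100 layers to the GPU
export MODEL_FPATH=/workspace/models/llama-2-7b.Q4_K_M.gguf
./main -ngl 100 -m $MODEL_FPATH -p "Hello," -n 32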