[ML] Compiling llama.cpp with a conda-installed CUDA env
2 min read · May 11, 2024
Introduction
Compiling the llama.cpp runtime with CUDA installed in a custom conda env (I would usually use a Docker container, but was in specific circumstances where that wasn't possible)
- Situation: given a container with CUDA 11.8, but couldn't modify it or install other CUDA versions
- No Docker access; needed to run the binary directly
- → Make a conda env with CUDA 12.1 and compile there
Step 0. Making a conda env with the desired CUDA version
Make a conda env with CUDA 12.1 (the process below needs adjusting for other versions)
# create the env and activate it
conda create -n cuda12_env python=3.11 -c nvidia/label/cuda-12.1.1
conda activate cuda12_env
# install the CUDA runtime, toolkit, nvcc
conda install -y cuda cuda-runtime cuda-cudart cuda-cudart-dev cuda-toolkit cuda-nvcc libcublas libnvjpeg libnvjitlink -c nvidia/label/cuda-12.1.1
# equivalent channel::package syntax for individual packages
conda install -y nvidia/label/cuda-12.1.1::cuda
conda install -y nvidia/label/cuda-12.1.1::cuda-cudart
- To use other CUDA versions, or if you need additional packages, check https://anaconda.org/nvidia/repo to get the label for the desired CUDA version (example below)
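For example, a CUDA 12.4 env would just swap the label (a sketch; confirm the exact label, e.g. cuda-12.4.1, exists on the repo page first):
conda create -n cuda124_env python=3.11 -c nvidia/label/cuda-12.4.1
conda activate cuda124_env
conda install -y cuda cuda-runtime cuda-cudart cuda-cudart-dev cuda-toolkit cuda-nvcc libcublas libnvjpeg libnvjitlink -c nvidia/label/cuda-12.4.1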
Check that nvcc is picking up the version in the conda env location
# System version
(base) root@...:~/...# nvcc -V
...
Cuda compilation tools, release 11.8, V11.8.89
(base) root@...:~/...# which nvcc
/opt/conda/bin/nvcc
# After activation
(cuda12_env) root@...:~/...# nvcc -V
...
Cuda compilation tools, release 12.1, V12.1.105
(cuda12_env) root@...:~/...# which nvcc
{CONDA-ENV-LOCATION}/bin/nvcc
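A one-liner sanity check (a sketch; $CONDA_PREFIX is set by conda activate):
# the active env's nvcc should be the one found on PATH
[ "$(which nvcc)" = "$CONDA_PREFIX/bin/nvcc" ] && echo "env nvcc active" || echo "PATH still prefers another nvcc"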
Step 1. Compiling llama.cpp with the conda env's CUDA version
If needed, upgrade your cmake version
- cmake version 3.23.3 caused errors for me → upgraded to 3.29.3
- https://github.com/NVlabs/instant-ngp/issues/196
pip install cmake --upgrade
cmake --version
>>> cmake version 3.29.3
- If it gives "CMake Error: Could not find CMAKE_ROOT !!!", deactivate and reactivate the conda env:
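conda deactivate
conda activate cuda12_env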
Start the build process inside the cloned llama.cpp folder
mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON \
-DCUDAToolkit_ROOT="{CONDA-ENV-LOCATION}/bin" \
-DLLAMA_CUDA_F16=ON
- cmake will use the nvcc found inside "CUDAToolkit_ROOT"
- https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html
- If there are CUDA-version-related errors (e.g. "cuda_device_runtime.o' newer than toolkit (124 vs 121)"), run conda list to find any CUDA-related package whose version differs from the intended one (e.g. 12.4.x instead of 12.1.x), then install the 12.1 counterpart of that package (see the sketch below)
- If there are any changes to conda packages, remove the build folder and start over
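A minimal sketch of that recovery loop (the grep pattern and package name are just examples):
# list cuda-related packages and spot any whose version isn't 12.1.x
conda list | grep -Ei 'cuda|cublas|nvjitlink'
# example: if cuda-cudart shows 12.4.x, pin it back to the 12.1 label
conda install -y nvidia/label/cuda-12.1.1::cuda-cudart
# after changing packages, rebuild from scratch
rm -rf build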
The configure log should look like
(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp/build# cmake .. -DLLAMA_CUDA=ON -DCUDAToolkit_ROOT="/opt/conda/envs/cuda12_env/bin" -DLLAMA_CUDA_F16=ON
-- Found CUDAToolkit: /opt/conda/envs/cuda12_env/include (found version "12.1.105")
-- CUDA found
-- The CUDA compiler identification is NVIDIA 12.1.105
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/conda/envs/cuda12_env/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 60;61;70
-- CUDA host compiler is GNU 9.4.0
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/llama.cpp/build
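The architectures shown above (60;61;70) come from llama.cpp's defaults; if your GPU needs a different one, CMake's standard CMAKE_CUDA_ARCHITECTURES variable can be passed at configure time (a sketch; 86 targets an Ampere card):
cmake .. -DLLAMA_CUDA=ON \
-DCUDAToolkit_ROOT="{CONDA-ENV-LOCATION}/bin" \
-DLLAMA_CUDA_F16=ON \
-DCMAKE_CUDA_ARCHITECTURES=86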
Then run the build from the parent directory
cd ..
cmake --build build --config Release
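Optionally, parallelize the build with cmake's standard --parallel flag (a sketch):
cmake --build build --config Release --parallel $(nproc)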
The build log should look like
(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp# cmake --build build --config Release
[ 0%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
...
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot
(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp# ll build/bin/
...
-rwxr-xr-x 1 root root 33608328 May 12 03:05 main*
Step 2. Test the built binary
cd build/bin
./main -ngl 100 -m $MODEL_FPATH
...
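For a quick end-to-end check (a sketch; the model path and prompt are illustrative), pass a short prompt and confirm the startup log reports layers offloaded to the GPU:
# hypothetical GGUF model path; -ngl 100 offloads up to 100 layers to the GPU
export MODEL_FPATH=/workspace/models/llama-2-7b.Q4_K_M.gguf
./main -ngl 100 -m $MODEL_FPATH -p "Hello," -n 32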