[ML] Compiling llama.cpp with a conda-installed CUDA env

Youngrok Song
2 min read · May 11, 2024

Introduction

Compiling the llama.cpp runtime with CUDA installed in a custom conda env. I would usually use a Docker container for this, but the circumstances were specific:

  • Situation: given a container with CUDA 11.8 that can’t be modified and won’t allow other CUDA versions to be installed system-wide (quick check below)
  • No Docker access, so the binary had to run directly in the container
  • → Solution: make a conda env with CUDA 12.1 and compile against it
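
For context, a quick way to see what the container ships with (nvidia-smi reports the driver and its maximum supported CUDA version; nvcc -V reports the installed toolkit):

nvidia-smi | head -n 4  # driver version and max supported CUDA version
nvcc -V                 # installed toolkit, here 11.8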

Step 0. Making a conda env with the desired CUDA version

Make a conda env with CUDA 12.1 (the label below needs to be changed for other versions):

# Create the env, then install the CUDA runtime, toolkit, and nvcc
conda create -n cuda12_env python=3.11 -c nvidia/label/cuda-12.1.1
conda install -y cuda cuda-runtime cuda-cudart cuda-cudart-dev cuda-toolkit cuda-nvcc libcublas libnvjpeg libnvjitlink -c nvidia/label/cuda-12.1.1
# Equivalent channel::package syntax, pinned to the 12.1.1 label
conda install -y nvidia/label/cuda-12.1.1::cuda
conda install -y nvidia/label/cuda-12.1.1::cuda-cudart
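
Activate the env so that its nvcc shadows the base one:

conda activate cuda12_env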

Check that nvcc now resolves to the copy inside the conda env:

# Before activation (base env, CUDA 11.8)
(base) root@...:~/...# nvcc -V
...
Cuda compilation tools, release 11.8, V11.8.89
(base) root@...:~/...# which nvcc
/opt/conda/bin/nvcc
# After activation
(cuda12_env) root@...:~/...# nvcc -V
...
Cuda compilation tools, release 12.1, V12.1.105
(cuda12_env) root@...:~/...# which nvcc
{CONDA-ENV-LOCATION}/bin/nvcc
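
The {CONDA-ENV-LOCATION} placeholder used below is just the env root, which conda exposes as $CONDA_PREFIX while the env is active:

echo "$CONDA_PREFIX"         # e.g. /opt/conda/envs/cuda12_env
ls "$CONDA_PREFIX/bin/nvcc"  # the compiler cmake should pick up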

Step 1. Compiling llama.cpp against the conda env CUDA

If needed, upgrade CMake first:

pip install cmake --upgrade
cmake --version
>>> cmake version 3.29.3
  • If it fails with “CMake Error: Could not find CMAKE_ROOT !!!”, deactivate and reactivate the conda env (as below)
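
That is:

conda deactivate
conda activate cuda12_env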

Start the configure step inside the cloned llama.cpp folder.
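
If you haven’t cloned it yet (upstream repo URL at the time of writing):

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp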

mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON \
  -DCUDAToolkit_ROOT="{CONDA-ENV-LOCATION}/bin" \
  -DLLAMA_CUDA_F16=ON
  • cmake will use the nvcc found inside CUDAToolkit_ROOT
  • https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html
  • If there are CUDA version errors (e.g. “cuda_device_runtime.o newer than toolkit (124 vs 121)”), run conda list to find any CUDA-related package whose version differs from the intended one (e.g. 12.4.x instead of 12.1.x), then install its 12.1 counterpart (see the sketch after this list)
  • If any conda packages change, remove the build folder and start over
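
A sketch of that version-mismatch check (libnvjitlink is only an example; substitute whatever conda list flags as off-version):

# List CUDA-related packages; anything not 12.1.x is suspect
conda list | grep -iE 'cuda|cublas|nvjitlink|nvjpeg'
# Example fix: pin an off-version package back to the 12.1.1 label
conda install -y nvidia/label/cuda-12.1.1::libnvjitlink
# Any package change means a clean reconfigure
rm -rf build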

The configure log should look like this:

(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp/build# cmake .. -DLLAMA_CUDA=ON -DCUDAToolkit_ROOT="/opt/conda/envs/cuda12_env/bin" -DLLAMA_CUDA_F16=ON
-- Found CUDAToolkit: /opt/conda/envs/cuda12_env/include (found version "12.1.105")
-- CUDA found
-- The CUDA compiler identification is NVIDIA 12.1.105
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/conda/envs/cuda12_env/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 60;61;70
-- CUDA host compiler is GNU 9.4.0

-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/llama.cpp/build

Then run the build from the parent directory:

cd ..
cmake --build build --config Release
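
Optionally, parallelize with the standard -j flag of cmake --build (nproc is Linux-specific):

cmake --build build --config Release -j"$(nproc)"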

The build log should look like this:

(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp# cmake --build build --config Release
[ 0%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
...
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot
(cuda12_env) root@c96a6cbb40e9:/workspace/llama.cpp# ll build/bin/
...
-rwxr-xr-x 1 root root 33608328 May 12 03:05 main*

Step 2. Test the built binary

cd build/bin
# -ngl 100: offload up to 100 layers to the GPU; -m: path to the model file
./main -ngl 100 -m $MODEL_FPATH
...
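
For a fuller smoke test, pass a prompt and a token limit (-p and -n are standard flags of the main binary) and watch GPU memory in another shell to confirm the offload:

./main -ngl 100 -m $MODEL_FPATH -p "Hello, my name is" -n 64
# In a second shell: GPU memory usage should jump while generation runs
nvidia-smi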
