Run a command on your dedicated NVIDIA GPU on GNU/Linux (e.g. ollama)

To make sure a command (CLI) uses your dedicated NVIDIA GPU on GNU/Linux (PRIME render offload on hybrid-graphics systems), set the environment variables __NV_PRIME_RENDER_OFFLOAD=1 and __GLX_VENDOR_LIBRARY_NAME=nvidia before your command.

Example:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ollama run orca-mini

You can also verify that it works by running:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor

Which should output something like:

server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation
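
If you want to confirm that ollama itself ends up on the GPU, a simple check (not part of the offload setup above, just a generic one) is to look at nvidia-smi while a prompt is running, and, in recent Ollama versions, to ask Ollama directly:

nvidia-smi    # the ollama process should appear under "Processes" with GPU memory allocated
ollama ps     # the PROCESSOR column should report something like "100% GPU"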

Getting Started


To get started with Ollama, you’ll need to ensure that your system meets the necessary requirements. This includes a supported operating system, such as Linux, Windows, or macOS, and, for GPU acceleration, a supported graphics card such as an NVIDIA GPU. You can check the system requirements on the Ollama website.
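
Before installing, a quick way to confirm that Linux sees an NVIDIA card at all (assuming the pciutils package is installed, which it is on most distributions) is:

lspci | grep -i nvidia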

Once you’ve confirmed that your system meets the requirements, you can install Ollama with the official installation script, which can be run in a terminal:

curl -fsSL https://ollama.com/install.sh | sh
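
Once the script finishes, you can verify the installation; on Linux the installer also sets up a systemd service named ollama, so something like the following should work:

ollama --version
systemctl status ollama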

Running Ollama on Nvidia GPU


To run Ollama on an NVIDIA GPU, you’ll need to ensure that the NVIDIA drivers are installed. You can check for them with:

nvidia-smi

or, on Ubuntu:

ubuntu-drivers list --gpgpu

If you get an error instead of a driver listing, install the recommended drivers with:

sudo ubuntu-drivers install

Once the drivers are installed, you can run Ollama on your NVIDIA GPU with Docker:

docker run -d --name ollama -p 11434:11434 -v /home/ollama:/root/.ollama:z --gpus all ollama/ollama
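
Note that --gpus all only works if Docker can reach the GPU through the NVIDIA Container Toolkit. A rough sketch for Ubuntu, assuming NVIDIA’s apt repository is already configured (package names may differ on other distributions):

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

You can then sanity-check GPU access from inside a container, for example with a CUDA base image (the exact tag is just an example):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi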

Deploying a Web UI (Optional)


If you want to interact with Ollama through a web interface, you can deploy Open WebUI. You can install it with Docker by running:

docker run -d -p 8080:8080 ghcr.io/open-webui/open-webui:main

Once Open WebUI is running, open a web browser at http://localhost:8080, create an account, and select a model to interact with the LLM.
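
If Ollama is running directly on the host (outside Docker), Open WebUI needs to be told where to reach it. One possible invocation instead of the simpler command above, where the OLLAMA_BASE_URL value and volume name are assumptions about your setup:

docker run -d -p 8080:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main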

Optimizing Performance


To optimize the performance of Ollama, you can use several techniques. The most important one is to make sure Ollama is actually running GPU-accelerated, which can improve performance significantly. You can also tune model parameters such as the batch size and the context (sequence) length so the model fits comfortably in your GPU’s memory, as sketched below.
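
As a rough sketch, these knobs are usually set as model parameters, for example in a Modelfile (the model name, the derived tag, and the values below are placeholders, and num_batch support can depend on your Ollama version):

# Modelfile
FROM orca-mini
PARAMETER num_ctx 4096
PARAMETER num_batch 512

Then build and run the tuned variant:

ollama create orca-mini-tuned -f Modelfile
ollama run orca-mini-tuned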

Additionally, you can use nvidia-smi to monitor GPU utilization and memory while Ollama is serving requests, and docker stats to monitor the resource usage of your Docker containers, adjusting your settings accordingly.
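
For example (assuming the container is named ollama, as above):

watch -n 1 nvidia-smi
docker stats ollama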

By following these tips, you can optimize the performance of Ollama and get the most out of your system.