You have bought a new AI workstation and are excited to use it. You dream about training models and winning Kaggle competitions with it. But before diving into training your AI models, you need to set up your AI development environment. How? Which software and frameworks should you install?
This is a comprehensive AI workstation guide for setting up a machine learning, deep learning, and computer vision development environment. I will update this post regularly to keep it current. Note that this tutorial is written for workstations with NVIDIA GPUs. So, if you are using an AMD GPU, I'm afraid this guide won't work for you: the NVIDIA Container Toolkit requires an NVIDIA GPU.
Required Software Stack for AI Workstation
- Ubuntu 20.04 LTS Desktop
- NVIDIA GPU Driver
- NVIDIA Container Toolkit
- TensorFlow and PyTorch Images
Benefits of Working With Containerized Environment
Before starting the setup, I want to clarify one point: why are we using Docker and containers? Because we generally use official Docker images, which give us standardized, stable, and optimized environments and dependencies. You can move your code from your workstation to the cloud or another computer and run it with these images effortlessly. You can also switch your code between different versions of a framework easily.
You don’t have to use containers, but I recommend them. Once you get used to working with containers, you will never want to go back.
AI Workstation Setup Steps
Let’s start setting up our AI workstation!
Install Ubuntu 20.04
We need a clean, fresh Ubuntu 20.04 installation. Why not 22.04? Because, at the time of writing, it is not yet stable for AI development. I recommend removing the old OS completely and installing Ubuntu 20.04 LTS from scratch. I also recommend selecting the minimal installation and skipping updates during the install.
Download the Ubuntu 20.04 LTS ISO image from here. Then flash the ISO to a USB drive using balenaEtcher, plug the USB into your workstation, and boot from it. You can follow the official Ubuntu installation tutorial from here.
If the installation succeeded, power on your workstation, log in, and open a terminal; we will continue the setup from there. Also, make sure you have an internet connection.
Install Required Software
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential cmake make gcc g++ curl wget htop iotop lm-sensors stress git
Install NVIDIA GPU Driver
We will download the NVIDIA GPU driver from here. Select your GPU model, Linux 64-bit, and the Production Branch option, then click the search button. On the next page, download the ‘.run’ file containing the driver; it is about 370 MB.
Now navigate to the directory where you downloaded the driver file, change its permissions, and run it.
cd ~/Downloads
sudo chmod +x NVIDIA-Linux-x86_64-515.76.run
sudo ./NVIDIA-Linux-x86_64-515.76.run
After running the third command, the installer window will appear. Keep pressing Enter to continue until the installation fails; this first attempt typically fails because the open-source nouveau driver is still loaded. When it fails, select the yes option (letting the installer disable nouveau) and reboot your workstation. After rebooting, run the last command again; this time the installation should complete successfully. Then reboot your workstation once more.
After the reboot, open the terminal again and run nvidia-smi.
If you see GPU usage and statistics after running this command, you successfully installed the NVIDIA GPU driver.
Install Docker & NVIDIA Container Toolkit
curl https://get.docker.com | sh && sudo systemctl --now enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo groupadd docker
sudo usermod -aG docker $USER
After running these commands, restart your workstation again. Then open a terminal and run the command below.
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
If you see the same GPU statistics screen as after the driver installation, Docker and the NVIDIA Container Toolkit were installed successfully.
Install PyTorch & TensorFlow Docker Image
We will pull the TensorFlow and PyTorch images from the NGC Catalog (NVIDIA Container Registry). You can select the version you need from the Tags tab; I will continue with the latest. Each image is roughly 6 GB, so downloading can take some time.
docker pull nvcr.io/nvidia/pytorch:22.09-py3
docker pull nvcr.io/nvidia/tensorflow:22.09-tf1-py3
After the completion, you can continue to the next section.
How to Use Docker Images with Jupyter Notebooks
If you have completed all steps up to now, your workstation is ready to work! We will now create a new container from the images we downloaded.
mkdir ~/Code
docker run --name tf1-py3-2209 --gpus all -ti --ipc=host --net=host -v ~/Code/:/Code/ nvcr.io/nvidia/tensorflow:22.09-tf1-py3
docker exec -it tf1-py3-2209 sh
After running the last command, you will be connected to our TensorFlow container, and every command executed in this terminal will now run inside the container. Let’s start JupyterLab; it is already installed in the container.
jupyter lab --ip 0.0.0.0 --allow-root
Jupyter will start and print an access token. Open your browser and go to http://172.17.0.1:8888/lab. Then enter the token that was printed earlier.
That’s it! You can work with your favorite machine learning framework easily.
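Before starting real work, it is worth a quick check that the framework inside the container actually sees the GPU. The sketch below uses standard API calls and is meant to be run inside the TF1 container started above (a notebook cell works too); the PyTorch equivalent is shown as a comment since the two frameworks live in separate images.

```shell
# Run inside the TensorFlow 1.x container started above.
# tf.test.is_gpu_available() is the standard TF1-era GPU check.
tf_gpu=$(python -c "import tensorflow as tf; print(tf.test.is_gpu_available())")
echo "TensorFlow sees a GPU: ${tf_gpu}"

# If you started the PyTorch image instead, the equivalent check is:
#   python -c "import torch; print(torch.cuda.is_available())"
```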
Working with a stable environment is important in AI, and containers provide exactly that. We installed a clean Ubuntu OS, Docker, the NVIDIA Container Toolkit, and framework images, and then started a Jupyter environment inside a container.