In the first part of the story, we used a free Google Colab instance to run a Mistral-7B model and extract information using a FAISS (Facebook AI Similarity Search) database. In this part, we will go further: I will show how to run a LLaMA-2 13B model, and we will also test some extra LangChain functionality, like making chat-based applications and using agents. As in the first part, all components used are based on open-source projects and are completely free to use.
Let’s get into it!
LLaMA.cpp
LLaMA.cpp is a very interesting open-source project, originally designed to run a LLaMA model on MacBooks, but its functionality has grown far beyond that. First, it is written in plain C/C++ without external dependencies and can run on almost any hardware (CUDA, OpenCL, and Apple silicon are supported; it can even work on a Raspberry Pi). Second, LLaMA.cpp can be connected with LangChain, which allows us to test a lot of LangChain's functionality for free, without having an OpenAI key. Last but not least, because LLaMA.cpp works everywhere, it is a good candidate for running in a free Google Colab instance. As a reminder, Google provides free access to Python notebooks with 12 GB of RAM and 16 GB of VRAM, which can be opened using the Colab Research page. The code is opened in the web browser and runs in the cloud, so anybody can access it, even from a minimalistic budget PC.
Before using LLaMA, let’s install the library. The installation itself is easy; we only need to enable LLAMA_CUBLAS before using pip:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
!pip3 install sentence-transformers langchain langchain-experimental
!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir /content --local-dir-use-symlinks False
For the first test, I will be using a 7B model. Here, I also installed the huggingface-hub library, which allows us to automatically download the “Llama-2-7b-Chat” model in the GGUF format needed for LLaMA.cpp. I also installed LangChain…
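To show how these pieces fit together, here is a minimal sketch of loading the downloaded GGUF file through LangChain’s LlamaCpp wrapper. The exact import path depends on the LangChain version, and values like n_gpu_layers and n_ctx are my own choices for a 16 GB Colab GPU rather than part of any required setup:

from langchain.llms import LlamaCpp  # in newer versions: from langchain_community.llms import LlamaCpp

# Load the GGUF file downloaded above; n_gpu_layers=-1 offloads all layers
# to the GPU (this helps because llama-cpp-python was built with LLAMA_CUBLAS).
llm = LlamaCpp(
    model_path="/content/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload every layer to the Colab GPU
    n_ctx=2048,       # context window size (my choice, not a model requirement)
    temperature=0.1,
    verbose=False,
)

# Quick sanity check: a plain completion call
print(llm("Q: Name the planets in the Solar System. A:"))

Once the model is wrapped as a LangChain LLM object like this, it can later be dropped into chains, chat-based applications, and agents, which is exactly what we will do in the rest of the article.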