Run a Large Language Model Locally in the Terminal

We can run a Large Language Model (LLM) on a local machine, although the results are not quite as good as ChatGPT's.
One of the easiest to run is Alpaca, a fine-tuned version of LLaMA.
The following works on an Apple M1 Mac.
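Building requires make and a C/C++ compiler. On macOS these ship with the Xcode Command Line Tools, which (if not already present) can be installed with:

$ xcode-select --install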

Clone and build the repo:

$ git clone https://github.com/antimatter15/alpaca.cpp

$ cd alpaca.cpp/

$ make chat

Download the pre-trained model weights:

$ wget -O ggml-alpaca-7b-q4.bin -c https://gateway.estuary.tech/gw/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC

(See the source repo in the references below for alternative download links if this fails.)
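Note that wget is not installed on macOS by default. If it is unavailable, the same download can be done with curl (the -C - flag resumes a partial download, mirroring wget's -c):

$ curl -L -C - -o ggml-alpaca-7b-q4.bin https://gateway.estuary.tech/gw/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC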

Run the model:

$ ./chat

Output:

main: seed = 1679968451
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

== Running in chat mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.

> What is the age of the universe?
The current estimate for when our Universe was created, 
according to modern cosmology and astronomy, 
is 13.798 billion years ago (±0.2%).
>
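
The chat binary also accepts a few command-line options inherited from the llama.cpp code it is built on; the exact set may differ between versions, so run ./chat --help (or check the source repo) first. Assuming the usual -m (model path), -t (thread count), and --temp (sampling temperature) flags are available, an invocation might look like:

$ ./chat -m ggml-alpaca-7b-q4.bin -t 8 --temp 0.7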

References

https://github.com/antimatter15/alpaca.cpp
