UPDATE 9/11/2023

In more recent versions, the "--chat" parameter was removed, and having it set might prevent OB from starting. I updated the guide to remove that parameter from section "4. Configure". If you have problems with OB starting, you can edit the webui.py file in the OobaBooga folder, remove the "--chat" parameter, and keep only something like CMD_FLAGS = '--api'.


UPDATE 8/25/2023
In more recent updates of OobaBooga, for Llama or Llama2 AI models like some that I recommended in this post (they have llama2 in their name), it's recommended to use the ExLlama model loader for speed when loading a model in the Model tab. For vam purposes, if you get bad performance, turning down the max_seq_len value should help a bit, at the cost of making the AI more forgetful.

ABOUT

My old tutorial is already obsolete, so I'm posting a quick new one for people who want to update their old version or get started with local text AI. This tutorial doesn't require any previous setup.

There are a few alternatives for running local AI and many AI models to choose from. This solution is the best I found in terms of both quality and speed, so I'll focus just on it to keep things simple and efficient. I'll list some other alternatives at the bottom too.

INSTALL

1. Download files

Download files from https://github.com/oobabooga/one-click-installers (Code > Download zip). Extract them to some new folder where you want oobabooga to be installed.

2. Run setup

Run start_windows.bat. It will take a few minutes. When asked about your GPU, pick Nvidia (or one of the other variants if you don't have an Nvidia card; performance won't be as good, though, and some of the later settings might not work).

3. Pick a model

When the setup is done it might ask you to pick an initial model to download. You can paste this: PygmalionAI/pygmalion-350m

This will install this tiny AI model https://huggingface.co/PygmalionAI/pygmalion-350m which is good for testing things. You can add as many models as you want later from the Models tab in the UI.

You can copy-paste a model ID from Hugging Face, in the form user/model. You can get it from the copy button next to the model's name.

The models will have lots of nerdy tags, here's a quick glossary:

4bit or 3bit - These mean that the model has been quantized to be smaller. You definitely want models that have this: it will increase performance a lot and the models will be smaller in file size.

350M, 6B, 7B, 70B etc - the number of parameters the model has, in millions or billions. The more it has, the smarter it is, but it will respond more slowly. For 6-8GB GPU cards I think 7B is the recommended size.

GPTQ vs GGML - GPTQ versions are better for Nvidia cards. GGML versions are for users who want to run the AI on the CPU instead of the graphics card, which is usually needed for AMD cards.
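To see why 4-bit matters for 6-8GB cards, here's a quick back-of-the-envelope size estimate. The 1.2 overhead factor is my own rough assumption (activations, context cache, etc.), not an official figure:

```python
# Rough file-size / VRAM estimate for quantized AI models.
# overhead=1.2 is a ballpark assumption for runtime extras, not an exact number.
def approx_size_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    bytes_per_param = bits / 8
    return params_billions * bytes_per_param * overhead

# A 7B model at 4-bit fits comfortably in a 6-8GB card...
print(round(approx_size_gb(7, 4), 1))
# ...while the same model unquantized at fp16 would not.
print(round(approx_size_gb(7, 16), 1))
```

So a 7B 4-bit model lands around 4 GB, while fp16 would need roughly 4x that, which is why the 4bit/3bit tag is the first thing to look for.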

Some models I recommend for Nvidia cards:
nsfw - Monero/Pygmalion-Metharme-7b-4bit-TopScore
nsfw bigger - notstoic/pygmalion-13b-4bit-128g
smart - TheBloke/Llama-2-7b-Chat-GPTQ
smart bigger - TheBloke/Llama-2-13B-chat-GPTQ

For AMD:
nsfw - TehVenom/Pygmalion-7b-4bit-Q4_1-GGML
smart - TheBloke/Llama-2-7B-Chat-GGML

4. Configure

Open the file webui.py in the folder where you installed OobaBooga in Notepad or any text editor. Look for the line with "CMD_FLAGS" in it and add --api to it to enable the OobaBooga API (the interface that allows other programs to send commands to OobaBooga), for example CMD_FLAGS = '--api'. If you have multiple parameters, separate them with spaces, like CMD_FLAGS = '--param1 --param2 --api'.

  • --api - enables the OobaBooga API so other local software can connect to it; necessary for Alive to get AI messages

After you make this change you might want to restart OB to make sure everything works fine.
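If you want to sanity-check the edit without restarting OB, a tiny helper like this can parse a CMD_FLAGS line and confirm --api is present and the obsolete --chat is gone. This is my own standalone sketch for this guide, not part of OobaBooga:

```python
import re

# Parse a CMD_FLAGS line from webui.py and report which flags are set.
# Standalone helper written for this guide; not part of OobaBooga itself.
def check_cmd_flags(line: str) -> dict:
    match = re.search(r"CMD_FLAGS\s*=\s*'([^']*)'", line)
    flags = match.group(1).split() if match else []
    return {"has_api": "--api" in flags, "has_chat": "--chat" in flags}

print(check_cmd_flags("CMD_FLAGS = '--api'"))
print(check_cmd_flags("CMD_FLAGS = '--chat --api'"))
```

You could feed it each line of webui.py; you want has_api True and has_chat False on the CMD_FLAGS line.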

5. Run

Now when you run start_windows.bat, if everything went OK you'll see the server start up in a console window.

You can then go to http://127.0.0.1:7860/ in any browser and you should see the OobaBooga interface.
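Once it's running, you can also test the API side from Python. The snippet below targets the legacy /api/v1/generate endpoint on port 5000, which is what OB exposed with --api around the time of writing; the endpoint path, port, and response shape may differ in newer versions, so treat this as a sketch:

```python
import json
import urllib.request

# Endpoint path and port are assumptions based on OobaBooga builds from
# around the time of this guide; newer versions may use a different API.
API_URL = "http://127.0.0.1:5000/api/v1/generate"

def build_request(prompt: str, max_new_tokens: int = 80) -> urllib.request.Request:
    payload = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Hello, how are you?")
print(req.full_url)

# To actually send it, OB must be running with --api:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["results"][0]["text"])
```

If the request connects and returns text, the API is up and tools like Alive should be able to talk to it.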

6. If you really like it

You can support oobabooga here https://ko-fi.com/oobabooga and help reward him for updating it and making it better. A few weeks ago he actually took it down for a while and almost quit due to haters, which would have been a big shame.

ALTERNATIVES TO OOBABOOGA

For AMD cards, and allegedly better performance in general, there's also this: https://github.com/0cc4m/koboldcpp. People seem to use it a lot these days, but in my quick tests it was slower. Maybe this approach is better for people running big/huge AI models; I only played around with small ones (7B).

Koboldcpp is simpler to install: you can download just the executable from https://github.com/LostRuins/koboldcpp/releases/latest, and when you run it, it will ask for a .bin file, which is the AI model. It runs on the CPU rather than the GPU, so the models are a bit different; you can find them marked as ggml: https://huggingface.co/models?sort=likes&search=ggml

There's also KoboldAI https://github.com/KoboldAI/KoboldAI-Client but at the time of writing it doesn't support the 4-bit models, which are a must imo for casual local use.

In Alive, for now, only OobaBooga is supported. The other alternatives won't work with VAM.

Comments

Anonymous

Chat functionality is quite impressive. I was surprised that the ai was aware of its atom name in VAM. In the future, it would be great to have the ability to set a custom context string describing the ai character

Tim

The AI acted as an assistant, which is not good. Can you make it act like a real girlfriend? That would be nice.

SPQRAeternum

You mean in the oobabooga web ui, not in Alive, right? If yes then for the OB web ui just make sure you're in chat mode (interface mode > mode > chat), then you'll see a Chat tab where you can customize the AI or load preset character cards. You can google for "reddit tavern ai cards" and find lots of them and you can put them in \text-generation-webui\characters. There's also an example character that you can load (Chat settings > Character > Example) that shows how you can set the character info

Anonymous

Anyway to use OpenAI keys instead of a local model like sillytavern does?

SPQRAeternum

That's not an option in oobabooga to my knowledge. It was built to interface & handle local AI models

ToTheWindow

I really need some help understanding using Oobabooga in Alive. I have Oobabooga up, functional and configured for the API. I have Alive configured for the local API endpoint. Now what? Do I understand right that you need an Android to chat? And you have to "buy" it? I don't understand what's going on there at all and I can find no information on it. Believe me, I've looked. So, what do I need to do to have a Oobabooga chat within Alive? Or are we not there yet?

SPQRAeternum

In V50, in the services app, Services > Text AI, you have to enable Oobabooga. You'll see Status: Connected if it detects that OB is running in the background. Yes, androids > android > chat is for the chat feature in the latest versions. There's no information because it's work in progress and things change; I think I changed chat like 3 times already. Next time I work on that side of things, it will change considerably again. Lots of things to do.

The Androids thing is a very quick structure I put together as the foundation of a character system. I don't want chat to be disconnected, talking to a vam person atom in a scene like an episodic thing in a void; I already did that in older versions and it felt like a dead end. The goal is to add continuity to characters, and that's where androids come into play. Yeah, there's a basic buy mechanic to create them. I'd rather push it towards gameplay than software where you load files, etc.

There's more stuff you might want to look into: Services > Speech allows TTS & STT, but you need to run my TextAudioTool in the background. Services > Local Assistant adds an extra layer of chatbot logic before sending it to the AI (think older chatbots). There are some commands I added; you should be able, for example, to trigger any vam buttons like "trigger SomeButtonName" or "trigger some button name". So basically you can link the AI to do anything that vam is capable of. It's also possible to do your own commands and more complex responses. If you write "echo some text" the AI will say "some text", for example. A good way to test TTS.

It's not exactly oobabooga chat, it's more like direct interaction with the AI model loaded in oobabooga. There's different formatting than the OB webui, stuff I added to try to make tiny models (like pygmalion-350M) work better and not get stuck. There's more stuff but you'll have to go through the posts. I don't do guides because I change things very often.

Sentiment analysis on AI responses (positive/negative), for example, translates into character expressions in vam. There's procedural nodding sometimes on negative/positive answers. But it's WIP; I didn't have a lot of time to work on this part, and lots of people here get mad and very pissy when I work on it, they want me to make vam triggers and sliders till I die, they're not AI fans. But more to come

Saint66

Very good guide for beginners, good writing! Anyway, I think --chat is obsolete in the current version, for me it's default without the argument