519 points · submitted 10 months ago by boem@lemmy.world to c/technology@lemmy.world
[-] NotMyOldRedditName@lemmy.world 5 points 10 months ago* (last edited 10 months ago)
[-] dep@lemmy.world 1 points 10 months ago

Is there a post somewhere on getting started using things like these?

[-] NotMyOldRedditName@lemmy.world 1 points 10 months ago* (last edited 10 months ago)

I don't know of a specific guide, but try these steps:

  1. Go to https://github.com/oobabooga/text-generation-webui

  2. Follow the one-click installation instructions partway down the page and complete steps 1-3

  3. When step 3 is done, if there were no errors, the web UI should be running, and the URL will show in the command window it opened. In my case it shows "http://127.0.0.1:7860". Enter that into a web browser of your choice

  4. Now you need to download a model, as you don't actually have anything to run yet. For simplicity's sake, I'd start with a small 7B model so you can download it quickly and try it out. Since I don't know your setup, I'll recommend the GGUF file format, which works with llama.cpp and lets you split the model between your CPU and GPU.

You can try either of these models to start:

https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/blob/main/mistral-7b-v0.1.Q4_0.gguf (takes 22 GB of system RAM to load)

https://huggingface.co/TheBloke/vicuna-7B-v1.5-GGUF/blob/main/vicuna-7b-v1.5.Q4_K_M.gguf (takes 19 GB of system RAM to load)

If you only have 16 GB, go back to either of those pages, browse the file list under /main, and grab a Q3 file instead of a Q4 (a smaller quantization), but that's going to degrade the quality of the responses.
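
If you'd rather script the download than click through the browser, here's a minimal sketch using the `huggingface_hub` package (the `local_dir` path is an assumption; point it at your actual install):

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Fetch the Q4_0 Mistral GGUF linked above and drop it straight into
# the web UI's "models" folder (adjust local_dir to your install path).
path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-v0.1-GGUF",
    filename="mistral-7b-v0.1.Q4_0.gguf",
    local_dir="text-generation-webui/models",
)
print(f"Saved to {path}")
```

Either way, the file ends up in the same place the next step expects.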

  5. Once the download is finished, go to the folder where you installed the web UI; there will be a folder called "models". Place the model you downloaded into that folder.

  6. In the web UI you've opened in your browser, click the "Model" tab at the top. The top row of that page will indicate that no model is loaded. Click the refresh icon beside it so the list picks up the model you just downloaded, then select it in the drop-down menu.

  7. Click the "Load" button

  8. If everything worked and no errors were thrown (you'd see them in the command prompt window and possibly on the right side of the Model tab), you're ready to go. Click on the "Chat" tab.

  9. Type something into the "send a message" box to begin a conversation with your local AI!
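
Side note: once chatting works, you can also talk to the model from scripts. The web UI can expose an HTTP API if you launch it with the `--api` flag; the exact route has changed between versions, so treat this as a rough sketch (it assumes the legacy `/api/v1/generate` endpoint on port 5000; check the project README for your version):

```python
# Rough sketch of hitting the web UI's API (started with the --api flag).
# The route and payload vary by version; this assumes the legacy
# /api/v1/generate endpoint on port 5000.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/api/v1/generate",
    json={"prompt": "Hello, local AI!", "max_new_tokens": 100},
)
print(resp.json()["results"][0]["text"])
```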

Now, that might not be using your hardware efficiently. Back on the Model tab there's "n-gpu-layers", which controls how many layers of the model are offloaded to the GPU. You can move the slider up, watch how much memory the command/terminal window says is in use, and try to get it as close to your video card's RAM as possible.

Then there's "threads", which you can slide up to the number of physical (non-virtual) cores your CPU has.

Once you've adjusted those, click the Load button again, check that no errors are thrown, and go back to the Chat tab. I'd only fuss with those settings after you have it working, so you know any new errors came from the tweaks.
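
For a sense of what those two sliders map to: the web UI is passing them through to llama.cpp. Here's a minimal sketch of the same knobs using the `llama-cpp-python` bindings (the layer and thread counts below are placeholders; tune them to your hardware):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-v0.1.Q4_0.gguf",
    n_gpu_layers=32,  # same knob as the "n-gpu-layers" slider; 0 = CPU only
    n_threads=8,      # same knob as the "threads" slider; physical cores
)

out = llm("Q: What does a Q4 quantization trade away?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```

Raising `n_gpu_layers` until you approach your VRAM limit is the same tuning loop the sliders give you.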

Also, if something goes wrong after it's working, the error should show up in the command prompt window, so if it suddenly hangs or the like, check there. The window also prints interesting info like tokens per second, so I always keep an eye on it.

Oh, and TheBloke is a user who converts many models into various formats for the community. He has a wide variety of GGUF models available on Hugging Face, and when formats change over time, he's really good at updating them accordingly.

Good luck!

[-] dep@lemmy.world 1 points 10 months ago

Wow I didn't expect such a helpful and thorough response! Thank you kind stranger!

[-] NotMyOldRedditName@lemmy.world 1 points 10 months ago

You're welcome! Hope you make it through error-free!
