ChatGPT subscribers may get a ‘GPT builder’ option soon (www.theverge.com)

submitted 10 months ago by fer0n@lemm.ee to c/technology@beehaw.org

[-] lol3droflxp@kbin.social 3 points 10 months ago

Doesn’t that mean RAM?

[-] conciselyverbose@kbin.social 9 points 10 months ago

If it's actually High Bandwidth Memory, it's the VRAM they use for some video cards/SoCs.

It might be mostly the same components, but the high bandwidth part is important and harder to do. They get the much higher throughput by physically stacking the memory chips on top of each other, right on the same package as the processor. The much shorter distance signals have to travel (combined with a lot of pins to send signals through) lets it do more than you can with traditional RAM.

[-] GiveMemes@jlai.lu 3 points 10 months ago

There's a company making analog chips that do the matrix calculations at a 15x or 60x (I forget which) more efficient rate than modern chips (by multiplying voltages, I believe). Even though one is only about 1/3 the processing power of a modern GPU, stack enough together and you're cooking. The matrix multiplication aspect is what we're using the VRAM for, right?

[-] conciselyverbose@kbin.social 3 points 10 months ago

The actual models telling them what to multiply are what sits in VRAM, to my knowledge.

VRAM isn't the low level "working" memory. You still have to pull structures from memory and into actual use. If you're working on pen and paper, a bookshelf might be system storage and your desk might be RAM/VRAM, but you still need to copy the numbers from your desk onto the piece of paper you're working on. That's lower level cache, registers, the tensor cores, etc.

If the chip you're discussing is a better calculator, that's useful, but you still need the big desk to hold the huge amount of information you need to reference at any given time.

My brain is mush for some reason today, so that might not make sense, but better matrix operations shouldn't remove the need to have access to a huge model.

[-] GiveMemes@jlai.lu 1 points 10 months ago

Thanks for the informative reply! Looks like I need to brush up on my hardware knowledge lol

[-] lol3droflxp@kbin.social 1 points 10 months ago

I get that this is expensive. However, it should also work with RAM if you accept slower speeds I guess. The question is of course if it’s still usable then.

[-] averyminya@beehaw.org 4 points 10 months ago

Most current locally hosted software has some option to offload to RAM, CPU, and disk. VRAM is fastest, but RAM and CPU offloading lets you cut down to less than 4GB VRAM for certain applications, at plenty reasonable speed.
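As a rough illustration of how that kind of offloading splits a model, here is a minimal Python sketch. The function name `plan_offload`, the per-layer size, and the overhead reserve are all hypothetical and not taken from any particular tool; real software also has to budget for things like the context cache and activations.

```python
def plan_offload(n_layers, layer_gb, vram_gb, overhead_gb=0.5):
    """Return (gpu_layers, cpu_layers) for a given VRAM budget.

    All sizes are in GB. overhead_gb is an assumed reserve for
    everything that is not layer weights (hypothetical figure).
    """
    usable = max(vram_gb - overhead_gb, 0.0)
    # Whole layers only: a layer lives either in VRAM or in system RAM.
    gpu_layers = min(n_layers, int(usable // layer_gb))
    return gpu_layers, n_layers - gpu_layers


# Made-up example: 32 layers of 0.25 GB each on a 4 GB card.
print(plan_offload(32, 0.25, 4.0))
```

With these made-up numbers, 14 layers stay on the GPU and the other 18 spill over to system RAM, which is the trade of speed for a smaller VRAM requirement described above.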

[-] abhibeckert@beehaw.org 1 points 10 months ago* (last edited 10 months ago)

GPT-4 is already kinda slow - it works best as a "conversational" tool where you ask follow up questions and clarify things that have already been said. That's painful when you have to wait 10 seconds for a response. I couldn't imagine it being useful if it was minutes.

[-] interolivary@beehaw.org 1 points 10 months ago

Having to wait 10 seconds for a response is "painful"?

[-] abhibeckert@beehaw.org 2 points 10 months ago* (last edited 10 months ago)

To put some numbers on it - RAM runs at tens of gigabytes per second (bytes, not bits). High Bandwidth Memory runs at several hundred gigabytes or sometimes terabytes per second (OpenAI is likely using the latter, and that memory isn't just expensive, it's also supply constrained, so the prices are astronomically high right now).
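A back-of-the-envelope sketch of why that gap matters, assuming generation is limited purely by memory bandwidth and that every generated token has to read all of the model's weights once. The 100GB model size and both bandwidth figures are illustrative placeholders, not measurements of any real system.

```python
def max_tokens_per_second(model_gb, bandwidth_gb_s):
    """Upper bound on tokens/s if each token reads every weight once.

    Assumes a purely memory-bound workload; compute time, caching,
    and batching are ignored in this simplification.
    """
    return bandwidth_gb_s / model_gb


MODEL_GB = 100  # hypothetical model size in GB

for name, bandwidth in [("system RAM", 50), ("HBM", 2000)]:
    rate = max_tokens_per_second(MODEL_GB, bandwidth)
    print(f"{name}: at most {rate:.1f} tokens/s")
```

Under those assumptions the same model tops out at half a token per second from ordinary RAM versus about 20 tokens per second from HBM.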

You can buy HBM, and you can use it as your main system RAM, but it's painfully expensive. The actual amount of bandwidth also scales linearly with the amount of memory you buy. So 500GB is 10x faster than 50GB - because it writes to all of the chips simultaneously (and then reads from all of them when you access the data back).
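The linear scaling can be sketched like this, assuming data is striped evenly across stacks and each stack has its own independent channel to the processor. The per-stack capacity and bandwidth defaults are hypothetical round numbers, and `hbm_totals` is just an illustrative helper.

```python
def hbm_totals(n_stacks, stack_gb=16, stack_gb_s=400):
    """Return (capacity_gb, bandwidth_gb_s) for n_stacks read in parallel.

    stack_gb and stack_gb_s are assumed per-stack figures, chosen
    only to make the arithmetic easy to follow.
    """
    return n_stacks * stack_gb, n_stacks * stack_gb_s


for stacks in (1, 10):
    capacity, bandwidth = hbm_totals(stacks)
    print(f"{stacks} stack(s): {capacity} GB at {bandwidth} GB/s")
```

Ten times the stacks gives ten times the capacity and ten times the aggregate bandwidth, but only as long as accesses really are spread across all of them.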

It's pretty standard on high end GPUs these days. Apple gets similar bandwidth on its computers (a high end Mac with 64GB of RAM can run at up to 800GB/s - which isn't quite as fast as a high end GPU but it's close), though strictly speaking that is wide on-package LPDDR memory rather than HBM. It's part of why Macs are so expensive (and also why the cheaper ones have very little RAM).

this post was submitted on 06 Nov 2023

56 points (100.0% liked)
