this post was submitted on 18 Jul 2024
Technology
But instead of relying on the GPU to power it, the dedicated AI chip did the work. Like, it had its own distinct chip on the graphics card that would handle the upscaling.
I forget who demoed it, and any search for "AI" and "upscaling" gets buried under results about what they're already doing.
That's already Nvidia's approach: DLSS upscaling runs on the tensor cores.
And no, it's not something magical; it's just matrix math. AI workloads are lots of convolutions on gigantic, low-precision floating-point matrices. Low precision works because neural networks are robust against random perturbation, and extra rounding error is exactly that: random perturbation. There's no point in spending electricity and heat on high precision if it doesn't make the output any better.
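To make the "it's just matrix math" point concrete, here is a minimal pure-Python sketch of the well-known im2col trick, which rewrites a 2D convolution as a single matrix multiply. It is only an illustration, not how any particular GPU library implements convolutions; the function names and the tiny image/kernel are made up.

```python
# Sketch: 2D convolution (cross-correlation, as most ML frameworks define it)
# rewritten as one matrix multiply via "im2col". Pure Python, no padding, stride 1.

def conv2d_direct(img, kernel):
    """Reference: slide the kernel over the image and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def im2col(img, kh, kw):
    """One row per output position, holding that position's flattened input patch."""
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[img[i + a][j + b] for a in range(kh) for b in range(kw)]
            for i in range(oh) for j in range(ow)]

def matmul(a, b):
    """Plain (n x k) @ (k x m) matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def conv2d_as_matmul(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ow = len(img[0]) - kw + 1
    patches = im2col(img, kh, kw)                   # (oh*ow) x (kh*kw)
    weights = [[w] for row in kernel for w in row]  # (kh*kw) x 1; more kernels = more columns
    flat = [r[0] for r in matmul(patches, weights)]
    return [flat[k:k + ow] for k in range(0, len(flat), ow)]  # reshape to oh x ow

img = [[1, 2, 3, 0],
       [4, 5, 6, 1],
       [7, 8, 9, 2]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_as_matmul(img, kernel))  # → [[-4, -4, 2], [-4, -4, 4]]
```

With many kernels (output channels), the weight matrix just gains columns, so the whole layer becomes one big matrix multiply, which is exactly the shape of work tensor cores are built for.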
The kicker? Those tensor cores are less complicated than ordinary GPU cores. For general-purpose hardware, and that includes consumer-grade GPUs, it's far more sensible to make sure the ALUs can handle 8-bit floats and leave everything else the same. That's going to be standard by the next generation of even potato hardware: every SoC with an integrated GPU has enough oomph to sensibly run reasonable inference loads. And by "reasonable" I mean actually quite big; as far as I'm aware, Firefox's built-in translation, for example, runs on the CPU, since the models are small enough.
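As a rough illustration of the rounding-robustness argument, the sketch below rounds values to FP8 E4M3 (4 exponent bits, 3 mantissa bits, largest finite value 448), one of the common 8-bit float formats, and compares a dot product against full precision. Saturating to ±448 on overflow and ignoring NaN are simplifications chosen for this sketch, and a six-element dot product with arbitrary values says nothing about any real model's accuracy.

```python
import math

E4M3_MAX = 448.0    # largest finite E4M3 magnitude
E4M3_MIN_EXP = -6   # smallest normal exponent (exponent bias is 7)
MANTISSA_BITS = 3

def to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, ties to even, saturating at +/-448.
    Simplified sketch: NaN inputs are not handled."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))            # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)       # unbiased exponent; clamping yields subnormals
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values at this exponent
    y = round(abs(x) / step) * step      # Python's round() breaks ties to even
    return math.copysign(min(y, E4M3_MAX), x)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary example values, not taken from any real model.
weights = [0.8, -1.3, 2.7, 0.05, -0.6, 1.9]
activations = [1.1, 0.4, -0.9, 3.2, 2.5, 0.7]

exact = dot(weights, activations)
rounded = dot([to_e4m3(w) for w in weights], [to_e4m3(a) for a in activations])
rel_err = abs(rounded - exact) / abs(exact)
print(f"exact={exact:.4f}  fp8 inputs={rounded:.4f}  relative error={rel_err:.1%}")
```

On these inputs the rounded version lands within roughly 1.4% of the exact result. The sketch only rounds the inputs; products and the sum stay in double precision, much as FP8 hardware typically accumulates in higher precision. The per-tensor scaling factors that real FP8 pipelines usually add are skipped.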
Nvidia, OTOH, is very much in the AI-accelerator market and figured it could corner the upscaling market, and sell another generation of cards, by making its software rely on those cores even though it could run on the other cores. As AMD demonstrated, their FSR upscaler also runs on Nvidia hardware.
The actual special sauce in that area is the RT cores, that is, accelerators for ray casting through bounding volume hierarchies (BVHs). That's genuinely specialised hardware, but it's nowhere near fast enough to trace enough rays per pixel for even remotely tolerable output on its own, which is where all that upscaling/denoising comes into play.
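For a feel of what that fixed-function hardware does: hardware ray tracing APIs such as DXR and Vulkan organize the scene as a bounding volume hierarchy, and the inner loop is "does this ray hit this box?" followed by descending into the children. Below is a minimal software sketch of that loop using the standard slab test. Leaf boxes stand in for triangles, and the scene and names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class AABB:
    lo: Vec3
    hi: Vec3

@dataclass
class Node:
    box: AABB
    children: List["Node"] = field(default_factory=list)  # non-empty for inner nodes
    prim_id: Optional[int] = None                          # set for leaves

def ray_hits_box(origin: Vec3, direction: Vec3, box: AABB,
                 t_max: float = math.inf) -> Optional[float]:
    """Slab test: entry distance t if the ray meets the box within [0, t_max], else None."""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box.lo, box.hi):
        if d == 0.0:
            if o < lo or o > hi:  # parallel to this slab and outside it
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

def traverse(root: Node, origin: Vec3, direction: Vec3) -> Tuple[Optional[int], float]:
    """Stack-based BVH traversal returning the closest leaf hit and its entry distance."""
    best_id, best_t = None, math.inf
    stack = [root]
    while stack:
        node = stack.pop()
        t = ray_hits_box(origin, direction, node.box, best_t)  # prune anything farther than best
        if t is None:
            continue
        if node.prim_id is not None:
            best_id, best_t = node.prim_id, t
        else:
            stack.extend(node.children)
    return best_id, best_t

# Hypothetical scene: two leaf boxes (standing in for triangles) along the x axis.
leaf_a = Node(AABB((2, -1, -1), (3, 1, 1)), prim_id=0)
leaf_b = Node(AABB((6, -1, -1), (7, 1, 1)), prim_id=1)
root = Node(AABB((0, -1, -1), (10, 1, 1)), children=[leaf_a, leaf_b])
print(traverse(root, (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)))  # → (0, 3.0)
```

Each ray does many of these box tests per frame, and a 4K frame has over 8 million pixels, which is why even dedicated hardware can only afford a few rays per pixel and leans on denoising.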
Found it.
https://www.neowin.net/news/powercolor-uses-npus-to-lower-gpu-power-consumption-and-improve-frame-rates-in-games/
I can't find a picture of the PCB, though. That might have been a pre-reveal leak, and now that it's been revealed, good luck finding it.
Having to send full frames off the GPU for extra processing has got to come with some extra latency and problems compared to doing it on the GPU itself... and I'd be shocked if they had access to the motion vectors and other engine data that DLSS uses, since that would require games to be specifically modified for this approach. IDK, but I don't think we have enough details to really judge whether it's useful or not, although I'm leaning towards 'not' for this particular implementation. They never showed any actual comparisons to DLSS either.
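A back-of-the-envelope sketch of the transfer cost alone, under loudly hypothetical numbers: a 2560x1440 frame sent out to the add-in chip and a 3840x2160 frame sent back, both uncompressed RGBA8 (4 bytes per pixel), over a link sustaining 8 GB/s. None of these figures come from PowerColor; they only show the order of magnitude.

```python
def frame_transfer_ms(width: int, height: int, bytes_per_pixel: int,
                      link_gb_per_s: float) -> float:
    """Time to move one uncompressed frame over a link, ignoring protocol overhead,
    setup latency and synchronization."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes / (link_gb_per_s * 1e9) * 1e3

# Hypothetical scenario: 1440p frame out to the add-in chip, 4K frame back,
# RGBA8 (4 bytes/pixel), over a link that sustains 8 GB/s.
LINK_GB_PER_S = 8.0
to_npu = frame_transfer_ms(2560, 1440, 4, LINK_GB_PER_S)
back = frame_transfer_ms(3840, 2160, 4, LINK_GB_PER_S)
total = to_npu + back
budget_60fps = 1000 / 60
print(f"out {to_npu:.2f} ms + back {back:.2f} ms = {total:.2f} ms "
      f"({total / budget_60fps:.0%} of a 60 fps frame budget)")
```

With these assumptions the copies alone take about 6 ms, roughly 36% of a 60 fps frame. Pipelining could overlap the copies with rendering the next frame, which helps throughput but still adds that time to latency.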
As a side note, I found this other article on the same topic where they obviously didn't know what they were talking about and mixed up frame rates and power consumption; it's very entertaining to read.
I've been trying to find some better/original sources [1] [2] [3], and from what I can gather, it's even worse. It's not even an upscaler of any kind; it apparently uses an NPU just to control clocks and fan speeds to reduce power draw, dropping FPS by ~10% in the process.
So yeah, I'm not really sure why they needed an NPU to figure out that running a GPU at its limit has always been wildly inefficient. Outside of getting that investor money, of course.
OK, I guess it's just kind of similar to dynamic overclocking/underclocking, with a dedicated NPU. I don't really see why a tiny $2 microcontroller, or just the CPU, can't accomplish the same task, though.
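A rough sketch of why the control side doesn't obviously need an NPU. It uses a textbook first-order assumption: dynamic power scales with V²f, and voltage rises roughly with frequency near the top of the curve, so power goes roughly as f³. Real V/f curves differ, the 300 W / 2500 MHz reference point, the gain and the function names are all hypothetical, and it assumes frame rate scales linearly with clock.

```python
# Toy model, all numbers hypothetical: a GPU drawing 300 W at 2500 MHz.
REF_CLOCK_MHZ = 2500.0
REF_POWER_W = 300.0

def power_watts(clock_mhz: float) -> float:
    """First-order model: dynamic power ~ V^2 * f with V roughly proportional to f, so P ~ f^3."""
    return REF_POWER_W * (clock_mhz / REF_CLOCK_MHZ) ** 3

def control_step(clock_mhz: float, measured_w: float, target_w: float,
                 gain_mhz_per_w: float = 2.0, lo: float = 1000.0,
                 hi: float = REF_CLOCK_MHZ) -> float:
    """One proportional-control step: cut the clock when over the power target, raise it when under."""
    error_w = target_w - measured_w
    return min(hi, max(lo, clock_mhz + gain_mhz_per_w * error_w))

# In this model a 10% clock drop (assumed ~10% fewer frames) saves about 27% power: 0.9**3 = 0.729.
print(power_watts(0.9 * REF_CLOCK_MHZ) / power_watts(REF_CLOCK_MHZ))

# Power-capping loop of the kind a small microcontroller could run.
clock = REF_CLOCK_MHZ
target_w = 220.0
for _ in range(50):
    clock = control_step(clock, power_watts(clock), target_w)
print(f"settled at {clock:.0f} MHz drawing {power_watts(clock):.1f} W")
```

Each step is a handful of arithmetic operations, which is the sense in which a cheap microcontroller could plausibly do the job; the cubic model also shows why backing off from the limit saves disproportionately more power than performance.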
Nvidia's tensor cores are inside the GPU; this was outside the GPU, but on the same card (the PCB looked like an abomination). If I remember right, it used slightly less power in total but performed about 30% faster than normal DLSS.
From the articles I've found, it sounds like they're comparing it to native...