Why are you being so condescending about this?
FPGAs are a great tool, but they're not magic.
They're a great way to prototype ASICs, or to handle relatively simple low-latency/high-throughput tasks at volumes below the point where actually taping out an ASIC would make economic sense. But there is pretty much no case where an FPGA implementing the same logic path is going to outperform a dedicated ASIC built from the same logic.
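A rough way to see the economies-of-scale point: an ASIC only wins on cost once the one-off NRE of taping out is spread over enough units. Here's a minimal sketch, assuming a flat per-unit cost for each option and no NRE on the FPGA side. The function name and dollar figures are hypothetical, chosen for illustration, not real quotes.

```python
import math


def break_even_units(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Smallest volume N where total ASIC cost <= total FPGA cost.

    Hypothetical model: ASIC total = asic_nre + asic_unit_cost * N,
    FPGA total = fpga_unit_cost * N (no FPGA NRE, flat unit costs).
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None  # the ASIC never pays off under this model
    # asic_nre + a*N <= f*N  <=>  N >= asic_nre / (f - a)
    return math.ceil(asic_nre / saving_per_unit)


# Hypothetical figures, for illustration only
print(break_even_units(asic_nre=2_000_000, asic_unit_cost=20, fpga_unit_cost=220))  # → 10000
```

Below that volume the FPGA is the cheaper route even if it's slower; above it, the NRE is amortized and the ASIC wins on both cost and performance.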
NPUs are already the de facto ASIC accelerator for ML. Trying to replicate that functionality on FPGA fabric, built on an older process node and with longer routing paths constraining timing, is going to perform worse than a physically smaller dedicated ASIC.
It was the same deal with crypto mining. The usual path for optimizing parallel compute is to do it badly on a GPU first, move to an FPGA if memory isn't a major constraint, and then tape out ASICs once the bugs in the gateware are ironed out (and economies of scale allow).
And that doesn't even begin to cover the pain of FPGA tooling in general, and vendor HLS stacks in particular.