Simplifying Vulkan one subsystem at a time
196 points - today at 1:26 PM
GPU programming seems to be both super low level and high level at once, because textures and descriptors need these ultra-specific data formats, and then the way you construct and upload those formats is very complicated and changes all the time.
Is there really no way to simplify this?
Regular vertex data was supposed to be strictly pre-formatted in the pipeline too, until suddenly it wasn't, and now we can just give the shader a `device_address` extension memory pointer and construct the data from that.
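To make the "construct the data from a pointer" idea concrete, here is a CPU-side sketch in Python. It is not Vulkan or shader code: a `bytearray` stands in for GPU memory, and the vertex layout (three position floats plus two UV floats) and the `pack_vertices` and `fetch_vertex` helpers are made up for illustration. It only shows the principle that the reader of the memory, not a fixed pipeline stage, decides how the bytes are interpreted.

```python
import struct

# Hypothetical vertex layout: 3 floats position + 2 floats UV, tightly packed.
VERTEX_FORMAT = "<3f2f"
VERTEX_STRIDE = struct.calcsize(VERTEX_FORMAT)  # 5 floats * 4 bytes

def pack_vertices(vertices):
    """Pack (x, y, z, u, v) tuples into one contiguous byte buffer."""
    buf = bytearray()
    for vertex in vertices:
        buf += struct.pack(VERTEX_FORMAT, *vertex)
    return buf

def fetch_vertex(memory, base_address, vertex_index):
    """Decode one vertex from raw memory, the way a shader would from a pointer."""
    offset = base_address + vertex_index * VERTEX_STRIDE
    x, y, z, u, v = struct.unpack_from(VERTEX_FORMAT, memory, offset)
    return (x, y, z), (u, v)

# Stand-in for GPU memory: 16 bytes of unrelated data, then the vertex buffer.
memory = bytearray(16) + pack_vertices([
    (0.0, 0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 1.0, 0.0),
    (0.0, 1.0, 0.0, 0.0, 1.0),
])

position, uv = fetch_vertex(memory, base_address=16, vertex_index=2)
print(position, uv)
```

Changing the layout here means changing only the format string and the decode function, which is roughly the flexibility the comment is pointing at.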
Additionally, most of these fixes aren't coming to Android, which is now getting WebGPU for Java/Kotlin[0] after so many refused to move away from OpenGL ES, and naturally they won't reach any card not lucky enough to get new driver releases.
Still, better now than never.
[0] - https://developer.android.com/jetpack/androidx/releases/webg...
From the linked video, "Feature parity with OpenCL" is the thing I'm most looking forward to.
So this goes into Vulkan. Then it has to ship with the OS. Then it has to go into intermediate layers such as WGPU. Which will probably have to support both old and new mode. Then it has to go into renderers. Which will probably have to support both old and new mode. Maybe at the top of the renderer you can't tell if you're in old or new mode, but it will probably leak through. In that case game engines have to know about this. Which will cause churn in game code.
And Apple will do something different, in Metal.
Unreal Engine and Unity have the staff to handle this, but few others do. The Vulkan-based renderers which use Vulkan concurrency to get performance OpenGL can't deliver are few. Probably only Unreal Engine and Unity really exploit Vulkan properly.
Here's the top level of the Vulkan changes.[1] It doesn't look simple.
(I'm mostly grumbling because the difficulty and churn in Vulkan/WGPU has resulted in three abandoned renderers in Rust land through developer burnout. I'm a user of renderers, and would like them to Just Work.)
[1] https://docs.vulkan.org/refpages/latest/refpages/source/VK_E...
BDA, dynamic rendering and shader objects almost make Vulkan bearable. What's still sorely missing is a single-line device malloc, a default queue that can be used without ever touching the queue family API, and an entirely descriptor-free code path. The latter would involve making the NV bindless extension the standard, which simply gives you handles to textures without making you manage descriptor buffers/sets/heaps. Maybe also put an easy path for synchronization on that list and make the explicit API optional.
Until then I'll keep enjoying OpenGL 4.6, which has had BDA with C-style pointer syntax in GLSL shaders since 2010 (NV_shader_buffer_load), and which allows hassle-free buffer allocation and descriptor-set-free bindless textures.
Everyone keeps telling me OpenCL is deprecated (which is true, although it's also true that it continues to work superbly in 2026) but there isn't a good / official OpenCL to Vulkan wrapper out there to justify it for what I do.
I'm sure the comments will be all excuses and whys but they're all nonsense. It's just a poorly thought out API.
Once Vulkan is finally in good order, descriptor_heap and others, I really really hope we can get a WebGPU.next.
Where are we at with the "what's next for webgpu" post, from 5 quarters ago? https://developer.chrome.com/blog/next-for-webgpu https://news.ycombinator.com/item?id=42209272
Graphics people, here is what you need to do.
1) Figure out a machine abstraction.
2) Figure out an abstraction for how these machines communicate with each other and the cpu on a shared memory bus.
3) Write a binary spec for code for this abstract machine.
4) Compilers target this abstract machine.
5) Programs submit code to driver for AoT compilation, and cache results.
6) Driver has some linker and dynamic module loading/unloading capability.
7) Signal the driver to start that code.
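Steps 5 to 7 can be sketched as a toy in Python. Everything here is hypothetical: `ToyDriver`, its `load`, `unload` and `launch` methods, and the "binary format" (each byte is just added to an accumulator) are stand-ins, not any real driver interface. The point is only the shape of the flow: submit code, compile ahead of time on a cache miss, then signal the driver to run it.

```python
import hashlib

class ToyDriver:
    """Hypothetical driver: compiles submitted binaries once and caches them."""

    def __init__(self):
        self._cache = {}        # content hash -> compiled module
        self.compile_count = 0  # number of real compilations performed

    def _compile(self, binary):
        # Stand-in for ahead-of-time compilation to device code.
        # Toy semantics: running the module adds every byte to a start value.
        self.compile_count += 1
        ops = list(binary)
        return lambda start: start + sum(ops)

    def load(self, binary):
        """Step 5: submit code, compiling only on a cache miss."""
        handle = hashlib.sha256(binary).hexdigest()
        if handle not in self._cache:
            self._cache[handle] = self._compile(binary)
        return handle

    def unload(self, handle):
        """Step 6: unload a previously loaded module."""
        self._cache.pop(handle, None)

    def launch(self, handle, start=0):
        """Step 7: signal the driver to start previously loaded code."""
        return self._cache[handle](start)

driver = ToyDriver()
program = bytes([1, 2, 3])
handle = driver.load(program)
driver.load(program)  # second submission of the same code hits the cache
result = driver.launch(handle, start=10)
print(result, driver.compile_count)
```

Keying the cache on a hash of the submitted bytes is one possible design choice; a real system would also have to key on device and driver version.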
AMD64, ARM, and RISC-V are all basically differing binary specs for a C-machine+MMU+MMIO compute abstraction.
Figure out your machine abstraction and let us normies write code that’s accelerated without having to throw the baby out with the bathwater every few years.
Oh yes, give us timing information so we can adapt workload as necessary to achieve soft real-time scheduling on hardware with differing performance.
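As a rough illustration of what such timing information would enable, here is a minimal Python sketch of a workload controller. It assumes, purely for illustration, that frame cost is roughly proportional to a single scale factor (for example a resolution or sample-count multiplier); `adapt_workload`, `simulate`, and the millisecond figures are hypothetical, and the "device" is a one-line cost model rather than real hardware.

```python
def adapt_workload(current_scale, measured_ms, budget_ms,
                   min_scale=0.25, max_scale=1.0):
    """Rescale the workload so the next frame lands near the time budget.

    Assumes cost is roughly proportional to the scale factor.
    """
    if measured_ms <= 0:
        return current_scale  # no usable measurement, keep the current setting
    target = current_scale * (budget_ms / measured_ms)
    return max(min_scale, min(max_scale, target))

def simulate(full_cost_ms, budget_ms, frames=5):
    """Run the controller against a fake device with a linear cost model."""
    scale = 1.0
    for _ in range(frames):
        measured_ms = full_cost_ms * scale  # stand-in for a real GPU timestamp
        scale = adapt_workload(scale, measured_ms, budget_ms)
    return scale

slow_device_scale = simulate(full_cost_ms=40.0, budget_ms=16.0)
fast_device_scale = simulate(full_cost_ms=8.0, budget_ms=16.0)
print(slow_device_scale, fast_device_scale)
```

The same code settles at a reduced workload on the slow device and stays at full workload on the fast one, which is the soft real-time behavior the comment asks to be able to build.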