The 512-bit upper bound on "Callee Cleanup" backwards compatibility exists on the x86 because retn stackbytes and retf stackbytes are constrained to 255 bytes of stack. On a 1024-bit system one word would take 128 bytes, and by the convention of most operating systems a call always places at least two words on the stack. A 1024-bit machine would therefore need 2 x 128 = 256 bytes cleaned up, which is more than the 255-byte limit the callee can handle, whereas a 512-bit machine needs only 2 x 64 = 128 bytes. One way to make the transition from 512 bits to 1024 bits would be to place only the return address and a single pointer on the stack, leaving just that pointer for the callee to clean up.
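The arithmetic above can be checked with a short sketch. The function names are made up for illustration, and the 255-byte limit is taken from the paragraph above as given; it is kept as a parameter so the check can be rerun with another value (Intel's instruction reference lists the operand of RET as a 16-bit immediate).

```python
# Limit on the bytes a callee can clean up, as stated in the text.
# This is the text's premise, not a value verified here.
RET_CLEANUP_LIMIT = 255


def word_bytes(word_bits: int) -> int:
    """Size of one machine word in bytes."""
    return word_bits // 8


def cleanup_bytes(word_bits: int, words_on_stack: int = 2) -> int:
    """Bytes the callee must clean up when a call places
    `words_on_stack` words on the stack (two by the convention
    described in the text)."""
    return word_bytes(word_bits) * words_on_stack


def fits_callee_cleanup(word_bits: int, words_on_stack: int = 2,
                        limit: int = RET_CLEANUP_LIMIT) -> bool:
    """True if the callee can clean up the call within `limit` bytes."""
    return cleanup_bytes(word_bits, words_on_stack) <= limit


if __name__ == "__main__":
    for bits in (32, 64, 128, 256, 512, 1024):
        needed = cleanup_bytes(bits)
        verdict = "fits" if fits_callee_cleanup(bits) else "exceeds limit"
        print(f"{bits:5d}-bit: {needed:4d} bytes, {verdict}")
```

Under these assumptions 512 bits is the largest power-of-two word size that fits. Calling fits_callee_cleanup(1024, words_on_stack=1) shows that the single-pointer workaround stays under the limit.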
Note that the limitation applies only to Callee Cleanup backwards compatibility. If a new instruction replaced the old one, the problem would be solved. The only cost is a loss of binary compatibility: existing binaries would need to be recompiled.
The retn stackbytes and retf stackbytes instructions are holdovers from the x86's ancient 8-bit/16-bit hybrid ancestry.
The Callee Cleanup convention is a way to prevent stack overflow when there is a limit on the number of stack frames (that is, on how deeply function calls can nest).
I think it would be better if data were not passed on the stack. Let the stack hold only the stack frames, and keep the data in a different place.