And maybe three-way superscalar with three symmetrical pipelines will also be possible in the future. But say they make the FP pipelines more flexible so that both can do FP32/FP16: then they can double the peak FP throughput for only a minor increase in area, since all the other machinery is still in place. Right now they seem to have separate FP32, FP16, and INT pipelines.

The beauty of this is that it opens up Apple for Nvidia-style expansion of pipeline functionality. I do wonder what this means for operand bandwidth and data path contention, since one needs a fairly wide bus to sustain two 32-wide SIMD pipelines simultaneously.

By the way, is this the first time that a SIMT GPU is doing truly superscalar execution, with two instructions from two programs decoded and dispatched per cycle? To my knowledge, until now only Nvidia did something like that, but they still dispatch one SIMD per cycle (and alternate between two SIMDs).
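
To put a rough number on the operand bandwidth question, here is a back-of-the-envelope sketch. It assumes every lane of a 32-wide pipeline reads three 32-bit source operands per cycle (as a fused multiply-add would) and that nothing is served from an operand cache or forwarding path. The function name and all parameters are hypothetical and chosen for illustration; none of these are documented figures for Apple's hardware.

```python
def operand_bytes_per_cycle(pipelines, lanes, source_operands, bytes_per_operand):
    """Register-file read bandwidth (bytes/cycle) needed to feed every lane each cycle.

    Worst-case estimate: assumes no operand reuse, caching, or forwarding.
    """
    return pipelines * lanes * source_operands * bytes_per_operand


# Assumed configuration: 32-wide SIMD, FP32 FMA with three source operands.
LANES = 32
FMA_SOURCES = 3
FP32_BYTES = 4

single_issue = operand_bytes_per_cycle(1, LANES, FMA_SOURCES, FP32_BYTES)
dual_issue = operand_bytes_per_cycle(2, LANES, FMA_SOURCES, FP32_BYTES)

print(f"one pipeline:  {single_issue} bytes/cycle ({single_issue * 8} bits)")
print(f"two pipelines: {dual_issue} bytes/cycle ({dual_issue * 8} bits)")
```

Under these assumptions, dual issue doubles the worst-case read traffic from 384 to 768 bytes per cycle, which is why the width of the operand path matters. Real designs usually cut this down with operand caches and register reuse, so treat the result as an upper bound.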