Mobile GPU Floating-Point Tests

Latest recommended article published on 2024-07-06 23:45:33

米饭盖饭

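I wrote a shader with a flowing-light effect for a project, and on some devices the result was not continuous. After some reading, I learned that different mobile GPU models handle floating-point types differently. The article below shows that the ARM Mali-400 has the weakest floating-point capability, and the device with the problem (a Samsung Galaxy Note 2) happens to use a Mali-400. I suspect that multiplying Unity's _Time by other values loses precision, which is why the result is discontinuous. The fix is to compute the value on the CPU and pass it to the shader, or to operate on the float at the bit level as described in reposted article 3. Mobile platforms really are full of pitfalls...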
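To make the suspected failure mode concrete, here is a minimal Python sketch that emulates it on the CPU. It assumes the fragment pipeline rounds every intermediate value to IEEE 754 half precision (fp16); that is only a stand-in for a low-precision GPU, not a measured property of any device. The names `to_half`, `frac_on_gpu`, `frac_on_cpu`, and `distinct_phases` are hypothetical helpers, not Unity or GLSL APIs.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def frac_on_gpu(t: float, speed: float) -> float:
    """Emulate frac(t * speed) with every intermediate value rounded to fp16."""
    product = to_half(to_half(t) * to_half(speed))
    return to_half(product - int(product))  # int() floors because product >= 0


def frac_on_cpu(t: float, speed: float) -> float:
    """Wrap the phase in double precision, then round only the small result."""
    return to_half((t * speed) % 1.0)


def distinct_phases(phase_fn, start: float, frames: int = 60,
                    dt: float = 1.0 / 60.0, speed: float = 0.5) -> int:
    """Count the distinct phase values produced over `frames` consecutive frames."""
    return len({phase_fn(start + i * dt, speed) for i in range(frames)})


if __name__ == "__main__":
    # Ten minutes into a session: the emulated fp16 path repeats the same few
    # phases, while the CPU-wrapped path still produces a new phase each frame.
    print("gpu-side frac:", distinct_phases(frac_on_gpu, 600.0))
    print("cpu-side wrap:", distinct_phases(frac_on_cpu, 600.0))
```

The point of the sketch is the design choice rather than the exact numbers: wrapping the phase into [0, 1) while it is still in high precision means the shader only ever receives a small value, where a low-precision float still has plenty of resolution.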

Someone else who ran into the same problem: http://forum.unity3d.com/threads/floating-point-issue-with-frac-_time-on-mobile-devices.184930/

The article on mobile GPU floating-point tests: http://www.youi.tv/mobile-gpu-floating-point-accuracy-variances/

In case the article becomes unavailable someday, here is a copy.

MOBILE GPU FLOATING POINT ACCURACY VARIANCES

When coding GPU shaders for multiple target platforms, it is important to consider the differences of the hardware implementations and their impact. This is especially true when creating user interfaces, as the use of shaders for visual enhancement or experience augmentation is crucial and cannot vary between OS platforms or devices.

One of the key differentiators between mobile GPU families is the capability of the computational units. These differences are normally seen with handling of code complexity or visual artifacts created by the rendering schemes, especially with tile-based systems. These can sometimes be overcome using simpler shader algorithms or creative approaches to the geometry constructs being used.

However, the more significant contributor to the quality of the shader output lies in the accuracy of the floating point calculations within the GPU. This differs greatly from CPU computational accuracy, and variances are common between the different mobile GPU implementations from ARM Mali, Imagination Technology, Vivante, and others.

Being able to compare the accuracy of various GPU models allows us to prepare for the lowest accuracy units to ensure the shader output is still acceptable while optimizing for incredible visual effects from the better performing hardware. The You.i Engine makes direct use of this information to ensure a consistent look and feel, a key differentiator in the user interface market.

In a perfect world, only one reference implementation would be needed. This is simply not viable with today’s hardware. At worst, we may need to do several implementations, targeting the various accuracy levels, to ensure a common visual effect and consistent user experience. If calculation errors occur when outside the usable range of the floating point units, then we must account for that to prevent undesirable effects.

Let’s compare some current mobile devices using some simple fragment shader code:

 
    precision highp float;
    uniform vec2 resolution;
    void main( void )
    {
        float x = ( 1.0 - ( gl_FragCoord.x / resolution.x ) );
        float y = ( gl_FragCoord.y / resolution.y ) * 26.0;
        float yp = pow( 2.0, floor(y) );
        float fade = fract( yp + fract(x) );
        if(fract(y) >= 0.9)
            gl_FragColor = vec4( vec3( fade ), 1.0 );
        else
            gl_FragColor = vec4( 0.0 );
    }

This example will calculate a varying fade level from bright white down to black over 26 iterations on the screen. The further down the screen the smooth blended line goes, the more precision we have in the floating point unit.
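The relationship between bar count and precision can be sketched on the CPU. The Python below is a rough emulation, not the test application itself: it assumes the GPU behaves like a float with a fixed number of significand bits and round-to-nearest, and it samples a hypothetical 256-pixel-wide screen at pixel centers. The helpers `quantize`, `fade`, and `smooth_bars` are illustrative names, and the counts they produce depend on the assumed width, so they are not directly comparable to the device photos.

```python
import math


def quantize(x: float, significand_bits: int) -> float:
    """Round x to the nearest value with the given number of significand bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** significand_bits
    return math.ldexp(round(m * scale) / scale, e)


def fade(x: float, row: int, significand_bits: int) -> float:
    """Emulate fract(pow(2.0, floor(y)) + fract(x)) for one horizontal bar."""
    yp = 2.0 ** row
    total = quantize(yp + quantize(x, significand_bits), significand_bits)
    return total - math.floor(total)


def smooth_bars(significand_bits: int, width: int = 256, rows: int = 26) -> int:
    """Count leading bars that still have one distinct fade level per pixel column."""
    for row in range(rows):
        levels = {
            fade(1.0 - (px + 0.5) / width, row, significand_bits)
            for px in range(width)
        }
        if len(levels) < width:  # neighbouring pixels collapsed to the same level
            return row
    return rows


if __name__ == "__main__":
    for bits in (11, 24):  # fp16-like and fp32-like significands
        print(bits, "significand bits ->", smooth_bars(bits), "smooth bars")
```

Each bar adds a larger power of two to the same fractional value, so every step down the screen leaves one bit fewer for the fraction. The bar at which the gradient starts to band is therefore a direct readout of how many significand bits the unit carries.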

For reference, we will use a desktop rendering of the shader. For our purposes, this sample from a laptop nVidia GeForce GT 630M is more than enough:

Our Benchmark:

If GPUs were capable of infinite resolution, we would see 26 horizontal bars that gradually change from white to black. Any deviation in that pattern is considered a computational error in the GPU.

Here we see that the nVidia GeForce GT 630M has a lot of floating point accuracy and easily achieves 16 solid bars before any degradation, slowly truncating after that. This is a typical result – it has a very capable floating point unit and therefore, this serves as a good comparison benchmark.

The drift from the left edge indicates error in calculation (areas that should be white are black), which would translate into undesirable visual glitches if not accounted for.

Device: Acer Iconia Tab A700 | GPU: nVidia Tegra 3 T30

It is immediately apparent that the range of the floating point unit is considerably smaller than that of the benchmark. We see about 8 bars before degradation and a linear loss of accuracy after that. This result is acceptable, but at the low end for mobile when compared to the Huawei MediaPad below. The lack of drift from the left edge indicates negligible error in calculation, which means we can use the full range of this device.

Device: Kobo ARC | GPU: Imagination Technology SGX544

The floating point unit is quite good, achieving about 14 lines before loss of usable range, which is very close to the 16 lines of our reference. Beyond that point the results are error prone, as can be seen from the artifacts in the image. This emphasizes the need to prevent calculations outside the usable range.

Device: Huawei MediaPad | GPU: Vivante GC4000

This is an almost perfect result; very similar to our benchmark and very linear in degradation. No issues here: lots of usable range; no artifact errors. A great mobile reference platform.

Device: Samsung Galaxy Note 10.1 | GPU: ARM Mali-400 MP4

The floating point unit performs the worst of all the devices tested, achieving only 5 lines before loss of usable range. Luckily, there are no large error artifacts in the image, so at least the poor accuracy is not compounded further by erroneous results.

Device: ZTE N970 | GPU: Qualcomm Adreno 225

Here, we have an almost perfect result, very similar to our benchmark and almost identical to the Huawei MediaPad. Again, a great mobile reference platform.

Device: Samsung Nexus 10 | GPU: ARM Mali-T604

Here we have a decent result, though not as strong as the GC4000 or Adreno 225. The results are nearly the same as our reference platform, showing the same potential error from the drift from the left edge.

After comparing the output across different GPU chipsets, we immediately see the difference in performance and usable range of the floating point units. It is important to note that this is not related to device performance or even GPU implementation differences by different manufacturers – it is simply the computation range of the GPU itself. Most comparisons are done through tests of pure performance: triangles per second or texel fill rate. Although these numbers are valuable, they do not tell the full story of the GPU’s true capability. When applied to natural user interfaces, these computational differences are even more important since, unlike games, there is no tolerance for any visual artifacts.

To see the effect of these computational differences, or to try some on your own device to examine performance, check out the following:

YouTube video: The Importance of Shaders, showing the result of these calculation errors
You.i Shader Effect Test, an Android application for viewing the shaders for comparison on a device