Exposing OpenCL on Android: Q&A with Tim Lewis of ZiiLabs
July 28th, 2011 by Vincent Hindriksen

ZiiLabs has been offering an early access program for its OpenCL SDK since last year. The program was very selective in choosing developers, and little news about it has appeared on their web page. Now that they are planning to make their Android NDK a standard component, it is a good time to ask them some questions. GPGPU consultant Liad Weinberger of Appilo also added a few questions.

The Q&A was with Tim Lewis, Director of Marketing and Partner Relations at ZiiLabs, who took the time to give some insight into what we can expect from accelerated computation on Android. ZiiLabs was better known as 3Dlabs until it reinvented itself in 2009. Like other companies in the ARM industry, they mostly design chips and let other parties manufacture devices using their schematics, drivers and software. Now to the questions.

Vincent Hindriksen: You have had the early access program for a year now. Why have we heard so little about it?
Tim Lewis: Since the initial launch we have been working with a small group of partners via the Early Access Program to help us ensure that the technology we release to mainstream developers meets our quality standards. The partners have provided invaluable feedback, and we are now looking forward to releasing it to a wider audience later this year.

VH: Over the last few months, the number of searches on my site for ARM or Android combined with OpenCL has increased a lot. Do you see increased interest too?
TL: Yes, we see a growing interest in finding ways to enhance the CPU performance of the ARM cores by taking advantage of the programmability and floating point performance of the media processor.

VH: Are drivers available for Android only, or also for Windows 8, Windows CE and other Linux flavours?
TL: We are focused on Android.

VH: Does the emulator use the PC's OpenCL capabilities? And can you tell us something about developing OpenCL software for a ZiiLabs device?
TL: In terms of programming OpenCL for our processors, we map physical processing elements to work-items and allow the programmer to specify a work-group size of up to 8 processing elements (the number of processing elements in a cluster). The runtime then maps the whole problem onto the array, iterating over it as many times as required to run the kernel on the specified problem size (the total number of work-items). If the programmer specifies a work-group size of 1, the runtime maps 8 work-groups onto a physical cluster; if the programmer specifies a work-group size of 8, the runtime maps 1 work-group onto a physical cluster. And we can of course run multiple work-groups concurrently, as we have more than one cluster: 6 clusters in the case of the ZMS-20 and 12 in the ZMS-40.
The OpenCL cross compiler is supported on Linux-based PC hosts, but it does not currently support any native PC OpenCL implementation.
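
The mapping described above can be approximated with a little arithmetic. The sketch below is a hypothetical model built only from the numbers in this answer (8 processing elements per cluster, 6 clusters on the ZMS-20, 12 on the ZMS-40). It assumes that small work-groups are packed into a cluster using whole processing elements and that the runtime sweeps all clusters in lock-step passes, neither of which ZiiLabs has confirmed. The names `work_groups_per_cluster` and `passes_needed` are illustrative, not part of any ZiiLabs API.

```python
import math

# Figures quoted in the interview; the packing/scheduling model below is an assumption.
PES_PER_CLUSTER = 8
CLUSTERS = {"ZMS-20": 6, "ZMS-40": 12}


def work_groups_per_cluster(work_group_size: int) -> int:
    """Work-groups that fit on one cluster (size 1 -> 8 groups, size 8 -> 1 group)."""
    if not 1 <= work_group_size <= PES_PER_CLUSTER:
        raise ValueError("work-group size must be between 1 and 8")
    # Hypothetical: sizes that do not divide 8 leave the leftover PEs idle.
    return PES_PER_CLUSTER // work_group_size


def passes_needed(global_size: int, work_group_size: int, chip: str) -> int:
    """How many sweeps over the whole array are needed to cover global_size work-items."""
    if global_size % work_group_size:
        raise ValueError("global size must be a multiple of the work-group size")
    total_groups = global_size // work_group_size
    groups_per_pass = work_groups_per_cluster(work_group_size) * CLUSTERS[chip]
    return math.ceil(total_groups / groups_per_pass)


if __name__ == "__main__":
    pixels = 1920 * 1080  # one work-item per pixel of a 1080p frame
    for chip, clusters in CLUSTERS.items():
        print(chip, clusters * PES_PER_CLUSTER, "PEs:",
              passes_needed(pixels, 8, chip), "passes")
```

Run as a script, it reports 48 and 96 processing elements, matching the core counts quoted later in the interview, and shows the ZMS-40 needing half as many passes as the ZMS-20 for the same frame.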

VH: As OpenCL is very low-level, how does it handle crashes? Do you have tips for developers?
TL: We have extended debug tools to help developers write and debug programs.

VH: Many ARM chips have specialised silicon for multimedia computations such as video encoding/decoding. Does OpenCL make use of this, or only of the GPU?
TL: The ZiiLabs ZMS processors use our general-purpose StemCell Media Processing SIMD array to offload all media-intensive tasks, such as video encode/decode and OpenGL ES, from the ARM. Traditionally we have used hand-written microcode, optimised over the last 10 years, to create these key software components. However, for new components and to enable third-party developers, we are increasingly turning to OpenCL to fully leverage the power of the StemCell array.

VH: Do you think OpenCL (or any comparable technology) will cause specialised silicon to be replaced by more processing cores?
TL: As our architecture is based around a fully programmable array of floating-point processing units, it should come as no surprise that I strongly agree that specialised silicon will be replaced by programmable cores. There will, however, always be a place for specialised, fixed-function silicon or for specific components that do not lend themselves to a SIMD approach. But as mobile SoCs become more complex and are required to perform more PC-like tasks, the need for flexibility in features and performance increases. The emergence of the VP8 codec is a good example. Fixed-function devices have to go through a silicon revision before support for VP8 can be added, whereas we could add fully optimised 1080p support very quickly by simply updating our “codec” program. And when running any single function we can dedicate more of the available silicon to that task, which helps us achieve better peak performance. Going forward, we expect customers to be able to implement their own proprietary software components that leverage the performance of the media processor via OpenCL.

VH: Android has RenderScript Compute as an alternative to OpenCL. Do you support RenderScript and, if so, does it work together with OpenCL?
TL: We are committed to 100% Android compatibility, so we support RenderScript as well as offering OpenCL.

VH: What do you think of the upcoming “battle” between RenderScript, CUDA and OpenCL?
TL: Developers will drive this, and our goal is to put the core technologies into their hands so that they can make the right decision for them; for our part, we will be focusing on RenderScript and OpenCL.

VH: From your page the typical usage seems to be media-processing, but what typical applications did you have in mind? (Note: I indirectly ask for the strengths and weaknesses of your processor)
TL: If I had to pick one area where we are seeing the most interest, it is implementing proprietary image-processing algorithms, including enhanced face tracking and object recognition, but the interest is much broader, including enhanced audio processing and general floating-point compute.

VH: You chose the full profile rather than the embedded profile. Is there a reason behind that?
TL: We come from a PC graphics background, so because our chips can support it, we chose to support the full profile.
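
On a real device, an application can check which profile it is running against by querying `CL_DEVICE_PROFILE` with `clGetDeviceInfo`, which returns either `FULL_PROFILE` or `EMBEDDED_PROFILE`. One practical difference: in the OpenCL 1.1 embedded profile, 64-bit integer types are optional and are advertised through the `cles_khr_int64` extension, whereas the full profile requires them. The sketch below only post-processes the strings such a query would return; the device query itself is omitted, and the function names are hypothetical.

```python
def parse_profile(raw: str) -> str:
    """Normalise a CL_DEVICE_PROFILE value (C strings may carry a trailing NUL)."""
    profile = raw.rstrip("\x00").strip()
    if profile not in ("FULL_PROFILE", "EMBEDDED_PROFILE"):
        raise ValueError(f"unexpected OpenCL profile: {raw!r}")
    return profile


def supports_64bit_integers(raw_profile: str, extensions: str) -> bool:
    """Full profile: long/ulong are mandatory. Embedded profile: only with cles_khr_int64."""
    if parse_profile(raw_profile) == "FULL_PROFILE":
        return True
    # Compare whole extension names, not substrings.
    return "cles_khr_int64" in extensions.split()
```

Choosing the full profile, as ZiiLabs did, means kernels written for desktop OpenCL do not need this kind of fallback check for 64-bit integers.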

VH: You are claiming 26 GFLOPS of compute power (a dual-core ARM Cortex-A9 has 6-10 GFLOPS). Did you use LINPACK or your own test?
TL: We have our own internal tests, and this figure of 26 GFLOPS for the ZMS-20 is almost doubled with the ZMS-40.
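
The quoted figures invite some back-of-the-envelope arithmetic. The sketch below uses only numbers from this interview (26 GFLOPS for the 48-core ZMS-20, 96 cores on the ZMS-40, 6-10 GFLOPS for a dual-core Cortex-A9). The linear-scaling estimate for the ZMS-40 is an assumption, and the “almost doubled” in the answer suggests the real figure sits slightly below it.

```python
ZMS20_GFLOPS = 26.0            # ZiiLabs' internal figure for the ZMS-20
ZMS20_CORES = 48
ZMS40_CORES = 96
DUAL_A9_GFLOPS = (6.0, 10.0)   # range quoted in the question

# Peak throughput per floating-point core (about 0.54 GFLOPS).
per_core_gflops = ZMS20_GFLOPS / ZMS20_CORES
# Upper-bound estimate for the ZMS-40 if throughput scaled perfectly with core count.
zms40_linear_estimate = per_core_gflops * ZMS40_CORES
# How many times the ZMS-20 figure exceeds each end of the A9 range.
advantage_over_a9 = [ZMS20_GFLOPS / g for g in DUAL_A9_GFLOPS]

if __name__ == "__main__":
    print(f"per core: {per_core_gflops:.2f} GFLOPS")
    print(f"ZMS-40 (linear estimate): {zms40_linear_estimate:.0f} GFLOPS")
    print("ZMS-20 vs dual A9: " + ", ".join(f"{r:.1f}x" for r in advantage_over_a9))
```

This puts the ZMS-20 at roughly 2.6 to 4.3 times the quoted dual-core A9 range, bearing in mind that a vendor's internal peak figure and a LINPACK result are not directly comparable.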

VH: PC GPUs work with wavefronts of 64 work-items (AMD) or warps of 32 threads (Nvidia). Does the ZMS-20 (48 cores) work with a comparable concept?
TL: I refer you back to the answer to the question on developing OpenCL programs.
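
To make the comparison in the question concrete: on AMD and Nvidia GPUs a work-group is issued in whole wavefronts or warps, so a group whose size is not a multiple of 64 or 32 leaves lanes idle, while the earlier answer implies that on ZiiLabs' array the unit is an 8-element cluster that several small work-groups can share. The sketch below compares lane utilisation under those two simplified models; both are assumptions for illustration (real GPUs have further scheduling constraints), and the function names are hypothetical.

```python
import math


def simt_utilization(work_group_size: int, granularity: int) -> float:
    """Busy-lane fraction when one work-group is padded to whole wavefronts/warps."""
    issued_lanes = math.ceil(work_group_size / granularity) * granularity
    return work_group_size / issued_lanes


def cluster_utilization(work_group_size: int, pes_per_cluster: int = 8) -> float:
    """Busy-PE fraction when several small work-groups share one cluster (model of the answer above)."""
    if not 1 <= work_group_size <= pes_per_cluster:
        raise ValueError("work-group size must fit in one cluster")
    groups = pes_per_cluster // work_group_size
    return groups * work_group_size / pes_per_cluster


if __name__ == "__main__":
    for size in (1, 3, 8):
        print(f"size {size}: AMD {simt_utilization(size, 64):.0%}, "
              f"Nvidia {simt_utilization(size, 32):.0%}, "
              f"ZiiLabs {cluster_utilization(size):.0%}")
```

Under these models, tiny work-groups are cheap on the cluster design but wasteful on wide SIMT hardware, where the usual advice is to pick work-group sizes that are multiples of the warp or wavefront width.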

VH: Extensions like OpenGL memory sharing seem obvious, but which extensions are actually available?
TL: We are still finalising the extensions we will support but OpenGL ES and direct access to Camera sensor data seem of particular interest to our partners.
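
Whatever ZiiLabs ends up shipping, an application would discover it at run time from the space-separated `CL_DEVICE_EXTENSIONS` string returned by `clGetDeviceInfo`. The sketch below checks that string for two Khronos extensions relevant to the answer: `cl_khr_gl_sharing` (OpenCL/OpenGL object sharing) and `cl_khr_egl_image` (importing EGLImages, a common route for sharing with OpenGL ES on EGL platforms such as Android). The sample string is made up, and no extension name for direct camera-sensor access is assumed, because the interview does not give one.

```python
WANTED = ("cl_khr_gl_sharing", "cl_khr_egl_image")


def check_extensions(device_extensions: str, wanted=WANTED) -> dict:
    """Map each wanted extension name to whether the device advertises it.

    Splitting on whitespace compares whole names, so a longer name that merely
    contains a wanted name as a substring is not mistaken for it.
    """
    available = set(device_extensions.split())
    return {name: name in available for name in wanted}


if __name__ == "__main__":
    # Hypothetical CL_DEVICE_EXTENSIONS value, not taken from a ZiiLabs device.
    sample = "cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_fp16"
    print(check_extensions(sample))
```
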
Liad Weinberger: What are the typical (idle/normal/stress) power requirements of each of your OpenCL-supporting chips?
TL: We target the mobile space, so typically we are under 1 W of total system power and are looking at 10+ hours of HD video playback from a tablet-sized battery.
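
The power figures translate directly into battery arithmetic: runtime in hours is usable battery energy in watt-hours divided by average system power in watts. The sketch below applies that relation; the 3.7 V nominal cell voltage is a typical lithium-ion value and, like the example capacity, is an assumption rather than a ZiiLabs figure.

```python
def mah_to_wh(capacity_mah: float, nominal_voltage: float = 3.7) -> float:
    """Convert a battery rating in mAh to watt-hours (3.7 V is an assumed typical Li-ion voltage)."""
    return capacity_mah * nominal_voltage / 1000.0


def runtime_hours(battery_wh: float, system_power_w: float) -> float:
    """Hours of operation at a constant average system power."""
    if system_power_w <= 0:
        raise ValueError("system power must be positive")
    return battery_wh / system_power_w


def energy_needed_wh(hours: float, system_power_w: float) -> float:
    """Battery energy required for a target runtime."""
    return hours * system_power_w


if __name__ == "__main__":
    # 10 hours at 1 W needs 10 Wh; a hypothetical 6000 mAh tablet pack holds about 22 Wh.
    print(energy_needed_wh(10, 1.0), "Wh needed")
    pack = mah_to_wh(6000)
    print(f"{pack:.1f} Wh pack -> {runtime_hours(pack, 1.0):.1f} h at 1 W")
```

In this simple model, which ignores conversion losses and battery ageing, 10 hours at 1 W needs only 10 Wh, so the “10+ hours” claim is consistent with the sub-1 W figure for any tablet pack larger than that.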

VH: As watts per GFLOP are lower on GPUs, battery power can be saved. Have you done any tests on battery endurance when using OpenCL compared with using the processor only?
TL: We don't have any specific results we can share, but running media-intensive tasks on the array certainly extends battery life compared to running them on the ARM.

VH: On battery power, is it better to use all cores to the fullest or to focus on avoiding peak consumption? In other words, how can battery power be saved while still getting all the work done?
TL: We support a variety of power-saving techniques depending on the task and workload, including reducing the number of clusters in use or simply reducing the overall array speed.
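
The two knobs mentioned here, fewer active clusters and a lower array clock, both reduce throughput, so a useful question is the smallest configuration that still meets a deadline. The sketch below assumes peak throughput scales linearly with active clusters and clock, starting from the quoted 26 GFLOPS for all 6 ZMS-20 clusters at full speed, and that fewer active clusters draw less power; neither assumption is quantified in the interview, and the workload numbers are hypothetical.

```python
ZMS20_PEAK_GFLOPS = 26.0  # quoted peak for the ZMS-20 (all clusters, full speed)
ZMS20_CLUSTERS = 6


def scaled_peak_gflops(active_clusters: int, clock_fraction: float = 1.0) -> float:
    """Assumed linear scaling of peak throughput with active clusters and clock."""
    if not 1 <= active_clusters <= ZMS20_CLUSTERS:
        raise ValueError("active clusters must be between 1 and 6")
    if not 0.0 < clock_fraction <= 1.0:
        raise ValueError("clock fraction must be in (0, 1]")
    return ZMS20_PEAK_GFLOPS * active_clusters / ZMS20_CLUSTERS * clock_fraction


def job_seconds(work_gflop: float, active_clusters: int, clock_fraction: float = 1.0) -> float:
    """Time to finish a job at the scaled peak rate (real code rarely reaches peak)."""
    return work_gflop / scaled_peak_gflops(active_clusters, clock_fraction)


def fewest_clusters(work_gflop: float, deadline_s: float, clock_fraction: float = 1.0):
    """Smallest number of active clusters that still meets the deadline, or None."""
    for clusters in range(1, ZMS20_CLUSTERS + 1):
        if job_seconds(work_gflop, clusters, clock_fraction) <= deadline_s:
            return clusters
    return None


if __name__ == "__main__":
    # Hypothetical per-frame workload of 0.2 GFLOP at 30 frames per second.
    print(fewest_clusters(0.2, 1 / 30), "clusters at full clock")
    print(fewest_clusters(0.2, 1 / 30, clock_fraction=0.5), "clusters at half clock")
```

Under these assumptions, halving the clock means the same frame rate needs three clusters instead of two; which configuration actually saves more energy depends on power characteristics that ZiiLabs has not published.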

VH: What more can you say about the market position of your products?
TL: We are a strong player across a wide range of markets from surveillance cameras and portable media players to tablets and embedded platforms. Over the coming months I expect to see our customers making some interesting announcements in both the embedded and tablet space.

VH: How long do you expect to have the advantage of being first in offering OpenCL for embedded devices?
TL: We expect to keep pushing the limits of what can be achieved on a mobile processor. Our recently announced ZMS-40 doubles the array size to 96 floating point cores, which, through OpenCL, puts an enormous amount of compute power into the hands of developers targeting the tablet and embedded markets.

VH: Which device do you suggest for developers who want to start developing for OpenCL on Android?
TL: As the core technology provider we have to synchronise our software releases with those of our platform customers. I would expect our customers to be announcing suitable platforms toward the 4th quarter of the year.
LW: Could you give information about the number of OpenCL-capable ZMS chips on the market, now or next year?
TL: We do not disclose market-sensitive data about volumes. What I can say is that we continue to see an amazing increase in the take-up of our processors as user requirements for low-energy, high-performance processors grow across a broad range of markets, including high-volume consumer segments such as tablets and connected TV, as well as the remote medical sector.
LW: Is such a board available if one cannot commit to purchasing a certain number of units?
TL: We expect to support platforms that are available to end users, so there is no volume requirement.

VH: Is there anything you want to share with us that has not been mentioned?
TL: Not at the moment, other than to thank you for the questions and for developers to continue to “watch this space” as we prepare to roll out the next phase of our OpenCL program.

VH: Thank you very much for your time. Can readers ask questions in the comments, so you can answer them?
TL: Yes. Please email me directly at tim[dot]lewis[at]ziilabs[dot]com.

We left out questions about how the products compare with Nvidia Tegra, Imagination Technologies' PowerVR and AMD G-T40N, as ZiiLabs does not comment on competitors' products. When the products of ZiiLabs' partners hit the market at the end of the year, we will be able to tell more. Be the first to know when that happens: ZiiLabs@twitter, StreamComputing@twitter, StreamComputing@facebook.

If you want to see more, watch the YouTube video in which Tim Lewis demos what is possible with the current generation of ZiiLabs hardware.
