One Image Is Worth 1,000 Labels

Very recently, an interview[0] with Yann LeCun about Deep Learning led to an open letter[1]. The letter raised the question of whether industries with limited amounts of data need different approaches, especially unsupervised ones, to make sense of their data.

To add to this discussion, we would like to mention the recent hype around big, deep convolutional networks (ConvNets), which many companies use today. The drawbacks of these systems are that they are very data hungry and that they require labeled data. In other words, if a company has millions of samples but no corresponding labels, or a labeled dataset that is rather small, ConvNets are not a real option. Since almost every deep learning system in use today is supervised, unsupervised end-to-end alternatives for larger images are rare and usually not drop-in replacements for the supervised model.

In the domain of fashion, we have problems neither with limited datasets nor with missing labels. However, the use of labels is rather limited. Why? Without a doubt, it is useful to predict the previously unknown category of an image, for instance shirt, blouse, coat, …, but beyond that we need a deeper, more conceptual description of the image. That means that to understand an image, we need hierarchical features that describe its concepts and go far beyond a simple label like ‘coat’. For these features we use Deep Learning, but not in a strictly supervised way, and this is why we cannot simply use traditional ConvNets.

The reason is that ConvNets are trained to separate object classes, for instance shirts and coats, whereas some high-level concepts are present in both classes, buttons on clothes being just one of thousands of examples. A different perspective is that a label does not tell much about the details of the image. For instance, a coat has a color, a shape, a texture, and can be made of different materials. All of these properties are completely ignored by the label, because the ultimate goal is just to decide whether the image shows *some* kind of coat or not.

So, in the domain of fashion, the use of labels is rather limited for most applications, such as a similarity search, because supervised models mostly learn features that separate classes. The result is already useful, but since the aim is not to describe all (important) details of an image, the learned features are unlikely to be optimal for fine-grained tasks like a visual similarity search.
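
To make the difference between a label lookup and a visual similarity search concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the catalog, the item names, and the four hand-written feature dimensions are hypothetical and merely stand in for features that a model would have to learn from images. Cosine similarity is just one common choice for comparing feature vectors, not necessarily the measure used in practice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, top_k=2):
    """Return the names of the top_k catalog items closest to the query."""
    scored = [
        (cosine_similarity(query, features), name)
        for name, (_label, features) in catalog.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _score, name in scored[:top_k]]


# Hypothetical catalog: item name -> (label, feature vector).
# Made-up feature dimensions: [redness, has buttons, wool-like texture, length]
CATALOG = {
    "coat_red_wool": ("coat", [0.9, 1.0, 0.9, 0.8]),
    "coat_blue_rain": ("coat", [0.1, 0.0, 0.1, 0.8]),
    "shirt_red_buttons": ("shirt", [0.9, 1.0, 0.2, 0.3]),
    "blouse_blue": ("blouse", [0.1, 1.0, 0.1, 0.3]),
}

# Query: a red, buttoned, woolly, fairly long item.
query = [0.8, 1.0, 0.8, 0.7]

# Label lookup: every coat matches equally, whatever it looks like.
by_label = [name for name, (label, _f) in CATALOG.items() if label == "coat"]

# Feature search: items are ranked by how alike they look.
by_features = most_similar(query, CATALOG, top_k=2)

print("same label:   ", by_label)
print("most similar: ", by_features)
```

In this toy data the label lookup returns both coats without any ranking, while the feature-based search places the red, buttoned shirt ahead of the blue rain coat, because it shares more visual properties with the query despite belonging to a different class.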

Most image services offered today try to predict something: a category or tags. Deep Learning in this area is clearly focused on supervised methods. However, we agree with the open letter that unsupervised learning is very important for special domains, for companies with limited data, and whenever labels are not sufficient for the task at hand, as in the fashion domain.

Since the revolution of Deep Learning started with unsupervised learning, there are plenty of methods available. However, most of them cannot easily be extended to full-size images. For instance, it is known that the features of a pre-trained ConvNet can be used for a broad range of applications, but training such a net requires labels. This leads to a chicken-and-egg problem, and if we fall back on hand-crafted features from traditional computer vision, the expressive power might be limited.

Bottom line: for services like similarity search, we need to learn features from images that describe the concepts in the data and disentangle most of the explanatory factors, without the need for any labels. Because of the importance of this topic, we decided to make it the first project for the brand new Deep Learning lab! We have run first tests with very impressive results and welcome you to take part in the discussion.




