List vs. Category: Part 1: Introduction to the Problems


This article explores the difference between two entities: List and Category. In part one, we'll look at the basic concepts and lay some groundwork. In part two, we'll get to some conclusions.

I know that these two things are different, but I need a method by which my software can identify the distinction. I am trying to be methodical. After a few hours of searching, research, and brainstorming, this is as far as I have come:

A List is a collection of items.

Properties of a list:

name of the list

list of items

list of sublists

size



A Category is a class or division in a scheme of classification.

Properties of a category:

name

list of items

list of subcategories

size
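To make the comparison concrete, the two property lists can be written down as record types. This is only a minimal sketch: the class names `ListEntity` and `CategoryEntity`, the field names, and the choice of Python dataclasses are assumptions made for illustration and do not come from the original article.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ListEntity:
    name: str
    items: list = field(default_factory=list)
    sublists: list = field(default_factory=list)
    size: int = 0


@dataclass
class CategoryEntity:
    name: str
    items: list = field(default_factory=list)
    subcategories: list = field(default_factory=list)
    size: int = 0


def attribute_names(cls):
    """Return the declared attribute names of a dataclass, in order."""
    return [f.name for f in fields(cls)]


def attribute_types(cls):
    """Return the declared attribute types of a dataclass, in order."""
    return [f.type for f in fields(cls)]


# Attribute names that appear in only one of the two entities.
differing_names = set(attribute_names(ListEntity)) ^ set(attribute_names(CategoryEntity))

# True when both entities declare the same sequence of attribute types.
same_shape = attribute_types(ListEntity) == attribute_types(CategoryEntity)

print(differing_names, same_shape)
```

Apart from the label on the nested collection (sublists versus subcategories), the two records have the same shape, so a purely structural comparison cannot tell them apart.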



Described this way, a List and a Category have the same properties, which points to one of the following:

My inability to describe the entities in terms of their attributes, or...

It is impossible to describe an entity in terms of its attributes alone, and something else is required, or...

The difference between entities is not due to a difference in attributes.



It is possible to compare entities.

One way to do so is to compare their properties and behavior first.

How do we compare two entities?

Should we compare the attributes of those entities to describe the difference between them?

If the attributes of two entities are the same, could the entities still be differentiated by the values of those attributes and be grouped into a common class?

Or does it mean that two entities with identical attributes are the same?

Given two entities, I will first explore the possibility of comparing them by comparing their attributes and behavior. Comparing the attributes could mean comparing the number of attributes as well as the types of those attributes. If one entity has a property/attribute that the other lacks, the two entities are different. If the properties themselves differ between the entities, that also means the entities are different.
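The comparison rule described above can be sketched as a small function. It assumes each entity is summarized as a mapping from attribute name to attribute type; the function name `compare_attributes` and the two sample schemas are hypothetical and serve only as illustration.

```python
def compare_attributes(a: dict, b: dict) -> dict:
    """Compare two entity descriptions given as {attribute name: type}."""
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    # Attributes present in both entities but declared with different types.
    type_mismatch = sorted(k for k in a.keys() & b.keys() if a[k] is not b[k])
    return {
        "only_in_a": only_in_a,
        "only_in_b": only_in_b,
        "type_mismatch": type_mismatch,
        "same": not (only_in_a or only_in_b or type_mismatch),
    }


# Hypothetical schemas, used only to exercise the function.
entity_a = {"name": str, "price": float, "size": int}
entity_b = {"name": str, "price": str}

report = compare_attributes(entity_a, entity_b)
print(report)
```

By this rule, two entities count as the same only when no attribute is missing on either side and no shared attribute differs in type. The rule says nothing about attribute values or behavior.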

So, if the attributes are identical, does that mean the entities are the same? Say I have two instances with the same properties/attributes, where only the values of those attributes differ. Would that imply that these instances belong to the same entity? Could the behavior of those instances also make a difference to whether they belong to the same entity or to different ones? Let's look at a few examples:

Blue vs. Green


It's not hard to see that both of these instances belong to the class 'Color', though the two colors behave differently. Blue and Green also signify different things and have different uses. So, does this mean that two instances can belong to the same class even if their behavior is not the same? One answer is that 'Color' could have two subclasses, one with 'Blue' as its instance and the other with 'Green' as its instance. Each of these subclasses could add to the default behavior of a color.



So Blue and Green, while both part of the Color class, can behave differently. One thing to note here is that we created the hierarchy according to the context, keeping the difference in behavior in mind. For example, since Green can be used in a traffic signal and Blue cannot, Color could be given two subclasses: TrafficSignalColor (whose instance would be Green) and NonTrafficSignalColor (whose instance would be Blue). If the difference were something else, the subclassing would be done differently. We could try to identify all such usages and argue that subclassing can be made non-contextual, but the success of that exercise would depend entirely on how capable we are of finding every behavioral aspect and usage of the instances.
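The traffic-signal subclassing described above might look like the following sketch. The class names come from the text; the method `usable_in_traffic_signal` is an assumed name, chosen here to stand in for the one behavior that this particular context cares about.

```python
class Color:
    def __init__(self, name: str):
        self.name = name

    def usable_in_traffic_signal(self) -> bool:
        # Default behavior of a color in this context (an assumption).
        return False


class TrafficSignalColor(Color):
    def usable_in_traffic_signal(self) -> bool:
        return True


class NonTrafficSignalColor(Color):
    # Inherits the default behavior unchanged.
    pass


green = TrafficSignalColor("Green")
blue = NonTrafficSignalColor("Blue")

for color in (green, blue):
    print(color.name, isinstance(color, Color), color.usable_in_traffic_signal())
```

Both instances are still Colors, and the hierarchy encodes exactly one contextual distinction. A different context, with a different distinction, would call for a different pair of subclasses.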



Another point to note: I am assuming that a class's instances exhibit properties, behavior, and usages, rather than the usual notion of just properties and behavior.



vs. A 'Rotomac' ball-point pen
a) An instance of that book could belong to multiple classes (if we allow the addition/removal of a few attributes), such as Book, or Novel (if we remove the attribute 'type'). If we keep just the author name, publication source, type, price, genre, release date, and release media, then the instance could belong to a class called 'Publication'.

If we keep adding or removing attributes, this instance could belong to any number of other classes, such as SaleableItems (if we keep price, type, release date, and release media), RentalItems, SoldItems (if we add sold date and buyer name), FavoriteList (if we add buyer name, etc.), and many more. So, there are a few things to notice here:



>> Which class an instance belongs to is determined by the attributes one can think of for that instance. In other words, the answer to 'where does this instance fit in?' is limited by your knowledge of, and ability to describe, that instance.

>> An instance could belong to a set of classes that are mutually exclusive, for example RentalItems, RentedItems, and SoldItems in the example above.
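One way to picture the points above is to treat each class as a set of required attributes and to ask which classes a given attribute set satisfies. The sketch below is only an illustration: the required sets are assumptions loosely read off the text, and the helper name `candidate_classes` is hypothetical.

```python
# Assumed attribute requirements per class, loosely following the text.
CLASS_REQUIREMENTS = {
    "Publication": {
        "author_name", "publication_source", "type", "price",
        "genre", "release_date", "release_media",
    },
    "SaleableItems": {"price", "type", "release_date", "release_media"},
    "SoldItems": {"sold_date", "buyer_name"},
    "FavoriteList": {"buyer_name"},
}


def candidate_classes(attributes: set) -> list:
    """Return every class whose required attributes are all present."""
    return sorted(
        name for name, required in CLASS_REQUIREMENTS.items()
        if required <= attributes
    )


book_attributes = {
    "author_name", "publication_source", "type", "price",
    "genre", "release_date", "release_media",
}
before_sale = candidate_classes(book_attributes)

# Adding attributes to the same instance widens the set of classes it fits.
after_sale = candidate_classes(book_attributes | {"sold_date", "buyer_name"})

print(before_sale)
print(after_sale)
```

The result depends entirely on which attributes were written down and which classes were thought of in advance, which is the limitation the first point describes.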
b) An instance of that pen could also belong to multiple classes, such as Pen, ball-point pen, etc. From the point of view of its usage, it could also belong to a class such as WritingDevice or WritingTool. From the point of view of where it fits in, it could belong to a class such as StationaryItem, GeometryBoxItems, or SchoolBagItems. As above, this instance could also belong to many other classes, such as SaleableItems (if we keep price, type, release date, and release media), RentalItems, SoldItems (if we add sold date and buyer name), FavoriteList (if we add buyer name, etc.), and many more. So, a few more things to note here:

>> By looking from different points of view, we can find many sets of classes that an instance can belong to. So, which class an instance belongs to is limited by our imagination and by the current context in which we are looking at the instance.

c) Both the Pen and the Book, as described above in terms of attributes, have some attributes in common, such as type, price, genre, release date, and release media. So, a few more things to note here:
>> If we ignore the rest of the attributes, we can make both of these instances belong to the same class. This is a familiar conclusion, since common attributes can generally be refactored into a superclass, with the remaining attributes going to the subclasses.



>> As we can see (in a) and b), and in the point above), both instances belong to SaleableItems, RentalItems, SoldItems, FavoriteList, etc.
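The refactoring mentioned in c) can be sketched as follows. The superclass name `Item` is an assumption, `item_type` stands in for the attribute the text calls 'type', and `ink_color` is a purely hypothetical pen-specific attribute, since the text does not list the pen's own attributes.

```python
from dataclasses import dataclass, fields


@dataclass
class Item:
    # Attributes the text names as common to the book and the pen.
    item_type: str
    price: float
    genre: str
    release_date: str
    release_media: str


@dataclass
class Book(Item):
    # Book-specific attributes mentioned in a).
    author_name: str
    publication_source: str


@dataclass
class Pen(Item):
    # Hypothetical pen-specific attribute, for illustration only.
    ink_color: str


common_attributes = [f.name for f in fields(Item)]
book_only = [f.name for f in fields(Book) if f.name not in common_attributes]
pen_only = [f.name for f in fields(Pen) if f.name not in common_attributes]

print(common_attributes, book_only, pen_only)
```

Ignoring `book_only` and `pen_only` leaves two records that are indistinguishable by structure, which is why both instances can be placed in the same class.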

Now this is where the real trouble begins for me. As per Object Oriented Programming (OOP) principles, an object must exhibit an inheritance hierarchy. In other words, if an object (an instance, in our case) belongs to multiple classes, these classes should be in a hierarchical order. But the classes SaleableItems, RentalItems, SoldItems, and FavoriteList are not in a hierarchical order; they are either mutually exclusive or do not fall in the same hierarchy. This could mean two things:

i) Either I have made a mistake in reaching the conclusion in c) above (which I have tried my best to avoid), ...



ii) Or the OOP principle of inheritance is erroneous (which is a big conclusion to draw, since OOP has been around for a long time and is widely used and appreciated, and I haven't read about this problem before; probably no one has).

As much as I would like to rethink, revise, and adjust point 'i)' above, it appears to me that 'ii)' is true. It is easy to verify using any language that claims to support OOP fully, like Java. Say there are classes called Pen and Book. As already explained, both of them can belong to SaleableItems, RentalItems, SoldItems, FavoriteList, etc. Since these classes are not in a hierarchical order (and a Java class can extend only one other class), you won't be able to up-cast a Pen or a Book object to all of them (unless, of course, you figure out a way to force a hierarchy here, which would be semantically incorrect).
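To see why a forced hierarchy is semantically wrong, consider the sketch below. It is written in Python rather than Java and restricts itself to a single chain of base classes to mimic single class inheritance. The class names are singular forms of the groupings in the text, and the particular ordering of the chain is an arbitrary assumption, which is exactly the problem.

```python
class SaleableItem:
    pass


class RentalItem(SaleableItem):   # forced: "every rental item is saleable"
    pass


class SoldItem(RentalItem):       # forced: "every sold item is a rental item"
    pass


class Pen(SoldItem):
    pass


pen = Pen()

# The up-casts now all succeed, but only because of the forced chain.
pen_is_rental = isinstance(pen, RentalItem)
sold_implies_rental = issubclass(SoldItem, RentalItem)

print(pen_is_rental, sold_implies_rental)
```

Both checks succeed, yet they assert that anything sold is also available for rent. The chain makes the type checks pass at the cost of stating relationships that do not hold.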

Some more questions that follow from the conclusion above:

How do we represent this in terms of OOP? Is there a work-around?

How do we represent this in terms of RDBMS and HDBMS, at least, since they may not have all the restrictions of OOP?
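On the relational side, one common way to let an item belong to several unrelated groupings is a many-to-many membership table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`item`, `item_class`, `membership`) and the sample memberships are assumptions for illustration, not a schema proposed by the article, and the sketch does not address the hierarchical (HDBMS) case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_class (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE membership (
    item_id  INTEGER NOT NULL REFERENCES item(id),
    class_id INTEGER NOT NULL REFERENCES item_class(id),
    PRIMARY KEY (item_id, class_id)
);
""")

conn.executemany("INSERT INTO item (name) VALUES (?)", [("Book",), ("Pen",)])
conn.executemany(
    "INSERT INTO item_class (name) VALUES (?)",
    [("SaleableItems",), ("RentalItems",), ("SoldItems",), ("FavoriteList",)],
)

# Illustrative memberships; no hierarchy among the classes is required.
conn.executemany(
    """INSERT INTO membership (item_id, class_id)
       SELECT item.id, item_class.id FROM item, item_class
       WHERE item.name = ? AND item_class.name = ?""",
    [("Book", "SaleableItems"), ("Book", "FavoriteList"),
     ("Pen", "SaleableItems"), ("Pen", "RentalItems")],
)


def classes_of(item_name: str) -> list:
    """Return the names of all classes an item belongs to, sorted by name."""
    rows = conn.execute(
        """SELECT item_class.name FROM membership
           JOIN item ON item.id = membership.item_id
           JOIN item_class ON item_class.id = membership.class_id
           WHERE item.name = ? ORDER BY item_class.name""",
        (item_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(classes_of("Book"))
print(classes_of("Pen"))
```

Membership here is plain data, so adding or removing an item from a grouping does not touch any type hierarchy. Mutual exclusivity between groupings would need an extra constraint, which this sketch leaves out.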



I feel this is not a satisfactory conclusion, since the semantics of List and Category are entirely different. Even if the attributes are the same and the behaviour varies, objects could still belong to the same class (see the Blue vs. Green example above). Even if I consider the usage of an entity in addition to its properties and behaviour, no real progress is made.

More questions than answers have emerged from this article, and I am more confused than ever. As I research and explore classes, objects, entities, and attributes, it is becoming increasingly obvious to me that there is more to OOP than I know (or have been told). I guess more limitations will surface as I continue to explore OOP and entity/attribute relationships.

Translated from: https://www.experts-exchange.com/articles/4182/List-vs-Category-Part-1-Introduction-to-the-problems.html
