Data types should not be considered the source of their behaviour

cullen2012

Published 2020-09-09 05:45:40

Tags: database, programming languages, python, java, big data

Original article: https://habr.com/en/post/504898/

When you start learning a programming language (or programming in general), the first few chapters of most books introduce data types (or just types). We rarely give this subject the attention it deserves, because it seems so simple, right? Well, I think there is one small detail that can lead to big misunderstandings about what a data type really is.

Let's start with primitive data types such as byte, number, string, boolean and so on. Most programming languages support them in one form or another, and in most cases they have no functions attached to them. The interpreter (or compiler) of the language applies operators or functions to them, so we don't have to deal with that ourselves. For example, we don't have to implement a binary operator like plus (+) for numbers and strings. I want to emphasize that the (+) operator can also be considered a simple function; it is just one syntactic variation of how we write it in code. In PHP, for example, strings are concatenated with the dot operator (.), not with (+). So, for primitive data types we don't have to declare and implement the operations that apply to them.
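
The point that an operator is just a function in different syntax can be sketched in Java. The class name PlusAsFunction and the helpers sum() and joined() are made up for this illustration; Integer::sum and BinaryOperator come from the JDK.

```java
import java.util.function.BinaryOperator;

final class PlusAsFunction {
    // The same addition the + operator performs, written as a named function.
    static int sum(int left, int right) {
        return left + right;
    }

    // String concatenation, which Java also spells with +.
    static String joined(String left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        // Integer::sum is the JDK's own function form of + for ints.
        BinaryOperator<Integer> plus = Integer::sum;
        System.out.println(plus.apply(2, 3));
        System.out.println(sum(2, 3));
        System.out.println(joined("2", "3"));
    }
}
```

Whether we write 2 + 3 or sum(2, 3), neither the number nor the string had to declare the operation itself.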

Things get more interesting with a type like array, which in most cases is just a set of pointers to values of other types. In code we declare an array with brackets, [ and ], and list its elements (the pointers) inside them. For example, this is how we declare an array in JavaScript:

/* In JavaScript an array can hold elements of different
    types, because it is a dynamically typed language */

const array = [ 1, '2', 3, true ]

Most languages attach functions like add(), size(), get() and so on to this type, so we can do the following:

// Pseudo code
array = []
array.add(1)
array.add(true)
array.get(0) // is 1

Of course, in a language like Java a plain array only lets us assign and read elements directly by index, but in most dynamic languages we can do exactly what is shown above.
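
To make the contrast concrete, here is a small Java sketch, assuming nothing beyond the JDK. The class and method names are invented for the illustration. A plain array is reached only through indexes, while java.util.ArrayList is a type that carries add() and get() as attached methods.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayVersusList {
    // A plain Java array: fixed length, elements reached only by index.
    static Object firstOfPlainArray() {
        Object[] array = new Object[2];
        array[0] = 1;
        array[1] = true;
        return array[0];
    }

    // java.util.ArrayList: the type carries add(), get(), size() as methods.
    static Object firstOfList() {
        List<Object> list = new ArrayList<>();
        list.add(1);
        list.add(true);
        return list.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstOfPlainArray());
        System.out.println(firstOfList());
    }
}
```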

Before we continue, let's look at the definition of data type given on Wikipedia:

“In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data.”

But do we really need to put the information about that intention inside the types themselves? That's a good question.

Behavioural Dualism in Data Types

When you look at the following expression:

array.add(5)

Don't you ask yourself why we don't write it the other way around:

5.addToArray(array)

You might think this is ridiculous nonsense, but is it?

In object-oriented languages we have interfaces, which are usually represented as a set of operations or methods that can be applied to values of that type. Basically, we define custom types through such abstractions. After declaring an interface, we have to implement its operations in some class that implements it. Let's take a look at the following interface in Java:

// Simplified variation of List from the JDK (just for Object)
import java.util.Comparator;

public interface List {
  int size();
  boolean isEmpty();
  boolean contains(Object obj);
  Object[] toArray();
  Object get(int index);
  Object set(int index, Object obj);
  void add(Object obj);
  void remove(Object obj);
  void sort(Comparator<? super Object> c);
  void clear();
  /* many, many other methods... */
}

So, in an interface we don't implement methods, we only declare which methods it supports (so that we can implement them later in classes based on that interface).
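
As a minimal sketch of that split between declaring and implementing, consider a hypothetical two-method interface Sized and a class Words based on it. Both names are invented for this example and are not part of the JDK.

```java
// Hypothetical two-method interface: declarations only, no bodies.
interface Sized {
    int size();
    boolean isEmpty();
}

// A class based on the interface supplies the actual behaviour.
final class Words implements Sized {
    private final String[] words;

    Words(String... words) {
        this.words = words;
    }

    @Override
    public int size() {
        return words.length;
    }

    @Override
    public boolean isEmpty() {
        return words.length == 0;
    }
}

final class SizedDemo {
    public static void main(String[] args) {
        Sized sized = new Words("a", "b");
        System.out.println(sized.size());
    }
}
```

Note that the type Words had to be built from its behaviour: it would not compile without both methods, whether or not the program ever calls them.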

But here is the thing: in theory there is an unlimited number of methods or operations for every type. If you think about the List interface for a second, you can come up with many ideas: save a list to a database, send its representation in an HTTP request, cache it in memory or convert it to a string. We can do literally whatever we want. And if you look at the real List from the java.util package of the JDK, you'll see tons of abstract methods, which every class must(!) override if it is to be an implementation of List. But do we really need all of them in our program?

Another problem is that, for some reason, we have decided that a certain operation belongs to a certain type. That's what I would call “behavioural dualism in data types”. Let's take a look at the following two interfaces, Teacher and Student:

public interface Teacher {
  void giveInformation(Student student, Information information);
}

public interface Student {
  Assessment studyInformation(Information information);
}

Sounds logical, right? Well, is the following logical too?

public interface Student {
  Information getInformation(Teacher teacher);
}

public interface Information {
  Assessment processBy(Student student);
}

Or maybe we just put all the logic in one place?

public interface Information {
  void transfer(Teacher teacher, Student student);
  Assessment processBy(Student student);
}

What about another place?

public interface Assessment {
  void give(Teacher teacher, Student student, Information information);
}

So, which of these four variations is correct? Or maybe all of them are correct? Are there any others? You might also say that it depends, right? But how do we decide on the right way to represent behaviour in our program?

Looking at these interfaces, you can reduce all of the logic to one simple function:

// Pseudo code
Assessment assessment = givenAssessment(
  fromTeacher, toStudent, forInformation
)

where Teacher, Student and Information are just data structures for the function givenAssessment(), which produces another structure, Assessment. In this case we always deal with real data, because even though givenAssessment() is a function, it actually represents the Assessment, which is expressed in the name of the function. This is actually the reason why I use only nouns qualified by verb-derived adjectives (participles such as “given”) in function names: it makes the code declarative. And if you name the arguments properly, you can read the code like plain English:

“It's a given assessment from a teacher to a student for the information (which the teacher gave to the student).”
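
A possible Java rendering of that single function is sketched below. The record fields and the grading rule are placeholder assumptions; the original only names the types and the function.

```java
final class AssessmentExample {
    // Plain data structures: they hold values and define no behaviour of their own.
    record Teacher(String name) {}
    record Student(String name) {}
    record Information(String topic) {}
    record Assessment(Teacher from, Student to, Information about, int grade) {}

    // The single point of behaviour. The grading rule is a placeholder assumption.
    static Assessment givenAssessment(
            Teacher fromTeacher, Student toStudent, Information forInformation) {
        int grade = forInformation.topic().isEmpty() ? 0 : 5;
        return new Assessment(fromTeacher, toStudent, forInformation, grade);
    }

    public static void main(String[] args) {
        Assessment assessment = givenAssessment(
                new Teacher("Ms. Smith"), new Student("Alex"), new Information("data types"));
        System.out.println(assessment);
    }
}
```

None of the four records needs to know about the others; only the function ties them together.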

Behaviour does not belong to a data type

The main point I am trying to make is that operations, functions or methods in our code should not be attached to any type, because any function can process several different types, which co-exist only in the context that the function represents. In other words, data types should not dictate which functions they are applicable to; it is the functions that dictate which types they can process.

So, instead of

array.add(element)

we need to be able to write something like:

array = arrayWithNewElement(array, element)

Or instead of

db.save(user)

we can write something like:

user = savedUserIntoDatabase(db, user)
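
Both functions can be sketched in Java as free-standing static methods. Everything except the two function names is an assumption made for this illustration: the User and Database records, the in-memory map standing in for a real database, and the rule for assigning ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FreeFunctions {
    // Hypothetical data structures for the sketch.
    record User(Long id, String name) {}
    record Database(Map<Long, User> users) {}

    // Returns a new list; the list passed in is left untouched.
    static <T> List<T> arrayWithNewElement(List<T> array, T element) {
        List<T> result = new ArrayList<>(array);
        result.add(element);
        return result;
    }

    // Stores the user in the in-memory "database" and returns the stored user,
    // which now carries an id if it had none.
    static User savedUserIntoDatabase(Database db, User user) {
        long id = user.id() != null ? user.id() : db.users().size() + 1L;
        User saved = new User(id, user.name());
        db.users().put(id, saved);
        return saved;
    }

    public static void main(String[] args) {
        List<Integer> array = arrayWithNewElement(List.of(1, 2), 3);
        System.out.println(array);

        Database db = new Database(new HashMap<>());
        User user = savedUserIntoDatabase(db, new User(null, "Alex"));
        System.out.println(user);
    }
}
```

In both cases the caller reads the result from the return value instead of relying on a method that silently changes its receiver.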

So, why is the suggested approach better? Well, I can come up with some pros:

We have a single point of behaviour: one function that does the whole job.
We don't have to build (or implement) types from their behaviour; the only thing we need is data. When necessary, we can add functions that process certain types of our data.
We no longer have the so-called behavioural dualism in data types, where it's not clear why certain methods are attached to certain types, because it is simply impossible once we separate data from behaviour. This is really important, because when we mix them we often create meaningless types that don't even exist in our business logic.
Our code is decomposed. If we create proper functions, all we need to do is pass arguments and get the result by invoking them.
Our code becomes more declarative if we name functions properly. By proper naming I mean that the name expresses the whole idea of the behaviour behind the function, the result we get from it and maybe sometimes even the arguments it requires. So, instead of thinking about how the result is built, we actually see the result. For example,

List<User> users = filteredUsersFromDatabaseByAgeAndGenderAndWhoIsFriendWithSpecifiedUser(db, 25, "women", someUser)

If you think that's very verbose, well, just read it, and you'll see that there is nothing to remove from the name. The only thing you need to do is read; you don't have to guess. After reading just the name of the function you'll understand what structure you get from it (List<User>), what it does (filtering) and, of course, what arguments it needs (database, age, gender and the specified user who is a friend).
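
For completeness, here is one way such a function might look in Java. Only the function name and the argument order come from the call above; the shape of User and Database, including the friendIds field, is assumed for the sake of a runnable sketch.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class UserQueries {
    // Hypothetical data structures; friendIds holds the ids of a user's friends.
    record User(long id, int age, String gender, Set<Long> friendIds) {}
    record Database(List<User> users) {}

    static List<User> filteredUsersFromDatabaseByAgeAndGenderAndWhoIsFriendWithSpecifiedUser(
            Database db, int age, String gender, User specifiedUser) {
        return db.users().stream()
                .filter(user -> user.age() == age)
                .filter(user -> user.gender().equals(gender))
                .filter(user -> user.friendIds().contains(specifiedUser.id()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        User someUser = new User(1L, 30, "men", Set.of(2L));
        Database db = new Database(List.of(
                someUser,
                new User(2L, 25, "women", Set.of(1L)),
                new User(3L, 25, "women", Set.of())));
        List<User> users = filteredUsersFromDatabaseByAgeAndGenderAndWhoIsFriendWithSpecifiedUser(
                db, 25, "women", someUser);
        System.out.println(users.size());
    }
}
```

Each filter step corresponds to one part of the function name, so the body can be checked against the name clause by clause.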

Sure, we could create a dozen interfaces like DB, User, Gender, Friend and others, and create a lot of complexity. But if we just need to get a real result, we can do it with one function.

I don't know about you, but I would gladly read such long code all the time. I like to read; I don't like to guess.

This is it.

Translated from: https://habr.com/en/post/504898/
