What’s Wrong with the Thread Object? 5

July 7, 2009

Posted by Bartosz Milewski under Programming
[9] Comments

I started writing a post about implementing actors in D when I realized that there was something wrong with the way thread spawning interacts with data sharing. Currently D’s Thread class closely mimics its Java counterpart, so I started wondering if this may be a more general problem–a problem with mixing object-oriented paradigm with multithreading.

In functional languages like Erlang or Concurrent ML, you start a thread by calling a function–spawn or create, respectively. The argument to this function is another function–the one to be executed in a new thread. If you think of a new thread as a semi-separate application, the thread function is its “main”. You may also pass arguments to it–the equivalent of argc, argv, only more general. Those arguments are, of course, passed by value (we’re talking functional programming after all).

In non-functional languages it’s possible and often desirable to share data between threads. It’s very important to know which variables are shared and which aren’t, because shared variables require special handling–they need synchronization. You may share global variables with a thread, or you may pass shared variables to it during thread creation. You may also provide access to more shared variables during the thread’s lifetime by attaching and detaching them from the original shared variables–that’s how message queues may be described in this language.

In the object-oriented world everything is an object so, predictably, a (capital letter) Thread is an object too. An object combines data with code. There is a piece of data associated with every thread–the thread ID or a handle–and there are things you can do to a thread, like pause it, wait for its termination, etc.

But there’s more: A Thread in Java has a thread function. It’s a method called run. The user defines his or her own thread by inheriting from Thread and overriding run. Since run takes no arguments, data has to be passed to the new thread by making it part of the derived thread object. In general a thread object contains two types of data: shared and non-shared. The non-shared data are the value arguments passed by the creator to the thread function, possibly some return values, plus state used internally by the thread function. Thread data may form some logical abstraction or, as it often happens, have the structure of a kitchen sink after a dinner party.

Because of the presence of shared state, a Thread object must be considered shared, thus requiring synchronization. As I described in my previous posts, public methods of a shared object must either be synchronized or lock free. But what about the run method? It cannot be synchronized because that would make the whole thread object inaccessible to other threads, including its creator (which may be okay for a daemon thread, but wold be a fatal flaw in general).

It makes perfect sense for run to be private, since nobody should call it from the outside (or, for that matter, from the inside). But the reason for private methods not requiring synchronization is that they are only called from public methods, which must be synchronized. This is not true for run–run is not called from under any lock! So essentially run has unfettered access to all (potentially shared) data stored in this. Unless the programmer is very disciplined, the potential for data races is high. And that’s where Java stands right now (the Runnable interface has the same problems).

If you’ve been following my blog, you know that I’m working on a type system that eliminates races. If I can convince Walter and Andrei, this system might be implemented in the D programming language. As I mentioned, D’s treatment of threads comes directly from Java and therefore is inherently unsafe. Like in Java, D’s Thread object has a run method.

So how could D, with the race-free type system, improve on Java? One possibility is to make the run method lockfree. A lockfree method (public or private) is not synchronized but its access to this is severely restricted. It can only operate on lockfree data members (if any), unless it takes the object’s lock or calls a synchronized method (all public methods of a shared object are, by default, synchronized). Let me give you a small example:
class Counter: Thread {
public:
// public methods are implicitly synchronized
void inc() { ++_cnt; }
void dec() { --_cnt; }
private:
override void run() lockfree {
inc(); // ok: calling a synchronized method
wait(10000);
synchronized(this) { // ok: explicit synchronization on this
_cnt = _cnt * _cnt;
}
// _cnt /= 2; // error: not synchronized!
}
int _cnt;
}
// usage:
auto cnt = new shared Counter;
cnt.start;
cnt.inc;
cnt.join;

This approach would work and guarantee freedom from races, but I don’t think it fits D. Unlike Java, which is so OO that even main is a method of an object, D is a multi-paradigm language. It doesn’t have to force threads into an OO paradigm. And the thread function, being a thread equivalent of main doesn’t have to be a method of any object. In my opinion, the functional approach to thread creation would serve D much better.

Here’s how I see it:
class Counter {
public:
void inc() { ++_cnt; }
void dec() { --_cnt; }
int get() const { return _cnt; }
void set(int cnt) { _cnt = cnt; }
private:
int _cnt;
}

void counterFun(shared Counter cnt) {
cnt.inc;
Thread.sleep(10000);
synchronized(cnt) {
int c = cnt.get;
cnt.set(c * c);
}
}
// usage:
auto cnt = new shared Counter;
Thread thr = Thread.spawn(&counterFun, cnt);
thr.start;
cnt.inc;
thr.join;

I find it more logical to have start and join operate on a different object, thr, rather than on the counter, cnt. The static template method spawn accepts a function that takes an arbitrary number of arguments that are either values or shared or unique objects. In particular, you could pass to it your (shared) communication channels or message queues to implement the message-passing paradigm.

Such spawn primitive could be used to build more complex classes–including the equivalents of a Java’s Thread or a Scala’s Actor.
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
GeoPandas是一个开源的Python库,旨在简化地理空间数据的处理和分析。它结合了Pandas和Shapely的能力,为Python用户提供了一个强大而灵活的工具来处理地理空间数据。以下是关于GeoPandas的详细介绍: 一、GeoPandas的基本概念 1. 定义 GeoPandas是建立在Pandas和Shapely之上的一个Python库,用于处理和分析地理空间数据。 它扩展了Pandas的DataFrame和Series数据结构,允许在其中存储和操作地理空间几何图形。 2. 核心数据结构 GeoDataFrame:GeoPandas的核心数据结构,是Pandas DataFrame的扩展。它包含一个或多个列,其中至少一列是几何列(geometry column),用于存储地理空间几何图形(如点、线、多边形等)。 GeoSeries:GeoPandas中的另一个重要数据结构,类似于Pandas的Series,但用于存储几何图形序列。 二、GeoPandas的功能特性 1. 读取和写入多种地理空间数据格式 GeoPandas支持读取和写入多种常见的地理空间数据格式,包括Shapefile、GeoJSON、PostGIS、KML等。这使得用户可以轻松地从各种数据源中加载地理空间数据,并将处理后的数据保存为所需的格式。 2. 地理空间几何图形的创建、编辑和分析 GeoPandas允许用户创建、编辑和分析地理空间几何图形,包括点、线、多边形等。它提供了丰富的空间操作函数,如缓冲区分析、交集、并集、差集等,使得用户可以方便地进行地理空间数据分析。 3. 数据可视化 GeoPandas内置了数据可视化功能,可以绘制地理空间数据的地图。用户可以使用matplotlib等库来进一步定制地图的样式和布局。 4. 空间连接和空间索引 GeoPandas支持空间连接操作,可以将两个GeoDataFrame按照空间关系(如相交、包含等)进行连接。此外,它还支持空间索引,可以提高地理空间数据查询的效率。
SQLAlchemy 是一个 SQL 工具包和对象关系映射(ORM)库,用于 Python 编程语言。它提供了一个高级的 SQL 工具和对象关系映射工具,允许开发者以 Python 类和对象的形式操作数据库,而无需编写大量的 SQL 语句。SQLAlchemy 建立在 DBAPI 之上,支持多种数据库后端,如 SQLite, MySQL, PostgreSQL 等。 SQLAlchemy 的核心功能: 对象关系映射(ORM): SQLAlchemy 允许开发者使用 Python 类来表示数据库表,使用类的实例表示表中的行。 开发者可以定义类之间的关系(如一对多、多对多),SQLAlchemy 会自动处理这些关系在数据库中的映射。 通过 ORM,开发者可以像操作 Python 对象一样操作数据库,这大大简化了数据库操作的复杂性。 表达式语言: SQLAlchemy 提供了一个丰富的 SQL 表达式语言,允许开发者以 Python 表达式的方式编写复杂的 SQL 查询。 表达式语言提供了对 SQL 语句的灵活控制,同时保持了代码的可读性和可维护性。 数据库引擎和连接池: SQLAlchemy 支持多种数据库后端,并且为每种后端提供了对应的数据库引擎。 它还提供了连接池管理功能,以优化数据库连接的创建、使用和释放。 会话管理: SQLAlchemy 使用会话(Session)来管理对象的持久化状态。 会话提供了一个工作单元(unit of work)和身份映射(identity map)的概念,使得对象的状态管理和查询更加高效。 事件系统: SQLAlchemy 提供了一个事件系统,允许开发者在 ORM 的各个生命周期阶段插入自定义的钩子函数。 这使得开发者可以在对象加载、修改、删除等操作时执行额外的逻辑。
SQLAlchemy 是一个 SQL 工具包和对象关系映射(ORM)库,用于 Python 编程语言。它提供了一个高级的 SQL 工具和对象关系映射工具,允许开发者以 Python 类和对象的形式操作数据库,而无需编写大量的 SQL 语句。SQLAlchemy 建立在 DBAPI 之上,支持多种数据库后端,如 SQLite, MySQL, PostgreSQL 等。 SQLAlchemy 的核心功能: 对象关系映射(ORM): SQLAlchemy 允许开发者使用 Python 类来表示数据库表,使用类的实例表示表中的行。 开发者可以定义类之间的关系(如一对多、多对多),SQLAlchemy 会自动处理这些关系在数据库中的映射。 通过 ORM,开发者可以像操作 Python 对象一样操作数据库,这大大简化了数据库操作的复杂性。 表达式语言: SQLAlchemy 提供了一个丰富的 SQL 表达式语言,允许开发者以 Python 表达式的方式编写复杂的 SQL 查询。 表达式语言提供了对 SQL 语句的灵活控制,同时保持了代码的可读性和可维护性。 数据库引擎和连接池: SQLAlchemy 支持多种数据库后端,并且为每种后端提供了对应的数据库引擎。 它还提供了连接池管理功能,以优化数据库连接的创建、使用和释放。 会话管理: SQLAlchemy 使用会话(Session)来管理对象的持久化状态。 会话提供了一个工作单元(unit of work)和身份映射(identity map)的概念,使得对象的状态管理和查询更加高效。 事件系统: SQLAlchemy 提供了一个事件系统,允许开发者在 ORM 的各个生命周期阶段插入自定义的钩子函数。 这使得开发者可以在对象加载、修改、删除等操作时执行额外的逻辑。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值