Files and Operating Systems -- Technical Article Sharing 02

Latest recommended article published on 2024-10-17 14:17:43

5yuan

Tags: operating systems, ELF files

Article link: https://blog.csdn.net/weixin_43912361/article/details/120243717

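This article explores how an operating system handles opening a file on double-click, tracing the approaches from early systems with no attributes at all through type codes, file extensions, creator codes, file associations, magic numbers, MIME types, and Uniform Type Identifiers. Each scheme has its strengths and weaknesses: file extensions are easy for users to understand but can lead to security problems, while magic numbers offer a way to determine a file's type but the check is complex and not always reliable. As technology has evolved, modern operating systems have adopted more flexible strategies for managing file types and program associations.

(Summary generated automatically by CSDN.)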

The Perils of File Typing

Source article: https://invisibleup.com/articles/34/

Where the Question Comes From

Suppose you double-click on a file on your computer. You’re doing this so you can open the file and work with it. But does your operating system know what that means? How does it know what to open the file in? Let’s look at some solutions that have been proposed over the years to solving this issue.

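Feature to implement: opening a file by double-clicking it.

Question: how does the OS handle the double-click action internally? Concretely, how would a function Click(file) be implemented?

Solutions

A very simple idea is to attach a property to every file and let the OS decide, based on that property, what to do when the file is double-clicked.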

click(file) {
	switch file.property:
	....
}

There are many ways to attach such a property. The options are:
Nothing, Type Codes, File Extensions, Creator Codes, File Associations, Magic Numbers, MIME types, and Uniform Type Identifiers.

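On Extensibility

Consider the following setting. The engineer designing the operating system defines two file types:

Executable binary files.
Character-sequence (text) files.

The engineer assigns these two file types the property values 0x01 and 0x02, respectively, and designs the following function for the click event.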

func click(file f) {
	switch f.property{
		case 0x01:
		// load program to run
		case 0x02:
		// run text program and load this file to text process
	}
}

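This way the OS can easily tell what to do with each file: it only needs to look at the property set on that file.

Now a new requirement comes up: here is a Word file, and I do not want to open it with the system's default text program; I want to open it with Word. A natural solution is to add one more property value, 0x03: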

func click(file f) {
	switch f.property{
		case 0x01:
		// load program to run
		case 0x02:
		// run text program and load this file to text process
		case 0x03:
		// run word program and load this file to word process
	}
}

OK, so when more file types come along, we only need to add property values and modify the OS source code. For example:

func click(file f) {
	switch f.property{
		case 0x01:
		// load program to run
		case 0x02:
		// run text program and load this file to text process
		case 0x03:
		// run word program and load this file to word process
		case 0x04:
		// run imageViewer program and load this file to imageViewer process
		case 0x05:
		// run player program and load this file to player process
		...
	}
}

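Here the property values 0x04 and 0x05 stand for image files and video files, respectively.

This design has an obvious drawback: whenever a new file type needs to be opened with a new program, we have to rewrite and recompile the OS, which is a lot of trouble. Likewise, uninstalling a program also requires changing this code.

Notice that what each case does has essentially the same shape, so we can write it more concisely: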

func click(file f) {
	switch f.property{
		case 0x01:
		// load program to run
		other:
		// find the program registered for f.property, then pipe the file to it
		String program_path = db.find(f.property)
		echo f.content | run program_path
	}
}

By maintaining a db file that stores every program together with its property value as a table entry, the double-click event can be implemented.
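The table-driven idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the `registry` dictionary stands in for the db file, the program paths are made up, and the function returns the command it would launch instead of piping the file content to the program as the pseudocode does.

```python
from typing import Dict, List

EXECUTABLE = 0x01  # property value for runnable binaries, as in the text

# Hypothetical stand-in for the db file: property value -> program path.
# The paths are invented for illustration.
registry: Dict[int, str] = {
    0x02: "/bin/text_editor",
    0x03: "/bin/word",
}


def click(path: str, prop: int) -> List[str]:
    """Return the command that a double click on `path` would launch."""
    if prop == EXECUTABLE:
        return [path]  # the file itself is the program
    program = registry.get(prop)
    if program is None:
        raise LookupError(f"no program registered for property {prop:#04x}")
    return [program, path]


# Installing or uninstalling a program only edits the table;
# click() itself never has to be rewritten or recompiled.
registry[0x04] = "/bin/image_viewer"  # install an image viewer
del registry[0x03]                    # uninstall the word processor
print(click("cat.img", 0x04))
```

The point of the design is that adding or removing a program becomes a data change rather than a code change.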

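Note that the db form works only because the code we write to handle these double-click events all has the same shape. When the handling code differs in shape, extra treatment is needed, as in the following example.

Now another requirement arrives: we want to add directory files and device files to this OS. The double-click function then has to be written like this: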

func click(file f) {
	switch f.property{
		case 0x01:
		// load program to run
		case 0x11:
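		/**
		 * Directory file. The steps below assume an ext file system.
		 * 1. Fetch the inode at the corresponding index from the file attribute area;
		 * 2. Parse the inode and read the data from the file data area;
		 * 3. Show the data on the **screen**.
		 */

		case 0x12:
		/**
		 * Device file.
		 * 1. Fetch the inode at the corresponding index from the file attribute area;
		 * 2. Parse the inode and **operate the device** to read/write data.
		 */
		other:
		// find the program registered for f.property, then pipe the file to it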
		String program_path = db.find(f.property)
		echo f.content | run program_path
	}
}

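To make the properties easier to tell apart, we put the ones whose double-click needs special behaviour into one attribute field, and the ones that are handled uniformly (run a program and load the file into it) into another attribute field. These two fields are called kind and type, and the click function can then be implemented as follows.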

func click(file f) {
	switch f.kind{
		case 0x01:
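		// program file
		case 0x02:
		/**
		 * Directory file. The steps below assume an ext file system.
		 * 1. Fetch the inode at the corresponding index from the file attribute area;
		 * 2. Parse the inode and read the data from the file data area;
		 * 3. Show the data on the **screen**.
		 */

		case 0x03:
		/**
		 * Device file.
		 * 1. Fetch the inode at the corresponding index from the file attribute area;
		 * 2. Parse the inode and **operate the device** to read/write data.
		 */
		case 0x04:
		/**
		 * Ordinary file
		 */
		// find the program registered for f.type, then pipe the file to it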
		String program_path = db.find(f.type)
		echo f.content | run program_path
	}
}

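This gives us a simple double-click handler.

To summarize: we give each file two attributes, kind and type. The OS performs different operations for files of different kinds; for files of different types it always does the same thing, namely run the corresponding program and feed the file into that process as input.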

kind 1
kind 2
kind 3
……
kind n
	type 1
	type 2
	type 3
	……
	type n
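The two-level scheme can be sketched in Python like this. It is a rough model under assumptions, not an actual OS routine: the `Kind` values follow the pseudocode above, while `File`, `TYPE_TABLE`, the program names, and the returned action strings are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class Kind(IntEnum):
    PROGRAM = 0x01
    DIRECTORY = 0x02
    DEVICE = 0x03
    REGULAR = 0x04


@dataclass
class File:
    name: str
    kind: Kind
    type: Optional[int] = None  # only meaningful when kind is REGULAR


# Hypothetical type table: type value -> program name.
TYPE_TABLE: Dict[int, str] = {0x01: "text_editor", 0x02: "word"}


def click(f: File) -> str:
    """Describe the action taken for a double click on `f`."""
    # Each kind needs its own code path.
    if f.kind == Kind.PROGRAM:
        return f"exec {f.name}"
    if f.kind == Kind.DIRECTORY:
        return f"list {f.name}"
    if f.kind == Kind.DEVICE:
        return f"device-io {f.name}"
    # Regular files all share one code path, driven by the type table.
    program = TYPE_TABLE.get(f.type)
    if program is None:
        return f"ask-user {f.name}"  # no association: let the user choose
    return f"{program} {f.name}"


print(click(File("report", Kind.REGULAR, type=0x02)))
```

New kinds still require new code, but new types only require a new row in the table.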

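This function leaves plenty of room for optimization, and it raises a number of questions, for example:

Does type have to be stored in two bytes?
Is a database required, or would another data structure do?
Different operating systems have different kind and type tables; when a file is sent over the network, how does the receiving side know which program to open it with?
Should the file attribute be stored as content in the data area, or as an attribute in the inode?
Why have a file attribute at all? Wouldn't right-clicking and choosing "Open with" do just as well?

In response to these questions the industry has come up with a variety of solutions, each with its own trade-off between pros and cons. Below are some of the classic ones: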

Nothing

[Image: https://invisibleup.com/articles/34/IBM709.jpg]

Used in: Early mainframes such as the IBM 704, etc.

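These systems take the view that files need no attributes at all: whatever slot you load something into determines what kind of file it is. For example, port A of the machine accepts programs, so you load the tape holding the program into port A; port B feeds data to the program, so you load the data tape into port B; port C is for output, so you mount a tape for storing the output on port C. You know which tapes are data and which are programs; the OS does not care at all. If you load a data tape into port A, the machine will execute that "data" as if it were a program.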

Type Codes

[Image: https://invisibleup.com/articles/34/appledos.gif]

Used in: Apple DOS for Apple II (1978-1980), HP-28 and HP-48 series, etc.

In this system every file has a name and a type code, similar to the kind attribute above.

I (Integer BASIC program)
A (Applesoft BASIC program)
B (Binary files; either assembled programs or data)
T (ASCII text files)

For each of these types Apple DOS has dedicated commands for interacting with the files.

RUN command -> Integer BASIC and Applesoft BASIC programs
BRUN command -> binary run, only worked on binary files.
OPEN, READ, WRITE, and CLOSE -> all worked only on ASCII text files.
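The pairing of commands and type codes listed above can be written down as a small lookup table. The sketch below only models that list in Python; the names `ALLOWED_TYPES` and `command_accepts` are made up here, and nothing about how Apple DOS implemented the check internally is implied.

```python
# Type codes from the list above, keyed by the commands that accept them.
ALLOWED_TYPES = {
    "RUN": {"I", "A"},   # Integer BASIC and Applesoft BASIC programs
    "BRUN": {"B"},       # binary files
    "OPEN": {"T"},       # ASCII text files
    "READ": {"T"},
    "WRITE": {"T"},
    "CLOSE": {"T"},
}


def command_accepts(command: str, type_code: str) -> bool:
    """Return True if `command` may be used on a file with `type_code`."""
    return type_code in ALLOWED_TYPES.get(command.upper(), set())


print(command_accepts("RUN", "A"))   # True
print(command_accepts("BRUN", "T"))  # False
```

Here the type code does not pick a program to launch; it restricts which commands the user is allowed to apply to the file.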

File Extensions

Used in: AMSDOS (Amstrad CPC), CP/M, MS-DOS, etc.

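The idea in these systems, MS-DOS included, is that a separate type attribute is unnecessary: the part of the file name after the . indicates which program should open or load the file. For example, REPORT.TXT should be loaded by a text editor. The word "should" matters, because the user still has to pick the program to open the file by hand. Tying extensions to programs is the job of File Associations.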
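Since an extension is just the tail of the file name, extracting it takes one standard-library call. The sketch below uses Python's `os.path.splitext`; the helper name `extension_of` is made up for illustration.

```python
import os.path


def extension_of(filename: str) -> str:
    """Return the text after the last dot, upper-cased, or "" if there is none."""
    _, ext = os.path.splitext(filename)
    return ext[1:].upper()


print(extension_of("REPORT.TXT"))       # TXT
print(extension_of("invoice.pdf.exe"))  # EXE
print(extension_of("README"))           # prints an empty line
```

The second call hints at why extensions can be a security concern: only the last component counts, and nothing ties it to what the file actually contains, so a misleading name is easy to construct.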

Creator Codes

[Image: https://invisibleup.com/articles/34/macinfo.gif]

Used in: Classic Mac OS

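The approach is to give every program a creator code and to tag each file with the creator code of the program it belongs to, so the creator code plays much the same role as the type attribute above. A dedicated database stores the association information, the programs, the icons, and so on.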
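A minimal sketch of the lookup, assuming a simplified model: `MacFile`, `APPLICATIONS`, and the four-character codes are hypothetical placeholders, not real Classic Mac OS structures or registered codes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MacFile:
    name: str
    creator: str  # code of the application this file belongs to


# Hypothetical stand-in for the database: creator code -> application.
APPLICATIONS: Dict[str, str] = {
    "WRDX": "Word Processor",
    "EDTX": "Plain Text Editor",
}


def application_for(f: MacFile) -> str:
    """Look up the application registered for the file's creator code."""
    return APPLICATIONS.get(f.creator, "<ask the user>")


letter = MacFile("Letter", creator="WRDX")
notes = MacFile("Notes", creator="EDTX")
print(application_for(letter))  # Word Processor
print(application_for(notes))   # Plain Text Editor
```

Because the code travels with each file, two files holding the same sort of data can still open in different programs.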

File Associations

Used in: Windows

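The implementation is much like Creator Codes, except that the role of the creator code is played by the file extension inherited from DOS. Files with the same extension share the same default program.

Problem: Some applications may save to the same format, but handle radically different kinds of documents.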
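A small Python sketch of an extension-based association table makes the limitation concrete. The table contents and program names are hypothetical, and this is not how Windows actually stores its associations.

```python
import os.path
from typing import Dict, Optional

# Hypothetical association table: extension -> default program.
associations: Dict[str, str] = {
    ".txt": "text_editor",
    ".xml": "vector_drawing_app",
}


def default_program(filename: str) -> Optional[str]:
    """Return the default program for the file's extension, if any."""
    ext = os.path.splitext(filename)[1].lower()
    return associations.get(ext)


# Two unrelated documents that happen to share a container format:
print(default_program("logo.xml"))     # vector_drawing_app
print(default_program("budget.xml"))   # vector_drawing_app, although a
                                       # spreadsheet tool may have saved it
```

There is one slot per extension, so every file with that extension opens in the same program regardless of which application produced it.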

Magic Numbers

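The Unix/Linux family takes the view that a file name is just a file name and file attributes are just file attributes: do not add a type field, do not add an extension; the user is expected to know what type a file is.

Later, Unix/Linux did in effect give files a type attribute, which makes it easy for users to find out what format a file is in. The difference is that this attribute is written into the file's content. This special sequence of bytes is called a magic number, and it can be inspected with the file command. The relationship between this attribute and programs is also rather awkward and special; the problems centre on the following points: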

Checking a file’s type becomes a computationally expensive process, compared to just checking if some letters match.
Some file formats may not have magic numbers
File formats don’t indicate what a file’s kind is, what type of data the file actually contains regardless of format.
Some file formats are based off of other file formats (ex: Firefox extensions and Word documents are both ZIP archives.), which makes analysis complicated.
Some file formats might be two distinct formats at the same time! This includes maliciously constructed polyglot files as well as, ex: installers that contain ZIP archive data.
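To make the mechanism concrete, here is a minimal magic-number sniffer in Python. It is a sketch only: it checks a handful of well-known leading signatures, whereas the real file command consults a far larger rule database; the names `MAGIC`, `sniff`, and `sniff_file` are made up here.

```python
from typing import List, Optional, Tuple

# A few well-known signatures found at the very start of a file.
MAGIC: List[Tuple[bytes, str]] = [
    (b"\x7fELF", "ELF executable"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip data"),
]


def sniff(head: bytes) -> Optional[str]:
    """Guess a format from the leading bytes; None if nothing matches."""
    for signature, name in MAGIC:
        if head.startswith(signature):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first bytes of the file at `path` and sniff them."""
    with open(path, "rb") as f:
        return sniff(f.read(16))


print(sniff(b"\x7fELF\x02\x01\x01"))  # ELF executable
print(sniff(b"PK\x03\x04\x14\x00"))   # ZIP archive
print(sniff(b"plain text"))           # None
```

The sketch also shows two of the problems listed above: plain text has no signature to match, and a Word document or a Firefox extension would both be reported simply as a ZIP archive.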