BattleMoonWars archive unpacker/packer (rewritten from scratch)

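I once wrote [url=http://rednaxelafx.iteye.com/blog/180521]an archive tool for BattleMoonWars[/url]. Unfortunately the code in that old JavaEye post is really ancient. I revised it several times afterwards: I fixed the lazy omission of nested-directory support, and I also fixed problems in the LZSS decompression (but out of laziness I never implemented compression... writing LZSS compression in Java is painful).
The revised version went missing long ago. 茶茶 can't find that code either, but the Chinese localization patch for the new complete edition of BMW is waiting on this packing tool... After putting it off until today, I finally wrote a new one.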

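The old one was written in Java; this time I'm trying Ruby. The goal is clean code that implements the unpacking and packing an archive tool needs, plus LZSS compression.
After finishing, I compared the code with the old Java version and felt I hadn't improved much. Squint and the two versions look about the same; at least the "algorithm" is identical. Then again, something this small hardly uses any algorithm, orz
That said, there were obviously plenty of ways to tidy up the old code. As I recall, the later revisions weren't this messy, but, well, they're lost T T
Sigh, implementing such a simple little tool in a language as heavyweight as Java was a tragedy in itself. If only I had been familiar with some scripting language back then...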

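The Ruby code is still noticeably shorter than the Java code. I wasn't trying to make it short for its own sake; in some places I deliberately used longer names and slightly redundant computations, because I was too lazy to write comments and the code itself should convey the intent.
One reason the code shrinks is that Java's standard I/O API is inconvenient and defaults to big endian, which is a hassle on our little-endian platform. A lot of junk code went into converting endianness and converting between byte[] and other data types.
By contrast, Ruby's I/O API is very convenient, and the powerful String#unpack and Array#pack already solve most of the conversion problems, so the code that reads index entries is much shorter in the Ruby version than in the Java version.
Also, Java code is still rather "noisy". Nested calls like a.setXxx(b.getXxx()) are naturally less clean than a.xxx = b.xxx.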
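To make the pack/unpack point concrete, here is a minimal sketch of round-tripping one index entry. The sample path and numbers are made up for illustration. Note that the tool itself uses the 'I' directive (native-endian unsigned int); the sketch swaps in 'V', which is always 32-bit little endian regardless of the host, and shows 'N' as the big-endian counterpart.

```ruby
# Template for one index entry: a NUL-padded 256-byte path followed by
# three 32-bit unsigned little-endian integers.
# ('V' is an assumption for portability; bmw_archiver.rb uses 'I'.)
entry_format = 'Z256VVV'

# Made-up sample values: path, offset, original length, compressed length.
raw = ['data\\script.txt', 0x10C, 1234, 567].pack(entry_format)
puts raw.bytesize # 256 + 4 * 3

# One unpack call recovers all four fields.
file_path, offset, original_length, compressed_length = raw.unpack(entry_format)
puts "#{file_path} @ 0x#{offset.to_s(16)}: #{original_length} -> #{compressed_length}"

# The same integer in both byte orders.
little = [1].pack('V') # least significant byte first
big    = [1].pack('N') # most significant byte first (network order)
p little.bytes, big.bytes
```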

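OK, time to paste a big chunk of code again. Pasting code is much "safer" than attaching files: once Google Reader has backed up this post, I no longer have to worry about losing the code, ha ha ha
The Ruby used this time is a 1.9.2dev that I built myself with VC9 last July.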
[quote]ruby 1.9.2dev (2009-07-22 trunk 24241) [i386-mswin32_90][/quote]

[color=red]When opening files on a system that uses GBK encoding, you may need to call String#force_encoding('gbk') to relabel the ASCII-8BIT file names as GBK.[/color]
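A small sketch of what that relabeling does, assuming a file name whose bytes are GBK. The sample bytes are the GBK encoding of "中文" and are only an illustration; force_encoding changes the encoding tag and leaves the bytes untouched.

```ruby
# A file name as it comes out of the index: raw bytes tagged as binary.
# D6 D0 CE C4 is "中文" in GBK (sample data, not taken from a real archive).
raw_name = "\xD6\xD0\xCE\xC4.txt".dup.force_encoding('ASCII-8BIT')

# Relabel a copy as GBK; the bytes themselves are not converted.
gbk_name = raw_name.dup.force_encoding('GBK')

# Only an explicit encode actually transcodes the bytes.
utf8_name = gbk_name.encode('UTF-8')

puts raw_name.encoding
puts gbk_name.encoding
puts gbk_name.valid_encoding?
puts utf8_name
```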

bmw_archiver.rb
#!../ruby19/bin/ruby
# coding: binary

require 'fileutils'
require File.join(File.expand_path(File.dirname(__FILE__)), 'compress/lzss.so')
include FileUtils

module BMWArchiver
  class Archive
    BENIFIT_THRESHOLD = 20

    SIGNATURE = 'yanepkDx'
    attr_accessor :arc_path, :dir_path, :entry_count
    attr_reader :index

    class IndexEntry
      LENGTH = 256 + 4 * 3

      attr_accessor :file_path, :offset
      attr_accessor :original_length, :compressed_length
      attr_writer :compressed

      def initialize(file_path, offset, original_length, compressed_length)
        @file_path, @offset, @original_length, @compressed_length =
          file_path, offset, original_length, compressed_length

        @compressed = original_length > compressed_length
      end

      def compressed?
        @compressed
      end

      def should_compress?
        ['.yga', '.ogg'].none? {|ext| @file_path.downcase.end_with? ext }
      end

      def self.from_binary(str)
        IndexEntry.new(*(str.unpack 'Z256III'))
      end

      def to_binary
        [
          @file_path, @offset,
          @original_length, @compressed_length
        ].pack('Z256III')
      end

      def to_s
        <<-ENTRY_STR
IndexEntry {
file_path: #@file_path
offset: 0x#{@offset.to_s 16}
compression: #{self.compressed? ? 'LZSS' : 'none'}
original_length: #@original_length
compressed_length: #@compressed_length
}
        ENTRY_STR
      end
    end

    def add_index_entry(entry)
      @index ||= []
      @index << entry
    end

    def extract_to(dest_dir_path)
      @dir_path = dest_dir_path
      File.open(@arc_path, 'rb') do |arc_file|
        @index.each do |entry|
          arc_file.pos = entry.offset
          contents = arc_file.read(entry.compressed_length)
          if entry.compressed?
            contents = Compress::LZSS::decode(contents, entry.original_length)
          end
          dest_file_path = File.join(dest_dir_path, entry.file_path)
          mkdir_p File.dirname dest_file_path
          File.open(dest_file_path, 'wb') do |dest_file|
            dest_file << contents
          end
        end
      end
    end

    def save_to(dest_arc_path)
      @arc_path = dest_arc_path
      File.open(@arc_path, 'wb') do |arc_file|
        # write signature and entry count
        arc_file << SIGNATURE << [@entry_count].pack('I')

        # write contents first, and calculate offset
        index_section_start = SIGNATURE.bytesize + 4
        index_length = IndexEntry::LENGTH * @entry_count
        data_section_start = index_section_start + index_length
        current_offset = data_section_start
        arc_file.pos = current_offset

        @index.each do |entry|
          entry.offset = current_offset
          contents = File.binread File.join(@dir_path, entry.file_path)
          if entry.should_compress?
            compressed_contents = Compress::LZSS::encode(contents)
            compressed_length = compressed_contents.bytesize

            # only use compressed version when it's beneficial
            if compressed_length + BENIFIT_THRESHOLD < entry.original_length
              contents = compressed_contents
              entry.compressed_length = compressed_length
              entry.compressed = true
            end
          end
          arc_file << contents
          current_offset += entry.compressed_length
        end

        # write index
        arc_file.pos = index_section_start
        @index.each do |entry|
          arc_file << entry.to_binary
        end
      end
    end

    def to_s
      <<-ARCHIVE_STR
BMW Archive file: #@arc_path
No. of entries: #@entry_count
Entries:
#{@index.map{|e| e.to_s}.join}
      ARCHIVE_STR
    end

    class << self
      # builds an Archive object with index initialized from specified file
      def from_archive(path)
        File.open(path, 'rb') do |arc_file|
          verify_signature_of arc_file
          self.new.tap do |arc|
            arc.arc_path = path
            arc.entry_count = read_int32_from arc_file
            arc.entry_count.times do
              arc.add_index_entry(
                IndexEntry.from_binary(arc_file.read(IndexEntry::LENGTH)))
            end
          end
        end
      end

      def build_from_dir(src_dir_path)
        verify_src_dir src_dir_path

        self.new.tap do |arc|
          arc.dir_path = src_dir_path

          # HACK: for the ease of using Dir.glob
          old_pwd = Dir.pwd
          cd src_dir_path

          Dir.glob('**/*').each do |path|
            next if Dir.exists? path # skip directories

            file_length = File.size path
            arc.add_index_entry IndexEntry.new(
              path.gsub('/', '\\'),
              -1, # offset not calculated yet
              file_length,
              file_length)
          end

          arc.entry_count = arc.index.size

          cd old_pwd
        end
      end

      private
      def verify_signature_of(file)
        raise 'Wrong signature' unless file.read(8) == SIGNATURE
      end

      def verify_src_dir(dir_path)
        unless Dir.exists? dir_path
          raise "The specified source directory doesn't exist: #{dir_path}"
        end
      end

      def read_int32_from(file)
        file.read(4).unpack('I').first
      end
    end
  end
end


if __FILE__ == $0
  include BMWArchiver
  opt, arc_path, dir_path = ARGV
  case opt
  when 'e'
    arc = Archive.from_archive arc_path
    arc.extract_to dir_path
    puts arc
  when 'a'
    arc = Archive.build_from_dir dir_path
    arc.save_to arc_path
    puts arc
  else
    puts <<-USAGE_STR
Usage: bmw_archiver opt arc_file dir_path
options:
e - extract arc_file to dir_path
a - pack dir_path into arc_file
    USAGE_STR
  end
  puts 'done'
end


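That's the main code of the archive tool.
This time I owe a lot to [url=http://night-stalker.iteye.com/]night_stalker[/url], who helped by wrapping a ready-made classic LZSS implementation in a Ruby extension. The complete code follows.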
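For reference, the on-disk layout the code above implies is: an 8-byte signature, a 32-bit entry count, a table of 268-byte index entries, then the file data. Here is a rough, self-contained sketch that builds such an archive in memory and reads its index back. The module name BMWLayoutSketch and its methods are hypothetical, the sample files are made up and stored uncompressed, and 'V' (explicit little endian) stands in for the 'I' directive used in the tool.

```ruby
require 'stringio'

# Hypothetical module, not part of bmw_archiver.rb.
module BMWLayoutSketch
  SIGNATURE    = 'yanepkDx'
  ENTRY_FORMAT = 'Z256VVV' # path, offset, original length, stored length
  ENTRY_LENGTH = 256 + 4 * 3

  module_function

  # Packs { path => contents } into an in-memory archive, uncompressed.
  def build(files)
    data_start = SIGNATURE.bytesize + 4 + ENTRY_LENGTH * files.size
    index = String.new # empty binary buffers
    data  = String.new
    files.each do |path, contents|
      offset = data_start + data.bytesize
      index << [path, offset, contents.bytesize, contents.bytesize].pack(ENTRY_FORMAT)
      data << contents
    end
    String.new << SIGNATURE << [files.size].pack('V') << index << data
  end

  # Returns [path, offset, original_length, stored_length] for each entry.
  def read_index(io)
    raise 'Wrong signature' unless io.read(SIGNATURE.bytesize) == SIGNATURE
    count = io.read(4).unpack('V').first
    Array.new(count) { io.read(ENTRY_LENGTH).unpack(ENTRY_FORMAT) }
  end
end

archive = BMWLayoutSketch.build(
  'a.txt' => 'hello',
  'dir\\b.bin' => [0, 255].pack('C*')
)

io = StringIO.new(archive)
entries = BMWLayoutSketch.read_index(io)
entries.each do |path, offset, original_length, stored_length|
  io.pos = offset
  stored = io.read(stored_length)
  puts "#{path}: #{stored.bytesize} of #{original_length} bytes at 0x#{offset.to_s(16)}"
end
```

Because the index sits in front of the data, the packer has to know every stored length before it can write the final index, which is why save_to writes the data section first and seeks back afterwards.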

He has already published this wrapper as a gem at [url]http://rubygems.org/gems/lzss[/url]. Installing the gem requires a C compiler; GCC, VC and the like all work.

extconf.rb
require "mkmf"

create_makefile 'lzss'


Registering the actual implementation with Ruby:

lzss-ext.c
#include <ruby.h>

size_t Encode(size_t ilen, char* istr, size_t olen, char* ostr);
void Decode(size_t ilen, unsigned char* istr, size_t olen, char* ostr);

static VALUE encode(VALUE self, VALUE str) {
    size_t ilen = RSTRING_LEN(str);
    /* worst case is one flag byte per 8 literal bytes, so ilen * 2 is enough */
    char* buff = (char*)malloc(ilen * 2);
    size_t olen = Encode(RSTRING_LEN(str), RSTRING_PTR(str), ilen * 2, buff);
    VALUE ret = rb_str_new(buff, olen);
    free(buff);
    return ret;
}

static VALUE decode(VALUE self, VALUE str, VALUE v_olen) {
    size_t ilen = RSTRING_LEN(str);
    /* the caller supplies the decompressed length, so allocate it up front */
    VALUE ret = rb_str_new(0, NUM2INT(v_olen));
    Decode(ilen, (unsigned char*)RSTRING_PTR(str), NUM2INT(v_olen), RSTRING_PTR(ret));
    return ret;
}

void Init_lzss() {
    VALUE Compress = rb_define_module("Compress");
    VALUE LZSS = rb_define_module_under(Compress, "LZSS");
    rb_define_module_function(LZSS, "encode", RUBY_METHOD_FUNC(encode), 1);
    rb_define_module_function(LZSS, "decode", RUBY_METHOD_FUNC(decode), 2);
}


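The actual LZSS implementation is the version written in the late 1980s by Professor [url=http://oku.edu.mie-u.ac.jp/~okumura/]Haruhiko Okumura (奥村晴彦)[/url]. The LZSS implementations used in many Japanese games evolved from this version, and it is very pleasant to work with.
I modified it slightly in a few places:
1. The original fills the initial sliding window with spaces (0x20); here it is filled with 0x00. The original was mainly used to compress text, so filling with spaces was a sensible choice; here arbitrary binary data may be compressed, so 0x00 is used instead.
2. The original's direct I/O on FILE is replaced with operations on unsigned char* buffers, so that it can be used as a library.
The algorithm itself is completely unchanged. I didn't try to improve compression speed (and there is little room left to improve decompression speed anyway). More evolved successors do exist by now, but my primary goal is for this archive tool to produce archives identical to those produced by the game engine's own tool, so I don't much care about further improving compression speed or ratio =_=|||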
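To show why the initial window fill matters, here is a pure-Ruby sketch of the decoding side, written to mirror the Decode() routine listed below. The module name LZSSSketch and the fill parameter are my own additions for illustration; the real work in this tool is done by the C extension. A position/length pair may point into the part of the window that has never been written, so the encoder and decoder must agree on what those bytes are.

```ruby
# Hypothetical pure-Ruby mirror of Decode(); for illustration only.
module LZSSSketch
  N = 4096       # ring buffer size
  F = 18         # upper limit for match length
  THRESHOLD = 2  # pairs store match_length - (THRESHOLD + 1)

  module_function

  # input: compressed bytes as a String; fill: initial window byte.
  def decode(input, fill = 0x00)
    bytes = input.bytes
    window = Array.new(N, fill)
    r = N - F
    out = []
    pos = 0
    flags = 0
    loop do
      flags >>= 1
      if (flags & 0x100).zero? # all eight flag bits used up
        break if pos >= bytes.size
        flags = bytes[pos] | 0xff00 # high byte counts the eight units
        pos += 1
      end
      if flags.odd? # flag 1: a literal byte
        break if pos >= bytes.size
        c = bytes[pos]
        pos += 1
        out << c
        window[r] = c
        r = (r + 1) & (N - 1)
      else # flag 0: a 12-bit position plus 4-bit length pair
        break if pos + 1 >= bytes.size
        i = bytes[pos]
        j = bytes[pos + 1]
        pos += 2
        i |= (j & 0xf0) << 4
        length = (j & 0x0f) + THRESHOLD + 1
        length.times do |k|
          c = window[(i + k) & (N - 1)]
          out << c
          window[r] = c
          r = (r + 1) & (N - 1)
        end
      end
    end
    out.pack('C*')
  end
end

# Three literals "ABC", then a pair copying 3 bytes from position 0xFEE,
# which is where the first literal was stored (N - F = 4078 = 0xFEE).
p LZSSSketch.decode("\x07ABC\xEE\xF0".b)

# A single pair pointing at untouched window position 0: the output
# depends entirely on the fill byte.
p LZSSSketch.decode("\x00\x00\x00".b)
p LZSSSketch.decode("\x00\x00\x00".b, 0x20)
```

The last two calls decode the same three input bytes to different output, so data compressed by a 0x00-filled encoder would come out corrupted from a space-filled decoder whenever a match reaches into the untouched part of the window.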

lzss.c
/**************************************************************
LZSS.C -- A Data Compression Program
(tab = 4 spaces)
***************************************************************
4/6/1989 Haruhiko Okumura
Use, distribute, and modify this program freely.
Please send me your improved versions.
PC-VAN SCIENCE
NIFTY-Serve PAF01022
CompuServe 74050,1022
**************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define N 4096 /* size of ring buffer */
#define F 18 /* upper limit for match_length */
#define THRESHOLD 2 /* encode string into position and length if match_length is greater than this */
#define NIL N /* index for root of binary search trees */

/* Position and length of the longest match. These are set by the InsertNode() procedure. */
static int match_position;
static int match_length;

static void InsertNode(unsigned char* text_buf, int* lson, int* rson, int* dad, int r)
    /* Inserts string of length F, text_buf[r..r+F-1], into one of the
       trees (text_buf[r]'th tree) and returns the longest-match position
       and length via the global variables match_position and match_length.
       If match_length = F, then removes the old node in favor of the new
       one, because the old one will be deleted sooner.
       Note r plays double role, as tree node and position in buffer. */
{
    int i, p, cmp;
    unsigned char *key;

    cmp = 1; key = &text_buf[r]; p = N + 1 + key[0];
    rson[r] = lson[r] = NIL; match_length = 0;
    for ( ; ; ) {
        if (cmp >= 0) {
            if (rson[p] != NIL) p = rson[p];
            else { rson[p] = r; dad[r] = p; return; }
        } else {
            if (lson[p] != NIL) p = lson[p];
            else { lson[p] = r; dad[r] = p; return; }
        }
        for (i = 1; i < F; i++)
            if ((cmp = key[i] - text_buf[p + i]) != 0) break;
        if (i > match_length) {
            match_position = p;
            if ((match_length = i) >= F) break;
        }
    }
    dad[r] = dad[p]; lson[r] = lson[p]; rson[r] = rson[p];
    dad[lson[p]] = r; dad[rson[p]] = r;
    if (rson[dad[p]] == p) rson[dad[p]] = r;
    else lson[dad[p]] = r;
    dad[p] = NIL; /* remove p */
}

static void DeleteNode(int* lson, int* rson, int* dad, int p) /* deletes node p from tree */
{
    int q;

    if (dad[p] == NIL) return; /* not in tree */
    if (rson[p] == NIL) q = lson[p];
    else if (lson[p] == NIL) q = rson[p];
    else {
        q = lson[p];
        if (rson[q] != NIL) {
            do { q = rson[q]; } while (rson[q] != NIL);
            rson[dad[q]] = lson[q]; dad[lson[q]] = dad[q];
            lson[q] = lson[p]; dad[lson[p]] = q;
        }
        rson[q] = rson[p]; dad[rson[p]] = q;
    }
    dad[q] = dad[p];
    if (rson[dad[p]] == p) rson[dad[p]] = q; else lson[dad[p]] = q;
    dad[p] = NIL;
}

#define _get(c) \
    if (! ilen) {\
        c = EOF;\
        break;\
    }\
    c = *istr;\
    ++istr;\
    --ilen

#define _put(c) \
    *ostr = c;\
    ++ostr;\
    --olen

size_t Encode(size_t ilen, char* istr, size_t olen, char* ostr)
{
    int i, c, len, r, s, last_match_length, code_buf_ptr;
    unsigned char code_buf[17], mask;
    size_t codesize = 0;
    int lson[N + 1], rson[N + 257], dad[N + 1]; /* left & right children & parents -- These constitute binary search trees. */
    unsigned char text_buf[N + F - 1]; /* ring buffer of size N, with extra F-1 bytes to facilitate string comparison */

    match_position = 0;
    match_length = 0;

    if (ilen == 0) return 0;

    /* initialize trees */
    /* For i = 0 to N - 1, rson[i] and lson[i] will be the right and
       left children of node i. These nodes need not be initialized.
       Also, dad[i] is the parent of node i. These are initialized to
       NIL (= N), which stands for 'not used.'
       For i = 0 to 255, rson[N + i + 1] is the root of the tree
       for strings that begin with character i. These are initialized
       to NIL. Note there are 256 trees. */
    for (i = N + 1; i <= N + 256; i++) rson[i] = NIL;
    for (i = 0; i < N; i++) dad[i] = NIL;

    code_buf[0] = 0; /* code_buf[1..16] saves eight units of code, and
        code_buf[0] works as eight flags, "1" representing that the unit
        is an unencoded letter (1 byte), "0" a position-and-length pair
        (2 bytes). Thus, eight units require at most 16 bytes of code. */
    code_buf_ptr = mask = 1;
    s = 0; r = N - F;
    for (i = s; i < r; i++) text_buf[i] = 0; /* Clear the buffer with
        any character that will appear often. */
    for (len = 0; len < F && ilen; len++) {
        _get(c);
        text_buf[r + len] = c;
        /* Read F bytes into the last F bytes of the buffer */
    }
    for (i = 1; i <= F; i++) InsertNode(text_buf, lson, rson, dad, r - i); /* Insert the F strings,
        each of which begins with one or more 'space' characters. Note
        the order in which these strings are inserted. This way,
        degenerate trees will be less likely to occur. */
    InsertNode(text_buf, lson, rson, dad, r); /* Finally, insert the whole string just read. The
        global variables match_length and match_position are set. */
    do {
        if (match_length > len) match_length = len; /* match_length
            may be spuriously long near the end of text. */
        if (match_length <= THRESHOLD) {
            match_length = 1; /* Not long enough match. Send one byte. */
            code_buf[0] |= mask; /* 'send one byte' flag */
            code_buf[code_buf_ptr++] = text_buf[r]; /* Send uncoded. */
        } else {
            code_buf[code_buf_ptr++] = (unsigned char) match_position;
            code_buf[code_buf_ptr++] = (unsigned char)
                (((match_position >> 4) & 0xf0)
                | (match_length - (THRESHOLD + 1))); /* Send position and
                    length pair. Note match_length > THRESHOLD. */
        }
        if ((mask <<= 1) == 0) { /* Shift mask left one bit. */
            for (i = 0; i < code_buf_ptr; i++) { /* Send at most 8 units of */
                _put(code_buf[i]); /* code together */
            }
            codesize += code_buf_ptr;
            code_buf[0] = 0; code_buf_ptr = mask = 1;
        }
        last_match_length = match_length;
        for (i = 0; i < last_match_length && ilen; i++) {
            _get(c);
            DeleteNode(lson, rson, dad, s); /* Delete old strings and */
            text_buf[s] = c; /* read new bytes */
            if (s < F - 1) text_buf[s + N] = c; /* If the position is
                near the end of buffer, extend the buffer to make
                string comparison easier. */
            s = (s + 1) & (N - 1); r = (r + 1) & (N - 1);
            /* Since this is a ring buffer, increment the position
               modulo N. */
            InsertNode(text_buf, lson, rson, dad, r); /* Register the string in text_buf[r..r+F-1] */
        }
        while (i++ < last_match_length) { /* After the end of text, */
            DeleteNode(lson, rson, dad, s); /* no need to read, but */
            s = (s + 1) & (N - 1); r = (r + 1) & (N - 1);
            if (--len) InsertNode(text_buf, lson, rson, dad, r); /* buffer may not be empty. */
        }
    } while (len > 0); /* until length of string to be processed is zero */
    if (code_buf_ptr > 1) { /* Send remaining code. */
        for (i = 0; i < code_buf_ptr; i++) {
            _put(code_buf[i]);
        }
        codesize += code_buf_ptr;
    }

    return codesize;
}

void Decode(size_t ilen, unsigned char* istr, size_t olen, char* ostr) /* Just the reverse of Encode(). */
{
    unsigned char text_buf[N + F - 1]; /* ring buffer of size N, with extra F-1 bytes to facilitate string comparison */
    int i, j, k, r, c;
    unsigned int flags;

    for (i = 0; i < N - F; i++) text_buf[i] = 0;
    r = N - F; flags = 0;
    for ( ; ; ) {
        if (((flags >>= 1) & 256) == 0) {
            _get(c);
            flags = c | 0xff00; /* uses higher byte cleverly */
        } /* to count eight */
        if (flags & 1) {
            _get(c);
            _put(c); text_buf[r++] = c; r &= (N - 1);
        } else {
            _get(i);
            _get(j);
            i |= ((j & 0xf0) << 4); j = (j & 0x0f) + THRESHOLD;
            for (k = 0; k <= j; k++) {
                c = text_buf[(i + k) & (N - 1)];
                _put(c); text_buf[r++] = c; r &= (N - 1);
            }
        }
    }
}

#undef _get
#undef _put


Finally, I've bundled the required Ruby and the archive tool into an attachment. Backup complete XDD