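I ran into a problem while using Python to compute statistics over a set of data. A script generates the data and saves it in Python format, so it can be used simply by importing it. The workflow is: generate one .py file, process it, generate the next, and so on; the data .py file always has the same name. I knew that when a .py file changes, its .pyc is regenerated, but the actual behavior did not match that expectation. Consecutive data sets produced identical results, although judging from the raw data they should have differed. It turned out that sometimes, when processing the next data set, the .pyc of the previous data set was used. I then read up on the topic. The rough picture is as follows: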
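1) Whether a pyc is regenerated depends on the modification times of the py file and its corresponding pyc file. The modification time is obtained with os.stat; re-running under strace shows that this ends up in the stat64 system call.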
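To make the check in 1) concrete, here is a minimal sketch. It assumes the Python 2 behavior, where the source mtime is recorded in whole seconds alongside the bytecode and compared on import. The names source_mtime_seconds, pyc_is_fresh and set_mtime_ns are hypothetical, and the real import machinery checks more than this (for example the magic number).

```python
import os
import tempfile


def source_mtime_seconds(py_path):
    """Source mtime truncated to whole seconds (assumed Python 2 granularity)."""
    return int(os.stat(py_path).st_mtime)


def pyc_is_fresh(recorded_mtime, py_path):
    """Hypothetical check: cached bytecode is reused when the times match."""
    return recorded_mtime == source_mtime_seconds(py_path)


def set_mtime_ns(path, ns):
    """Pin atime and mtime to a known value so the demo is deterministic."""
    os.utime(path, ns=(ns, ns))


def demo():
    base = 1000000000 * 10**9  # an arbitrary whole second, in nanoseconds
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.py")
        with open(path, "w") as f:
            f.write("i=1\n")
        set_mtime_ns(path, base + 100)
        recorded = source_mtime_seconds(path)  # what a pyc written now would store

        # Rewrite the source later within the same second.
        with open(path, "w") as f:
            f.write("i=2\n")
        set_mtime_ns(path, base + 900000000)
        same_second = pyc_is_fresh(recorded, path)  # stale bytecode still accepted

        # One second later the change becomes visible.
        set_mtime_ns(path, base + 10**9)
        next_second = pyc_is_fresh(recorded, path)
    return same_second, next_second
```

Under these assumptions, demo() reports that the cache still looks fresh after a rewrite within the same second, and stale only once the second has changed.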
2) man 2 stat documents st_mtime and describes the support for sub-second precision:
time_t st_mtime; /* time of last modification */
Since kernel 2.5.48, the stat structure supports nanosecond resolution for the three file timestamp fields. Glibc exposes the nanosecond component of each field using names of the form st_atim.tv_nsec if the _BSD_SOURCE or _SVID_SOURCE feature test macro is defined. These fields are specified in POSIX.1-2008, and, starting with version 2.12, glibc also exposes these field names if _POSIX_C_SOURCE is defined with the value 200809L or greater, or _XOPEN_SOURCE is defined with the value 700 or greater. If none of the aforementioned macros are defined, then the nanosecond values are exposed with names of the form st_atimensec. On file systems that do not support subsecond timestamps, the nanosecond fields are returned with the value 0.
File /usr/include/i386-linux-gnu/asm/stat.h:
struct stat {
...
unsigned long st_mtime;
unsigned long st_mtime_nsec;
...
};
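The same two fields can be seen from Python. The sketch below uses os.stat and its st_mtime_ns attribute (available in Python 3.3 and later) to split a timestamp into the whole-second part and the nanosecond part, mirroring st_mtime and st_mtime_nsec above. The helper name split_mtime is hypothetical, and on file systems without sub-second timestamps the nanosecond part comes back as 0.

```python
import os
import tempfile


def split_mtime(path):
    """Return (seconds, nanoseconds), mirroring st_mtime and st_mtime_nsec."""
    return divmod(os.stat(path).st_mtime_ns, 10**9)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.py")
        with open(path, "w") as f:
            f.write("i=1\n")
        # Pin the timestamp: 1500000000 seconds plus a quarter of a second.
        stamp = 1500000000 * 10**9 + 250000000
        os.utime(path, ns=(stamp, stamp))
        return split_mtime(path)
```

A comparison that looks only at the first element of this pair cannot tell apart two writes that land in the same second.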
3) Test scripts
$ cat ./test.sh
echo "i="$1>data.py && python ./test.py
$ cat ./test.py
import data
print data.i
Test results
$ time seq 1 1000 | xargs -n1 -I{} bash ./test.sh {} >>logs3
real 0m29.425s
user 0m7.520s
sys 0m3.268s
$ cat logs3 | sort | uniq -c | wc -l
30
Only 30 distinct values appear across 1000 runs that took about 29 seconds, roughly one per second. This is consistent with the stale pyc being reused whenever data.py is rewritten within the same second.
4) Solution: delete the pyc file before each run, or use a different data file name each time.
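Both workarounds can be sketched in Python. This is only an illustration written for Python 3: remove_cached_bytecode and load_unique are hypothetical helpers, and the data_NNNN naming scheme is an assumption, not something the original scripts use. importlib.util.cache_from_source gives the Python 3 cache location, while the py_path + "c" candidate covers the Python 2 layout where data.pyc sits next to data.py.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def remove_cached_bytecode(py_path):
    """Delete any bytecode cache belonging to py_path; return the paths removed."""
    candidates = [
        py_path + "c",                              # Python 2: data.pyc beside data.py
        importlib.util.cache_from_source(py_path),  # Python 3: __pycache__/data.<tag>.pyc
    ]
    removed = []
    for pyc in candidates:
        if os.path.exists(pyc):
            os.remove(pyc)
            removed.append(pyc)
    return removed


def load_unique(directory, index, value):
    """Write the data under a per-run module name (hypothetical scheme) and import it."""
    name = "data_%04d" % index
    with open(os.path.join(directory, name + ".py"), "w") as f:
        f.write("i=%d\n" % value)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # make the freshly written file visible to the importer
    return importlib.import_module(name)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        values = [load_unique(tmp, n, n * 10).i for n in (1, 2, 3)]
        # A stand-in pyc file, only to exercise the cleanup helper.
        py_path = os.path.join(tmp, "data.py")
        open(py_path + "c", "wb").close()
        removed = remove_cached_bytecode(py_path)
    return values, [os.path.basename(p) for p in removed]
```

With unique names there is never an older cache for the same module, so the timestamp comparison does not come into play. Deleting the cache keeps the single file name but forces a recompile on every run.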