How File Writes Work on Linux

Make Dream Happen

Article link: https://blog.csdn.net/haohzhang/article/details/87600509

Background

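Many people who work in operations have probably run into the following problem:

A program starts several threads or processes that all write to the same file, and the file can end up scrambled: the concurrent writes interleave, so parts of some lines get inserted into the middle of other lines.

A common reaction is to reach for a file lock, and that works. But corruption only occurs under certain conditions: when each individual write is small enough, the file does not get scrambled.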
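For comparison, the file-lock approach mentioned above can be sketched with the flock utility from util-linux. This is only an illustration: the paths /tmp/flock_demo.log and /tmp/flock_demo.lock and the helper name append_locked are made up for the example, and it assumes flock is installed.

```shell
#!/bin/bash
# Hypothetical paths, chosen only for this sketch.
LOG_FILE=/tmp/flock_demo.log
LOCK_FILE=/tmp/flock_demo.lock

# Append one line to $LOG_FILE while holding an exclusive lock.
append_locked() {
    (
        # fd 9 is opened on the lock file for the lifetime of this subshell;
        # flock blocks until no other process holds the lock.
        flock -x 9 || exit 1
        printf '%s\n' "$1" >> "$LOG_FILE"
    ) 9>"$LOCK_FILE"
}

rm -f "$LOG_FILE"

# Four concurrent writers, each appending five 8000-character lines.
for w in A B C D; do
    (
        line=$(printf '%*s' 8000 '' | tr ' ' "$w")
        for _ in 1 2 3 4 5; do
            append_locked "$line"
        done
    ) &
done
wait
```

Because every writer takes the same lock before appending, the lines stay intact regardless of their length, at the cost of serializing the writers.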

Analysis

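The key concept is the operating system's smallest atomic write. On Linux there is a size threshold below which a write is performed atomically; on some systems it is 1024 bytes, on others 4096 bytes. As long as a single write does not exceed this threshold, concurrent writes do not scramble the file.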
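The article does not name the threshold it has in mind. One related limit that can be queried from the shell is PIPE_BUF, which POSIX defines as the largest write to a pipe or FIFO that is guaranteed to be atomic. Treating it as a proxy for the regular-file threshold described above is an assumption, not something the article states. A minimal sketch using getconf:

```shell
#!/bin/bash
# PIPE_BUF: largest atomic write to a pipe/FIFO on the filesystem at "/".
# Assumption: used here only as a related, queryable limit.
pipe_buf=$(getconf PIPE_BUF /)

# Memory page size, shown for comparison.
page_size=$(getconf PAGESIZE)

echo "PIPE_BUF: $pipe_buf bytes"
echo "Page size: $page_size bytes"
```

The values vary between systems, so it is safer to query them than to hard-code a number.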

The shell script below simulates this situation. Two sample runs are shown first, followed by the script itself.

# ./test_appends.sh 4096
Launching 20 worker processes
Each line will be 4096 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
All's good! The output file had no corrupted lines.
# ./test_appends.sh 4097
Launching 20 worker processes
Each line will be 4097 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
Found 27 instances of corrupted lines


#!/bin/bash
#############################################################################
#
# This script aims to test/prove that you can append to a single file from
# multiple processes with buffers up to a certain size, without causing one
# process' output to corrupt the other's.
#
# The script takes one parameter, the length of the buffer. It then creates
# 20 worker processes which each write 50 lines of the specified buffer
# size to the same file. When all processes are done outputting, it tests
# the output file to ensure it is in the correct format.
#
#############################################################################

NUM_WORKERS=20
LINES_PER_WORKER=50
OUTPUT_FILE=/tmp/out.tmp

# each worker will output $LINES_PER_WORKER lines to the output file
run_worker() {
    worker_num=$1
    buf_len=$2

    # Each line will be a specific character, multiplied by the line length.
    # The character changes based on the worker number.
    filler_len=$((${buf_len}-1)) # -1 -> leave room for \n
    filler_char=$(printf \\$(printf '%03o' $(($worker_num+64))))
    line=`for i in $(seq 1 $filler_len);do echo -n $filler_char;done`
    for i in $(seq 1 $LINES_PER_WORKER)
    do
        echo $line >> $OUTPUT_FILE
    done
}

# When re-invoked as "worker <num> <buf_len>", act as a single worker
if [ "$1" = "worker" ]; then
    run_worker $2 $3
    exit
fi

buf_len=$1
if [ "$buf_len" = "" ]; then
    echo "Buffer length not specified, defaulting to 4096"
    buf_len=4096
fi

rm -f $OUTPUT_FILE

echo Launching $NUM_WORKERS worker processes
for i in $(seq 1 $NUM_WORKERS)
do
    $0 worker $i $buf_len &
    pids[$i]=${!}
done

echo Each line will be $buf_len characters long
echo Waiting for processes to exit
for i in $(seq 1 $NUM_WORKERS)
do
    wait ${pids[$i]}
done

# Now we want to test the output file. Each line should be the same letter
# repeated buf_len-1 times (remember the \n takes up one byte). If we had
# workers writing over each other's lines, then there will be mixed characters
# and/or longer/shorter lines.

echo Testing output file

# Make sure the file is the right size (ensures processes didn't write over
# each other's lines)
expected_file_size=$(($NUM_WORKERS * $LINES_PER_WORKER * $buf_len))
actual_file_size=`cat $OUTPUT_FILE | wc -c`
if [ "$expected_file_size" -ne "$actual_file_size" ]; then
    echo Expected file size of $expected_file_size, but got $actual_file_size
else
    # File size is OK, test the actual content

    # Only use newer versions of grep because older ones are way too slow with
    # backreferences
    [[ $(grep --version) =~ [^[:digit:]]*([[:digit:]]+)\.([[:digit:]]+) ]]
    grep_major=${BASH_REMATCH[1]:-0}
    grep_minor=${BASH_REMATCH[2]:-0}
    # Compare major and minor separately: concatenating them (e.g. "3"+"7" = 37)
    # would wrongly rank grep 3.x below 2.16.
    if [ "$grep_major" -gt 2 ] || { [ "$grep_major" -eq 2 ] && [ "$grep_minor" -ge 16 ]; }; then
        num_lines=$(grep -v "^\(.\)\1\{$((${buf_len}-2))\}$" $OUTPUT_FILE | wc -l)
    else
        # Scan line by line in bash, which isn't that speedy, but is good enough
        # Note: Doesn't work on cygwin for lines < 255
        line_length=$((${buf_len}-1))
        num_lines=0
        for line in `cat $OUTPUT_FILE`
        do
            if ! [[ $line =~ ^${line:0:1}{$line_length}$ ]]; then
                num_lines=$(($num_lines+1))
            fi;
            echo -n .
        done
        echo
    fi

    if [ "$num_lines" -gt "0" ]; then
        echo "Found $num_lines instances of corrupted lines"
    else
        echo "All's good! The output file had no corrupted lines."
    fi
fi

rm -f $OUTPUT_FILE
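
The fallback check in the script scans the file line by line in bash, which is slow. As a rough alternative, assuming a POSIX awk is available, the same test (every line is a single character repeated buf_len-1 times) could be sketched as below. The function name count_corrupted is hypothetical and not part of the original script.

```shell
#!/bin/bash
# count_corrupted FILE LEN
# Prints the number of lines in FILE that are NOT one character repeated
# exactly LEN times (LEN excludes the trailing newline).
count_corrupted() {
    awk -v len="$2" '
        {
            if (length($0) != len) { bad++; next }
            c = substr($0, 1, 1)
            for (i = 2; i <= len; i++) {
                if (substr($0, i, 1) != c) { bad++; break }
            }
        }
        END { print bad + 0 }
    ' "$1"
}
```

In the script above it would be called as count_corrupted $OUTPUT_FILE $((buf_len-1)), before the final rm -f removes the output file.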
