Archiving and backup

Latest recommended article published on 2024-09-15 14:50:06

weixin_34195142

Article tags: operations, operating systems, shell

Original link: http://www.cnblogs.com/itmeatball/p/7637809.html

One of the primary tasks of a computer system's administrator is keeping the system's data secure. One way this is done is by performing timely backups of the system's files. Even if you're not a system administrator, it is often useful to make copies of things and to move large collections of files from place to place and from device to device. In this chapter, we will look at several common programs that are used to manage collections of files. There are the file compression programs:

gzip - Compress or expand files
bzip2 - A block sorting file compressor

The archiving programs:

tar - Tape archiving utility
zip - Package and compress files

And the file synchronization program:

rsync - Remote file and directory synchronization

Compressing files

Throughout the history of computing, there has been a struggle to get the most data into the smallest available space, whether that space be memory, storage devices, or network bandwidth. Many of the data services that we take for granted today, such as portable music players, high-definition television, or broadband Internet, owe their existence to effective data compression techniques.

Data compression is the process of removing redundancy from data. Let's consider an imaginary example. Say we had an entirely black picture file with the dimensions of one hundred pixels by one hundred pixels. In terms of data storage (assuming twenty-four bits, or three bytes, per pixel), the image will occupy (100 × 100) × 3 = 30,000 bytes of storage.

An image that is all one color contains entirely redundant data. If we were clever, we could encode the data in such a way that we simply describe the fact that we have a block of thirty thousand black pixels. So, instead of storing a block of data containing thirty thousand zeros (black is usually represented in image files as zero), we could compress the data into the number 30,000, followed by a zero to represent our data. Such a data compression scheme is called run-length encoding and is one of the most rudimentary compression techniques. Today's techniques are much more advanced and complex, but the basic goal remains the same: get rid of redundant data.
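Redundancy of this kind is exactly what compressors exploit. A quick sketch, assuming a head that supports -c (GNU coreutils and BusyBox both do): generate the 30,000 zero bytes of the black image and count them before and after gzip, which is covered next. gzip uses the DEFLATE algorithm rather than plain run-length encoding, but the repeated bytes still shrink to a tiny fraction of their original size.

```shell
# 30,000 zero bytes: the raw data of the all-black 100 x 100 image
head -c 30000 /dev/zero | wc -c

# The same bytes after gzip compression: only a few dozen bytes remain
head -c 30000 /dev/zero | gzip | wc -c
```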

Compression algorithms (the mathematical techniques used to carry out the compression) fall into two general categories: lossless and lossy. Lossless compression preserves all the data contained in the original. This means that when a file is restored from a compressed version, the restored file is exactly the same as the original, uncompressed version. Lossy compression, on the other hand, removes data as the compression is performed, to allow more compression to be applied. When a lossy file is restored, it does not match the original version; rather, it is a close approximation. Examples of lossy compression are JPEG (for images) and MP3 (for music). In our discussion, we will look exclusively at lossless compression, since most data on computers cannot tolerate any data loss.

gzip

The gzip program is used to compress one or more files. When executed, it replaces the original file with a compressed version of the original. The corresponding gunzip program is used to restore compressed files to their original, uncompressed form. Here is an example:
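A sketch of that round trip, run in a throwaway directory created with mktemp -d (the scratch directory is an addition for safety; the listing of /etc is just convenient sample text):

```shell
cd "$(mktemp -d)"          # work in a scratch directory
ls -l /etc > foo.txt       # create a text file from a directory listing
ls -l foo.*                # only foo.txt exists
gzip foo.txt               # replace foo.txt with foo.txt.gz
ls -l foo.*                # only foo.txt.gz exists, with the same permissions and time
gunzip foo.txt.gz          # restore the original, uncompressed file
ls -l foo.*                # foo.txt is back and foo.txt.gz is gone
```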

In this example, we create a text file named foo.txt from a directory listing. Next, we run gzip, which replaces the original file with a compressed version named foo.txt.gz. In the directory listing of foo.*, we see that the original file has been replaced with the compressed version and that the compressed version is about one-fifth the size of the original. We can also see that the compressed file has the same permissions and timestamp as the original.

Next, we run the gunzip program to uncompress the file. Afterward, we can see that the compressed version of the file has been replaced with the original, again with the permissions and timestamp preserved.

gzip has many options. Here are a few:

Table 19-1: gzip Options

Option	Description
-c	Write output to standard output and keep the original files. May also be specified with --stdout and --to-stdout.
-d	Decompress. This causes gzip to act like gunzip. May also be specified with --decompress or --uncompress.
-f	Force compression even if a compressed version of the original file already exists. May also be specified with --force.
-h	Display usage information. May also be specified with --help.
-l	List compression statistics for each file compressed. May also be specified with --list.
-r	If one or more arguments on the command line are directories, recursively compress files contained within them. May also be specified with --recursive.
-t	Test the integrity of a compressed file. May also be specified with --test.
-v	Display verbose messages while compressing. May also be specified with --verbose.
-number	Set amount of compression. number is an integer in the range of 1 (fastest, least compression) to 9 (slowest, most compression). The values 1 and 9 may also be expressed as --fast and --best, respectively. The default value is 6.

Going back to our earlier example:
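A sketch of that sequence, assuming GNU gzip; it recreates foo.txt in a scratch directory first so the block runs on its own:

```shell
cd "$(mktemp -d)"
ls -l /etc > foo.txt
gzip foo.txt             # foo.txt -> foo.txt.gz
gzip -tv foo.txt.gz      # test integrity, verbosely; an intact file is reported as OK
gzip -d foo.txt.gz       # decompress, exactly as gunzip would
```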

Here, we replaced the file foo.txt with a compressed version named foo.txt.gz. Next, we tested the integrity of the compressed version, using the -t and -v options. Finally, we decompressed the file back to its original form. gzip can also be used in interesting ways via standard input and output:

This command creates a compressed version of a directory listing:
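A minimal sketch, again in a scratch directory; the gunzip -c line is only there to confirm the result:

```shell
cd "$(mktemp -d)"
ls -l /etc | gzip > foo.txt.gz     # compress the listing straight from the pipe
gunzip -c foo.txt.gz | head -n 3   # peek at the first lines to confirm
```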

The gunzip program, which uncompresses gzip files, assumes that filenames end in the extension .gz, so it's not necessary to specify it, as long as the specified name does not conflict with an existing uncompressed file:
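A sketch, assuming GNU gzip, which tries appending known suffixes such as .gz when the name it is given does not exist:

```shell
cd "$(mktemp -d)"
ls -l /etc | gzip > foo.txt.gz
gunzip foo.txt     # no foo.txt exists, so gunzip finds and expands foo.txt.gz
ls foo.*
```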

If our goal were only to view the contents of a compressed text file, we could do this:
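A sketch; interactively, the decompressed text would be piped into less, but head is used here (an assumption for a non-interactive run) so the block ends on its own:

```shell
cd "$(mktemp -d)"
ls -l /etc | gzip > foo.txt.gz
# -c writes the decompressed text to standard output and leaves foo.txt.gz untouched
gunzip -c foo.txt.gz | head -n 5
```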

Alternatively, there is a program supplied with gzip, called zcat, that is equivalent to gunzip with the -c option. It can be used like the cat command on gzip-compressed files:
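A sketch using zcat as the text describes (head again stands in for an interactive pager):

```shell
cd "$(mktemp -d)"
ls -l /etc | gzip > foo.txt.gz
zcat foo.txt.gz | head -n 5     # same output as: gunzip -c foo.txt.gz
```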

Tip: There is a zless program, too. It performs the same function as the pipeline above.

bzip2

The bzip2 program, by Julian Seward, is similar to gzip but uses a different compression algorithm that achieves higher levels of compression at the cost of compression speed. In most regards, it works in the same fashion as gzip. A file compressed with bzip2 is denoted with the extension .bz2.
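A sketch of the same round trip with bzip2 and bunzip2, assuming both are installed:

```shell
cd "$(mktemp -d)"
ls -l /etc > foo.txt
bzip2 foo.txt            # foo.txt -> foo.txt.bz2
ls -l foo.*
bunzip2 foo.txt.bz2      # restore foo.txt
ls -l foo.*
```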

As we can see, bzip2 can be used the same way as gzip. All the options (except for -r) that we discussed for gzip are also supported in bzip2. Note, however, that the compression level option (-number) has a somewhat different meaning to bzip2. bzip2 comes with bunzip2 and bzcat for decompressing files. bzip2 also comes with the bzip2recover program, which will try to recover damaged .bz2 files.

Don't Be Compressive Compulsive

I occasionally see people attempting to compress a file that has already been compressed with an effective compression algorithm, by doing something like this:

$ gzip picture.jpg

Don't do it. You're probably just wasting time and space! If you apply compression to a file that is already compressed, you will actually end up with a larger file. This is because all compression techniques involve some overhead that is added to the file to describe the compression. If you try to compress a file that already contains no redundant information, the compression will not result in any savings to offset that overhead.
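The effect is easy to reproduce. This sketch uses 100,000 random bytes as a hypothetical stand-in for an already compressed file such as a JPEG, since random data has no redundancy for gzip to remove:

```shell
cd "$(mktemp -d)"
# Random bytes behave like already-compressed data: nothing left to squeeze out
head -c 100000 /dev/urandom > picture.bin
gzip -c picture.bin > picture.bin.gz
wc -c picture.bin picture.bin.gz    # the .gz version is slightly larger
```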

Archiving files

A common file-management task used in conjunction with compression is archiving. Archiving is the process of gathering up many files and bundling them together into a single large file. Archiving is often done as a part of system backups. It is also used when old data is moved from a system to some type of long-term storage.

tar

In the Unix-like world of software, the tar program is the classic tool for archiving files. Its name, short for tape archive, reveals its roots as a tool for making backup tapes. While it is still used for that traditional task, it is equally adept on other storage devices. We often see filenames that end with the extension .tar or .tgz, which indicate a "plain" tar archive and a gzipped archive, respectively. A tar archive can consist of a group of separate files, one or more directory hierarchies, or a mixture of both. The command syntax works like this:

tar mode[options] pathname...

where mode is one of the following operating modes (only a partial list is shown here; see the tar man page for a complete list):

Table 19-2: tar Modes

Mode	Description
c	Create an archive from a list of files and/or directories.
x	Extract an archive.
r	Append specified pathnames to the end of an archive.
t	List the contents of an archive.

tar uses a slightly odd way of expressing options, so we'll need some examples to show how it works. First, let's re-create our playground from the previous chapter:
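A sketch, assuming a scaled-down playground (three directories with two empty files each; the previous chapter's exact layout is not reproduced here), followed by creating the archive playground.tar with the c mode, which the listing and extraction steps that follow rely on:

```shell
cd "$(mktemp -d)"
# Hypothetical reduced playground: dir-001..dir-003, each holding file-A and file-Z
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
tar cf playground.tar playground   # c = create mode, f = name of the archive file
```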

To list the contents of the archive,we can do this:
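A sketch using the t mode, with a minimal one-file playground so the block runs on its own:

```shell
cd "$(mktemp -d)"
mkdir -p playground/dir-001 && touch playground/dir-001/file-A
tar cf playground.tar playground
tar tf playground.tar      # t = list the archive's contents
```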

For a more detailed listing,we can add the v (verbose) option:
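The same sketch with v added; each entry then also shows permissions, ownership, size, and modification time:

```shell
cd "$(mktemp -d)"
mkdir -p playground/dir-001 && touch playground/dir-001/file-A
tar cf playground.tar playground
tar tvf playground.tar     # verbose listing, similar in style to ls -l
```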

Now, let's extract the playground in a new location. We will do this by creating a new directory named foo, changing to that directory, and extracting the tar archive:
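A sketch, building a one-file playground and its archive in a scratch directory first:

```shell
cd "$(mktemp -d)"
mkdir -p playground/dir-001 && touch playground/dir-001/file-A
tar cf playground.tar playground
mkdir foo
cd foo
tar xf ../playground.tar   # x = extract into the current directory
ls playground
```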

If we examine the contents of ~/foo/playground, we see that the archive was successfully installed, creating a precise reproduction of the original files. There is one caveat, however: unless you are operating as the superuser, files and directories extracted from archives take on the ownership of the user performing the restoration, rather than the original owner.

Another interesting behavior of tar is the way it handles pathnames in archives. The default for pathnames is relative, rather than absolute. tar does this by simply removing any leading slash from the pathname when creating the archive. To demonstrate, we will re-create our archive, this time specifying an absolute pathname:

Remember, ~/playground will expand into /root/playground when we press the Enter key, so we will get an absolute pathname for our demonstration. Next, we will extract the archive as before and watch what happens:
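A sketch of both steps that avoids touching the real home directory: the hypothetical variable top holds the scratch directory's absolute path and stands in for ~. GNU tar prints a warning on standard error as it strips the leading slash.

```shell
cd "$(mktemp -d)"
mkdir -p playground/dir-001 && touch playground/dir-001/file-A
top=$PWD                       # hypothetical stand-in for the home directory
mkdir foo
cd foo
tar cf ../playground2.tar "$top/playground"   # absolute pathname on the command line
tar xf ../playground2.tar
# The whole path is re-created *below* foo, not at its original location
find . -name file-A
```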

Here we can see that when we extracted our second archive, it re-created the directory root/playground relative to our current working directory, ~/foo, not relative to the root directory, as would have been the case with an absolute pathname. This may seem like an odd way for it to work, but it's actually more useful this way, as it allows us to extract archives to any location rather than being forced to extract them to their original locations. Repeating the exercise with the inclusion of the verbose option (v) will give a clearer picture of what's going on.

Let's consider a hypothetical, yet practical, example of tar in action. Imagine we want to copy the home directory and its contents from one system to another and we have a large USB hard drive that we can use for the transfer. On our modern Linux system, the drive is "automagically" mounted in the /media directory. Let's also imagine that the disk has a volume name of BigDisk when we attach it. To make the tar archive, we can do the following:
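The command for the real system is shown in a comment. Because it needs root privileges and a mounted drive, the runnable part is a sketch in which hypothetical scratch directories fakeroot/home and BigDisk stand in for /home and /media/BigDisk. tar's -C option changes into fakeroot first, so the stored names begin with home/, just as they would after tar strips the leading slash from /home.

```shell
# On the real system (root privileges, drive mounted at /media/BigDisk):
#   sudo tar cf /media/BigDisk/home.tar /home
# Runnable stand-in using hypothetical scratch directories:
cd "$(mktemp -d)"
mkdir -p fakeroot/home/me BigDisk
echo "notes" > fakeroot/home/me/notes.txt
tar cf BigDisk/home.tar -C fakeroot home   # member names start with home/
tar tf BigDisk/home.tar
```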

Test:

After the tar file is written, we unmount the drive and attach it to the second computer. Again, it is mounted at /media/BigDisk. To extract the archive, we do this:
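Again, the real commands are in comments, and a scratch directory named newroot (hypothetical) plays the role of the second computer's root directory:

```shell
# On the second computer:
#   cd /
#   sudo tar xf /media/BigDisk/home.tar
# Runnable stand-in: newroot plays the role of /
cd "$(mktemp -d)"
mkdir -p fakeroot/home/me BigDisk newroot
echo "notes" > fakeroot/home/me/notes.txt
tar cf BigDisk/home.tar -C fakeroot home
cd newroot                       # the equivalent of cd /
tar xf ../BigDisk/home.tar       # relative names land under newroot/home
cat home/me/notes.txt
```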

What's important to see here is that we must first change directory to /, so that the extraction is relative to the root directory, since all pathnames within the archive are relative.

When extracting an archive, it's possible to limit what is extracted from the archive. For example, if we want to extract a single file from an archive, it could be done like this:
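A sketch with a two-directory playground, so it is easy to see that only the named member comes back:

```shell
cd "$(mktemp -d)"
mkdir -p playground/dir-001 playground/dir-002
touch playground/dir-001/file-A playground/dir-002/file-A
tar cf playground.tar playground
mkdir foo && cd foo
tar xf ../playground.tar playground/dir-002/file-A   # extract just this member
find . -type f
```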

By adding the trailing pathname to the command, tar will only restore the specified file. Multiple pathnames may be specified. Note that the pathname must be the full, exact relative pathname as stored in the archive. When specifying pathnames, wildcards are not normally supported; however, the GNU version of tar (which is the version most often found in Linux distributions) supports them with the --wildcards option. Here is an example using our previous playground.tar file:
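A sketch, assuming GNU tar; the pattern is quoted so that the shell passes it to tar unexpanded:

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
tar cf playground.tar playground
mkdir foo && cd foo
# GNU tar only: extract every file-A from any dir-* directory
tar xf ../playground.tar --wildcards 'playground/dir-*/file-A'
find . -type f
```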

This command will extract only files matching the specified pathname including the wildcard dir-*.

tar is often used in conjunction with find to produce archives. In this example, we will use find to produce a set of files to include in an archive:
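A sketch, assuming GNU tar and the reduced playground. Because playground.tar already holds every file-A, the append step adds a second copy of each, which is harmless but shows that r always appends:

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
tar cf playground.tar playground
# r = append mode; '+' passes all matches to a single tar invocation
find playground -name 'file-A' -exec tar rf playground.tar '{}' '+'
tar tf playground.tar | grep -c 'file-A$'   # three original entries plus three appended
```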

Here we use find to match all the files in playground named file-A and then, using the -exec action, we invoke tar in the append mode (r) to add the matching files to the archive playground.tar.

Using tar with find is a good way of creating incremental backups of a directory tree or an entire system. By using find to match files newer than a timestamp file, we could create an archive that only contains files newer than the last archive, assuming that the timestamp file is updated right after each archive is created.
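One way this could look, as a sketch assuming GNU tar: the marker file name timestamp and the archive name incremental.tar are hypothetical, find's -newer test selects files modified after the marker, and the list is handed to tar through the --files-from (-T) option described later in this chapter.

```shell
cd "$(mktemp -d)"
mkdir -p playground/dir-001
touch playground/dir-001/file-A     # already covered by the "last backup"
touch timestamp                     # hypothetical marker: time of the last backup
sleep 1                             # make later changes strictly newer
touch playground/dir-001/file-B     # a change made after that backup
# Archive only the files modified more recently than the marker
find playground -type f -newer timestamp | tar cf incremental.tar -T -
touch timestamp                     # update the marker right after archiving
tar tf incremental.tar
```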

tar can also make use of both standard input and output. Here is a comprehensive example:
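A sketch of such a pipeline, assuming GNU tar and the reduced playground:

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
# find lists the files; tar reads the names from stdin and writes the archive to stdout
find playground -name 'file-A' | tar cf - --files-from=- | gzip > playground.tgz
```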

In this example, we used the find program to produce a list of matching files and piped them into tar. If the filename "-" is specified, it is taken to mean standard input or output, as needed. (By the way, this convention of using "-" to represent standard input/output is used by a number of other programs, too.) The --files-from option (which may also be specified as -T) causes tar to read its list of pathnames from a file rather than the command line. Lastly, the archive produced by tar is piped into gzip to create the compressed archive playground.tgz. The .tgz extension is the conventional extension given to gzip-compressed tar files. The extension .tar.gz is also used sometimes.

While we used the gzip program externally to produce our compressed archive, modern versions of GNU tar support both gzip and bzip2 compression directly, with the use of the z and j options, respectively. Using our previous example as a base, we can simplify it this way:
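The simplified form, as a sketch under the same assumptions (GNU tar, reduced playground):

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
# z makes tar run the gzip compression itself; -T - reads the file list from stdin
find playground -name 'file-A' | tar czf playground.tgz -T -
```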

If we had wanted to create a bzip2-compressed archive instead, we could have done this:
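The bzip2 variant, as a sketch assuming GNU tar with bzip2 installed:

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
# j selects bzip2 instead of gzip; .tbz marks a bzip2-compressed tar file
find playground -name 'file-A' | tar cjf playground.tbz -T -
```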

By simply changing the compression option from z to j (and changing the output file's extension to .tbz to indicate a bzip2-compressed file), we enabled bzip2 compression. Another interesting use of standard input and output with the tar command involves transferring files between systems over a network. Imagine that we had two machines running a Unix-like system equipped with tar and ssh. In such a scenario, we could transfer a directory from a remote system (named remote-sys for this example) to our local system:
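The network command appears in a comment. Since it needs a reachable remote-sys, the runnable part is a sketch in which a subshell working in a hypothetical remote-home directory plays the remote side of the pipe:

```shell
# Network version:
#   mkdir remote-stuff
#   cd remote-stuff
#   ssh remote-sys 'tar cf - Documents' | tar xf -
# Runnable stand-in: the subshell plays the remote system
cd "$(mktemp -d)"
mkdir -p remote-home/Documents remote-stuff
echo "report" > remote-home/Documents/report.txt
cd remote-stuff
(cd ../remote-home && tar cf - Documents) | tar xf -
ls Documents
```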

Here we were able to copy a directory named Documents from the remote system remote-sys to a directory within the directory named remote-stuff on the local system. How did we do this? First, we launched the tar program on the remote system using ssh. You will recall that ssh allows us to execute a program remotely on a networked computer and "see" the results on the local system: the standard output produced on the remote system is sent to the local system for viewing. We can take advantage of this by having tar create an archive (the c mode) and send it to standard output, rather than a file (the f option with the dash argument), thereby transporting the archive over the encrypted tunnel provided by ssh to the local system. On the local system, we execute tar and have it expand an archive (the x mode) supplied from standard input (again, the f option with the dash argument).

zip

The zip program is both a compression tool and an archiver. The file format used by the program is familiar to Windows users, as it reads and writes .zip files. In Linux, however, gzip is the predominant compression program, with bzip2 being a close second.

In its most basic usage, zip is invoked like this:

zip options zipfile file...

For example,to make a zip archive of our playground,we would do this:

Unless we include the -r option for recursion, only the playground directory (but none of its contents) is stored. Although the addition of the extension .zip is automatic, we will include the file extension for clarity.

During the creation of the zip archive, zip will normally display a series of messages like this:

These messages show the status of each file added to the archive. zip will add files to the archive using one of two storage methods: either it will "store" a file without compression, as shown here, or it will "deflate" the file, which performs compression. The numeric value displayed after the storage method indicates the amount of compression achieved. Since our playground only contains empty files, no compression is performed on its contents.

Extracting the contents of a zip file is straightforward when using the unzip program:

One thing to note about zip (as opposed to tar) is that if an existing archive is specified, it is updated rather than replaced. This means that the existing archive is preserved, but new files are added and matching files are replaced. Files may be listed and extracted selectively from a zip archive by specifying them to unzip:
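A sketch, using a member name from the reduced playground:

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
zip -r playground.zip playground
mkdir foo && cd foo
unzip -l ../playground.zip playground/dir-002/file-Z   # list this one entry only
unzip ../playground.zip playground/dir-002/file-Z      # extract just that file
```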

Using the -l option causes unzip to merely list the contents of the archive without extracting the file. If no file(s) are specified, unzip will list all files in the archive. The -v option can be added to increase the verbosity of the listing. Note that when the archive extraction conflicts with an existing file, the user is prompted before the file is replaced.

Like tar, zip can make use of standard input and output, though its implementation is somewhat less useful. It is possible to pipe a list of filenames to zip via the -@ option:
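A sketch with the reduced playground:

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
# -@ makes zip read the names of the files to add from standard input
find playground -name 'file-A' | zip -@ file-A.zip
```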

Here we use find to generate a list of files matching the test -name "file-A" and pipe the list into zip, which creates the archive file-A.zip containing the selected files.

zip also supports writing its output to standard output, but its use is limited because very few programs can make use of the output. Unfortunately, the unzip program does not accept standard input. This prevents zip and unzip from being used together to perform network file copying like tar.

zip can, however, accept standard input, so it can be used to compress the output of other programs:

In this example, we pipe the output of ls into zip. Like tar, zip interprets the trailing dash as "use standard input for the input file."

The unzip program allows output to be sent to standard output when the -p (for pipe) option is specified:

We touched on some of the basic things that zip and unzip can do. They both have a lot of options that add to their flexibility, though some are specific to other platforms. The man pages for both zip and unzip are pretty good and contain useful examples. However, the main use of these programs is for exchanging files with Windows systems, rather than performing compression and archiving on Linux, where tar and gzip are greatly preferred.

Synchronize files and directories

A common strategy for maintaining a backup copy of a system involves keeping one or more directories synchronized with another directory (or directories) located on either the local system (usually a removable storage device of some kind) or a remote system. We might, for example, have a local copy of a website under development and synchronize it from time to time with the "live" copy on a remote web server. In the Unix-like world, the preferred tool for this task is rsync. This program can synchronize both local and remote directories by using the rsync remote-update protocol, which allows rsync to quickly detect the differences between two directories and perform the minimum amount of copying required to bring them into sync. This makes rsync very fast and economical to use, compared to other kinds of copy programs.

rsync is invoked like this:

rsync options source destination

where source and destination are one of the following:

A local file or directory

A remote file or directory in the form of [user@]host:path

A remote rsync server specified with a URI of rsync://[user@]host[:port]/path

Note that either the source or the destination must be a local file. Remote-to-remote copying is not supported.

Let's try rsync out on some local files. First, let's clean out our foo directory:

Next, we'll synchronize the playground directory with a corresponding copy in foo:

We've included both the -a option (for archiving; it causes recursion and preservation of file attributes) and the -v option (verbose output) to make a mirror of the playground directory within foo. While the command runs, we will see a list of the files and directories being copied. At the end, we will see a summary message like this:

indicating the amount of copying performed.If we run the command again,we will see a different result:
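A sketch of repeating the command, first with nothing changed and then after touching one file. The sleep matters: by default rsync treats timestamps as equal if they match to the whole second, so a same-size change within the same second could be missed.

```shell
cd "$(mktemp -d)"
for d in 001 002 003; do
    mkdir -p "playground/dir-$d"
    touch "playground/dir-$d/file-A" "playground/dir-$d/file-Z"
done
mkdir foo
rsync -av playground foo            # initial copy
rsync -av playground foo            # nothing changed, so no files are listed
sleep 1                             # rsync compares modification times in whole seconds
touch playground/dir-002/file-Z     # give one file a newer modification time
rsync -av playground foo            # only playground/dir-002/file-Z is copied
```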

Notice that there was no listing of files. This is because rsync detected that there were no differences between ~/playground and ~/foo/playground, and therefore it didn't need to copy anything. If we modify a file in playground and run rsync again:

we see that rsync detected the change and copied only the updated file. As a practical example, let's consider the imaginary external hard drive that we used earlier with tar. If we attach the drive to our system and, once again, it is mounted at /media/BigDisk, we can perform a useful system backup by first creating a directory named /backup on the external drive and then using rsync to copy the most important stuff from our system to the external drive:
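The real commands, which need root and the mounted drive, appear in comments. The runnable sketch substitutes hypothetical scratch directories and also runs a second pass after deleting a source file, to show what the --delete option does:

```shell
# Real system (root privileges, drive mounted at /media/BigDisk):
#   sudo mkdir /media/BigDisk/backup
#   sudo rsync -av --delete /etc /home /usr/local /media/BigDisk/backup
# Runnable stand-in using hypothetical scratch directories:
cd "$(mktemp -d)"
mkdir -p src/etc src/home src/usr/local BigDisk/backup
touch src/etc/hosts src/home/notes.txt src/usr/local/tool
rsync -av --delete src/etc src/home src/usr/local BigDisk/backup
rm src/home/notes.txt     # a file disappears from the source...
rsync -av --delete src/etc src/home src/usr/local BigDisk/backup   # ...and its copy is deleted
ls -R BigDisk/backup      # note that usr/local arrives as just "local"
```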

In this example, we copied the /etc, /home, and /usr/local directories from our system to our imaginary storage device. We included the --delete option to remove files that may have existed on the backup device that no longer existed on the source device (this is irrelevant the first time we make a backup, but will be useful on subsequent copies). Repeating the procedure of attaching the external drive and running this rsync command would be a useful (though not ideal) way of keeping a small system backed up. Of course, an alias would be helpful here, too. We could create an alias and add it to our .bashrc file to provide this feature:
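A sketch of such an alias; the name backup is arbitrary, and the command is the one from the backup example. Defining an alias does not run the command, so this is safe to try:

```shell
# Hypothetical ~/.bashrc entry
alias backup='sudo rsync -av --delete /etc /home /usr/local /media/BigDisk/backup'
alias backup    # print the definition to confirm it
```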

Now all we have to do is attach our external drive and run the backup command to do the job.

Using rsync over a network

One of the real beauties of rsync is that it can be used to copy files over a network. After all, the "r" in rsync stands for "remote." Remote copying can be done in one of two ways. The first way is with another system that has rsync installed, along with a remote shell program such as ssh. Let's say we had another system on our local network with a lot of available hard drive space and we wanted to perform our backup operation using the remote system instead of an external drive. Assuming that it already had a directory named /backup where we could deliver our files, we could do this:

We made two changes to our command to facilitate the network copy. First, we added the --rsh=ssh option, which instructs rsync to use the ssh program as its remote shell. In this way, we were able to use an ssh-encrypted tunnel to securely transfer the data from the local system to the remote host. Second, we specified the remote host by prefixing its name (in this case the remote host is named remote-sys) to the destination pathname.

The second way that rsync can be used to synchronize files over a network is by using an rsync server. rsync can be configured to run as a daemon and listen to incoming requests for synchronization. This is often done to allow mirroring of a remote system. For example, Red Hat Software maintains a large repository of software packages under development for its Fedora distribution. It is useful for software testers to mirror this collection during the testing phase of the distribution release cycle. Since files in the repository change frequently (often more than once a day), it is desirable to maintain a local mirror by periodic synchronization, rather than by bulk copying of the repository. One of these repositories is kept at Georgia Tech; we could mirror it using our local copy of rsync and their rsync server like this:

In this example, we use the URI of the remote rsync server, which consists of a protocol (rsync://), followed by the remote hostname (rsync.gtlib.gatech.edu), followed by the pathname of the repository.
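Since the repository's pathname is not given here, this sketch only assembles the URI from the three parts just described, using a hypothetical path and a hypothetical local directory fedora-devel, and prints the resulting command rather than running it (running it requires network access and a live server):

```shell
protocol='rsync://'
host='rsync.gtlib.gatech.edu'
repo_path='path/to/repository'       # hypothetical placeholder, not the real path
uri="${protocol}${host}/${repo_path}"
# Print the mirroring command; fedora-devel is a hypothetical local directory
echo rsync -av "$uri" fedora-devel
```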

Reposted from: https://www.cnblogs.com/itmeatball/p/7637809.html