Linux kernel 3.0

Linux 3.0

Summary: Besides a new version numbering scheme, Linux 3.0 also has several new features: Btrfs data scrubbing and automatic defragmentation, Xen dom0 support, unprivileged ICMP_ECHO, wake on WLAN, Berkeley Packet Filter JIT filtering, a memcached-like system for the page cache, a sendmmsg() syscall that batches sendmsg() calls, and setns(), a syscall that allows better handling of light virtualization systems such as containers. New hardware support has been added: for example, Microsoft Kinect, AMD Llano Fusion APUs, Intel iwlwifi 105 and 135, the Intel C600 serial-attached-SCSI controller, Ralink RT5370 USB, several Realtek RTL81xx devices, and the Apple iSight webcam. Many other drivers and small improvements have been added.

  1. Prominent features
    1. Btrfs: Automatic defragmentation, scrubbing, performance improvements
    2. sendmmsg(): batching of sendmsg() calls
    3. XEN dom0 support
    4. Cleancache
    5. Berkeley Packet Filter just-in-time filtering
    6. Wake on WLAN support
    7. Unprivileged ICMP_ECHO messages
    8. setns() syscall: better namespace handling
    9. Alarm-timers
  2. Driver and architecture-specific changes
  3. VFS
  4. Process scheduler
  5. Memory management
  6. Networking
  7. File systems
  8. Crypto
  9. Virtualization
  10. Security
  11. Tracing/profiling
  12. Various core changes

1. Prominent features

1.1. Btrfs: Automatic defragmentation, scrubbing, performance improvements

Automatic defragmentation

COW (copy-on-write) filesystems have many advantages, but they also have some disadvantages, for example fragmentation. Btrfs lays out the data sequentially when files are written to the disk for the first time, but a COW design implies that any subsequent modification to the file must not be written on top of the old data, but be placed in a free block, which causes fragmentation (RPM databases are a common case of this problem). Additionally, it suffers from the fragmentation problems common to all filesystems.

Btrfs already offers alternatives to fight this problem. First, it supports online defragmentation using the command "btrfs filesystem defragment". Second, it has a mount option, -o nodatacow, that disables COW for data. Now Btrfs adds a third option, the -o autodefrag mount option. This mechanism detects small random writes into files and queues them up for an automatic defrag process, so the filesystem defragments itself while it is in use. It isn't suited to virtualization or big database workloads yet, but works well for smaller files such as rpm, sqlite or bdb databases. Code: (commit)

Scrub

"Scrubbing" is the process of checking the integrity of the data in the filesystem. This initial implementation of scrubbing will check the checksums of all the extents in the filesystem. If an error occurs (checksum or IO error), a good copy is searched for. If one is found, the bad copy will be rewritten. Code: (commit 1, 2)

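The check-and-repair idea described for scrub can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Btrfs implementation: the function name scrub_extent, the in-memory list of mirror copies, and the use of zlib.crc32 as a stand-in checksum are all hypothetical.

```python
import zlib


def scrub_extent(copies, expected_csum):
    """Verify every mirror copy of one extent and rewrite the bad ones.

    copies        -- list of bytearray objects, one per mirror (hypothetical model)
    expected_csum -- checksum recorded for the extent (zlib.crc32 is a stand-in)
    Returns the number of copies that were rewritten.
    """
    # Search for a copy whose checksum still matches.
    good = next((bytes(c) for c in copies if zlib.crc32(c) == expected_csum), None)
    if good is None:
        raise IOError("no good copy found, extent cannot be repaired")

    repaired = 0
    for copy in copies:
        if zlib.crc32(copy) != expected_csum:
            copy[:] = good  # rewrite the bad copy from the good one
            repaired += 1
    return repaired
```

With two mirrors where one has been corrupted, scrub_extent rewrites the corrupted mirror in place and reports one repair.
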
Other improvements

- File creation/deletion speedup: The performance of file creation and deletion on Btrfs was very poor. The reason is that for each creation or deletion, Btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. Now Btrfs can delay some b+ tree insertions or deletions, which allows it to batch these modifications. Microbenchmarks of file creation have been sped up by ~15%, and file deletion by ~20%. Code: (commit)

- Do not flush csum items of unchanged file data: speeds up fsync. A sysbench workload doing "random write + fsync" went from 112.75 requests/sec to 1216 requests/sec. Code: (commit)

- Quasi-round-robin for space allocation in multidevice setups: the chunk allocator currently always allocates space on the devices in the same order. This leads to a very uneven distribution, especially with RAID1 or RAID10 and an uneven number of devices. Now Btrfs always sorts the devices before allocating, and allocates the stripes on the devices with the most available space. Code: (commit)
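
The "sort the devices, then pick the ones with the most available space" policy can be modelled as follows. This is a simplified sketch, not the kernel's chunk allocator: allocate_chunk, the device names, and the dictionary of free space are hypothetical.

```python
def allocate_chunk(free_space, num_stripes, stripe_size):
    """Pick devices for one chunk, preferring those with the most free space.

    free_space  -- dict mapping device name to free units (hypothetical model)
    num_stripes -- stripes needed for the chunk (2 for a RAID1-like layout)
    stripe_size -- units taken from each chosen device
    Returns the list of chosen devices and updates free_space in place.
    """
    # Sort the candidate devices by available space, largest first.
    candidates = sorted(
        (dev for dev, free in free_space.items() if free >= stripe_size),
        key=lambda dev: free_space[dev],
        reverse=True,
    )
    if len(candidates) < num_stripes:
        raise ValueError("not enough devices with free space")

    chosen = candidates[:num_stripes]
    for dev in chosen:
        free_space[dev] -= stripe_size
    return chosen
```

With three equally sized devices and two stripes per chunk, three consecutive allocations in this model leave every device with the same amount of free space, whereas always using the same device order would keep filling the first two.
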

1.2. sendmmsg(): batching of sendmsg() calls

recvmsg() and sendmsg() are the syscalls used to receive data from and send data to the network. In 2.6.33, Linux added recvmmsg(), a syscall that receives in a single call the data that would otherwise need multiple recvmsg() calls, improving throughput and latency for a number of scenarios. Now an equivalent sendmmsg() syscall has been added. A microbenchmark saw a 20% improvement in throughput on UDP send and 30% on raw socket send.

Code: (commit)
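
Python's socket module does not wrap sendmmsg(), so the sketch below calls the C library function through ctypes. It assumes Linux with a C library that exports sendmmsg() and the usual glibc struct layout; the helper name send_batch is hypothetical, and the socket must already be connected because no destination address is filled in.

```python
import ctypes
import os
import socket


class iovec(ctypes.Structure):
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


class msghdr(ctypes.Structure):
    _fields_ = [
        ("msg_name", ctypes.c_void_p),
        ("msg_namelen", ctypes.c_uint32),
        ("msg_iov", ctypes.POINTER(iovec)),
        ("msg_iovlen", ctypes.c_size_t),
        ("msg_control", ctypes.c_void_p),
        ("msg_controllen", ctypes.c_size_t),
        ("msg_flags", ctypes.c_int),
    ]


class mmsghdr(ctypes.Structure):
    _fields_ = [("msg_hdr", msghdr), ("msg_len", ctypes.c_uint)]


def send_batch(sock, payloads):
    """Send every payload as its own datagram using one sendmmsg() call.

    Returns the number of messages the kernel accepted.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    count = len(payloads)
    # Keep the buffers referenced until the call returns.
    buffers = [ctypes.create_string_buffer(p) for p in payloads]
    iovs = (iovec * count)()
    msgs = (mmsghdr * count)()
    for i, buf in enumerate(buffers):
        iovs[i].iov_base = ctypes.addressof(buf)
        iovs[i].iov_len = len(payloads[i])
        msgs[i].msg_hdr.msg_iov = ctypes.pointer(iovs[i])
        msgs[i].msg_hdr.msg_iovlen = 1
    sent = libc.sendmmsg(sock.fileno(), msgs, count, 0)
    if sent < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return sent
```

A caller would connect a UDP socket to its peer and pass a list of byte strings; each one arrives as a separate datagram, but only one system call is made.
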

1.3. XEN dom0 support

Linux finally has Xen dom0 support.

1.4. Cleancache

Recommended LWN article: Cleancache and Frontswap

Cleancache is an optional feature that can potentially increase page cache performance. It could be described as a memcached-like system, but for cache memory pages. It provides memory storage not directly accessible or addressable by the kernel, and it does not guarantee that the data will not vanish. It can be used by virtualization software to improve memory handling for guests, but it can also be useful to implement things like a compressed cache.

Code: (commit), (commit)

1.5. Berkeley Packet Filter just-in-time filtering

Recommended LWN article: A JIT for packet filters

The Berkeley Packet Filter filtering capabilities, used by tools like libpcap/tcpdump, are normally handled by an interpreter. This release adds a simple JIT that generates native code when a filter is loaded into memory (something already done by other OSes, like FreeBSD). Administrators need to enable this feature by writing "1" to /proc/sys/net/core/bpf_jit_enable.

Code: (commit)
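
Enabling the JIT is a plain write to the sysctl file named above. A minimal sketch follows; the helper names set_bpf_jit and bpf_jit_enabled are hypothetical, writing the real file requires root, and the path parameter exists only so the helpers can be exercised against an ordinary file.

```python
from pathlib import Path

# Path taken from the text above; writing it requires root privileges.
BPF_JIT_ENABLE = Path("/proc/sys/net/core/bpf_jit_enable")


def set_bpf_jit(enabled, path=BPF_JIT_ENABLE):
    """Write "1" to enable the BPF JIT, or "0" to disable it."""
    path.write_text("1\n" if enabled else "0\n")


def bpf_jit_enabled(path=BPF_JIT_ENABLE):
    """Return True when the file holds any value other than "0"."""
    return path.read_text().strip() != "0"
```
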

1.6. Wake on WLAN support

Wake on Wireless is a feature to allow the system to go into a low-power state (e.g. ACPI S3 suspend) while the wireless NIC remains active and does varying things for the host, e.g. staying connected to an AP or searching for networks. The 802.11 stack has added support for it.

Code: (commit 1, 2)

1.7. Unprivileged ICMP_ECHO messages

Recommended LWN article: ICMP sockets

This release makes it possible to send ICMP_ECHO messages (ping) and receive the corresponding ICMP_ECHOREPLY messages without any special privileges, similar to what is implemented in Mac OS X. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. Initially this functionality was written for Linux 2.4.32, but unfortunately it was never made public. The new functionality is disabled by default, and is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range.

Code: (commit)
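
From userspace, the new socket type is a datagram socket with the ICMP protocol. The following is a hedged sketch of a setuid-less ping: the helper names are hypothetical, and ping_once returns False whenever the socket cannot be created (for example because the caller's group is outside the range configured in /proc/sys/net/ipv4/ping_group_range) or no reply arrives in time. The checksum is computed here for completeness; to my understanding the kernel fills in the identifier and checksum itself for this socket type.

```python
import os
import socket
import struct

ICMP_ECHO = 8  # ICMP type of an echo request


def inet_checksum(data):
    """Standard 16-bit ones' complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident, seq, payload=b"ping"):
    """Build an ICMP echo request: type, code, checksum, id, sequence, payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO, 0, csum, ident, seq) + payload


def ping_once(host, timeout=1.0):
    """Send one echo request over an unprivileged ICMP socket."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP)
    except OSError:
        return False  # feature disabled or not permitted for this user
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_echo_request(os.getpid() & 0xFFFF, 1), (host, 0))
            sock.recvfrom(1024)
            return True
        except OSError:
            return False
```
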

1.8. setns() syscall: better namespace handling

Recommended LWN article: Namespace file descriptors

Linux supports different namespaces for many of the resources it handles; for example, lightweight forms of virtualization such as containers or systemd-nspawn show the virtualized processes a virtual PID different from the real PID. The same thing can be done with the filesystem directory structure, network resources, IPC, etc. The only way to set different namespace configurations was using different flags in the clone() syscall, but that mechanism did not allow one process to access another process's namespace. The setns() syscall solves that problem.

Code: (commit 1, 2, 3, 4, 5, 6)
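
A process joins another process's namespace by opening a file under /proc/<pid>/ns/ and passing the descriptor to setns(). The sketch below calls the C library's setns() through ctypes; the helper names ns_path and enter_namespace are hypothetical, and the call normally needs CAP_SYS_ADMIN, so it fails with a permission error for ordinary users.

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # value from <sched.h>; 0 means "any namespace type"


def ns_path(pid, name):
    """Path of the namespace file for a process, e.g. /proc/1234/ns/net."""
    return f"/proc/{pid}/ns/{name}"


def enter_namespace(pid, name="net", nstype=0):
    """Move the calling thread into the given namespace of process `pid`."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(ns_path(pid, name), os.O_RDONLY)
    try:
        if libc.setns(fd, nstype) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

Passing CLONE_NEWNET as nstype makes the kernel reject the descriptor unless it really refers to a network namespace; passing 0 accepts any type.
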

1.9. Alarm-timers

Recommended LWN article: Waking systems from suspend

Alarm-timers are a hybrid style timer, similar to high-resolution timers, but when the system is suspended, the RTC device is set to fire and wake the system when the soonest alarm-timer expires. The concept for Alarm-timers was inspired by the Android Alarm driver, and the interface to userland uses the POSIX clock and timers interface, using two new clockids: CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM.

Code: (commit 1, 2)

2. Driver and architecture-specific changes

All the driver and architecture-specific changes can be found in the Linux_3.0_DriverArch page

3. VFS

  • Cache xattr security drop check for write: benchmarking on btrfs showed that a major scaling bottleneck on large systems on btrfs is currently the xattr lookup on every write, which causes an additional tree walk, hitting some per file system locks and quite bad scalability. This is also a problem in ext4, where it hits the global mbcache lock. Caching this check solves the problem (commit)

4. Process scheduler

  • Increase SCHED_LOAD_SCALE resolution: With this extra resolution, the scheduler can handle deeper cgroup hierarchies and do better shares distribution and load balancing on larger systems (especially for low weight task groups) (commit), (commit)

  • Move the second half of ttwu() to the remote cpu: avoids having to take rq->lock and do the task enqueue remotely, saving a lot on cacheline transfers. A semaphore benchmark goes from 647278 worker burns per second to 816715 (commit)

  • Next buddy hint on sleep and preempt path: a worst-case benchmark consisting of 2 tbench client processes with 2 threads each running on a single CPU changed from 105.84 MB/sec to 112.42 MB/sec (commit)

5. Memory management

  • Make mmu_gather preemptible (commit)

  • Batch activate_page() calls to reduce zone->lru_lock contention (commit)

  • tmpfs: implement generic xattr support (commit)

  • Memory cgroup controller:

    • Add memory.numastat api for numa statistics (commit)

    • Add the pagefault count into memcg stats (commit)

    • Reclaim memory from nodes in round-robin order (commit)

    • Remove the deprecated noswapaccount kernel parameter (commit)

6. Networking

  • Allow setting the network namespace by fd (commit)

  • Wireless

    • Add the ability to advertise possible interface combinations (commit)

    • Add support for scheduled scans (commit)

    • Add userspace authentication flag to mesh setup (commit)

    • New notification to discover mesh peer candidates. (commit)

  • Allow ethtool to set interface in loopback mode. (commit)

  • Allow no-cache copy from user on transmit (commit)

  • ipset: SCTP, UDPLITE support added (commit)

  • sctp: implement socket option SCTP_GET_ASSOC_ID_LIST (commit), implement event notification SCTP_SENDER_DRY_EVENT (commit)

  • bridge: allow creating bridge devices with netlink (commit), allow creating/deleting fdb entries via netlink (commit)

  • batman-adv: multi vlan support for bridge loop detection (commit)

  • pkt_sched: QFQ - quick fair queue scheduler (commit)

  • RDMA: Add netlink infrastructure that allows for registration of RDMA clients (commit)

7. File systems

BLOCK LAYER

  • Submit discard bio in batches in blkdev_issue_discard() - makes discarding data faster (commit)

EXT4

CIFS

  • Add support for mounting Windows 2008 DFS shares (commit)

  • Convert cifs_writepages to use async writes (commit), (commit)

  • Add rwpidforward mount option that enables a mode in which CIFS forwards the pid of the process that opened a file to any read and write operation (commit)

OCFS2

NILFS2

XFS

8. Crypto

  • caam - Add support for the Freescale SEC4/CAAM (commit)

  • padlock - Add SHA-1/256 module for VIA Nano (commit)

  • s390: add System z hardware support for CTR mode (commit), add System z hardware support for GHASH (commit), add System z hardware support for XTS mode (commit)

  • s5p-sss - add S5PV210 advanced crypto engine support (commit)

9. Virtualization

  • User Mode Linux: add earlyprintk support (commit), add ucast ethernet transport (commit)

  • xen: add blkback support (commit)

10. Security

  • Allow the application of capability limits to usermode helpers (commit)

  • SELinux

    • add /sys/fs/selinux mount point to put selinuxfs (commit)

    • Make selinux cache VFS RCU walks safe (improves VFS performance) (commit)

11. Tracing/profiling

  • perf stat: Add -d -d and -d -d -d options to show more CPU events (commit), (commit)

  • perf stat: Add --sync/-S option (commit)

12. Various core changes

  • rcu: priority boosting for TREE_PREEMPT_RCU (commit)

  • ulimit: raise default hard ulimit on number of files to 4096 (commit)

  • cgroups

    • remove the Namespace cgroup subsystem. It has been replaced by a compatibility flag 'clone_children', where a newly created cgroup will copy the parent cgroup values. The userspace has to manually create a cgroup and add a task to the 'tasks' file (commit)

    • Make 'procs' file writable (commit)

  • kbuild: implement several W= levels (commit)

  • PM/Hibernate: Add sysfs knob to control size of memory for drivers (commit)

  • posix-timers: RCU conversion (commit)

  • coredump: add support for exe_file in core name (commit)
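
For the cgroups change listed above, the manual steps that userspace now performs can be sketched as follows. This is an illustration only: create_child_cgroup is a hypothetical helper, `parent` stands for a directory inside a mounted cgroup hierarchy, and the flag file name cgroup.clone_children is an assumption based on the flag name given in the text.

```python
from pathlib import Path


def create_child_cgroup(parent, name, pid):
    """Create a child cgroup by hand and move one task into it.

    parent -- Path of an existing cgroup directory (hypothetical mount point)
    name   -- name of the new child cgroup
    pid    -- process ID to add to the child's 'tasks' file
    """
    # Ask new children to copy the parent's values (assumed flag file name).
    (parent / "cgroup.clone_children").write_text("1")
    child = parent / name
    child.mkdir()  # creating the directory creates the cgroup
    (child / "tasks").write_text(str(pid))  # attach the task
    return child
```
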



Original link: http://kernelnewbies.org/Linux_3.0


