Linux Source Code Passes 10 Million Lines

[ original link]

THE NUMBER of lines of source code comprising Linux kernel files recently surpassed the ten million mark after the latest release of Linux version 2.6.27, an analysis has found.

However, that count includes blank lines, comments and text files included in a full checkout of the kernel source. Counted slightly differently, the number of lines of actual text is "only" over nine million, but we rather like that larger figure of ten million, because white-space really is important for code readability and, well... it's a nice round number.

As with all long-term programming projects, the size of the Linux kernel code base varies over time, as old code is discarded and replaced.

Newer features and functions are constantly being added, though, so the overall size of the Linux kernel continually increases.

Some analyses of the Linux kernel code base using David Wheeler's SLOCCount program yield some interesting facts. (The acronym "SLOC" stands for Source Lines of Code.) It finds only 6,399,191 lines of source code, since it doesn't count blank lines, comments and other input. One breakdown of the code base by SLOCCount comes up with the following figures (percentages are rounded to one decimal place):

TYPE            COUNT       PER CENT
Drivers         3,301,081   51.6
Architectures   1,258,638   19.7
Filesystems       544,871    8.5
Networking        376,716    5.9
Sound             356,180    5.6
Include           320,078    5.0
Kernel             74,503    1.2
Memory Mgmt        36,312    0.6
Cryptography       32,729    0.5
Security           25,303    0.4
Other              72,780    1.1

Categorisation by language finds that the overwhelming majority of the Linux kernel code is written in ANSI C, at 96.39 per cent, with Assembly Language accounting for almost all of the rest at 3.32 per cent. Other languages used in the kernel source files, in descending order of the number of lines of code, include Perl, C++, Yacc, Sh(ell), Lex, Python, LISP, Pascal and Awk.

More interestingly perhaps, SLOCCount also produces an estimation of the Linux kernel source code's value, that is, what it might cost to redevelop the code base from scratch, using the COCOMO development model.

SLOCCount estimates that it would take a team of over 200 developers about nine and a half years to rewrite the Linux kernel from scratch. Based on a four-year-old assumption of programmers' average salary level, SLOCCount calculates that this would cost nearly $268 million.

Given inflation and adding in management overhead, $500 million might be a fair estimate of what it might actually cost a proprietary software vendor to redevelop Linux.

In fact, thousands of programmers have contributed to developing the Linux kernel, over a period of more than 15 years.

And in terms of what it costs one to download a full Linux distribution, they did it for free.


A more detailed version [ original link]

After the release of Linux 2.6.27, kernel developers are busily integrating patches for the next kernel version into the main development branch of Linux. This usually involves discarding some old code and adding new code, though on balance there are usually more new lines than old ones, so the kernel grows continually.

In the process, the kernel developers have now passed the 10 million line mark, if blank lines, comments and text files are included in a current Git checkout of the Linux source code (find . -type f -not -regex '\./\.git.*' | xargs cat | wc -l). The number of lines of text in source code files alone is also worth noting, as it recently passed 9 million (find . -name '*.[hcS]' -not -regex '\./\.git.*' | xargs cat | wc -l).
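The two counts can also be approximated without the shell pipeline. The following is a minimal Python sketch, not the method used in the article: the function names count_newlines and count_tree are hypothetical, and the sketch assumes that skipping top-level entries whose names start with ".git" is an adequate stand-in for the -not -regex filter.

```python
import os

SOURCE_SUFFIXES = (".h", ".c", ".S")  # mirrors the *.[hcS] name pattern


def count_newlines(path):
    """Count newline bytes in one file, which is what `wc -l` reports."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(1 << 20)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def count_tree(root):
    """Return (all_lines, source_lines) for the tree under root."""
    all_lines = 0
    source_lines = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Assumption: skip top-level names starting with ".git",
            # approximating the regex filter of the find commands.
            dirnames[:] = [d for d in dirnames if not d.startswith(".git")]
            filenames = [f for f in filenames if not f.startswith(".git")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # regular files only, as with -type f
            lines = count_newlines(path)
            all_lines += lines
            if name.endswith(SOURCE_SUFFIXES):
                source_lines += lines
    return all_lines, source_lines
```

Calling count_tree(".") from the top of a kernel checkout returns both totals in one pass. The results may differ slightly from the shell commands, for example where a file lacks a trailing newline in the middle of the concatenated stream (which does not change the newline count) or where the second find command, lacking -type f, picks up non-regular files.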

Programs like SLOCCount can be used to inspect the Linux kernel's source code in more detail. According to this tool, the source code line count is not 9 million but exactly 6,399,191 (Source Lines of Code/SLOC), as the program doesn't count blank lines, comments and several other types of input. More than half of the lines are part of hardware drivers; the second largest chunk is the arch/ directory which contains the source code of the various architectures supported by Linux.

SLOC     Directory      SLOC-by-Language (Sorted)
3301081  drivers        ansic=3296641,yacc=1680,asm=1136,perl=829,lex=778,sh=17
1258638  arch           ansic=1047549,asm=209655,sh=617,yacc=307,lex=300,awk=96,python=45,pascal=41,perl=28
544871   fs             ansic=544871
376716   net            ansic=376716
356180   sound          ansic=355997,asm=183
320078   include        ansic=318367,cpp=1511,asm=125,pascal=75
74503    kernel         ansic=74198,perl=305
36312    mm             ansic=36312
32729    crypto         ansic=32729
25303    security       ansic=25303
24111    scripts        ansic=14424,perl=4653,cpp=1791,sh=1155,yacc=967,lex=742,python=379
17065    lib            ansic=17065
10723    block          ansic=10723
7616     Documentation  ansic=5615,sh=926,perl=857,lisp=218
5227     ipc            ansic=5227
2622     virt           ansic=2622
2287     init           ansic=2287
1803     firmware       asm=1598,ansic=205
833      samples        ansic=833
493      usr            ansic=491,asm=2
0        top_dir        (none)
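As a rough cross-check, the per-directory figures in this listing can be summed and turned into percentage shares. The sketch below is illustrative only: the function name category_shares is hypothetical, and folding every directory outside the ten largest into a single "Other" row is an assumption about how a summary might be grouped, not something SLOCCount does itself.

```python
# Per-directory SLOC figures copied from the SLOCCount listing.
SLOC_BY_DIR = {
    "drivers": 3301081, "arch": 1258638, "fs": 544871, "net": 376716,
    "sound": 356180, "include": 320078, "kernel": 74503, "mm": 36312,
    "crypto": 32729, "security": 25303, "scripts": 24111, "lib": 17065,
    "block": 10723, "Documentation": 7616, "ipc": 5227, "virt": 2622,
    "init": 2287, "firmware": 1803, "samples": 833, "usr": 493,
}

# Assumption: these directories are reported individually and the
# remainder is folded into one "Other" row.
NAMED = ["drivers", "arch", "fs", "net", "sound",
         "include", "kernel", "mm", "crypto", "security"]


def category_shares(sloc_by_dir, named):
    """Return (total, rows) where each row is (name, count, per cent)."""
    total = sum(sloc_by_dir.values())
    rows = [(name, sloc_by_dir[name]) for name in named]
    other = total - sum(count for _, count in rows)
    rows.append(("Other", other))
    shares = [(name, count, round(100.0 * count / total, 1))
              for name, count in rows]
    return total, shares


total, rows = category_shares(SLOC_BY_DIR, NAMED)
for name, count, pct in rows:
    print(f"{name:<10} {count:>10,} {pct:>5.1f}")
print(f"{'Total':<10} {total:>10,}")
```

The directory counts add up to the 6,399,191 total reported by SLOCCount, with drivers/ alone contributing a little over half.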

According to SLOCCount, 96.4 per cent of the code is written in C and 3.3 per cent in Assembler. The other programming languages are used only marginally: Perl, for example, is used for some helper scripts during kernel development and accounts for a tiny 0.1 per cent. SLOCCount also claims to have found 116 lines of Pascal code, 41 of them in the Assembler-heavy architecture directory and 75 in include/, but that could well be a misinterpretation by SLOCCount.

Totals grouped by language (dominant language first):
ansic:    6168175 (96.39%)
asm:       212699 (3.32%)
perl:        6672 (0.10%)
cpp:         3302 (0.05%)
yacc:        2954 (0.05%)
sh:          2715 (0.04%)
lex:         1820 (0.03%)
python:       424 (0.01%)
lisp:         218 (0.00%)
pascal:       116 (0.00%)
awk:           96 (0.00%)
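These language totals follow directly from the per-directory breakdown. As an illustration, the sketch below parses the "language=count" strings from the directory listing and adds them up per language; the function name language_totals is hypothetical and the parsing assumes the comma-separated format shown in the listing.

```python
from collections import Counter

# SLOC-by-language strings copied from the per-directory listing.
BY_LANGUAGE = {
    "drivers": "ansic=3296641,yacc=1680,asm=1136,perl=829,lex=778,sh=17",
    "arch": "ansic=1047549,asm=209655,sh=617,yacc=307,lex=300,"
            "awk=96,python=45,pascal=41,perl=28",
    "fs": "ansic=544871",
    "net": "ansic=376716",
    "sound": "ansic=355997,asm=183",
    "include": "ansic=318367,cpp=1511,asm=125,pascal=75",
    "kernel": "ansic=74198,perl=305",
    "mm": "ansic=36312",
    "crypto": "ansic=32729",
    "security": "ansic=25303",
    "scripts": "ansic=14424,perl=4653,cpp=1791,sh=1155,yacc=967,"
               "lex=742,python=379",
    "lib": "ansic=17065",
    "block": "ansic=10723",
    "Documentation": "ansic=5615,sh=926,perl=857,lisp=218",
    "ipc": "ansic=5227",
    "virt": "ansic=2622",
    "init": "ansic=2287",
    "firmware": "asm=1598,ansic=205",
    "samples": "ansic=833",
    "usr": "ansic=491,asm=2",
}


def language_totals(by_language):
    """Sum the per-directory 'language=count' entries per language."""
    totals = Counter()
    for breakdown in by_language.values():
        for item in breakdown.split(","):
            language, count = item.split("=")
            totals[language] += int(count)
    return totals


totals = language_totals(BY_LANGUAGE)
grand_total = sum(totals.values())
for language, count in totals.most_common():
    print(f"{language + ':':<8} {count:>8} ({100.0 * count / grand_total:.2f}%)")
```

Summing this way also shows where the marginal languages come from: the Pascal lines, for instance, are split between arch/ and include/.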

SLOCCount also tries to give a rough estimate of the source code's value: according to the program, it would take more than 200 developers about nine and a half years and cost nearly $268 million to rewrite the code from scratch. Given that the program has not been updated for four years, the accuracy of this calculation is debatable; in particular, the assumed cost per developer would surely need to be raised now.

Total Physical Source Lines of Code (SLOC)                = 6,399,191
Development Effort Estimate, Person-Years (Person-Months) = 1,983.63 (23,803.60)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 9.59 (115.10)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 206.81
Total Estimated Cost to Develop                           = $ 267,961,839
 (average salary = $56,286/year, overhead = 2.40).
Generated using David A. Wheeler's 'SLOCCount'
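The figures in this report can be retraced from the two Basic COCOMO formulas it quotes. The following sketch applies them to the reported SLOC total; the function name basic_cocomo is hypothetical, and computing the cost as person-years times salary times overhead is an assumption inferred from the parameters printed in the report.

```python
def basic_cocomo(sloc, salary=56286.0, overhead=2.40):
    """Apply the Basic COCOMO formulas quoted in the SLOCCount report."""
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05            # development effort
    schedule_months = 2.5 * person_months ** 0.38  # elapsed time
    developers = person_months / schedule_months   # effort / schedule
    # Assumption: cost = person-years * average salary * overhead factor.
    cost = (person_months / 12.0) * salary * overhead
    return {
        "person_months": person_months,
        "person_years": person_months / 12.0,
        "schedule_months": schedule_months,
        "schedule_years": schedule_months / 12.0,
        "developers": developers,
        "cost": cost,
    }


estimate = basic_cocomo(6399191)
for key, value in estimate.items():
    print(f"{key:<16} {value:>16,.2f}")
```

Because the effort grows slightly faster than linearly with code size (exponent 1.05) and the cost scales directly with the salary figure, raising the assumed salary or overhead raises the estimate proportionally. Passing a different salary, for example basic_cocomo(6399191, salary=80000.0), shows how sensitive the headline cost is to that one parameter.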

There is no end in sight for the kernel's growth, which has been ongoing in the Linux 2.6 series for several years: with every new version, the kernel hackers extend the Linux kernel with new functions and drivers, improving hardware support or making the kernel more flexible, better or faster. A look at the figures for the latest kernel versions also shows that it is not only the number of lines of source code that is continually increasing, but also the number of changes per kernel version.
