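Using a Python crawler to count the male/female ratio on a school BBS (Part 3): data processing

Data analysis

Text data

The crawl produced text files whose names begin with the following strings; these are what we now need to process.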

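Name        Meaning
correct     both the gender and the last-activity time of this id are available
errTime     the gender of this id is available, the last-activity time is not (noTime would probably be a better name)
unkownsex   the gender of this id cannot be determined
notexist    no user exists for this id
httperror   the request for this id failed because of a server error, so the id must be re-crawled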
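
The prefixes above can be turned into a small lookup so that later scripts can tell which category a file belongs to. This is only a sketch: `CATEGORIES` and `category_of` are names made up for illustration, and it assumes every data file name begins with one of the five prefixes (as in `httperror265001_266001.txt`).

```python
# Hypothetical helper: map a data file name to its category prefix.
CATEGORIES = {
    "correct": "gender and last-activity time both present",
    "errTime": "gender present, last-activity time missing",
    "unkownsex": "gender unknown",
    "notexist": "no user for this id",
    "httperror": "server error, id must be re-crawled",
}


def category_of(filename):
    """Return the category prefix of a file name, or None if none matches."""
    for prefix in CATEGORIES:
        if filename.startswith(prefix):
            return prefix
    return None


print(category_of("httperror265001_266001.txt"))
```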

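Rollback

The httperror records have to be processed again.

Because of the way the code in Part 2 of this series is written, the same id can appear in several consecutive httperror records in a file: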

<code class="hljs cs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//httperror265001_266001.txt</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265002</span> httperror
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265002</span> httperror
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265002</span> httperror
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265002</span> httperror
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265003</span> httperror
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265003</span> httperror
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265003</span> httperror
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">265003</span> httperror</code>

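The code therefore has to account for this case: instead of re-crawling the id on every line, it first checks whether the id repeats the previous one.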
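
One way to collapse such runs is `itertools.groupby`, which groups consecutive equal items. The sketch below assumes Python 3 and lines of the form `<id> httperror`; `unique_consecutive_ids` is a hypothetical name, not a function from this series.

```python
from itertools import groupby


def unique_consecutive_ids(lines):
    """Yield each id once per run of consecutive lines that share it."""
    # Assumes every non-empty line looks like "<id> httperror".
    ids = (line.split()[0] for line in lines if line.strip())
    for user_id, _run in groupby(ids):
        yield int(user_id)


sample = ["265002 httperror\n"] * 4 + ["265003 httperror\n"] * 4
print(list(unique_consecutive_ids(sample)))  # [265002, 265003]
```

Comparing the ids rather than whole lines also means that a final line without a trailing newline is still recognized as a repeat.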

Java has buffered readers that avoid hitting the disk for every read; Python has an equivalent as well, see this article.
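
As a rough illustration of buffered reading in Python, the built-in `open` accepts a `buffering` argument, and iterating over a file object is already buffered by default. The sketch below assumes Python 3; the 1 MiB buffer size and the name `count_lines` are arbitrary choices for the example.

```python
import os
import tempfile


def count_lines(path, buffer_size=1024 * 1024):
    """Count lines while reading through a buffer of buffer_size bytes."""
    # A buffering value > 1 requests a buffered reader of about that size.
    with open(path, "r", encoding="utf-8", buffering=buffer_size) as handle:
        return sum(1 for _ in handle)


# Small self-contained demo on a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("265002 httperror\n265003 httperror\n")
demo_count = count_lines(tmp.name)
os.remove(tmp.name)
print(demo_count)  # 2
```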

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">main</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">()</span>:</span>
    reload(sys)
    sys.setdefaultencoding(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'utf-8'</span>)
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">global</span> sexRe,timeRe,notexistRe,url1,url2,file1,file2,file3,file4,startNum,endNum,file5
    sexRe = re.compile(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'em>\u6027\u522b</em>(.*?)</li'</span>)
    timeRe = re.compile(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'em>\u4e0a\u6b21\u6d3b\u52a8\u65f6\u95f4</em>(.*?)</li'</span>)
    notexistRe = re.compile(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'(p>)\u62b1\u6b49\uff0c\u60a8\u6307\u5b9a\u7684\u7528\u6237\u7a7a\u95f4\u4e0d\u5b58\u5728<'</span>)
    url1 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'http://rs.xidian.edu.cn/home.php?mod=space&uid=%s'</span>
    url2 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'http://rs.xidian.edu.cn/home.php?mod=space&uid=%s&do=profile'</span>
    file1 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\correct_re.txt'</span>
    file2 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\errTime_re.txt'</span>
    file3 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\notexist_re.txt'</span>
    file4 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\unkownsex_re.txt'</span>
    file5 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\httperror_re.txt'</span>

    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># iterate over the text files in the folder whose names start with httperror</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> filename <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> os.listdir(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">r'E:\pythonProject\ruisi'</span>):
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'httperror'</span>):
            count = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
            newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (filename)
            readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
            oldLine = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'0'</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># newLine is compared with oldLine to detect a repeated id</span>
                newLine =  line
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (newLine != oldLine):
                    nu = newLine.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>]
                    oldLine = newLine
                    count += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                    searchWeb((int(nu),))
            readFile.close()
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"%s deal %s lines"</span> %(filename, count)
</code>

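For simplicity, this code does not sort the re-crawled httperror ids back into per-range files; the results are written directly to the following five files: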

<code class="hljs bash has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">    file1 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\correct_re.txt'</span>
    file2 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\errTime_re.txt'</span>
    file3 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\notexist_re.txt'</span>
    file4 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\unkownsex_re.txt'</span>
    file5 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'ruisi\\httperror_re.txt'</span></code>

The output log shows how many httperror records were reprocessed:

<code class="hljs ruby has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"D:\Program Files\Python27\python.exe"</span> <span class="hljs-constant" style="box-sizing: border-box;">E</span><span class="hljs-symbol" style="color: rgb(0, 102, 102); box-sizing: border-box;">:/pythonProject/webCrawler/reload</span>.py
httperror132001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">133001</span>.txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">21</span> lines
httperror2001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3001</span>.txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span> lines
httperror251001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">252001</span>.txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span> lines
httperror254001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">255001</span>.txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span> lines</code>
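
To get the overall total from a log like this, the per-file numbers can simply be summed. A minimal sketch, assuming the log text has the `<file> deal N lines` shape shown above (`total_dealt` is a hypothetical helper name):

```python
import re

LOG = """\
httperror132001-133001.txt deal 21 lines
httperror2001-3001.txt deal 4 lines
httperror251001-252001.txt deal 5 lines
httperror254001-255001.txt deal 1 lines
"""


def total_dealt(log_text):
    """Sum the N in every '<file> deal N lines' record."""
    return sum(int(n) for n in re.findall(r"deal (\d+) lines", log_text))


print(total_dealt(LOG))  # 21 + 4 + 5 + 1 = 31
```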

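Counting the unkownsex data with a single thread

The code is simple: a single thread is enough to count the unkownsex users (those whose gender could not be retrieved because of permission restrictions, or who never filled it in). We also checked that users without a gender have no activity time either.

The data format is as follows: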

<code class="hljs  has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">253042 unkownsex
253087 unkownsex
253102 unkownsex
253118 unkownsex
253125 unkownsex
253136 unkownsex
253161 unkownsex</code>
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> os,time
sumCount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>

startTime = time.clock()

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> filename <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> os.listdir(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">r'E:\pythonProject\ruisi'</span>):
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'unkownsex'</span>):
        count = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
        newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (filename)
        readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
            count += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            sumCount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"%s deal %s lines"</span> %(filename, count)
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'%s unkowns sex'</span> %(sumCount)

endTime = time.clock()
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"cost time "</span> + str(endTime - startTime) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">" s"</span></code>

Processing is very fast. The output is:

<code class="hljs livecodeserver has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">unkownsex1-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1001.</span>txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">204</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">lines</span>
unkownsex100001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">101001.</span>txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">50</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">lines</span>
unkownsex10001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">11001.</span>txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">206</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">lines</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#... intermediate output omitted</span>
unkownsex99001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">100001.</span>txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">56</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">lines</span>
unkownsex_re.txt deal <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1085</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">lines</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">14223</span> unkowns sex
cost <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">time</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.0813142301261</span> s</code>
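
The script above targets Python 2.7. For readers on Python 3, where `time.clock()` is no longer available, the same per-prefix line count might look roughly like the sketch below. The helper name `count_lines_by_prefix` and the demo files are made up for illustration. Note that a prefix match also picks up `unkownsex_re.txt`, as the output above shows.

```python
import os
import tempfile
import time


def count_lines_by_prefix(directory, prefix):
    """Return {filename: line_count} for files in directory starting with prefix."""
    counts = {}
    for filename in sorted(os.listdir(directory)):
        if filename.startswith(prefix):
            path = os.path.join(directory, filename)
            with open(path, "r", encoding="utf-8") as handle:
                counts[filename] = sum(1 for _ in handle)
    return counts


# Demo on a throwaway directory; file names and contents are invented.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_files = [
        ("unkownsex1-1001.txt", 3),
        ("unkownsex_re.txt", 2),
        ("correct1-1001.txt", 5),
    ]
    for name, n in demo_files:
        with open(os.path.join(demo_dir, name), "w", encoding="utf-8") as out:
            out.write("1 unkownsex\n" * n)
    start = time.perf_counter()
    demo_counts = count_lines_by_prefix(demo_dir, "unkownsex")
    elapsed = time.perf_counter() - start

print(demo_counts, sum(demo_counts.values()))
print("cost time %.6f s" % elapsed)
```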

Counting the correct data with a single thread

The data format is as follows:

<code class="hljs css has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">31024 男 2014<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-11-11</span> 13<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:20</span>
31283 男 2013<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-3-25</span> 19<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:41</span>
31340 保密 2015<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-2-2</span> 15<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:17</span>
31427 保密 2014<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-8-10</span> 09<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:17</span>
31475 保密 2013<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-7-2</span> 08<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:59</span>
31554 保密 2014<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-10-17</span> 17<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:02</span>
31621 男 2015<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-5-16</span> 19<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:27</span>
31872 保密 2015<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-1-11</span> 16<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:49</span>
31915 保密 2014<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-5-4</span> 11<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:01</span>
31997 保密 2015<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">-5-16</span> 20<span class="hljs-pseudo" style="color: rgb(0, 0, 0); box-sizing: border-box;">:14</span>
</code>
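
Each record in the sample above has four whitespace-separated fields: id, gender, date and time. As an illustration only (Python 3 assumed, `parse_correct_line` is a hypothetical helper), a record can be parsed like this:

```python
from datetime import datetime


def parse_correct_line(line):
    """Split '<id> <gender> <date> <time>' into (int, str, datetime)."""
    user_id, gender, date_part, time_part = line.split()
    # strptime accepts months and days without a leading zero, e.g. 2013-3-25.
    last_active = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M")
    return int(user_id), gender, last_active


record = parse_correct_line("31283 男 2013-3-25 19:41")
print(record)
```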

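The code is below. The idea is to read the files line by line and use line.split() to extract the gender field. sumCount counts the total number of people; boycount, girlcount and secretcount count male, female and undisclosed ("保密") users respectively. As before, the matching is done against unicode strings, here with a plain equality test rather than a regular expression.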

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> os,sys,time
reload(sys)
sys.setdefaultencoding(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'utf-8'</span>)
startTime = time.clock()
sumCount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
boycount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
girlcount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
secretcount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> filename <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> os.listdir(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">r'E:\pythonProject\ruisi'</span>):
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'correct'</span>):
        newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (filename)
        readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
            sexInfo = line.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
            sumCount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u7537'</span> :
                boycount += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u5973'</span>:
                girlcount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u4fdd\u5bc6'</span>:
                secretcount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"until %s, sum is %s boys; %s girls; %s secret;"</span> %(filename, boycount,girlcount,secretcount)
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"total is %s;  %s boys; %s girls; %s secret;"</span> %(sumCount, boycount,girlcount,secretcount)
endTime = time.clock()
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"cost time "</span> + str(endTime - startTime) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">" s"</span></code>

Note that each line of output is the cumulative count up to and including that file, not the count for that file alone. The output is:

<code class="hljs applescript has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">until</span> correct1-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1001.</span>txt, sum <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">110</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">7</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">414</span> secret;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">until</span> correct100001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">101001.</span>txt, sum <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">125</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">13</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">542</span> secret;
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#... (omitted)</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">until</span> correct99001-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">100001.</span>txt, sum <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">11070</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3113</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">26636</span> secret;
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">until</span> correct_re.txt, sum <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">13937</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4007</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28941</span> secret;
total <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">46885</span>;  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">13937</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4007</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28941</span> secret;
cost <span class="hljs-property" style="box-sizing: border-box;">time</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3.60047888495</span> s</code>
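
The running-total behaviour described above can be sketched as follows. This is a minimal illustration written for Python 3 (the article's own code is Python 2), not the author's script. The helper names `tally_lines` and `running_totals` and the in-memory sample data are assumptions made for the example. The "id gender ..." line format is inferred from the `line.split()[1]` call used in the article's code.

```python
from collections import Counter

# Gender labels as they appear in the data: 男 (male), 女 (female), 保密 (secret).
LABELS = {"男": "boys", "女": "girls", "保密": "secret"}


def tally_lines(lines):
    """Count gender labels in an iterable of 'id gender ...' lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = LABELS.get(fields[1])
        if key is not None:
            counts[key] += 1
    return counts


def running_totals(files):
    """Yield (name, cumulative counts) after each file, not per-file counts."""
    total = Counter()
    for name, lines in files:
        total += tally_lines(lines)
        yield name, dict(total)


# Hypothetical sample data standing in for two 'correct*.txt' files.
sample = [
    ("a.txt", ["1 男 2015", "2 保密 2015"]),
    ("b.txt", ["3 女 2014", "4 男 2013"]),
]
results = list(running_totals(sample))
for name, totals in results:
    print("until %s: %s" % (name, totals))
```

Because `total` is never reset between files, the last yielded value is the overall count, which matches the way the final "until" line and the "total" line agree in the output above.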

Counting with multiple threads

To speed up the count, we can try multithreading. 
The single-threaded timing above serves as the baseline for comparison.

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># encoding: UTF-8</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> threading
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> time,os,sys

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># global counters shared by all threads</span>
SUM = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
BOY = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
GIRL = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
SECRET = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
NUM =<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># this class subclasses threading.Thread, overrides run(), and is launched with start()</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># (much like Thread in Java)</span>
<span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">StaFileList</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(threading.Thread)</span>:</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># list of file names this thread will process</span>
    fileList = []

    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">__init__</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self, fileList)</span>:</span>
        threading.Thread.__init__(self)
        self.fileList = fileList

    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">run</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self)</span>:</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">global</span> SUM, BOY, GIRL, SECRET
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># adding a delay here makes the interleaving visible, instead of an orderly thread-1, 2, 3</span>
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#time.sleep(1)</span>
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># acquire the lock; it is held for the whole staFiles call, so the threads run one at a time</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> mutex.acquire(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>):
            self.staFiles(self.fileList)
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># release the lock</span>
            mutex.release()

    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># process the given list of files and count boys, girls, and secret</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># mind the data synchronization here: the counters are module-level globals</span>
    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">staFiles</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self, files)</span>:</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">global</span> SUM, BOY, GIRL, SECRET
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> name <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>  files:
            newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (name)
            readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
                sexInfo = line.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
                SUM +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u7537'</span> :
                    BOY += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u5973'</span>:
                    GIRL +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u4fdd\u5bc6'</span>:
                    SECRET +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "thread %s, until %s, total is %s; %s boys; %s girls;" \</span>
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#       " %s secret;" %(self.name, name, SUM, BOY,GIRL,SECRET)</span>


<span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">test</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">()</span>:</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># files holds a batch of file names; the batch size sets how many files one thread handles</span>
    files = []

    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># keep every thread so the main thread can wait for all of them at the end</span>
    staThreads = []
    i = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> filename <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> os.listdir(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">r'E:\pythonProject\ruisi'</span>):
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># collect the 'correct*' files; a new thread is created for every full batch</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'correct'</span>):
            files.append(filename)
            i+=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># one thread handles 20 files</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> i == <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">20</span> :
                staThreads.append(StaFileList(files))
                files = []
                i = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># the leftover files, most likely fewer than 20</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> files:
        staThreads.append(StaFileList(files))

    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> t <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> staThreads:
        t.start()
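    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># wait in the main thread for every worker to finish; skipping join() may look faster, but the totals could then be printed before the workers are done</span>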
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> t <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> staThreads:
        t.join()



<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> __name__ == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'__main__'</span>:
    reload(sys)
    sys.setdefaultencoding(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'utf-8'</span>)
    startTime = time.clock()
    mutex = threading.Lock()
    test()
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Multi Thread, total is %s;  %s boys; %s girls; %s secret;"</span> %(SUM, BOY,GIRL,SECRET)
    endTime = time.clock()
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"cost time "</span> + str(endTime - startTime) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">" s"</span></code>

Output:

<code class="hljs vhdl has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">Multi Thread, total <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">46885</span>;  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">13937</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4007</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28941</span> secret;
cost <span class="hljs-typename" style="color: rgb(102, 0, 102); box-sizing: border-box;">time</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.132137192794</span> s</code>

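We find that multithreading brings no real advantage over the single-threaded count. The 3.6 s single-threaded run above also printed a progress line for every file, which the multithreaded version has commented out, so the two timings are likely not directly comparable. Thread synchronization is part of the reason: acquiring and releasing the lock costs time, and so does saving and restoring state on every thread switch. In this code each thread also holds the lock for its entire staFiles call, so the threads effectively run one after another.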
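
One common way to reduce the locking cost is to let each thread tally into its own local counter and take the lock only once, to merge the result. The sketch below illustrates that idea under stated assumptions: it is written for Python 3, the names `CountWorker`, `TOTAL`, and `LOCK` are hypothetical, and the in-memory chunks stand in for batches of file lines. In CPython the global interpreter lock still limits how much CPU-bound counting can overlap, so this should not be read as a guaranteed speedup.

```python
import threading
from collections import Counter

TOTAL = Counter()          # shared result, only touched while holding LOCK
LOCK = threading.Lock()


class CountWorker(threading.Thread):
    """Hypothetical worker: tallies its own chunk, then merges once."""

    def __init__(self, lines):
        super().__init__()
        self.lines = lines

    def run(self):
        local = Counter()              # private to this thread, no lock needed
        for line in self.lines:
            fields = line.split()
            if len(fields) >= 2:
                local[fields[1]] += 1
        with LOCK:                     # lock held only for the short merge
            TOTAL.update(local)


# Hypothetical 'id gender' lines split into three batches.
chunks = [["1 男", "2 女"], ["3 男", "4 保密"], ["5 男"]]
workers = [CountWorker(chunk) for chunk in chunks]
for w in workers:
    w.start()
for w in workers:
    w.join()                           # wait before reading TOTAL

print(dict(TOTAL))
```

The `with LOCK:` form releases the lock even if the merge raises, which the explicit acquire/release pair in the article's code does not guarantee.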

Single-threaded vs. multithreaded on more data

We can process the correct, errTime, and unkownsex files together. 
Single-threaded code:

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># coding=utf-8</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> os,sys,time
reload(sys)
sys.setdefaultencoding(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'utf-8'</span>)
startTime = time.clock()
sumCount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
boycount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
girlcount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
secretcount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
unkowncount = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> filename <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> os.listdir(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">r'E:\pythonProject\ruisi'</span>):
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># has both gender and activity time</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'correct'</span>) :
        newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (filename)
        readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
            sexInfo =line.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
            sumCount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u7537'</span> :
                boycount += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u5973'</span>:
                girlcount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u4fdd\u5bc6'</span>:
                secretcount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "until %s, sum is %s boys; %s girls; %s secret;" %(filename, boycount,girlcount,secretcount)</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># no activity time, but gender is present</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"errTime"</span>):
        newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (filename)
        readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
            sexInfo =line.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
            sumCount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u7537'</span> :
                boycount += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u5973'</span>:
                girlcount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u4fdd\u5bc6'</span>:
                secretcount +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "until %s, sum is %s boys; %s girls; %s secret;" %(filename, boycount,girlcount,secretcount)</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># no gender and no time: just count the lines</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"unkownsex"</span>):
        newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (filename)
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># count = len(open(newName,'rU').readlines())</span>
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># loop instead of readlines() for large files; count starts at -1 so an empty file gives 0 after the final +1</span>
        count = -<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> count, line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> enumerate(open(newName, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'rU'</span>)):
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">pass</span>
        count += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
        unkowncount += count
        sumCount += count
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "until %s, sum is %s unkownsex" %(filename, unkowncount)</span>



<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Single Thread, total is %s;  %s boys; %s girls; %s secret; %s unkownsex;"</span> %(sumCount, boycount,girlcount,secretcount,unkowncount)
endTime = time.clock()
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"cost time "</span> + str(endTime - startTime) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">" s"</span></code>

The output is:

<code class="hljs vbnet has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">Single</span> Thread, total <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">61111</span>;  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">13937</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4009</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28942</span> secret; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">14223</span> unkownsex;
cost time <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.37444645628</span> s</code>
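The four categories in this output add up to the reported total, and the boy-to-girl ratio among ids that disclose a gender can be read straight off these numbers. A small sketch of that arithmetic, with the counts copied from the output above (the variable names are only illustrative, and the snippet is written for Python 3 rather than the Python 2 used in the listings):

```python
# Counts copied from the single-threaded output above.
counts = {"boys": 13937, "girls": 4009, "secret": 28942, "unkownsex": 14223}

total = sum(counts.values())
disclosed = counts["boys"] + counts["girls"]

# Boys per girl among the ids that disclose a gender.
boys_per_girl = counts["boys"] / counts["girls"]
# Share of all counted ids that disclose a gender at all.
disclosed_share = disclosed / total

print("total = %d" % total)
print("boys per girl = %.2f" % boys_per_girl)
print("ids with a disclosed gender = %.1f%%" % (100 * disclosed_share))
```

The ratio comes out at roughly 3.5 boys per girl, but fewer than a third of the counted ids disclose a gender, so it describes only that subset.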

Multi-threaded code

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">__author__ = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'admin'</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># encoding: UTF-8</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Multi-threaded processing program</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> threading
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> time,os,sys

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Global variables</span>
SUM = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
BOY = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
GIRL = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
SECRET = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
UNKOWN = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>

<span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">StaFileList</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(threading.Thread)</span>:</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># List of text file names</span>
    fileList = []

    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">__init__</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self, fileList)</span>:</span>
        threading.Thread.__init__(self)
        self.fileList = fileList

    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">run</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self)</span>:</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">global</span> SUM, BOY, GIRL, SECRET
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> mutex.acquire(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>):
            self.staManyFiles(self.fileList)
            mutex.release()

    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Process the given list of files and count boys and girls</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Note the data synchronization issue here</span>
    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">staCorrectFiles</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self, files)</span>:</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">global</span> SUM, BOY, GIRL, SECRET
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> name <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>  files:
            newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (name)
            readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
                sexInfo = line.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
                SUM +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u7537'</span> :
                    BOY += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u5973'</span>:
                    GIRL +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u4fdd\u5bc6'</span>:
                    SECRET +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "thread %s, until %s, total is %s; %s boys; %s girls;" \</span>
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#       " %s secret;" %(self.name, name, SUM, BOY,GIRL,SECRET)</span>

    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">staManyFiles</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self, files)</span>:</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">global</span> SUM, BOY, GIRL, SECRET,UNKOWN
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> name <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>  files:
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> name.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'correct'</span>) :
                newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (name)
                readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
                    sexInfo = line.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
                    SUM +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u7537'</span> :
                        BOY += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u5973'</span>:
                        GIRL +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u4fdd\u5bc6'</span>:
                        SECRET +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "thread %s, until %s, total is %s; %s boys; %s girls;" \</span>
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#       " %s secret;" %(self.name, name, SUM, BOY,GIRL,SECRET)</span>
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># No activity time, but the gender is present</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> name.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"errTime"</span>):
                newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (name)
                readFile = open(newName,<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'r'</span>)
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> readFile:
                    sexInfo = line.split()[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>]
                    SUM +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u7537'</span> :
                        BOY += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u5973'</span>:
                        GIRL +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> sexInfo == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">u'\u4fdd\u5bc6'</span>:
                        SECRET +=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "thread %s, until %s, total is %s; %s boys; %s girls;" \</span>
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#       " %s secret;" %(self.name, name, SUM, BOY,GIRL,SECRET)</span>
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Neither gender nor time: just count the lines</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">elif</span> name.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"unkownsex"</span>):
                newName = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'E:\\pythonProject\\ruisi\\%s'</span> % (name)
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># count = len(open(newName,'rU').readlines())</span>
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># For large files, count with a loop; count starts at -1 so that an empty file yields 0 lines after the final +1</span>
                count = -<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> count, line <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> enumerate(open(newName, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'rU'</span>)):
                    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">pass</span>
                count += <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
                UNKOWN += count
                SUM += count
                <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># print "thread %s, until %s, total is %s; %s unkownsex" %(self.name, name, SUM, UNKOWN)</span>


<span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">test</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">()</span>:</span>
    files = []
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Holds all threads so the main thread can wait for every child thread to finish</span>
    staThreads = []
    i = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> filename <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> os.listdir(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">r'E:\pythonProject\ruisi'</span>):
        <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Create one thread for every 20 text files collected</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"correct"</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">or</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"errTime"</span>)  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">or</span> filename.startswith(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"unkownsex"</span>):
            files.append(filename)
            i+=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
            <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> i == <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">20</span> :
                staThreads.append(StaFileList(files))
                files = []
                i = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># The remaining files, most likely fewer than 20</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> files:
        staThreads.append(StaFileList(files))

    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> t <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> staThreads:
        t.start()
    <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># In the main thread, wait for all child threads to exit</span>
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> t <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> staThreads:
        t.join()



<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> __name__ == <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'__main__'</span>:
    reload(sys)
    sys.setdefaultencoding(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'utf-8'</span>)
    startTime = time.clock()
    mutex = threading.Lock()
    test()
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"Multi Thread, total is %s;  %s boys; %s girls; %s secret; %s unkownsex"</span> %(SUM, BOY,GIRL,SECRET,UNKOWN)
    endTime = time.clock()
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">print</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"cost time "</span> + str(endTime - startTime) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">" s"</span>
</code>

The output is:

<code class="hljs vhdl has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">Multi Thread, total <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">61111</span>;  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">13937</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4009</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28942</span> secret;
cost <span class="hljs-typename" style="color: rgb(102, 0, 102); box-sizing: border-box;">time</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.23049112201</span> s</code>

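In this run the multi-threaded version is slightly faster than the single-threaded one (about 1.23 s versus 1.37 s), and because the threads are synchronized with a mutex, the counts are consistent with the single-threaded result. Note that each thread holds the lock for its whole `run`, so the threads effectively execute one after another and the gap is small.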
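The listing above acquires the mutex in `run` before calling `staManyFiles`, so each thread keeps the lock for its entire batch of files. One possible alternative, sketched here under assumptions, is to let each thread count into its own local `Counter` and take the lock only for the final merge. This is not the author's code: it targets Python 3, assumes the result files are UTF-8 encoded, and the names `count_one_file`, `worker` and `count_in_threads` are hypothetical helpers; only the file prefixes, the gender strings and the batch size of 20 come from the listing.

```python
import os
import threading
from collections import Counter

# Gender strings as they appear in the second column of each line.
GENDER_KEYS = {"\u7537": "boys", "\u5973": "girls", "\u4fdd\u5bc6": "secret"}


def count_one_file(path):
    """Count one result file into a fresh Counter (no shared state)."""
    local = Counter()
    unknown_file = os.path.basename(path).startswith("unkownsex")
    # Assumption: the result files are UTF-8 encoded.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            local["total"] += 1
            if unknown_file:
                # unkownsex files carry no gender: every line is one unknown id.
                local["unkownsex"] += 1
                continue
            fields = line.split()
            if len(fields) > 1 and fields[1] in GENDER_KEYS:
                local[GENDER_KEYS[fields[1]]] += 1
    return local


def worker(paths, totals, lock):
    """Count a batch of files locally, then merge once under the lock."""
    local = Counter()
    for path in paths:
        local.update(count_one_file(path))
    # The lock is held only for the merge, not while reading files.
    with lock:
        totals.update(local)


def count_in_threads(paths, batch_size=20):
    """Split paths into batches, one thread per batch, and return the totals."""
    totals = Counter()
    lock = threading.Lock()
    batches = [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]
    threads = [threading.Thread(target=worker, args=(batch, totals, lock))
               for batch in batches]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return totals
```

The caller is expected to pass only paths whose file names start with `correct`, `errTime` or `unkownsex`, as the `test` function in the listing does. In CPython the global interpreter lock still limits how much pure-Python parsing can overlap, so a speed-up from this change is not guaranteed; the main gain is that shared state is touched in exactly one short critical section.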

Summary and reflections

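Note that in Python, code inside a class has to go through self explicitly to reach instance attributes and methods. This is quite different from Java, where this is implicit.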

<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">__init__</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self, fileList)</span>:</span>
        threading.Thread.__init__(self)
        self.fileList = fileList

    <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">run</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(self)</span>:</span>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">global</span> SUM, BOY, GIRL, SECRET
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> mutex.acquire(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>):
            <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Calling a method of the same class requires self</span>
            self.staManyFiles(self.fileList)
            mutex.release()</code>
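The same point in isolation, as a minimal sketch: the class `FileBatch` and its methods are made up for illustration and are not part of the crawler code.

```python
class FileBatch(object):
    def __init__(self, file_list):
        # Instance attributes are always stored and read through self.
        self.file_list = file_list

    def count_files(self):
        return len(self.file_list)

    def describe(self):
        # A bare count_files() here would raise NameError at call time,
        # because Python does not search the instance implicitly.
        return "%d files" % self.count_files()


batch = FileBatch(["correct1_1001.txt", "errTime1_1001.txt"])
print(batch.describe())
```

In Java the equivalent call could be written without `this`; in Python the instance is an ordinary first parameter, so leaving out `self` makes the interpreter look for a local or global name instead.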

References

Python multithreaded programming (3): synchronizing threads with a mutex (original title: python多线程编程(3): 使用互斥锁同步线程)

<code class="hljs vhdl has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">total <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">61111</span>;  <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">13937</span> boys; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4009</span> girls; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28942</span> secret; <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">14223</span> unkownsex;
cost <span class="hljs-typename" style="color: rgb(102, 0, 102); box-sizing: border-box;">time</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.25413238673</span> s</code>