segmentseq2seq Code Walkthrough, Part 1

segmentseq2seq source code: https://github.com/pponnada/segmentseq2seq

Runs correctly with TensorFlow 0.12.0.

The raw data files in datasets are:

1.Financial1.sequences5ss-1024ws-64.bz2
2.Financial1.ss-1024.vocab-5ws-64

Starting from predict.py:

maybe_download(datadir='datasets', fname=fname, url=origin)

1. Check whether the directory exists; if it does not, create it:

if not os.path.exists(datadir):
    print("Creating directory %s" % datadir)
    os.mkdir(datadir)
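
The check-then-create pattern above can fail if something else creates the directory between the two calls, and os.mkdir() does not create missing parent directories. A minimal alternative sketch, assuming Python 3.2 or later, uses os.makedirs() with exist_ok=True. The helper name ensure_dir and the temporary location are hypothetical and are not part of segmentseq2seq.

```python
import os
import tempfile

def ensure_dir(datadir):
    """Create datadir (and any missing parents) if it does not exist yet."""
    # exist_ok=True makes the call a no-op when the directory is already there.
    os.makedirs(datadir, exist_ok=True)
    return datadir

# Demo in a throwaway location so nothing real is touched.
base = tempfile.mkdtemp()
target = ensure_dir(os.path.join(base, "datasets"))
ensure_dir(target)  # calling it a second time does not raise
```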

2. Build the file path:

os.path.join(datadir, fname)

3. Download the data from the URL to a local file:

filepath, _ = urllib.request.urlretrieve(url, filepath)

About urlretrieve():

def urlretrieve(url, filename=None, reporthook=None, data=None):
    """
    Retrieve a URL into a temporary location on disk.

    Requires a URL argument. If a filename is passed, it is used as
    the temporary file location. The reporthook argument should be
    a callable that accepts a block number, a read size, and the
    total file size of the URL target. The data argument should be
    valid URL encoded data.

    If a filename is passed and the URL points to a local resource,
    the result is a copy from local file to new file.

    Returns a tuple containing the path to the newly created
    data file as well as the resulting HTTPMessage object.
    """

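Parameters:

(1) url: a remote or local URL.

(2) filename=None: the local path to save to (if it is not given, urllib creates a temporary file to hold the data).

(3) reporthook=None: a callback that accepts three arguments: blocknum, the number of blocks transferred so far; bs, the block size in bytes; size, the total size of the file being downloaded.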

reporthook(blocknum, bs, size)

A sample reporthook:

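def reporthook_sample(blocknum, bs, size):
    # size is -1 when the total size is not reported (no Content-Length)
    if size <= 0:
        return
    # blocknum * bs can overshoot on the last block, so cap at 100
    ps = min(100.0, 100.0 * blocknum * bs / size)
    print('%.2f%%' % ps)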
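
To see how urlretrieve() drives such a callback, the sketch below copies a local file through a file:// URL, so it runs without network access. The hook name record_progress, the file names, and the 20000-byte payload are made up for illustration. The hook is called once with blocknum 0 before any data is read, and then once per block.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

calls = []  # (blocknum, bs, size) tuples seen by the hook

def record_progress(blocknum, bs, size):
    """Hypothetical hook: remember each call and print a capped percentage."""
    calls.append((blocknum, bs, size))
    if size > 0:
        percent = min(100.0, 100.0 * blocknum * bs / size)
        print('%.2f%%' % percent)

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.bin")
dst = os.path.join(workdir, "copy.bin")
with open(src, "wb") as f:
    f.write(b"x" * 20000)

# A file:// URL makes urlretrieve copy the local file, so no network is needed.
filepath, headers = urllib.request.urlretrieve(
    Path(src).as_uri(), dst, reporthook=record_progress)
```

The first element of the returned tuple is the path that was written, which is why the original code reassigns filepath from the result.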

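(4) data=None: URL-encoded data that is passed on to urlopen() and carries additional information to send to the server with the request. In Python 3 it must be a bytes object, and supplying it turns an HTTP request into a POST.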

urlopen(url, data)
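
A small sketch of how such a data value is usually prepared. The form fields and the example.com endpoint are hypothetical; nothing is sent over the network, because the Request objects are only inspected.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields and endpoint, for illustration only.
fields = {"q": "seq2seq", "page": 1}
data = urllib.parse.urlencode(fields).encode("ascii")  # must be bytes in Python 3

with_data = urllib.request.Request("http://example.com/search", data=data)
without_data = urllib.request.Request("http://example.com/search")

print(data)                       # b'q=seq2seq&page=1'
print(with_data.get_method())     # POST
print(without_data.get_method())  # GET
```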

4. os.stat() returns the file system status information for the given file:

statinfo = os.stat(filepath)

Example:

os.stat('plot.py')
os.stat_result(st_mode=33204, st_ino=1463660, st_dev=2056, st_nlink=1, st_uid=1000, st_gid=1000, st_size=1198, st_atime=1530530109, st_mtime=1492579364, st_ctime=1530530101)

Description of the returned fields: https://docs.python.org/3/library/os.html#os.stat_result

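st_mode: file type and permission bits
st_ino: inode number (on Unix)
st_dev: identifier of the device the file resides on
st_nlink: number of hard links
st_uid: user ID of the file owner
st_gid: group ID of the file owner
st_size: size of the file in bytes
st_atime: time of the most recent access, in seconds
st_mtime: time of the most recent content modification, in seconds
st_ctime: time of the most recent metadata change on Unix (historically the creation time on Windows), in seconds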
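
As a sketch of how these fields are typically read, the snippet below stats a small throwaway file and interprets a few of them. The temporary file and its 5-byte content are invented for the demo; the stat and time helpers are from the standard library.

```python
import os
import stat
import tempfile
import time

# Create a small throwaway file so the size is predictable.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello")

statinfo = os.stat(path)
print("size in bytes:", statinfo.st_size)
print("regular file:", stat.S_ISREG(statinfo.st_mode))
print("permissions:", stat.filemode(statinfo.st_mode))
print("modified:", time.ctime(statinfo.st_mtime))
```

In a download helper, st_size is the field of interest, for example to report how many bytes were fetched or to compare against an expected size.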

5. Decompress the file

import bz2
with bz2.BZ2File(compressed, 'rb') as file:
    with open(uncompressed, 'wb') as new_file:
        for data in iter(lambda: file.read(100 * 1024), b''):
            new_file.write(data)
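
The same loop can be wrapped in a function and checked with a round trip. This is a sketch under stated assumptions: the function name decompress_bz2, the sample file names, and the payload are hypothetical; only the chunked-read loop mirrors the snippet above.

```python
import bz2
import os
import tempfile

def decompress_bz2(compressed, uncompressed, chunk_size=100 * 1024):
    """Stream-decompress a .bz2 file in fixed-size chunks."""
    with bz2.BZ2File(compressed, 'rb') as src, open(uncompressed, 'wb') as dst:
        # read() returns b'' at EOF, which is the sentinel that stops iter().
        for chunk in iter(lambda: src.read(chunk_size), b''):
            dst.write(chunk)

workdir = tempfile.mkdtemp()
packed = os.path.join(workdir, "sample.txt.bz2")
unpacked = os.path.join(workdir, "sample.txt")
payload = b"segmentseq2seq\n" * 50000  # 750,000 bytes, so several chunks are read

with bz2.BZ2File(packed, 'wb') as f:
    f.write(payload)

decompress_bz2(packed, unpacked)
with open(unpacked, 'rb') as f:
    restored = f.read()
```

Reading in chunks keeps memory use bounded by chunk_size instead of by the size of the decompressed file.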

(1) About the read() method of class BZ2File:

def read(self, size=-1):
    """Read up to size uncompressed bytes from the file.

    If size is negative or omitted, read until EOF is reached.
    Returns b'' if the file is already at EOF.
    """

If size is negative or omitted, read() reads until EOF. If the file is already at EOF, it returns b''.
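
A short sketch of the three cases, using an in-memory buffer instead of a file on disk (BZ2File also accepts a file object). The b"hello world" payload is made up for the demo.

```python
import bz2
import io

# An in-memory buffer holding bz2-compressed bytes stands in for a real file.
buffer = io.BytesIO(bz2.compress(b"hello world"))
with bz2.BZ2File(buffer) as f:
    first = f.read(4)   # at most 4 uncompressed bytes
    rest = f.read()     # size omitted: everything up to EOF
    at_eof = f.read()   # already at EOF, so this is b''
```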

(2) About the built-in function iter():

def iter(source, sentinel=None):  # known special case of iter
    """
    iter(iterable) -> iterator
    iter(callable, sentinel) -> iterator

    Get an iterator from an object.  In the first form, the argument must
    supply its own iterator, or be a sequence.
    In the second form, the callable is called until it returns the sentinel.
    """

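With one argument, the argument must be an iterable.
With two arguments, callable must be a callable object (a function, a lambda, or an instance whose class defines __call__()); it is called repeatedly until the value it returns equals the sentinel. In the decompression loop above, the lambda reads one 100 KB chunk per call and b'' is the sentinel.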
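
Both forms can be sketched in a few lines. The Countdown class and the sample data are hypothetical and exist only to show a callable instance being used with a sentinel.

```python
import io
from functools import partial

# One-argument form: the argument is an iterable.
letters = list(iter("abc"))

# Two-argument form: call stream.read(4) until it returns the sentinel b''.
stream = io.BytesIO(b"0123456789")
chunks = list(iter(partial(stream.read, 4), b''))

class Countdown:
    """Instances are callable because the class defines __call__()."""
    def __init__(self, start):
        self.current = start

    def __call__(self):
        self.current -= 1
        return self.current

# Calls return 2, 1, 0; iteration stops at the sentinel 0, which is not yielded.
ticks = list(iter(Countdown(3), 0))

print(letters)  # ['a', 'b', 'c']
print(chunks)   # [b'0123', b'4567', b'89']
print(ticks)    # [2, 1]
```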
