Extending mnist_loader

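I had already saved the extra data as variables, so all that is left is to read them in and merge them with MNIST. At the time I only wanted to get it working quickly. I remembered that Python lists can be concatenated with +, and that lists and NumPy arrays can be converted into each other, so I used this rather inefficient approach: convert the arrays to lists, append the symbol data, then convert the result back to arrays and return it. The code is as follows: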

 

import gzip
import cPickle  # Python 2, as in the stock mnist_loader.py

import numpy as np

import saveAndReadVar  # my own helper module for saving/loading variables


def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()

    print('reading...')
    # Symbol images saved earlier: + (5000), - (6500), * (5000), / (4000)
    temp1 = saveAndReadVar.loadVar('plus5000.data')
    temp2 = saveAndReadVar.loadVar('sub6500.data')
    temp3 = saveAndReadVar.loadVar('mul5000.data')
    temp4 = saveAndReadVar.loadVar('div4000.data')
    #temp5 = saveAndReadVar.loadVar('(6500.data')
    #temp6 = saveAndReadVar.loadVar(')6500.data')

    # training_data1 is the MNIST data; plus5000, sub6500, ... are the data I add
    print('tolist')
    plus5000 = temp1.tolist()
    sub6500 = temp2.tolist()
    mul5000 = temp3.tolist()
    div4000 = temp4.tolist()
    #left6500 = temp5.tolist()
    #right6500 = temp6.tolist()

    training_data1 = [training_data[0].tolist(), training_data[1].tolist()]

    # Split the data I add: the last 1000 samples of each symbol go to the test set
    print('split')
    plus4k = plus5000[0:4000]
    plus1k = plus5000[4000:5000]
    sub5p5k = sub6500[0:5500]
    sub1k = sub6500[5500:6500]
    mul4k = mul5000[0:4000]
    mul1k = mul5000[4000:5000]
    div3k = div4000[0:3000]
    div1k = div4000[3000:4000]
    #left5p5k = left6500[0:5500]
    #left1k = left6500[5500:6500]
    #right5p5k = right6500[0:5500]
    #right1k = right6500[5500:6500]

    # Generate the labels for the data I add: + = 10, - = 11, * = 12, / = 13
    print('generate')
    pluslabel4k = [10] * 4000
    pluslabel1k = [10] * 1000
    sublabel5p5k = [11] * 5500
    sublabel1k = [11] * 1000
    mullabel4k = [12] * 4000
    mullabel1k = [12] * 1000
    divlabel3k = [13] * 3000
    divlabel1k = [13] * 1000
    #leftlabel5p5k = [14] * 5500
    #leftlabel1k = [14] * 1000
    #rightlabel5p5k = [15] * 5500
    #rightlabel1k = [15] * 1000

    # Combine MNIST with the added data (lists concatenate with +)
    print('compound')
    training_data1[0] = training_data1[0] + plus4k + sub5p5k + mul4k + div3k  #+ left5p5k + right5p5k
    training_data1[1] = training_data1[1] + pluslabel4k + sublabel5p5k + mullabel4k + divlabel3k  #+ leftlabel5p5k + rightlabel5p5k

    test_data1 = [
        test_data[0].tolist() + plus1k + sub1k + mul1k + div1k,  #+ left1k + right1k
        test_data[1].tolist() + pluslabel1k + sublabel1k + mullabel1k + divlabel1k,  #+ leftlabel1k + rightlabel1k
    ]

    # Convert back to NumPy arrays, the format the rest of mnist_loader expects
    print('array')
    mytraining_data = (np.array(training_data1[0]), np.array(training_data1[1]))
    mytest_data = (np.array(test_data1[0]), np.array(test_data1[1]))

    return (mytraining_data, validation_data, mytest_data)
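The list round trip above works, but it turns every MNIST pixel into a separate Python float object, which is slow, uses far more memory than the arrays, and is a plausible cause of a MemoryError while loading. A sketch of the same merge done directly in NumPy (np.concatenate and np.full instead of list +), assuming each saved symbol file loads as an array of shape (n, 784) in the same pixel format as MNIST. The helper name add_symbol_classes and its (samples, label, n_test) argument format are hypothetical, not part of mnist_loader.

```python
import numpy as np


def add_symbol_classes(train, test, symbols):
    """Append extra symbol classes to MNIST-style (x, y) tuples.

    train, test: (x, y) tuples; x has shape (n, 784), y has shape (n,).
    symbols: list of (samples, label, n_test) tuples (hypothetical format).
        The first len(samples) - n_test rows go to training, the rest to test.
    Returns new (x, y) tuples for training and test.
    """
    train_x, train_y = [train[0]], [train[1]]
    test_x, test_y = [test[0]], [test[1]]
    for samples, label, n_test in symbols:
        # Match the MNIST pixel dtype so concatenation does not upcast
        samples = np.asarray(samples, dtype=train[0].dtype)
        n_train = len(samples) - n_test
        train_x.append(samples[:n_train])
        test_x.append(samples[n_train:])
        # One integer label per sample, same dtype as the MNIST labels
        train_y.append(np.full(n_train, label, dtype=train[1].dtype))
        test_y.append(np.full(n_test, label, dtype=test[1].dtype))
    # One concatenate per array: no Python-level lists of pixels
    return ((np.concatenate(train_x), np.concatenate(train_y)),
            (np.concatenate(test_x), np.concatenate(test_y)))


if __name__ == "__main__":
    # Tiny synthetic stand-ins for MNIST and two symbol classes
    rng = np.random.default_rng(0)
    train = (rng.random((50, 784), dtype=np.float32), rng.integers(0, 10, 50))
    test = (rng.random((10, 784), dtype=np.float32), rng.integers(0, 10, 10))
    plus = rng.random((5, 784), dtype=np.float32)
    sub = rng.random((6, 784), dtype=np.float32)
    new_train, new_test = add_symbol_classes(train, test,
                                             [(plus, 10, 1), (sub, 11, 1)])
    print(new_train[0].shape, new_test[0].shape)  # (59, 784) (12, 784)
```

Inside load_data() this would replace everything between reading the pickle and the return, with symbols = [(saveAndReadVar.loadVar('plus5000.data'), 10, 1000), (saveAndReadVar.loadVar('sub6500.data'), 11, 1000), (saveAndReadVar.loadVar('mul5000.data'), 12, 1000), (saveAndReadVar.loadVar('div4000.data'), 13, 1000)], which reproduces the 4000/1000, 5500/1000, 4000/1000 and 3000/1000 train/test splits used above.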

 

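Then just change the number inside np.zeros in the last function of the file: replace 10 with the actual number of classes to recognize. With the data added above there are 14 classes in total (originally 16, but I later commented out the left and right parentheses), so here it becomes 14.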
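In the stock mnist_loader.py the last function is vectorized_result, which turns an integer label into a one-hot column vector for training. A sketch of the change for 14 classes, assuming the stock function body; the NUM_CLASSES constant and the num_classes parameter are hypothetical additions, not part of the original file.

```python
import numpy as np

NUM_CLASSES = 14  # 10 digits + 4 symbols (+, -, *, /); 16 if the brackets are enabled


def vectorized_result(j, num_classes=NUM_CLASSES):
    """Return a (num_classes, 1) column vector with 1.0 at index j, zeros elsewhere."""
    e = np.zeros((num_classes, 1))
    e[j] = 1.0
    return e


if __name__ == "__main__":
    # The / label (13) sets the last of the 14 entries
    print(vectorized_result(13).ravel())
```

The network's output layer has to match the label vector too, e.g. network.Network([784, 30, 14]) instead of the usual [784, 30, 10]; otherwise the output and the one-hot target have different shapes.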

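One more thing worth pointing out: the background of all training images must be the same. For example, every MNIST image has a background of 0, so the added images must also be processed to have a background of 0 rather than 255. Loading the data may raise a MemoryError; I solved this by increasing the virtual machine's memory to 3 GB.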
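A minimal sketch of the background conversion, assuming each added symbol is already a 28x28 grayscale uint8 image drawn dark on a white (255) background. The images in mnist.pkl.gz are stored as floats in roughly [0, 1] with background 0, so the image is inverted and then scaled; the function name to_mnist_style is hypothetical.

```python
import numpy as np


def to_mnist_style(img):
    """Convert a 28x28 dark-on-white uint8 image to an MNIST-style row.

    Background 255 becomes 0.0 and black strokes become 1.0; the result is
    a flattened float32 vector of length 784.
    """
    img = np.asarray(img, dtype=np.float32)  # float first, so 255 - x cannot wrap around
    inverted = 255.0 - img                   # white background -> 0, black ink -> 255
    return (inverted / 255.0).reshape(-1)


if __name__ == "__main__":
    img = np.full((28, 28), 255, dtype=np.uint8)  # blank white canvas
    img[13:15, 6:22] = 0                          # a black horizontal bar, like a minus sign
    row = to_mnist_style(img)
    print(row.shape, row.min(), row.max())
```

Converting to float before subtracting matters: on a uint8 array, 255 - x is fine, but other arithmetic such as x - 255 would silently wrap around.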

 
