Hello, I'm back with another update. I find the data-handling part of TensorFlow tricky to write — it keeps throwing errors. I searched a lot of blog posts on Google, but almost all of them cover loading image data. What I want to process isn't images, though, just plain numeric data, so I had to figure out the loading myself. As a Python beginner this was a bit frustrating: Python variables are dynamically typed, so IDEA or PyCharm can't show me a variable's type, which means I can't tell whether what I've defined is right or wrong. Previously I worked in Java and Scala, where the IDE gives you hints for everything, so with Python I fall back on a rather clumsy method: run the script after every small piece I write. OK, on to the main topic. The data I want to process looks like this (a label followed by index:value feature pairs):
1 1:-0.7937703439307248 2:-0.9460021126531705 3:-2.6606234773457653 4:-0.3341277824253517 5:-2.739185001234046 6:-2.7652641452266127 7:-4.321993443413369 8:-5.177811450323089 9:-2.5923344734086626 10:-2.5951276537463417 11:51.693163975286474 12:51.77945311353419 13:0.606507226370301 14:1.5271276962835107 15:63.698610658949704 16:53.879831310695465 17:-0.42439982553883354 18:-0.18507459505253876
1. Read the data line by line, split each line with split, and extract the label and the features as lists.
2. Append each line's label and features to labelList and featuresList.
3. Return featuresList and labelList, convert them to Tensors, and then do the rest of the processing.
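The three steps above can be sketched in plain Python before any TensorFlow comes in. The helper names `parse_svm_line` and `parse_svm_file` are just placeholders I made up for illustration:

```python
def parse_svm_line(line):
    # "1 1:-0.79 2:-0.94 ..." -> (label, [feature values])
    word = line.split()
    label = float(word[0])              # first token is the label
    features = [float(p.split(":")[1])  # keep only the value of each
                for p in word[1:]]      # "index:value" pair
    return label, features

def parse_svm_file(path):
    featuresList, labelList = [], []
    with open(path) as f:
        for line in f:
            label, features = parse_svm_line(line)
            featuresList.append(features)
            labelList.append([label])   # keep shape (n, 1) to match y_
    return featuresList, labelList
```

Using split() without an argument also swallows the trailing "\n", so no special case is needed for the last pair.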
OK, here is the code:
#-*- coding:utf-8 -*-
import tensorflow as tf
import numpy as np
from numpy.random import RandomState
batch_size = 100
w1 = tf.Variable(tf.random_normal([18, 8], stddev=1, seed=1))  # first-layer weights, stddev 1
w2 = tf.Variable(tf.random_normal([8, 1], stddev=1, seed=1))   # second-layer weights
x = tf.placeholder(tf.float32, shape=(None, 18), name="x-input")
y_ = tf.placeholder(tf.float32, shape=(None, 1), name="y-input")
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)
# cross-entropy between the true labels y_ and the predictions y;
# clip y so we never take log(0)
cross_entropy = -tf.reduce_mean(
    y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)
from pyhdfs import HdfsClient
client = HdfsClient(hosts='172.18.11.72:50070')
print(client.list_status('/'))
# file = client.open("/scaler.0.2/part-00000")
# parse the data file into feature and label lists
def create_file(path):
    # NOTE: this TFRecordWriter is created but never written to, so
    # train.tfrecords stays empty; the parsed data is returned as lists instead
    writer = tf.python_io.TFRecordWriter('train.tfrecords')
    featuresList = []
    labelList = []
    count = 0
    with open(path, 'r') as file:
        for line in file.readlines():
            word = line.split(" ")
            features = []
            label = []
            # word[0] is the label; word[1:] are "index:value" pairs,
            # and float() ignores the trailing "\n" on the last one
            for i in range(1, len(word)):
                features.append(float(word[i].split(":")[1]))
            label.append(float(word[0]))
            count = count + 1
            featuresList.append(features)
            labelList.append(label)
    print(count)
    writer.close()
    return featuresList, labelList
def read_and_decode(filename):
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'features': tf.FixedLenFeature([], tf.string)
        }
    )
    label = tf.cast(features['label'], tf.float32)
    # the feature vector is stored as raw bytes, so it has to be decoded;
    # tf.cast cannot convert a string tensor to float32
    feature = tf.decode_raw(features['features'], tf.float32)
    return feature, label
data = create_file("/home/wangrui/important/data.scaler.svm.0.2/part-00000")
test = create_file("/home/wangrui/important/data.scaler.svm.0.2/part-00000")
d = tf.convert_to_tensor(data[0])       # training features
d1 = tf.convert_to_tensor(data[1])      # training labels
dtest = tf.convert_to_tensor(test[0])   # test features
d1test = tf.convert_to_tensor(test[1])  # test labels
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
# the data tensors are constant, so evaluate them once, outside the loop
dd = sess.run(d)
dd1 = sess.run(d1)
ddtest = sess.run(dtest)
dd1test = sess.run(d1test)
# build the evaluation ops once as well, instead of on every iteration.
# NOTE: with a single output unit, tf.argmax along axis 1 is 0 for every
# row of both y and y_, so this "accuracy" is always 1.0; a threshold on y
# would be needed for a real binary accuracy
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
for i in range(1000):
    print("itN:", i)
    sess.run(train_step, feed_dict={x: dd, y_: dd1})
    print(sess.run(accuracy, feed_dict={x: ddtest, y_: dd1test}))
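One pitfall in the accuracy computation above is worth spelling out: y and y_ each have a single column, so argmax over axis 1 returns 0 for every row and the reported accuracy is always 1.0 no matter what the network predicts. A quick NumPy sketch of the same behavior (np.argmax acts like tf.argmax here; the 0.5 threshold is just an illustrative choice):

```python
import numpy as np

# predictions and labels with shape (n, 1), as in the network above
y = np.array([[0.9], [0.1], [0.4]])
y_true = np.array([[1.0], [0.0], [1.0]])

# argmax over axis 1 of a single-column array is always index 0,
# so this "accuracy" is 1.0 regardless of the predictions
accuracy = np.mean(np.argmax(y, 1) == np.argmax(y_true, 1))
print(accuracy)  # 1.0

# a threshold-based comparison gives the real binary accuracy
real_accuracy = np.mean((y > 0.5).astype(float) == y_true)
print(real_accuracy)  # 2 of 3 correct
```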