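When processing data, you may need somewhere to keep the results after processing spreadsheet data.
The reason is that redoing the processing on every run of the program is too slow, so a file is needed to hold the intermediate data.
Take the processing of vector data from an Excel-style single-cell sequencing table as an example (read here as CSV files):
First, the required imports: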
import json
import numpy as np
import pandas as pd
Reading and processing the data:
df = pd.read_csv('./Blood3223_celltype.csv')
df2 = pd.read_csv('./Blood3223_data.csv')
print(df.info())
print(df.head())
breed = df['Cell_type']
breed_np = breed.to_numpy()  # Series.as_matrix was removed in pandas 1.0
print(type(breed_np))
print(breed_np.shape) #3000
breed_set = set(breed_np)
print(len(breed_set)) #4
breed_4_list = list(breed_set)
# Map each cell-type name to an integer label
dic = {}
for i, name in enumerate(breed_4_list):
    dic[name] = i
print(breed_4_list)
print(dic)
cell = df["Cell"].to_numpy()
print(cell.shape)
# 1) Build single_cell: one row per cell, holding that cell's vector
single_cell = []
single_cell = np.array(single_cell)
# for i in range(3000):
#     cell_name.append(cell[i])
for i in range(3000):
    cell_vector = df2[cell[i]]          # column of df2 named after the cell
    cell_vector = cell_vector.to_numpy()
    if single_cell.shape[0] == 0:
        single_cell = cell_vector.T
    else:
        single_cell = np.vstack((single_cell, cell_vector.T))
print("single_cell.shape ", single_cell.shape)
# 2) Build label: the integer label of each cell
label = []
for i in range(3000):
    label.append(dic[breed[i]])
label = np.array(label)
print("label.shape ", label.shape)
At this point the data is held in dic, single_cell, and label.
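The two loops above can probably also be written without explicit iteration. The following is only a sketch on a tiny made-up table: `meta` and `expr` are hypothetical stand-ins for `df` and `df2`, and `type_to_id` plays the role of `dic`. It assumes, as in the code above, that each cell is a column of the expression table. It also uses `sorted()` so that the name-to-integer mapping is the same on every run, which iterating over a bare set of strings does not guarantee.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-ins for the two CSV files:
# `meta` plays the role of df (one row per cell),
# `expr` plays the role of df2 (one column per cell, one row per gene).
meta = pd.DataFrame({
    "Cell": ["c1", "c2", "c3"],
    "Cell_type": ["B", "T", "B"],
})
expr = pd.DataFrame({
    "c1": [1.0, 0.0],
    "c2": [0.0, 2.0],
    "c3": [3.0, 1.0],
})

# sorted() makes the name -> integer mapping reproducible between runs.
type_to_id = {name: i for i, name in enumerate(sorted(set(meta["Cell_type"])))}

# Select the columns in the order given by meta["Cell"], then transpose
# so that each row is one cell's vector.
cells = expr[meta["Cell"].tolist()].to_numpy().T

# Translate every cell-type name into its integer label in one step.
labels = meta["Cell_type"].map(type_to_id).to_numpy()

print(cells.shape)   # (3, 2)
print(labels)        # [0 1 0]
```

Selecting all columns at once avoids calling `np.vstack` once per cell, which copies the growing array on every iteration.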
Next, write them to JSON files:
file1 = './sc.json'
file2 = './label.json'
file3 = './dic.json'
with open(file1, 'w', encoding='utf-8') as file:
    json.dump(single_cell.tolist(), file)
print("ok")
with open(file2, 'w', encoding='utf-8') as file:
    json.dump(label.tolist(), file)
print("ok")
with open(file3, 'w', encoding='utf-8') as file:
    json.dump(dic, file)
print("ok")
![a0c5200cdfd530d8adccaffe61ffec62.png](https://img-blog.csdnimg.cn/img_convert/a0c5200cdfd530d8adccaffe61ffec62.png)
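The reason for calling .tolist() is that the json module cannot serialize NumPy arrays; they have to be converted to plain Python lists first.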
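A small self-contained check of this behavior, using a made-up 2x2 array rather than the real data:

```python
import json
import numpy as np

arr = np.array([[1.5, 2.0], [3.0, 4.5]])

# Passing the ndarray itself fails, because the json encoder only knows
# built-in types such as list, dict, str, int, and float.
try:
    json.dumps(arr)
    failed = False
except TypeError:
    failed = True

# .tolist() converts to nested Python lists (and NumPy scalars to Python
# numbers), which the encoder accepts.
text = json.dumps(arr.tolist())
restored = np.array(json.loads(text))

print(failed)                          # True
print(np.array_equal(arr, restored))   # True
```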
Loading and restoring the data:
file1 = './sc.json'
file2 = './label.json'
file3 = './dic.json'
with open(file1, 'r', encoding='utf-8') as file:
    single_cell = json.load(file)
single_cell = np.array(single_cell)
print(single_cell.shape)
with open(file2, 'r', encoding='utf-8') as file:
    label = json.load(file)
label = np.array(label)
print(label.shape)
with open(file3, 'r', encoding='utf-8') as file:
    dic = json.load(file)
print(dic)
![02244ff4df8acb5c465c9348d80ad6e9.png](https://img-blog.csdnimg.cn/img_convert/02244ff4df8acb5c465c9348d80ad6e9.png)
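To get the speed-up that motivated all this, the save and load halves can be combined so that the slow processing runs only when the file does not exist yet. This is a minimal sketch, not part of the original code: `load_or_compute` and `slow_step` are hypothetical names, and a temporary directory is used in place of `./sc.json` so the example does not touch real files.

```python
import json
import os
import tempfile

import numpy as np


def load_or_compute(path, compute):
    """Return the cached array at `path`, computing and saving it if absent."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return np.array(json.load(f))
    result = compute()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result.tolist(), f)
    return result


calls = []

def slow_step():
    # Placeholder for the expensive processing step.
    calls.append(1)
    return np.arange(6).reshape(2, 3)


cache_path = os.path.join(tempfile.mkdtemp(), "sc.json")
first = load_or_compute(cache_path, slow_step)    # computes and writes the file
second = load_or_compute(cache_path, slow_step)   # reads the file instead
print(len(calls))                      # 1
print(np.array_equal(first, second))   # True
```

If the input CSV files change, the cached JSON file has to be deleted by hand, since this sketch does not check whether the cache is stale.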