Python long format: parsing data from long format to wide format in Python

Latest recommended article published 2024-02-19 11:51:46

孟德9413

I'm wondering what the best way is to parse long-form data into wide form in Python. I've previously been doing this sort of task in R, but it is taking too long because my files can be upwards of 1 GB. Here is some dummy data:

Sequence  Position  Strand  Score
Gene1     0         +       1
Gene1     1         +       0.25
Gene1     0         -       1
Gene1     1         -       0.5
Gene2     0         +       0
Gene2     1         +       0.1
Gene2     0         -       0
Gene2     1         -       0.5

But I'd like to have it in wide form, with the scores summed over both strands at each position. Here is the output I'm hoping for:

Sequence  0  1
Gene1     2  0.75
Gene2     0  0.6

Any help on how to attack such a problem conceptually would be really helpful.

Solution

Both of the other solutions seem like overkill when you can do this with pandas in a one-liner:

In [7]: df
Out[7]:
  Sequence  Position Strand  Score
0    Gene1         0      +   1.00
1    Gene1         1      +   0.25
2    Gene1         0      -   1.00
3    Gene1         1      -   0.50
4    Gene2         0      +   0.00
5    Gene2         1      +   0.10
6    Gene2         0      -   0.00
7    Gene2         1      -   0.50

In [8]: df.groupby(['Sequence', 'Position']).Score.sum().unstack('Position')
Out[8]:
Position  0     1
Sequence
Gene1     2  0.75
Gene2     0  0.60
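For anyone who wants to reproduce the session above, here is a minimal self-contained sketch. It assumes the input is whitespace-separated, as the dummy data in the question appears to be; the variable names `raw`, `wide` and `wide_pivot` are just illustrative. It also shows `pivot_table` with `aggfunc="sum"`, which should give the same result as the groupby/unstack one-liner.

```python
import io

import pandas as pd

# Dummy data from the question. Assumption: the real file is
# whitespace-separated; adjust `sep` if yours is tab- or comma-delimited.
raw = """Sequence Position Strand Score
Gene1 0 + 1
Gene1 1 + 0.25
Gene1 0 - 1
Gene1 1 - 0.5
Gene2 0 + 0
Gene2 1 + 0.1
Gene2 0 - 0
Gene2 1 - 0.5
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Sum Score over both strands for each (Sequence, Position) pair,
# then move Position from the row index into the columns.
wide = df.groupby(["Sequence", "Position"])["Score"].sum().unstack("Position")

# The same reshaping expressed as a pivot table.
wide_pivot = df.pivot_table(
    index="Sequence", columns="Position", values="Score", aggfunc="sum"
)

print(wide)
```

Either form drops the Strand column implicitly, since only Score is aggregated.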

If the file does not fit in memory, an out-of-core solution like those in the other answers will work too.
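The other answers are not reproduced here, so the following is only a rough sketch of one possible out-of-core approach that stays within pandas, not the method those answers used. It reads the file in chunks with `read_csv(chunksize=...)`, sums each chunk, and folds the partial sums into a running total. The function name `long_to_wide_chunked` and the default chunk size are hypothetical, and the input is again assumed to be whitespace-separated.

```python
import io

import pandas as pd


def long_to_wide_chunked(source, chunksize=1_000_000):
    """Sum Score per (Sequence, Position) while reading the file in chunks.

    Hypothetical helper: `source` is a path or file-like object, and
    `chunksize` is the number of rows parsed at a time.
    """
    totals = None
    for chunk in pd.read_csv(source, sep=r"\s+", chunksize=chunksize):
        # Partial sums for this chunk only; small compared with the raw rows.
        partial = chunk.groupby(["Sequence", "Position"])["Score"].sum()
        if totals is None:
            totals = partial
        else:
            # Merge with the running totals: keys seen in both are added,
            # keys seen in only one are carried over unchanged.
            totals = pd.concat([totals, partial]).groupby(level=[0, 1]).sum()
    if totals is None:
        raise ValueError("no data rows found in source")
    # Level 1 of the index is Position; move it into the columns.
    return totals.unstack(level=1)


# Small demonstration using the dummy data, with a deliberately tiny chunk
# size so that the groups are split across several chunks.
demo = io.StringIO(
    "Sequence Position Strand Score\n"
    "Gene1 0 + 1\nGene1 1 + 0.25\nGene1 0 - 1\nGene1 1 - 0.5\n"
    "Gene2 0 + 0\nGene2 1 + 0.1\nGene2 0 - 0\nGene2 1 - 0.5\n"
)
wide_chunked = long_to_wide_chunked(demo, chunksize=3)
print(wide_chunked)
```

Only the aggregated totals are held in memory, so this should scale with the number of distinct (Sequence, Position) pairs rather than with the file size.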
