python导入data文件_使用Python导入包含文本和数字数据的文件

最新推荐文章于 2023-09-22 10:18:33 发布

weixin_40009207

最新推荐文章于 2023-09-22 10:18:33 发布

阅读量256

点赞数

文章标签： python导入data文件

I have a .txt file which has text data and numerical data. The first two rows of the file have essential information in text data form, while the first column (I am referring to the zeroth column as the first column) also has essential data in text form. At all other locations in the file, the data is in numerical form. I wish to analyze the numerical data present in the file using libraries in python ,preferably numpy or pandas, or a combination of both (analysis like regression, correlation, scikit-learn etc). I reiterate that all of the data in the file is essential for my analysis. The following snapshot (taken from Excel) shows a truncated version of the format in which my data is in:

The data shown in this snapshot can be found here.

In particular, what I want is to be able to import all the numerical data from this file using python (numpy or pandas), and be able to refer to specific rows in this data using the text data in the first two rows (Type, Tag) and the first column (object number). In my actual data file, I have hundreds of thousands of rows (object types) and scores of columns.

I have already tried using numpy.loadtxt(...) and pandas.read_csv(...) to open this file, but I have either run into errors, or have loaded data in clumsy formats. I will be really thankful to have some direction as to how I can import the file in python in a way so that I have the functionality that I desire.

解决方案

If I were you, I would use pandas, and import it using something like this:

df = pd.read_csv('dum.txt',sep='\t', header=[0,1], index_col=0)

This gives you the dataframe:

>>> df

Type T1 T2 T3 T4 T5

Tag Good Good Good Good Good

object1 1.1 2.1 3.1 4.1 5.1

object2 1.2 2.2 3.2 4.2 5.2

object3 1.3 2.3 3.3 4.3 5.3

object4 1.4 2.4 3.4 4.4 5.4

object5 1.5 2.5 3.5 4.5 5.5

object6 1.6 2.6 3.6 4.6 5.6

object7 1.7 2.7 3.7 4.7 5.7

object8 1.8 2.8 3.8 4.8 5.8

And all of your columns are floats:

>>> df.dtypes

Type Tag

T1 Good float64

T2 Good float64

T3 Good float64

T4 Good float64

T5 Good float64

dtype: object

It contains a multi-indexed column header:

>>> df.columns

MultiIndex(levels=[['T1', 'T2', 'T3', 'T4', 'T5'], ['Good']],

labels=[[0, 1, 2, 3, 4], [0, 0, 0, 0, 0]],

names=['Type', 'Tag'])

And a regular index containing the information from Type:

>>> df.index

Index(['object1', 'object2', 'object3', 'object4', 'object5', 'object6',

'object7', 'object8'],

dtype='object')

Furthermore, you can convert your values to a numpy array of floats simply by using:

>>> df.values

array([[1.1, 2.1, 3.1, 4.1, 5.1],

[1.2, 2.2, 3.2, 4.2, 5.2],

[1.3, 2.3, 3.3, 4.3, 5.3],

[1.4, 2.4, 3.4, 4.4, 5.4],

[1.5, 2.5, 3.5, 4.5, 5.5],

[1.6, 2.6, 3.6, 4.6, 5.6],

[1.7, 2.7, 3.7, 4.7, 5.7],

[1.8, 2.8, 3.8, 4.8, 5.8]])

weixin_40009207

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
python导入data文件_使用Python导入包含文本和数字数据的文件

I have a .txt file which has text data and numerical data. The first two rows of the file have essential information in text data form, while the first column (I am referring to the zeroth column as t...
复制链接

扫一扫