In [7]: help(pd.read_csv)
Help on function read_csv in module pandas.io.parsers:
read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
Read a comma-separated values (csv) file into DataFrame.
Also supports optionally iterating or breaking of the file
into chunks.
Additional help can be found in the online docs for
`IO Tools <http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html>`_.
Parameters
----------
filepath_or_buffer : str, path object or file-like object
    Any valid string path is acceptable. The string could be a URL. Valid
    URL schemes include http, ftp, s3, and file. For file URLs, a host is
    expected. A local file could be: file://localhost/path/to/table.csv.
    If you want to pass in a path object, pandas accepts any ``os.PathLike``.
    By file-like object, we refer to objects with a ``read()`` method, such as
    a file handler (e.g. via builtin ``open`` function) or ``StringIO``.
sep : str, default ','
    Delimiter to use. If sep is None, the C engine cannot automatically detect
    the separator, but the Python parsing engine can, meaning the latter will
    be used and automatically detect the separator by Python's builtin sniffer
    tool, ``csv.Sniffer``. In addition, separators longer than 1 character and
    different from ``'\s+'`` will be interpreted as regular expressions and
    will also force the use of the Python parsing engine. Note that regex
    delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``.
delimiter : str, default ``None``
    Alias for sep.
header : int, list of int, default 'infer'
    Row number(s) to use as the column names, and the start of the
    data. Default behavior is to infer the column names: if no names
    are passed the behavior is identical to ``header=0`` and column
    names are inferred from the first line of the file, if column
    names are passed explicitly then the behavior is identical to
    ``header=None``. Explicitly pass ``header=0`` to be able to
    replace existing names. The header can be a list of integers that
    specify row locations for a multi-index on the columns
    e.g. [0, 1, 3]. Intervening rows that are not specified will be
    skipped (e.g. 2 in this example is skipped). Note that this
    parameter ignores commented lines and empty lines if
    ``skip_blank_lines=True``, so ``header=0`` denotes the first line of
    data rather than the first line of the file.
names : array-like, optional
    List of column names to use. If file contains no header row, then you
    should explicitly pass ``header=None``. Duplicates in this list are not
    allowed.
index_col : int, str, sequence of int/str, or False, default ``None``
    Column(s) to use as the row labels of the ``DataFrame``, either given as
    string name or column index. If a sequence of int/str is given, a
    MultiIndex is used.
    Note: ``index_col=False`` can be used to force pandas to *not* use the first
    column as the index, e.g. when you have a malformed file with delimiters at
    the end of each line.
usecols : list-like or callable, optional
    Return a subset of the columns. If list-like, all elements must either
    be positional (i.e. integer indices into the document columns) or strings
    that correspond to column names provided either by the user in `names` or
    inferred from the document header row(s). For example, a valid list-like
    `usecols` parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.
    Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.
    To instantiate a DataFrame from ``data`` with element order preserved use
    ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
    in ``['foo', 'bar']`` order or
    ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]``
    for ``['bar', 'foo']`` order.
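The parameters above can be exercised without the food_info.csv file by reading a CSV from an in-memory buffer. A minimal sketch (the two butter rows below are made-up sample data, not the real USDA file):

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for food_info.csv.
csv_text = (
    "NDB_No,Shrt_Desc,Water_(g)\n"
    "1001,BUTTER WITH SALT,15.87\n"
    "1002,BUTTER WHIPPED WITH SALT,15.87\n"
)

# usecols keeps a subset of the columns; index_col turns NDB_No
# into the row labels instead of the default 0..n-1 integer index.
df = pd.read_csv(StringIO(csv_text),
                 usecols=["NDB_No", "Water_(g)"],
                 index_col="NDB_No")
print(df.shape)  # (2, 1)
```

Note that a column named in `index_col` must also appear in `usecols` when both are given.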
View the first three rows of the DataFrame
In [9]: food_info.head(3)
Out[9]:
   NDB_No                 Shrt_Desc  Water_(g)  Energ_Kcal  ...  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)
0    1001          BUTTER WITH SALT      15.87         717  ...      51.368       21.021        3.043           215.0
1    1002  BUTTER WHIPPED WITH SALT      15.87         717  ...      50.489       23.426        3.012           219.0
2    1003      BUTTER OIL ANHYDROUS       0.24         876  ...      61.924       28.732        3.694           256.0

[3 rows x 36 columns]
View the last three rows of the DataFrame
In [10]: food_info.tail(3)
Out[10]:
      NDB_No         Shrt_Desc  Water_(g)  Energ_Kcal  ...  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)
8615   90480        SYRUP CANE       26.0         269  ...       0.000        0.000        0.000             0.0
8616   90560         SNAIL RAW       79.2          90  ...       0.361        0.259        0.252            50.0
8617   93600  TURTLE GREEN RAW       78.5          89  ...       0.127        0.088        0.170            50.0

[3 rows x 36 columns]
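`head(n)` and `tail(n)` both default to five rows when `n` is omitted. A minimal sketch on a toy DataFrame (hypothetical data standing in for food_info):

```python
import pandas as pd

# Five made-up rows in place of the 8618-row food_info table.
food = pd.DataFrame({
    "Shrt_Desc": ["A", "B", "C", "D", "E"],
    "Energ_Kcal": [717, 717, 876, 353, 371],
})

first3 = food.head(3)  # first n rows, original index labels kept
last3 = food.tail(3)   # last n rows, original index labels kept
print(last3.index.tolist())  # [2, 3, 4]
```

Because the index labels are preserved, `tail` output starts at whatever label the slice begins with (8615 in the real dataset above).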
View the column names of the DataFrame
In [13]: food_info.columns
Out[13]:
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)', 'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)', 'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)', 'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)', 'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg', 'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)', 'Cholestrl_(mg)'],
dtype='object')
View the shape of the DataFrame
In [14]: food_info.shape
Out[14]: (8618, 36)
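`shape` is a plain `(rows, columns)` tuple, so it can be unpacked directly. A minimal sketch on a toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

n_rows, n_cols = df.shape     # shape is a (rows, columns) tuple
print(n_rows, n_cols)         # 3 2
print(list(df.columns))       # ['a', 'b']
```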
View a slice of rows
In [18]: food_info.loc[2:5]
Out[18]:
   NDB_No             Shrt_Desc  Water_(g)  Energ_Kcal  ...  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)
2    1003  BUTTER OIL ANHYDROUS       0.24         876  ...      61.924       28.732        3.694           256.0
3    1004           CHEESE BLUE      42.41         353  ...      18.669        7.778        0.800            75.0
4    1005          CHEESE BRICK      41.11         371  ...      18.764        8.598        0.784            94.0
5    1006           CHEESE BRIE      48.42         334  ...      17.410        8.013        0.826           100.0

[4 rows x 36 columns]
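Note that `loc[2:5]` returns four rows, not three: `.loc` slices by label and is inclusive on both ends, unlike `.iloc`, which slices by position and excludes the stop. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10, 20)})

# .loc is label-based and includes both endpoints: labels 2..5 -> 4 rows.
by_label = df.loc[2:5]
# .iloc is position-based and excludes the stop: positions 2..4 -> 3 rows.
by_pos = df.iloc[2:5]

print(len(by_label), len(by_pos))  # 4 3
```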
Select a single column
In [19]: food_info["Water_(g)"]
Out[19]:
0       15.87
1       15.87
2        0.24
3       42.41
4       41.11
        ...
8613    43.00
8614    70.25
8615    26.00
8616    79.20
8617    78.50
Name: Water_(g), Length: 8618, dtype: float64
Select several columns
In [22]: food_info[["Water_(g)","Energ_Kcal"]]
Out[22]:
Water_(g) Energ_Kcal
      Water_(g)  Energ_Kcal
0         15.87         717
1         15.87         717
2          0.24         876
3         42.41         353
4         41.11         371
...         ...         ...
8613      43.00         305
8614      70.25         111
8615      26.00         269
8616      79.20          90
8617      78.50          89

[8618 rows x 2 columns]
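The bracket type matters: a single column name returns a one-dimensional Series, while a list of names (double brackets) returns a DataFrame. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"Water_(g)": [15.87, 0.24], "Energ_Kcal": [717, 876]})

col = df["Water_(g)"]                    # single brackets -> Series
sub = df[["Water_(g)", "Energ_Kcal"]]    # list of names -> DataFrame
print(type(col).__name__, type(sub).__name__)  # Series DataFrame
```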
Convert the DataFrame's column names to a list
In [24]: food_info.columns.tolist()
Out[24]: ['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)', 'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)', 'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)', 'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)', 'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg', 'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)', 'Cholestrl_(mg)']
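A plain Python list of names is handy for filtering columns by a naming pattern, e.g. keeping only the gram-denominated columns. A minimal sketch on a few made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"Water_(g)": [1.0], "Energ_Kcal": [717], "Iron_(mg)": [0.02]})

# Keep only column names that end with the "(g)" unit suffix.
gram_cols = [c for c in df.columns.tolist() if c.endswith("(g)")]
print(gram_cols)  # ['Water_(g)']
```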
Sort by a column
# inplace controls whether to sort the original object in place or return a new sorted object
# ascending=False changes the default ascending sort to descending
In [33]: food_info.sort_values("Water_(g)", inplace=True, ascending=False)
In [34]: food_info["Water_(g)"]
Out[34]:
4209    100.0
4378    100.0
4348    100.0
4377    100.0
4376    100.0
         ...
6067      NaN
6113      NaN
1983      NaN
7776      NaN
6095      NaN
Name: Water_(g), Length: 8618, dtype: float64
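The NaN rows appear at the bottom of Out[34] because `sort_values` places missing values last by default (`na_position='last'`), regardless of sort direction. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the Water_(g) column.
df = pd.DataFrame({"Water_(g)": [15.87, np.nan, 100.0, 42.41]})

# Descending sort; the NaN still ends up last (na_position='last' is
# the default), just as in the food_info output above.
df.sort_values("Water_(g)", ascending=False, inplace=True)
print(df["Water_(g)"].tolist())  # [100.0, 42.41, 15.87, nan]
```

Note that sorting reorders the rows but keeps their original index labels, which is why Out[34] starts at label 4209 rather than 0.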
Check whether a column contains missing values
# here water = food_info["Water_(g)"], the column extracted earlier
In [38]: pd.isnull(water)
Out[38]:
4209    False
4378    False
4348    False
4377    False
4376    False
        ...
6067     True
6113     True
1983     True
7776     True
6095     True
Name: Water_(g), Length: 8618, dtype: bool
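Since `pd.isnull` returns a boolean Series, summing it counts the missing values (True counts as 1), and it can also be used as a filter mask. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for food_info["Water_(g)"].
water = pd.Series([15.87, np.nan, 100.0, np.nan], name="Water_(g)")

mask = pd.isnull(water)   # True where the value is NaN
n_missing = mask.sum()    # booleans sum as 0/1, so this counts the NaNs
print(n_missing)          # 2

clean = water[~mask]      # ~ negates the mask: keep only non-missing rows
print(clean.tolist())     # [15.87, 100.0]
```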