08. DataFrame column operations: column names, dtypes (column types), a column-manipulation example, column arithmetic, max, min, mean

1. Column names: data.columns

  1. Code:
import pandas as pd

if __name__ == '__main__':
    # Read the CSV file
    data = pd.read_csv("titanic_train.csv")
    # All column names
    cols = data.columns
    print(cols)
==================================
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

2. Viewing column types: data.dtypes

  1. Code:
import pandas as pd

if __name__ == '__main__':
    # Read the CSV file
    data = pd.read_csv("titanic_train.csv")
    # dtype of every column
    res = data.dtypes
    print(res)
================================
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object

3. Changing a column type: data["PassengerId"].astype("object")

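  1. Code:
import pandas as pd

if __name__ == '__main__':
    # Read the CSV file
    data = pd.read_csv("titanic_train.csv")
    # dtypes before the conversion
    print(data.dtypes)
    # astype returns a new Series, so assign it back to the column
    data["PassengerId"] = data["PassengerId"].astype("object")
    # dtypes after the conversion
    print(data.dtypes)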
=======================================================
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
=======================================================
PassengerId     object
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
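
The same conversion can be tried without the CSV file. This is a minimal sketch on a small hand-made DataFrame (the values are placeholders, not the real data set). It also shows that DataFrame.astype accepts a {column: dtype} mapping, so several columns can be converted in one call.

```python
import pandas as pd

# Hypothetical stand-in for two Titanic columns
df = pd.DataFrame({"PassengerId": [1, 2, 3], "Fare": [7.25, 71.2833, 7.925]})

# Single column: astype returns a new Series, so assign it back
df["PassengerId"] = df["PassengerId"].astype("object")

# Several columns at once: pass a {column: dtype} mapping to DataFrame.astype
converted = df.astype({"PassengerId": "int64", "Fare": "float32"})

print(df.dtypes)
print(converted.dtypes)
```

Note that astype never modifies the original object; without the assignment the column keeps its old dtype.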

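4. Example: column manipulation

  1. Working with column names:
    1. Locate: find every column whose name ends with "d" and select those columns.
    2. Operate: multiply those columns by 2 to get new columns.
    3. Replace: drop the original columns and put the new columns in.
  2. Key code:
# 3. Build the new DataFrame
new_df = data[new_cols] * 2
# 4. Drop the original columns
res_data = data.drop(new_cols,axis=1)
# 5. Add the new columns
res_data[["new01","new02","new03"]] = new_df
  3. Full code: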
import pandas as pd

if __name__ == '__main__':
    # Read the CSV file
    data = pd.read_csv("titanic_train.csv")
    # 1. All column names
    old_cols = data.columns.tolist()
    # 2. Find every column whose name ends with "d"
    new_cols = [c for c in old_cols if str(c).endswith("d")]
    print(old_cols)
    print(new_cols)
    # 3. Build the new DataFrame (note: * 2 repeats strings, e.g. "S" -> "SS")
    new_df = data[new_cols] * 2
    # 4. Drop the original columns
    res_data = data.drop(new_cols, axis=1)
    # 5. Add the new columns (the three names assume exactly three matches)
    res_data[["new01", "new02", "new03"]] = new_df
    print(res_data)
================================================================
['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
['PassengerId', 'Survived', 'Embarked']
     Pclass                                               Name  ... new02  new03
0         3                            Braund, Mr. Owen Harris  ...     0     SS
1         1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...     2     CC
......................
......................
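
The hard-coded names new01, new02, new03 only work when exactly three columns match. Below is a sketch of the same locate / operate / replace steps that derives the new names from the old ones instead. The small DataFrame and the "_x2" suffix are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical numeric stand-in for a few Titanic columns
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})

# 1. Locate: column names ending with "d"
d_cols = [c for c in df.columns if str(c).endswith("d")]

# 2. Operate: double the values and rename each result with a "_x2" suffix
doubled = (df[d_cols] * 2).add_suffix("_x2")

# 3. Replace: drop the originals and attach the new columns
result = pd.concat([df.drop(columns=d_cols), doubled], axis=1)

print(result)
```

Because the new names come from add_suffix, the code keeps working no matter how many columns end with "d".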

5. Column arithmetic, multiplying columns: data["PassengerId"] * data["Survived"]

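  1. Different columns can be combined arithmetically: add, subtract, multiply, divide. The operation is applied row by row.
  2. Code:
import pandas as pd

if __name__ == '__main__':
    # Read the CSV file
    data = pd.read_csv("titanic_train.csv")
    # Select two columns; copy() avoids SettingWithCopyWarning when adding a column
    df_new = data[["PassengerId","Survived"]].copy()
    # Multiply the two columns element by element
    df_tow = data["PassengerId"] * data["Survived"]
    df_new["tow"] = df_tow
    print(df_new)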
===================================
     PassengerId  Survived  tow
0              1         0    0
1              2         1    2
2              3         1    3
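
The other operators work the same way. Here is a small sketch with addition and division on hand-made values; the column names family_size and fare_per_person are hypothetical and not part of the data set.

```python
import pandas as pd

# Hypothetical values shaped like the SibSp, Parch and Fare columns
df = pd.DataFrame({
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 2],
    "Fare": [7.25, 71.2833, 8.05],
})

# Addition: relatives on board plus the passenger (a scalar is applied to every row)
df["family_size"] = df["SibSp"] + df["Parch"] + 1

# Division: element by element, row 0 with row 0, row 1 with row 1, and so on
df["fare_per_person"] = df["Fare"] / df["family_size"]

print(df)
```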

6. Maximum: data["Age"].max()

  1. Code: the largest age
import pandas as pd

if __name__ == '__main__':
    # Read the CSV file
    data = pd.read_csv("titanic_train.csv")
    # Maximum of the Age column (NaN values are skipped)
    res = data["Age"].max()
    print(res)
=======================
80.0
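
max() returns only the value. To get the row of the oldest person, one option is idxmax(), which returns the index label of the maximum. This is a sketch on made-up rows, not the real data.

```python
import pandas as pd

# Hypothetical rows; the None becomes NaN, like a missing age in the CSV
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22.0, None, 80.0],
})

# Index label of the largest Age (NaN is skipped by default)
oldest_label = df["Age"].idxmax()

# Look up the whole row by that label
oldest = df.loc[oldest_label]

print(oldest["Name"], oldest["Age"])
```

If several rows share the maximum, idxmax() returns the first one.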

7. Minimum: data["Age"].min()

8. Mean: data["Age"].mean()

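  1. Note: this mean is not the sum divided by the total number of rows.
  2. Null (NaN) values are skipped, so it is the sum of the non-null values divided by the count of non-null values.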
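
A small check of this behavior, using a hand-made Series with one missing value (the numbers are made up for illustration):

```python
import pandas as pd

# Four rows, one of them missing
ages = pd.Series([20.0, 30.0, None, 40.0])

# Default mean: NaN is skipped
mean_default = ages.mean()

# Sum divided by the number of rows, counting the NaN row
naive = ages.sum() / len(ages)

# Sum divided by the number of non-null rows; count() ignores NaN
non_null = ages.sum() / ages.count()

print(mean_default)  # 30.0
print(naive)         # 22.5
print(non_null)      # 30.0
```

To count missing rows as zero instead, one option is to fill them first, for example with ages.fillna(0).mean().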

9. Sum: data["Age"].sum()
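
The four statistics above can also be computed in one call by passing their names to agg(). A sketch on a hand-made Series (values are placeholders):

```python
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([22.0, 38.0, None, 35.0], name="Age")

# One result per function name; NaN is skipped by each of them
summary = ages.agg(["min", "max", "mean", "sum"])

print(summary)
```

The result is a Series indexed by the function names, so a single value can be read with summary["max"].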
