A quick way to delete matching files is this command: find . -name "test_*" | xargs rm -rfv
Move a file into a directory: sudo mv update_trade_cnt_feature_data.csv /home/scdata/all_model_train_data/revoke
(1) Batch-deleting files
On Linux, typing the history command by itself prints every command the current account has ever run, which is usually far too much. What if you only want to look up one kind of command?
For example, to find the "git" commands run earlier,
you can write:
$ history | grep "git"
Batch file deletion command:
sudo find . -name '*.crc' | xargs sudo rm -rf
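The same batch deletion can also be done from Python with pathlib, which makes it easy to preview the matches before actually removing anything (a minimal sketch; the function name and the dry_run flag are my own, not from the original notes):

```python
from pathlib import Path

def batch_delete(root, pattern, dry_run=True):
    """Delete every file under `root` whose name matches `pattern`.

    With dry_run=True the matches are only collected and returned,
    mirroring a `find ... -print` check before piping to `rm`.
    """
    matched = []
    for path in Path(root).rglob(pattern):
        if path.is_file():
            if not dry_run:
                path.unlink()
            matched.append(str(path))
    return matched
```

Running it once with dry_run=True and inspecting the returned list is a safer habit than piping `find` straight into `rm -rf`.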
(2) pyspark.sql.dataframe.DataFrame is not the same thing as pandas.core.frame.DataFrame.
You can `import pyspark` and then call `dir(pyspark.sql.dataframe.DataFrame)` to list its attributes; the same works for the pandas class.
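This dir()-based inspection works on any Python class, not just the two DataFrame classes; a tiny sketch on a made-up class shows the pattern (the class and its methods are invented for illustration):

```python
class Frame:
    """A stand-in for a DataFrame-like class."""

    def with_column(self, name, values):
        ...

    def rename(self, mapping):
        ...

# List the public API of the class, the same way you would inspect
# pyspark.sql.dataframe.DataFrame or pandas.core.frame.DataFrame.
api = [name for name in dir(Frame) if not name.startswith('_')]
print(api)  # → ['rename', 'with_column']
```

Comparing the two real APIs this way quickly shows, for example, that the pandas class has `rename` while the Spark class offers `withColumnRenamed` instead.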
def the_second_hy_map(spark, sc):
    # type1, prob_map_to_risk_rank, prob_map_to_score and myfeature_path
    # are defined elsewhere in the project.
    import os
    from pyspark.sql import SQLContext
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    sqlContext = SQLContext(sparkContext=sc)
    # Key point: registering a UDF so it can be called from SQL
    sqlContext.registerFunction("type1", lambda x: type1(x))
    dfkk3 = sqlContext.read.csv(
        myfeature_path + 'the_final_second_hy_risk_proportion.csv', header=True)
    dfkk3.show()
    dfkk3.createOrReplaceTempView('b1')
    df12 = sqlContext.sql("select count(*) as num from b1")
    df12.show()

    risk_type_map = udf(prob_map_to_risk_rank, StringType())
    df2 = dfkk3.withColumn("risk_type", risk_type_map(dfkk3['revoke_prob']))
    prob_map_score = udf(prob_map_to_score, StringType())
    df3 = df2.withColumn("new_risk_score", prob_map_score(df2['revoke_prob']))
    df4 = df3.groupby(['big_hy_name', 'risk_type']).count()
    df4.show()
    df4.write.csv(os.path.join(myfeature_path, "the_final_hy_risk_num.csv"),
                  mode='overwrite', header=True)

    # Renaming a column on a Spark SQL DataFrame cannot be done with the
    # pandas rename(); Spark DataFrames are immutable, so the renamed
    # DataFrame must be assigned back.
    df3 = df3.drop('risk_score').withColumnRenamed('new_risk_score', 'risk_score')
    sqlContext.registerDataFrameAsTable(df3, "tb1")  # register a temporary table
    df4 = sqlContext.sql("select id,create_time,update_time,company_name,revoke_prob,"
                         "is_revoke,risk_score,rank,original_industry,big_hy_code,"
                         "big_hy_name,sub_rank,risk_type from tb1")
    # df4.write.csv(os.path.join(myfeature_path, "the_final_second_hy_risk_proportion.csv"),
    #               mode='overwrite', header=True)
    # Stop Spark only once, at the very end of the job; calling spark.stop()
    # mid-function, before the DataFrames are built, kills the job.
    spark.stop()
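The two UDFs wrapped above, `prob_map_to_risk_rank` and `prob_map_to_score`, are defined elsewhere in the project. A plain-Python sketch of what such probability-to-label mappings might look like (the thresholds here are invented for illustration, not the project's real cut-offs):

```python
def prob_map_to_risk_rank(prob):
    """Map a revoke probability in [0, 1] to a coarse risk label.

    The cut-off points below are illustrative only.
    """
    p = float(prob)
    if p < 0.3:
        return 'low'
    if p < 0.7:
        return 'medium'
    return 'high'

def prob_map_to_score(prob):
    """Map a probability to a 0-100 risk score as a string (illustrative)."""
    return str(round(float(prob) * 100, 2))
```

Both functions accept strings because CSV columns read by `sqlContext.read.csv` come in as strings, and both return strings to match the `StringType()` declared when wrapping them with `udf`.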
### Note to self: the data processing here was not careful enough.