Building a Neural Network to Predict Loan Risk

  1. Introduction

  2. Data cleaning

  3. Building the neural networks

  4. Saving the final model

  5. Building the API

Introduction

LendingClub is the world’s largest peer-to-peer lending platform. Until recently (through the end of 2018), LendingClub published a public dataset of all loans issued since the company’s launch in 2007. I’m accessing the dataset via Kaggle.

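The shape printed below comes from loading the full accepted-loans file. As a minimal sketch of that load step (the file path is an assumption based on the Kaggle dataset's layout):

```python
import pandas as pd

# Load the full accepted-loans CSV; low_memory=False avoids mixed-dtype
# chunking warnings on a file this wide. Path assumed from the Kaggle dataset.
loans = pd.read_csv(
    "../input/lending-club/accepted_2007_to_2018Q4.csv", low_memory=False
)
loans.shape
```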

(2260701, 151)

With 2,260,701 loans to look at and 151 potential variables, my goal is to create a neural network model with TensorFlow and Keras to predict the fraction of an expected loan return that a prospective borrower will pay back. This will require a lot of data cleaning given the state of the dataset, and I’ll walk through that entire process here. After building and training the network, I’ll create a public API to serve that model.

Also, as you may have guessed from the preceding code block, this post is adapted from a Jupyter Notebook. If you’d like to follow along in your own notebook, go ahead and fork mine on Kaggle or GitHub.

Data cleaning

I’ll first look at the data dictionary (downloaded directly from LendingClub’s website) to get an idea of how to create the desired output variable and which remaining features are available at the point of loan application (to avoid data leakage).

•id: A unique LC assigned ID for the loan listing.
•member_id: A unique LC assigned Id for the borrower member.
•loan_amnt: The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
•funded_amnt: The total amount committed to that loan at that point in time.
•funded_amnt_inv: The total amount committed by investors for that loan at that point in time.
•term: The number of payments on the loan. Values are in months and can be either 36 or 60.
•int_rate: Interest Rate on the loan
•installment: The monthly payment owed by the borrower if the loan originates.
•grade: LC assigned loan grade
•sub_grade: LC assigned loan subgrade
•emp_title: The job title supplied by the Borrower when applying for the loan.*
•emp_length: Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
•home_ownership: The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER
•annual_inc: The self-reported annual income provided by the borrower during registration.
•verification_status: Indicates if income was verified by LC, not verified, or if the income source was verified
•issue_d: The month which the loan was funded
•loan_status: Current status of the loan
•pymnt_plan: Indicates if a payment plan has been put in place for the loan
•url: URL for the LC page with listing data.
•desc: Loan description provided by the borrower
•purpose: A category provided by the borrower for the loan request.
•title: The loan title provided by the borrower
•zip_code: The first 3 numbers of the zip code provided by the borrower in the loan application.
•addr_state: The state provided by the borrower in the loan application
•dti: A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
•delinq_2yrs: The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years
•earliest_cr_line: The month the borrower's earliest reported credit line was opened
•fico_range_low: The lower boundary range the borrower’s FICO at loan origination belongs to.
•fico_range_high: The upper boundary range the borrower’s FICO at loan origination belongs to.
•inq_last_6mths: The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
•mths_since_last_delinq: The number of months since the borrower's last delinquency.
•mths_since_last_record: The number of months since the last public record.
•open_acc: The number of open credit lines in the borrower's credit file.
•pub_rec: Number of derogatory public records
•revol_bal: Total credit revolving balance
•revol_util: Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
•total_acc: The total number of credit lines currently in the borrower's credit file
•initial_list_status: The initial listing status of the loan. Possible values are – W, F
•out_prncp: Remaining outstanding principal for total amount funded
•out_prncp_inv: Remaining outstanding principal for portion of total amount funded by investors
•total_pymnt: Payments received to date for total amount funded
•total_pymnt_inv: Payments received to date for portion of total amount funded by investors
•total_rec_prncp: Principal received to date
•total_rec_int: Interest received to date
•total_rec_late_fee: Late fees received to date
•recoveries: post charge off gross recovery
•collection_recovery_fee: post charge off collection fee
•last_pymnt_d: Last month payment was received
•last_pymnt_amnt: Last total payment amount received
•next_pymnt_d: Next scheduled payment date
•last_credit_pull_d: The most recent month LC pulled credit for this loan
•last_fico_range_high: The upper boundary range the borrower’s last FICO pulled belongs to.
•last_fico_range_low: The lower boundary range the borrower’s last FICO pulled belongs to.
•collections_12_mths_ex_med: Number of collections in 12 months excluding medical collections
•mths_since_last_major_derog: Months since most recent 90-day or worse rating
•policy_code: Publicly available: policy_code=1; new products not publicly available: policy_code=2
•application_type: Indicates whether the loan is an individual application or a joint application with two co-borrowers
•annual_inc_joint: The combined self-reported annual income provided by the co-borrowers during registration
•dti_joint: A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
•verification_status_joint: Indicates if the co-borrowers' joint income was verified by LC, not verified, or if the income source was verified
•acc_now_delinq: The number of accounts on which the borrower is now delinquent.
•tot_coll_amt: Total collection amounts ever owed
•tot_cur_bal: Total current balance of all accounts
•open_acc_6m: Number of open trades in last 6 months
•open_act_il: Number of currently active installment trades
•open_il_12m: Number of installment accounts opened in past 12 months
•open_il_24m: Number of installment accounts opened in past 24 months
•mths_since_rcnt_il: Months since most recent installment accounts opened
•total_bal_il: Total current balance of all installment accounts
•il_util: Ratio of total current balance to high credit/credit limit on all install acct
•open_rv_12m: Number of revolving trades opened in past 12 months
•open_rv_24m: Number of revolving trades opened in past 24 months
•max_bal_bc: Maximum current balance owed on all revolving accounts
•all_util: Balance to credit limit on all trades
•total_rev_hi_lim: Total revolving high credit/credit limit
•inq_fi: Number of personal finance inquiries
•total_cu_tl: Number of finance trades
•inq_last_12m: Number of credit inquiries in past 12 months
•acc_open_past_24mths: Number of trades opened in past 24 months.
•avg_cur_bal: Average current balance of all accounts
•bc_open_to_buy: Total open to buy on revolving bankcards.
•bc_util: Ratio of total current balance to high credit/credit limit for all bankcard accounts.
•chargeoff_within_12_mths: Number of charge-offs within 12 months
•delinq_amnt: The past-due amount owed for the accounts on which the borrower is now delinquent.
•mo_sin_old_il_acct: Months since oldest bank installment account opened
•mo_sin_old_rev_tl_op: Months since oldest revolving account opened
•mo_sin_rcnt_rev_tl_op: Months since most recent revolving account opened
•mo_sin_rcnt_tl: Months since most recent account opened
•mort_acc: Number of mortgage accounts.
•mths_since_recent_bc: Months since most recent bankcard account opened.
•mths_since_recent_bc_dlq: Months since most recent bankcard delinquency
•mths_since_recent_inq: Months since most recent inquiry.
•mths_since_recent_revol_delinq: Months since most recent revolving delinquency.
•num_accts_ever_120_pd: Number of accounts ever 120 or more days past due
•num_actv_bc_tl: Number of currently active bankcard accounts
•num_actv_rev_tl: Number of currently active revolving trades
•num_bc_sats: Number of satisfactory bankcard accounts
•num_bc_tl: Number of bankcard accounts
•num_il_tl: Number of installment accounts
•num_op_rev_tl: Number of open revolving accounts
•num_rev_accts: Number of revolving accounts
•num_rev_tl_bal_gt_0: Number of revolving trades with balance >0
•num_sats: Number of satisfactory accounts
•num_tl_120dpd_2m: Number of accounts currently 120 days past due (updated in past 2 months)
•num_tl_30dpd: Number of accounts currently 30 days past due (updated in past 2 months)
•num_tl_90g_dpd_24m: Number of accounts 90 or more days past due in last 24 months
•num_tl_op_past_12m: Number of accounts opened in past 12 months
•pct_tl_nvr_dlq: Percent of trades never delinquent
•percent_bc_gt_75: Percentage of all bankcard accounts > 75% of limit.
•pub_rec_bankruptcies: Number of public record bankruptcies
•tax_liens: Number of tax liens
•tot_hi_cred_lim: Total high credit/credit limit
•total_bal_ex_mort: Total credit balance excluding mortgage
•total_bc_limit: Total bankcard high credit/credit limit
•total_il_high_credit_limit: Total installment high credit/credit limit
•revol_bal_joint: Sum of revolving credit balance of the co-borrowers, net of duplicate balances
•sec_app_fico_range_low: FICO range (low) for the secondary applicant
•sec_app_fico_range_high: FICO range (high) for the secondary applicant
•sec_app_earliest_cr_line: Earliest credit line at time of application for the secondary applicant
•sec_app_inq_last_6mths: Credit inquiries in the last 6 months at time of application for the secondary applicant
•sec_app_mort_acc: Number of mortgage accounts at time of application for the secondary applicant
•sec_app_open_acc: Number of open trades at time of application for the secondary applicant
•sec_app_revol_util: Ratio of total current balance to high credit/credit limit for all revolving accounts
•sec_app_open_act_il: Number of currently active installment trades at time of application for the secondary applicant
•sec_app_num_rev_accts: Number of revolving accounts at time of application for the secondary applicant
•sec_app_chargeoff_within_12_mths: Number of charge-offs within last 12 months at time of application for the secondary applicant
•sec_app_collections_12_mths_ex_med: Number of collections within last 12 months excluding medical collections at time of application for the secondary applicant
•sec_app_mths_since_last_major_derog: Months since most recent 90-day or worse rating at time of application for the secondary applicant
•hardship_flag: Flags whether or not the borrower is on a hardship plan
•hardship_type: Describes the hardship plan offering
•hardship_reason: Describes the reason the hardship plan was offered
•hardship_status: Describes if the hardship plan is active, pending, canceled, completed, or broken
•deferral_term: Amount of months that the borrower is expected to pay less than the contractual monthly payment amount due to a hardship plan
•hardship_amount: The interest payment that the borrower has committed to make each month while they are on a hardship plan
•hardship_start_date: The start date of the hardship plan period
•hardship_end_date: The end date of the hardship plan period
•payment_plan_start_date: The day the first hardship plan payment is due. For example, if a borrower has a hardship plan period of 3 months, the start date is the start of the three-month period in which the borrower is allowed to make interest-only payments.
•hardship_length: The number of months the borrower will make smaller payments than normally obligated due to a hardship plan
•hardship_dpd: Account days past due as of the hardship plan start date
•hardship_loan_status: Loan Status as of the hardship plan start date
•orig_projected_additional_accrued_interest: The original projected additional interest amount that will accrue for the given hardship payment plan as of the Hardship Start Date. This field will be null if the borrower has broken their hardship payment plan.
•hardship_payoff_balance_amount: The payoff balance amount as of the hardship plan start date
•hardship_last_payment_amount: The last payment amount as of the hardship plan start date
•disbursement_method: The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY
•debt_settlement_flag: Flags whether or not the borrower, who has charged-off, is working with a debt-settlement company.
•debt_settlement_flag_date: The most recent date that the Debt_Settlement_Flag has been set
•settlement_status: The status of the borrower’s settlement plan. Possible values are: COMPLETE, ACTIVE, BROKEN, CANCELLED, DENIED, DRAFT
•settlement_date: The date that the borrower agrees to the settlement plan
•settlement_amount: The loan amount that the borrower has agreed to settle for
•settlement_percentage: The settlement amount as a percentage of the payoff balance amount on the loan
•settlement_term: The number of months that the borrower will be on the settlement plan

For the output variable (the fraction of expected return that was recovered), I’ll calculate the expected return by multiplying the monthly payment amount (installment) by the number of payments on the loan (term), and I’ll calculate the amount actually received by summing the total principal, interest, late fees, and post-charge-off gross recovery received (total_rec_prncp, total_rec_int, total_rec_late_fee, recoveries) and subtracting any collection fee (collection_recovery_fee).

Several other columns contain either irrelevant demographic data or data not created until after a loan is accepted, so those will need to be removed. I’ll hold onto issue_d (the month and year the loan was funded) for now, though, in case I want to compare variables to the date of the loan.

emp_title (the applicant’s job title) does seem relevant in the context of a loan, but it may have too many unique values to be useful.

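A one-liner suffices to count the distinct job titles (result shown below):

```python
# Number of unique values in the emp_title column
loans["emp_title"].nunique()
```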

512694

Too many unique values indeed. In a future version of this model, I could perhaps try to generate a feature from this column by aggregating job titles into categories, but that effort may have a low return on investment, since there are already columns for annual income and length of employment.

Two other interesting columns that I’ll also remove are title and desc (“description”), which are both freeform text entries written by the borrower. These could be fascinating subjects for natural language processing, but that’s outside the scope of the current project. Perhaps in the future, I could generate additional features from these fields using measures like syntactic complexity, word count, or keyword inclusion.

Before creating the output variable, however, I must take a closer look at loan_status, to see if any loans in the dataset are still open.

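A tally by status produces the counts shown below; one way to get that output is:

```python
# Count loans by current status (index name and series name both print)
loans.groupby("loan_status")["loan_status"].count()
```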

loan_status
Charged Off 268559
Current 878317
Default 40
Does not meet the credit policy. Status:Charged Off 761
Does not meet the credit policy. Status:Fully Paid 1988
Fully Paid 1076751
In Grace Period 8436
Late (16-30 days) 4349
Late (31-120 days) 21467
Name: loan_status, dtype: int64

For practical purposes, I’ll consider loans with statuses that don’t contain “Fully Paid” or “Charged Off” to still be open, so I’ll remove those from the dataset. I’ll also merge the “credit policy” columns with their matching status.

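A sketch of that filtering-and-merging step:

```python
# Fold the "credit policy" statuses into their base statuses, then keep
# only closed loans (Fully Paid or Charged Off)
prefix = "Does not meet the credit policy. Status:"
loans["loan_status"] = loans["loan_status"].str.replace(prefix, "", regex=False)
loans = loans[loans["loan_status"].isin(["Fully Paid", "Charged Off"])]
loans.groupby("loan_status")["loan_status"].count()
```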

loan_status
Charged Off 269320
Fully Paid 1078739
Name: loan_status, dtype: int64

Now to create the output variable. I’ll start by checking the null counts of the variables involved.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1348059 entries, 0 to 2260697
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 term 1348059 non-null object
1 installment 1348059 non-null float64
2 total_rec_prncp 1348059 non-null float64
3 total_rec_int 1348059 non-null float64
4 total_rec_late_fee 1348059 non-null float64
5 recoveries 1348059 non-null float64
6 collection_recovery_fee 1348059 non-null float64
dtypes: float64(6), object(1)
memory usage: 82.3+ MB

Every remaining row has each of these seven variables, but term’s data type is object, so that needs to be fixed first.

term
36 months 1023181
60 months 324878
Name: term, dtype: int64

Ah, so term is a categorical feature with two options. I’ll treat it as such when I use it as an input to the model, but to calculate the output variable I’ll create a numerical column from it.

Also, I need to trim the whitespace from the beginning of those values — that’s no good.

Now I can create the output variable.

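Putting the last few steps together, a sketch of the output-variable calculation (including the whitespace fix and the temporary numeric term column):

```python
# Strip the stray leading whitespace, then pull the payment count out of term
loans["term"] = loans["term"].str.strip()
term_months = loans["term"].str.split(" ", expand=True)[0].astype("float64")

# fraction recovered = (principal + interest + late fees + recoveries
#                       - collection fee) / (installment * number of payments)
received = (
    loans["total_rec_prncp"]
    + loans["total_rec_int"]
    + loans["total_rec_late_fee"]
    + loans["recoveries"]
    - loans["collection_recovery_fee"]
)
loans["fraction_recovered"] = received / (loans["installment"] * term_months)
```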

There is at least one odd outlier on the right in both categories. But also, many of the “fully paid” loans do not quite reach 1. One potential explanation is that when the last payment comes in, the system just flips loan_status to “Fully Paid” without adding that final payment to the running totals, or perhaps simply multiplying installment by the term leaves off a few cents from the actual total. If I were performing this analysis for LendingClub themselves, I’d ask them, but this is just a personal project. I’ll consider every loan marked “Fully Paid” to have fully recovered the expected return.

For that matter, I’ll cap my fraction_recovered values for charged-off loans at 1.0 as well, since at least one value is above that for some reason.

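In code, both decisions together might look like this:

```python
import numpy as np

# Treat every "Fully Paid" loan as fully recovered; cap everything else at 1.0
loans["fraction_recovered"] = np.where(
    loans["loan_status"] == "Fully Paid",
    1.0,
    loans["fraction_recovered"].clip(upper=1.0),
)
```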

For the sake of curiosity, I’ll plot the distribution of fraction recovered for charged-off loans.

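A seaborn sketch of that plot (the styling details are guesses):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# KDE of fraction_recovered among charged-off loans only
charged_off = loans.loc[loans["loan_status"] == "Charged Off", "fraction_recovered"]
sns.kdeplot(charged_off)
plt.title("Distribution of fraction recovered on charged-off loans")
plt.xlabel("Fraction of expected return recovered")
plt.show()
```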

A KDE (kernel density estimation) plot depicting the distribution of “fraction recovered” amounts among charged-off loans.

Now that the output is formatted, it’s time to clean up the inputs. I’ll check the null counts of each variable.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1348059 entries, 0 to 2260697
Data columns (total 97 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1348059 non-null float64
1 term 1348059 non-null object
2 emp_length 1269514 non-null object
3 home_ownership 1348059 non-null object
4 annual_inc 1348055 non-null float64
5 verification_status 1348059 non-null object
6 issue_d 1348059 non-null object
7 loan_status 1348059 non-null object
8 purpose 1348059 non-null object
9 dti 1347685 non-null float64
10 delinq_2yrs 1348030 non-null float64
11 earliest_cr_line 1348030 non-null object
12 fico_range_low 1348059 non-null float64
13 fico_range_high 1348059 non-null float64
14 inq_last_6mths 1348029 non-null float64
15 mths_since_last_delinq 668117 non-null float64
16 mths_since_last_record 229415 non-null float64
17 open_acc 1348030 non-null float64
18 pub_rec 1348030 non-null float64
19 revol_bal 1348059 non-null float64
20 revol_util 1347162 non-null float64
21 total_acc 1348030 non-null float64
22 collections_12_mths_ex_med 1347914 non-null float64
23 mths_since_last_major_derog 353750 non-null float64
24 application_type 1348059 non-null object
25 annual_inc_joint 25800 non-null float64
26 dti_joint 25797 non-null float64
27 verification_status_joint 25595 non-null object
28 acc_now_delinq 1348030 non-null float64
29 tot_coll_amt 1277783 non-null float64
30 tot_cur_bal 1277783 non-null float64
31 open_acc_6m 537597 non-null float64
32 open_act_il 537598 non-null float64
33 open_il_12m 537598 non-null float64
34 open_il_24m 537598 non-null float64
35 mths_since_rcnt_il 523382 non-null float64
36 total_bal_il 537598 non-null float64
37 il_util 465016 non-null float64
38 open_rv_12m 537598 non-null float64
39 open_rv_24m 537598 non-null float64
40 max_bal_bc 537598 non-null float64
41 all_util 537545 non-null float64
42 total_rev_hi_lim 1277783 non-null float64
43 inq_fi 537598 non-null float64
44 total_cu_tl 537597 non-null float64
45 inq_last_12m 537597 non-null float64
46 acc_open_past_24mths 1298029 non-null float64
47 avg_cur_bal 1277761 non-null float64
48 bc_open_to_buy 1284167 non-null float64
49 bc_util 1283398 non-null float64
50 chargeoff_within_12_mths 1347914 non-null float64
51 delinq_amnt 1348030 non-null float64
52 mo_sin_old_il_acct 1239735 non-null float64
53 mo_sin_old_rev_tl_op 1277782 non-null float64
54 mo_sin_rcnt_rev_tl_op 1277782 non-null float64
55 mo_sin_rcnt_tl 1277783 non-null float64
56 mort_acc 1298029 non-null float64
57 mths_since_recent_bc 1285089 non-null float64
58 mths_since_recent_bc_dlq 319020 non-null float64
59 mths_since_recent_inq 1171239 non-null float64
60 mths_since_recent_revol_delinq 449962 non-null float64
61 num_accts_ever_120_pd 1277783 non-null float64
62 num_actv_bc_tl 1277783 non-null float64
63 num_actv_rev_tl 1277783 non-null float64
64 num_bc_sats 1289469 non-null float64
65 num_bc_tl 1277783 non-null float64
66 num_il_tl 1277783 non-null float64
67 num_op_rev_tl 1277783 non-null float64
68 num_rev_accts 1277782 non-null float64
69 num_rev_tl_bal_gt_0 1277783 non-null float64
70 num_sats 1289469 non-null float64
71 num_tl_120dpd_2m 1227909 non-null float64
72 num_tl_30dpd 1277783 non-null float64
73 num_tl_90g_dpd_24m 1277783 non-null float64
74 num_tl_op_past_12m 1277783 non-null float64
75 pct_tl_nvr_dlq 1277629 non-null float64
76 percent_bc_gt_75 1283755 non-null float64
77 pub_rec_bankruptcies 1346694 non-null float64
78 tax_liens 1347954 non-null float64
79 tot_hi_cred_lim 1277783 non-null float64
80 total_bal_ex_mort 1298029 non-null float64
81 total_bc_limit 1298029 non-null float64
82 total_il_high_credit_limit 1277783 non-null float64
83 revol_bal_joint 18629 non-null float64
84 sec_app_fico_range_low 18630 non-null float64
85 sec_app_fico_range_high 18630 non-null float64
86 sec_app_earliest_cr_line 18630 non-null object
87 sec_app_inq_last_6mths 18630 non-null float64
88 sec_app_mort_acc 18630 non-null float64
89 sec_app_open_acc 18630 non-null float64
90 sec_app_revol_util 18302 non-null float64
91 sec_app_open_act_il 18630 non-null float64
92 sec_app_num_rev_accts 18630 non-null float64
93 sec_app_chargeoff_within_12_mths 18630 non-null float64
94 sec_app_collections_12_mths_ex_med 18630 non-null float64
95 sec_app_mths_since_last_major_derog 6645 non-null float64
96 fraction_recovered 1348059 non-null float64
dtypes: float64(86), object(11)
memory usage: 1007.9+ MB

Remaining columns with lots of null values seem to fall into three categories:

  1. Derogatory/delinquency metrics (where null means the borrower doesn’t have any such marks). I’ll also add mths_since_recent_inq to this list, since its non-null count is below what seems to be the threshold for complete data, which is around 1,277,783. I’ll assume a null value here means no recent inquiries.

  2. Metrics that only apply to joint applications (where null means it was a single application).

  3. An inexplicable series of 14 credit history–related columns that only have around 537,000 entries. Are these newer metrics?

I’ll first look at those more confusing columns to find out whether or not they’re a newer set of metrics. That’ll require converting issue_d to date format first.

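A sketch of that conversion and comparison; the date format and the representative column (open_acc_6m) are assumptions, since the exact queries aren't shown:

```python
import pandas as pd

# issue_d values look like "Dec-2015"
loans["issue_d"] = pd.to_datetime(loans["issue_d"], format="%b-%Y")

# Compare issue dates for rows without and with the suspect metrics
has_new_metrics = loans["open_acc_6m"].notna()
print(loans.loc[~has_new_metrics, "issue_d"].agg(["count", "min", "max"]))
print(loans.loc[has_new_metrics, "issue_d"].agg(["count", "min", "max"]))
```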

count                 464325
min 2015-12-01 00:00:00
max 2018-12-01 00:00:00
Name: issue_d, dtype: object
count                 557708
min 2015-12-01 00:00:00
max 2018-12-01 00:00:00
Name: issue_d, dtype: object

It appears that these are indeed newer metrics, their use only beginning in December 2015, but even after that point usage is spotty. I’m curious to see if these additional metrics would make a model more accurate, though, so once I’m done cleaning the data I’ll copy the rows with these new metrics into a new dataset and create another model using the new metrics.

As for the derogatory/delinquency metrics, taking a cue from Michael Wurm, I’m going to take the inverse of all the “months since recent/last” fields, which will turn each into a proxy for the frequency of the event and also let me set all the null values (when an event has never happened) to 0. For the “months since oldest” fields, I’ll just set the null values to 0 and leave the rest untouched.

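A sketch of that transformation. The post doesn't specify the exact inverse formula, so 1 / (1 + months) is used here to keep both the never-happened case (null maps to 0) and the just-happened case (0 months) well defined:

```python
def inverse_months_since(series):
    # Higher values mean more recent events; null (never happened) becomes 0
    return (1 / (1 + series)).fillna(0)

for col in [
    "mths_since_last_delinq",
    "mths_since_last_record",
    "mths_since_last_major_derog",
    "mths_since_recent_inq",
    # ...plus the remaining "months since recent/last" columns
]:
    # pop() removes the original column while returning it
    loans["inv_" + col] = inverse_months_since(loans.pop(col))

# (Per the text above, the "months since oldest" fields instead just have
# their null values set to 0 and keep their other values untouched.)
```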

Now to look closer at joint loans.

application_type
Individual 1322259
Joint App 25800
Name: application_type, dtype: int64
<class 'pandas.core.frame.DataFrame'>
Int64Index: 25800 entries, 2 to 2260663
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 annual_inc_joint 25800 non-null float64
1 dti_joint 25797 non-null float64
2 verification_status_joint 25595 non-null object
3 revol_bal_joint 18629 non-null float64
4 sec_app_fico_range_low 18630 non-null float64
5 sec_app_fico_range_high 18630 non-null float64
6 sec_app_earliest_cr_line 18630 non-null object
7 sec_app_inq_last_6mths 18630 non-null float64
8 sec_app_mort_acc 18630 non-null float64
9 sec_app_open_acc 18630 non-null float64
10 sec_app_revol_util 18302 non-null float64
11 sec_app_open_act_il 18630 non-null float64
12 sec_app_num_rev_accts 18630 non-null float64
13 sec_app_chargeoff_within_12_mths 18630 non-null float64
14 sec_app_collections_12_mths_ex_med 18630 non-null float64
15 sec_app_inv_mths_since_last_major_derog 25800 non-null float64
dtypes: float64(14), object(2)
memory usage: 3.3+ MB

It seems there may be a case of newer metrics for joint applications as well. I’ll investigate.

count                  18301
min 2017-03-01 00:00:00
max 2018-12-01 00:00:00
Name: issue_d, dtype: object
count                  18629
min 2017-03-01 00:00:00
max 2018-12-01 00:00:00
Name: issue_d, dtype: object

Newer than the previous set of new metrics, even — these didn’t start getting used till March 2017. Now I wonder when joint loans were first introduced.

count                  25800
min 2015-10-01 00:00:00
max 2018-12-01 00:00:00
Name: issue_d, dtype: object

2015. I think I’ll save the newer joint metrics for perhaps a third model, but I believe I can include annual_inc_joint, dti_joint, and verification_status_joint in the main model—I’ll just binary-encode application_type, and for individual applications I’ll set annual_inc_joint, dti_joint, and verification_status_joint equal to their non-joint counterparts.

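A sketch of the fill-in for individual applications (the binary encoding of application_type itself happens later, in the pipeline):

```python
# Where there's no co-borrower, the "joint" figures are just the individual ones
individual = loans["application_type"] == "Individual"
for joint_col, single_col in [
    ("annual_inc_joint", "annual_inc"),
    ("dti_joint", "dti"),
    ("verification_status_joint", "verification_status"),
]:
    loans.loc[individual, joint_col] = loans.loc[individual, single_col]
```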

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1348059 entries, 0 to 2260697
Data columns (total 97 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1348059 non-null float64
1 term 1348059 non-null object
2 emp_length 1269514 non-null object
3 home_ownership 1348059 non-null object
4 annual_inc 1348055 non-null float64
5 verification_status 1348059 non-null object
6 issue_d 1348059 non-null datetime64[ns]
7 loan_status 1348059 non-null object
8 purpose 1348059 non-null object
9 dti 1347685 non-null float64
10 delinq_2yrs 1348030 non-null float64
11 earliest_cr_line 1348030 non-null object
12 fico_range_low 1348059 non-null float64
13 fico_range_high 1348059 non-null float64
14 inq_last_6mths 1348029 non-null float64
15 inv_mths_since_last_delinq 1348059 non-null float64
16 inv_mths_since_last_record 1348059 non-null float64
17 open_acc 1348030 non-null float64
18 pub_rec 1348030 non-null float64
19 revol_bal 1348059 non-null float64
20 revol_util 1347162 non-null float64
21 total_acc 1348030 non-null float64
22 collections_12_mths_ex_med 1347914 non-null float64
23 inv_mths_since_last_major_derog 1348059 non-null float64
24 application_type 1348059 non-null object
25 annual_inc_joint 1348055 non-null float64
26 dti_joint 1348056 non-null float64
27 verification_status_joint 1347854 non-null object
28 acc_now_delinq 1348030 non-null float64
29 tot_coll_amt 1277783 non-null float64
30 tot_cur_bal 1277783 non-null float64
31 open_acc_6m 537597 non-null float64
32 open_act_il 537598 non-null float64
33 open_il_12m 537598 non-null float64
34 open_il_24m 537598 non-null float64
35 inv_mths_since_rcnt_il 1348059 non-null float64
36 total_bal_il 537598 non-null float64
37 il_util 465016 non-null float64
38 open_rv_12m 537598 non-null float64
39 open_rv_24m 537598 non-null float64
40 max_bal_bc 537598 non-null float64
41 all_util 537545 non-null float64
42 total_rev_hi_lim 1277783 non-null float64
43 inq_fi 537598 non-null float64
44 total_cu_tl 537597 non-null float64
45 inq_last_12m 537597 non-null float64
46 acc_open_past_24mths 1298029 non-null float64
47 avg_cur_bal 1277761 non-null float64
48 bc_open_to_buy 1284167 non-null float64
49 bc_util 1283398 non-null float64
50 chargeoff_within_12_mths 1347914 non-null float64
51 delinq_amnt 1348030 non-null float64
52 mo_sin_old_il_acct 1239735 non-null float64
53 mo_sin_old_rev_tl_op 1277782 non-null float64
54 inv_mo_sin_rcnt_rev_tl_op 1348059 non-null float64
55 inv_mo_sin_rcnt_tl 1348059 non-null float64
56 mort_acc 1298029 non-null float64
57 inv_mths_since_recent_bc 1348059 non-null float64
58 inv_mths_since_recent_bc_dlq 1348059 non-null float64
59 inv_mths_since_recent_inq 1348059 non-null float64
60 inv_mths_since_recent_revol_delinq 1348059 non-null float64
61 num_accts_ever_120_pd 1277783 non-null float64
62 num_actv_bc_tl 1277783 non-null float64
63 num_actv_rev_tl 1277783 non-null float64
64 num_bc_sats 1289469 non-null float64
65 num_bc_tl 1277783 non-null float64
66 num_il_tl 1277783 non-null float64
67 num_op_rev_tl 1277783 non-null float64
68 num_rev_accts 1277782 non-null float64
69 num_rev_tl_bal_gt_0 1277783 non-null float64
70 num_sats 1289469 non-null float64
71 num_tl_120dpd_2m 1227909 non-null float64
72 num_tl_30dpd 1277783 non-null float64
73 num_tl_90g_dpd_24m 1277783 non-null float64
74 num_tl_op_past_12m 1277783 non-null float64
75 pct_tl_nvr_dlq 1277629 non-null float64
76 percent_bc_gt_75 1283755 non-null float64
77 pub_rec_bankruptcies 1346694 non-null float64
78 tax_liens 1347954 non-null float64
79 tot_hi_cred_lim 1277783 non-null float64
80 total_bal_ex_mort 1298029 non-null float64
81 total_bc_limit 1298029 non-null float64
82 total_il_high_credit_limit 1277783 non-null float64
83 revol_bal_joint 18629 non-null float64
84 sec_app_fico_range_low 18630 non-null float64
85 sec_app_fico_range_high 18630 non-null float64
86 sec_app_earliest_cr_line 18630 non-null object
87 sec_app_inq_last_6mths 18630 non-null float64
88 sec_app_mort_acc 18630 non-null float64
89 sec_app_open_acc 18630 non-null float64
90 sec_app_revol_util 18302 non-null float64
91 sec_app_open_act_il 18630 non-null float64
92 sec_app_num_rev_accts 18630 non-null float64
93 sec_app_chargeoff_within_12_mths 18630 non-null float64
94 sec_app_collections_12_mths_ex_med 18630 non-null float64
95 sec_app_inv_mths_since_last_major_derog 1348059 non-null float64
96 fraction_recovered 1348059 non-null float64
dtypes: datetime64[ns](1), float64(86), object(10)
memory usage: 1007.9+ MB

Now the only remaining steps should be removing rows with null values (in columns that aren’t new metrics) and encoding categorical features.

I’m removing rows with null values in those columns because that should still leave the vast majority of rows intact, over 1 million, which is still plenty of data. But I guess I should make sure before I overwrite loans.

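A quick, non-destructive check of the surviving row count; the list of columns allowed to stay null is reconstructed here and abbreviated:

```python
# Columns that may stay null: the 14 newer metrics plus the
# joint/secondary-applicant fields (list abbreviated)
new_metric_cols = [
    "open_acc_6m", "open_act_il", "open_il_12m", "open_il_24m", "total_bal_il",
    "il_util", "open_rv_12m", "open_rv_24m", "max_bal_bc", "all_util",
    "inq_fi", "total_cu_tl", "inq_last_12m",
]
sec_app_cols = [
    col for col in loans.columns
    if col.startswith("sec_app_") or col == "revol_bal_joint"
]
required = loans.columns.difference(new_metric_cols + sec_app_cols)
loans.dropna(subset=required).shape
```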

(1110171, 97)

Yes, still 1,110,171. That’ll do.

Actually, I’ll tackle earliest_cr_line and its joint counterpart first, before looking at the categorical features.

1110171 rows × 2 columns

I should convert that to the age of the credit line at the time of application (or, more precisely, at the time the loan was issued).

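A sketch of that conversion (date format assumed, as before):

```python
import pandas as pd

# Whole-month difference between issue_d (already datetime) and earliest_cr_line
earliest_cr_line = pd.to_datetime(loans["earliest_cr_line"], format="%b-%Y")
loans["cr_hist_age_mths"] = (
    (loans["issue_d"].dt.year - earliest_cr_line.dt.year) * 12
    + (loans["issue_d"].dt.month - earliest_cr_line.dt.month)
)
loans["cr_hist_age_mths"]
```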

0          148
1 192
2 184
4 210
5 338
...
2260688 147
2260690 175
2260691 64
2260692 230
2260697 207
Length: 1110171, dtype: int64

Now a look at those categorical features.

term
36 months 831601
60 months 278570
Name: term, dtype: int64

emp_length
1 year 76868
10+ years 392883
2 years 106124
3 years 93784
4 years 69031
5 years 72421
6 years 54240
7 years 52229
8 years 53826
9 years 45210
< 1 year 93555
Name: emp_length, dtype: int64

home_ownership
ANY 250
MORTGAGE 559035
NONE 39
OTHER 40
OWN 114577
RENT 436230
Name: home_ownership, dtype: int64

verification_status
Not Verified 335350
Source Verified 463153
Verified 311668
Name: verification_status, dtype: int64

purpose
car 10754
credit_card 245942
debt_consolidation 653222
educational 1
home_improvement 71089
house 5720
major_purchase 22901
medical 12302
moving 7464
other 60986
renewable_energy 691
small_business 11137
vacation 7169
wedding 793
Name: purpose, dtype: int64

verification_status_joint
Not Verified 341073
Source Verified 461941
Verified 307157
Name: verification_status_joint, dtype: int64

First, in researching income verification, I learned that LendingClub only tries to verify income on a subset of loan applications based on the content of the application, so this feature is a source of target leakage. I’ll remove the two offending columns (and a couple more I don’t need anymore).

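A sketch of that cleanup; exactly which "couple more" columns get dropped is a guess (loan_status and issue_d have served their purposes by this point):

```python
loans = loans.drop(
    columns=[
        "verification_status",        # target leakage
        "verification_status_joint",  # target leakage
        "loan_status",                # already encoded into fraction_recovered
        "issue_d",                    # only needed for the date comparisons
    ]
)
```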

Once I create my pipeline, I’ll binary encode term, one-hot encode home_ownership and purpose, and since emp_length is an ordinal variable, I’ll convert it to the integers 0–10.

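When that pipeline comes together, the encoding step could look something like this scikit-learn sketch (the actual pipeline may differ):

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# emp_length is ordinal: "< 1 year" -> 0 up through "10+ years" -> 10
emp_length_order = [
    "< 1 year", "1 year", "2 years", "3 years", "4 years", "5 years",
    "6 years", "7 years", "8 years", "9 years", "10+ years",
]
encoder = ColumnTransformer(
    [
        # Two-valued categoricals become 0/1
        ("binary", OrdinalEncoder(), ["term", "application_type"]),
        ("ordinal", OrdinalEncoder(categories=[emp_length_order]), ["emp_length"]),
        ("onehot", OneHotEncoder(), ["home_ownership", "purpose"]),
    ],
    remainder="passthrough",
)
```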

That should cover all the cleaning necessary for the first model’s data. I’ll save the columns that’ll be used in the first model to a new DataFrame, and while I’m at it, I’ll start formatting the DataFrames for the two additional models that add the two sets of new metrics.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1110171 entries, 0 to 2260697
Data columns (total 80 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1110171 non-null float64
1 term 1110171 non-null object
2 emp_length 1110171 non-null object
3 home_ownership 1110171 non-null object
4 annual_inc 1110171 non-null float64
5 purpose 1110171 non-null object
6 dti 1110171 non-null float64
7 delinq_2yrs 1110171 non-null float64
8 cr_hist_age_mths 1110171 non-null int64
9 fico_range_low 1110171 non-null float64
10 fico_range_high 1110171 non-null float64
11 inq_last_6mths 1110171 non-null float64
12 inv_mths_since_last_delinq 1110171 non-null float64
13 inv_mths_since_last_record 1110171 non-null float64
14 open_acc 1110171 non-null float64
15 pub_rec 1110171 non-null float64
16 revol_bal 1110171 non-null float64
17 revol_util 1110171 non-null float64
18 total_acc 1110171 non-null float64
19 collections_12_mths_ex_med 1110171 non-null float64
20 inv_mths_since_last_major_derog 1110171 non-null float64
21 application_type 1110171 non-null object
22 annual_inc_joint 1110171 non-null float64
23 dti_joint 1110171 non-null float64
24 acc_now_delinq 1110171 non-null float64
25 tot_coll_amt 1110171 non-null float64
26 tot_cur_bal 1110171 non-null float64
27 open_acc_6m 459541 non-null float64
28 open_act_il 459541 non-null float64
29 open_il_12m 459541 non-null float64
30 open_il_24m 459541 non-null float64
31 inv_mths_since_rcnt_il 1110171 non-null float64
32 total_bal_il 459541 non-null float64
33 il_util 408722 non-null float64
34 open_rv_12m 459541 non-null float64
35 open_rv_24m 459541 non-null float64
36 max_bal_bc 459541 non-null float64
37 all_util 459541 non-null float64
38 total_rev_hi_lim 1110171 non-null float64
39 inq_fi 459541 non-null float64
40 total_cu_tl 459541 non-null float64
41 inq_last_12m 459541 non-null float64
42 acc_open_past_24mths 1110171 non-null float64
43 avg_cur_bal 1110171 non-null float64
44 bc_open_to_buy 1110171 non-null float64
45 bc_util 1110171 non-null float64
46 chargeoff_within_12_mths 1110171 non-null float64
47 delinq_amnt 1110171 non-null float64
48 mo_sin_old_il_acct 1110171 non-null float64
49 mo_sin_old_rev_tl_op 1110171 non-null float64
50 inv_mo_sin_rcnt_rev_tl_op 1110171 non-null float64
51 inv_mo_sin_rcnt_tl 1110171 non-null float64
52 mort_acc 1110171 non-null float64
53 inv_mths_since_recent_bc 1110171 non-null float64
54 inv_mths_since_recent_bc_dlq 1110171 non-null float64
55 inv_mths_since_recent_inq 1110171 non-null float64
56 inv_mths_since_recent_revol_delinq 1110171 non-null float64
57 num_accts_ever_120_pd 1110171 non-null float64
58 num_actv_bc_tl 1110171 non-null float64
59 num_actv_rev_tl 1110171 non-null float64
60 num_bc_sats 1110171 non-null float64
61 num_bc_tl 1110171 non-null float64
62 num_il_tl 1110171 non-null float64
63 num_op_rev_tl 1110171 non-null float64
64 num_rev_accts 1110171 non-null float64
65 num_rev_tl_bal_gt_0 1110171 non-null float64
66 num_sats 1110171 non-null float64
67 num_tl_120dpd_2m 1110171 non-null float64
68 num_tl_30dpd 1110171 non-null float64
69 num_tl_90g_dpd_24m 1110171 non-null float64
70 num_tl_op_past_12m 1110171 non-null float64
71 pct_tl_nvr_dlq 1110171 non-null float64
72 percent_bc_gt_75 1110171 non-null float64
73 pub_rec_bankruptcies 1110171 non-null float64
74 tax_liens 1110171 non-null float64
75 tot_hi_cred_lim 1110171 non-null float64
76 total_bal_ex_mort 1110171 non-null float64
77 total_bc_limit 1110171 non-null float64
78 total_il_high_credit_limit 1110171 non-null float64
79 fraction_recovered 1110171 non-null float64
dtypes: float64(74), int64(1), object(5)
memory usage: 686.1+ MB

Before I drop a bunch of rows with nulls from loans_2, I’m concerned about il_util, as it’s missing values in about 50,000 more rows than the rest of the new metric columns. Why would that be?

count    408722.000000
mean 71.832894
std 22.311439
min 0.000000
25% 59.000000
50% 75.000000
75% 87.000000
max 464.000000
Name: il_util, dtype: float64

Peeking back up to the data dictionary, il_util is the “ratio of total current balance to high credit/credit limit on all install acct”. The relevant balance (total_bal_il) and credit limit (total_il_high_credit_limit) metrics appear to already be in the data, so perhaps this utilization metric doesn’t contain any new information. I’ll compare il_util (where it’s present) to the ratio of the other two variables.

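A sketch of that comparison; treating il_util as a whole-number percentage (hence the ×100 and rounding) is an assumption based on its summary statistics above:

```python
# Where il_util is present, compute the balance-to-limit ratio ourselves
has_util = loans_2["il_util"].notna()
computed = (
    loans_2.loc[has_util, "total_bal_il"]
    / loans_2.loc[has_util, "total_il_high_credit_limit"]
    * 100
).round()

# How often does the reported value match, and by how much is it off otherwise?
matches = loans_2.loc[has_util, "il_util"] == computed
print(matches.describe())
compute_diff = (loans_2.loc[has_util, "il_util"] - computed).abs()
print(compute_diff[compute_diff > 0].describe())
```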

408722 rows × 2 columns
count     408722
unique 2
top True
freq 307589
dtype: object
count    101133.000000
mean 14.638684
std 16.409913
min 1.000000
25% 3.000000
50% 10.000000
75% 21.000000
max 1108.000000
Name: compute_diff, dtype: float64

That’s weird. il_util equals the computed ratio about three-quarters of the time, but when it differs, the median difference is 10 points. Perhaps there is sometimes new information there after all. Maybe whichever credit bureau reports the utilization rate uses a different formula than a simple ratio? Again, this is something I could ask if I were performing this analysis for a client, but that’s not the case. I’ll assume that this variable is still valuable, and where il_util is null I’ll impute the value to make it equal to the ratio of total_bal_il to total_il_high_credit_limit (or 0 if the limit is 0). And I’ll add one more boolean field to mark the imputed entries.

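A sketch of the imputation and flagging step (rounding again assumed):

```python
# Flag the rows being imputed, then fill them with the computed ratio
null_util = loans_2["il_util"].isna()
loans_2["il_util_imputed"] = null_util

limit = loans_2["total_il_high_credit_limit"]
ratio = (loans_2["total_bal_il"] / limit * 100).round()
ratio[limit == 0] = 0  # zero limit means zero utilization, per the text above
loans_2.loc[null_util, "il_util"] = ratio[null_util]
```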

Also, that 1,108 is a doozy of an outlier, but I think I’ll just leave it be, as it appears that outliers aren’t too big a deal if the neural network architecture is sufficiently deep.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1110171 entries, 0 to 2260697
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 1110171 non-null float64
1 term 1110171 non-null object
2 emp_length 1110171 non-null object
3 home_ownership 1110171 non-null object
4 annual_inc 1110171 non-null float64
5 purpose 1110171 non-null object
6 dti 1110171 non-null float64
7 delinq_2yrs 1110171 non-null float64
8 cr_hist_age_mths 1110171 non-null int64
9 fico_range_low 1110171 non-null float64
10 fico_range_high 1110171 non-null float64
11 inq_last_6mths 1110171 non-null float64
12 inv_mths_since_last_delinq 1110171 non-null float64
13 inv_mths_since_last_record 1110171 non-null float64
14 open_acc 1110171 non-null float64
15 pub_rec 1110171 non-null float64
16 revol_bal 1110171 non-null float64
17 revol_util 1110171 non-null float64
18 total_acc 1110171 non-null float64
19 collections_12_mths_ex_med 1110171 non-null float64
20 inv_mths_since_last_major_derog 1110171 non-null float64
21 application_type 1110171 non-null object
22 annual_inc_joint 1110171 non-null float64
23 dti_joint 1110171 non-null float64
24 acc_now_delinq 1110171 non-null float64
25 tot_coll_amt 1110171 non-null float64
26 tot_cur_bal 1110171 non-null float64
27 open_acc_6m 459541 non-null float64
28 open_act_il 459541 non-null float64
29 open_il_12m 459541 non-null float64
30 open_il_24m 459541 non-null float64
31 inv_mths_since_rcnt_il 1110171 non-null float64
32 total_bal_il 459541 non-null float64
33 il_util 459541 non-null float64
34 open_rv_12m 459541 non-null float64
35 open_rv_24m 459541 non-null float64
36 max_bal_bc 459541 non-null float64
37 all_util 459541 non-null float64
38 total_rev_hi_lim 1110171 non-null float64
39 inq_fi 459541 non-null float64
40 total_cu_tl 459541 non-null float64
41 inq_last_12m 459541 non-null float64
42 acc_open_past_24mths 1110171 non-null float64
43 avg_cur_bal 1110171 non-null float64
44 bc_open_to_buy 1110171 non-null float64
45 bc_util 1110171 non-null float64
46 chargeoff_within_12_mths 1110171 non-null float64
47 delinq_amnt 1110171 non-null float64
48 mo_sin_old_il_acct 1110171 non-null float64
49 mo_sin_old_rev_tl_op 1110171 non-null float64
50 inv_mo_sin_rcnt_rev_tl_op 1110171 non-null float64
51 inv_mo_sin_rcnt_tl 1110171 non-null float64
52 mort_acc 1110171 non-null float64
53 inv_mths_since_recent_bc 1110171 non-null float64
54 inv_mths_since_recent_bc_dlq 1110171 non-null float64
55 inv_mths_since_recent_inq 1110171 non-null float64
56 inv_mths_since_recent_revol_delinq 1110171 non-null float64
57 num_accts_ever_120_pd 1110171 non-null float64
58 num_actv_bc_tl 1110171 non-null float64
59 num_actv_rev_tl 1110171 non-null float64
60 num_bc_sats 1110171 non-null float64
61 num_bc_tl 1110171 non-null float64
62 num_il_tl 1110171 non-null float64
63 num_op_rev_tl 1110171 non-null float64
64 num_rev_accts 1110171 non-null float64
65 num_rev_tl_bal_gt_0 1110171 non-null float64
66 num_sats 1110171 non-null float64
67 num_tl_120dpd_2m 1110171 non-null float64
68 num_tl_30dpd 1110171 non-null float64
69 num_tl_90g_dpd_24m 1110171 non-null float64
70 num_tl_op_past_12m 1110171 non-null float64
71 pct_tl_nvr_dlq 1110171 non-null float64
72 percent_bc_gt_75 1110171 non-null float64
73 pub_rec_bankruptcies 1110171 non-null float64
74 tax_liens 1110171 non-null float64
75 tot_hi_cred_lim 1110171 non-null float64
76 total_bal_ex_mort 1110171 non-null float64
77 total_bc_limit 1110171 non-null float64
78 total_il_high_credit_limit 1110171 non-null float64
79 fraction_recovered 1110171 non-null float64
80 il_util_imputed 1110171 non-null bool
dtypes: bool(1), float64(74), int64(1), object(5)
memory usage: 687.1+ MB

Good. Ready to drop rows with nulls in loans_2 and move on to the DataFrame for the model that adds the new metrics for joint applications.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14453 entries, 421222 to 2157147
Data columns (total 94 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 14453 non-null float64
1 term 14453 non-null object
2 emp_length 14453 non-null object
3 home_ownership 14453 non-null object
4 annual_inc 14453 non-null float64
5 purpose 14453 non-null object
6 dti 14453 non-null float64
7 delinq_2yrs 14453 non-null float64
8 cr_hist_age_mths 14453 non-null int64
9 fico_range_low 14453 non-null float64
10 fico_range_high 14453 non-null float64
11 inq_last_6mths 14453 non-null float64
12 inv_mths_since_last_delinq 14453 non-null float64
13 inv_mths_since_last_record 14453 non-null float64
14 open_acc 14453 non-null float64
15 pub_rec 14453 non-null float64
16 revol_bal 14453 non-null float64
17 revol_util 14453 non-null float64
18 total_acc 14453 non-null float64
19 collections_12_mths_ex_med 14453 non-null float64
20 inv_mths_since_last_major_derog 14453 non-null float64
21 application_type 14453 non-null object
22 annual_inc_joint 14453 non-null float64
23 dti_joint 14453 non-null float64
24 acc_now_delinq 14453 non-null float64
25 tot_coll_amt 14453 non-null float64
26 tot_cur_bal 14453 non-null float64
27 open_acc_6m 14453 non-null float64
28 open_act_il 14453 non-null float64
29 open_il_12m 14453 non-null float64
30 open_il_24m 14453 non-null float64
31 inv_mths_since_rcnt_il 14453 non-null float64
32 total_bal_il 14453 non-null float64
33 il_util 14453 non-null float64
34 open_rv_12m 14453 non-null float64
35 open_rv_24m 14453 non-null float64
36 max_bal_bc 14453 non-null float64
37 all_util 14453 non-null float64
38 total_rev_hi_lim 14453 non-null float64
39 inq_fi 14453 non-null float64
40 total_cu_tl 14453 non-null float64
41 inq_last_12m 14453 non-null float64
42 acc_open_past_24mths 14453 non-null float64
43 avg_cur_bal 14453 non-null float64
44 bc_open_to_buy 14453 non-null float64
45 bc_util 14453 non-null float64
46 chargeoff_within_12_mths 14453 non-null float64
47 delinq_amnt 14453 non-null float64
48 mo_sin_old_il_acct 14453 non-null float64
49 mo_sin_old_rev_tl_op 14453 non-null float64
50 inv_mo_sin_rcnt_rev_tl_op 14453 non-null float64
51 inv_mo_sin_rcnt_tl 14453 non-null float64
52 mort_acc 14453 non-null float64
53 inv_mths_since_recent_bc 14453 non-null float64
54 inv_mths_since_recent_bc_dlq 14453 non-null float64
55 inv_mths_since_recent_inq 14453 non-null float64
56 inv_mths_since_recent_revol_delinq 14453 non-null float64
57 num_accts_ever_120_pd 14453 non-null float64
58 num_actv_bc_tl 14453 non-null float64
59 num_actv_rev_tl 14453 non-null float64
60 num_bc_sats 14453 non-null float64
61 num_bc_tl 14453 non-null float64
62 num_il_tl 14453 non-null float64
63 num_op_rev_tl 14453 non-null float64
64 num_rev_accts 14453 non-null float64
65 num_rev_tl_bal_gt_0 14453 non-null float64
66 num_sats 14453 non-null float64
67 num_tl_120dpd_2m 14453 non-null float64
68 num_tl_30dpd 14453 non-null float64
69 num_tl_90g_dpd_24m 14453 non-null float64
70 num_tl_op_past_12m 14453 non-null float64
71 pct_tl_nvr_dlq 14453 non-null float64
72 percent_bc_gt_75 14453 non-null float64
73 pub_rec_bankruptcies 14453 non-null float64
74 tax_liens 14453 non-null float64
75 tot_hi_cred_lim 14453 non-null float64
76 total_bal_ex_mort 14453 non-null float64
77 total_bc_limit 14453 non-null float64
78 total_il_high_credit_limit 14453 non-null float64
79 revol_bal_joint 14453 non-null float64
80 sec_app_fico_range_low 14453 non-null float64
81 sec_app_fico_range_high 14453 non-null float64
82 sec_app_cr_hist_age_mths 14453 non-null Int64
83 sec_app_inq_last_6mths 14453 non-null float64
84 sec_app_mort_acc 14453 non-null float64
85 sec_app_open_acc 14453 non-null float64
86 sec_app_revol_util 14453 non-null float64
87 sec_app_open_act_il 14453 non-null float64
88 sec_app_num_rev_accts 14453 non-null float64
89 sec_app_chargeoff_within_12_mths 14453 non-null float64
90 sec_app_collections_12_mths_ex_med 14453 non-null float64
91 sec_app_inv_mths_since_last_major_derog 14453 non-null float64
92 fraction_recovered 14453 non-null float64
93 il_util_imputed 14453 non-null bool
dtypes: Int64(1), bool(1), float64(86), int64(1), object(5)
memory usage: 10.4+ MB

Phew, the data’s all clean now! Time for the fun part.

Building the neural networks

After a good deal of trial and error, I found that a network architecture with three hidden layers, each followed by a dropout layer with a rate of 0.3, performed as well as anything else I tried. I used ReLU activation in those hidden layers, Adam optimization, and mean squared error as the loss for the model as a whole. I tried mean absolute error at first, but the resulting model would essentially always guess either 1 or 0 for the output, and the majority of the dataset’s outputs are 1. Larger errors needed to be penalized to a greater degree, which is exactly what mean squared error does.

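Here is a sketch of that architecture in Keras. The hidden-layer widths and the sigmoid output (to keep predictions in [0, 1]) are assumptions; the three hidden layers with 0.3 dropout, ReLU, Adam, and MSE come from the description above.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

def build_model(n_inputs):
    # Three ReLU hidden layers, each followed by Dropout(0.3)
    model = Sequential([
        Input(shape=(n_inputs,)),
        Dense(64, activation="relu"),
        Dropout(0.3),
        Dense(32, activation="relu"),
        Dropout(0.3),
        Dense(16, activation="relu"),
        Dropout(0.3),
        Dense(1, activation="sigmoid"),  # fraction_recovered lives in [0, 1]
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model
```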

The dataset being so large, I got good results from increasing the batch size for the first couple of models.

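A training sketch using the builder above; the X_train/y_train/X_valid/y_valid names are placeholders, and batch_size=128 is inferred from the 6,939 steps per epoch in the logs below rather than stated outright.

```python
model = build_model(n_inputs=X_train.shape[1])
history = model.fit(
    X_train,
    y_train,
    batch_size=128,
    epochs=100,
    validation_data=(X_valid, y_valid),
)
```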

Model 1:
Epoch 1/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0738 - val_loss: 0.0601
Epoch 2/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0600 - val_loss: 0.0597
Epoch 3/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0595 - val_loss: 0.0592
Epoch 4/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0594 - val_loss: 0.0589
Epoch 5/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0593 - val_loss: 0.0597
Epoch 6/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0593 - val_loss: 0.0591
Epoch 7/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0592 - val_loss: 0.0591
Epoch 8/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0591 - val_loss: 0.0597
Epoch 9/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0591 - val_loss: 0.0588
Epoch 10/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0591 - val_loss: 0.0589
Epoch 11/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0591 - val_loss: 0.0585
Epoch 12/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0590 - val_loss: 0.0586
Epoch 13/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0587
Epoch 14/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0591
Epoch 15/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0588
Epoch 16/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0589
Epoch 17/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0584
Epoch 18/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0591
Epoch 19/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0586
Epoch 20/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 21/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0590 - val_loss: 0.0585
Epoch 22/100
6939/6939 [==============================] - 16s 2ms/step - loss: 0.0589 - val_loss: 0.0583
Epoch 23/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 24/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 25/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 26/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 27/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 28/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0582
Epoch 29/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0590
Epoch 30/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 31/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 32/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0584
Epoch 33/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 34/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 35/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 36/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 37/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 38/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 39/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0586
Epoch 40/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 41/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0589
Epoch 42/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 43/100
6939/6939 [==============================] - 16s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 44/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 45/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0593
Epoch 46/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 47/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0585
Epoch 48/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 49/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0586
Epoch 50/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0587
Epoch 51/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 52/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 53/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0586
Epoch 54/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0593
Epoch 55/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 56/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 57/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 58/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0588
Epoch 59/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 60/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0589 - val_loss: 0.0584
Epoch 61/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 62/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 63/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 64/100
6939/6939 [==============================] - 16s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 65/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 66/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 67/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0596
Epoch 68/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 69/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 70/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 71/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 72/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 73/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 74/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 75/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 76/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 77/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0591
Epoch 78/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 79/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 80/100
6939/6939 [==============================] - 15s 2ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 81/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 82/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 83/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0592
Epoch 84/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 85/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0592
Epoch 86/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 87/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 88/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0594
Epoch 89/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 90/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0586
Epoch 91/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 92/100
6939/6939 [==============================] - 17s 3ms/step - loss: 0.0588 - val_loss: 0.0590
Epoch 93/100
6939/6939 [==============================] - 17s 2ms/step - loss: 0.0588 - val_loss: 0.0585
Epoch 94/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0594
Epoch 95/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0587
Epoch 96/100
6939/6939 [==============================] - 19s 3ms/step - loss: 0.0588 - val_loss: 0.0593
Epoch 97/100
6939/6939 [==============================] - 21s 3ms/step - loss: 0.0588 - val_loss: 0.0584
Epoch 98/100
6939/6939 [==============================] - 20s 3ms/step - loss: 0.0588 - val_loss: 0.0589
Epoch 99/100
6939/6939 [==============================] - 19s 3ms/step - loss: 0.0588 - val_loss: 0.0588
Epoch 100/100
6939/6939 [==============================] - 18s 3ms/step - loss: 0.0588 - val_loss: 0.0590
Model 2:
Epoch 1/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.1028 - val_loss: 0.0762
Epoch 2/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0757 - val_loss: 0.0740
Epoch 3/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0748 - val_loss: 0.0730
Epoch 4/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0743 - val_loss: 0.0734
Epoch 5/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0741 - val_loss: 0.0733
Epoch 6/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0740 - val_loss: 0.0730
Epoch 7/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0739 - val_loss: 0.0729
Epoch 8/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0738 - val_loss: 0.0732
Epoch 9/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0737 - val_loss: 0.0727
Epoch 10/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0736 - val_loss: 0.0733
Epoch 11/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0736 - val_loss: 0.0725
Epoch 12/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0736 - val_loss: 0.0726
Epoch 13/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0725
Epoch 14/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0735 - val_loss: 0.0726
Epoch 15/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0732
Epoch 16/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0726
Epoch 17/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0734 - val_loss: 0.0726
Epoch 18/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.0734 - val_loss: 0.0726
Epoch 19/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0733 - val_loss: 0.0732
Epoch 20/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0733 - val_loss: 0.0730
Epoch 21/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0733 - val_loss: 0.0725
Epoch 22/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.0732 - val_loss: 0.0726
Epoch 23/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0731 - val_loss: 0.0726
Epoch 24/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0732 - val_loss: 0.0725
Epoch 25/100
5745/5745 [==============================] - 14s 2ms/step - loss: 0.0731 - val_loss: 0.0727
Epoch 26/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0732 - val_loss: 0.0730
Epoch 27/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0732 - val_loss: 0.0725
Epoch 28/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0731 - val_loss: 0.0724
Epoch 29/100
5745/5745 [==============================] - 13s 2ms/step - loss: 0.0731 - val_loss: 0.0731
Epoch 30/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0731 - val_loss: 0.0725
Epoch 31/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0731 - val_loss: 0.0727
Epoch 32/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 33/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 34/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0727
Epoch 35/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0729
Epoch 36/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0723
Epoch 37/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 38/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0729
Epoch 39/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 40/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0723
Epoch 41/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0722
Epoch 42/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0729 - val_loss: 0.0723
Epoch 43/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 44/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0731 - val_loss: 0.0725
Epoch 45/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 46/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0730
Epoch 47/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0727
Epoch 48/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0725
Epoch 49/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0727
Epoch 50/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 51/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 52/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 53/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0729 - val_loss: 0.0730
Epoch 54/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 55/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0724
Epoch 56/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 57/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 58/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 59/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 60/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 61/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0728
Epoch 62/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0726
Epoch 63/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0725
Epoch 64/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0724
Epoch 65/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0724
Epoch 66/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0730
Epoch 67/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 68/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0724
Epoch 69/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0727
Epoch 70/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0734
Epoch 71/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0729
Epoch 72/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0727
Epoch 73/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 74/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 75/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 76/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0726
Epoch 77/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 78/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0726
Epoch 79/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 80/100
5745/5745 [==============================] - 12s 2ms/step - loss: 0.0728 - val_loss: 0.0725
Epoch 81/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0728
Epoch 82/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0726
Epoch 83/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0727
Epoch 84/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0729
Epoch 85/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0728
Epoch 86/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0727
Epoch 87/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0730
Epoch 88/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0726 - val_loss: 0.0727
Epoch 89/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0726
Epoch 90/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0726
Epoch 91/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0730 - val_loss: 0.0728
Epoch 92/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0728
Epoch 93/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0729
Epoch 94/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0729
Epoch 95/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0727
Epoch 96/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0729 - val_loss: 0.0728
Epoch 97/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 98/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0727 - val_loss: 0.0732
Epoch 99/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0727
Epoch 100/100
5745/5745 [==============================] - 11s 2ms/step - loss: 0.0728 - val_loss: 0.0729
Model 3:
Epoch 1/100
362/362 [==============================] - 1s 2ms/step - loss: 0.3603 - val_loss: 0.2006
Epoch 2/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1843 - val_loss: 0.1489
Epoch 3/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1386 - val_loss: 0.1311
Epoch 4/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1239 - val_loss: 0.1226
Epoch 5/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1173 - val_loss: 0.1181
Epoch 6/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1144 - val_loss: 0.1170
Epoch 7/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1132 - val_loss: 0.1163
Epoch 8/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1112 - val_loss: 0.1164
Epoch 9/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1105 - val_loss: 0.1139
Epoch 10/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1088 - val_loss: 0.1120
Epoch 11/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1087 - val_loss: 0.1118
Epoch 12/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1070 - val_loss: 0.1114
Epoch 13/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1059 - val_loss: 0.1116
Epoch 14/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1043 - val_loss: 0.1111
Epoch 15/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1036 - val_loss: 0.1103
Epoch 16/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1030 - val_loss: 0.1102
Epoch 17/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1024 - val_loss: 0.1098
Epoch 18/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1018 - val_loss: 0.1095
Epoch 19/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1014 - val_loss: 0.1086
Epoch 20/100
362/362 [==============================] - 1s 2ms/step - loss: 0.1005 - val_loss: 0.1086
Epoch 21/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0997 - val_loss: 0.1095
Epoch 22/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0993 - val_loss: 0.1092
Epoch 23/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0986 - val_loss: 0.1090
Epoch 24/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0983 - val_loss: 0.1096
Epoch 25/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0975 - val_loss: 0.1099
Epoch 26/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0964 - val_loss: 0.1092
Epoch 27/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0968 - val_loss: 0.1092
Epoch 28/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0960 - val_loss: 0.1093
Epoch 29/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0954 - val_loss: 0.1100
Epoch 30/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0952 - val_loss: 0.1096
Epoch 31/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0946 - val_loss: 0.1105
Epoch 32/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0942 - val_loss: 0.1109
Epoch 33/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0930 - val_loss: 0.1103
Epoch 34/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0917 - val_loss: 0.1103
Epoch 35/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0908 - val_loss: 0.1112
Epoch 36/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0922 - val_loss: 0.1107
Epoch 37/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0918 - val_loss: 0.1117
Epoch 38/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0910 - val_loss: 0.1111
Epoch 39/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0910 - val_loss: 0.1118
Epoch 40/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0897 - val_loss: 0.1126
Epoch 41/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0884 - val_loss: 0.1128
Epoch 42/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0890 - val_loss: 0.1121
Epoch 43/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0893 - val_loss: 0.1118
Epoch 44/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0877 - val_loss: 0.1122
Epoch 45/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0874 - val_loss: 0.1121
Epoch 46/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0864 - val_loss: 0.1119
Epoch 47/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0873 - val_loss: 0.1128
Epoch 48/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0858 - val_loss: 0.1126
Epoch 49/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0872 - val_loss: 0.1128
Epoch 50/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0852 - val_loss: 0.1133
Epoch 51/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0857 - val_loss: 0.1137
Epoch 52/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0848 - val_loss: 0.1142
Epoch 53/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0842 - val_loss: 0.1134
Epoch 54/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0839 - val_loss: 0.1120
Epoch 55/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0820 - val_loss: 0.1153
Epoch 56/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0831 - val_loss: 0.1139
Epoch 57/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0821 - val_loss: 0.1151
Epoch 58/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0829 - val_loss: 0.1147
Epoch 59/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0821 - val_loss: 0.1133
Epoch 60/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0820 - val_loss: 0.1148
Epoch 61/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0809 - val_loss: 0.1162
Epoch 62/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0808 - val_loss: 0.1151
Epoch 63/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0795 - val_loss: 0.1149
Epoch 64/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0802 - val_loss: 0.1159
Epoch 65/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0797 - val_loss: 0.1153
Epoch 66/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0791 - val_loss: 0.1158
Epoch 67/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0789 - val_loss: 0.1172
Epoch 68/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0804 - val_loss: 0.1152
Epoch 69/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0790 - val_loss: 0.1165
Epoch 70/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0788 - val_loss: 0.1167
Epoch 71/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0781 - val_loss: 0.1174
Epoch 72/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0772 - val_loss: 0.1186
Epoch 73/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0785 - val_loss: 0.1163
Epoch 74/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0778 - val_loss: 0.1163
Epoch 75/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0767 - val_loss: 0.1189
Epoch 76/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0774 - val_loss: 0.1189
Epoch 77/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0769 - val_loss: 0.1177
Epoch 78/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0759 - val_loss: 0.1187
Epoch 79/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0755 - val_loss: 0.1203
Epoch 80/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0761 - val_loss: 0.1188
Epoch 81/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0743 - val_loss: 0.1203
Epoch 82/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0753 - val_loss: 0.1177
Epoch 83/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0760 - val_loss: 0.1199
Epoch 84/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0746 - val_loss: 0.1191
Epoch 85/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0756 - val_loss: 0.1193
Epoch 86/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0743 - val_loss: 0.1206
Epoch 87/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0732 - val_loss: 0.1209
Epoch 88/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0746 - val_loss: 0.1213
Epoch 89/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0725 - val_loss: 0.1223
Epoch 90/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0738 - val_loss: 0.1196
Epoch 91/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0725 - val_loss: 0.1241
Epoch 92/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0744 - val_loss: 0.1226
Epoch 93/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0727 - val_loss: 0.1213
Epoch 94/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0718 - val_loss: 0.1218
Epoch 95/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0746 - val_loss: 0.1217
Epoch 96/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0733 - val_loss: 0.1227
Epoch 97/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0698 - val_loss: 0.1250
Epoch 98/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0731 - val_loss: 0.1225
Epoch 99/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0728 - val_loss: 0.1226
Epoch 100/100
362/362 [==============================] - 1s 2ms/step - loss: 0.0718 - val_loss: 0.1231

The first model performed best, settling around a mean squared error of 0.0588. (Even after setting random_state inside train_test_split and seed inside the dropout layers, a bit of entropy remains in training, so if you run this notebook yourself, the course of your training may look slightly different.) Apparently the additional records in the first dataset did more to aid training than the additional metrics in the subsequent sets did. And the dropout layers didn't stop the third model from overfitting anyway.

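If you want to pin down more of that remaining entropy, one common approach is to seed every random-number source up front. This is only a sketch; even with all of these set, GPU kernels can still introduce some nondeterminism:

import os
import random

import numpy as np
import tensorflow as tf

# Seed every source of randomness the stack exposes
os.environ["PYTHONHASHSEED"] = "0"
random.seed(0)
np.random.seed(0)
tf.random.set_seed(0)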

[Figure: line plot of training loss and validation loss over the 100 epochs of training.]
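
A plot like this can be generated from the History object that Keras's fit returns; a minimal sketch, assuming the history from one of the fits above was kept in a variable named history:

import matplotlib.pyplot as plt

# Plot training vs. validation loss from a Keras History object
plt.plot(history.history["loss"], label="Training loss")
plt.plot(history.history["val_loss"], label="Validation loss")
plt.xlabel("Epoch")
plt.ylabel("Mean squared error")
plt.legend()
plt.show()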

Saving the final model

First I need to create the final model, training model_1’s architecture on the full dataset. Then I’ll save the model to disk with its save function and save the data transformer using joblib so I can use it in the API.

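Concretely, that step might look something like the sketch below. The names build_model_1, X, y, and transformer are stand-ins for the architecture helper and the transformed full dataset from earlier in the notebook:

import joblib

# Rebuild model 1's architecture and train it on the full dataset
final_model = build_model_1(X.shape[1])  # hypothetical helper from earlier
final_model.fit(X, y, epochs=100)

# Persist the trained network and the fitted data transformer for the API
final_model.save("loan_risk_model")
joblib.dump(transformer, "data_transformer.joblib")

(The ['data_transformer.joblib'] at the end of the training output below is joblib.dump's return value: the list of file names it wrote.)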

Epoch 1/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0750
Epoch 2/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0597
Epoch 3/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0593
Epoch 4/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0592
Epoch 5/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0591
Epoch 6/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0590
Epoch 7/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0590
Epoch 8/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0590
Epoch 9/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 10/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 11/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0589
Epoch 12/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 13/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 14/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 15/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0589
Epoch 16/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 17/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 18/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 19/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0588
Epoch 20/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 21/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 22/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 23/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 24/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 25/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 26/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 27/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 28/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 29/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 30/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 31/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 32/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 33/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 34/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 35/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 36/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 37/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 38/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 39/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 40/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 41/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 42/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 43/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 44/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 45/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 46/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 47/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 48/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 49/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 50/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 51/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 52/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 53/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 54/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 55/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 56/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 57/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 58/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 59/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 60/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 61/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0586
Epoch 62/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 63/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0586
Epoch 64/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 65/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 66/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 67/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 68/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 69/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 70/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 71/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 72/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 73/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 74/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 75/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 76/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 77/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 78/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 79/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 80/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 81/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 82/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 83/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 84/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 85/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 86/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 87/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 88/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0588
Epoch 89/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 90/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 91/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 92/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 93/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 94/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 95/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 96/100
8674/8674 [==============================] - 17s 2ms/step - loss: 0.0587
Epoch 97/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 98/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587
Epoch 99/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0586
Epoch 100/100
8674/8674 [==============================] - 16s 2ms/step - loss: 0.0587

['data_transformer.joblib']

Building the API

I first tried building this API and its demonstration front end on Glitch, which officially supports only Node.js back ends, though unofficially you can get a Python server running there (which I've done before using Flask). When I was almost finished, however, I tried importing TensorFlow to load my model and discovered that, unlike Node.js dependencies, Python dependencies get installed to your project's disk space on Glitch, and not even their pro plan provides enough space to hold the entire TensorFlow library. That makes sense, in fairness: I certainly wasn't using the platform as intended.


Then I discovered PythonAnywhere! It comes with plenty of common Python libraries preinstalled, including TensorFlow, so I got everything working perfectly there.


So head on over if you'd like to check it out; the front end includes a form where you can fill in all the parameters for the API request, plus a couple of buttons that populate the form with typical examples from the dataset (since there are a lot of fields). Or you can send a GET request to https://tywmick.pythonanywhere.com/api/predict if you'd rather build the query string with every parameter yourself. In either case, you're more than welcome to take a look at its source on GitHub.

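For example, a request from Python might look like this sketch. The two parameters shown are just an illustrative subset drawn from the data dictionary; the real endpoint expects every model input field in the query string:

import requests

# Query the prediction endpoint with an (incomplete) illustrative
# subset of loan-application parameters
response = requests.get(
    "https://tywmick.pythonanywhere.com/api/predict",
    params={"loan_amnt": 10000, "term": 36},
)
print(response.json())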

One of the best (and worst) things about machine learning is that your models always have room for improvement. I mentioned a few ideas along the way for how I could improve this model in the future, but what's the first thing you would tweak? Leave a response; I'd love to hear!


Translated from: https://towardsdatascience.com/loan-risk-neural-network-30c8f65f052e
