MySQL Shell中跑出霸气的Reports

Filling Test Data Using Python's Data Science Modules

Check which version of Python MySQL Shell is running .

 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import sys
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > sys.version
3.9.6 (default, Aug 25 2021, 16:22:38) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > sys.path.append("/home/maxwell/.local/lib/python3.9/site-packages")

Data filling step-by-step

 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import numpy as np
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import pandas as pd
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import random
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import string
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > from datetime import datetime
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > from datetime import timedelta
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > num_rows=1000
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 

 Names and genders

[maxwell@MaxwellDBA datasets]$ pwd
/home/maxwell/datasets
[maxwell@MaxwellDBA datasets]$ head top-male-and-female-names.csv
Rank,Female Name,Count,Male Name,Count_ 
1,Mary,54276,John,108533 
2,Margaret,49170,William,87239 
3,Elizabeth,36556,James,69987 
4,Sarah,28230,David,62774 
5,Patricia,20689,Robert,56511 
6,Catherine,19713,Michael,51768 
7,Susan,19165,Peter,44758 
8,Helen,18881,Thomas,42467 
9,Emma,18192,George,39195 
[maxwell@MaxwellDBA datasets]$
MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import sys
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > sys.version
3.9.6 (default, Aug 25 2021, 16:22:38) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > sys.path.append("/home/maxwell/.local/lib/python3.9/site-packages")
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import numpy as np
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import pandas as pd
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import random
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import string
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > from datetime import datetime
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > from datetime import timedelta
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > num_rows=1000
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > female_names=pd.read_csv("/home/maxwell/datasets/top-male-and-female-names.csv",usecols=["Female Name"]).rename(columns={'Female Name': 'name'})
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > male_names=pandas.read_csv("/home/maxwell/datasets/top-male-and-female-names.csv",usecols=["Male Name"]).rename(columns={'Male Name': 'name'})
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > male_names=pd.read_csv("/home/maxwell/datasets/top-male-and-female-names.csv",usecols=["Male Name"]).rename(columns={'Male Name': 'name'})
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > female_names['gender']=(['F']*female_names.count()['name'])
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > male_names['gender']=(['M']*male_names.count()['name'])
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > names=pd.concat([female_names, male_names],ignore_index=True)
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import os
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > os.getcwd()
/home/maxwell
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > os.chdir('/home/maxwell/datasets')
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > os.getcwd()
/home/maxwell/datasets
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > names
         name gender
0        Mary      F
1    Margaret      F
2   Elizabeth      F
3       Sarah      F
4    Patricia      F
5   Catherine      F
6       Susan      F
7       Helen      F
8        Emma      F
9        John      M
10    William      M
11      James      M
12      David      M
13     Robert      M
14    Michael      M
15      Peter      M
16     Thomas      M
17     George      M
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > names=names.sample(num_rows, replace=True).set_index(pd.Series(range(num_rows)))
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 

Surnames

MySQL  127.0.0.1:33060+ ssl  cookbook  Py > surnames=pd.read_csv("/home/maxwell/datasets/Names_2010Census.csv",usecols=["name"],skipfooter=1,engine='python').rename(columns={'name': 'surname'})
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > surnames=surnames.sample(num_rows,replace=True).set_index(pd.Series(range(num_rows)))
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 

Diagnoses

 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > diagnoses=pd.read_csv('/home/maxwell/datasets/diagnosis.csv')
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > diagnoses
                               diagnosis  frequency
0                Acute coronary syndrome        2.1
1                Alcoholic liver disease        0.3
2                              Pneumonia        3.6
3  Chronic obstructive pulmonary disease        2.1
4                Gastro-intestinal bleed        0.8
5                          Heart failure        0.8
6                                 Sepsis        0.8
7                Urinary tract infection        2.4
8                            Data Phobia        6.2
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > diagnoses=diagnoses.sample(num_rows,replace=True,weights=diagnoses['frequency']).set_index(pd.Series(range(num_rows)))

Results

 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > results=pandas.DataFrame({"result":["R","N","D"],"frequency":[6,3,1]})
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > results=results.sample(num_rows,replace=True,weights=results['frequency']).set_index(pd.Series(range(num_rows)))
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 

The table

 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > patients=session.get_schema('cookbook').get_table('patients')
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > # we'll start start the loop
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > for i in range(num_rows):

National ID

national_id=str(random.randrange(10,99)) +''.join(random.sample(string.ascii_uppercase, 2)) + str(random.randrange(100000, 999999))

Age

 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > age=random.randrange(15,99)

Dates a patient spent in the hospital

MySQL  127.0.0.1:33060+ ssl  cookbook  Py > date_arrived=datetime.strptime('2020-01-01','%Y-%m-%d')+ timedelta(days=random.randrange(365))
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > date_departed=date_arrived+timedelta(days=random.randrange(60))
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > date_arrived=date_arrived.strftime('%Y-%m-%d')
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > date_departed=date_departed.strftime('%Y-%m-%d')
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 

 Putting it all together

import numpy as np
import pandas as pd
import random
import string
from datetime import datetime
from datetime import timedelta
num_rows=1000

def generate_patients_data(num_rows):
  # read datasets
  # names and genders
  female_names=pd.read_csv("/home/maxwell/datasets/top-male-and-female-names.csv",
  usecols=["Female Name"]).rename(columns={'Female Name':'name'})
  female_names['gender']=(['F']*female_names.count()['name'])
  male_names=pd.read_csv("/home/maxwell/datasets/top-male-and-female-names.csv",
  usecols=["Male Name"]).rename(columns={'Male Name':'name'})
  male_names['gender']=(['M']*male_names.count()['name'])
  names=pd.concat([female_names,male_names],ignore_index=True)
  surnames=pd.read_csv("/home/maxwell/datasets/Names_2010Census.csv",
  usecols=['name'],skipfooter=1,engine='python').rename(columns={'name':'surname'})
  # diagnoses
  diagnoses=pd.read_csv('/home/maxwell/datasets/diagnosis.csv')
  # Possible results
  results=pd.DataFrame({
  "result":["R","N","D"],
  "frequency":[6,3,1]})
  # Start building data
  diagnoses=diagnoses.sample(
  num_rows,replace=True,
  weights=diagnoses['frequency']).set_index(pd.Series(range(num_rows)))
  results=results.sample(num_rows,replace=True,weights=results['frequency']).set_index(pd.Series(range(num_rows)))
  names=names.sample(num_rows,replace=True).set_index(pd.Series(range(num_rows)))
  surnames=surnames.sample(num_rows,replace=True).set_index(pd.Series(range(num_rows)))
  # Get table object
  patients=session.get_schema('cookbook').get_table('patients')
  # Loop, inserting rows
  for i in range(num_rows):
    national_id=str(random.randrange(10,99))+ \
    ''.join(random.sample(string.ascii_uppercase,2))+ \
    str(random.randrange(100000,999999))
    age=random.randrange(15,99)
    date_arrived=datetime.strptime('2020-01-01','%Y-%m-%d')+ \
    timedelta(days=random.randrange(365))
    date_departed=date_arrived+timedelta(days=random.randrange(60))
    date_arrived=date_arrived.strftime('%Y-%m-%d')
    date_departed=date_departed.strftime('%Y-%m-%d')
    name=names['name'][i]
    gender=names['gender'][i]
    surname=surnames['surname'][i]
    result=results['result'][i]
    diagnosis=diagnoses['diagnosis'][i]
    patients.insert(
    'national_id','name','surname',
    'gender','age','diagnosis',
    'result','date_arrived','date_departed'
    ).values(
    national_id,name,surname,
    gender,age,diagnosis,
    result,date_arrived,date_departed
    ).execute()
  
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > def generate_patients_data(num_rows):
                                          ->   # read datasets
                                          ->   # names and genders
                                          ->   female_names=pd.read_csv("/home/maxwell/datasets/top-male-and-female-names.csv",
                                          ->   usecols=["Female Name"]).rename(columns={'Female Name':'name'})
                                          ->   female_names['gender']=(['F']*female_names.count()['name'])
                                          ->   male_names=pd.read_csv("/home/maxwell/datasets/top-male-and-female-names.csv",
                                          ->   usecols=["Male Name"]).rename(columns={'Male Name':'name'})
                                          ->   male_names['gender']=(['M']*male_names.count()['name'])
                                          ->   names=pd.concat([female_names,male_names],ignore_index=True)
                                          ->   surnames=pd.read_csv("/home/maxwell/datasets/Names_2010Census.csv",
                                          ->   usecols=['name'],skipfooter=1,engine='python').rename(columns={'name':'surname'})
                                          ->   # diagnoses
                                          ->   diagnoses=pd.read_csv('/home/maxwell/datasets/diagnosis.csv')
                                          ->   # Possible results
                                          ->   results=pd.DataFrame({
                                          ->   "result":["R","N","D"],
                                          ->   "frequency":[6,3,1]})
                                          ->   # Start building data
                                          ->   diagnoses=diagnoses.sample(
                                          ->   num_rows,replace=True,
                                          ->   weights=diagnoses['frequency']).set_index(pd.Series(range(num_rows)))
                                          ->   results=results.sample(num_rows,replace=True,weights=results['frequency']).set_index(pd.Series(range(num_rows)))
                                          ->   names=names.sample(num_rows,replace=True).set_index(pd.Series(range(num_rows)))
                                          ->   surnames=surnames.sample(num_rows,replace=True).set_index(pd.Series(range(num_rows)))
                                          ->   # Get table object
                                          ->   patients=session.get_schema('cookbook').get_table('patients')
                                          ->   # Loop, inserting rows
                                          ->   for i in range(num_rows):
                                          ->     national_id=str(random.randrange(10,99))+ \
                                          ->     ''.join(random.sample(string.ascii_uppercase,2))+ \
                                          ->     str(random.randrange(100000,999999))
                                          ->     age=random.randrange(15,99)
                                          ->     date_arrived=datetime.strptime('2020-01-01','%Y-%m-%d')+ \
                                          ->     timedelta(days=random.randrange(365))
                                          ->     date_departed=date_arrived+timedelta(days=random.randrange(60))
                                          ->     date_arrived=date_arrived.strftime('%Y-%m-%d')
                                          ->     date_departed=date_departed.strftime('%Y-%m-%d')
                                          ->     name=names['name'][i]
                                          ->     gender=names['gender'][i]
                                          ->     surname=surnames['surname'][i]
                                          ->     result=results['result'][i]
                                          ->     diagnosis=diagnoses['diagnosis'][i]
                                          ->     patients.insert(
                                          ->     'national_id','name','surname',
                                          ->     'gender','age','diagnosis',
                                          ->     'result','date_arrived','date_departed'
                                          ->     ).values(
                                          ->     national_id,name,surname,
                                          ->     gender,age,diagnosis,
                                          ->     result,date_arrived,date_departed
                                          ->     ).execute()
                                          -> 
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > \sql truncate table patients
Query OK, 0 rows affected (0.0244 sec)
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > session.get_schema('cookbook').get_table('patients').count()
0
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > generate_patients_data(1000)
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > session.get_schema('cookbook').get_table('patients').count()
1000
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > session.get_schema('cookbook').get_table('patients').select().limit(10)
+----+-------------+-----------+----------------+--------+-----+-----------------+-------------------------+--------+--------------+---------------+
| id | national_id | name      | surname        | gender | age | additional_data | diagnosis               | result | date_arrived | date_departed |
+----+-------------+-----------+----------------+--------+-----+-----------------+-------------------------+--------+--------------+---------------+
|  1 | 79AF555574  | Emma      | BALKEY         | F      |  58 | NULL            | Pneumonia               | R      | 2020-12-12   | 2021-01-12    |
|  2 | 11YS819517  | Elizabeth | PAQUETTE       | F      |  60 | NULL            | Heart failure           | R      | 2020-08-18   | 2020-09-09    |
|  3 | 46YL924212  | David     | GLICKER        | M      |  73 | NULL            | Urinary tract infection | R      | 2020-08-07   | 2020-08-12    |
|  4 | 86JD612318  | George    | LOPEZBELTRAN   | M      |  52 | NULL            | Acute coronary syndrome | R      | 2020-08-16   | 2020-09-17    |
|  5 | 55KW110008  | George    | OROZCOGONZALEZ | M      |  25 | NULL            | Urinary tract infection | R      | 2020-07-15   | 2020-08-09    |
|  6 | 24HB929221  | Mary      | KIRKELIE       | F      |  81 | NULL            | Heart failure           | R      | 2020-06-30   | 2020-08-21    |
|  7 | 32HE856471  | David     | CANZONA        | M      |  33 | NULL            | Acute coronary syndrome | R      | 2020-09-21   | 2020-11-07    |
|  8 | 64BK628444  | John      | GANDIA         | M      |  67 | NULL            | Urinary tract infection | D      | 2020-06-20   | 2020-06-25    |
|  9 | 46GC444803  | John      | SHERIFF        | M      |  73 | NULL            | Pneumonia               | R      | 2020-10-28   | 2020-12-12    |
| 10 | 70WC944263  | Sarah     | WILKS          | F      |  95 | NULL            | Acute coronary syndrome | R      | 2020-04-13   | 2020-05-01    |
+----+-------------+-----------+----------------+--------+-----+-----------------+-------------------------+--------+--------------+---------------+
10 rows in set (0.0008 sec)
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 

Reusing Your Scripts for MySQL Shell

 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > 
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import numpy as np
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > import pandas as pd
 MySQL  127.0.0.1:33060+ ssl  cookbook  Py > \source /home/maxwell/datasets/mysql_shell/generate_patients_data.py
MySQL  127.0.0.1:33060+ ssl  cookbook  Py > \sql
Switching to SQL mode... Commands end with ;
Fetching table and column names from `cookbook` for auto-completion... Press ^C to stop.
 MySQL  127.0.0.1:33060+ ssl  cookbook  SQL > 
 MySQL  127.0.0.1:33060+ ssl  cookbook  SQL > \source /home/maxwell/datasets/tables/patients.sql
Query OK, 0 rows affected (0.0002 sec)
Query OK, 0 rows affected (0.0136 sec)
Query OK, 0 rows affected (0.0002 sec)
Query OK, 0 rows affected (0.0179 sec)
Query OK, 0 rows affected (0.0003 sec)
Query OK, 20 rows affected (0.0042 sec)

Records: 20  Duplicates: 0  Warnings: 0
+----+-------------+-----------+-----------+--------+-----+-----------------+---------------+--------+--------------+---------------+
| id | national_id | name      | surname   | gender | age | additional_data | diagnosis     | result | date_arrived | date_departed |
+----+-------------+-----------+-----------+--------+-----+-----------------+---------------+--------+--------------+---------------+
|  1 | 89AR642465  | Mary      | Smith     | F      |  24 | NULL            | Appendicitis  | R      | 2020-01-15   | 2020-01-20    |
|  2 | 71GE601633  | John      | Johnson   | M      |  57 | NULL            | Asthma        | N      | 2020-03-10   | 2020-03-25    |
|  3 | 15QT023022  | Margaret  | Williams  | F      |  50 | NULL            | Breast Cancer | D      | 2020-05-08   | 2020-08-07    |
|  4 | 84DC051879  | William   | Brown     | M      |  64 | NULL            | Arthritis     | N      | 2020-05-10   | 2020-05-27    |
|  5 | 54XT916306  | Elizabeth | Jones     | F      |  39 | NULL            | Data Phobia   | R      | 2020-04-17   | 2020-06-03    |
|  6 | 06VZ647400  | James     | Garcia    | M      |  86 | NULL            | Diabetes      | D      | 2020-08-09   | 2020-08-15    |
|  7 | 02WS884704  | Sarah     | Miller    | F      |  31 | NULL            | Diabetes      | R      | 2020-08-10   | 2020-08-20    |
|  8 | 78FS043029  | David     | Davis     | M      |  75 | NULL            | Arthritis     | N      | 2020-06-13   | 2020-06-22    |
|  9 | 45MY529190  | Patricia  | Rodriguez | F      |  35 | NULL            | Diabetes      | N      | 2020-02-18   | 2020-03-02    |
| 10 | 30NC108735  | Robert    | Martinez  | M      |  35 | NULL            | Appendicitis  | R      | 2020-09-12   | 2020-09-18    |
| 11 | 38BP394037  | Catherine | Hernandez | F      |  61 | NULL            | Asthma        | R      | 2020-07-13   | 2020-07-19    |
| 12 | 96HE717336  | Michael   | Lopez     | M      |  33 | NULL            | Data Phobia   | D      | 2020-09-15   | 2020-09-20    |
| 13 | 26ES505212  | Susan     | Gonzalez  | F      |  81 | NULL            | Data Phobia   | R      | 2020-10-01   | 2020-10-28    |
| 14 | 04WT954962  | Peter     | Wilson    | M      |  86 | NULL            | Diabetes      | N      | 2020-09-26   | 2020-10-07    |
| 15 | 65XJ029932  | Helen     | Anderson  | F      |  57 | NULL            | Data Phobia   | R      | 2020-10-06   | 2020-10-26    |
| 16 | 59DW008178  | Thomas    | Thomas    | M      |  53 | NULL            | Data Phobia   | R      | 2020-09-15   | 2020-10-26    |
| 17 | 99XC682639  | Emma      | Taylor    | F      |  69 | NULL            | Breast Cancer | R      | 2020-07-18   | 2020-09-15    |
| 18 | 72TY932806  | George    | Moore     | M      |  20 | NULL            | Data Phobia   | R      | 2020-11-01   | 2020-11-10    |
| 19 | 28VU492728  | Alice     | Jackson   | F      |  72 | NULL            | Arthritis     | N      | 2020-10-01   | 2020-11-10    |
| 20 | 09SK434607  | Richard   | Martin    | M      |  23 | NULL            | Asthma        | N      | 2020-11-18   | 2020-11-28    |
+----+-------------+-----------+-----------+--------+-----+-----------------+---------------+--------+--------------+---------------+
20 rows in set (0.0008 sec)
 MySQL  127.0.0.1:33060+ ssl  cookbook  SQL >
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值