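Preliminary processing of the crawled data for 150,708 words (raw web page format, including phonetic transcriptions, definitions, example sentences, and so on)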

u25th_engineer

Categories: Python, HTML5, Database. Tags: python, linux, web crawler, regular expressions, Oracle database

Article link: https://blog.csdn.net/u25th_engineer/article/details/105828868

Preface:

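Before describing the data processing itself, here are the configurations of the server and the laptop used in this experiment, along with version information for the other tools. Server configuration: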

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
Stepping:              1
CPU MHz:               2494.222
BogoMIPS:              4988.44
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              40960K
NUMA node0 CPU(s):     0
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt

Laptop configuration:

Host Name:                 DESKTOP-UBDN0UL
OS Name:                   Microsoft Windows 10 Pro N for Workstations
OS Version:                10.0.18363 N/A Build 18363
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          hadoop001
Registered Organization:
Product ID:                00392-30000-00001-AA159
Original Install Date:     2020-01-29, 07:48:44
System Boot Time:          2020-05-02, 00:37:38
System Manufacturer:       Dell Inc.
System Model:              G3 3579
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 158 Stepping 10 GenuineIntel ~2304 Mhz
BIOS Version:              Dell Inc. 1.2.1, 2018-07-18
Windows Directory:         C:\Windows
System Directory:          C:\Windows\system32
Boot Device:               \Device\HarddiskVolume5
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)
Time Zone:                 (UTC+08:00) Beijing, Chongqing, Hong Kong, Urumqi
Total Physical Memory:     16,245 MB
Available Physical Memory: 7,745 MB
Virtual Memory: Max Size:  20,484 MB
Virtual Memory: Available: 3,122 MB
Virtual Memory: In Use:    17,362 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\DESKTOP-UBDN0UL
Hotfix(s):                 12 Hotfix(s) Installed.
                           [01]: KB4537572
                           [02]: KB4513661
                           [03]: KB4516115
                           [04]: KB4517245
                           [05]: KB4521863
                           [06]: KB4524244
                           [07]: KB4528759
                           [08]: KB4537759
                           [09]: KB4538674
                           [10]: KB4541338
                           [11]: KB4552152
                           [12]: KB4549951
Network Card(s):           3 NIC(s) Installed.
                           [01]: Realtek PCIe GbE Family Controller
                                 Connection Name: Ethernet
                                 Status:          Media disconnected
                           [02]: Intel(R) Wireless-AC 9462
                                 Connection Name: Wi-Fi
                                 DHCP Enabled:    Yes
                                 DHCP Server:     192.168.43.1
                                 IP address(es)
                                 [01]: 192.168.43.58
                                 [02]: fe80::852e:ffd6:7cee:e82
                           [03]: Bluetooth Device (Personal Area Network)
                                 Connection Name: Bluetooth Network Connection
                                 Status:          Media disconnected
Hyper-V Requirements:      VM Monitor Mode Extensions: Yes
                           Virtualization Enabled In Firmware: Yes
                           Second Level Address Translation: Yes
                           Data Execution Prevention Available: Yes

PyCharm version information:

PyCharm 2019.3.4 (Professional Edition)
Build #PY-193.6911.25, built on March 18, 2020
Licensed to hadoop001

Runtime version: 11.0.6+8-b520.43 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Windows 10 10.0
GC: ParNew, ConcurrentMarkSweep
Memory: 1963M
Cores: 8
Registry: 
Non-Bundled Plugins: R4Intellij, aws.toolkit

Python version information:

Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32

Oracle and PL/S
