PYTHON
Includes social mining and data mining
guaguastd
[Python Data Structures] Finding the N largest or smallest items in a sequence
>>> import heapq
>>> nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
>>> heapq.nlargest(3, nums)
[42, 37, 23]
>>> heapq.nsmallest(3, nums)
[-4, 1, 2]
>>>
>>> portfolio = [
... {'name': 'IBM', 'shares': 100…
Original · 2021-11-08 21:21:50
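Both functions also accept a `key` argument. That is what makes them useful on records like the truncated `portfolio` list above. The following is a minimal sketch. It assumes each record is a dict, and the `price` field and all values are made up for illustration.

```python
import heapq

# Hypothetical records: only 'name' and 'shares' appear in the original snippet
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
]

# key= tells heapq which value to rank each record by
cheap = heapq.nsmallest(2, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(2, portfolio, key=lambda s: s['price'])

print([s['name'] for s in cheap])      # ['FB', 'HPQ']
print([s['name'] for s in expensive])  # ['AAPL', 'IBM']
```

These functions pay off when N is small. For N == 1, `min()` and `max()` are simpler. When N is close to the length of the data, `sorted(...)[:N]` is usually the better choice.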
[Python Basics] Unpacking variable-length sequences
>>> def drop_first_last(nums):
...     first, *middle, last = nums
...     return sum(middle) / len(middle)
...
>>> drop_first_last([1, 2, 3])
2.0
>>> drop_first_last([1, 2, 3, 4])
2.5
>>> record = ('Hello', 'a@qq.com', 123, 456)
…
Original · 2021-11-06 13:22:46
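The starred name can appear at any position, and it always receives a list, which may be empty. Here is a short sketch with made-up data.

```python
# Star at the end: collect everything after the first two fields
name, email, *phones = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212')
print(phones)  # ['773-555-1212', '847-555-1212']

# Star at the front: keep only the last item separately
*history, current = [10, 8, 7, 1, 9, 5, 10, 3]
print(history, current)  # [10, 8, 7, 1, 9, 5, 10] 3

# A starred target may match zero items
first, *rest = [42]
print(rest)  # []
```

Because of that last case, `drop_first_last` above raises `ZeroDivisionError` for a two-item list: `middle` is then empty.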
[Python Data Structures] Unpacking a sequence into separate variables
>>> p = (7, 8)
>>> p
(7, 8)
>>> a, b = p
>>> a
7
>>> b
8
>>> data = ['1', '2', [3, 4]]
>>> data
['1', '2', [3, 4]]
>>> a, b, c = data
>>> a
'1'
>>> b
'2'
>>> …
Original · 2021-11-06 12:11:52
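Unpacking needs exactly as many names as there are items. It works on any iterable, not only tuples and lists. Here is a small sketch with made-up data.

```python
p = (7, 8)

# Asking for more names than there are items raises ValueError
try:
    x, y, z = p
    unpacked = True
except ValueError as exc:
    unpacked = False
    print('ValueError:', exc)

# Nested structures can be unpacked in one statement
data = ['1', '2', [3, 4]]
a, b, (c1, c2) = data
print(c1, c2)  # 3 4

# Any iterable works, e.g. a string; _ is a conventional name for ignored values
first, _, last = 'xyz'
print(first, last)  # x z
```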
Replace All Matches
Goal: from the string 1 2 3 4 5 6 7, extract 2, 5, 6, 7.
Approach:
1. Python
import re
subject = '1 2 3 4 5 6 7'
result = []
innerre = re.compile(r"\d+")
for outermatch in re.finditer("(?s)(.*?)", subject):
    result.ext…
Translated · 2014-05-23 09:33:22
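This is a sketch of the usual "matches within matches" technique. An outer regex selects regions of the subject, and an inner regex runs only inside each region. The sketch assumes, hypothetically, that the numbers to extract are wrapped in `<b>…</b>` tags in the subject.

```python
import re

# Hypothetical subject: assumes the wanted numbers are marked with <b>...</b>
subject = '1 <b>2</b> 3 4 <b>5 6 7</b>'
outer = re.compile(r'(?s)<b>(.*?)</b>')
inner = re.compile(r'\d+')

found = []
for outer_match in outer.finditer(subject):
    # Search only inside the text captured by the outer match
    found.extend(inner.findall(outer_match.group(1)))

print(found)  # ['2', '5', '6', '7']
```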
Test if a Match Can Be Found Within a Subject String
Goal: in "The regex pattern can be found", find the reg…
Translated · 2014-05-13 09:11:49
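In Python, the usual test is whether `re.search` returns a match object or `None`. The sketch below assumes the text being looked for is the literal `regex pattern`, because the original pattern is cut off.

```python
import re

def has_match(pattern, subject):
    """Return True if pattern matches anywhere in subject."""
    # re.search scans the whole string; re.match would only try position 0
    return re.search(pattern, subject) is not None

subject = 'The regex pattern can be found'
print(has_match(r'regex pattern', subject))  # True
print(has_match(r'^regex', subject))         # False
```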
Python: Custom numeric rounding with a class
Code 1:
class RoundFloatManual(object):
    def __init__(self, val):
        assert isinstance(val, float), "Value must be a float!"
        self.value = round(val, 2)

Output:
In [47]: rfm = RoundF…
Translated · 2014-12-25 10:01:38
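`round(val, 2)` alone does not control how the value is displayed: `round(5.5964, 2)` shows as `5.6`. For that reason, such classes usually also define `__str__` and `__repr__`. The sketch below shows that idea. The method bodies are an assumption, not the original's. It also raises `TypeError` instead of using `assert`, because assertions are skipped when Python runs with `-O`.

```python
class RoundFloatManual:
    """Store a float rounded to two decimal places and always display two digits."""

    def __init__(self, val):
        # An explicit check still runs under `python -O`, unlike assert
        if not isinstance(val, float):
            raise TypeError('Value must be a float!')
        self.value = round(val, 2)

    def __str__(self):
        # Fixed-point formatting keeps trailing zeros: 5.6 -> '5.60'
        return '%.2f' % self.value

    __repr__ = __str__

rfm = RoundFloatManual(5.5964)
print(rfm)        # 5.60
print(rfm.value)  # 5.6
```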
Literal Regular Expression in Source Code
Goal: put the regular expression [$"'\n\d/\\] into source code as a variable.
Approach: Pyth…
Translated · 2014-05-08 11:18:25
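In Python, a raw string keeps the backslashes intact, and triple quotes let both quote characters appear unescaped. This is a sketch of one way to write the pattern; the truncated original may differ.

```python
import re

# r"""...""": raw so \n, \d and \\ reach the regex engine unchanged,
# triple-quoted so both " and ' can appear inside the character class
pattern = re.compile(r"""[$"'\n\d/\\]""")

sample = 'cost: $5 / "quoted" \\ it\'s\n'
print(pattern.findall(sample))
```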
Python's itertools and iterators
1. chain
import itertools
listone = ['a', 'b', 'c']
listtwo = ['11', '22', '33']
for item in itertools.chain(listone, listtwo):
    print(item, end=' ')
Output:
a b c 11 22 33
2. count
count returns an unbounded iterator.
import…
Reposted · 2015-05-18 07:10:15
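Because `count()` never stops on its own, it is normally paired with something that limits it, such as `itertools.islice` or `zip` with a finite iterable. A short sketch:

```python
import itertools

# First five values of an unbounded counter starting at 10 with step 2
evens = list(itertools.islice(itertools.count(10, 2), 5))
print(evens)  # [10, 12, 14, 16, 18]

# zip stops at the shortest input, so count() can number a finite list
listone = ['a', 'b', 'c']
numbered = list(zip(itertools.count(1), listone))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```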
Python: Assigning dictionary entries when both keys and values are compound types
>>> doc_title = 'nihao'
>>> url = 'www.nihao.com'
>>> td_matrix = {}
>>> td_matrix[(doc_title, url)] = {}
>>> td_matrix
{('nihao', 'www.nihao.com'): {}}
>>> td_matrix[(doc_title, url)]['good'] = 1
>>> …
Original · 2015-01-21 05:33:58
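When many term counts need to be filled in, `collections.defaultdict` avoids creating each inner dict by hand. The sketch below counts terms per `(doc_title, url)` key for two hypothetical documents.

```python
from collections import defaultdict

# Outer key: (doc_title, url) tuple; inner dict maps term -> count, defaulting to 0
td_matrix = defaultdict(lambda: defaultdict(int))

docs = [
    ('nihao', 'www.nihao.com', 'good good day'),
    ('hello', 'www.hello.com', 'good night'),
]  # hypothetical sample documents

for doc_title, url, text in docs:
    for term in text.split():
        td_matrix[(doc_title, url)][term] += 1

print(dict(td_matrix[('nihao', 'www.nihao.com')]))  # {'good': 2, 'day': 1}
```

Tuples work as keys because they are immutable and hashable. A list key would raise `TypeError`.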
Python: Implementing selection sort
# Sorts a sequence in ascending order using the selection sort algorithm
def selectionSort(theSeq):
    n = len(theSeq)
    for i in range(n - 1):
        # Assume the ith element is the smallest…
Translated · 2015-01-16 17:13:01
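The loop above is cut off. Below is a minimal complete sketch of the same algorithm, renamed `selection_sort` for this example. Each pass finds the smallest remaining item and swaps it into place. That costs O(n²) comparisons but at most n - 1 swaps.

```python
def selection_sort(seq):
    """Sort a mutable sequence in place in ascending order (selection sort)."""
    n = len(seq)
    for i in range(n - 1):
        # Assume the ith element is the smallest of the unsorted tail
        smallest = i
        for j in range(i + 1, n):
            if seq[j] < seq[smallest]:
                smallest = j
        # Swap the smallest remaining item into position i
        if smallest != i:
            seq[i], seq[smallest] = seq[smallest], seq[i]
    return seq

print(selection_sort([10, 51, 2, 18, 4, 31, 13, 5, 23, 64, 29]))
```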
Split a String
Goal: turn "I like bold and italic fonts" into…
Translated · 2014-06-05 15:15:54
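The following is a sketch only. It assumes the subject contains HTML tags such as `<b>bold</b>` and `<i>italic</i>`, and that the goal is to split the text on those tags.

```python
import re

# Hypothetical subject: assumes the words are wrapped in HTML tags
subject = 'I like <b>bold</b> and <i>italic</i> fonts'

# <[^<>]*> matches one tag; re.split drops the matched tags from the result
parts = re.split(r'<[^<>]*>', subject)
print(parts)  # ['I like ', 'bold', ' and ', 'italic', ' fonts']
```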
Python: Finding where a target belongs in a sorted list
# Modified version of the binary search that returns the index within
# a sorted sequence indicating where the target should be located
def findSortedPosition(theList, target):
    low = 0
    high = …
Translated · 2015-01-19 15:15:39
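The function above is cut off. Below is a sketch of the usual completion: a binary search that returns the target's index if it is present, and otherwise the index where it would be inserted. For lists without duplicates, this gives the same answer as the standard `bisect.bisect_left`, which the example uses as a cross-check.

```python
import bisect

def find_sorted_position(the_list, target):
    """Binary search for the index of target, or where it should be inserted."""
    low, high = 0, len(the_list) - 1
    while low <= high:
        mid = (low + high) // 2
        if the_list[mid] == target:
            return mid                 # target found at mid
        elif target < the_list[mid]:
            high = mid - 1             # continue in the left half
        else:
            low = mid + 1              # continue in the right half
    return low                         # not found: low is the insertion point

values = [2, 4, 5, 10, 13, 18, 23]
print(find_sorted_position(values, 13))  # 4
print(find_sorted_position(values, 11))  # 4
print(bisect.bisect_left(values, 11))    # 4
```

With duplicate values, this version returns some matching index rather than the leftmost one, which is where `bisect_left` differs.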
Python: Implementing a simple set with a list
# Implementation of the set iterator
class _SetIterator:
    def __init__(self, theList):
        self._setItems = theList
        self._curItem = 0

    def __iter__(self):
        return self

    def __next__(self):  # named next() in Python 2
        …
Translated · 2014-12-24 17:16:10
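Here is a minimal sketch of how such an iterator is typically paired with a list-backed set class. The class and method names (`ListSet`, `add`, `remove`, `union`) are illustrative assumptions. In Python 3, the iterator method must be named `__next__`.

```python
class _SetIterator:
    """Iterator over the items stored in a list-backed set."""
    def __init__(self, the_list):
        self._set_items = the_list
        self._cur_item = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._cur_item < len(self._set_items):
            item = self._set_items[self._cur_item]
            self._cur_item += 1
            return item
        raise StopIteration


class ListSet:
    """A simple set ADT backed by a Python list (membership tests are O(n))."""
    def __init__(self, *items):
        self._the_elements = []
        for item in items:
            self.add(item)

    def __len__(self):
        return len(self._the_elements)

    def __contains__(self, element):
        return element in self._the_elements

    def add(self, element):
        # Only append values that are not already present
        if element not in self:
            self._the_elements.append(element)

    def remove(self, element):
        # Mirror the built-in set: missing elements raise KeyError
        if element not in self:
            raise KeyError(element)
        self._the_elements.remove(element)

    def union(self, other):
        new_set = ListSet(*self._the_elements)
        for element in other:
            new_set.add(element)
        return new_set

    def __iter__(self):
        return _SetIterator(self._the_elements)
```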
Python: Implementing a polynomial with a singly linked list
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on 2015-1-26
@author: beyondzhou
@name: linkPolynomail.py
'''
# Implementation of the Polynomial ADT using a sorted linked list
class linkPolyn…
Translated · 2015-01-26 15:10:06
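The class body is cut off. What follows is a sketch under illustrative names (`_PolyTermNode` and `LinkedPolynomial` are hypothetical). Each node holds one term. The list is kept sorted by descending degree, so like terms can be combined as they are inserted.

```python
class _PolyTermNode:
    """One term coefficient * x**degree in the linked list."""
    def __init__(self, degree, coefficient):
        self.degree = degree
        self.coefficient = coefficient
        self.next = None


class LinkedPolynomial:
    """Polynomial stored as a singly linked list sorted by descending degree."""
    def __init__(self, terms=()):
        self._head = None
        for degree, coefficient in terms:
            self.add_term(degree, coefficient)

    def add_term(self, degree, coefficient):
        if coefficient == 0:
            return
        # Walk to the insertion point, keeping degrees in descending order
        prev, cur = None, self._head
        while cur is not None and cur.degree > degree:
            prev, cur = cur, cur.next
        if cur is not None and cur.degree == degree:
            cur.coefficient += coefficient      # combine like terms
            if cur.coefficient == 0:            # drop terms that cancel out
                if prev is None:
                    self._head = cur.next
                else:
                    prev.next = cur.next
            return
        node = _PolyTermNode(degree, coefficient)
        node.next = cur
        if prev is None:
            self._head = node
        else:
            prev.next = node

    def degree(self):
        return -1 if self._head is None else self._head.degree

    def terms(self):
        cur, out = self._head, []
        while cur is not None:
            out.append((cur.degree, cur.coefficient))
            cur = cur.next
        return out

    def evaluate(self, x):
        return sum(c * x ** d for d, c in self.terms())

    def __add__(self, other):
        result = LinkedPolynomial()
        for poly in (self, other):
            for degree, coefficient in poly.terms():
                result.add_term(degree, coefficient)
        return result
```

For example, (3x² + 1) + (-3x² + 2x) reduces to 2x + 1, because the x² terms cancel and their node is removed.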
Python: Validating a Social Security Number with a regular expression
Regular expression:
^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$
Match:
111-11-1111
No match:
000-11-1111
666-11-1111
Translated · 2014-07-29 09:23:51
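Here is a sketch of using this pattern from Python. It uses `fullmatch` instead of `match` because Python's `$` also matches just before a trailing newline.

```python
import re

# Rejects area numbers 000, 666 and 900-999, group 00, and serial 0000
SSN_RE = re.compile(r'^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$')

def is_valid_ssn(text):
    # fullmatch also rejects a trailing newline, which a bare $ would allow
    return SSN_RE.fullmatch(text) is not None

for candidate in ['111-11-1111', '000-11-1111', '666-11-1111', '900-11-1111', '111-111-1111']:
    print(candidate, is_valid_ssn(candidate))
```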
Match Previously Matched Text Again
Goal 1: match 2008-08-08 (\d\)…
Translated · 2014-04-09 10:38:05
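A backreference such as `\1` matches exactly the text that group 1 captured earlier. The sketch below matches dates like 2008-08-08, where the last two digits of the year, the month, and the day are identical. The pattern is an assumption, because the original is cut off.

```python
import re

# (\d\d) captures the year's last two digits; each \1 requires the same digits again
magic_date = re.compile(r'\b\d\d(\d\d)-\1-\1\b')

print(bool(magic_date.search('2008-08-08')))  # True
print(bool(magic_date.search('2008-05-24')))  # False
```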
Python regex substitution: replacing one part of the text with another part of it
import re
fobj = open('bws.html', 'r')
subject = fobj.readlines()
fobj.close()
all = []
fobj = open('bws.html.new', 'w')
for eachLine in subject:
    result = re.sub(r' (.*?) \2 <', eachLine)
    a…
Original · 2015-01-06 11:36:26
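Backreferences like `\1` and `\2` can also be used in the replacement string of `re.sub`, so captured parts of a line can be reordered. The sketch below rewrites hypothetical "Last, First" lines as "First Last". It works on in-memory lines instead of `bws.html`.

```python
import re

# Hypothetical input lines: "Last, First" to be rewritten as "First Last"
lines = ['Doe, Jane\n', 'Smith, John\n', 'no comma here\n']

# Group 1 = last name, group 2 = first name; r'\2 \1' reuses them in swapped order
swap = re.compile(r'^(\w+),\s*(\w+)$')

result = [swap.sub(r'\2 \1', line) for line in lines]
print(result)  # ['Jane Doe\n', 'John Smith\n', 'no comma here\n']
```

To apply the same rewrite to a file, open it with `with open(...)` so it is closed even if an error occurs, then write `''.join(result)` to the new file.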
Python: Quickly reading a file's contents into a list
>>> f = open('nihao')
>>> data = [line for line in f.readlines()]
>>> f.close()
>>> print(data)
['fjdkfj\n', 'fdjkfj\n', 'fdjkfj\n', 'ddd\n', 'ddd\n', 'ddd']
>>> f = open('nihao')
>>> data = [line.stri…
Original · 2015-01-21 11:25:47
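`f.readlines()` (or `list(f)`) already returns a list, so the comprehension is not needed. If the newlines should be dropped, `str.splitlines` does it in one call. The sketch below writes a temporary file first so it runs on its own.

```python
import os
import tempfile

def read_lines(path):
    """Return the file's lines without their trailing newline characters."""
    # The with-block closes the file even if reading fails
    with open(path, encoding='utf-8') as f:
        return f.read().splitlines()

# Self-contained demo: write a small temporary file first
with tempfile.NamedTemporaryFile('w', delete=False, suffix='.txt', encoding='utf-8') as tmp:
    tmp.write('fjdkfj\nfdjkfj\nddd')
    path = tmp.name

lines = read_lines(path)
os.remove(path)
print(lines)  # ['fjdkfj', 'fdjkfj', 'ddd']
```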
Validate and Format North American Phone Numbers
Goal: validate the following phone numbers and rewrite them in a standard format: 12…
Translated · 2014-06-26 10:21:53
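Here is a sketch of one common approach. A single pattern captures the area code, exchange, and line number. The substitution then rebuilds them as `(NNN) NNN-NNNN`. The exact pattern is an assumption, because the original list of numbers is cut off.

```python
import re

# Optional parentheses around the area code; separators may be '-', '.', or a space
PHONE_RE = re.compile(r'^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$')

def format_phone(text):
    """Return the number as (NNN) NNN-NNNN, or None if it does not validate."""
    if PHONE_RE.fullmatch(text) is None:
        return None
    return PHONE_RE.sub(r'(\1) \2-\3', text)

for raw in ['1234567890', '123-456-7890', '(123) 456-7890', '123.456.7890', '12-3456-7890']:
    print(raw, '->', format_phone(raw))
```

The pattern is deliberately lenient. For example, it also accepts an unbalanced parenthesis such as `(123456-7890`.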
Python: Converting HTML to plain text
Code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on 2014-9-5
@author: guaguastd
@name: html_to_text.py
'''
from login import google_api_request
from html import cleanHtml
while True:
    …
Translated · 2014-09-05 07:24:09
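The `login` and `html` modules imported above are the author's own and are not shown, so `cleanHtml` cannot be reproduced here. The following is an alternative sketch that uses only the standard library's `html.parser`. Note that in Python 3, a local module named `html`, as above, would shadow this standard package.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()          # convert_charrefs=True turns &amp; into &
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces so adjacent blocks do not run together,
        # then collapse runs of whitespace left behind by the removed markup
        return ' '.join(' '.join(self._chunks).split())

def html_to_text(html_source):
    parser = _TextExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.text()

print(html_to_text('<p>Hello <b>world</b></p><script>var x = 1;</script><p>&amp; bye</p>'))
```

Joining every chunk with a space is a trade-off. It keeps `</p><p>` boundaries apart, but it would also split a word that has inline markup in the middle of it.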
Python: Restricting input to specific characters with regular expressions
Regular expressions:
# alphanumeric
^[a-zA-Z0-9]+$
# ASCII characters
^[\x00-\x7F]+$
# ASCII non-control characters and line breaks
^[\n\r\x20-\x7E]+$
# characters shared by ISO-8859-1 and Windows-1252
^[\x0…
Translated · 2014-07-23 13:59:30
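In Python, these checks are easiest with `re.fullmatch`, which also avoids the quirk that `$` accepts a trailing newline. The sketch below uses the first three patterns.

```python
import re

# Patterns from the list above; fullmatch makes the ^...$ anchors redundant but harmless
ALNUM = re.compile(r'^[a-zA-Z0-9]+$')
ASCII = re.compile(r'^[\x00-\x7F]+$')
ASCII_PRINTABLE = re.compile(r'^[\n\r\x20-\x7E]+$')

def only(pattern, text):
    return pattern.fullmatch(text) is not None

print(only(ALNUM, 'abc123'), only(ALNUM, 'abc 123'))                      # True False
print(only(ASCII, 'plain text'), only(ASCII, 'café'))                     # True False
print(only(ASCII_PRINTABLE, 'tab\there'), only(ASCII_PRINTABLE, 'line\n'))  # False True
```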
Python: Formatting dates
>>> import datetime
>>> now = datetime.datetime.now()
>>> other = now.strftime("%Y%m%d%H%M%S")
>>> other
'20150429054922'
Original · 2015-04-29 05:49:26
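The reverse direction uses `strptime` with the same format string. The following is a short round-trip sketch based on the output shown above.

```python
import datetime

stamp = '20150429054922'
# strptime reverses strftime when given the same format string
parsed = datetime.datetime.strptime(stamp, '%Y%m%d%H%M%S')
print(parsed)  # 2015-04-29 05:49:22

# Round-trip back to the compact form, and to ISO 8601
print(parsed.strftime('%Y%m%d%H%M%S'))  # 20150429054922
print(parsed.isoformat())               # 2015-04-29T05:49:22
```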
Set Regular Expression Options
Goal: free-spacing, case-insensitive, dot matches line breaks, and "^ and $ mat…"
Translated · 2014-05-12 09:10:54
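In Python, each of these options is a flag in the `re` module, and flags are combined with `|`. The sketch below turns on all four and uses a made-up subject.

```python
import re

# VERBOSE = free-spacing, IGNORECASE = case-insensitive,
# DOTALL = dot matches line breaks, MULTILINE = ^ and $ match at line breaks
flags = re.VERBOSE | re.IGNORECASE | re.DOTALL | re.MULTILINE

pattern = re.compile(r'''
    ^ start      # beginning of any line
    .*?          # anything, including newlines (DOTALL)
    end $        # "end" at the end of a line
''', flags)

subject = 'first line\nSTART here\nand\nthe END\nlast'
m = pattern.search(subject)
print(repr(m.group()))  # 'START here\nand\nthe END'
```

The same options can also be set inline at the start of the pattern as `(?xism)`.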
Python: Using regular expressions to find a specific word inside XML comments
1. Two-step approach
import re
subject = ''' This "TODO" is not within a comment, but the next one is. <!-- TODO : Come up with a cooler comment for this example. -->'''
Translated · 2014-12-15 14:01:11
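A sketch of how the two steps can be completed. The first regex finds each comment. The second looks for the word only inside those comments, so the quoted "TODO" outside the comment is ignored.

```python
import re

subject = ''' This "TODO" is not within a comment, but the next one is.
<!-- TODO : Come up with a cooler comment for this example. -->'''

# Step 1: find each XML comment (DOTALL lets a comment span several lines)
comments = re.findall(r'<!--.*?-->', subject, re.DOTALL)

# Step 2: look for the word only inside the comments found in step 1
hits = [c for c in comments if re.search(r'\bTODO\b', c)]
print(hits)  # ['<!-- TODO : Come up with a cooler comment for this example. -->']
```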
Python: Splitting a string into two-character groups separated by spaces
>>> import re
>>> subject = '080045000106309140003F2F7D100A0A3C0A0A0A3D0A00000800450000EE000000003F06CCF0C0A8C864C0A8646400000000000000000000000050000000F7410000'
>>> result = re.sub(r"(?<=\w)(?=(?:\w…
Original · 2015-02-05 11:39:29
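The pattern above is cut off after `(?=(?:\w`. This sketch completes it in one plausible way, which is an assumption: insert a space wherever a character precedes the position and an even, non-zero number of characters follows it. A plain slicing alternative is shown for comparison.

```python
import re

subject = '080045000106'

# Lookbehind: a character before this position; lookahead: pairs up to the end
spaced = re.sub(r'(?<=\w)(?=(?:\w\w)+$)', ' ', subject)
print(spaced)  # 08 00 45 00 01 06

# Non-regex alternative: slice the string in steps of two
spaced2 = ' '.join(subject[i:i + 2] for i in range(0, len(subject), 2))
print(spaced2)  # 08 00 45 00 01 06
```

The two versions differ only for odd-length input. The regex pairs characters from the right ('ABC' becomes 'A BC'), while slicing pairs them from the left ('AB C').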
Using Tcl/Expect to automate an FTP session (including manual input)
Goal: log in to the FTP server ftp.google.org (user name: google, password: google) and download the RFC document the user asks for.
Implementation:
#!/usr/bin/env expect
set timeout 15
set u_Prompt "Name"
set p_Prompt "Password:"
set f_Prompt "ftp>"
set sUser…
Original · 2013-04-18 17:17:31
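For comparison, the same interaction can be scripted in Python with the standard `ftplib` module, instead of driving the `ftp` client through Expect. This is a sketch. The `ftp_cls` parameter is a hypothetical hook that exists only so the function can be exercised without a network.

```python
from ftplib import FTP

def fetch_rfc(host, user, password, rfc_name, dest_path, ftp_cls=FTP):
    """Log in to an FTP server and download one file in binary mode."""
    # timeout=15 mirrors `set timeout 15` in the Expect script
    with ftp_cls(host, timeout=15) as ftp:
        ftp.login(user, password)
        with open(dest_path, 'wb') as out:
            # RETR streams the file; each received block is written to disk
            ftp.retrbinary('RETR ' + rfc_name, out.write)
```

The manual-input step then amounts to passing the user's answer in, e.g. `name = input('RFC file: ')` followed by `fetch_rfc('ftp.google.org', 'google', 'google', name, name)`.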
Python libraries collection
1) delorean
Delorean is a really cool date/time library. Apart from having a sweet name, it's one of the more natural-feeling date/time munging libraries I've used in Python. It's sort of like moment…
Reposted · 2015-01-29 11:31:53
Python: Validating IPv4 addresses with regular expressions
1. Simple regex to check for an IP address:
^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$
2. Accurate regex to check for an IP address, allowing leading zeros:
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25…
Translated · 2014-10-31 10:17:17
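In Python, the standard `ipaddress` module can perform the value check that the simple regex cannot. The sketch below compares the two. One difference from the second regex: recent Python versions (3.9.5 and later) reject leading zeros such as `010.1.1.1`.

```python
import ipaddress
import re

SIMPLE_IPV4 = re.compile(r'^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$')

def is_ipv4(text):
    """Validate with the standard library instead of a regex."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:          # AddressValueError is a subclass of ValueError
        return False

for candidate in ['192.168.1.1', '255.255.255.255', '256.1.1.1', '1.2.3']:
    # The simple regex only checks the shape, so it accepts 256.1.1.1
    print(candidate, bool(SIMPLE_IPV4.fullmatch(candidate)), is_ipv4(candidate))
```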
Generate a histogram of how many times each unique word is used in a text
#! /usr/bin/expect --
# Generate a histogram of how many times each unique word is used in a text.
proc hWord {sText} {
    set debug 0
    # Print the primary string text
    if…
Original · 2014-03-24 13:36:12
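The Tcl procedure is cut off after its debug setup. For comparison, here is a Python sketch of the same task using `collections.Counter`. Treating words as case-insensitive runs of letters and apostrophes is an assumption.

```python
import re
from collections import Counter

def word_histogram(text):
    """Count each unique word (case-insensitive) and return a text bar chart."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # One line per word, most frequent first, with a bar of '*' characters
    width = max((len(w) for w in counts), default=0)
    return [f'{w:<{width}} {"*" * n} ({n})' for w, n in counts.most_common()]

for line in word_histogram('the cat and the hat and the bat'):
    print(line)
```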
Use a recursive procedure to perform a bubble sort on a list of data
#! /usr/bin/expect --
# Use a recursive procedure to perform a bubble sort on a list of data
proc recBubblesort {sData sLen} {
    set debug 0
    # Print the primary data
    if {$debug == 1…
Original · 2014-03-24 17:58:17
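The recursion itself is cut off. Here is a Python sketch of the same idea. One pass bubbles the largest item to the end, and the procedure then recurses on the shorter unsorted prefix.

```python
def rec_bubble_sort(data, length=None):
    """Recursive bubble sort: one pass moves the max to the end, then recurse."""
    if length is None:
        length = len(data)
    if length <= 1:          # base case: 0 or 1 unsorted items remain
        return data
    for i in range(length - 1):
        if data[i] > data[i + 1]:
            data[i], data[i + 1] = data[i + 1], data[i]
    # The largest of the first `length` items is now at index length - 1
    return rec_bubble_sort(data, length - 1)

print(rec_bubble_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

The recursion depth grows with the list length, so Python's default recursion limit (about 1000) caps the input size.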
Python: Limiting text to at most five lines with a regular expression
Regular expressions:
# separator is \r\n, \r or \n
\A(?:[^\r\n]*(?:\r\n?|\n)){0,4}[^\r\n]*\Z
# separator is any Unicode line separator
\A(?:[^\n-\r\x85\u2028\u2029]*(?:\r\n?|[\n-\f\x85\u2028\u2029])){0,4}[^\n-\r\x85\u2028\u202…
Translated · 2014-07-25 11:24:31
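In Python, `\Z` already means the absolute end of the string, so the first pattern can be used as is. It allows up to four "line plus separator" pairs, followed by one final line. Here is a sketch:

```python
import re

# Up to four "line + separator" pairs followed by one final line: at most 5 lines
MAX_5_LINES = re.compile(r'\A(?:[^\r\n]*(?:\r\n?|\n)){0,4}[^\r\n]*\Z')

def at_most_five_lines(text):
    return MAX_5_LINES.match(text) is not None

print(at_most_five_lines('1\n2\n3\n4\n5'))     # True
print(at_most_five_lines('1\r\n2\r3\n4\n5'))   # True
print(at_most_five_lines('1\n2\n3\n4\n5\n6'))  # False
```

A trailing line break counts as the start of a sixth, empty line, so `'1\n2\n3\n4\n5\n'` is rejected.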
Match Without Adding It to the Overall Match
Goal: match the "cat" in "My cat is furry".
Approach: …
Translated · 2014-04-28 10:40:27
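Lookaround assertions check the surrounding text without consuming it, so only "cat" becomes the overall match. The sketch below uses patterns chosen for illustration, since the original approach is cut off.

```python
import re

subject = 'My cat is furry'

# Lookbehind and lookahead check the neighbouring words without consuming them
m = re.search(r'(?<=My )cat(?= is)', subject)
print(m.group(), m.span())  # cat (3, 6)

# Alternative: match the context too, but capture only the wanted part
m2 = re.search(r'My (cat) is', subject)
print(m2.group(1))  # cat
```

Python's lookbehind must have a fixed width, which `My ` satisfies.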
Python: A simple addition and subtraction quiz game
#! /usr/bin/env python
from operator import add, sub
from random import randint, choice

ops = {'+': add, '-': sub}
MAXTRIES = 2

def doprob():
    op = choice('+-')
    nums = [randint(1, 10) for i i…
Translated · 2014-12-09 14:37:23
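The game loop is cut off. Here is a sketch that keeps the same `ops` table and `MAXTRIES` idea. The function names, the injectable `rng`/`get_input` parameters, and the choice to sort the two numbers in descending order (so subtraction never goes negative) are all assumptions.

```python
import random
from operator import add, sub

OPS = {'+': add, '-': sub}
MAXTRIES = 2

def make_problem(rng=random):
    """Pick an operator and two numbers in 1..10; return (expression, answer)."""
    op = rng.choice('+-')
    nums = sorted((rng.randint(1, 10) for _ in range(2)), reverse=True)
    return f'{nums[0]} {op} {nums[1]}', OPS[op](*nums)

def ask(expr, answer, get_input=input, max_tries=MAXTRIES):
    """Give the player up to max_tries attempts; return True on a correct answer."""
    for _ in range(max_tries):
        try:
            if int(get_input(f'{expr} = ')) == answer:
                return True
        except ValueError:      # non-numeric input just uses up a try
            pass
    return False
```

An interactive round is then `expr, answer = make_problem()` followed by `ask(expr, answer)`.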
Using the shell to find the 10 most frequently occurring URLs
#!/bin/sh
foo() {
    if [ $# -ne 1 ]; then
        echo "Usage: $0 filename"
        exit 1
    fi
    grep -Eo "http://[a-zA-Z0-9.]+\.[a-zA-Z]{2,3}" "$1" | awk '{ count[$0]++ } END { printf("%-30s %s\n","wensit…
Original · 2012-12-19 17:17:16
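For comparison, here is the same count-and-rank idea in Python. `Counter.most_common(10)` replaces the awk tally and the sorting step. The sample log lines are made up.

```python
import re
from collections import Counter

# Same URL shape as the grep pattern above: http:// + host ending in a 2-3 letter TLD
URL_RE = re.compile(r'http://[a-zA-Z0-9.]+\.[a-zA-Z]{2,3}')

def top_urls(lines, n=10):
    """Return the n most frequent URLs as (url, count) pairs."""
    counts = Counter()
    for line in lines:
        counts.update(URL_RE.findall(line))
    return counts.most_common(n)

log = [
    'GET http://www.a.com/x http://www.b.org',
    'GET http://www.a.com/y',
]  # hypothetical log lines

for url, count in top_urls(log):
    print(f'{url:<30} {count}')
```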
Linux: Batch-renaming files
build@dev-16-new:~/3922/release/pica8/automation/suite/vxlan_gre$ ls
pic8OvsL2gre_02_01.tcl  pic8OvsL2gre_02_06.tcl  pic8OvsL2gre_02_11.tcl  pic8OvsVxlan_01_02.tcl  pic8OvsVxlan_01_07.tcl  pic8OvsVxla…
Original · 2015-01-23 15:23:53
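The post's rename command is cut off. As a sketch only, assume the goal is to replace a file-name prefix; the prefixes `pic8` and `pica8` used below are hypothetical. Python's `pathlib` can then do the batch rename.

```python
from pathlib import Path

def rename_prefix(directory, old, new, pattern='*.tcl'):
    """Rename files in directory whose names start with old so they start with new."""
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(directory).glob(pattern)):
        # Skip names already carrying the new prefix (matters when new starts with old)
        if path.name.startswith(old) and not path.name.startswith(new):
            target = path.with_name(new + path.name[len(old):])
            if target.exists():
                raise FileExistsError(target)  # never overwrite silently
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

For example, `rename_prefix('.', 'pic8', 'pica8')` would turn `pic8OvsL2gre_02_01.tcl` into `pica8OvsL2gre_02_01.tcl`.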
Python: Printing a string as a half pyramid
>>> s = 'abcde'
>>> for i in [None] + list(range(-1, -len(s), -1)):
...     print(s[:i])
...
abcde
abcd
abc
ab
a
>>> for i in list(range(0, len(s), 1)) + [None]:
...     print(s[:i])
...

a
ab
abc
abcd
abc…
Original · 2015-01-09 09:25:04
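Slicing with an explicit length avoids the `None` trick. In the growing case, it also avoids the empty first line that `s[:0]` produces. Here is a sketch:

```python
def half_pyramid(s, growing=False):
    """Return the prefixes of s, longest first (or shortest first if growing)."""
    sizes = range(1, len(s) + 1) if growing else range(len(s), 0, -1)
    return [s[:n] for n in sizes]

print('\n'.join(half_pyramid('abcde')))
print('\n'.join(half_pyramid('abcde', growing=True)))
```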
Python: Implementing a stack with a list (class-based, with an iterator)
Original · 2015-01-27 16:18:34
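Only the title of this post survives, so the following is a sketch of such a class rather than the author's code. The top of the stack is the end of the list, which makes push and pop O(1). Iterating from top to bottom is a design choice, and a generator stands in for a separate iterator class.

```python
class Stack:
    """LIFO stack backed by a Python list; the top is the end of the list."""
    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def __len__(self):
        return len(self._items)

    def push(self, item):
        self._items.append(item)      # amortized O(1)

    def peek(self):
        if self.is_empty():
            raise IndexError('peek from empty stack')
        return self._items[-1]

    def pop(self):
        if self.is_empty():
            raise IndexError('pop from empty stack')
        return self._items.pop()      # O(1) at the end of a list

    def __iter__(self):
        # Generator-based iterator: yields from top to bottom without popping
        yield from reversed(self._items)
```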
Python: Implementing bubble sort
def bubbleSort(theSeq):
    n = len(theSeq)
    # Perform n-1 bubble operations on the sequence
    for i in range(n - 1):
        # Bubble the largest item to the end.
        for j in range(…
Original · 2015-01-16 15:41:44
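The inner loop is cut off. Here is a complete sketch of the same structure. The optional early exit, which stops once a pass makes no swaps, is an addition and not necessarily part of the original.

```python
def bubble_sort(seq):
    """Sort a list in place with bubble sort, stopping early once a pass makes no swaps."""
    n = len(seq)
    # Perform at most n-1 bubble passes over the sequence
    for i in range(n - 1):
        swapped = False
        # Bubble the largest remaining item to position n-1-i
        for j in range(n - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                swapped = True
        if not swapped:     # already sorted: no need for more passes
            break
    return seq

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```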
Python: Extracting the server and share from a UNC path with a regular expression
1. Regular expression:
^\\\\([a-zA-Z0-9_.$ -]+)\\([a-zA-Z0-9_.$ -]+)
e.g. \\server\share\folder\file.ext
2. Python code:
import re
subject = '''\\\\server\\share\\folder\\file.ext'''
match = re.search…
Translated · 2014-11-11 09:57:07
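The Python code is cut off, so this is a sketch of one way to finish it. Using raw strings for both the pattern and the path keeps the backslashes readable. Groups 1 and 2 hold the server and share names.

```python
import re

# Raw strings keep every backslash, so the pattern and path read as written
UNC_RE = re.compile(r'^\\\\([a-zA-Z0-9_.$ -]+)\\([a-zA-Z0-9_.$ -]+)')

subject = r'\\server\share\folder\file.ext'
match = UNC_RE.search(subject)
if match:
    server, share = match.group(1), match.group(2)
    print(server, share)  # server share
```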
Split a string, but keep the regex matches
Goal: turn "I like bold and italic fonts" into 'I like ', '…
Python:
import re
subject = 'I like bold and italic fonts'
reobj = re.compile("]*>")
result = reobj.split(subject)
print(result)
Translated · 2014-06-06 15:10:35
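`re.split` keeps the delimiters in its result when the pattern contains a capturing group. The following is a sketch. It assumes the subject contains HTML tags such as `<b>bold</b>` and `<i>italic</i>`, and that the tags are the delimiters to keep.

```python
import re

# Hypothetical subject: assumes the words are wrapped in HTML tags
subject = 'I like <b>bold</b> and <i>italic</i> fonts'

# A capturing group around the delimiter makes re.split keep each matched tag
reobj = re.compile(r'(<[^<>]*>)')
result = reobj.split(subject)
print(result)
# ['I like ', '<b>', 'bold', '</b>', ' and ', '<i>', 'italic', '</i>', ' fonts']
```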