[Learning sklearn] "sklearn", Chapter 3: Feature Extraction and Processing

### --------------------------------------------------- ###
#     ------ Feature extraction and processing ------

#     ------ Extracting features from categorical variables ------
from sklearn.feature_extraction import DictVectorizer

one_hot_encoder = DictVectorizer()
instances = [{"city": "New York"}, {"city": "San Francisco"}, {"city": "Chapel Hill"}]
print(one_hot_encoder.fit_transform(instances).toarray())
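
To make the one-hot encoding above concrete, here is a minimal pure-Python sketch of the same idea. It is not DictVectorizer's implementation; the helper name `one_hot` is hypothetical, and the sketch assumes string-valued features that become sorted "key=value" columns.

```python
instances = [{"city": "New York"}, {"city": "San Francisco"}, {"city": "Chapel Hill"}]


def one_hot(instances):
    """Hypothetical helper: encode string-valued dicts as one-hot rows."""
    # One column per distinct "key=value" pair, in sorted order
    names = sorted({"{}={}".format(k, v) for d in instances for k, v in d.items()})
    index = {name: i for i, name in enumerate(names)}
    rows = []
    for d in instances:
        row = [0.0] * len(names)
        for k, v in d.items():
            row[index["{}={}".format(k, v)]] = 1.0
        rows.append(row)
    return names, rows


names, rows = one_hot(instances)
print(names)
for row in rows:
    print(row)
```

Each row has exactly one 1.0, in the column of that instance's city, so no ordering between the cities is implied.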

#     ------ Bag-of-words representation ------
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
corpus = [
    "UNC played Duke in basketball",
    "Duke lost the basketball game",
    "I ate a sandwich"
]

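print(vectorizer.fit_transform(corpus).toarray())
print(vectorizer.vocabulary_)

# The vocabulary holds 10 words. "I" and "a" are missing because
# CountVectorizer's default token_pattern only keeps tokens of two or
# more alphanumeric characters.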
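
The tokenization rule can be checked by hand. The sketch below is a simplified stand-in for CountVectorizer, not its implementation: it lowercases each document, keeps tokens of at least two word characters, and counts them against a sorted vocabulary. The helper name `bag_of_words` is hypothetical.

```python
import re

corpus = [
    "UNC played Duke in basketball",
    "Duke lost the basketball game",
    "I ate a sandwich",
]

# Tokens must be at least two word characters long
TOKEN = re.compile(r"\b\w\w+\b")


def bag_of_words(docs):
    """Hypothetical helper: return (sorted vocabulary, per-document count rows)."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in docs]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    index = {term: i for i, term in enumerate(vocabulary)}
    rows = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for t in tokens:
            row[index[t]] += 1
        rows.append(row)
    return vocabulary, rows


vocabulary, rows = bag_of_words(corpus)
print(len(vocabulary), vocabulary)
for row in rows:
    print(row)
```

The single-character tokens "i" and "a" never match the pattern, which is why the vocabulary stops at 10 words.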


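from sklearn.metrics.pairwise import euclidean_distances
# Use toarray() to get a plain ndarray rather than an np.matrix
counts = vectorizer.fit_transform(corpus).toarray()
print("counts:\n{}".format(counts))
for x, y in [(0, 1), (0, 2), (1, 2)]:
    # counts[[x]] keeps the row 2-D, as euclidean_distances expects
    dist = euclidean_distances(counts[[x]], counts[[y]])[0, 0]
    print('Distance between document {} and document {}: {}'.format(x, y, dist))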
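
The same distances can be worked out by hand. In this sketch the count vectors are written out manually, assuming the vocabulary columns are in alphabetical order (the distances do not depend on column order). The helper name `euclidean` is hypothetical.

```python
import math

# Hand-written count vectors for the three documents
counts = [
    [0, 1, 1, 0, 1, 0, 1, 0, 0, 1],  # "UNC played Duke in basketball"
    [0, 1, 1, 1, 0, 1, 0, 0, 1, 0],  # "Duke lost the basketball game"
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # "I ate a sandwich"
]


def euclidean(u, v):
    """Hypothetical helper: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


distances = {
    (x, y): euclidean(counts[x], counts[y])
    for x, y in [(0, 1), (0, 2), (1, 2)]
}
for pair in sorted(distances):
    print(pair, round(distances[pair], 3))
```

Documents 0 and 1 differ in six positions, giving a distance of sqrt(6). Each of them shares no word with document 2, giving sqrt(7). The two basketball sentences are therefore the closest pair.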

#     ------ Image feature extraction ------
# Extracting features from pixel values
from sklearn import datasets
digits = datasets.load_digits()
print('Feature vector:\n{}'.format(digits.images[0].reshape(-1, 64)))
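
Reshaping an 8x8 image into 64 values simply lays its rows end to end. A minimal sketch of that flattening on a tiny 3x3 stand-in image (the pixel values are made up for illustration and are not taken from the digits dataset):

```python
# Hypothetical 3x3 grayscale image; each inner list is one row of pixels
image = [
    [0, 5, 13],
    [9, 1, 0],
    [3, 15, 2],
]

# Row-major flattening: concatenate the rows into one feature vector
feature_vector = [pixel for row in image for pixel in row]
print(feature_vector)
```

The resulting vector has one feature per pixel, so this representation grows with the image size and loses the explicit 2-D layout.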

# Extracting features from points of interest
import numpy as np
from skimage.feature import corner_harris, corner_peaks
from skimage.color import rgb2gray
import matplotlib.pyplot as plt
import skimage.io as io
from skimage.exposure import equalize_hist

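def show_corners(corners, image):
    """Plot the detected corners as red dots on top of the image."""
    fig = plt.figure()
    plt.gray()
    plt.imshow(image)
    # corners holds (row, col) pairs, i.e. (y, x)
    y_corner, x_corner = zip(*corners)
    plt.plot(x_corner, y_corner, 'or')
    plt.xlim(0, image.shape[1])
    plt.ylim(image.shape[0], 0)  # flip the y axis so the origin is top-left
    fig.set_size_inches(np.array(fig.get_size_inches()) * 1.5)
    # plt.show()  # uncomment to display the figure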


mandrill = io.imread(r'C:\Users\admin\Desktop\test.jpg')
mandrill = equalize_hist(rgb2gray(mandrill))
corners = corner_peaks(corner_harris(mandrill), min_distance=2)
show_corners(corners, mandrill)
# The image resolution affects how many interest points are detected.

import mahotas as mh
from mahotas.features import surf

image = mh.imread(r'C:\Users\admin\Desktop\test.jpg', as_grey=True)
descriptors = surf.surf(image)  # compute the descriptors once and reuse them
print('First SURF descriptor:\n{}'.format(descriptors[0]))
print('Extracted %s SURF descriptors' % len(descriptors))
