Functions for simple sampling in Python

Implementation with the numpy library

Simple random sampling

import numpy as np
indexes = np.random.randint(len(data), size=k)  # k random indices, drawn with replacement
data[indexes]  # data must be a NumPy ndarray
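The index-based approach above draws with replacement, so the same element can show up more than once. Below is a minimal sketch of that approach next to a without-replacement variant. The names `data`, `k`, and `rng` and their values are made up for illustration, and the seed is fixed only so the run is repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)   # seeded only for repeatability (assumption)
data = np.arange(10, 20)         # hypothetical data; must be an ndarray
k = 4                            # hypothetical sample size

# With replacement: indices are drawn independently, so duplicates are possible
idx_with = rng.randint(len(data), size=k)
sample_with = data[idx_with]

# Without replacement: take the first k entries of a random permutation
idx_without = rng.permutation(len(data))[:k]
sample_without = data[idx_without]

print(sample_with, sample_without)
```

The permutation variant guarantees k distinct positions, which is usually what "simple random sampling" means in a survey context.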

Sampling with unequal probabilities

numpy.random.choice(a,size=None,replace=True,p=None)

This function samples with or without replacement according to the given probabilities. Here p must satisfy sum(p) = 1.

import numpy as np
np.random.choice([1,2,3],size=(6,1),replace=True,p=[0.1,0.3,0.6])
# One possible result (the output is random):
# array([[2],
#        [2],
#        [2],
#        [2],
#        [3],
#        [3]])
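Because p has to sum to 1, raw weights need to be normalized before they are passed in. The sketch below shows one way to do that and then draws both with and without replacement. `items`, `raw_weights`, and the seed are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)            # seeded only for repeatability (assumption)
items = ["a", "b", "c"]                    # hypothetical population
raw_weights = np.array([1.0, 3.0, 6.0])    # hypothetical weights that do not sum to 1

# Normalize so that the probabilities sum to 1, as choice() requires
p = raw_weights / raw_weights.sum()

# With replacement: an item may be drawn several times
with_repl = rng.choice(items, size=5, replace=True, p=p)

# Without replacement: each item appears at most once
without_repl = rng.choice(items, size=2, replace=False, p=p)

print(p, with_repl, without_repl)
```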
Parameters
-----------
a : 1-D array-like or int
    If an ndarray, a random sample is generated from its elements.
    If an int, the random sample is generated as if a were np.arange(a)
size : int or tuple of ints, optional
    Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.  Default is None, in which case a
    single value is returned.
replace : boolean, optional
    Whether the sample is with or without replacement
p : 1-D array-like, optional
    The probabilities associated with each entry in a.
    If not given the sample assumes a uniform distribution over all
    entries in a.

Returns
--------
samples : single item or ndarray
    The generated random samples

Raises
-------
ValueError
    If a is an int and less than zero, if a or p are not 1-dimensional,
    if a is an array-like of size 0, if p is not a vector of
    probabilities, if a and p have different lengths, or if
    replace=False and the sample size is greater than the population
    size
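Two of the error conditions listed above are easy to hit in practice. The short sketch below triggers them deliberately and keeps the messages instead of crashing. The variable names are illustrative only.

```python
import numpy as np

population = [1, 2, 3]  # hypothetical population

try:
    # Asking for 5 distinct items from a population of 3 cannot succeed
    np.random.choice(population, size=5, replace=False)
    size_error = None
except ValueError as exc:
    size_error = str(exc)

try:
    # Probabilities that do not sum to 1 are rejected
    np.random.choice(population, size=2, p=[0.5, 0.2, 0.2])
    p_error = None
except ValueError as exc:
    p_error = str(exc)

print(size_error)
print(p_error)
```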

Implementation with the pandas library

 pandas.DataFrame.sample(n=None,frac=None,replace=False,weights=None,random_state=None,axis=None)

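This function is mainly intended for DataFrames and can sample along either rows or columns. The weights do not have to sum to 1: they only need to be non-negative and not all zero, because pandas normalizes them.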

import pandas as pd
df=pd.DataFrame({'x':[1,2,3],'w':[9,5,3]})
df.sample(n=2,frac=None,replace=False,weights=df.w,random_state=1,axis=0)
# Out[65]: 
#    x  w
# 0  1  9
# 1  2  5
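To see how unnormalized weights behave, here is a small sketch in which one row has weight 0. The column names and values are invented for illustration. The weights are passed by column name, which the documentation below allows when axis = 0.

```python
import pandas as pd

# Hypothetical frame: the last row has weight 0
df = pd.DataFrame({"x": [1, 2, 3, 4], "w": [9, 5, 3, 0]})

# Weights given as a column name; pandas normalizes them to sum to 1
rows = df.sample(n=2, weights="w", random_state=0)

# A row with weight 0 has zero probability of being drawn
print(rows)
```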
Parameters
----------
n : int, optional
    Number of items from axis to return. Cannot be used with `frac`.
    Default = 1 if `frac` = None.
frac : float, optional (fraction of items to sample)
    Fraction of axis items to return. Cannot be used with `n`.
replace : boolean, optional
    Sample with or without replacement. Default = False.
weights : str or ndarray-like, optional
    Default 'None' results in equal probability weighting.
    If passed a Series, will align with target object on index. Index
    values in weights not found in sampled object will be ignored and
    index values in sampled object not in weights will be assigned
    weights of zero.
    If called on a DataFrame, will accept the name of a column
    when axis = 0.
    Unless weights are a Series, weights must be same length as axis
    being sampled.
    If weights do not sum to 1, they will be normalized to sum to 1.
    Missing values in the weights column will be treated as zero.
    inf and -inf values not allowed.
random_state : int or numpy.random.RandomState, optional
    Seed for the random number generator (if int), or numpy RandomState
    object.
axis : int or string, optional
    Axis to sample. Accepts axis number or name. Default is stat axis
    for given data type (0 for Series and DataFrames, 1 for Panels).

Returns
-------
A new object of same type as caller.
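The parameters above can be combined in a few common ways. The sketch below uses a made-up frame to show sampling a fraction of rows, sampling a column with axis=1, and a resample with replacement. All names and values are illustrative.

```python
import pandas as pd

# Hypothetical frame with 10 rows and 3 columns
df = pd.DataFrame({"a": range(10), "b": range(10, 20), "c": range(20, 30)})

# frac: sample half of the rows (n must be left unset)
half_rows = df.sample(frac=0.5, random_state=0)

# axis=1: sample columns instead of rows
one_col = df.sample(n=1, axis=1, random_state=0)

# replace=True: rows may repeat, as in a bootstrap-style resample
resampled = df.sample(frac=1.0, replace=True, random_state=0)

print(half_rows.shape, one_col.shape, resampled.shape)
```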

Implementation with the random library

Equal-probability sampling without replacement

 random.sample(population,k)

This function draws an equal-probability sample without replacement. Example:

import random
random.sample([1,2,3],2)
Chooses k unique random elements from a population sequence or set.

Returns a new list containing elements from the population while
leaving the original population unchanged.  The resulting list is
in selection order so that all sub-slices will also be valid random
samples.  This allows raffle winners (the sample) to be partitioned
into grand prize and second place winners (the subslices).

Members of the population need not be hashable or unique.  If the
population contains repeats, then each occurrence is a possible
selection in the sample.

To choose a sample in a range of integers, use range as an argument.
This is especially fast and space efficient for sampling from a
large population:   sample(range(10000000), 60)
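The raffle idea and the range trick from the docstring above can be sketched as follows. The ticket numbers and the seed are invented for illustration.

```python
import random

random.seed(0)  # seeded only for repeatability (assumption)

tickets = list(range(1, 101))          # hypothetical raffle tickets 1..100
winners = random.sample(tickets, 5)    # 5 distinct winners, in selection order

# Sub-slices of the result are themselves valid random samples
grand_prize = winners[:1]
second_place = winners[1:]

# Sampling from a range avoids building a huge list in memory
big_sample = random.sample(range(10_000_000), 60)

print(grand_prize, second_place, len(big_sample))
```

The original list `tickets` is left unchanged, since `random.sample` returns a new list.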

References:

  1. Official documentation for pandas.DataFrame.sample:
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

  2. https://blog.csdn.net/qq_41080850/article/details/87906590?depth_1-
