Functions for simple sampling in Python

zoujiahui_2018

Original article: https://blog.csdn.net/qq_41080850/article/details/87906590?depth_1-

Implementation with numpy

Simple random sampling

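import numpy as np
data, k = np.array([10, 20, 30, 40, 50]), 3  # example values; data must be a numpy ndarray
indexs = [np.random.randint(len(data)) for _ in range(k)]  # k random indices, drawn with replacement
data[indexs]  # fancy indexing with the list of indices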
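Because each index in the snippet above is drawn independently, the same element can be picked more than once, i.e. the sample is drawn with replacement. The minimal sketch below, assuming NumPy 1.17+ for the `np.random.default_rng` Generator API, shows both variants side by side; the data, `k` and the seed are hypothetical values for illustration.

```python
import numpy as np

# Hypothetical data and sample size, for illustration only.
data = np.array([10, 20, 30, 40, 50, 60])
k = 4

rng = np.random.default_rng(seed=0)  # seeded Generator so results are reproducible

# With replacement: same idea as the randint snippet (indices may repeat).
idx_with = rng.integers(0, len(data), size=k)
sample_with = data[idx_with]

# Without replacement: each index appears at most once.
idx_without = rng.choice(len(data), size=k, replace=False)
sample_without = data[idx_without]

print(sample_with, sample_without)
```

Sampling indices and then indexing, rather than sampling values directly, also keeps the approach usable when `data` is a 2-D array whose rows should be sampled together.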

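Sampling with unequal probabilities

numpy.random.choice(a, size=None, replace=True, p=None)

This function can draw samples with or without replacement according to the given probabilities; p must satisfy sum(p) = 1.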

import numpy as np
np.random.choice([1,2,3],size=(6,1),replace=True,p=[0.1,0.3,0.6])
# one possible output (the result is random):
# array([[2],
#        [2],
#        [2],
#        [2],
#        [3],
#        [3]])
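Since `p` must sum to 1, arbitrary non-negative weights have to be normalized before they are passed in. The sketch below assumes hypothetical weights 1, 3, 6 (which normalize to the p used in the example above) and a fixed seed; with a large sample, the observed frequencies should land close to `p`.

```python
import numpy as np

# Hypothetical non-negative weights that do not sum to 1.
values = np.array([1, 2, 3])
weights = np.array([1.0, 3.0, 6.0])

# choice() requires p to sum to 1, so divide by the total first.
p = weights / weights.sum()  # [0.1, 0.3, 0.6]

rng = np.random.default_rng(seed=42)
draws = rng.choice(values, size=100_000, replace=True, p=p)

# Empirical frequency of each value; should be close to p for a large sample.
freqs = np.array([(draws == v).mean() for v in values])
print(freqs)
```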

Parameters
-----------
a : 1-D array-like or int
    If an ndarray, a random sample is generated from its elements.
    If an int, the random sample is generated as if a were np.arange(a)
size : int or tuple of ints, optional
    Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.  Default is None, in which case a
    single value is returned.
replace : boolean, optional
    Whether the sample is with or without replacement
p : 1-D array-like, optional
    The probabilities associated with each entry in a.
    If not given the sample assumes a uniform distribution over all
    entries in a.

Returns
--------
samples : single item or ndarray
    The generated random samples

Raises
-------
ValueError
    If a is an int and less than zero, if a or p are not 1-dimensional,
    if a is an array-like of size 0, if p is not a vector of
    probabilities, if a and p have different lengths, or if
    replace=False and the sample size is greater than the population
    size
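The Raises section above lists the main misuse cases. A small sketch that triggers three of them with the legacy `np.random.choice` call used in this post; `try_choice` is a hypothetical helper written only for this demonstration.

```python
import numpy as np

def try_choice(*args, **kwargs):
    """Hypothetical helper: call np.random.choice and return the ValueError
    message if one is raised, or None if the call succeeds."""
    try:
        np.random.choice(*args, **kwargs)
    except ValueError as exc:
        return str(exc)
    return None

# Drawing 5 items without replacement from a population of 3 is impossible.
err_size = try_choice([1, 2, 3], size=5, replace=False)

# a and p have different lengths.
err_len = try_choice([1, 2, 3], size=2, p=[0.5, 0.5])

# p is not a probability vector (it sums to 0.6).
err_sum = try_choice([1, 2, 3], size=2, p=[0.2, 0.2, 0.2])

# A valid call succeeds, so the helper returns None.
ok = try_choice([1, 2, 3], size=2, replace=False)

print(err_size, err_len, err_sum, ok, sep="\n")
```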

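Implementation with pandas

pandas.DataFrame.sample(n=None,frac=None,replace=False,weights=None,random_state=None,axis=None)

This method is mainly used on DataFrames and can sample along either axis: rows with axis=0 or columns with axis=1. Unlike p in numpy, weights only need to be non-negative; they do not have to sum to 1, because pandas normalizes them.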

import pandas as pd
df=pd.DataFrame({'x':[1,2,3],'w':[9,5,3]})
df.sample(n=2,frac=None,replace=False,weights=df.w,random_state=1,axis=0)
# output:
#    x  w
# 0  1  9
# 1  2  5
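The parameter list that follows also allows a column name for `weights`, a fraction instead of a count, and column sampling. A minimal sketch on the same toy DataFrame, with arbitrary seeds chosen only for reproducibility:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'w': [9, 5, 3]})

# weights given as a column name (allowed when sampling rows, axis=0);
# pandas normalizes 9, 5, 3 so they sum to 1.
rows = df.sample(n=2, weights='w', random_state=1)

# frac returns a fraction of the axis: 2/3 of 3 rows -> 2 rows.
frac_rows = df.sample(frac=2/3, random_state=1)

# axis=1 samples columns instead of rows.
one_col = df.sample(n=1, axis=1, random_state=1)

print(rows, frac_rows, one_col, sep="\n\n")
```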

Parameters
----------
n : int, optional
    Number of items from axis to return. Cannot be used with `frac`.
    Default = 1 if `frac` = None.
frac : float, optional (the sampling fraction)
    Fraction of axis items to return. Cannot be used with `n`.
replace : boolean, optional
    Sample with or without replacement. Default = False.
weights : str or ndarray-like, optional
    Default 'None' results in equal probability weighting.
    If passed a Series, will align with target object on index. Index
    values in weights not found in sampled object will be ignored and
    index values in sampled object not in weights will be assigned
    weights of zero.
    If called on a DataFrame, will accept the name of a column
    when axis = 0.
    Unless weights are a Series, weights must be same length as axis
    being sampled.
    If weights do not sum to 1, they will be normalized to sum to 1.
    Missing values in the weights column will be treated as zero.
    inf and -inf values not allowed.
random_state : int or numpy.random.RandomState, optional
    Seed for the random number generator (if int), or numpy RandomState
    object.
axis : int or string, optional
    Axis to sample. Accepts axis number or name. Default is stat axis
    for given data type (0 for Series and DataFrames, 1 for Panels).

Returns
-------
A new object of same type as caller.
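Two consequences of the parameters above are worth checking directly: a fixed `random_state` makes the sample repeatable, and `replace=True` lets `n` exceed the number of rows (as in a bootstrap resample). The sketch below assumes the same toy DataFrame and arbitrary seeds; it also passes a `numpy.random.RandomState` object, which the docstring lists as an accepted `random_state` type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'w': [9, 5, 3]})

# The same integer seed yields the same rows on every call.
a = df.sample(n=2, weights='w', random_state=7)
b = df.sample(n=2, weights='w', random_state=7)

# A RandomState object is also accepted as random_state.
c = df.sample(n=2, random_state=np.random.RandomState(7))

# With replace=True, n may exceed the number of rows and rows may repeat.
boot = df.sample(n=6, replace=True, random_state=0)

print(a.equals(b), len(c), len(boot))
```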

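Implementation with the random module

Equal-probability sampling without replacement

random.sample(population, k)

This function draws k elements without replacement, with every position equally likely to be chosen. Example: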

import random
random.sample([1,2,3],2)  # a list of 2 elements taken from distinct positions
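A slightly fuller sketch of `random.sample`, using a private `random.Random` instance with an arbitrary seed so the draw is reproducible without touching the module-level generator. It checks that the picked elements are distinct, that the original list is untouched, and that asking for more elements than the population holds raises `ValueError`.

```python
import random

population = [1, 2, 3, 4, 5]

rng = random.Random(2019)  # arbitrary seed; private generator instance
picked = rng.sample(population, 3)

# k distinct positions are chosen and the population list is left unchanged.
print(picked, population)

# Requesting more items than the population holds raises ValueError.
try:
    rng.sample(population, 6)
    too_many_raised = False
except ValueError as exc:
    too_many_raised = True
    print("ValueError:", exc)
```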

Chooses k unique random elements from a population sequence or set.

Returns a new list containing elements from the population while
leaving the original population unchanged.  The resulting list is
in selection order so that all sub-slices will also be valid random
samples.  This allows raffle winners (the sample) to be partitioned
into grand prize and second place winners (the subslices).

Members of the population need not be hashable or unique.  If the
population contains repeats, then each occurrence is a possible
selection in the sample.

To choose a sample in a range of integers, use range as an argument.
This is especially fast and space efficient for sampling from a
large population:   sample(range(10000000), 60)
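The docstring's points about ranges, selection order and repeated members can be sketched as follows (the seed and the 1 + 10 prize split are arbitrary). Note that since Python 3.11 `random.sample` no longer accepts a set, although the docstring above still mentions one; convert a set with `list()` or `sorted()` first.

```python
import random

rng = random.Random(0)  # arbitrary seed for reproducibility

# Sampling from a range avoids materializing a 10-million-element list.
winners = rng.sample(range(10_000_000), 60)

# The result is in selection order, so sub-slices are random samples too.
grand_prize = winners[:1]
second_place = winners[1:11]

# Repeated members are separate candidates: 'a' occupies two positions.
draw = rng.sample(['a', 'a', 'b'], 2)

print(grand_prize, len(second_place), draw)
```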

References:

1. Official documentation of pandas.DataFrame.sample:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

2. https://blog.csdn.net/qq_41080850/article/details/87906590?depth_1-
