Implementation with numpy
Simple random sampling
indexes = [numpy.random.randint(len(data)) for _ in range(k)]  # k positions, drawn with replacement
data[indexes]  # data must be a numpy ndarray
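The two lines above sample with replacement, because each index is drawn independently. A minimal sketch contrasting this with sampling without replacement is shown below; the `data` array, `k`, and the seed are hypothetical values chosen only for illustration.

```python
import numpy as np

np.random.seed(0)              # fixed seed, only so the sketch is repeatable
data = np.arange(10, 20)       # hypothetical data; must be an ndarray for index-array lookup
k = 4

# With replacement: the same position may be drawn more than once.
idx_with = np.random.randint(len(data), size=k)
sample_with = data[idx_with]

# Without replacement: every position is drawn at most once (requires k <= len(data)).
idx_without = np.random.choice(len(data), size=k, replace=False)
sample_without = data[idx_without]

print(sample_with)
print(sample_without)
```

Passing `size=k` to `numpy.random.randint` draws all indices in one call, which avoids the Python-level loop.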
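Sampling with unequal probabilities
numpy.random.choice(a, size=None, replace=True, p=None)
This function samples with or without replacement according to the given probabilities. The entries of p must satisfy sum(p) = 1.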
np.random.choice([1,2,3],size=(6,1),replace=True,p=[0.1,0.3,0.6])
# array([[2],
#        [2],
#        [2],
#        [2],
#        [3],
#        [3]])
Parameters
-----------
a : 1-D array-like or int
    If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)
size : int or tuple of ints, optional
    Output shape. If the given shape is, e.g., ``(m, n, k)``, then ``m * n * k`` samples are drawn. Default is None, in which case a single value is returned.
replace : boolean, optional
    Whether the sample is with or without replacement
p : 1-D array-like, optional
    The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

Returns
--------
samples : single item or ndarray
    The generated random samples

Raises
-------
ValueError
    If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size
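Because p must sum to 1, raw weights have to be normalized before they are passed to numpy.random.choice. The sketch below does that for a hypothetical weight vector and then checks the empirical frequencies against p; the weights, the seed, and the number of draws are assumptions made for illustration.

```python
import numpy as np

np.random.seed(0)                     # fixed seed, only so the sketch is repeatable
values = [1, 2, 3]
weights = np.array([9.0, 5.0, 3.0])   # hypothetical raw weights, not yet probabilities
p = weights / weights.sum()           # normalize so that sum(p) == 1

# With replacement: draw many samples and compare frequencies with p.
draws = np.random.choice(values, size=10000, replace=True, p=p)
freqs = {v: float(np.mean(draws == v)) for v in values}
print(freqs)

# Without replacement: size may not exceed the population size.
perm = np.random.choice(values, size=3, replace=False, p=p)
print(perm)
```

With replace=False and size equal to the population size, the result is a weighted random ordering of all elements.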
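Implementation with pandas
pandas.DataFrame.sample(n=None,frac=None,replace=False,weights=None,random_state=None,axis=None)
This function is mainly intended for DataFrames and can sample either rows (axis=0) or columns (axis=1). The weights do not need to sum to 1: they only have to be non-negative and not all zero, since they are normalized automatically.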
df=pd.DataFrame({'x':[1,2,3],'w':[9,5,3]})
df.sample(n=2,frac=None,replace=False,weights=df.w,random_state=1,axis=0)
# Out[65]:
#    x  w
# 0  1  9
# 1  2  5
Parameters
----------
n : int, optional
    Number of items from axis to return. Cannot be used with `frac`. Default = 1 if `frac` = None.
frac : float, optional (the sampling fraction)
    Fraction of axis items to return. Cannot be used with `n`.
replace : boolean, optional
    Sample with or without replacement. Default = False.
weights : str or ndarray-like, optional
    Default 'None' results in equal probability weighting. If passed a Series, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a DataFrame, will accept the name of a column when axis = 0. Unless weights are a Series, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. inf and -inf values not allowed.
random_state : int or numpy.random.RandomState, optional
    Seed for the random number generator (if int), or numpy RandomState object.
axis : int or string, optional
    Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames, 1 for Panels).

Returns
-------
A new object of same type as caller.
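The parameters described above can be combined in several ways. The following sketch reuses the small DataFrame from the example and shows, under an arbitrary random_state, weights given as a column name, sampling by fraction, sampling columns, and sampling with replacement; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'w': [9, 5, 3]})

# Weights given by column name (allowed when sampling rows, axis=0).
rows = df.sample(n=2, weights='w', random_state=1)

# frac=1.0 returns every row in random order, i.e. a shuffle.
shuffled = df.sample(frac=1.0, random_state=1)

# axis=1 samples columns instead of rows.
one_col = df.sample(n=1, axis=1, random_state=1)

# With replace=True, n may exceed the number of rows.
boot = df.sample(n=5, replace=True, random_state=1)

print(rows)
print(shuffled)
print(one_col)
print(boot)
```

Note that n and frac are mutually exclusive, so each call passes only one of them.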
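Implementation with the random module
Equal-probability sampling without replacement
random.sample(population,k)
This function draws an equal-probability sample without replacement. Example: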
random.sample([1,2,3],2)
Chooses k unique random elements from a population sequence or set.

Returns a new list containing elements from the population while leaving the original population unchanged. The resulting list is in selection order so that all sub-slices will also be valid random samples. This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices).

Members of the population need not be hashable or unique. If the population contains repeats, then each occurrence is a possible selection in the sample.

To choose a sample in a range of integers, use range as an argument. This is especially fast and space efficient for sampling from a large population: sample(range(10000000), 60)
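For comparison, the standard library also provides random.choices for sampling with replacement, optionally with weights. The sketch below puts it next to random.sample; the population, weights, and seed are hypothetical values chosen for illustration.

```python
import random

random.seed(0)                      # fixed seed, only so the sketch is repeatable
population = [1, 2, 3]

# Without replacement, equal probability: k may not exceed len(population).
without_repl = random.sample(population, 2)

# With replacement and relative weights (they need not sum to 1).
with_repl = random.choices(population, weights=[0.1, 0.3, 0.6], k=6)

# Sampling from a large integer range without materializing it as a list.
big = random.sample(range(10000000), 60)

print(without_repl)
print(with_repl)
print(len(big))
```

random.choices is available from Python 3.6 onward.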
References:
1. Official documentation for pandas.DataFrame.sample:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html
2. https://blog.csdn.net/qq_41080850/article/details/87906590?depth_1-