1, numpy.outer(a, b, out=None)
Compute the outer product of two vectors.
Given two vectors, a = [a0, a1, ..., aM] and b = [b0, b1, ..., bN], the outer product is:
[[a0*b0  a0*b1 ... a0*bN ]
 [a1*b0    .
 [ ...          .
 [aM*b0            aM*bN ]]
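2, Explanation
Think of it as first flattening a (whatever its dimensionality) into a one-dimensional column vector and flattening b (whatever its dimensionality) into a one-dimensional row vector, then multiplying the column vector a by each element of b in turn. Row i of the result is therefore a[i] times every element of b.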
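import numpy as np
a = [[1, 2], [3, 4]]
b = [[1, 10], [100, 1000]]
outer = np.outer(a, b)  # both inputs are flattened before multiplying
np.set_printoptions(suppress=True)  # suppress=True turns off scientific notation when printing
print(outer)
# Output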
[[   1   10  100 1000]
 [   2   20  200 2000]
 [   3   30  300 3000]
 [   4   40  400 4000]]
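The "flatten, then column times row" description above can be checked directly. The following is a minimal sketch, not NumPy's actual implementation: it flattens both inputs with np.ravel and uses broadcasting to build the same 4x4 table. The names a_flat, b_flat, manual and same are illustrative only.

```python
import numpy as np

a = [[1, 2], [3, 4]]
b = [[1, 10], [100, 1000]]

# Flatten both inputs to one dimension, whatever shape they started with.
a_flat = np.ravel(a)  # 4 elements
b_flat = np.ravel(b)  # 4 elements

# A (4, 1) column times a (1, 4) row broadcasts to a (4, 4) table.
manual = a_flat[:, np.newaxis] * b_flat[np.newaxis, :]

# Compare against the library function.
same = np.array_equal(manual, np.outer(a, b))
print(manual.shape, same)
```

Because the inputs are flattened first, the result shape depends only on the number of elements in a and b, not on their original shapes.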
########################################################################
x = np.array(['a', 'b', 'c'], dtype=object)
np.outer(x, [1, 2, 3])
# Output
array([['a', 'aa', 'aaa'],
       ['b', 'bb', 'bbb'],
       ['c', 'cc', 'ccc']], dtype=object)
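Why does an outer product of strings and integers work at all? With dtype=object, each product is delegated to Python's own * operator, and 'a' * 2 is plain string repetition. A small sketch that rebuilds the same table with a nested list comprehension (counts, expected and result are illustrative names):

```python
import numpy as np

x = np.array(['a', 'b', 'c'], dtype=object)
counts = [1, 2, 3]

# What the table should look like if every cell is just Python's str * int.
expected = [[s * n for n in counts] for s in x]

# The object-dtype outer product produces the same cells.
result = np.outer(x, counts)
print(result.tolist() == expected)
```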
3, The meaning of dtype=object
1, Mixed types are stored with the object dtype
pd.Series(["jack", "joe", None])
0    jack
1     joe
2    None
dtype: object
2, Pandas uses the object dtype for storing strings.
x = np.array(['a', 'b', 'c'], dtype=object)
np.outer(x, [1, 2, 3])
array([['a', 'aa', 'aaa'],
       ['b', 'bb', 'bbb'],
       ['c', 'cc', 'ccc']], dtype=object)
The object dtype comes from NumPy; it describes the type of the elements in an ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, that size is 8 bytes. Strings, however, have no fixed length. So instead of saving the bytes of the strings in the ndarray directly, Pandas uses an object ndarray, which stores pointers to objects. That is why the dtype of this kind of ndarray is object.
Here is an example:
- the int64 array contains 4 int64 values.
- the object array contains 4 pointers to 3 string objects.
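The two bullets above can be reproduced in a few lines. This is only a sketch with made-up values ("jack", "joe", "jill" and the variable names are illustrative): an int64 array holding 4 values, and an object array whose 4 slots refer to just 3 distinct string objects because one string is stored twice.

```python
import numpy as np

# Four int64 values stored inline, 8 bytes each.
int_arr = np.array([1, 2, 3, 4], dtype=np.int64)
print(int_arr.itemsize, int_arr.nbytes)  # 8 bytes per element, 32 in total

# Three distinct Python strings; the first one is referenced twice.
s0, s1, s2 = "jack", "joe", "jill"
obj_arr = np.array([s0, s1, s2, s0], dtype=object)

# Indexing an object array hands back the stored Python object itself,
# so slots 0 and 3 refer to the very same string object.
shares_object = obj_arr[0] is obj_arr[3]
distinct_objects = len({id(item) for item in obj_arr})
print(shares_object, distinct_objects)
```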
NumPy arrays are stored as contiguous blocks of memory. They usually have a single datatype (e.g. integers, floats or fixed-length strings) and then the bits in memory are interpreted as values with that datatype.
Creating an array with dtype=object is different. The memory taken by the array is now filled with pointers to Python objects which are stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
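One visible consequence of the difference is what happens when a longer string is assigned into an existing array. A minimal sketch, with illustrative names fixed and boxed: the fixed-length array has a slot of 3 characters per element, so the new value is cut to fit, while the object array simply points its slot at the new string.

```python
import numpy as np

# Fixed-length unicode array: each slot has room for 3 characters.
fixed = np.array(['abc', 'de'])
# Object array: each slot is a reference to a Python string.
boxed = np.array(['abc', 'de'], dtype=object)

fixed[0] = 'abcdef'  # does not fit in a 3-character slot, so it is cut short
boxed[0] = 'abcdef'  # the slot now refers to the new, longer string

print(fixed.dtype, fixed[0])
print(boxed.dtype, boxed[0])
```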
Arithmetic operators such as * don't work with arrays such as ar1 which have a string_ datatype (there are special functions instead - see below). NumPy is just treating the bits in memory as characters and the * operator doesn't make sense here. However, the line
np.array(['avinash','jay'], dtype=object) * 2
works because now the array is an array of (pointers to) Python strings. The * operator is well defined for these Python string objects. New Python strings are created in memory and a new object array with references to the new strings is returned.
If you have an array with string_ or unicode_ dtype and want to repeat each string, you can use np.char.multiply:
In [52]: np.char.multiply(ar1, 2)
Out[52]: array(['avinashavinash', 'jayjay'],
dtype='<U14')
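The three cases discussed above can be put side by side. Note that the definition of ar1 is not shown in the quoted answer; the one below is an assumption inferred from the output. The sketch catches TypeError rather than asserting that * always fails, since the exact exception class is a NumPy implementation detail.

```python
import numpy as np

# Assumed definition: the text uses ar1 without showing how it was created.
ar1 = np.array(['avinash', 'jay'])

# Case 1: the * operator on a fixed-length string array.
try:
    ar1 * 2
    star_worked = True
except TypeError as exc:
    star_worked = False
    print("ar1 * 2 failed:", type(exc).__name__)

# Case 2: the same data as an object array, where * is Python's str * int.
repeated_obj = np.array(['avinash', 'jay'], dtype=object) * 2

# Case 3: the dedicated string function for fixed-length string arrays.
repeated_char = np.char.multiply(ar1, 2)

print(repeated_obj)
print(repeated_char)
```

The object-array result stays dtype=object, while np.char.multiply returns a fixed-length string array wide enough for the longest repeated string.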
https://numpy.org/devdocs/reference/arrays.dtypes.html
https://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object
https://saskeli.github.io/data-analysis-with-python-summer-2019/pandas2.html