NumPy's outer product (np.outer) and the meaning of dtype=object

 

1, numpy.outer(a, b, out=None)

Compute the outer product of two vectors.

Given two vectors, a = [a0, a1, ..., aM] and b = [b0, b1, ..., bN], the outer product is:

[[a0*b0  a0*b1  ...  a0*bN]
 [a1*b0  a1*b1  ...  a1*bN]
 [ ...    ...   ...   ... ]
 [aM*b0  aM*b1  ...  aM*bN]]

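2, Explanation

Think of it this way: first flatten a (whatever its number of dimensions) into a 1-D column vector and flatten b (whatever its number of dimensions) into a 1-D row vector, then multiply the column vector a by each element of b in turn. Each element of b produces one column of the result.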

import numpy as np

a = [[1,2],[3,4]]
b = [[1,10],[100,1000]]
outer = np.outer(a,b)
np.set_printoptions(suppress=True)  # suppress=True turns off scientific notation when printing
print(outer)

# Output
[[   1   10  100 1000]
 [   2   20  200 2000]
 [   3   30  300 3000]
 [   4   40  400 4000]]
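
The flatten-then-multiply description above can be checked by hand. The following is a minimal sketch that reproduces the result with broadcasting; it is not meant as NumPy's actual implementation, and the names col, row and manual are illustrative only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 10], [100, 1000]])

# Flatten both inputs, then view a as a column and b as a row.
col = a.ravel()[:, np.newaxis]   # shape (4, 1)
row = b.ravel()[np.newaxis, :]   # shape (1, 4)

# Broadcasting multiplies every element of the column by every element of the row.
manual = col * row               # shape (4, 4)

print(np.array_equal(manual, np.outer(a, b)))  # True
```

The result always has shape (a.size, b.size), regardless of the original shapes of a and b.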
########################################################################
x = np.array(['a', 'b', 'c'], dtype=object)
np.outer(x, [1, 2, 3])
# Output
array([['a', 'aa', 'aaa'],
       ['b', 'bb', 'bbb'],
       ['c', 'cc', 'ccc']], dtype=object)

3, The meaning of dtype=object

1, Mixed types are stored with the object dtype

import pandas as pd
pd.Series(["jack", "joe", None])

0    jack
1     joe
2    None
dtype: object
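
The same behaviour can be seen in NumPy alone, which is what a pandas object column holds underneath. This is a small sketch with illustrative variable names; it contrasts an explicit dtype=object array with what NumPy does when it is left to pick a single fixed-size type.

```python
import numpy as np

# With dtype=object each element keeps its original Python type.
mixed = np.array(["jack", 1, None], dtype=object)
print(mixed.dtype)                          # object
print([type(v).__name__ for v in mixed])    # ['str', 'int', 'NoneType']

# Without dtype=object NumPy looks for one common fixed-size type,
# so the integer is converted to a string here.
coerced = np.array(["jack", 1])
print(coerced.dtype.kind)                   # U (fixed-length unicode)
print(coerced[1] == "1")                    # True
```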

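2, pandas uses the object dtype for storing strings (this has traditionally been its default; a dedicated pd.StringDtype also exists). A NumPy object array holds strings the same way: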

x = np.array(['a', 'b', 'c'], dtype=object)
np.outer(x, [1, 2, 3])

array([['a', 'aa', 'aaa'],
       ['b', 'bb', 'bbb'],
       ['c', 'cc', 'ccc']], dtype=object)

The dtype object comes from NumPy; it describes the type of the elements in an ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, that size is 8 bytes. Strings, however, do not have a fixed length. So instead of saving the bytes of the strings directly in the ndarray, pandas uses an object ndarray, which stores pointers to objects. That is why the dtype of this kind of ndarray is object.

Here is an example:

  • the int64 array contains 4 int64 values.
  • the object array contains 4 pointers to 3 string objects.
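
The two bullet points can be reproduced in a few lines. This is a sketch with made-up sample strings; the comparison against np.intp assumes a common platform where a pointer and np.intp have the same size.

```python
import numpy as np

# Four int64 values stored inline: 4 * 8 bytes.
ints = np.array([1, 2, 3, 4], dtype=np.int64)
print(ints.itemsize, ints.nbytes)   # 8 32

# Three distinct string objects; the first one is referenced twice.
s1, s2, s3 = "jack", "joe", "a considerably longer string"
objs = np.array([s1, s2, s3, s1], dtype=object)

# Each slot is pointer-sized, no matter how long the string it refers to is.
print(objs.itemsize == np.dtype(np.intp).itemsize)  # True on common platforms
print(len(objs))                    # 4 pointers
print(len({id(o) for o in objs}))   # 3 distinct string objects
print(objs[0] is objs[3])           # True
```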

NumPy arrays are stored as contiguous blocks of memory. They usually have a single datatype (e.g. integers, floats or fixed-length strings) and then the bits in memory are interpreted as values with that datatype.

Creating an array with dtype=object is different. The memory taken by the array now is filled with pointers to Python objects which are being stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
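
A short sketch of what "pointers to Python objects" means in practice: a mutable object placed in an object array is shared, not copied, exactly as it would be in a Python list. The names shared, arr and py_list are illustrative.

```python
import numpy as np

shared = [1, 2]

# np.empty plus assignment stores the list itself in each slot
# (np.array([shared, shared], dtype=object) would build a 2x2 array instead).
arr = np.empty(2, dtype=object)
arr[0] = shared
arr[1] = shared

# A plain Python list behaves the same way: it holds references.
py_list = [shared, shared]

shared.append(3)              # mutate the list through its original name
print(arr[0])                 # [1, 2, 3]
print(py_list[0])             # [1, 2, 3]
print(arr[0] is arr[1])       # True, both slots point at the same object
```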

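Arithmetic operators such as * don't work with arrays such as ar1 (an array like np.array(['avinash', 'jay']), created without dtype=object), which have a fixed-length string datatype (string_ or unicode_, named bytes_ and str_ in current NumPy); there are special functions instead, see below. NumPy just treats the bits in memory as characters, so the * operator doesn't make sense here. However, the line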

np.array(['avinash','jay'], dtype=object) * 2

works because now the array is an array of (pointers to) Python strings. The * operator is well defined for these Python string objects. New Python strings are created in memory and a new object array with references to the new strings is returned.
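
A minimal sketch of that behaviour: on an object array, * is applied element by element using Python's own string repetition, which is also why np.outer works on the object array shown earlier. The names names, doubled and counts are illustrative.

```python
import numpy as np

names = np.array(['avinash', 'jay'], dtype=object)

# Each element is a Python str, so * means string repetition.
doubled = names * 2
print(doubled.tolist())           # ['avinashavinash', 'jayjay']
print(doubled.dtype)              # object

# The original array is untouched; a new object array was returned.
print(names.tolist())             # ['avinash', 'jay']

# A different repeat count per element also works.
counts = np.array([1, 3])
print((names * counts).tolist())  # ['avinash', 'jayjayjay']
```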


If you have an array with string_ or unicode_ dtype and want to repeat each string, you can use np.char.multiply:

In [52]: np.char.multiply(ar1, 2)
Out[52]: array(['avinashavinash', 'jayjay'], dtype='<U14')
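
For comparison, here is a self-contained sketch that runs np.char.multiply and the object-array version side by side. The definition of ar1 is an assumption inferred from the output above; the original definition is not shown in this post.

```python
import numpy as np

# Assumed definition of ar1: a fixed-length unicode array (no dtype=object).
ar1 = np.array(['avinash', 'jay'])
print(ar1.dtype.kind)              # U

# Repeat each string with the dedicated string function.
repeated = np.char.multiply(ar1, 2)
print(repeated.tolist())           # ['avinashavinash', 'jayjay']
print(repeated.dtype.kind)         # U, the result is still a fixed-length string array

# The object-array route gives the same strings, but with dtype=object.
via_object = np.array(['avinash', 'jay'], dtype=object) * 2
print(repeated.tolist() == via_object.tolist())  # True
```

Which route to prefer depends on the data: fixed-length string arrays keep the characters inline in the array, while object arrays accept arbitrary Python objects at the cost of an extra pointer per element.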

https://numpy.org/devdocs/reference/arrays.dtypes.html

https://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object

https://stackoverflow.com/questions/29877508/what-does-dtype-object-mean-while-creating-a-numpy-array

https://saskeli.github.io/data-analysis-with-python-summer-2019/pandas2.html 

 
