python np hstack_python - Numpy hstack - "ValueError: all the input arrays must have same number of dimensions" - but they do

weixin_39664774

Tags: python np hstack

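I want to join two numpy arrays. In one I have a set of columns/features after running TF-IDF on a single column of text. In the other I have one column/feature which is an integer. So I read in a column of train and test data, run TF-IDF on it, and then I want to add another integer column, because I think this will help my classifier learn more accurately how it should behave.

Unfortunately, when I try to run hstack to add this single column to my other numpy array, I get the error in the title.

Here is my code: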

# imports inferred from the aliases used below (np, p, lm, TfidfVectorizer)
import numpy as np
import pandas as p
import sklearn.linear_model as lm
from sklearn.feature_extraction.text import TfidfVectorizer

# reading in test/train data for TF-IDF
traindata = list(np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,2])
testdata = list(np.array(p.read_csv('FinalTestCSVFin.csv', delimiter=";"))[:,2])

# reading in labels for training
y = np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,-2]

# reading in single integer column to join
AlexaTrainData = p.read_csv('FinalCSVFin.csv', delimiter=";")[["alexarank"]]
AlexaTestData = p.read_csv('FinalTestCSVFin.csv', delimiter=";")[["alexarank"]]
AllAlexaAndGoogleInfo = AlexaTestData.append(AlexaTrainData)

# tf-idf object
tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode',
                      analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1, 2),
                      use_idf=1, smooth_idf=1, sublinear_tf=1)

# classifier
rd = lm.LogisticRegression(penalty='l2', dual=True, tol=0.0001,
                           C=1, fit_intercept=True, intercept_scaling=1.0,
                           class_weight=None, random_state=None)

X_all = traindata + testdata  # adding test and train data to put into tf-idf
lentrain = len(traindata)  # find length of train data

tfv.fit(X_all)  # fit tf-idf on all our text
X_all = tfv.transform(X_all)  # transform it

X = X_all[:lentrain]  # reduce to size of training set
AllAlexaAndGoogleInfo = AllAlexaAndGoogleInfo[:lentrain]  # reduce to size of training set
X_test = X_all[lentrain:]  # the remaining rows are the test set

# printing debug info, output below:
print "X.shape => " + str(X.shape)
print "AllAlexaAndGoogleInfo.shape => " + str(AllAlexaAndGoogleInfo.shape)
print "X_all.shape => " + str(X_all.shape)

# line we get error on
X = np.hstack((X, AllAlexaAndGoogleInfo))

Here is the output and the error message:

X.shape => (7395, 238377)
AllAlexaAndGoogleInfo.shape => (7395, 1)
X_all.shape => (10566, 238377)

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
     31 print "X_all.shape => " + str(X_all.shape)
     32 #X = np.column_stack((X, AllAlexaAndGoogleInfo))
---> 33 X = np.hstack((X, AllAlexaAndGoogleInfo))
     34 sc = preprocessing.StandardScaler().fit(X)
     35 X = sc.transform(X)

C:\Users\Simon\Anaconda\lib\site-packages\numpy\core\shape_base.pyc in hstack(tup)
    271 # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    272 if arrs[0].ndim == 1:
--> 273 return _nx.concatenate(arrs, 0)
    274 else:
    275 return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions

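What is causing my problem here? How can I fix it? As far as I can tell, I should be able to join these columns. What have I misunderstood?

Thank you.

Edit:

Using the method from the answer below gives the following error: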
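
A likely cause, judging from the shapes and the traceback: `tfv.transform` returns a SciPy sparse matrix rather than a NumPy array, and `np.hstack` is not sparse-aware. It appears to wrap the whole matrix in a one-element object array, which would explain why the 1-D branch at line 273 is taken. The sketch below uses small made-up data instead of the CSV files; `X_sparse` and `rank_column` are hypothetical stand-ins for `X` and the `alexarank` column, and `scipy.sparse.hstack` is shown as one possible fix under that assumption, not necessarily the one the original answer proposed.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the TF-IDF output: a small 4x3 sparse matrix.
X_sparse = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                       [0.2, 0.0, 0.0],
                                       [0.0, 0.0, 0.9],
                                       [0.1, 0.0, 0.3]]))

# Hypothetical stand-in for the integer column, already shaped (4, 1).
rank_column = np.array([[10], [250], [3], [47]])

# np.hstack does not understand sparse matrices, so this is expected to fail.
try:
    np.hstack((X_sparse, rank_column))
    numpy_hstack_failed = False
except (ValueError, TypeError) as err:
    numpy_hstack_failed = True
    print("np.hstack failed:", err)

# Convert the dense column to sparse and stack with the sparse-aware hstack.
X_combined = sparse.hstack((X_sparse, sparse.csr_matrix(rank_column)), format="csr")
print(X_combined.shape)
```

Keeping the result sparse matters at the sizes in the question: a dense 7395 x 238377 float64 array would need roughly 14 GB of memory.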

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
---> 36 X = np.column_stack((X, AllAlexaAndGoogleInfo))
     37 sc = preprocessing.StandardScaler().fit(X)
     38 X = sc.transform(X)

C:\Users\Simon\Anaconda\lib\site-packages\numpy\lib\shape_base.pyc in column_stack(tup)
    294 arr = array(arr,copy=False,subok=True,ndmin=2).T
    295 arrays.append(arr)
--> 296 return _nx.concatenate(arrays,1)
    297
    298 def dstack(tup):

ValueError: all the input array dimensions except for the concatenation axis must match exactly
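
This second error is consistent with the input being a SciPy sparse matrix: NumPy's stacking helpers first pass each input through `np.array`, and a sparse matrix is not converted element by element but wrapped whole in an object array, so its shape no longer matches the `(7395, 1)` column. A quick diagnostic sketch, assuming `X` is sparse (`X_sparse` below is a small hypothetical stand-in):

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the TF-IDF matrix.
X_sparse = sparse.csr_matrix(np.eye(3))

# True if the object is a SciPy sparse matrix rather than an ndarray.
print(sparse.issparse(X_sparse))

# NumPy wraps the sparse matrix as a single Python object instead of
# producing a 3x3 numeric array.
wrapped = np.asarray(X_sparse)
print(wrapped.dtype, wrapped.shape)

# An explicit conversion gives a real 2-D array; only sensible for small data.
dense = X_sparse.toarray()
print(dense.shape)
```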

Interestingly, I tried printing the dtype of X, and this works fine:

X.dtype => float64

However, trying to print the dtype of AllAlexaAndGoogleInfo like so:

print "AllAlexaAndGoogleInfo.dtype => " + str(AllAlexaAndGoogleInfo.dtype)

produces:

'DataFrame' object has no attribute 'dtype'
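
The message indicates that `AllAlexaAndGoogleInfo` is still a pandas DataFrame rather than a NumPy array. A DataFrame exposes per-column types through `dtypes` (plural), while `.shape` works on both kinds of object, which is why the earlier debug print looked fine. A minimal sketch with made-up values (the frame `alexa` is a hypothetical stand-in for the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the single-column DataFrame in the question.
alexa = pd.DataFrame({"alexarank": [10, 250, 3, 47]})

print(alexa.shape)               # shape is available on a DataFrame
print(alexa.dtypes)              # per-column dtypes (note the plural)
print(hasattr(alexa, "dtype"))   # a DataFrame has no single dtype attribute

# Convert to a plain 2-D NumPy array before mixing with other arrays.
alexa_array = alexa.to_numpy()   # alexa.values on older pandas versions
print(type(alexa_array), alexa_array.shape, alexa_array.dtype)
```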
