I am learning Python and scikit-learn and working through some simple exercises. In this particular case, I run the following code:
import pandas as pd
df = pd.read_csv('SMSSpamCollection', delimiter='\t', header=None)  # from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score  # was sklearn.cross_validation before scikit-learn 0.18
X_train_raw, X_test_raw, y_train, y_test = train_test_split(df[1], df[0])
Printing the first five elements of X_test_raw gives:
3035 Get ready for <#> inches of pleasure...
2577 In sch but neva mind u eat 1st lor..
3302 RCT' THNQ Adrian for U text. Rgds Vatian
90 Yeah do! Don‘t stand to close tho- you‘ll catc...
2355 R we going with the <#> bus?
Name: 1, dtype: object
Then I index the first elements of X_test_raw one at a time:
X_test_raw[0]
Out: 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'
Then:
X_test_raw[1]
Out: 'Ok lar... Joking wif u oni...'
Then:
X_test_raw[2]
Out: KeyError: 2L
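The same pattern seems to show up without the dataset. Below is a minimal sketch on a toy Series, assuming that what matters is an integer index that is out of order and has gaps (as it would be after a shuffled split). The values and index labels are made up for illustration and are not taken from the SMS data; `.iloc` is pandas' explicitly positional accessor.

```python
import pandas as pd

# Hypothetical toy data: the integer index is deliberately shuffled and
# contains no label 2, similar to a Series produced by a shuffled split.
s = pd.Series(['a', 'b', 'c', 'd'], index=[3, 0, 1, 5])

first_two = s[:2]          # integer slice: the first two rows by position
by_label_0 = s[0]          # single integer: looked up as the index label 0
by_position_0 = s.iloc[0]  # explicitly positional: the first row

# Label 2 is not in the index, so plain integer indexing fails.
try:
    s[2]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(first_two)
print(by_label_0, by_position_0, missing_label_raises)
```

Here the slice returns the rows labelled 3 and 0, while `s[0]` returns the row whose label is 0, which is not the first row.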
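What is going on? Why do I get different values when I slice the first 5 elements of the series than when I index each element individually? And why do I get a KeyError when I index the third element of the series?
Any advice would be greatly appreciated.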