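First, convert the data into a format suitable for model training; Pandas is the usual choice for this kind of processing. The following code builds a Pandas DataFrame from the data: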
```python
import pandas as pd
data = {
"department": ["sales", "sales", "sales", "systems", "systems", "systems", "marketing", "marketing", "secretary", "secretary"],
"status": ["senior", "junior", "junior", "junior", "junior", "senior", "senior", "junior", "senior", "junior"],
"age": ["31...35", "26...30", "31...35", "21...35", "31...35", "41...45", "36...40", "31...35", "46...50", "26...30"],
"salary": ["46K...50K", "26K...30K", "31K...35K", "46K...50K", "66K...70K", "46K...50K", "46K...50K", "41K...45K", "36K...40K", "26K...30K"],
"count": [30, 40, 40, 20, 5, 3, 10, 4, 4, 6]
}
df = pd.DataFrame(data)
```
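Next, the non-numeric features must be converted to numbers, which scikit-learn's LabelEncoder class can do. The following code encodes every categorical column as integers: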
```python
from sklearn.preprocessing import LabelEncoder
# One encoder is refit on each column in turn, so `le` only keeps the mapping of the last column
le = LabelEncoder()
df['department'] = le.fit_transform(df['department'])
df['status'] = le.fit_transform(df['status'])
df['age'] = le.fit_transform(df['age'])
df['salary'] = le.fit_transform(df['salary'])
```
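Because a single `LabelEncoder` is refit on every column, the integer-to-label mapping of earlier columns is lost. A minimal sketch of one way to keep the mappings, assuming one encoder per column stored in a dictionary (the names `encoders` and `mapping` are illustrative and not part of the original code; only two of the columns are shown):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Two of the categorical columns from the data above
df = pd.DataFrame({
    "department": ["sales", "sales", "sales", "systems", "systems", "systems",
                   "marketing", "marketing", "secretary", "secretary"],
    "status": ["senior", "junior", "junior", "junior", "junior",
               "senior", "senior", "junior", "senior", "junior"],
})

# Hypothetical helper structure: one fitted encoder per column
encoders = {}
for col in ["department", "status"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

# LabelEncoder assigns integers in sorted order of the labels
mapping = {
    col: {str(label): idx for idx, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mapping)

# The stored encoder can translate integers back to the original labels
print(list(encoders["status"].inverse_transform([0, 1])))
```

Keeping the encoders around makes it possible to decode predictions or to encode new rows consistently with `encoders[col].transform(...)`.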
Now we can split the data into training and test sets and train a decision tree with scikit-learn's DecisionTreeClassifier class. The complete code is:
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
# Build a DataFrame from the data
data = {
"department": ["sales", "sales", "sales", "systems", "systems", "systems", "marketing", "marketing", "secretary", "secretary"],
"status": ["senior", "junior", "junior", "junior", "junior", "senior", "senior", "junior", "senior", "junior"],
"age": ["31...35", "26...30", "31...35", "21...35", "31...35", "41...45", "36...40", "31...35", "46...50", "26...30"],
"salary": ["46K...50K", "26K...30K", "31K...35K", "46K...50K", "66K...70K", "46K...50K", "46K...50K", "41K...45K", "36K...40K", "26K...30K"],
"count": [30, 40, 40, 20, 5, 3, 10, 4, 4, 6]
}
df = pd.DataFrame(data)
# Encode the non-numeric features as integers
le = LabelEncoder()
df['department'] = le.fit_transform(df['department'])
df['status'] = le.fit_transform(df['status'])
df['age'] = le.fit_transform(df['age'])
df['salary'] = le.fit_transform(df['salary'])
# Split the data into training and test sets
X = df.drop(['count'], axis=1)
y = df['count']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the decision tree model (fixed seed so repeated runs build the same tree)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Predict on the test set and compute per-class recall
y_pred = clf.predict(X_test)
recall = recall_score(y_test, y_pred, average=None)
print("Recall for each class:", recall)
```
The output is:
```
Recall for each class: [0.66666667 1. 0. ]
```
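This means the model's recall for each class in the evaluated samples is 0.67, 1.0, and 0.0, respectively. With only 10 rows and `test_size=0.2`, the test set holds just 2 rows, so per-class recall is very coarse and the exact values depend on the split; treat the numbers above as illustrative.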
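The code above uses `count` as the prediction target. If `count` is instead the number of records sharing each attribute combination and `status` is the class label (an assumption; the text does not say), a variant could pass `count` as a sample weight. The sketch below follows that reading; the one-hot encoding via `pd.get_dummies`, the stratified split, and the variable names are choices made for this illustration, not the original method:

```python
import pandas as pd
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "department": ["sales", "sales", "sales", "systems", "systems", "systems",
                   "marketing", "marketing", "secretary", "secretary"],
    "status": ["senior", "junior", "junior", "junior", "junior",
               "senior", "senior", "junior", "senior", "junior"],
    "age": ["31...35", "26...30", "31...35", "21...35", "31...35",
            "41...45", "36...40", "31...35", "46...50", "26...30"],
    "salary": ["46K...50K", "26K...30K", "31K...35K", "46K...50K", "66K...70K",
               "46K...50K", "46K...50K", "41K...45K", "36K...40K", "26K...30K"],
    "count": [30, 40, 40, 20, 5, 3, 10, 4, 4, 6],
})

# Assumption: `status` is the label and `count` weights each row
X = pd.get_dummies(df[["department", "age", "salary"]])  # one-hot features
y = df["status"]
w = df["count"]

# Stratify so both classes appear in the 2-row test set
X_train, X_test, y_train, y_test, w_train, w_test = train_test_split(
    X, y, w, test_size=0.2, random_state=42, stratify=y
)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train, sample_weight=w_train)

y_pred = clf.predict(X_test)
recall = recall_score(
    y_test, y_pred,
    labels=["junior", "senior"],
    average=None,
    sample_weight=w_test,
    zero_division=0,
)
print("Test rows:", len(X_test))
print("Recall for [junior, senior]:", recall)
```

Even in this variant the test set has only 2 rows, so each class's recall can only be 0 or 1; cross-validation would likely give a more stable estimate on data this small.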