C++ Learning Journey | PTA (Advanced Level): 1107 Social Clusters (30 points) (annotated) (union-find) (concise)

1107 Social Clusters (30 points)
When registering on a social network, you are always asked to specify your hobbies in order to find potential friends with the same hobbies. A social cluster is a set of people who have some of their hobbies in common. You are supposed to find all the clusters.
Input Specification:

Each input file contains one test case. For each test case, the first line contains a positive integer N (≤1000), the total number of people in a social network. Hence the people are numbered from 1 to N. Then N lines follow, each gives the hobby list of a person in the format:
$K_i$: $h_i[1]$ $h_i[2]$ $\ldots$ $h_i[K_i]$
where $K_i$ (>0) is the number of hobbies, and $h_i[j]$ is the index of the j-th hobby, which is an integer in [1, 1000].
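Reading one of these lines in C++ mostly comes down to skipping the colon after $K_i$. The solution further down does this with `getchar()`, which assumes the colon immediately follows the number. A slightly more tolerant sketch reads the colon into a `char` instead, because `>>` skips whitespace first, so both `3:` and `3 :` parse. The helper name `readHobbies` is hypothetical, and people are stored 0-based here.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: reads N, then N lines of the form "K: h[1] h[2] ... h[K]".
// Returns hobbies[p] = hobby list of person p (people are 0-based here).
std::vector<std::vector<int>> readHobbies(std::istream& in) {
    int n = 0;
    in >> n;
    std::vector<std::vector<int>> hobbies(n);
    for (int p = 0; p < n; ++p) {
        int k = 0;
        char colon = 0;
        in >> k >> colon;  // reading an int stops at ':'; the char read then consumes it
        assert(colon == ':' && k > 0);
        hobbies[p].resize(k);
        for (int j = 0; j < k; ++j) in >> hobbies[p][j];
    }
    return hobbies;
}
```

For testing, the input can come from a `std::istringstream` instead of `std::cin`, e.g. `readHobbies(in)` with `in` built from `"2\n3: 2 7 10\n1: 4\n"`.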
Output Specification:

For each case, print in one line the total number of clusters in the network. Then in the second line, print the numbers of people in the clusters in non-increasing order. The numbers must be separated by exactly one space, and there must be no extra space at the end of the line.
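The "no extra space at the end" rule is easiest to satisfy by writing a separator only between numbers, never after the last one. A small sketch that builds the second output line as a string (the helper name `joinWithSpaces` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins the cluster sizes with single spaces and no trailing space.
std::string joinWithSpaces(const std::vector<int>& sizes) {
    std::string line;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (i > 0) line += ' ';  // separator goes before every number except the first
        line += std::to_string(sizes[i]);
    }
    return line;
}
```

For the sample, `joinWithSpaces({4, 3, 1})` returns `"4 3 1"`.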
Sample Input:

8
3: 2 7 10
1: 4
2: 5 3
1: 4
1: 3
1: 4
4: 6 8 1 5
1: 4
Sample Output:

3
4 3 1
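Clusters are formed transitively. In the sample, person 7 shares hobby 5 with person 3, who shares hobby 3 with person 5, so {3, 5, 7} is one cluster even though persons 5 and 7 have no hobby in common. The four people with hobby 4 form a second cluster and person 1 is alone, giving 3 clusters of sizes 4 3 1. In graph terms, the answer is the set of connected components of a graph that links people to their hobbies. The solution below uses union-find; for comparison, here is a sketch of the same count using breadth-first search instead (a swapped-in technique, not the author's method). The name `clusterSizesBFS` and the 0-based person numbering are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: hobbies[p] lists the hobby ids (1..1000) of person p (0-based).
// Returns the cluster sizes in non-increasing order.
std::vector<int> clusterSizesBFS(const std::vector<std::vector<int>>& hobbies) {
    const int kMaxHobby = 1000;
    const int n = static_cast<int>(hobbies.size());
    // Reverse index: people[h] = everyone who listed hobby h.
    std::vector<std::vector<int>> people(kMaxHobby + 1);
    for (int p = 0; p < n; ++p)
        for (int h : hobbies[p]) {
            assert(h >= 1 && h <= kMaxHobby);
            people[h].push_back(p);
        }
    std::vector<bool> seenPerson(n, false), seenHobby(kMaxHobby + 1, false);
    std::vector<int> sizes;
    for (int start = 0; start < n; ++start) {
        if (seenPerson[start]) continue;  // already counted in an earlier cluster
        int size = 0;
        std::queue<int> q;
        q.push(start);
        seenPerson[start] = true;
        while (!q.empty()) {
            int p = q.front();
            q.pop();
            ++size;
            for (int h : hobbies[p]) {
                if (seenHobby[h]) continue;  // each hobby's member list is scanned only once
                seenHobby[h] = true;
                for (int other : people[h])
                    if (!seenPerson[other]) {
                        seenPerson[other] = true;
                        q.push(other);
                    }
            }
        }
        sizes.push_back(size);
    }
    std::sort(sizes.begin(), sizes.end(), std::greater<int>());
    return sizes;
}
```

Marking each hobby as visited keeps the work proportional to the total number of hobby entries, since no hobby's member list is expanded twice.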

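#include<iostream>
#include<vector>
#include<algorithm>
#include<cstdio>//for getchar()
using namespace std;
const int maxn = 1002;
int course[maxn]{ 0 };//course[h] = the first person found with hobby h (0 = nobody yet)
int pre[maxn]{ 0 };//pre[i] = parent of person i in the union-find forest
vector<int>v(maxn);//v[root] = number of people in the cluster rooted at root
int find(int x)//union-find template: follow parent links up to the root
{
	while (x != pre[x]) x = pre[x];
	return x;
}
void merge(int a, int b)//union-find template: merge the sets containing a and b
{
	int x = find(a);
	int y = find(b);
	if (x != y) pre[x] = y;
}
bool cmp(int a, int b)//custom comparator for descending order
{
	return a > b;//must stay strict: using >= violates std::sort's strict weak ordering and can cause a segmentation fault
}
int main()
{
	int n, k, value;//value holds a hobby index
	for (int i = 1; i < maxn; i++) pre[i] = i;//initialize pre: every person starts as their own root
	cin >> n;
	for (int i = 1; i <= n; i++)
	{
		cin >> k;
		getchar();//consume the ':' that directly follows K_i
		for (int j = 0; j < k; j++)
		{
			cin >> value;
			if (course[value] == 0) course[value] = i;//nobody has claimed this hobby yet: record person i as its owner
			else merge(i, course[value]);//otherwise, merge person i with that hobby's owner
		}
	}
	int count = 0;//number of clusters
	for (int i = 1; i <= n; i++)
	{
		int fa = find(i);//find the root ("leader") of person i
		if (v[fa] == 0) count++;//first time this root is seen (size still 0): one more cluster
		v[fa]++;//one more person in this cluster
	}
	cout << count << endl;
	sort(v.begin()+1, v.end(), cmp);//sort descending; the nonzero sizes end up in v[1..count]
	int flag = 0;//no space before the first number
	for (int i = 1; i <= count; i++)
	{
		if (v[i] != 0)
		{
			if (flag++ == 0)cout << v[i];
			else cout << " " << v[i];
		}
	}

}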
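The `find` above walks up the parent chain without shortening it, which is fast enough for N ≤ 1000. A common refinement, shown here as an assumption rather than the author's code, adds path halving and union by size, and keeps each root's size inside the structure so the cluster sizes can be read straight off the roots. It keeps the same first-owner idea as `course[]`. The names `DisjointSet` and `clusterSizes` are hypothetical, and people are 0-based.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Union-find with path halving and union by size.
struct DisjointSet {
    std::vector<int> parent, setSize;
    explicit DisjointSet(int n) : parent(n), setSize(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);  // every element starts as its own root
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving: point x at its grandparent
            x = parent[x];
        }
        return x;
    }
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (setSize[a] < setSize[b]) std::swap(a, b);  // attach the smaller tree under the larger
        parent[b] = a;
        setSize[a] += setSize[b];
    }
};

// hobbies[p] = hobby ids (1..1000) of person p. Each hobby remembers the first person
// who listed it (like course[] above); later people with that hobby join that person.
std::vector<int> clusterSizes(const std::vector<std::vector<int>>& hobbies) {
    const int n = static_cast<int>(hobbies.size());
    DisjointSet ds(n);
    std::vector<int> firstOwner(1001, -1);  // -1 = nobody has listed this hobby yet
    for (int p = 0; p < n; ++p)
        for (int h : hobbies[p]) {
            assert(h >= 1 && h <= 1000);
            if (firstOwner[h] == -1) firstOwner[h] = p;
            else ds.unite(p, firstOwner[h]);
        }
    std::vector<int> sizes;
    for (int p = 0; p < n; ++p)
        if (ds.find(p) == p) sizes.push_back(ds.setSize[p]);  // one entry per root
    std::sort(sizes.begin(), sizes.end(), std::greater<int>());
    return sizes;
}
```

On the sample hobby lists this yields `{4, 3, 1}`, and `sizes.size()` is the cluster count printed on the first line.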