Compound Words

This post presents an algorithm that finds, in a given word set, every compound word formed by concatenating exactly two other words from the set. The approach is a brute-force split of each word, with a map container used to make the lookups fast.


Description

You are to find all the two-word compound words in a dictionary. A two-word compound word is a word in the dictionary that is the concatenation of exactly two other words in the dictionary.

Problem: given a word set S, find every element p of S such that p == str1 + str2, where str1 ∈ S and str2 ∈ S.
Approach: brute force over every split point of each word, using the lookup functions of set or map to test both halves. I use a map container here, as sketched below; if you are unfamiliar with the map container, see the posts on containers elsewhere on this blog.
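
The split-and-lookup idea can be written as a small helper. This is only a minimal sketch: the names isCompound and dict are my own, and it assumes the word set has already been loaded into the map.

#include<map>
#include<string>
using namespace std;

// Sketch only: returns true if w is the concatenation of two words in dict.
// 'isCompound' and 'dict' are illustrative names, not part of the solution code below.
bool isCompound(const string& w, const map<string,int>& dict)
{
    for(size_t j=1;j<w.size();j++)            // try every split point
    {
        string prefix=w.substr(0,j);          // first candidate word
        string suffix=w.substr(j);            // second candidate word
        if(dict.count(prefix)&&dict.count(suffix))
            return true;                      // both halves are dictionary words
    }
    return false;
}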

Input

Standard input consists of a number of lowercase words, one per line, in alphabetical order. There will be no more than 120,000 words.

Output

Your output should contain all the compound words, one per line, in alphabetical order.

Sample Input

a
alien
born
less
lien
never
nevertheless
new
newborn
the
zebra

Sample Output

alien
newborn

Code:

#include<iostream>
#include<string>
#include<vector>
#include<map>
using namespace std;

vector<string> words;   // every dictionary word, in input (alphabetical) order
map<string,int> anc;    // lookup table: word -> 1 if the word is in the dictionary

int main()
{
    string w;
    while(getline(cin,w))        // one lowercase word per line, until EOF
    {
        if(w.empty()) continue;
        words.push_back(w);
        anc[w]=1;
    }
    for(size_t i=0;i<words.size();i++)
    {
        int l=words[i].size();
        for(int j=1;j<l;j++)     // split point: prefix of length j, suffix of length l-j
        {
            string q=words[i].substr(0,j);   // first half
            string p=words[i].substr(j);     // second half
            // count() looks a word up without inserting a new key into the map
            if(anc.count(q)&&anc.count(p))
            {
                cout<<words[i]<<endl;
                break;           // report each compound word only once
            }
        }
    }
    return 0;
}
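
Two details in the code matter for correctness: the split point j runs from 1 to l-1 so that one-character halves such as "a" are not skipped, and count() is used instead of operator[] so that failed lookups do not insert empty entries into the map. Since the input is already in alphabetical order, printing compound words as they are found keeps the output sorted. Each word of length l costs roughly O(l² log N) string comparisons to check, which is comfortably fast for the 120,000-word limit and the short words of a typical dictionary.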


 
