Another free word frequency list, a must-have for learning English.

Latest recommended article published on 2019-10-04 10:59:20

love_hot_girl

Article tags: access web

N-GRAMS from the COCA and COHA corpora of American English

These n-grams are based on the largest publicly available, genre-balanced corpus of English: the 450-million-word Corpus of Contemporary American English (COCA). With this n-gram data (2-, 3-, 4-, and 5-word sequences, with their frequencies), you can carry out powerful queries offline, without needing to access the corpus via the web interface.

A few examples (from among an unlimited number of searches) might be:

NOUN + NOUN sequences
VERB + the + NOUN sequences
like + word + word
three word strings with a preposition in the middle position
two word strings, where the words begin or end with certain letters
(potential) phrasal verb: VERB + ADV particle
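
As a rough illustration of the searches listed above, here is a minimal Python sketch that filters a handful of n-gram rows. It assumes each row is tab-separated as frequency, then the words, then one part-of-speech tag per word; the real column layout and tag set of the downloadable files may differ. The sample rows, their counts, the simplified tags (NOUN, VERB, and so on), and the helper names `parse_rows`, `match`, and `search` are all invented for the demonstration.

```python
from typing import List, NamedTuple, Optional, Tuple


class NGram(NamedTuple):
    freq: int
    words: Tuple[str, ...]
    tags: Tuple[str, ...]


# Invented sample rows (NOT real COCA counts). Assumed layout:
# freq <TAB> word1 ... wordN <TAB> tag1 ... tagN
SAMPLE = """\
120\tcoffee\tshop\tNOUN\tNOUN
95\tlike\ta\tlot\tVERB\tDET\tNOUN
80\topen\tthe\tdoor\tVERB\tDET\tNOUN
60\tgive\tup\tVERB\tADV
40\tin\tthe\tmiddle\tPREP\tDET\tNOUN
30\tone\tof\tthem\tNUM\tPREP\tPRON
"""


def parse_rows(text: str) -> List[NGram]:
    """Split each line into a frequency, N words, and N tags."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        n = (len(fields) - 1) // 2
        rows.append(NGram(int(fields[0]),
                          tuple(fields[1:1 + n]),
                          tuple(fields[1 + n:1 + 2 * n])))
    return rows


# A pattern is one (word, tag) pair per position; None means "anything".
Slot = Tuple[Optional[str], Optional[str]]


def match(row: NGram, pattern: List[Slot]) -> bool:
    if len(row.words) != len(pattern):
        return False
    for word, tag, (want_word, want_tag) in zip(row.words, row.tags, pattern):
        if want_word is not None and word != want_word:
            return False
        if want_tag is not None and tag != want_tag:
            return False
    return True


def search(rows: List[NGram], pattern: List[Slot]) -> List[NGram]:
    """Return matching rows, most frequent first."""
    return sorted((r for r in rows if match(r, pattern)),
                  key=lambda r: -r.freq)


ROWS = parse_rows(SAMPLE)

noun_noun = search(ROWS, [(None, "NOUN"), (None, "NOUN")])
verb_the_noun = search(ROWS, [(None, "VERB"), ("the", None), (None, "NOUN")])
like_w_w = search(ROWS, [("like", None), (None, None), (None, None)])
prep_middle = search(ROWS, [(None, None), (None, "PREP"), (None, None)])
phrasal = search(ROWS, [(None, "VERB"), (None, "ADV")])

# Two-word strings where the first word begins with "c" and the second ends with "p".
by_letters = [r for r in ROWS
              if len(r.words) == 2
              and r.words[0].startswith("c")
              and r.words[1].endswith("p")]
```

Keeping each pattern as a list of (word, tag) slots means a new query is just a new list, so the same `search` function covers all of the word-form and part-of-speech examples.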

The data is available in several different formats:

1	Free lists	1 million most frequent 2, 3, 4, and 5-grams
2	Inexpensive data sets	All n-grams that occur three times or more: 6.2 million 2-grams, 11.9 million 3-grams, and 8.3 million 4-grams
3	All 2, 3, and 4-grams	Up to 155 million distinct strings -- searchable by word form and part of speech (as above), and also lemma
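
The "three times or more" cutoff and the "most frequent" lists described above amount to a filter and a sort on the frequency column. The following is a small sketch of that idea, assuming a tab-separated layout with the frequency in the first column. That layout is an assumption, the sample rows and counts are invented, and `read_ngrams` is a hypothetical helper name.

```python
import csv
import heapq
import io

# Invented rows (NOT real corpus counts); assumed layout: freq <TAB> word1 <TAB> word2
SAMPLE = "12\tof\tthe\n2\tpurple\tcow\n3\tgive\tup\n7\tin\ta\n"


def read_ngrams(handle):
    """Yield (frequency, words) pairs from a tab-separated stream."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) >= 2:
            yield int(fields[0]), tuple(fields[1:])


# Keep only n-grams that occur three times or more.
kept = [(freq, words)
        for freq, words in read_ngrams(io.StringIO(SAMPLE))
        if freq >= 3]

# The two most frequent of the remaining n-grams.
top2 = heapq.nlargest(2, kept)
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle; because `read_ngrams` is a generator, the threshold filter does not need the whole file in memory at once.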

If you're interested in the frequency of single words (including frequency by genre and sub-genre), or collocates (all words "near by" a given word), you might look at http://www.wordfrequency.info.