stop words是指自然语言处理当中会被过滤掉的一些单词,一般是指无意义的定冠词,不定冠词(a,an,the
), 连接词(of,but...
),这个并没有统一的标准,而是针对具体的任务和文档来说,那些高频经常出现的词语因为对具体任务来说其实没有帮助(比如文档分类,几乎每个文档都有上面提到的词语,对分类没有任何帮助),所以在处理的时候会去掉这些单词,来提升针对性任务的结果。虽然没有统一的stop word list,但是有很多小众使用的stop words,下面列出一部分,具体参见中英文停止词表
able
about
above
according
accordingly
across
actually
after
afterwards
again
against
ain't
all
allow
allows
almost
alone
along
already
also
although
always
am
among
amongst
an
and
another
any
anybody
anyhow
anyone
anything
anyway
anyways
anywhere
apart
appear
appreciate
appropriate
are
aren't
around
as
a's
aside
ask
asking
associated
at
available
away
awfully
be
became
because
become
becomes
becoming
been
before
beforehand
behind
being
believe
below
beside
besides
best
better
between
beyond
both
brief
but
by
came
can
cannot
cant
can't
cause
causes
certain
相信通过上面,大家应该能理解stop words的意思了。