Python: Using NLTK to discover interactions between entities in text

#!/usr/bin/python 
# -*- coding: utf-8 -*-

'''
Created on 2015-1-26
@author: beyondzhou
@name: entity_interaction_discovery.py
'''

import json
from interaction import extract_interactions

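BLOG_DATA = r"E:\eclipse\Web\dFile\feed.json"

# Load the saved feed: a list of posts, each with 'title' and 'content'
with open(BLOG_DATA) as f:
    blog_data = json.load(f)

# Display selected interactions on a per-sentence basis
for post in blog_data:
    # Merge the 'entity_interactions' and 'sentences' keys into the post
    post.update(extract_interactions(post['content']))
    print(post['title'])
    print('-' * len(post['title']))
    for interactions in post['entity_interactions']:
        # Each interaction is a (text, pos_tag) pair; show only the text
        print('; '.join(i[0] for i in interactions))
    print('')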
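
The loop above expects feed.json to hold a list of posts, each with at least a 'title' and a 'content' key; that shape is inferred from the code, not shown in the post. Below is a minimal sketch of how one post gets rendered, using a hand-made dictionary in place of real extract_interactions output. The `render_post` helper and the sample data are hypothetical and exist only for illustration.

```python
def render_post(post):
    """Return the display lines for one post, mirroring the loop above."""
    lines = [post['title'], '-' * len(post['title'])]
    for interactions in post['entity_interactions']:
        # Each interaction is a (text, pos_tag) pair; only the text is shown.
        lines.append('; '.join(text for text, _pos in interactions))
    return lines


# Hypothetical post, already merged with the dictionary that
# extract_interactions returns (one list of pairs per sentence).
sample_post = {
    'title': 'Sample post',
    'content': 'Alice met Bob. Hello!',
    'entity_interactions': [
        [('Alice', 'NNP'), ('Bob', 'NNP')],  # sentence with two entities
        [],                                  # sentence with fewer than two
    ],
}

for line in render_post(sample_post):
    print(line)
```

A sentence with fewer than two detected entities contributes an empty list, so it is printed as an empty line. This is why blank lines appear inside each post's section of the output.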

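# interaction.py: the module imported by the script above.
# Requires the NLTK tokenizer and POS tagger data (see nltk.download()).

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
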
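def extract_interactions(txt):
    """Return, per sentence, the noun chunks ("entities") found in txt."""
    sentences = sent_tokenize(txt)
    tokens = [word_tokenize(s) for s in sentences]
    pos_tagged_tokens = [nltk.pos_tag(t) for t in tokens]

    entity_interactions = []
    for sentence in pos_tagged_tokens:

        all_entity_chunks = []
        previous_pos = None
        current_entity_chunk = []

        for (token, pos) in sentence:
            if pos == previous_pos and pos.startswith('NN'):
                # Same noun tag as the previous token: extend the chunk
                current_entity_chunk.append(token)
            elif pos.startswith('NN'):
                # A new noun chunk starts: emit the finished one first.
                # The tag stored is that of the token starting the new chunk.
                if current_entity_chunk:
                    all_entity_chunks.append((' '.join(current_entity_chunk), pos))
                current_entity_chunk = [token]

            previous_pos = pos

        # Note: the chunk still open when the sentence ends is not appended.

        # Keep only sentences with more than one entity; the empty list
        # preserves the one-entry-per-sentence alignment.
        if len(all_entity_chunks) > 1:
            entity_interactions.append(all_entity_chunks)
        else:
            entity_interactions.append([])

    assert len(entity_interactions) == len(sentences)

    return dict(entity_interactions=entity_interactions,
                sentences=sentences)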
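
To see what the inner loop does without downloading NLTK's tokenizer and tagger models, the same chunking logic can be run on a hand-tagged sentence. The sketch below is an illustration, not the author's code: `chunk_nouns` and the sample tags are hypothetical. The `flush_last` switch is an added option showing how the final run of nouns, which the listing above never appends, could be kept. It returns only the chunk text, since that is all the display loop prints.

```python
def chunk_nouns(tagged_sentence, flush_last=False):
    """Group consecutive tokens that share the same noun tag (NN, NNS, NNP...).

    With flush_last=False the grouping mirrors the loop in
    extract_interactions: a chunk is emitted only when the next noun chunk
    begins, so the last chunk of the sentence is dropped.
    """
    chunks, current, previous_pos = [], [], None
    for token, pos in tagged_sentence:
        if pos == previous_pos and pos.startswith('NN'):
            current.append(token)                 # extend the running chunk
        elif pos.startswith('NN'):
            if current:
                chunks.append(' '.join(current))  # emit the finished chunk
            current = [token]                     # start a new chunk
        previous_pos = pos
    if flush_last and current:
        chunks.append(' '.join(current))          # keep the trailing chunk
    return chunks


# Hand-assigned Penn Treebank style tags (an assumption, not tagger output).
tagged = [('Andy', 'NNP'), ('Goodman', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
          ('designer', 'NN'), ('and', 'CC'), ('group', 'NN'),
          ('director', 'NN'), ('.', '.')]

print(chunk_nouns(tagged))
print(chunk_nouns(tagged, flush_last=True))
```

Without the flush, "group director" is lost because no later noun triggers its emission. Whether to keep it is a design choice; the output in this post was produced by the listing as shown, without the flush.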

Output of entity_interaction_discovery.py:

Four short links: 23 January 2015
---------------------------------
Investment Areas — I’m; struck; cluster; cloud development; sensors; Pattern; web mining module; Python
tools; data; Google; Twitter; Wikipedia API; web crawler; HTML DOM; parser; language processing; taggers; search; sentiment analysis; WordNet ); machine learning; vector space model; SVM ); network analysis; canvas

Expectations; Brilliance Underlie Gender Distributions Across Academic Disciplines ( Science ) — Surveys; fields; attributes; brilliance; fields
fields; people; raw talent; departments; percentages


Designing on a system level
---------------------------
Andy Goodman; designer; group director
Goodman; design; teams; globe
Goodman; contributor; Designing; Emerging; Technologies; —; conversation; covers embeddables; wearables
conversation; Goodman; design”; service designer; design
it’s; system level; design; thinking; systems; computer; systems; systems; computer; systems; systems
moments; journeys; context; using different; devices; objects
idea; interactions; way; interactions
things; way; work

Bitcoin is just the first app to use blockchain technology
----------------------------------------------------------
Editor’s; Lorne Lantz; program; Radar Summit; Bitcoin; Blockchain; January
program; registration information; visit; Bitcoin; Blockchain


time; bunch; computer; geeks

bitcoin; believers

currencies; payment; networks; bitcoin; bank; government
thousands; computers; world verify; transactions; manage
technology; blockchain; pathway; time; computers
case; consensus; updates


Blockchain scalability
----------------------
Author; Vitalik Buterin
Editor’s; Kieren James-Lubin; program; Radar Summit; Bitcoin; Blockchain; January
program; registration information; visit; Bitcoin; Blockchain
talk; CoinJar; fall; bitcoin expert; Andreas Antonopoulos; comment; worries; bitcoin; reason; IPv4 can’t; day.”; issue; bitcoin scalability; phrase “blockchain scalability”; discussions
Will; requirements; bitcoin transaction; blockchain compromise; security; users; copy; blockchain; ability; number; transactions; blocks; transactions
article; we’ll; explore; meanings; scalability”; solutions
stumbling; blocks; scalability; tendency; centralization; blockchain; blockchain; requirements; storage; bandwidth; power; nodes”; network; risk; centralization; blockchain; nodes; process
issue; blockchain; limit; megabyte; block; minutes; limit; fork”

processing; fees; bitcoin; transactions; fees

consider; issues


Bringing an end to synthetic biology’s semantic debate
------------------------------------------------------------
Editor’s; podcast; part; investigation
topics; copy; edition; BioCoder; publication
Free; downloads
Tim Gardner; founder; Riffyn; Synthetic Biology Working Group; Commission Scientific Committees; biology; assess; risk assessment; methodologies; research
Gardner; Radar Podcast; episode; biology landscape; issues; research; experimentation; addressing
biology; Download; edition; areas; investigation; EU’s; Synthetic Biology Working Group
definition; reads; application; science; technology; engineering; design; manufacture and/or modification; materials; living organisms.”; Gardner; significance; definition; part; manufacture; modification; materials; living organisms.’; Biotechnologies; manipulation; biology; anything; materials; living
That’s; debate; ‘this; biology; that’s; biology; protein engineer; someone; gene; circuits; someone; protein engineer; biologist; parts libraries; modularity; boundaries
advances; capabilities



Building and deploying large-scale machine learning pipelines
-------------------------------------------------------------
algorithms; implementations; scale; data sets; list; matrix factorization; SVM; regression; LASSO
fact; machine learning; experts; problem; optimization problem
course; practice; machine; projects; optimization
Data; scientists; data projects; problems; machine learning
Decisions; stage affect; things; downstream; interactions; parts; pipeline; area

Source; Ben Recht
World New York; presentation; UC Berkeley Professor Ben Recht; UC Berkeley AMPLab; building; machine learning
ties; Spark; community; ideas; projects


Four short links: 22 January 2015
---------------------------------
Microsoft HoloLens Goggles ( Wired ) —; media release; thing; person
I’m; investors; sure




climb; irrelevance
Facebook ( YouTube ) —; brilliant fake; ad

Natural Language; Social Robotics ( Robohub ) — Natural; language; interfaces
GUI; command line; NLP; robots; Internet; things; wearables; systems; Apple’s; Siri; Google’s Now; Microsoft’s Cortana; Nuance’s Nina; Amazon’s Echo

Natural; language interaction; robots; anything
It’s; form; UX
Microservices; Testing ( Martin Fowler ) —; testing; component; boundaries; face; data stores; HTTP
discussion; testing; world

How to make a UX designer
-------------------------

case; Heather Wydeven; UX; designer; Nerdery; UX; theater
interview; Wydeven; time; route; UX; design; entering; UX; field; designers; designers
spending; years; theater; Wydeven; skills; career
UX; design; UX; root; motivation; something; UX; designers; recognition; things; desire
design; Wydeven; “I; web design; UX; design; time; know; websites; experiences; websites; ‘There’s
job; websites

The 3Ps of the blockchain: platforms, programs and protocols
------------------------------------------------------------
buzz lingo; Service”; BaaS; Platform”; BaaP; burgeoning landscape; implementations; activity; blockchain’s; consensus protocol
blockchain’s; sweet spot; development platform; “ Understanding; blockchain; surprise; landscape; platforms; protocols; smart
Breaking-up; paradigm; perfect world; blockchain

adoption; mass; users; market; diversification; choices; bitcoin currency; blockchain protocol


Four short links: 21 January 2015
---------------------------------
PC; Mouse —; PC




G+; Usage ( BoingBoing ) —; profiles

Medium; Data —; machine
New Hardware; DARPA Robotics Challenge Finals ( IEEE ) —; future; we’ll; kwh battery; wireless router

The Internet of Things is really about software
-----------------------------------------------
Download; report; Internet
cover; Harvard Business Review; November; observers; demo

years; companies; developments
goods; software; realm
Internet; Things; changes; kinds; software intelligence; have; industries; finance
Mike Loukides; idea; Internet; Things; impacts; report; Internet; Things
romance; gratification; hardware; Internet; Things; software; hardware; Internet; rest
IoT; area; software; it’s; characteristics; things; seen; web software


What containers can do for you
------------------------------
days; headline; “the; container; engine; buzz; CoreOS; project; Rocket
technology; containers; advantages; containers
inherent portability; opportunities; organizations
Containerization; moment; there’s


Four short links: 20 January 2015
---------------------------------

Mind; Eyes; Reading
Theory; Mind; Predicts; Collective Intelligence ( PLoS ) —; theory; mind; abilities; determinant; group; intelligence; online; groups; group; communication
Phone/Skype; emails; chats; activities; person
MIT; Faculty Search —; gigs; MIT; climate change; “undefined.”; Great

— evaluation; systems
Folks; scalability; fact; scalability; end; performance
performance; benefits; systems


Striking parallels between mathematics and software engineering
---------------------------------------------------------------
Editor’s; Alice Zheng; part; team teaching; Machine Learning Day; Strata + Hadoop World
Visit; Strata + Hadoop World; website; information
year; graduate school; epiphany; mathematics; whole perspective
study machine; research area; combines elements; computer science; statistics; subfields; mathematics; optimization
lot; students; deluge
night; office
textbook; guide; Introduction; Linear Algebra

definitions; eigen decomposition; Jordan; forms; matrix; inversions


Come; matrix; operations
hopeless wall; symbols; flash; went
insight; math
moment; mathematics; truth; transcendent; perfection
night; mathematics
Math; software; programs; design




Four short links: 19 January 2015
---------------------------------
Reset; Rowan Simpson ) — It; bit; years; worth; tweets
water cooler; fine; somebody; spends; day
Google’s; Brain —; subject; Google’s; AI; ethics; committee; Q; Will


Q; Transparency


AVA; Source ( Laura Bell ) — Assessment; Visualization; Analysis
AVA; realities; organisation; structures
map; people; entities; suite; information security awareness
Deep Learning; Torch; Facebook; Facebook AI Research; sources; deep; modules; Torch; computing framework; support; machine
