One of the most populair Ruby on Rails plugins is acts_as_taggable. More specifically the enhanced version acts_as_taggable on steriods. It allows you to easily add tags to any model and helps you generate tag clouds.
After reading Nate Koechley article on Yahoo’s term extractor API i was inspired to connect it to acts_as_taggable_on_steroids. The goal is to parse any peace of content (article, blog post, review etc.) and let Yahoo return a list of terms or tags.
Setting up
The first thing you need to do is install the acts_as_taggable_on_steroids plugin.
Rails 2.* ruby script/plugin install git://github.com/jviney/acts_as_taggable_on_steroids.git Rails 3: rails plugin install git://github.com/jviney/acts_as_taggable_on_steroids.git
Prepare the database
Next we generate and apply the migration:
ruby script/generate acts_as_taggable_migration
rake db:migrate
Define which model you want to use
class Post <ActiveRecord::Base acts_as_taggable belongs_to :userend
You can now use the tagging methods provided by acts_as_taggable, #tag_list and #tag_list=. Both these methods work like regular attribute accessors.
p = Post.find(:first)
p.tag_list # []
p.tag_list = “Funny, Silly”
p.save
p.tag_list # ["Funny", "Silly"]
You can also add or remove arrays of tags.
p.tag_list.add(“Great”, “Awful”)
p.tag_list.remove(“Funny”)
Sign up at Yahoo
You need to sign up for an application ID at Yahoo! Web Services. You need the application ID as an identifier when making API call.
Add parse_tags function
I have written a small function that you can place inside your Posts model and use to create the API call for you.
class Post <ActiveRecord::Base acts_as_taggable belongs_to :user defself.parse_tags(context, query) tags = [] #yahoo parsing of tags url = URI.parse('http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction') post_args = {'appid'=>'your application ID', 'context'=> context, 'query'=> query } resp, data = Net::HTTP.post_form(url, post_args) # extract event information doc = REXML::Document.new(data) doc.elements.each('ResultSet/Result')do|element| tags << element.textend return tags endend
Context is the text part you would like to scan and query is an optional query to help with the extraction process. (usually a title or subject)
Lets say you have a post model with a body and title column. Now you can do the following:
p = Post.new
p.body = “Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration.”
p.title = “madonna”
p.tag_list = Post.parse(p.body, p.title)
p.tag_list # “italian sculptors, virgin mary, painters, renaissance, inspiration”
Note: Rate Limit
The Term Extraction service is limited to 5,000 queries per IP address per day.
Sources: Yahoo! Search Web Services, acts_as_taggable on steroids