<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Flying memes &#187; wordnet</title>
	<atom:link href="http://sandropaganotti.com/tag/wordnet/feed/" rel="self" type="application/rss+xml" />
	<link>http://sandropaganotti.com</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sun, 25 Jul 2010 12:45:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>A semantic experiment to separate good from bad.</title>
		<link>http://sandropaganotti.com/2010/05/04/a-semantic-experiment-for-separate-good-from-bad/</link>
		<comments>http://sandropaganotti.com/2010/05/04/a-semantic-experiment-for-separate-good-from-bad/#comments</comments>
		<pubDate>Tue, 04 May 2010 11:37:35 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Semantic Relations]]></category>
		<category><![CDATA[wordnet]]></category>

		<guid isPermaLink="false">http://sandropaganotti.com/?p=379</guid>
		<description><![CDATA[Yesterday was Sunday and I came up with a fascinating idea: what happens if I use WordNet to measure the distance between two words? By assigning a weight to each relation type and navigating the resulting graph of relations, I thought I could measure the distance between a word and the others in [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday was Sunday and I came up with a fascinating idea: what happens if I use <a href="http://github.com/roja/words">WordNet</a> to measure the distance between two words? By assigning a weight to each relation type and navigating the resulting graph of relations, I thought I could measure the distance between a word and any other as the minimum sum of the weights of the edges on a path connecting the two.</p>
<p><span id="more-379"></span><br />
So I tried to assign weights using the relation type as the discriminator. As an example, take the word &#8216;sword&#8217; and its relations:</p>
<pre><code>
related-term        relation type           assigned weight
weapon:             hypernym                5
backsword:          hyponym                 2
blade:              part_meronym            3
broadsword:         hyponym                 2
cavalry sword:      hyponym                 2
cutlas:             hyponym                 2
Excalibur:          instance_hyponym        2
falchion:           hyponym                 2
fencing sword:      hyponym                 2
foible:             part_meronym            3
forte:              part_meronym            3
haft:               part_meronym            3
hilt:               part_meronym            3
rapier:             hyponym                 2
point:              part_meronym            3
</code></pre>
<p>The weight I chose for each relation type follows the rule &#8216;the more closely two words are related, the smaller the number&#8217;. So &#8216;weapon&#8217; is less related to &#8216;sword&#8217; than &#8216;broadsword&#8217; is: the former expresses a broader concept than sword (a nuclear bomb is a weapon too), while the latter narrows the word &#8216;sword&#8217; and makes the statement &#8216;A broadsword is always a sword&#8217; true, so it is more closely related to the chosen word.</p>
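<p>As a rough illustration of the rule above, the weights can be kept in a plain hash keyed by relation type. This is only a sketch: the values are the ones from the &#8216;sword&#8217; table, while the constant names and the <code>direct_distance</code> helper are made up for the example and are not part of any library.</p>

```ruby
# Weights copied from the 'sword' table above, keyed by relation type.
RELATION_WEIGHTS = {
  'hypernym'         => 5,
  'hyponym'          => 2,
  'instance_hyponym' => 2,
  'part_meronym'     => 3
}

# A few of the direct relations of 'sword' listed in the same table.
SWORD_RELATIONS = {
  'weapon'     => 'hypernym',
  'broadsword' => 'hyponym',
  'Excalibur'  => 'instance_hyponym',
  'blade'      => 'part_meronym'
}

# Distance of a directly related word: the weight of its relation type.
# (Hypothetical helper, named here only for illustration.)
def direct_distance(word, relations = SWORD_RELATIONS, weights = RELATION_WEIGHTS)
  weights.fetch(relations.fetch(word))
end

# Print the related words, closest first.
SWORD_RELATIONS.keys.sort_by { |word| direct_distance(word) }.each do |word|
  puts "#{word}: #{direct_distance(word)}"
end
```

<p>With these weights &#8216;broadsword&#8217; (a hyponym) ends up closer to &#8216;sword&#8217; than &#8216;weapon&#8217; (a hypernym), which is the intent of the rule.</p>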
<p>Following this general rule, I associated a weight with each of the most common relation types and wrote a few lines of code to compute the distances by navigating the relation graph:</p>
<pre>
<code class="ruby">
def compute_distances(weights = :default, max_depth_allowed = 6)

  # retrieve the weights associated with each relation type
  # (it's just a hash: {relation_type => weight})
  weights = CONFIG_FILE['distance']["#{weights}"]
  data = Words::Wordnet.new

  # get a list of synsets as a starting point (e.g. red, crimson);
  # each entry is [synset_id, distance so far, depth]
  synsets_to_analyze = self.synsets.map{|s| [s.synset_id,0,0]}
  synsets_to_store   = []

  # process the first element of the list
  # until the synsets_to_analyze stack is empty
  while(sys = synsets_to_analyze.shift) do
    sys_id,dis,dep = *sys; next if dep >= max_depth_allowed
    sys = Words::Synset.new(sys_id,data.wordnet_connection,nil) rescue next;

    # save the words of the current synset into an output array
    sys.words.each {|w|  synsets_to_store.unshift([sys_id,w,dis])}

    # push each synset related to this one onto the stack, unless it
    # has already been stored
    sys.relations.each do |r|
      synsets_to_analyze.unshift(
        [r.destination.synset_id, dis + weights["#{r.relation_type}"],dep + 1]
      ) if r.is_semantic? and
           !synsets_to_store.find{|s| s.first == r.destination.synset_id}
    end
  end

  # now synsets_to_store holds an array of words, each with the
  # weight that separates it from the starting synsets.
  # (here I store them in a db, but only because the context is the same as the Abacus gem)
  synsets_to_store.each do |s|
    a_id = ArticleKey.find_by_the_key(s[1]).id rescue next
    self.distances.find_or_create_by_article_key_id( a_id, :distance => s[2])
  end

end
</code>
</pre>
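<p>Note that the loop above appears to walk the graph depth-first and to keep the distance of whichever path reaches a synset first, which is not necessarily the minimum sum of weights. A minimal sketch of a traversal that does guarantee the minimum (Dijkstra&#8217;s algorithm) is shown below. It runs on a tiny hand-written graph rather than on real WordNet data: the graph, the constant names and <code>min_distances</code> are assumptions made for the example, and only the weights come from the table above.</p>

```ruby
# Weights from the 'sword' table above.
WEIGHTS = { :hypernym => 5, :hyponym => 2, :part_meronym => 3 }

# Hand-written toy graph (an assumption, NOT read from WordNet):
# word => list of [related word, relation type]
TOY_GRAPH = {
  'sword'  => [['weapon', :hypernym], ['broadsword', :hyponym], ['blade', :part_meronym]],
  'weapon' => [['gun', :hyponym], ['sword', :hyponym]],
  'blade'  => [['knife blade', :hyponym]]
}

# Dijkstra's algorithm with a plain array as the pending set
# (fine for a toy graph; a priority queue would scale better).
def min_distances(graph, weights, start)
  dist    = { start => 0 }
  pending = [start]
  until pending.empty?
    # always expand the closest pending word next
    current = pending.min_by { |word| dist[word] }
    pending.delete(current)
    (graph[current] || []).each do |neighbour, relation|
      candidate = dist[current] + weights.fetch(relation)
      # keep the candidate only if it improves on the best known distance
      if dist[neighbour].nil? || dist[neighbour] > candidate
        dist[neighbour] = candidate
        pending.push(neighbour) unless pending.include?(neighbour)
      end
    end
  end
  dist
end

min_distances(TOY_GRAPH, WEIGHTS, 'sword').sort_by { |_, d| d }.each do |word, d|
  puts "#{word}: #{d}"
end
```

<p>On this toy graph &#8216;gun&#8217; gets 7 (hypernym 5 + hyponym 2) and &#8216;knife blade&#8217; gets 5 (part_meronym 3 + hyponym 2), while the edge leading back from &#8216;weapon&#8217; to &#8216;sword&#8217; never overwrites the starting distance of 0.</p>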
<p>Here are some of the results for &#8216;sword&#8217; with depth = 3:</p>
<pre><code>
sword:                           0
brand:                           0
steel:                           0
broadsword:                      2
rapier:                          2
tuck:                            2
backsword:                       2
fencing sword:                   2
falchion:                        2
Excalibur:                       2
cutlas:                          2
sabre:                           2
cavalry sword:                   2
saber:                           2
cutlass:                         2
foible:                          3
blade:                           3
hilt:                            3
forte:                           3
tip:                             3
peak:                            3
point:                           3
helve:                           3
haft:                            3
claymore:                        4
scimitar:                        4
saber:                           4
sabre:                           4
foil:                            4
epee:                            4
arm:                             5
basket hilt:                     5
head:                            5
weapon system:                   5
knife blade:                     5
weapon:                          5
widow's peak:                    5
cusp:                            5
razorblade:                      5
cutting edge:                    6
pommel:                          6
knob:                            6
knife edge:                      6
fire ship:                       7
shaft:                           7
slasher:                         7
missile:                         7
Greek fire:                      7
missile:                         7
weapon of mass destruction:      7
light arm:                       7
WMD:                             7
gun:                             7
flamethrower:                    7
pike:                            7
brass knucks:                    7
knucks:                          7
brass knuckles:                  7
knuckles:                        7
W:                               7
tomahawk:                        7
hatchet:                         7
lance:                           7
knuckle duster:                  7
bow and arrow:                   7
projectile:                      7
sling:                           7
bow:                             7
stun baton:                      7
spear:                           7
stun gun:                        7
convexity:                       8
cutting implement:               8
convex shape:                    8
portion:                         8
part:                            8
handle:                          8
grip:                            8
hold:                            8
handgrip:                        8
reap hook:                       9
knife:                           9
sticker:                         9
dagger:                          9
axe:                             9
file:                            9
awl:                             9
lawn mower:                      9
mower:                           9
scissors:                        9
ax:                              9
sickle:                          9
cone shape:                      9
conoid:                          9
cone:                            9
reaping hook:                    9
pencil:                          9
arrowhead:                       9
knife:                           9
pair of scissors:                9
spatula:                         9
spatula:                         9
alpenstock:                      9
instrument:                      10
weapons system:                  11
implements of war:               11
arms:                            11
munition:                        11
weaponry:                        11
</code></pre>
<p>Now, as you may notice, there is still a lot of tuning to do; for example, it is pretty strange that &#8216;weapon of mass destruction&#8217; comes out as more semantically related to &#8216;sword&#8217; than &#8216;dagger&#8217; does <img src='http://sandropaganotti.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />.</p>
<p>Anyway, I&#8217;m pretty pleased with the results of this small experiment, though I&#8217;m still far from my initial idea: calculate the weight of each word of the dictionary in relation to &#8216;good&#8217; and &#8216;bad&#8217;, and use these weights to estimate the &#8216;mood&#8217; of some common trends on Twitter.</p>
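<p>To make that goal a bit more concrete, here is one possible way such weights could be turned into a mood score. It is a sketch under heavy assumptions: the two distance tables contain placeholder numbers invented for the example (they were not computed from WordNet), and <code>word_mood</code> and <code>text_mood</code> are hypothetical helper names.</p>

```ruby
# Placeholder distances, invented for illustration only
# (NOT computed from WordNet).
DISTANCE_TO_GOOD = { 'great' => 2, 'fine' => 3, 'awful' => 9 }
DISTANCE_TO_BAD  = { 'great' => 9, 'fine' => 8, 'awful' => 2 }

# Positive when a word is closer to 'good', negative when it is
# closer to 'bad', 0 when the word is unknown.
def word_mood(word, to_good = DISTANCE_TO_GOOD, to_bad = DISTANCE_TO_BAD)
  return 0 unless to_good.key?(word) && to_bad.key?(word)
  to_bad[word] - to_good[word]
end

# Mood of a text: the sum of the moods of its words.
def text_mood(text)
  text.downcase.scan(/[a-z']+/).inject(0) { |sum, word| sum + word_mood(word) }
end

puts text_mood('What a great day')
puts text_mood('An awful, awful idea')
```

<p>A plain sum is the simplest choice; averaging over the number of known words would make short and long texts easier to compare.</p>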
]]></content:encoded>
			<wfw:commentRss>http://sandropaganotti.com/2010/05/04/a-semantic-experiment-for-separate-good-from-bad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ruby Linguistics, WordNet and LinkParser on Snow Leopard</title>
		<link>http://sandropaganotti.com/2010/02/03/ruby-linguistic-wordnet-e-linkparser-su-snow-leopard/</link>
		<comments>http://sandropaganotti.com/2010/02/03/ruby-linguistic-wordnet-e-linkparser-su-snow-leopard/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 22:46:09 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Libraries]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[link parser]]></category>
		<category><![CDATA[link-grammar]]></category>
		<category><![CDATA[natural language generator]]></category>
		<category><![CDATA[natural language parser]]></category>
		<category><![CDATA[wordnet]]></category>

		<guid isPermaLink="false">http://sandropaganotti.com/?p=292</guid>
		<description><![CDATA[Update, 05/02/10: Here is the link to the presentation slides! Thanks to everyone who came along for a great evening! Next Thursday (4 February 2010) I will give a Lightning Talk at the Ruby Social Club on some interesting tools that revolve around Natural Language Generation/Parsing. In particular, we will look at two libraries for interfacing with WordNet and [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Update, 05/02/10: </strong>Here is the <a title="Natural Languages and Ruby" href="http://prezi.com/hdlhowymuge2/" target="_blank">link to the presentation slides</a>! Thanks to everyone who came along for a great evening!</p>
<p>Next Thursday (4 February 2010) I will give a Lightning Talk at the <a href="http://therubymine.com/2010/01/25/primo-ruby-social-club-del-2010/" target="_blank">Ruby Social Club</a> on some interesting tools that revolve around Natural Language Generation/Parsing. In particular, we will look at two libraries for interfacing with <a href="http://wordnet.princeton.edu/" target="_blank">WordNet</a> and <a href="http://www.link.cs.cmu.edu/link/" target="_blank">Link-Grammar</a>.</p>
<p>I don&#8217;t want to give away anything from the talk in this post (although on Friday I will certainly post an update with the slides attached); I only want to give those who find it useful the instructions for installing these libraries on their own Mac.</p>
<p><span id="more-292"></span></p>
<h3>WordNet</h3>
<p><strong>Update, 05/02/10:</strong> I have been pointed to <a href="http://github.com/roja/words" target="_blank">Words</a>, a wrapper similar to the one presented in the next few lines but much more up to date; it is worth a look.</p>
<p>Let&#8217;s start with WordNet. The first thing to do is install the BerkeleyDB gem, which in turn requires installing BDB itself, so the first command is:</p>
<pre><code>sudo port install db47
</code></pre>
<p>Once that is done, <a href="ftp://ftp.eenet.ee/pub/FreeBSD/distfiles/ruby/bdb-0.6.4.tar.gz" target="_blank">download the old version of the BDB gem</a> (<a href="http://github.com/mattbauer/bdb" target="_blank">the one on GitHub</a> won&#8217;t work, because the constant names have been changed), unpack it and change the calls to the &#8216;have_library&#8217; function in the file src/extconf.rb (lines 72 and 79) as follows:</p>
<pre><code>have_library("db-#{with_ver}", db_version)
</code></pre>
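<p>Finally, compile and install the library with the following commands (to be run from inside the BDB folder). The include path, library path and version passed to extconf.rb must match the BerkeleyDB port actually installed; the example below points at db46 / 4.6:</p>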
<pre><code>sudo env ARCHFLAGS="-arch x86_64" ruby extconf.rb  -- --with-db-include=/opt/local/include/db46 --with-db-lib=/opt/local/lib/db46 --with-db-version=4.6
make
sudo make install
</code></pre>
<p>OK, now install the wordnet gem:</p>
<pre><code>sudo gem install wordnet
</code></pre>
<p>Then edit the file &#8216;lib/wordnet/lexicon.rb&#8217; inside the gem, setting an absolute path on line 68, for example:</p>
<pre><code>DEFAULT_DB_ENV = File::join( '/Library/Ruby/Gems/1.8/gems/wordnet-0.0.5/ruby-wordnet' )
</code></pre>
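<p>If you are not sure where the gem was installed, a recent RubyGems can report the directory for you. The snippet below is a sketch that assumes a RubyGems version providing <code>Gem::Specification.find_by_name</code> (the version bundled with Ruby 1.8 on Snow Leopard may be too old); <code>gem_dir_for</code> is just a helper name chosen for the example.</p>

```ruby
require 'rubygems'

# Returns the installation directory of a gem,
# or nil if the gem is not installed.
def gem_dir_for(name)
  Gem::Specification.find_by_name(name).gem_dir
rescue Gem::LoadError
  nil
end

dir = gem_dir_for('wordnet')
if dir
  # the file to edit lives under the gem directory
  puts File.join(dir, 'lib', 'wordnet', 'lexicon.rb')
else
  puts 'wordnet gem not found'
end
```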
<p>At this point all that is left is to download the WordNet database and convert it to the BDB format the gem requires. To do so, get <a href="http://wordnetcode.princeton.edu/3.0/WordNet-3.0.tar.gz" target="_blank">the latest available version of WordNet</a> and unpack it. Then run, from the command line, the script &#8216;convertdb.db&#8217; found in the gem folder and, when prompted, enter the absolute path to the &#8216;dict&#8217; folder inside the WordNet archive you just unpacked.<br />
Test that everything works by running this simple script:</p>
<pre><code class="ruby">require 'rubygems'
require 'wordnet'
include WordNet::Constants

lex     = WordNet::Lexicon::new
origins = lex.lookup_synsets( "house", Noun )
puts "#{(o=origins.first).words}: #{o.lex_info}"
[:meronyms,:hypernyms,:derivations,:hyponyms].each do |m|
  puts "#{m}: #{o.send(m).map{|s| s.words}.flatten.uniq.join(",")}"
end</code></pre>
<h3>Link Grammar</h3>
<p>Install Link Grammar using MacPorts:</p>
<pre><code>sudo port install link-grammar</code></pre>
<p>Then <a href="http://deveiate.org/projects/Ruby-LinkParser" target="_blank">download the source of the gem</a> that will act as a wrapper and unpack it into a folder of your choice. From inside that folder, compile and install the gem with these commands:</p>
<pre><code>ARCHFLAGS="-arch x86_64" rake --  --with-link-grammar-include=/opt/local/include/link-grammar --with-link-grammar-lib=/opt/local/lib
sudo rake install
</code></pre>
<p>Test this installation too by running this small script:</p>
<pre><code class="ruby">require 'rubygems'
require 'linkparser'

dict = LinkParser::Dictionary.new('en')
sent = dict.parse( "People use Ruby for all kinds of nifty things." )

puts sent.subject
puts sent.verb
puts sent.object
</code></pre>
<h3>Linguistics</h3>
<p>This gem works somewhat like a meta-wrapper, grouping the features of the two gems installed so far into a single, consistent set of APIs. Install it with RubyGems:</p>
<pre><code>sudo gem install linguistics</code></pre>
<p>Then edit the file &#8216;lib/linguistics/en/linkparser.rb&#8217; inside the folder where the gem lives, changing line 90 as follows:</p>
<pre><code>return @lp_dict ||= LinkParser::Dictionary.new('en', :verbosity =&gt; 0 )
</code></pre>
<p>Finally, run a small test script to confirm that the installation succeeded:</p>
<pre><code class="ruby">require 'rubygems'
require 'linguistics'

Linguistics::use( :en )
frase = "the cat chased a snake"

puts &lt;&lt;-EOS
  Subject:     #{frase.en.sentence.subject}
  Verb:        #{frase.en.sentence.verb}
  Object:      #{frase.en.sentence.object}
  Verb (inf):  #{frase.en.sentence.verb.en.infinitive}
EOS
</code></pre>
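<p>As a quick sanity check after all three installations, something like the following can report which libraries actually load. It is only a sketch: <code>loadable?</code> is a helper name made up for the example, and it says nothing about whether the WordNet database or the Link Grammar dictionaries are set up correctly.</p>

```ruby
require 'rubygems'

# True if the library can be required, false if it is missing
# or raises an error while loading.
def loadable?(name)
  require name
  true
rescue LoadError, StandardError
  false
end

%w[wordnet linkparser linguistics].each do |lib|
  puts "#{lib}: #{loadable?(lib) ? 'ok' : 'not available'}"
end
```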
]]></content:encoded>
			<wfw:commentRss>http://sandropaganotti.com/2010/02/03/ruby-linguistic-wordnet-e-linkparser-su-snow-leopard/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
