Flying memes

A semantic experiment for separate good from bad.

Yesterday was sunday and I came up with a fascinating idea: what happens if I use wordnet to measure the distance between two words ? By assigning weights to all the relation types and by navigate this relations graph I thought to be able to measure the distance between a word and the others in terms of the minimum sum of weights of the edges between each pair made of the chosen word and another.


So I tried to assign weights using the relation type as discriminator, to make an example take the word ‘sword’ and its relations:


related-term        relation type           assigned weight
weapon:             hypernym                5
backsword:          hyponym                 2                 
blade:              part_meronym            3
broadsword:         hyponym                 2
cavalry sword:      hyponym                 2
cutlas:             hyponym                 2
Excalibur:          instance_hyponym        2
falchion:           hyponym                 2
fencing sword:      hyponym                 2
foible:             part_meronym            3
forte:              part_meronym            3
haft:               part_meronym            3
hilt:               part_meronym            3
rapier:             hyponym                 2
point:              part_meronym            3

The weight I choose for each of the relation types tried to follow the statement
‘The more the words are related the less greater the number is’; so weapon is less
related to sword than broadsword because the first express a concept broader than sword
(also a nuclear bomb is a weapon); the second instead detail the word ‘sword’ and
make true the statement ‘A broadsword is always a sword’ so it’s more related to the
chosen word.

By following this general rule I associated a weight to each of the most common relation types and wrote down a few lines of code in order to compute weights by navigate the relation graph:


def compute_distances(weights = :default, max_depth_allowed = 6)
  
  # retrieve the list of the weights associated to each relation type
  # (its just an hash {:relation_type => weight})
  weights = CONFIG_FILE['distance']["#{weights}"]
  data = Words::Wordnet.new
  
  # get a list of sysnsets as a starting point (eg: red, crimson) 
  synsets_to_analyze = self.synsets.map{|s| [s.synset_id,0,0]}
  synsets_to_store   = []
  
  # process the first element of the list
  # until the sysets_to_analyze stack is empty 
  while(sys = synsets_to_analyze.shift) do
    sys_id,dis,dep = *sys; next if dep >= max_depth_allowed
    sys = Words::Synset.new(sys_id,data.wordnet_connection,nil) rescue next;
    
    # save the current sysnset words into an output array
    sys.words.each {|w|  synsets_to_store.unshift([sys_id,w,dis])}
    
    # put each of the sysnset related to this into the stack unless they 
    # are already present
    sys.relations.each do |r|
      synsets_to_analyze.unshift(
        [r.destination.synset_id, dis + weights["#{r.relation_type}"],dep + 1]
      ) if r.is_semantic? and
           !synsets_to_store.find{|s| s.first == r.destination.synset_id}
    end
  end
  
  # now in sysnsets_to_store you have an array of the words each of them 
  # with the weight that separe it from the starting synsets.
 # (now I store them on a db, but is just because the context is the same as the Abacus gem) 
  synsets_to_store.each do |s|
    a_id = ArticleKey.find_by_the_key(s[1]).id rescue next
    self.distances.find_or_create_by_article_key_id( a_id, :distance => s[2])
  end
  
end

Here some of the results for ‘sword’ with depth = 3:


sword:                           0
brand:                           0
steel:                           0
broadsword:                      2
rapier:                          2
tuck:                            2
backsword:                       2
fencing sword:                   2
falchion:                        2
Excalibur:                       2
cutlas:                          2
sabre:                           2
cavalry sword:                   2
saber:                           2
cutlass:                         2
foible:                          3
blade:                           3
hilt:                            3
forte:                           3
tip:                             3
peak:                            3
point:                           3
helve:                           3
haft:                            3
claymore:                        4
scimitar:                        4
saber:                           4
sabre:                           4
foil:                            4
epee:                            4
arm:                             5
basket hilt:                     5
head:                            5
weapon system:                   5
knife blade:                     5
weapon:                          5
widow's peak:                    5
cusp:                            5
razorblade:                      5
cutting edge:                    6
pommel:                          6
knob:                            6
knife edge:                      6
fire ship:                       7
shaft:                           7
slasher:                         7
missile:                         7
Greek fire:                      7
missile:                         7
weapon of mass destruction:      7
light arm:                       7
WMD:                             7
gun:                             7
flamethrower:                    7
pike:                            7
brass knucks:                    7
knucks:                          7
brass knuckles:                  7
knuckles:                        7
W:                               7
tomahawk:                        7
hatchet:                         7
lance:                           7
knuckle duster:                  7
bow and arrow:                   7
projectile:                      7
sling:                           7
bow:                             7
stun baton:                      7
spear:                           7
stun gun:                        7
convexity:                       8
cutting implement:               8
convex shape:                    8
portion:                         8
part:                            8
handle:                          8
grip:                            8
hold:                            8
handgrip:                        8
reap hook:                       9
knife:                           9
sticker:                         9
dagger:                          9
axe:                             9
file:                            9
awl:                             9
lawn mower:                      9
mower:                           9
scissors:                        9
ax:                              9
sickle:                          9
cone shape:                      9
conoid:                          9
cone:                            9
reaping hook:                    9
pencil:                          9
arrowhead:                       9
knife:                           9
pair of scissors:                9
spatula:                         9
spatula:                         9
alpenstock:                      9
instrument:                      10
weapons system:                  11
implements of war:               11
arms:                            11
munition:                        11
weaponry:                        11

Now, as you may notice, there still a lot of tuning to do; for example it is pretty strange that ‘weapon of mass destruction’ is more semantically related to ‘sword’ than ‘dagger’ :-).

Anyway I’m pretty pleased of the results of this small experiment thus I’m still far from my initial idea: calculate the weight of each word of the dictionary in relation to ‘good’ and ‘bad’ and use these weights to estimate the ‘mood’ of some common trends in twitter.

Tags: ,