FAQ
overflow

Great Answers to
Questions About Everything

QUESTION

What is the most space and time efficient way to implement a Trie in Mathematica?

Will it be practically faster than what is natively available in appropriate cases?

{ asked by Mr.Wizard }

ANSWER

A combination of rules and recursion is able to produce rather powerful solutions. Here is my take on it:

ClearAll[makeTree];
makeTree[wrds : {__String}] := makeTree[Characters[wrds]];
makeTree[wrds_ /; MemberQ[wrds, {}]] := 
     Prepend[makeTree[DeleteCases[wrds, {}]], {} -> {}];
makeTree[wrds_] := 
    Reap[If[# =!= {}, Sow[Rest[#], First@#]] & /@ 
       wrds, _, #1 -> makeTree[#2] &][[2]]

ClearAll[getSubTree];
getSubTree[word_String, tree_] := Fold[#2 /. #1 &, tree, Characters[word]]

ClearAll[inTreeQ];
inTreeQ[word_String, tree_] :=  MemberQ[getSubTree[word, tree], {} -> {}]

ClearAll[getWords];
getWords[start_String, tree_] :=
  Module[{wordStack = {}, charStack = {}, words},
    words[{} -> {}] :=
      wordStack = {wordStack, StringJoin[charStack]};
    words[sl_ -> ll_List] :=
      Module[{},
        charStack = {charStack, sl};
        words /@ ll;
        charStack = First@charStack;
      ];
    words[First@Fold[{#2 -> #1} &, getSubTree[start, tree], 
         Reverse@Characters[start]]
    ];
    ClearAll[words];
    Flatten@wordStack];

The last function serves to collect the words from a tree, by performing a depth-first tree traversal and maintaining the stack of accumulated characters and words.

Here is a short example:

In[40]:= words = DictionaryLookup["absc*"]
Out[40]= {abscess,abscessed,abscesses,abscessing,abscissa,abscissae,abscissas,
   abscission,abscond,absconded,absconder,absconders,absconding,absconds}

In[41]:= tree = makeTree[words]
Out[41]= {a->{b->{s->{c->{e->{s->{s->{{}->{},e->{d->{{}->{}},s->{{}->{}}},
      i->{n->{g->{{}->{}}}}}}},i->{s->{s->{a->{{}->{},e->{{}->{}},s->{{}->{}}},
        i->{o->{n->{{}->{}}}}}}},o->{n->{d->{{}->{},e->{d->{{}->{}},r->{{}->{},s->{{}->{}}}},
       i->{n->{g->{{}->{}}}},s->{{}->{}}}}}}}}}}

In[47]:= inTreeQ[#,tree]&/@words
Out[47]= {True,True,True,True,True,True,True,True,True,True,True,True,True,True}

In[48]:= inTreeQ["absd",tree] 
Out[48]= False

In[124]:= getWords["absce", tree]
Out[124]= {"abscess", "abscessed", "abscesses", "abscessing"}

I only constructed here a bare-bones tree, so you can only test whether or not the word is there, but not keep any other info. Here is a larger example:

In[125]:= allWords =  DictionaryLookup["*"];

In[126]:= (allTree = makeTree[allWords]);//Timing
Out[126]= {5.375,Null}

In[127]:= And@@Map[inTreeQ[#,allTree]&,allWords]//Timing
Out[127]= {1.735,True}

In[128]:= getWords["pro",allTree]//Short//Timing
Out[128]= {0.015,{pro,proactive,proactively,probabilist,
    <<741>>,proximate,proximately,proximity,proxy}}

In[129]:= DictionaryLookup["pro*"]//Short//Timing
Out[129]= {0.032,{pro,proactive,proactively,probabilist,<<741>>,
    proximate,proximately,proximity,proxy}}

I don't know which approach has been used for the built-in functionality, but the above implementation seems to be generally in the same calss for performance. The slowest part is due to the top-level tree-traversing code in getWords. It is slow because the top-level code is slow. One could speed it up considerably by hashing words to integers - then it can be Compiled. This is how I'd do that, if I were really concerned with speed.

EDIT

For a really nice application of a Trie data structure, where it allows us to achieve major speed-up (w.r.t. using DictionaryLookup, for example), see this post, where it was used it to implement an efficient Boggle solver.

{ answered by Leonid Shifrin }
Tweet