搜索常用到的“Do you mean"功能的算法（收藏）
Have you always wanted your very own
Lucene finite state transducer
(FST) but you couldn't figure out how to use Lucene's crazy APIs?
Then today is your lucky day! I just built a
simple web application
that creates an FST from the input/output strings that you enter.
If you just want a finite state automaton (no outputs) then enter only inputs, such as
If all of your outputs are
then the FST will use numeric outputs, where you sum up the outputs as you traverse a path to get the final output:
Finally, if the outputs are
then they are treated as strings, in which case you concatenate as you traverse the path:
The red arcs are the ones with the NEXT optimization: these arcs do not store a pointer to a node because their to-node is the very next node in the FST. This is a good optimization: it generally results in
a large reduction of the FST size
. The bolded arcs tell you the next node is final; this is most interesting when a prefix of another input is accepted, such as
Here the "r" arc is bolded, telling you that "star" is accepted. Furthermore, that node following the "r" arc has a final output, telling you the overall output for "star" is "abc".
The web app is a simple Python WSGI app; source code is
. It invokes a simple Java tool as a subprocess; source code (including generics violations!) is