[NCLUG] looking for patterns in text
Anthony Foiani
tkil at scrye.com
Tue Sep 10 10:09:46 MDT 2013
John Gilmore <j.arthur.gilmore at gmail.com> writes:
> My first impulse would be to start with a statistical filter, the same
> sort often used to filter spam. "bayes" is the keyword you'd want.
n-grams might also be applicable here, especially for detecting direct
cut-and-paste. Apparently they can be combined with the Bayesian bits as well.
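
To make the n-gram idea a bit more concrete, here is a rough sketch (mine, not taken from any particular tool) of comparing two texts by their word n-grams. The function names, the choice of word trigrams, and the tokenizing regex are all assumptions for illustration; a real detector would need tuning and probably a smarter tokenizer.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) found in text."""
    # Assumption: lowercase and keep only simple alphanumeric "words".
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def shared_fraction(candidate, source, n=3):
    """Fraction of the candidate's n-grams that also occur in the source."""
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & word_ngrams(source, n)) / len(cand)
```

A high shared_fraction means that long runs of words in the candidate also appear verbatim in the source, which is what direct cut-and-paste tends to look like. The same n-grams could presumably be used as features for a Bayesian filter instead of single words, but that part is not shown here.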
Although, doing a quick bit of research to avoid sounding like a
*complete* idiot, I did find this:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.116.4413&rep=rep1&type=pdf
It seems to be working on exactly this problem. The OP might
find hints there.
t.