[NCLUG] looking for patterns in text

Quentin Hartman qhartman at gmail.com
Mon Sep 9 17:59:57 MDT 2013


I tinkered with some Bayesian stuff for a previous employer. It is likely
a very good approach to this problem (using a spam filter as a base is a
great suggestion), but be warned: if you can't find something reasonably
off the shelf, you will be entering the world of Real Math(tm). Unless you
have a very strong mathematical background, it will likely make your brain
hurt. It is not for the faint of heart.
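
As a point of comparison, here is a minimal sketch of a simpler,
non-Bayesian starting point for the "sort by similarity to the
boilerplate" problem. It uses difflib.SequenceMatcher from the Python
standard library. The function names and the sample comments are made up
for illustration, and comparing lowercased words (rather than characters)
is an assumption, not something from the original question.

```python
from difflib import SequenceMatcher


def similarity(text, standard):
    """Return a ratio from 0.0 to 1.0 of how closely text matches standard."""
    # Compare lowercased word sequences so case and whitespace differences
    # do not affect the score. (Assumption: word-level matching is enough.)
    a = text.lower().split()
    b = standard.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def rank_by_similarity(comments, standard):
    """Return (score, comment) pairs, most boilerplate-like first."""
    scored = [(similarity(c, standard), c) for c in comments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data, invented for this sketch.
boilerplate = (
    "I oppose the proposed road project because it will harm local wildlife."
)
comments = [
    "The new bridge design looks great, but please add a bike lane.",
    "I oppose the proposed road project because it will harm local wildlife.",
    "I strongly oppose the proposed road project because it will harm "
    "our local wildlife.",
]

for score, comment in rank_by_similarity(comments, boilerplate):
    print(f"{score:.2f}  {comment}")
```

A verbatim copy of the boilerplate scores 1.0, a lightly edited copy
scores a bit lower, and unrelated text lands near the bottom. This only
measures overlap with one known standard; it does not learn from examples
the way a Bayesian filter would.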

QH




On Mon, Sep 9, 2013 at 5:49 PM, John Gilmore <j.arthur.gilmore at gmail.com> wrote:

> My first impulse would be to start with a statistical filter, the same
> sort often used to filter spam. "bayes" is the keyword you'd want.
>
> On Mon, Sep 9, 2013 at 4:10 PM, Mike Cullerton <michaelc at cullerton.com>
> wrote:
> > Hey Folks,
> >
> > I'm helping a neighbor learn python, and we're using a problem they have
> > at work.
> >
> > They have text they want to parse, and compare to a known standard. They
> > want to sort the text based on how similar it is to the standard.
> >
> > They receive feedback from the public during engineering projects. Some
> > of this feedback is original. Some is copy/pasted from form letter
> > boilerplate. They'd like to parse the feedback text and sort it based on
> > how similar it is to the boilerplate.
> >
> > I'm guessing there's work out there already on this kind of stuff.
> >
> > I've done some basic searches, but I'm not getting what I want. I'm
> > hoping someone here knows some terms I can use to get started on my
> > searching.
> >
> > Any thoughts welcome.
> >
> > Thanks,
> > Mike
> >
> > _______________________________________________
> > NCLUG mailing list       NCLUG at lists.nclug.org
> >
> > To unsubscribe, subscribe, or modify
> > your settings, go to:
> > http://lists.nclug.org/mailman/listinfo/nclug