  1. #1
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts
    If they do it with a data base of known spam (voted in by millions of gmail users), it must be a huge database.
    Even if it is encoded, compressed, structured like a spell-check dictionary, it would be large.

    None the less I found myself thinking that rather than configure a licensed copy of a heuristic spam filter, it might be worthwhile trying to make use of Gmail's data base, were one allowed to do so.

    Has anyone found an online data base of known and current spam that one could tie in to a mail-client such as Outlook or Thunderbird or Eudora?

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    See "Use Google as spam filter"

  3. #3
    Administrator
    Join Date
    Mar 2001
    Location
    St Louis, Missouri, USA
    Posts
    23,592
    Thanks
    5
    Thanked 1,059 Times in 928 Posts
    [quote name='chrisgreaves' post='784561' date='14-Jul-2009 13:48']If they do it with a data base of known spam (voted in by millions of gmail users), it must be a huge database.
    Even if it is encoded, compressed, structured like a spell-check dictionary, it would be large.

    None the less I found myself thinking that rather than configure a licensed copy of a heuristic spam filter, it might be worthwhile trying to make use of Gmail's data base, were one allowed to do so.

    Has anyone found an online data base of known and current spam that one could tie in to a mail-client such as Outlook or Thunderbird or Eudora?[/quote]

    If you want a filter that works best for what you believe to be spam, why not use something like SpamBayes, a Bayesian anti-spam classifier written in Python? It very quickly learns what you believe to be spam.
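
    As a rough illustration of how a Bayesian filter "learns what you believe to be spam", here is a toy naive Bayes classifier. It is a minimal sketch only, not SpamBayes' actual code or algorithm; the class name TinyBayes, its methods, and the training messages are all made up for the example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy naive Bayes spam classifier (illustrative only, NOT SpamBayes)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Crude tokenizer: lowercase runs of letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # label is "spam" or "ham"; the user's own verdict drives the model.
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_msgs = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: smoothed share of training messages with this label.
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total_words = sum(self.words[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing so an unseen word never zeroes the score.
                count = self.words[label][word]
                score += math.log((count + 1) / (total_words + vocab + 1))
            log_score[label] = score
        # Turn the two log scores into P(spam); clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Hypothetical training data standing in for a user's own mail.
clf = TinyBayes()
clf.train("cheap pills buy now", "spam")
clf.train("win money now click here", "spam")
clf.train("lunch meeting moved to friday", "ham")
clf.train("here are the minutes from the meeting", "ham")
```

    Calling clf.spam_probability("buy cheap pills") gives a value above 0.5, while clf.spam_probability("meeting on friday") gives a value below 0.5. Each message you classify by hand shifts the word counts, which is why such a filter adapts to one person's idea of spam.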

    BTW, I believe that none of the big email vendors such as Microsoft, Yahoo, & Google are going to make public their databases or their methods. That would enable spammers to evade their filters.

    Joe

  4. #4
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts
    [quote name='HansV' post='784564' date='14-Jul-2009 15:59']See "Use Google as spam filter"[/quote]
    Thanks, Hans.
    Without going into depth, I'd pondered that technique. I recognize that Gmail is an effective filter, and already have forwarding in place from my .GMAIL address to my .COM address. (It's how I know to go check Gmail!)

    I am in the process of transferring all my domains, so I'll investigate what happens when my .COM address forwards mail to my .gmail address; if that gmail then forwards to com I could bring down the internet (grin!)
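
    For what it's worth, mail servers generally guard against forwarding loops by counting the Received headers a message has picked up and refusing it past some hop limit. A minimal sketch of that check follows; the function names and the limit of 25 are assumptions for illustration, since real servers use their own configurable limits.

```python
from email import message_from_string

MAX_HOPS = 25  # hypothetical limit; real mail servers configure their own


def looks_like_loop(raw_message, max_hops=MAX_HOPS):
    """Return True if the message has passed through more than max_hops relays."""
    msg = message_from_string(raw_message)
    # Each relay prepends one Received header, so the count approximates hops.
    hops = msg.get_all("Received") or []
    return len(hops) > max_hops


def sample_message(hops):
    """Build a fake message with the given number of Received headers."""
    headers = "".join(
        f"Received: from relay{i}.example by mx.example\n" for i in range(hops)
    )
    return headers + "Subject: test\n\nbody\n"
```

    With these definitions, looks_like_loop(sample_message(3)) is False and looks_like_loop(sample_message(30)) is True, so a .COM to Gmail to .COM round trip would be cut off after a bounded number of laps rather than circulating forever.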

    I don't like the idea of abandoning my mail-client system, if only because it has such an archive of old emails.

  5. #5
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts
    [quote name='joeperez' post='784567' date='14-Jul-2009 16:18']SpamBayes:[/quote]
    Thanks Joe.
    I would have replied sooner but that link sent me off on such an interesting read!

    I will probably give SpamBayes a shot.
    I am using POP3 SCAN MAILBOX as a filter right now, and since I declare that any email with an "at" sign in it is spam, my white list is essentially a closed list of valued contacts. I could migrate that existing list straight across to ensure existing fidelity.
    (Contact Me holds a description of my current technique)
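
    The closed white list idea can be sketched in a few lines of Python. This is only an illustration of the approach, not how POP3 SCAN MAILBOX works internally; the addresses in WHITELIST and the function name is_wanted are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical closed list of valued contacts, stored in lowercase.
WHITELIST = {"alice@example.com", "bob@example.org"}


def is_wanted(raw_message, whitelist=WHITELIST):
    """Return True only if the From address is on the white list."""
    msg = message_from_string(raw_message)
    # parseaddr splits 'Name <user@host>' into (name, address).
    _name, address = parseaddr(msg.get("From", ""))
    return address.lower() in whitelist
```

    Anything not on the list, including mail with no From header at all, is treated as spam. Note that a From header is easy to forge, so this keeps strangers out rather than proving who the sender is.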

    (later) I note from their page that WinXP/Thunderbird is not a tested combination.
    I have submitted my name as a willing and sometimes capable guinea-pig.

  6. #6
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    [quote name='chrisgreaves' post='785069' date='17-Jul-2009 15:16']I don't like the idea of abandoning my mail-client system, if only because it has such an archive of old emails.[/quote]
    I can understand that, but as Joe mentioned, I don't think Google will make its method for spam filtering and the corresponding database(s) publicly available other than through using Gmail.

  7. #7
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts
    [quote name='HansV' post='785108' date='17-Jul-2009 14:35']I don't think Google will make its method for spam filtering and the corresponding database(s) publicly available other than through using Gmail.[/quote]
    I forgot to comment.
    I can quite understand Google not making its methods known - I'm not all that interested in them, although I suspect the prime function is a public vote.
    The database is available, of course, but only through GMail's front-end; it appears not to be available to humble developers as an online source.

    There again, I dare say that if I *did* manage to get Thunderbird to go online and chat with the database, it would slow down my mail scan/download considerably, and I also suspect that there just aren't enough IDE drives left in Canada to hold *my* copy of the database.
