Email Fetching Duplicate Messages

  • yuri
    replied
    Note that a job may still be running right now, and it can take some time to finish.

    Cron config params:
    PHP Code:
     
    'cron' => array(
        /** Max number of jobs per execution. */
        'maxJobNumber' => 15,

        /** Max execution time (in seconds) allocated for a single job. If exceeded, the job is set to Failed. */
        'jobPeriod' => 7800,

        /** Attempts to re-run failed jobs. */
        'attempts' => 2
    ),
    


  • yuri
    replied
    I meant multiple cron runs in parallel: one script is still running when the next one starts. If you have only one personal account, that should not happen unless a cron run takes longer than the job period (7800 seconds, a bit over 2 hours). That could have happened; in that case the system treats the job as failed and starts it again (up to 3 times before giving up). These params are configurable in data/config.php.

    I think you need:
    1. Lower the max email portion size to ~100.
    2. Apply the changed file I linked above.
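    One generic way to stop two cron runs from overlapping is an exclusive, non-blocking file lock held for the whole run. This is a minimal sketch built on PHP's flock(), assuming a single server with a local lock file; acquireCronLock(), releaseCronLock() and the lock path are hypothetical, and this is not EspoCRM's own mechanism.

```php
<?php
// Sketch only: an exclusive, non-blocking file lock that a cron entry script
// could hold for its whole run, so a second run started by cron exits instead
// of importing the same mail in parallel. Function names and the lock-file
// path are hypothetical; this is not EspoCRM's own mechanism.

/**
 * Try to take the lock. Returns the open handle on success (keep it open
 * until the run ends) or null if another process already holds the lock.
 *
 * @return resource|null
 */
function acquireCronLock(string $path)
{
    $handle = fopen($path, 'c'); // create the file if missing, never truncate it
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle); // a previous run is still going
        return null;
    }
    return $handle;
}

/** Release the lock and close the handle. */
function releaseCronLock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}
```

    A cron entry script would call acquireCronLock() first, exit quietly when it returns null, and release the lock in a finally block. The operating system drops the lock when the process exits, so a crashed run does not block later ones.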


  • cardmaverick
    replied
    Your database schema has no unique key on the 'message_id' column. That might be the issue here, assuming all emails have truly unique message IDs.
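    To illustrate what a unique key on message_id would enforce, here is the same rule applied in memory. A minimal sketch, assuming each fetched message is an array with a hypothetical 'messageId' entry; dedupeByMessageId() is not EspoCRM importer code.

```php
<?php
// Illustration only: the rule a UNIQUE key on message_id would enforce,
// applied to an in-memory list. The 'messageId' field name is hypothetical.

/**
 * Keep the first message for each message ID and drop later copies.
 * Messages with no ID (null or empty) are all kept, because they cannot be
 * compared by ID.
 */
function dedupeByMessageId(array $messages): array
{
    $seen = [];
    $result = [];
    foreach ($messages as $message) {
        $id = $message['messageId'] ?? null;
        if ($id === null || $id === '') {
            $result[] = $message; // nothing to compare on
            continue;
        }
        if (isset($seen[$id])) {
            continue; // duplicate of an earlier message
        }
        $seen[$id] = true;
        $result[] = $message;
    }
    return $result;
}
```

    Messages whose ID is null slip through here, just as a MySQL UNIQUE index accepts any number of NULL values, so a unique key alone would not catch those copies.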


  • cardmaverick
    replied
    You process email in parallel? I'm actually very familiar with PHP in parallel (I wrote an entire parallel processor for an internal program). How are you processing in parallel? Are you dividing the work into piles? My own processor gives each worker its own pile of tasks to avoid race conditions. But when it comes to inserting into MySQL, you can't do parallel inserts into the same table: they are sequential no matter how many connections you have. If you are opening multiple connections, you're wasting your time; it has no impact on insert performance. A better method is to do the parallel processing outside the database (whatever can be done there), then recombine the processed data and do big insert statements of 5,000 or so records.
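    The "big insert statements" approach can be sketched as a helper that groups rows into multi-row INSERT statements with positional placeholders. A minimal sketch, assuming PDO with a MySQL connection; buildBatchInserts() and the table and column names are hypothetical.

```php
<?php
// Sketch: turn a list of rows into multi-row INSERT statements of at most
// $chunkSize rows each, with positional placeholders for a prepared statement.
// Table and column names are interpolated verbatim, so they must come from
// trusted code, never from user input.

/**
 * @param string[] $columns
 * @param array<int, array<int, mixed>> $rows each row lists values in $columns order
 * @return array<int, array{sql: string, params: array<int, mixed>}>
 */
function buildBatchInserts(string $table, array $columns, array $rows, int $chunkSize = 5000): array
{
    $statements = [];
    // One "(?, ?, ...)" group per row.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        $sql = sprintf(
            'INSERT INTO %s (%s) VALUES %s',
            $table,
            implode(', ', $columns),
            implode(', ', array_fill(0, count($chunk), $rowPlaceholder))
        );
        // Flatten the chunk's rows into one positional parameter list.
        $params = array_merge(...$chunk);
        $statements[] = ['sql' => $sql, 'params' => $params];
    }
    return $statements;
}
```

    Each entry is meant for `$pdo->prepare($stmt['sql'])->execute($stmt['params'])`. MySQL caps a prepared statement at 65,535 placeholders, so 5,000-row chunks only work for tables with 13 or fewer columns; lower $chunkSize for wider rows.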


  • cardmaverick
    replied
    Version 5.0.3

    Max email portion size for personal account fetching: 4,000 (I used 10 originally, but I bumped it up when I realized how long the import would take with ~10K messages).

    All dependencies were met at install. I also added the PHP mailparse extension before bringing messages into the system (it isn't mentioned in the installer's dependency section, if I remember right).

    I was only using one personal email account. I did have my SMTP info entered in every possible place in the CRM, though; perhaps that's contributing? I'm monitoring both the Inbox and Sent folders. My email is provided by GoDaddy; I use their Workspace client online right now.

    I do develop, and on the surface it does feel like a cron job failing, combined with inadequate data control in the database or processing script to check for duplicate messages. You could add a hash column to the database, compute the hash from the actual message, and put a unique index on that column. The null message_id thing strikes me as a quirk of email standards being all over the place: every email I get from one company triggers an email format specification warning in the error logs.

    Hope that helps!
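    The hash-column idea could look something like this. A minimal sketch, assuming the sender, Date header, subject and body stay the same each time a message is re-fetched; emailDedupKey() and the choice of fields are hypothetical, not part of EspoCRM.

```php
<?php
// Hypothetical dedup key: derived from the Message-ID when one is present,
// otherwise from a few header values plus the body. Always a 64-char hex
// SHA-256 digest, so it fits a fixed-width column with a UNIQUE index.

function emailDedupKey(?string $messageId, string $from, string $date, string $subject, string $body): string
{
    $messageId = trim($messageId ?? '');
    if ($messageId !== '') {
        // The "mid" prefix keeps ID-derived keys apart from content-derived ones.
        return hash('sha256', "mid\0" . $messageId);
    }
    // Normalise line endings so CRLF and LF copies of the same body match.
    $normalizedBody = str_replace("\r\n", "\n", $body);
    return hash('sha256', implode("\0", ['content', trim($from), trim($date), trim($subject), $normalizedBody]));
}
```

    Stored in a CHAR(64) column with a UNIQUE index, the key lets the database reject a second copy even when message_id is null.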


  • joy11
    commented on a reply
    I've added the fix and will keep an eye on it. Thanks for your help, Yuri.

  • joy11
    commented on a reply
    I'm in the process of updating to the newest version. I'm on 4.8.4 on the live server and running 5.0.3 on my test server, but I'm having to fix some customizations that no longer work before upgrading the live one.

  • yuri
    replied
    I've added some fixes to the mail importer class that I believe should solve the issue where emails imported by parallel processes can end up duplicated.

    Could you apply the changed file to your instance manually to check whether it helps?

    https://raw.githubusercontent.com/es...l/Importer.php


  • yuri
    replied
    Are you using EspoCRM version 5.0.3?


  • yuri
    commented on a reply
    When an email is sent twice, it may be that the email was actually sent but the SMTP server responded with an error. EspoCRM then tries to re-send it (up to 3 times).

    Do you have the latest EspoCRM version?
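    Why a retry can produce a duplicate is easiest to see in a small model: if the server accepts the message but the client still gets an error (a timeout after the data was delivered, say), each retry delivers another copy. A minimal sketch; sendWithRetry() and the callable-based sender are hypothetical, not EspoCRM's mailer.

```php
<?php
// Simplified model of "retry on error" sending. The sender is any callable
// that throws RuntimeException on failure; it is a hypothetical stand-in
// for a real SMTP transport.

/**
 * Calls $send until it returns without throwing, making at most
 * $maxAttempts attempts (always at least one). Returns the number of
 * attempts made; rethrows the last error if every attempt fails.
 */
function sendWithRetry(callable $send, int $maxAttempts = 3): int
{
    for ($attempt = 1; ; $attempt++) {
        try {
            $send();
            return $attempt;
        } catch (RuntimeException $e) {
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            // Otherwise try again. The loop cannot tell "failed before
            // delivery" from "delivered, then failed".
        }
    }
}
```

    If the first two attempts are delivered but still report an error, three attempts put three copies in the recipient's mailbox; the retry is only safe when a failed attempt really delivered nothing.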

  • yuri
    replied
    Maybe the cron script fails and then restarts, which brings about duplicates. It needs to be investigated. How many personal email accounts do you have in the system?

    We've been using EspoCRM for years and have never encountered email duplicates.


  • cardmaverick
    replied
    I've also noticed that some messages have 'null' in the message_id column in the database.


  • cardmaverick
    replied
    Originally posted by yurikuzn
    So you have emails with identical message IDs in the EspoCRM database?

    Yes. This is a huge problem. I have literally hundreds of these. Email dates are wrong as well: I'm getting messages marked today that were received years ago. The whole thing doesn't seem to work correctly.


  • yuri
    replied
    So you have emails with identical message IDs in the EspoCRM database?


  • cardmaverick
    replied
    Originally posted by yurikuzn
    How can I explain? It's a bug in the mail client that stored those email copies with different message IDs.

    If you have such a situation, then don't fetch the "Sent" folder that contains email duplicates with different message IDs.

    That's not what's happening. The message IDs are identical. I'm getting duplicates of both sent messages and incoming messages.
