Using import features of 4.3.0 beta 2

  • #1

    I have a lot of data to upload, so I am using 4.3.0 beta 2.

    I have about 130K records to load, broken down into batches of 25,000, and it is failing on the 1st batch.

    Here is what happened the last time (this has happened more than once, so it can be reproduced):
    • I started the import to Contacts at 8:13am
    • The data contains the ID of an account (I don't know if this makes a difference)
    • When I checked at 9:30 all was progressing well and the import was on track to finish about 12:00
    • At 10:23, 10:24 and 10:25 extra jobs were triggered by cron to also process the data
    • These extra jobs are processing the input file from the start, so I will end up with 4 of every record
    • I have not let these jobs finish, so I don't know whether further extra jobs would be triggered
    • I did let a job run for a long time yesterday (which triggered the extra jobs) and after 24 hours there were still just 4 processes, so the trigger does not seem to be time driven. I don't know if it is data driven (i.e. something in row 20,000 causes extra processes).
    This is running on Ubuntu, so here is the output from the "ps" command.
    Code:
    mat@dev4:~$ ps -ef| grep espo
    espocrm   3132  3130  0 08:13 ?        00:00:00 /bin/sh -c /usr/bin/php -f /var/www/html/espo/cron.php > /dev/null 2>&1
    espocrm   3134  3132  3 08:13 ?        00:05:42 /usr/bin/php -f /var/www/html/espo/cron.php
    espocrm   7568  7566  0 10:23 ?        00:00:00 /bin/sh -c /usr/bin/php -f /var/www/html/espo/cron.php > /dev/null 2>&1
    espocrm   7569  7568  3 10:23 ?        00:00:30 /usr/bin/php -f /var/www/html/espo/cron.php
    espocrm   7574  7572  0 10:24 ?        00:00:00 /bin/sh -c /usr/bin/php -f /var/www/html/espo/cron.php > /dev/null 2>&1
    espocrm   7575  7574  3 10:24 ?        00:00:28 /usr/bin/php -f /var/www/html/espo/cron.php
    espocrm   7584  7579  0 10:25 ?        00:00:00 /bin/sh -c /usr/bin/php -f /var/www/html/espo/cron.php > /dev/null 2>&1
    espocrm   7586  7584  3 10:25 ?        00:00:25 /usr/bin/php -f /var/www/html/espo/cron.php
    mat       8026  8002  0 10:38 pts/7    00:00:00 grep --color=auto espo
    mat@dev4:~$
    For some reason it only seems to generate 3 extra jobs. At the time of writing it is 10:47 and no more jobs have been created.

    Help as always appreciated.

  • #2
    Hi,

    * Errors in data/logs?
    * Errors in apache log?
    * Administration > Scheduled Jobs > top-right dropdown > Jobs. Are you talking about these jobs? After the import is started, a job should be created there. That job is then run by cron.php and does the import work.
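
    For checking from the shell, something like this should work (a sketch; the install path is from your ps output, actual paths may differ):
    Code:
    # most recent entries in today's EspoCRM log
    tail -n 50 /var/www/html/espo/data/logs/espo-$(date +%Y-%m-%d).log
    # and the apache error log
    tail -n 50 /var/log/apache2/error.log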
    Last edited by yuri; 11-29-2016, 11:09 AM.

    • #3
      Ok, from my .../data/logs/espo-2016-11-29.log

      Code:
      [2016-11-29 07:32:01] Espo.ERROR: Uncaught Exception PDOException: "SQLSTATE[HY000] [1044] Access denied for user 'EspoCRM'@'%' to database 'espo'" at /var/www/html/espo/application/Espo/ORM/EntityManager.php line 138 {"exception":"[object] (PDOException(code: 1044): SQLSTATE[HY000] [1044] Access denied for user 'EspoCRM'@'%' to database 'espo' at /var/www/html/espo/application/Espo/ORM/EntityManager.php:138)"} []
      [2016-11-29 07:33:01] Espo.ERROR: Uncaught Exception PDOException: "SQLSTATE[HY000] [1044] Access denied for user 'EspoCRM'@'%' to database 'espo'" at /var/www/html/espo/application/Espo/ORM/EntityManager.php line 138 {"exception":"[object] (PDOException(code: 1044): SQLSTATE[HY000] [1044] Access denied for user 'EspoCRM'@'%' to database 'espo' at /var/www/html/espo/application/Espo/ORM/EntityManager.php:138)"} []
      [2016-11-29 10:26:28] Espo.ERROR: Import: [40001] SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction [] []
      [2016-11-29 10:41:07] Espo.ERROR: Import: [40001] SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction [] []
      [2016-11-29 10:52:57] Espo.ERROR: Import: [40001] SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction [] []
      I would suggest that the 7:30 items were triggered by me clearing up the problem from the last time (I restarted Apache and killed the running PHP processes). I note that there are 3 serialization errors and 3 extra processes, but the times do not align.
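
      (In case it helps with the deadlocks: MySQL keeps details of the most recent one, which can be pulled out like this - credentials are placeholders:)
      Code:
      # show the LATEST DETECTED DEADLOCK section of InnoDB's status report
      mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G" | grep -A 20 "LATEST DETECTED DEADLOCK"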

      I have looked in my /var/log/apache2/error.log and the only items are from 7:30 this morning and relate to me restarting the service.
      Code:
      [Tue Nov 29 07:35:11.440393 2016] [mpm_prefork:notice] [pid 1320] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
      [Tue Nov 29 07:35:11.440446 2016] [core:notice] [pid 1320] AH00094: Command line: '/usr/sbin/apache2'
      Regarding the jobs under that menu item: I cannot see any job in there that relates to the import of data. I started the import at about 8:13 and none of the listed jobs relate to it. I am happy to run a search of these items if you can tell me what text to look for. I am not sure I have answered your question on this point, so can you elaborate further on what you are after please?

      • #4
        The name of the job is "Import: runIdleImport". But search doesn't work in the beta. Check what status it has.

        If you disabled duplicate checking, the import should take minutes, not hours, provided the max execution times for PHP and MySQL are sufficient.
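
        (These can be checked with, for example - credentials are placeholders:)
        Code:
        # PHP's script time limit as seen by the CLI
        php -i | grep max_execution_time
        # MySQL timeout settings (variable names vary by version)
        mysql -u root -p -e "SHOW VARIABLES LIKE '%timeout%';"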
        Last edited by yuri; 11-29-2016, 01:30 PM.

        • #5
          I do have duplicate checking disabled, and the import is run from cron. With checking disabled it runs quicker than with it enabled, but not in minutes. I believe this to be an issue at my end with my database: simple inserts are taking 0.2 seconds each. I will investigate separately (I have a hunch that it is fragmentation related).

          Looking at the database schema it seems that the data for the run jobs is either stored in the "job" table or the "scheduled_job_log_record" table.

          The following SQL commands both give 0 rows. The tables both contain 6,000+ records and have entries from around 8:13 which is when the job was started.

          Code:
          SELECT * FROM `job` where name like '%port%';
          SELECT * FROM `scheduled_job_log_record` where name like '%port%';
          Whilst having a look behind the scenes I also note that the "scheduled_job" table has 8 rows in it (also available through the CRM user interface) and none of them make any mention of importing anything. Happy to dump the information if useful.

          Thanks for your continued help on this.

          PS - I will create a separate post when I have investigated but I don't think that the "Revert" function is cleaning up phone numbers and emails when Contacts have been loaded. I will provide more information once tested.

          • #6

            Code:
            SELECT * FROM `job` where service_name like 'Import'

            • #7
              Ah, I didn't spot that the first time. That returned a single row:

              id: 583d38561076dd667
              name: null
              deleted: 0
              execution_time: 29/11/2016 08:12
              service_name: import
              method: runIdleImport
              data: {"entityType":"Contact","params":{"headerRow":true ,"fieldDelimiter":",","textQualifier":""","dateFor mat":"YYYY-MM-DD","timeFormat":"HH:mm","personNameFormat":"f l","decimalMark":".","currency":"GBP","defaultValu es":{},"action":"create","skipDuplicateChecking":t rue,"idleMode":false},"attachmentId":"583d3855d5f8 0fe73","importFieldList":["coreId","coreIdNum","","","","accountId","salutat ionName","firstName","lastName","title","","",""," ","","phoneNumberOther","phoneNumberMobile","","em ailAddress",""],"importId":"583d38560ff061b73"}
              attempts: 0
              target_id: null
              target_type: null
              failed_attempts: 3
              created_at: 29/11/2016 08:12
              modified_at: 29/11/2016 10:25
              scheduled_job_id: null

              Some more information for you. At about 14:00 there were 4 pairs of PHP processes running at the Unix level (see 1st post). Now one of them has finished and there are 3 left running. These processes are still writing additional rows.

              • #8
                I can't find the status in that dump. There is a Status field, which is what I asked you to look at.

                • #9
                  You can disable the cron entry in crontab, then run cron manually from the CLI. This will allow you to see any PHP errors that might not be logged. Create the import before doing this, of course.
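
                  For example (a sketch - the espocrm user and paths are from the ps output in the 1st post; how the cron entry is installed may differ):
                  Code:
                  # comment out the EspoCRM line in the crontab
                  sudo crontab -u espocrm -e
                  # then run the cron script by hand so any errors print to the terminal
                  /usr/bin/php -f /var/www/html/espo/cron.php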

                  • #10
                    Sorry, I missed the Status field from the data dump. The status is "Success".

                    Also, I have checked and the 1st job actually ran to completion. The data being loaded has a unique column, so "SELECT Count(distinct unique_column) FROM ..." gives me 25,000, which is correct.
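
                    (The check was along these lines - "unique_column" stands in for the real column name, and I am assuming the Contact entity maps to the "contact" table; credentials are placeholders:)
                    Code:
                    mysql -u root -p espo -e "SELECT COUNT(DISTINCT unique_column) FROM contact;"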

                    I will now run the cron job from the command line and capture the output.

                    • #11
                      Ok, further investigation. I stopped cron from running and launched the job manually. It ran for many hours (10+) and didn't complete (because I stopped it), but it looked like it would have done. There was nothing in the output file.

                      I therefore stopped it, reset everything, and modified the cron entry so that it went via a shell script wrapper to capture the output of each run to a separate file (a sketch of the wrapper is below). I then started cron again and ran the import from the beginning to see what happened.
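
                      The wrapper was along these lines (a sketch; the log location is of my choosing):
                      Code:
                      #!/bin/sh
                      # cron-wrapper.sh - called from crontab instead of cron.php directly,
                      # so each run's stdout/stderr lands in its own timestamped file
                      LOG=/tmp/espo-cron-$(date +%Y%m%d-%H%M%S).log
                      /usr/bin/php -f /var/www/html/espo/cron.php > "$LOG" 2>&1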

                      Here is what I have found.

                      • The import started at 07:49 and has run since then; nothing has been written to stdout or stderr
                      • The cron.php task has been started every minute since then, but it finishes almost immediately, generating no output
                      • At 08:57 another cron.php job started, but this one kept running; it is importing the data from the start
                      • At 08:58 another cron.php job started, but this one kept running; it is importing the data from the start
                      • At 08:59 another cron.php job started, but this one kept running; it is importing the data from the start
                      • The cron.php task has continued to be started every minute since then, finishing almost immediately and generating no output
                      • It is now 10:35

                      This behaviour is consistent with what I have seen before.

                      When I look at the job table using the query above, there is 1 entry; here are some of its fields:
                      deleted: 0
                      status: Failed
                      attempts: 0
                      execute_time: 2016-11-30 07:46:10
                      service_name: Import
                      failed_attempts: 3
                      created_at: 2016-11-30 07:46:10
                      modified_at: 2016-11-30 09:59:02

                      If I left the 1st job to complete, I THINK the status would change to Success. This is based upon the post above, where job 1 finished and the status was Success.

                      My reading is that something triggers an issue 1 hour 10 minutes into the run, so the status is set to Failed (even though the process is still running). The system then tries to recover by starting the import again. This doesn't change the status, so it tries to recover 2 more times and then gives up. This explains the 4 processes and the 4 copies of the data in the target.

                      I don't know what causes the initial failure; a time out?

                      I also note that every single row in the job table (6,000+ of them) has "failed_attempts" set to 3. All the other rows also have "attempts" set to 3, except this one, where it is 0.

                      One change from a "standard" config that I have just remembered making is that the tables in the database use the InnoDB storage engine, not MyISAM. It shouldn't make a difference, but you never know.
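
                      (For the record, the engines can be confirmed like this - the database name "espo" is from the error log earlier; credentials are placeholders:)
                      Code:
                      mysql -u root -p -e "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'espo';"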

                      • #12
                        Add this to data/config.php:

                        Code:
                        'cron' => array(
                            'jobPeriod' => 24 * 3600 // cron can run for 24 hours (parameter is in seconds)
                        )

                        • #13
                          I made that change and set it running again. It has now been 2 hours and it is behaving perfectly (it had always gone wrong by this point), so please consider this resolved unless you hear otherwise.

                          What was it; a timeout of around 4,000 seconds? (1 hour 10 minutes is 4,200 seconds.)

                          If my original understanding is correct then it raises the question "is re-running the import job the correct recovery solution for a failure?"

                          Thanks

                          • #14
                            The script shouldn't be terminated because of the 'jobPeriod' parameter. That was something else.

                            Re-running a failed import should not cause problems, I think.

                            • #15
                              Everything is working and I have completed 2 other uploads; I am just dwelling on the issue, but happy to move on.

                              It may just be a turn of phrase, but the initial job was never terminated; it ran its course. After about 1:10 its status was set to Failed, but it actually kept running to the end, at which point the status was set to "Success".

                              On the subject of correct recovery: if I am loading, say, 100,000 rows and it fails at 60,000 rows, I don't think the correct recovery process is to start the 100,000 rows again from the beginning, because I would then end up with 160,000. I don't know what the correct recovery process should be; probably do nothing.

                              Thanks for all the help.
