How to troubleshoot import of CSV files created by Espo?

  • tothewine
    Active Community Member
    • Jan 2018
    • 373

    How to troubleshoot import of CSV files created by Espo?

    I went to the old Espo (5.8) and exported all entities of a custom scope with all the fields. Then I started a cron import in the new Espo instance (5.9), but it hangs after 654 entities out of 25000 and stays stuck in "In Progress". I assumed that a file created by Espo itself would import without problems. I think this is a bug...
  • esforim
    Active Community Member
    • Jan 2020
    • 2204

    #2
    Anything different in line 654 compared to the rest? Some weird text? A long char field? A server timeout? A PHP execution timeout maybe?
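    If it is data related, one quick check is to pull out the records around that point and inspect them. A minimal sketch in Python (the file name export.csv is an assumption; the csv module is used so that quoted multiline description fields count as one record rather than several lines):

        import csv

        # Print the records around the failure point, plus the length of
        # each field, so anything unusual in that row stands out.
        with open('export.csv', newline='', encoding='utf-8') as f:
            for i, row in enumerate(csv.reader(f), start=1):
                if 650 <= i <= 660:
                    print(i, [len(cell) for cell in row])
                    print(i, row)
                elif i > 660:
                    break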

    • yuri
      Member
      • Mar 2014
      • 8442

      #3
      In most of the topics here, my first answer is to check the logs.
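      A rough way to do that (the log location data/logs/ is an assumption based on a default installation; adjust the path to yours):

          import glob

          # Print the last few ERROR entries from each Espo log file.
          for path in sorted(glob.glob('data/logs/espo-*.log')):
              with open(path, encoding='utf-8', errors='replace') as f:
                  errors = [line.rstrip() for line in f if 'ERROR' in line]
              if errors:
                  print(path)
                  print('\n'.join(errors[-5:]))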
      Last edited by yuri; 05-22-2020, 07:06 AM.
      If you find EspoCRM good, we would greatly appreciate if you could give the project a star on GitHub. We believe our work truly deserves more recognition. Thanks.

      • tothewine
        Active Community Member
        • Jan 2018
        • 373

        #4
        Originally posted by yurikuzn
        In most of the topics here, my first answer is to check the logs.
        I had multiple problems (see the email thread) that spammed the log file, so it is really big now (18 GB). I am trying to trim duplicate lines to get it down to a more manageable size and see if something comes up.
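        For what it is worth, collapsing repeated lines can be done in a single streaming pass without loading the whole file; a minimal sketch (the file names are assumptions):

            # Collapse runs of identical lines in a huge log, prefixing each
            # kept line with how many times it repeated.
            prev, count = None, 0
            with open('espo.log', encoding='utf-8', errors='replace') as src, \
                    open('espo-trimmed.log', 'w', encoding='utf-8') as dst:
                for line in src:
                    if line == prev:
                        count += 1
                        continue
                    if prev is not None:
                        dst.write(f'{count}x {prev}')
                    prev, count = line, 1
                if prev is not None:
                    dst.write(f'{count}x {prev}')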

        Thinking about this, the problem may be in either the CSV parser or the export code... but we need details.


        Originally posted by espcrm
        Anything different in line 654 compared to the rest? Some weird text? A long char field? A server timeout? A PHP execution timeout maybe?
        A timeout should not be a possibility, since I used "background import", which runs the import as a cron job. I have also set very generous timeouts on that server (hours).


        The data looks quite "normal". All these entries have multiline descriptions (the third column is a normal "description" field), but the previous ones were imported without problems...

        [Screenshot: 2020-05-22--13-31-05_TextPad.png]
        #1 was imported; #2 would be the next one, but it did not get imported.

        I will create a CSV with only that portion so I can try to reproduce the problem.

        I noticed that the line breaks inside the third column are CRLF, while the line endings at the end of each record are LF. There are also TAB characters in there.
        It may have to do with that, but then why was the first batch of entities imported without issue?
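        One way to confirm that is to check which control characters actually appear inside the fields of the records right after the last imported one; a rough sketch (export.csv and the record numbers are assumptions):

            import csv

            # Flag CRLF, bare LF and TAB characters inside each field of the
            # records around the point where the import stopped.
            with open('export.csv', newline='', encoding='utf-8') as f:
                for i, row in enumerate(csv.reader(f), start=1):
                    if 654 <= i <= 656:
                        for j, cell in enumerate(row):
                            flags = []
                            if '\r\n' in cell:
                                flags.append('CRLF')
                            if '\n' in cell.replace('\r\n', ''):
                                flags.append('bare LF')
                            if '\t' in cell:
                                flags.append('TAB')
                            if flags:
                                print(f'record {i}, column {j}: {flags}')
                    elif i > 656:
                        break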


        P.S. I was thinking... wouldn't encoding the values with hex/URL/base64 solve most problems with CSV files?
        In that case we could have a checkbox to encode the values when exporting and decode them when importing. Maybe I will try to implement it as a custom format.
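        As a rough sketch of that idea (not existing Espo functionality, just an illustration of round-tripping field values through base64 so delimiters, tabs and newlines inside values cannot break the CSV structure):

            import base64, csv

            def encode_row(row):
                # Base64 turns any value into a delimiter-safe ASCII token.
                return [base64.b64encode(v.encode('utf-8')).decode('ascii') for v in row]

            def decode_row(row):
                return [base64.b64decode(v).decode('utf-8') for v in row]

            # Write one encoded record, then read it back and decode it.
            with open('export-encoded.csv', 'w', newline='', encoding='utf-8') as f:
                csv.writer(f).writerow(encode_row(['John Doe', 'description\r\nwith\tmessy text']))

            with open('export-encoded.csv', newline='', encoding='utf-8') as f:
                print([decode_row(row) for row in csv.reader(f)])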
        Last edited by tothewine; 05-22-2020, 11:58 AM.

        • tothewine
          Active Community Member
          • Jan 2018
          • 373

          #5
          It is very strange... the second CSV file imported without errors, while the first is still "In Progress".

          [Screenshot: 2020-05-24--17-22-42_firefox.png]


          When doing the import without idle mode I saw a gateway error; the idle mode should not have this problem.
          The bug here is that I see no way to resume an import without creating a new one...
          Last edited by tothewine; 05-24-2020, 07:07 PM.

          • tothewine
            Active Community Member
            • Jan 2018
            • 373

            #6
            After much digging I understood that the PHP timeout limits were different between cron jobs and Apache. This is now a bug report about adding logic to track a CSV import and resume it automatically: for example, store the import settings in the Import entity rather than in the import job, and add an "offset" field there that is increased by 1 every time an entity is imported successfully. If the import is interrupted, it would then be sufficient to hit a "Resume" button that restarts the import job, skipping the number of entities given by "offset" before starting the actual import.
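            As a rough illustration of the suggested mechanism (the function and parameter names here are hypothetical, not the actual Espo import code):

                import csv

                def run_import(path, offset, import_record):
                    # Skip the first `offset` records (already imported in a
                    # previous run), import the rest, and return the updated
                    # offset so an interrupted job can be resumed later.
                    with open(path, newline='', encoding='utf-8') as f:
                        for i, row in enumerate(csv.reader(f), start=1):
                            if i <= offset:
                                continue
                            import_record(row)   # would create the entity
                            offset = i           # persist on the Import entity
                    return offset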
            Last edited by tothewine; 05-25-2020, 12:45 PM.

            • esforim
              esforim commented
              That is great to hear and thank you for keeping at it.

              But is "One" your signature/sign-off, or did the post get cut off?

            • tothewine
              tothewine commented
              It was a typo :P
          • yuri
            Member
            • Mar 2014
            • 8442

            #7
            Import will be a bit improved in v5.10.
            If you find EspoCRM good, we would greatly appreciate if you could give the project a star on GitHub. We believe our work truly deserves more recognition. Thanks.

            • tothewine
              tothewine commented
              Editing a comment
              Looking forward to it!