Unexplained system freezes not showing in error logs with the API not responding

  • marcusquinn
    Senior Member
    • Jan 2021
    • 133

    We are trying to introduce the first instance of Espo for live daily usage, but somehow, with that usage (only a dozen or so users so far), the system freezes and stops responding about 10 times a day. The UI works but the API isn't responding.

    We don't have the issue on the development server with an identical codebase, and the error logs don't show anything that explains it. We've enabled DEBUG-level error logging and still see nothing.

    It's an annoying ghost bug: we don't know which user action causes it, so we can't reproduce it to diagnose it. When it does happen, restarting the system is urgent so everyone else can carry on working, and there are no error messages or clues in the logs. Basically, the worst kind of issue to try and solve!
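    One way to catch the moment of a freeze, sketched here as a hypothetical helper rather than anything EspoCRM ships, is to probe the API from outside on a schedule and record a UTC timestamp whenever it stops answering; those timestamps can then be matched against access-log entries to narrow down which requests were in flight. Any HTTP response, even a 401 without credentials, counts as "alive", because the symptom is the API not answering at all. The function names and the example URL are assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Bypass any HTTP(S)_PROXY environment settings so the probe measures the CRM server itself.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 10.0) -> tuple[bool, float, str]:
    """Send one GET request and return (alive, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return True, time.monotonic() - start, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered (e.g. 401 without credentials), so it is not frozen.
        return True, time.monotonic() - start, f"HTTP {exc.code}"
    except (OSError, http.client.HTTPException) as exc:
        # URLError, refused connections and socket timeouts all derive from OSError.
        return False, time.monotonic() - start, repr(exc)


def watch(url: str, interval: float = 30.0, timeout: float = 10.0) -> None:
    """Probe forever, printing a UTC timestamp each time the API fails to answer."""
    while True:
        alive, elapsed, detail = probe(url, timeout)
        if not alive:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} API unresponsive after {elapsed:.1f}s: {detail}", flush=True)
        time.sleep(interval)
```

    For example, `watch("https://crm.example.com/api/v1/")` (hypothetical URL; use your own instance's API base) can be left running on a machine outside the container.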

    What's our next line of support for this please?
  • marcusquinn
    Senior Member
    • Jan 2021
    • 133

    #2
    OK, what's the best channel for support then please?

    • DEN
      Senior Member
      • Apr 2021
      • 106

      #3
      Hello,

      I think the problem may be due to insufficient RAM on your server or not enough hard disk space.
      In order to identify the problem, you need to refer to the logs of your MySQL database and Apache or Nginx server.

      To do this, check these log files:

      /var/log/mysql/error.log
      /var/log/<apache2 or nginx>/error.log

      You need root access to read these logs.
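      To pull only the error-level lines out of those files, a small filter like this can help. It is a sketch: the severity markers it matches (`[ERROR]` in MySQL logs, `[error]` or `[core:error]`-style module prefixes in Apache/Nginx logs, plus `[crit]`, `[alert]`, `[emerg]`) are assumptions about the default log formats, so adjust the pattern to your setup.

```python
import re
from pathlib import Path

# Assumed severity markers: MySQL "[ERROR]", Apache 2.4 "[module:error]" (e.g. "[core:error]"),
# Nginx "[error]", "[crit]", "[alert]", "[emerg]". Matching is case-insensitive.
SEVERITY = re.compile(r"\[(?:\w+:)?(?:error|crit|alert|emerg)\]", re.IGNORECASE)


def error_lines(lines):
    """Yield (line_number, text) for every line carrying an error-level marker."""
    for number, text in enumerate(lines, start=1):
        if SEVERITY.search(text):
            yield number, text.rstrip("\n")


def scan(path: str) -> list[tuple[int, str]]:
    """Return the error-level lines of one log file (reading it usually needs root)."""
    with Path(path).open(errors="replace") as handle:
        return list(error_lines(handle))
```

      For example, `scan("/var/log/mysql/error.log")` returns the matching `(line_number, text)` pairs.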





      Last edited by DEN; 06-10-2021, 09:47 AM.

      • marcusquinn
        Senior Member
        • Jan 2021
        • 133

        #4
        Hi DEN, thanks for the suggestions. I've increased the RAM allocated to the container from 8GB to 32GB just to be sure, but that hasn't solved it.

        We're looking at the logs, but that's the issue: no identifiable error is printed. I have a live business relying on this now, and they want to invest in Espo, but with 30+ users now unable to work on it, interrupted many times a day for a couple of weeks, we are all very frustrated.

        I love this system and recommend it, but the lack of diagnostics and not knowing where to turn for support are making everyone anxious; they want to use it, but it needs to stay up.
        Last edited by marcusquinn; 06-10-2021, 10:04 AM.

        • DEN
          Senior Member
          • Apr 2021
          • 106

          #5
          I understand, but don't be discouraged; the main task for us now is to determine what the problem is.

          Try to analyze the situation using htop.
          Use this command `sudo htop`.
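          Where `htop` isn't available (as in some application containers), a similar memory picture can be read straight from `/proc/meminfo`. The sketch below uses hypothetical helper names, not part of any tool, and reports the fraction of RAM in use from `MemTotal` and `MemAvailable`. Note that inside a container `/proc/meminfo` usually shows the host's memory rather than the container's limit, so treat it as a rough signal.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_in_use(text: str) -> float:
    """Fraction of RAM in use, computed as 1 - MemAvailable / MemTotal."""
    info = parse_meminfo(text)
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def read_memory_in_use(path: str = "/proc/meminfo") -> float:
    """Read the live file (Linux only) and return the fraction in use."""
    with open(path) as handle:
        return memory_in_use(handle.read())
```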

          • marcusquinn
            Senior Member
            • Jan 2021
            • 133

            #6
            100%, thank you. Normally we are good with these things and always exhaust self-diagnostics first. That's the frustration with this one: the lack of errors showing.

            • marcusquinn
              Senior Member
              • Jan 2021
              • 133

              #7
              Here's an example of a current symptom:

              The platform loads, and we can navigate to list views (although some feel slower), but then some or all details views just don't load. All we see is the "Loading..." message; the page never loads, and then the message disappears.

              Sometimes this happens for all details pages; after we restart the container, it only happens for User details pages.

              Then, after the system has been in use for a while, none of the details pages load again. No errors. No logs.

              • vladimir.d
                Junior Member
                • May 2021
                • 17

                #8
                In relation to this issue, we get JS errors in the Web Inspector console when opening entities, e.g. User details. `Clear Cache` and `Rebuild` don't help.

                • vladimir.d
                  Junior Member
                  • May 2021
                  • 17

                  #9
                  If we uncheck the 'Use Cache' checkbox, we get lots of errors like these in the logs. What do they relate to?

                  • DEN
                    Senior Member
                    • Apr 2021
                    • 106

                    #10
                    To help you, I need more information about your MySQL and server settings.

                    You have to monitor the situation in htop to understand which process is eating up the memory.

                    Maybe you have some customization that loops when it starts ("customization" meaning Workflows, BPM, or other custom code).

                    Also try activating the MySQL slow query log.
                    More detail here: https://takp.me/posts/how-to-get-a-m...-from-aws-rds/.
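                    Once the slow query log is on, a sketch like this can list the statements above a time threshold, slowest first. It assumes the classic MySQL slow-log layout (a `# Query_time: ... Rows_examined: N` header line followed by the statement); the function name and default threshold are illustrative.

```python
import re

# Header line written before each entry in the classic MySQL slow-query-log format.
QUERY_TIME = re.compile(r"# Query_time: ([\d.]+)\s.*?Rows_examined: (\d+)")


def slow_queries(log_text: str, threshold: float = 1.0) -> list[tuple[float, int, str]]:
    """Return (query_time, rows_examined, sql) for entries at or above threshold, slowest first."""
    entries: list[list] = []
    current = None
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            current = [float(match.group(1)), int(match.group(2)), []]
            entries.append(current)
        elif line.startswith("#"):
            current = None  # "# Time:" / "# User@Host:" lines belong to the next entry
        elif current is not None and not line.startswith("SET timestamp="):
            current[2].append(line.strip())
    results = [(t, rows, " ".join(sql)) for t, rows, sql in entries if t >= threshold]
    return sorted(results, reverse=True)
```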



                    • marcusquinn
                      Senior Member
                      • Jan 2021
                      • 133

                      #11
                      Thanks DEN, we're gathering more detail now and will add it to this post asap. Hopefully we can fix it; we'll detail all the steps along the way, as it may help others too.

                      • vladimir.d
                        Junior Member
                        • May 2021
                        • 17

                        #12
                        We use the EspoCRM Cloudron app, so only a few tools are installed in the application container; for example, `htop` is not installed.
                        nginx error logs and access logs go to `stdout` and are collected by the container logs, and, as we noted above, they don't contain any specific errors that could help identify the problem.
                        The MySQL slow query log is switched on, and the log is empty.
                        `mysqlcheck` found no errors in the database; everything is clean.

                        The errors mentioned here appear in the application log only when we switch the `Use Cache` option off. Could they be related to incomplete related data for some entities in the database?
                        Are there any tools available to validate CRM data completeness?

                        Currently, `/#User/view/ID` pages don't open, but `/#User/edit/ID` pages do.

                        Also we have another instance of EspoCRM with the same config and same customisations, and it works just fine.
                        The difference between these instances are just the data and the number of users using it.

                        • DEN
                          Senior Member
                          • Apr 2021
                          • 106

                          #13
                          marcusquinn, do you use Cloudron?

                          • marcusquinn
                            Hi DEN, yes, and we recommend it too, but in situations like this we have to work out which issues are platform issues and which are application issues. In this case it is a bit of both, so we are also seeking their support, since they have access to diagnostic tools.
                        • vladimir.d
                          Junior Member
                          • May 2021
                          • 17

                          #14
                          Well, we have managed to sort out one issue and get the user details view working.
                          For some reason, when the VOIP Integration extension is installed but disabled, it still renders the VOIP Messages panel when building the nested views.
                          So we have reactivated the VOIP Integration extension as a quick fix for now.
                          But some logic probably needs to be refactored to avoid such errors; I suspect they could appear for other extensions as well.



                          For the freezing problem, we identified that Apache gets stuck. What could we look at to get more insight? (Just a reminder: there is nothing in the Apache logs.)
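                          When Apache hangs without logging anything, one thing worth checking is whether all of its workers are busy. If `mod_status` is enabled, `/server-status?auto` returns machine-readable `Key: value` lines including `BusyWorkers`, `IdleWorkers`, and a `Scoreboard` string (`W` = sending reply, `R` = reading request, `K` = keepalive). The sketch below uses hypothetical helper names to summarize that output; zero idle workers would suggest new requests are queuing behind requests that never finish, matching the "API not responding" symptom, though it does not say why they hang.

```python
from collections import Counter


def parse_status(text: str) -> dict[str, str]:
    """Parse Apache mod_status '?auto' output ('Key: value' per line)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(text: str) -> dict[str, object]:
    """Count worker states and flag when no idle workers are left."""
    fields = parse_status(text)
    board = Counter(fields.get("Scoreboard", ""))
    idle = int(fields.get("IdleWorkers", 0))
    return {
        "busy": int(fields.get("BusyWorkers", 0)),
        "idle": idle,
        "sending_reply": board["W"],
        "reading_request": board["R"],
        "keepalive": board["K"],
        # Only flag exhaustion when the server actually reported IdleWorkers.
        "exhausted": "IdleWorkers" in fields and idle == 0,
    }
```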

                          • esforim
                            Active Community Member
                            • Jan 2020
                            • 2204

                            #15
                            It's probably discouraging to hear, but this forum is just for the community. Unless it is a bug report, support is not guaranteed; there is a paid support system for that.

                            On logging: have you already looked at the Apache error log and enabled debug mode? Here is the documentation for that: https://docs.espocrm.com/administrat...oubleshooting/

                            As for resolving these issues, that is beyond my skill level.

                            Lastly, see whether any of these tweaks reduce the lag, slow loading, or crashing; if one of them does, I would say that points to a hardware issue: https://docs.espocrm.com/administrat...ance-tweaking/

                            Also, since you are using Cloudron, I'm not sure whether they throttle usage.
