Unexplained system freezes not showing in error logs with the API not responding

  • yuri
    Member
    • Mar 2014
    • 8552

    #16
    Please check how many records you have in the auth_token table, and whether the number is very large.
    Last edited by yuri; 06-11-2021, 07:31 AM.
    If you find EspoCRM good, we would greatly appreciate if you could give the project a star on GitHub. We believe our work truly deserves more recognition. Thanks.

    Comment

    • marcusquinn
      Senior Member
      • Jan 2021
      • 133

      #17
      Hey espocrm, we'll pay for support; the email reply directed us to continue the discussion here. Thanks for the links, and yes, I do recommend Espo a lot on the Cloudron forum too, so I'm sure you have more developers coming from there.

      Comment

      • marcusquinn
        Senior Member
        • Jan 2021
        • 133

        #18
        espocrm I appreciate your communications but am confused by your use of a handle so closely similar to the Espo trademark in your username, that it suggests to me an official relationship?

        Can you confirm your relationship with the Company; Letrium Limited as per the Trademark usage terms here: https://www.espocrm.com/trademark/

        As I have said before, we are not asking for free support, we are a paying client, and happy to pay for support - in exchange for the support being forthcoming.

        The support email response directed me here, hence we are having this open dialogue for the benefit of the community, a community to which I am happy to commit all the contributions we can make in our development with the platform.

        What we know is:

        1. CRM is a business-critical service, because businesses rely on timely and reliable communications, and have many legal obligations for data management and record keeping. CRM system failures carry a high risk of causing business failures, especially when staff and client sentiment tends to have more control over modern businesses than any directors could hope to have.
        2. It is possible for EspoCRM to fail to respond to normal interface interactions without a meaningful explanatory error message or any diagnostics in error reporting that could otherwise lead a developer to the source of the issue and a meaningful fix.

        I am posting here as a last resort, after exhausting all documentation, self-help tools, email contact, and making allowances for large amounts of paid developer investigation time into the issue now amounting to several thousands of pounds in time costs.

        We are not incapable in business or development, but when the tools deprive us of diagnostic information, we are unnecessarily bound to solving the problem ourselves, and are forced to correspond here for lack of alternatives.

        Money is not a problem, but it must be in exchange for a reliable service. I do not know any business that will pay for support without any assurance that the payment will result in a reliable solution.

        All I have right now is first-hand experience of repeated application errors that are only solved by system restarts, but that print nothing to the error logs to explain which action caused the failure, or which part of the system needs review or change to stop this happening again and again until the ultimate failure of all who rely on the system.

        Let us find a way to make these reliability assurances and share all this knowledge openly, for the safety of all businesses that could have these issues and need diagnostics to solve them, without ransom for the information needed to solve them.

        I come with great hope, optimism and support for the platform, community and developers, but need that to be matched with understanding, communication and perseverance, so that we can both find a solution and make this discovery and remedy a permanent part of the platform's resilience. The platform should be fast to diagnose and fix in future, without these delays or debates over how to even get the information needed to direct attention to whatever area has become detrimental to the platform staying responsive.

        Comment

        • marcusquinn
          Senior Member
          • Jan 2021
          • 133

          #19
          espocrm Sorry, I have only just found that this forum platform goes to multi-page threads, so I was replying to what appeared to be the last comment on this post until I saw the 2nd page.

          Comment

          • eymen-elkum
            Active Community Member
            • Nov 2014
            • 472

            #20
            There's only one reason that makes EspoCRM stop at some point without any error message in the backend or frontend:

            When you have a custom view defined at the proper path, but the key path in the JS file is not correct.

            For example, just change this line in any copy of your CRM:

            FROM: define('views/site/master', 'view', function (Dep) {

            TO: define('views/site/master-xxxxx', 'view', function (Dep) {

            And Espo will stop working.

            To make sure the problem is a bad custom view and not the API freezing, just visit this link: https://your-crm.com/api/v1/Metadata

            If you can see a result but the UI is freezing, then the problem is 100% a bad view file.
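
The check described above can be scripted. The following is only a minimal sketch, assuming Node 18+ (for the built-in `fetch`); the function names `checkApi` and `diagnose` are hypothetical, and only the `/api/v1/Metadata` path comes from this post. Any HTTP response, even an authentication error, is treated as "the API responds", since the point is only to tell a hung server apart from a broken front end.

```javascript
// Hypothetical helpers; not part of EspoCRM.

// Request the Metadata endpoint and give up after timeoutMs, so a hung
// server is reported as a failure instead of blocking forever.
async function checkApi(baseUrl, timeoutMs = 5000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetch(`${baseUrl}/api/v1/Metadata`, {
      signal: controller.signal,
    });
    return { responded: true, status: res.status, ms: Date.now() - started };
  } catch (err) {
    // Covers timeouts (aborted request) and network errors alike.
    return { responded: false, error: err.name, ms: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}

// Apply the rule of thumb from the post: if the API answers while the UI
// is frozen, suspect a view file; otherwise look at the server side.
function diagnose(apiResponded, uiFrozen) {
  if (!uiFrozen) return 'no freeze observed';
  return apiResponded
    ? 'API responds: suspect a custom view definition'
    : 'API does not respond: suspect the server side (web server, PHP, database)';
}
```

For example, `checkApi('https://your-crm.com').then(r => console.log(diagnose(r.responded, true)))` could be run from another machine while the UI is frozen.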

            ANOTHER CASE:

            One of my clients had a similar freeze, caused by teams being auto-generated and automatically assigned to users, so some users had more than 50,000 teams; because the roles were designed around the teams, MySQL was freezing!
            Last edited by eymen-elkum; 06-15-2021, 05:22 AM.
            CEO of Eblasoft
            EspoCRM Expert since 2014
            Full Stack Web Developer since 2008
            Creator of Numerous Successful Extensions & Projects​

            Comment

            • esforim
              Active Community Member
              • Jan 2020
              • 2206

              #21
              Originally posted by marcusquinn
              espocrm Sorry, I have only just found that this forum platform goes to multi-page threads, so I was replying to what appeared to be the last comment on this post until I saw the 2nd page.
              Hi Marcus,

              I'm just another user like yourself, I think only 2-3 official team members come to the forum nowadays. Based on my experience I curate a list of them here, you can find out who under "EspoCRM Party" link at: https://github.com/o-data/EspoCRM-Le...-espocrm-party

              In terms of my username, I couldn't think of a name when I first registered and went with this. As for the account you are pinging (@espocrm), it is probably the official account, albeit abandoned, as they don't have a Community Manager anymore? The difference might be more noticeable once you look into it, as I don't have an avatar and am missing an "o" in the username. I think I explained this somewhere else on the forum as well.

              Anyway, don't take any of my words as being in any official capacity. If I could add a signature I would have made that clearer. You might have seen me "Like" plenty of your Feature Requests.

              As for the rest of the comments, I don't think they are relevant to me, but I guess Maximus and Yuri might consider them as feedback to use.

              Comment

              • yuri
                Member
                • Mar 2014
                • 8552

                #22
                Could you provide access to your CRM so we could take a look? You can message in the custom portal.

                Comment


                • esforim
                  esforim commented
                  Just a note based on Marcus's post: it freezes unexpectedly without any reproducible method, so it may require some sort of stress test with multiple users.
              • marcusquinn
                Senior Member
                • Jan 2021
                • 133

                #23
                I'm very sorry to say that this is **still** not solved, and we have a lot of very unhappy users every time it happens, because the system is provided to them, for their investment in time as users, on the basis it will save them time, and not stop them completing their work uninterrupted by unexplained, undiagnosable system outages.

                Payment is available for **results**, there is no expectation of any service for free. There is, however, an expectation that any request for money will come with an assurance of a solution within a justifiable budget, and a guide to communications, access and data security confidence through the process.

                How do we get support for this? Support that will be able to fix the fact that there are no diagnostics error messages or logs to even explain why the system stops responding please?

                Problems:
                1. The system fails without error message or diagnostics logs.
                2. The software brings down the entire webserver, and fails to self-heal and recover without manual intervention.
                3. User faith in the system development is now very low, despite the high amount of development investment we are making, the fundamental basics are that the most visible and common issue is the most memorable, and in this case it is total system failure with a blank screen, no error or explanation, and no diagnostics that any developer has yet been able to act on. A system cannot be successful when Users cannot trust it for their investment of time to work with it and help improve it.

                It seems another person had the same problem, although I don't know if they solved it:
                * https://forum.cloudron.io/topic/3209...art-webservice

                Comment

                • eymen-elkum
                  Active Community Member
                  • Nov 2014
                  • 472

                  #24
                  Hi all,

                  I have started checking the CRM for marcusquinn, and would like to share some information (not sure yet if we are done, but I think so):

                  I noticed that the Clean-Up job is not active and has never run before; the same goes for Auth Token Control.

                  The job table is too big, with more than 1 million records, and this was causing slow queries; as you know, this table is frequently accessed by EspoCRM.

                  Not sure why there are no error messages and it just freezes, but I think this is related to the Cloudron platform on which the Espo project is hosted.

                  If we find more useful details we will share them here, so everyone can benefit.
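
As an illustration of how such growth could be caught earlier, here is a small sketch. It assumes the row counts have already been collected by some other means (for instance a `SELECT COUNT(*)` per table); the function name `findOversizedTables` and the default limit of one million rows are hypothetical choices for this example, not EspoCRM settings. The table names `job` and `auth_token` are the ones mentioned in this thread.

```javascript
// Hypothetical helper; not part of EspoCRM.
// rowCounts maps a table name to its row count, e.g. { job: 1200000 }.
// Returns the tables at or above the limit, largest first.
function findOversizedTables(rowCounts, limit = 1000000) {
  return Object.entries(rowCounts)
    .filter(([, rows]) => rows >= limit)
    .sort((a, b) => b[1] - a[1])
    .map(([table, rows]) => ({ table, rows }));
}

// Example with made-up numbers:
// findOversizedTables({ job: 1200000, auth_token: 5000 })
// reports only the job table.
```

A non-empty result would be a hint to check whether the Clean-Up and Auth Token Control jobs are actually running.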

                  EDIT:
                  I noticed some issues on the Cloudron forum like this:

                  Hello, we are running in an issue where our Rocket.Chat instance is unable to properly send messages after about 20 hours of uptime. Whenever I try to send ...

                  It seems to freeze with other apps as well, so I think Cloudron is responsible for hiding the error messages.

                  Best Regards
                  Last edited by eymen-elkum; 09-01-2021, 12:37 PM. Reason: added notice about cloudron freezing for rocket chat

                  Comment


                  • marcusquinn
                    marcusquinn commented
                    Thank you, very helpful sharing what we learn to compare notes and ideas.

                    I'm always very cautious of assumption, because it can end a search in an area for a solution where the solution may be.

                    So, we will put all this down as "circumstantial information", but equally are cautious of "correlation is not causation", as we might just be finding and fixing other issues too, but the root cause is still elusive.

                    There is also some psychology involved, in that people very easily attach themselves to opinions or personal experience and miss scientific rigour.

                    "It sounds like..." is not the same as, "this means that".

                    "Have you tried this basic troubleshooting idea that anyone should know", is not the same as, "make sure this log file includes that trace message".

                    To me the quest is for the diagnostics tools, so that it is not possible to have a blank screen without a meaningful error message.

                    We have found a gap in diagnostics ability that is happening with this single-page application (SPA), and that would not happen with conventional/old-school multi-page applications like WordPress, SuiteCRM etc.

                    Don't get me wrong, an SPA through an API is 100% better, faster and the way we need to be - but, here we have a difference in error symptoms and diagnostics that is causing extraordinary time, money and distraction costs, without any tangible error messages or information that anyone else can really act on in tracing back to the source.
                • tarasm
                  Super Moderator
                  • Mar 2014
                  • 573

                  #25
                  I'm not sure that your problem is related to EspoCRM itself. It sounds like a problem with the web server services.
                  Did you try to use EspoCRM on a server with native nginx/apache without external Apps Managers?

                  I would recommend you:
                  1. Use a dedicated server where only EspoCRM will be running.
                  2. Install EspoCRM on a fresh server by this script, https://github.com/espocrm/documenta...n-by-script.md or manually with nginx.
                  3. Correct PHP and MySQL settings based on your user count.

                  Please let me know how many simultaneous users you have in EspoCRM. I can help you choose a server configuration.
                  Job Offers and Requests

                  Comment


                  • marcusquinn
                    marcusquinn commented
                    Thank you - it is a reasonable trouble-shooting comparison to make - however, there is a minor consideration to make in that this is a live system, relied upon by dozens of people, only kept alive right now by many developers attending to webserver restarts many, many times a day.

                    The problem is already consuming so much time that using spare capacity to build and migrate hosting stacks, just to see the same problem in another environment, is quite a gamble. Yes, I know it is also a gamble not to try that elimination.

                    The fundamental problem is that a LAMP app should be able to run in Docker, and print errors to debug logs or screen or both when there is a blank screen failure that freezes the entire web server.

                    I would like to see an error message that says:

                    "EspoCRM has had a total failure, and stopped the webserver running it from working. This is the php error, this is the apache error, this is the mysql error, these resources limits have been reached, etc."

                    The failure in my eyes is that it is possible to bring down a webserver with no explanation.

                    Will a non-Docker hosting stack solve this lack of Espo debug logs giving us a backtrace that identifies the query, code or resource limit that is causing a blank screen death?

                    I suppose we won't know if we don't try, so we will keep trying everything, but that is the challenge for Espo, to be able to exhaust all possible methods of communicating errors, not for us to be blinded by silent fails without explanation.
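
One way to make silent freezes leave a trace can be sketched under a few assumptions: an external probe (for example, one that requests the API every minute from another machine) records whether each request succeeded, and a small function groups the failures into outage windows. The timestamps of those windows can then be matched against the PHP, Apache and MySQL logs. The sample shape `{ time, ok }` and the function name `findOutageWindows` are hypothetical; nothing here is part of EspoCRM or Cloudron.

```javascript
// Hypothetical helper; not part of EspoCRM or Cloudron.
// samples: probe results in chronological order, each { time, ok }.
// Returns one entry per run of consecutive failed probes.
function findOutageWindows(samples) {
  const windows = [];
  let current = null;
  for (const { time, ok } of samples) {
    if (!ok) {
      if (current === null) {
        // First failed probe after a healthy one opens a new window.
        current = { start: time, end: time, failedProbes: 0 };
      }
      current.end = time;
      current.failedProbes += 1;
    } else if (current !== null) {
      // A successful probe closes the open window.
      windows.push(current);
      current = null;
    }
  }
  // Keep an outage that is still ongoing at the last sample.
  if (current !== null) windows.push(current);
  return windows;
}
```

Each window's `start` and `end` give a time range to search in the server logs, which narrows the hunt even when the application itself logs nothing.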
                • Zosh
                  Member
                  • May 2021
                  • 93

                  #26
                  This is interesting... marcusquinn I am curious what makes you utilize this Cloudron container rather than opting for a more traditional deployment? It would be easier to ascertain if the issue is more toward the underlying platform end of the stack or on the EspoCRM end. It would seem the suspicion of the root issue here is shifting more toward the Cloudron end.

                  You can of course do what you want with free & open-source software, but something like running Cloudron really seems like a more exotic deployment compared to the recommended underlying tech stack on a fresh instance. The community is of course always here to happily help how it can, but if your issue or similar hasn't been encountered before on typical deployments, and you're using an unusual tech stack, then it likely could be due to your underlying platform stack where your efforts and allocating of resources could be more fruitful instead of pointing toward the application itself.

                  I see you mention your Dev instance is working fine, but it doesn't seem that it is undertaking the same load and usage as your Prod instance. Earlier in the thread it seems you're suggesting EspoCRM lacks diagnostics/troubleshooting/logging but the application can only be made aware of and forward to a log error conditions executing within its own code. If the application logs are empty as it pertains to your issue then that only further suggests looking deeper into the underlying tech stack.

                  Just some of my perspective from my own experiences for future consideration is all!!

                  Comment


                  • marcusquinn
                    marcusquinn commented
                    Much of what I do is about de-duplicating effort. If we have 40 webapps, would you run 40 VPS instances?

                    Cloudron is Docker-based - so the question is why does Docker exist, or Kubernetes for that matter?

                    Why wouldn't we containerise applications under a common parent operating system?

                    Is correlation also causation in this instance?

                    We've also developed a vast amount with Espo in data structure and improvements, that I would love to share with the community too - but that needs packaging specifically for sharing, and this issue is taking time away from our otherwise determined upstream contributions.

                    I will invest and provide many developers contributing to Espo core, and extensions, and our business is not in selling any code, but in using it, so that investment will cost the community nothing other than the same interest all open source software asks. So, I hope you find we are also very committed to this software, stack and community, and making it work for many, many more organisations.

                    Is the information that we use a hosting stack that others here are less familiar with significant because it is different? Or because it is less familiar?

                    I'm happy to eliminate all "suspicions", but there is a time-cost in doing that, and time-costs in alternative hosting strategies are time away from developing features and improvements.

                    If we were hosting with Espo, would we be able to bring down their entire SaaS stack from the error we seem to uniquely have and stop many other businesses doing business with CRM too?

                    We don't know, but there are also risks and costs in these tests, and ultimately, each test is not improving Espo's diagnostics tools.

                    If anything, it will be a worse problem if we solve the symptom without knowing the cause, because then it remains an unknown risk that could happen again, and at a worst time.

                    The solution needs to be a solution that prevents repeat problems. Just saying that Espo doesn't work for Docker because we/you/it cannot diagnose issues, does eliminate a significant part of Espo's potential.

                    I hope you find that I and those I'm lucky to work with are not critics, but passionate fans - but we are desperate to separate speculation from science and get to the cause of this bug once and for all, with debugging tools that cannot be fooled or stopped from telling the story.