Duplicate checking

  • livewire
    Senior Member
    • Nov 2016
    • 100

    Duplicate checking

    Hi

    Can you let me know which field the duplicate check is done on when uploading leads? Is it possible to change it to a specific field?
  • Vadym
    Super Moderator
    • Jun 2021
    • 346

    #2
    Hi livewire,

    Perhaps the following thread will be helpful:

    • rabii
      Active Community Member
      • Jun 2016
      • 1262

      #3
      Originally posted by livewire
      Hi

      Can you let me know which field the duplicate check is done on when uploading leads? Is it possible to change it to a specific field?
      Lead duplicates are checked by (first name AND last name) OR email address.
      Rabii
      Web Dev

      • livewire
        Senior Member
        • Nov 2016
        • 100

        #4
        Thank you.

        What I want is to check duplicates only by phone number. If I add a custom duplicate check function, will it replace the current duplicate checking method or add another clause?

        I have around 90k leads in my CRM. When I try to import a file with about 1,500 leads, it takes more than 2 days to complete in idle mode; if I disable the duplicate check, it uploads in a few minutes.

        So if I write a custom duplicate check that checks duplicates only by phone number, will there be an improvement?

        • rabii
          Active Community Member
          • Jun 2016
          • 1262

          #5
          I think this is not a problem with the duplicate check class; 1,500 leads shouldn't take that long. Try running the import directly, without idle mode. I have uploaded 70K leads into the system in chunks of 10K per upload, and each chunk took only a few minutes. So a custom duplicate checker class wouldn't help in this case, especially if you want to check by phone number, which is a more expensive query than checking by name. If you decide to do it anyway, here is a class I have used in the past that combines checking by name OR email OR phone number. Feel free to customise it however you want.

          PHP Code:
          <?php
          
          namespace Espo\Custom\Classes\DuplicateWhereBuilders;
          
          use Espo\Core\ORM\Entity as CoreEntity;
          
          use Espo\Core\{
              Duplicate\WhereBuilder,
              Field\EmailAddressGroup,
              Field\PhoneNumberGroup,
          };
          
          use Espo\ORM\{
              Query\Part\Condition as Cond,
              Query\Part\WhereItem,
              Query\Part\Where\OrGroup,
              Entity,
          };
          
          /**
           * @implements WhereBuilder<CoreEntity>
           */
          class Contact implements WhereBuilder
          {
              public function build(Entity $entity): ?WhereItem
              {
                  assert($entity instanceof CoreEntity);
          
                  $orBuilder = OrGroup::createBuilder();
          
                  $toCheck = false;
          
                  if ($entity->get('firstName') || $entity->get('lastName')) {
                      $orBuilder->add(
                          Cond::and(
                              Cond::equal(
                                  Cond::column('firstName'),
                                  $entity->get('firstName')
                              ),
                              Cond::equal(
                                  Cond::column('lastName'),
                                  $entity->get('lastName')
                              )
                          )
                      );
          
                      $toCheck = true;
                  }
          
                  if (
                      ($entity->get('emailAddress') || $entity->get('emailAddressData')) &&
                      (
                          $entity->isNew() ||
                          $entity->isAttributeChanged('emailAddress') ||
                          $entity->isAttributeChanged('emailAddressData')
                      )
                  ) {
                      foreach ($this->getEmailAddressList($entity) as $emailAddress) {
                          $orBuilder->add(
                              Cond::equal(
                                  Cond::column('emailAddress'),
                                  $emailAddress
                              )
                          );
          
                          $toCheck = true;
                      }
                  }
          
                  if (
                      ($entity->get('phoneNumber') || $entity->get('phoneNumberData')) &&
                      (
                          $entity->isNew() ||
                          $entity->isAttributeChanged('phoneNumber') ||
                          $entity->isAttributeChanged('phoneNumberData')
                      )
                  ) {
                      foreach ($this->getPhoneNumberList($entity) as $phoneNumber) {
                          $orBuilder->add(
                              Cond::equal(
                                  Cond::column('phoneNumber'),
                                  $phoneNumber
                              )
                          );
          
                          $toCheck = true;
                      }
                  }
          
                  if (!$toCheck) {
                      return null;
                  }
          
                  return $orBuilder->build();
              }
          
              /**
               * @return string[]
               */
              private function getEmailAddressList(CoreEntity $entity): array
              {
                  if ($entity->get('emailAddressData')) {
                      /** @var EmailAddressGroup $eaGroup */
                      $eaGroup = $entity->getValueObject('emailAddress');
          
                      return $eaGroup->getAddressList();
                  }
          
                  if ($entity->get('emailAddress')) {
                      return [
                          $entity->get('emailAddress')
                      ];
                  }
          
                  return [];
              }
          
              /**
               * @return string[]
               */
              private function getPhoneNumberList(CoreEntity $entity): array
              {
                  if ($entity->get('phoneNumberData')) {
                      /** @var PhoneNumberGroup $pnGroup */
                      $pnGroup = $entity->getValueObject('phoneNumber');
          
                      return $pnGroup->getNumberList();
                  }
          
                  if ($entity->get('phoneNumber')) {
                      return [
                          $entity->get('phoneNumber')
                      ];
                  }
          
                  return [];
              }
          }
          Hope this helps
          Rabii
          Web Dev
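
          On the phone-only question from #4: below is a minimal sketch of a builder that checks phone numbers only, cut down from the class in #5. It uses only the calls already shown there. The class name LeadByPhone is made up for illustration, and this has not been tested against any particular version. As far as I understand, a custom where builder replaces the default check for the entity type rather than adding a clause to it, and it still has to be registered in metadata for Lead; please verify both against the documentation for your version.

          ```php
          <?php

          namespace Espo\Custom\Classes\DuplicateWhereBuilders;

          use Espo\Core\ORM\Entity as CoreEntity;
          use Espo\Core\Duplicate\WhereBuilder;
          use Espo\Core\Field\PhoneNumberGroup;
          use Espo\ORM\Entity;
          use Espo\ORM\Query\Part\Condition as Cond;
          use Espo\ORM\Query\Part\WhereItem;
          use Espo\ORM\Query\Part\Where\OrGroup;

          /**
           * Phone-only duplicate check (sketch). LeadByPhone is a hypothetical name.
           *
           * @implements WhereBuilder<CoreEntity>
           */
          class LeadByPhone implements WhereBuilder
          {
              public function build(Entity $entity): ?WhereItem
              {
                  assert($entity instanceof CoreEntity);

                  $numberList = $this->getPhoneNumberList($entity);

                  // No phone number on the record: nothing to compare against.
                  if ($numberList === []) {
                      return null;
                  }

                  $orBuilder = OrGroup::createBuilder();

                  // One equality condition per phone number, combined with OR.
                  foreach ($numberList as $number) {
                      $orBuilder->add(
                          Cond::equal(Cond::column('phoneNumber'), $number)
                      );
                  }

                  return $orBuilder->build();
              }

              /**
               * @return string[]
               */
              private function getPhoneNumberList(CoreEntity $entity): array
              {
                  if ($entity->get('phoneNumberData')) {
                      /** @var PhoneNumberGroup $pnGroup */
                      $pnGroup = $entity->getValueObject('phoneNumber');

                      return $pnGroup->getNumberList();
                  }

                  if ($entity->get('phoneNumber')) {
                      return [$entity->get('phoneNumber')];
                  }

                  return [];
              }
          }
          ```

          Returning null when there is no phone number mirrors the class in #5, which also returns null when there is nothing to compare.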

          • livewire
            Senior Member
            • Nov 2016
            • 100

            #6
            Thanks again. When I run it directly, it adds about 30 leads and then says "in progress" forever; it doesn't add any more. It's extremely slow when I enable the duplicate check.

            I just noticed that when I skip the email fields it's much faster, but I need the emails.

            • rabii
              Active Community Member
              • Jun 2016
              • 1262

              #7
              Which version are you using?
              Rabii
              Web Dev

              • yuri
                Member
                • Mar 2014
                • 8621

                #8
                > it takes more than 2 days to complete it in idle mode

                How many leads do you have? It should not take that long.
                If you find EspoCRM good, we would greatly appreciate if you could give the project a star on GitHub. We believe our work truly deserves more recognition. Thanks.

                • livewire
                  Senior Member
                  • Nov 2016
                  • 100

                  #9
                  Originally posted by rabii
                  Which version are you using?
                  7.0.7

                  • livewire
                    Senior Member
                    • Nov 2016
                    • 100

                    #10
                    Originally posted by yuri
                    > it takes more than 2 days to complete it in idle mode

                    How many leads do you have? It should not take that long.
                    The overall system has around 97000 leads

                    • livewire
                      Senior Member
                      • Nov 2016
                      • 100

                      #11
                      I uploaded a file with 2,000 leads. Over 24 hours have passed and only 1,221 have been added. I'm confused about why duplicate checking takes so much time.

                      • yuri
                        Member
                        • Mar 2014
                        • 8621

                        #12
                        97000 is not a big number. How many email addresses and phone numbers do you have? You can check at Administration > Email Address / Phone Numbers.

                        • livewire
                          Senior Member
                          • Nov 2016
                          • 100

                          #13
                          Originally posted by yuri
                          97000 is not a big number. How many email addresses and phone numbers do you have? You can check at Administration > Email Address / Phone Numbers.
                          Yeah, that's why I'm confused. It's super slow.

                          Phone: 99,302
                          Emails: 78,874

                          Is there any specific setting I need to check?
                          Last edited by livewire; 05-12-2023, 08:22 AM.

                          • rabii
                            rabii commented
                            Can you check whether the cache is enabled (Administration > Settings > Use Cache)?

                          • livewire
                            livewire commented
                            It's enabled.
                        • yuri
                          Member
                          • Mar 2014
                          • 8621

                          #14
                          Does it take long if you import leads without email addresses, so that only the name is checked for duplicates?

                          • yuri
                            Member
                            • Mar 2014
                            • 8621

                            #15
                            How much time does it take to run this query:

                            Code:
                            SELECT SQL_NO_CACHE `id` FROM `lead` WHERE `first_name` = 'some name' AND `last_name` = 'some name' AND deleted = 0
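
                            If running the query by hand in a MySQL client is inconvenient, a small standalone script can time it instead. This is only a rough sketch using PDO. The DSN, user name, and password below are placeholders, not values from this thread, and the measured time includes the network round trip and fetching the result.

                            ```php
                            <?php

                            // Placeholder connection details (assumptions); use the values from your own installation.
                            $dsn = 'mysql:host=localhost;dbname=espocrm;charset=utf8mb4';
                            $user = 'espocrm_user';
                            $password = 'secret';

                            $pdo = new PDO($dsn, $user, $password, [
                                PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
                            ]);

                            // Same query as above, with the two names passed as bound parameters.
                            $sql = "SELECT SQL_NO_CACHE `id` FROM `lead` " .
                                "WHERE `first_name` = :firstName AND `last_name` = :lastName AND deleted = 0";

                            $statement = $pdo->prepare($sql);

                            $start = microtime(true);
                            $statement->execute(['firstName' => 'some name', 'lastName' => 'some name']);
                            $ids = $statement->fetchAll(PDO::FETCH_COLUMN);
                            $elapsed = microtime(true) - $start;

                            // Report how many ids matched and how long the query took.
                            printf("%d row(s) in %.3f s\n", count($ids), $elapsed);
                            ```

                            Replace 'some name' with a first and last name that actually exist in the lead table so the timing reflects a realistic lookup.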