PDF - Recognize - Cut - Extract - send - update EspoCrm

  • enricorossa
    Senior Member
    • Jul 2018
    • 125



    I'm presenting my app to understand how to integrate it into EspoCRM.

    RULES CONFIGURATION
    recognize - split - extract - send
    Recognize the PDF type based on a geographic area (a rectangular region) of the document
    Split the document based on a key found in a certain geographic area of the document
    Extract data from the document
    Search the address book for the contact to send the document to, based on the variables extracted from the document
    Update EspoCRM (to do)


    In this video I quickly show how to:

    recognize the invoices generated by the management system
    split the document
    extract the data
    search the address book by VAT number
    send the document


    If you are interested in helping me design a plugin to update Espo, please contact me.
    If you know of an existing module that could be adapted, please contact me as well.

    youtube video


    Last edited by enricorossa; 03-01-2023, 08:18 AM.
  • emillod
    Active Community Member
    • Apr 2017
    • 1405

    #2
    Hello Sir,
    is this actually related to EspoCRM, or is it marketing for your application? As you write in your post, it's not even connected to EspoCRM yet, because you have that on your to-do list ("Update ESPOCRM (to do)").
    I watched your video and there is nothing about EspoCRM in it.


    • enricorossa
      Senior Member
      • Jul 2018
      • 125

      #3
      Hello, no, it's not marketing, because the application is not for sale and will be available in a free version. If you look at the bottom of my post, I wrote:

      "If you are interested in helping me design a plugin to update Espo, please contact me,
      or if you know of an existing module to adapt, please contact me"

      I'd just like to understand whether someone might be interested in such an app, and how to update Espo: whether there is already a module that takes JSON and processes it to update the Espo entities.

      I apologize for my poor presentation and for my English.

      If the post is not appropriate, I will delete it.


    • emillod
      Active Community Member
      • Apr 2017
      • 1405

      #4
      enricorossa no problem. I just wanted to clarify this matter. Don't worry.


      • esforim
        Active Community Member
        • Jan 2020
        • 2204

        #5
        Hi there,

        Looking at the video I can see what its purpose is and what you are trying to do! And it is awesome!

        I have been looking for something like this for quite a while. The closest I found are tools that are hidden away, or whose server requirements are too demanding and not available on shared-hosting setups. Hopefully that's not the case here.

        For those who don't understand it: it basically extracts data from a PDF into text format.

        Think of it as PDF-to-text-field.

        Exporting a whole PDF to text is not hard; exporting only certain key data is what's difficult, and I haven't seen any software that does that yet, not even offline/desktop software. At least I can't find one.

        I'm looking forward to seeing the result of this project, as your video doesn't show what Result.txt looks like... One feature I'm hoping for is exporting into fields, or perhaps to CSV or some other format that makes it easier to bring this extracted information in (data import or REST API).

        Secondly, is it possible to extract by keyword instead of by "geographic" location? For example, it would search for "Date:" and take all the data after the word "Date:", stopping after a certain number of characters, or at the end of the line, etc. Might this be more difficult to do?

        I don't think there is anything similar in EspoCRM yet.

        Some other similar projects (free, paid or closed source) are as follows:
        Extract data from PDF file with Python. Contribute to lgmarin/pdf_data_extract development by creating an account on GitHub.

        PDF data extraction is a common problem faced by organizations. This article covers 6 popular ways to extract data from PDF files in 2024.

        Do you want to extract data from PDF documents? Discover various PDF data extraction methods, such as PDF Parsing and Zonal OCR Technology.
        Last edited by esforim; 02-20-2023, 05:08 AM.


        • enricorossa
          Senior Member
          • Jul 2018
          • 125

          #6
          My app lets you graphically configure the rules that generate the commands executed by poppler-utils. Using pdftotext.exe or pdftocairo.exe you can recognize PDFs, export data and cut PDFs.
          Previously I had to write all the rules by hand in a file which I then ran from the command line. I wasted a lot of time, especially when the PDF templates were modified:
          it was very difficult to test the commands with the right coordinates to find the position of the text to extract.
          We are moving everything from the old CRM, for which I had written a module that takes the txt and updates the data. Now the time has come to move this to EspoCRM too, and I was wondering if there is an existing module to adapt.

          1) I haven't tested it on shared hosting yet, because there it's not possible to install poppler-utils, which provides the executables that run the commands generated by my app.
          Example of a command for extracting text:
          pdftotext.exe -f 1 -l 1 -r 150 -nopgbrk -x 685 -y 122 -W 308 -H 23 spool/1/spezzati/cedolini_ydfrL7H2W2.pdf spool/1/tmp/bPxpXTENkT/bPxpXTENkT.txt
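A minimal Python sketch of how a command like the one above could be assembled from a rule's page number and area. The helper names `build_pdftotext_cmd` and `extract_region` are hypothetical and not part of the app; the flags are the same ones used in the command above, and passing `-` as the output file (pdftotext then writes to stdout) is used here only for convenience.

```python
import subprocess
from typing import List


def build_pdftotext_cmd(pdf_path: str, txt_path: str, page: int,
                        x: int, y: int, w: int, h: int,
                        dpi: int = 150, exe: str = "pdftotext") -> List[str]:
    """Build the argument list for extracting one rectangular area of one page.

    Hypothetical helper: it mirrors the flags of the command shown in the post.
    """
    return [
        exe,
        "-f", str(page), "-l", str(page),  # first and last page to process
        "-r", str(dpi),                    # resolution the coordinates refer to
        "-nopgbrk",                        # no page-break characters in the output
        "-x", str(x), "-y", str(y),        # top-left corner of the crop area
        "-W", str(w), "-H", str(h),        # width and height of the crop area
        pdf_path, txt_path,
    ]


def extract_region(pdf_path: str, page: int, x: int, y: int, w: int, h: int) -> str:
    """Run pdftotext on the area and return the text (requires poppler-utils)."""
    cmd = build_pdftotext_cmd(pdf_path, "-", page, x, y, w, h)  # "-" = stdout
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Passing the arguments as a list instead of one shell string avoids quoting problems when file names contain spaces.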

          2) I currently use it locally and on a domain hosted on VPS servers, and it performs very well.

          3) The software exports JSON inside a txt file:

          [
            {
              "id": 33,
              "idregola": 10,
              "pagina": 1,
              "x": 71,
              "y": 128,
              "w": 99,
              "h": 16,
              "variabile": "matricola",
              "valore": "0000000011"
            },
            {
              "id": 34,
              "idregola": 10,
              "pagina": 1,
              "x": 273,
              "y": 125,
              "w": 320,
              "h": 25,
              "variabile": "cognome",
              "valore": "PINCO"
            }
          ]
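As a rough sketch, an export like this could be read back in Python and reduced to a variable-to-value mapping. It assumes the txt file contains strictly valid JSON (no trailing comma before the closing bracket); the function name `variables_from_export` is hypothetical, and the sample below is shortened to the two keys the function uses.

```python
import json
from typing import Dict

# Shortened copy of the export above, keeping only the keys used below.
SAMPLE_EXPORT = """
[
  {"id": 33, "idregola": 10, "pagina": 1, "variabile": "matricola", "valore": "0000000011"},
  {"id": 34, "idregola": 10, "pagina": 1, "variabile": "cognome", "valore": "PINCO"}
]
"""


def variables_from_export(text: str) -> Dict[str, str]:
    """Map each extracted variable name ("variabile") to its value ("valore")."""
    records = json.loads(text)
    return {record["variabile"]: record["valore"] for record in records}


if __name__ == "__main__":
    print(variables_from_export(SAMPLE_EXPORT))
```

Note that the values stay strings, so leading zeros such as those in "matricola" are preserved.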

          4) Yes, you can extract as much data as you want and process it by telling the app to extract up to N characters or up to another word
          (I still have to implement the code for this function, but it can be done).
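The keyword-based extraction discussed here is not implemented in the app yet, so the following is only one possible sketch in Python, working on text that has already been extracted. The function name `extract_after_keyword` and its parameters are hypothetical.

```python
import re
from typing import Optional


def extract_after_keyword(text: str, keyword: str,
                          max_chars: Optional[int] = None,
                          stop_word: Optional[str] = None) -> Optional[str]:
    """Return what follows `keyword` on the same line, or None if it is absent.

    The value is cut at `stop_word` (if given and present) and then
    truncated to `max_chars` characters (if given).
    """
    # Skip spaces/tabs after the keyword, then capture up to the end of the line.
    match = re.search(re.escape(keyword) + r"[ \t]*([^\r\n]*)", text)
    if match is None:
        return None
    value = match.group(1)
    if stop_word is not None:
        index = value.find(stop_word)
        if index != -1:
            value = value[:index]
    if max_chars is not None:
        value = value[:max_chars]
    return value.strip()


if __name__ == "__main__":
    page_text = "Invoice no: 42\nDate: 01/03/2023 Total: 100\n"
    print(extract_after_keyword(page_text, "Date:", stop_word="Total:"))
```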

          The interesting thing about this app is that if you have two PDFs containing the same keyword, you can tell them apart based on the geographic area where the keyword appears.

          Example:
          PDF1 contains the word INVOICE in the area 70, 120, 99, 16 and is an invoice.
          PDF2 contains the word INVOICE in the area 80, 140, 110, 20, but it is not an invoice; it is a summary.
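The recognition idea in this example can be sketched in Python as follows. This is an illustration under assumptions, not the app's actual code: the names `RecognitionRule` and `classify` are hypothetical, the area is read as (x, y, w, h), and the function that reads the text of an area (for example via pdftotext) is passed in so the logic can be tried without a real PDF.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Area = Tuple[int, int, int, int]  # assumed order: x, y, w, h


@dataclass(frozen=True)
class RecognitionRule:
    doc_type: str  # label given to the PDF when the rule matches
    keyword: str   # word expected inside the area
    area: Area     # region of the page to inspect


RULES = [
    RecognitionRule("invoice", "INVOICE", (70, 120, 99, 16)),
    RecognitionRule("summary", "INVOICE", (80, 140, 110, 20)),
]


def classify(read_area: Callable[[Area], str],
             rules: Iterable[RecognitionRule]) -> Optional[str]:
    """Return the type of the first rule whose keyword appears in its area."""
    for rule in rules:
        if rule.keyword in read_area(rule.area):
            return rule.doc_type
    return None


if __name__ == "__main__":
    # Fake "PDF2": the keyword sits only in the second area.
    pdf2_text_by_area = {(80, 140, 110, 20): "INVOICE"}
    print(classify(lambda area: pdf2_text_by_area.get(area, ""), RULES))
```

The same keyword leads to different document types because each rule only looks at its own area.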

          The software recognizes them and saves the PDFs in a folder of your choice under different names. The splitting and extracting rules match on a suffix of the file name and
          apply only to files with that suffix. This way you can build a workflow that automates all the PDF processing: you save the file in the spool folder
          and forget about it, and you will find the inserted or updated data in EspoCRM.


          • enricorossa
            Senior Member
            • Jul 2018
            • 125

            #7

            Good morning, I'm writing the Espo module and testing the insertion. I can't understand how to load the list of fields of an entity via the API. Currently I load a record from the entity and read the field names from it, but this way some fields, such as teams, are not loaded. Can you give me a hand? Thank you
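One approach that may help, sketched in Python under assumptions: reading the entity's field definitions from the metadata instead of from a sample record. The sketch assumes the instance exposes a `Metadata` endpoint under `api/v1`, that it accepts a `key` query parameter with a dot-separated path, and that API-key authentication uses the `X-Api-Key` header; please check these against the API documentation for your EspoCRM version. The function names and the example URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def metadata_fields_url(base_url: str, entity_type: str) -> str:
    """URL for the field definitions of one entity type (assumed endpoint)."""
    query = urllib.parse.urlencode({"key": f"entityDefs.{entity_type}.fields"})
    return f"{base_url.rstrip('/')}/api/v1/Metadata?{query}"


def field_names(fields_def: Dict[str, Any]) -> List[str]:
    """The keys of the field-definition object are the field names."""
    return sorted(fields_def)


def fetch_field_names(base_url: str, entity_type: str, api_key: str) -> List[str]:
    """Download the definitions and return the field names (needs a live instance)."""
    request = urllib.request.Request(
        metadata_fields_url(base_url, entity_type),
        headers={"X-Api-Key": api_key},  # assumed API-key authentication header
    )
    with urllib.request.urlopen(request) as response:
        return field_names(json.load(response))


if __name__ == "__main__":
    print(metadata_fields_url("https://crm.example.com", "Contact"))
```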


            • enricorossa
              Senior Member
              • Jul 2018
              • 125

              #8
              I finished the module for importing into EspoCRM via the API.

              A few more tweaks are still needed, such as:

              1) fetching data from an external source via the data extraction variables
              2) relations with other entities
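The module itself is not shown in this thread, so the following Python sketch only illustrates the general shape such an import could take: extracted variables are mapped to entity fields and sent as a JSON `POST` to `api/v1/{entityType}` with an `X-Api-Key` header. The function name, the field mapping and the example URL are all hypothetical, and the endpoint and header should be checked against your EspoCRM version.

```python
import json
import urllib.request
from typing import Dict


def build_create_request(base_url: str, entity_type: str, api_key: str,
                         variables: Dict[str, str],
                         field_map: Dict[str, str]) -> urllib.request.Request:
    """Prepare (but do not send) a request that creates one record.

    `field_map` translates extraction variable names into entity field names;
    variables without a mapping are skipped.
    """
    payload = {field_map[name]: value
               for name, value in variables.items() if name in field_map}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/{entity_type}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_request(
        "https://crm.example.com", "Contact", "my-api-key",
        variables={"matricola": "0000000011", "cognome": "PINCO"},
        field_map={"cognome": "lastName"},  # hypothetical mapping
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the mapping easy to test without touching a live CRM.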


              Watch the video linked in the first post.
