Intelligent Table Extraction

Note: Before learning how to use the individual functions, we recommend that you read the Request Workflow to understand the basic PDF processing flow. Each function accepts its own special parameters when you upload files. The other basic steps are the same.

Intelligent Table Extraction:

java
 {
   "lang": "auto"
 }

Required Parameters

lang: Supported types and definitions

  • auto - Automatic language detection
  • english - English
  • chinese - Simplified Chinese
  • chinese_tra - Traditional Chinese
  • korean - Korean
  • japanese - Japanese
  • latin - Latin
  • devanagari - Devanagari script (used for Sanskrit, among other languages)
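
As a minimal sketch of how the lang value might be checked before it is sent, the helper below validates a language code against the list above and builds the JSON string for the parameter field. The class and method names are hypothetical and not part of the API; only the lang values come from this page.

```java
import java.util.Set;

// Hypothetical helper (not part of the API): validates a lang value and
// builds the JSON string passed in the "parameter" field when uploading.
class TableExtractionParameter {
    // Values copied from the list of supported lang types above.
    static final Set<String> SUPPORTED_LANGS = Set.of(
        "auto", "english", "chinese", "chinese_tra",
        "korean", "japanese", "latin", "devanagari");

    static String toJson(String lang) {
        if (!SUPPORTED_LANGS.contains(lang)) {
            throw new IllegalArgumentException("Unsupported lang: " + lang);
        }
        // Supported values contain only letters and underscores,
        // so no JSON escaping is needed here.
        return "{\"lang\": \"" + lang + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("auto"));
    }
}
```

Rejecting unknown values locally is a design choice for this sketch. How the server reacts to an unsupported lang value is not described here.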

Example

  1. Authentication

    Replace publicKey and secretKey with the values you get from the console. The response returns the accessToken used to authenticate the following steps.

    curl
    curl --location --request POST 'https://api-server.compdf.com/server/v1/oauth/token' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "publicKey": "publicKey",
        "secretKey": "secretKey"
    }'
    java
     import java.io.*;
     import okhttp3.*;
     public class main {
       public static void main(String []args) throws IOException{
         OkHttpClient client = new OkHttpClient().newBuilder()
           .build();
         // The body is JSON, so declare it as application/json (as in the curl example).
         MediaType mediaType = MediaType.parse("application/json");
         RequestBody body = RequestBody.create(mediaType, "{\n    \"publicKey\": \"{{public_key}}\",\n    \"secretKey\": \"{{secret_key}}\"\n}");
         Request request = new Request.Builder()
           .url("https://api-server.compdf.com/server/v1/oauth/token")
           .method("POST", body)
           .build();
         Response response = client.newCall(request).execute();
       }
     }
  2. Create Task

    Replace accessToken with the token obtained in the previous step, and language with the language type in which error information should be displayed. The response data then contains the taskId.

    curl
     curl --location --request GET 'https://api-server.compdf.com/server/v1/task/documentAI/tableRec' \
     --header 'Authorization: Bearer accessToken'
    java
     import java.io.*;
     import okhttp3.*;
     public class main {
       public static void main(String []args) throws IOException{
         OkHttpClient client = new OkHttpClient().newBuilder()
           .build();
         // OkHttp rejects a request body on GET requests, so none is attached.
         Request request = new Request.Builder()
           .url("https://api-server.compdf.com/server/v1/task/documentAI/tableRec?language={{language}}")
           .get()
           .addHeader("Authorization", "Bearer {{accessToken}}")
           .build();
         Response response = client.newCall(request).execute();
       }
     }
  3. Upload Files

    Replace file with the file you want to process, taskId with the taskId obtained in the previous step, language with the language type in which error information should be displayed, and accessToken with the token obtained in the first step.

    Supported image formats: jpg, jpeg, png, bmp

    curl
     curl --location --request POST 'https://api-server.compdf.com/server/v1/file/upload' \
     --header 'Authorization: Bearer accessToken' \
     --form 'file=@"test.pdf"' \
     --form 'taskId="taskId"' \
     --form 'password=""' \
     --form 'parameter="{ \"lang\": \"auto\" }"'
    java
     import java.io.*;
     import okhttp3.*;
     public class main {
       public static void main(String []args) throws IOException{
         OkHttpClient client = new OkHttpClient().newBuilder()
           .build();
         MediaType mediaType = MediaType.parse("text/plain");
         RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
           .addFormDataPart("file","{{file}}",
                            RequestBody.create(MediaType.parse("application/octet-stream"),
                                               new File("<file>")))
           .addFormDataPart("taskId","{{taskId}}")
           .addFormDataPart("language","{{language}}")
           .addFormDataPart("parameter","{    \"lang\": \"auto\"    }")
           .build();
         Request request = new Request.Builder()
           .url("https://api-server.compdf.com/server/v1/file/upload")
           .method("POST", body)
           .addHeader("Authorization", "Bearer {{accessToken}}")
           .build();
         Response response = client.newCall(request).execute();
       }
     }
  4. Process Files

    Replace taskId with the taskId obtained in the Create Task step, accessToken with the token obtained in the first step, and language with the language type in which error information should be displayed.

    curl
     curl --location -g --request GET 'https://api-server.compdf.com/server/v1/execute/start?taskId=taskId' \
     --header 'Authorization: Bearer accessToken'
    java
     import java.io.*;
     import okhttp3.*;
     public class main {
      public static void main(String []args) throws IOException{
        OkHttpClient client = new OkHttpClient().newBuilder()
          .build();
        // OkHttp rejects a request body on GET requests, so none is attached.
        Request request = new Request.Builder()
          .url("https://api-server.compdf.com/server/v1/execute/start?taskId={{taskId}}&language={{language}}")
          .get()
          .addHeader("Authorization", "Bearer {{accessToken}}")
          .build();
        Response response = client.newCall(request).execute();
      }
     }
  5. Get Task Information

    Replace taskId with the taskId obtained in the Create Task step, and accessToken with the token obtained in the first step.

    curl
     curl --location -g --request GET 'https://api-server.compdf.com/server/v1/task/taskInfo?taskId=taskId' \
     --header 'Authorization: Bearer accessToken'
    java
     import java.io.*;
     import okhttp3.*;
     public class main {
       public static void main(String []args) throws IOException{
         OkHttpClient client = new OkHttpClient().newBuilder()
           .build();
         // OkHttp rejects a request body on GET requests, so none is attached.
         Request request = new Request.Builder()
           .url("https://api-server.compdf.com/server/v1/task/taskInfo?taskId={{taskId}}")
           .get()
           .addHeader("Authorization", "Bearer {{accessToken}}")
           .build();
         Response response = client.newCall(request).execute();
       }
     }
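
For readers who prefer the JDK's built-in HTTP client (Java 11 or later) over OkHttp, the following is a minimal sketch of how the non-upload requests from the steps above could be constructed. The class and method names are hypothetical. The sketch assumes the keys contain no characters that need JSON escaping, and it leaves out the upload step because java.net.http has no built-in multipart body publisher.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical request factory; only the URLs and header names come from the steps above.
class TableRecRequests {
    static final String BASE = "https://api-server.compdf.com/server/v1";

    // Step 1: exchange publicKey/secretKey for an accessToken.
    // Assumption: the keys contain no quotes or backslashes.
    static HttpRequest authenticate(String publicKey, String secretKey) {
        String json = "{\"publicKey\": \"" + publicKey
            + "\", \"secretKey\": \"" + secretKey + "\"}";
        return HttpRequest.newBuilder(URI.create(BASE + "/oauth/token"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    // Step 2: create a table extraction task.
    static HttpRequest createTask(String accessToken) {
        return get("/task/documentAI/tableRec", accessToken);
    }

    // Step 4: start processing the uploaded files.
    static HttpRequest startTask(String accessToken, String taskId) {
        return get("/execute/start?taskId=" + encode(taskId), accessToken);
    }

    // Step 5: query the task information.
    static HttpRequest taskInfo(String accessToken, String taskId) {
        return get("/task/taskInfo?taskId=" + encode(taskId), accessToken);
    }

    private static HttpRequest get(String pathAndQuery, String accessToken) {
        return HttpRequest.newBuilder(URI.create(BASE + pathAndQuery))
            .header("Authorization", "Bearer " + accessToken)
            .GET()
            .build();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Each request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Building the requests separately from sending them keeps the URL and header handling easy to check without network access.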

Result

File Type    Description
.JSON        Table recognition results

Content

Parameter                        Description
cost                             Time spent on table recognition
type                             Type of the table
angle                            Angle by which the table is rotated
width                            Width of the table
height                           Height of the table
rows                             Number of rows in the table
cols                             Number of columns in the table
position                         Position of the table's bounding rectangle
height_of_rows                   Height of each row of the table
width_of_cols                    Width of each column of the table
table_cells                      Information about all cells in the table
table_cells: start_row           Start row of a cell
table_cells: end_row             End row of a cell
table_cells: start_col           Start column of a cell
table_cells: end_col             End column of a cell
table_cells: text                Text in the cell
table_cells: position            Position of the cell's bounding rectangle
table_cells: lines               Text lines contained in the cell
table_cells: lines: text         Text of the line
table_cells: lines: score        Recognition score of the text line
table_cells: lines: position     Position of the text line
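
The cell fields above can be turned into a simple two-dimensional grid. The following is a minimal sketch under two assumptions that this page does not state explicitly: row and column indices are 1-based and inclusive (the sample response starts at start_row 1 and start_col 1), and a merged cell should repeat its text in every slot it covers. The class names are hypothetical, and JSON parsing is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: places already-parsed table_cells into a rows x cols grid.
class TableGridSketch {
    // Holds only the table_cells fields needed for the grid.
    static final class Cell {
        final int startRow, endRow, startCol, endCol;
        final String text;

        Cell(int startRow, int endRow, int startCol, int endCol, String text) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.startCol = startCol;
            this.endCol = endCol;
            this.text = text;
        }
    }

    static String[][] toGrid(int rows, int cols, List<Cell> cells) {
        String[][] grid = new String[rows][cols];
        for (String[] row : grid) {
            Arrays.fill(row, ""); // slots without a cell stay empty
        }
        for (Cell c : cells) {
            // Assumption: indices are 1-based and inclusive.
            for (int r = c.startRow; r <= c.endRow; r++) {
                for (int k = c.startCol; k <= c.endCol; k++) {
                    grid[r - 1][k - 1] = c.text;
                }
            }
        }
        return grid;
    }
}
```

Called with rows = 4, cols = 4 and a cell with start_row 2, end_row 2, start_col 1, end_col 1 and text "Absorbed", the grid holds "Absorbed" at index [1][0].
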
java
{
  "cost": 7566,
  "json_items": [
    {
      "type": "table_with_line",
      "angle": 0.0,
      "width": 488,
      "height": 191,
      "rows": 4,
      "cols": 4,
      "position": [
        114,
        657,
        602,
        657,
        602,
        848,
        114,
        848
      ],
      "height_of_rows": [
        65,
        30,
        31,
        36
      ],
      "width_of_cols": [
        122,
        122,
        118,
        122
      ],
      "table_cells": [
        {
          "start_row": 1,
          "end_row": 1,
          "start_col": 1,
          "end_col": 1,
          "text": "",
          "position": [
            2,
            2,
            124,
            2,
            124,
            67,
            2,
            67
          ],
          "lines": []
        },
        {
          "start_row": 2,
          "end_row": 2,
          "start_col": 1,
          "end_col": 1,
          "text": "Absorbed",
          "position": [
            2,
            64,
            125,
            64,
            125,
            95,
            2,
            95
          ],
          "lines": [
            {
              "text": "Absorbed",
              "score": 1.0,
              "position": [
                29,
                65,
                99,
                65,
                99,
                88,
                29,
                88
              ]
            }
          ]
        }
      ]
    }
  ],
  "html_items": [
    "<table border=\"1\" width='488px' height='191px'>\n<tr><th width='122px' height='65px'></th><th width='122px' height='65px' style=\"white-space: pre-line\">Absorbed</th><th width='118px' height='65px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='65px' style=\"white-space: pre-line\">Fatigue</th></tr>\n<tr><th width='122px' height='30px' style=\"white-space: pre-line\">Absorbed</th><th width='122px' height='30px'></th><th width='118px' height='30px' style=\"white-space: pre-line\">2</th><th width='122px' height='30px'></th></tr>\n<tr><th width='122px' height='31px' style=\"white-space: pre-line\">Neuter</th><th width='122px' height='31px'></th><th width='118px' height='31px'></th><th width='122px' height='31px'></th></tr>\n<tr><th width='122px' height='36px' style=\"white-space: pre-line\">Fatigue</th><th width='122px' height='36px'></th><th width='118px' height='36px'></th><th width='122px' height='36px' style=\"white-space: pre-line\">8</th>\t</tr>\n</table>",
    "<table border=\"1\" width='489px' height='166px'>\n<tr><th width='123px' height='61px' style=\"white-space: pre-line\">Expression</th><th width='117px' height='61px' style=\"white-space: pre-line\">Image Num</th><th width='118px' height='61px' style=\"white-space: pre-line\">Correct</th><th width='125px' height='61px' style=\"white-space: pre-line\">Recognition Rate</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Absorbed</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px' style=\"white-space: pre-line\">7</th><th width='125px' height='31px' style=\"white-space: pre-line\">77.8%</th></tr>\n<tr><th width='123px' height='30px' style=\"white-space: pre-line\">Neuter</th><th width='117px' height='30px' style=\"white-space: pre-line\">9</th><th width='118px' height='30px'></th><th width='125px' height='30px' style=\"white-space: pre-line\">55.6%</th></tr>\n<tr><th width='123px' height='31px' style=\"white-space: pre-line\">Fatigue</th><th width='117px' height='31px' style=\"white-space: pre-line\">9</th><th width='118px' height='31px'></th><th width='125px' height='31px' style=\"white-space: pre-line\">88.9%</th></tr>\n<tr><th width='483px' height='33px' colspan=\"4\" style=\"white-space: pre-line\">Average recognition rate: 74.1%</th>\t</tr>\n</table>"
  ]
}
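
In the sample above, each position array holds eight numbers. A plausible reading, which is an assumption rather than something this page states, is four (x, y) corner points listed clockwise from the top-left corner. Under that reading the table's position yields a width of 602 - 114 = 488 and a height of 848 - 657 = 191, which matches the width and height fields. The following minimal sketch, with hypothetical names, computes the axis-aligned bounding box from such an array.

```java
// Hypothetical helper: derives a bounding box from an 8-number position array,
// assuming the layout [x1, y1, x2, y2, x3, y3, x4, y4] (four corner points).
class PositionBox {
    // Returns {minX, minY, width, height}.
    static int[] boundingBox(int[] position) {
        if (position.length != 8) {
            throw new IllegalArgumentException("Expected 8 values, got " + position.length);
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int i = 0; i < 8; i += 2) {
            minX = Math.min(minX, position[i]);
            maxX = Math.max(maxX, position[i]);
            minY = Math.min(minY, position[i + 1]);
            maxY = Math.max(maxY, position[i + 1]);
        }
        return new int[] {minX, minY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        // Table position from the sample response above.
        int[] box = boundingBox(new int[] {114, 657, 602, 657, 602, 848, 114, 848});
        System.out.println(java.util.Arrays.toString(box));
    }
}
```

The cell positions in the sample (for example 2, 2, 124, 2, 124, 67, 2, 67) appear to be relative to the table's own top-left corner rather than to the page, but this should be verified against real output before relying on it.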