Intelligent Document Extraction Tool Guide

Note: Before learning how to use the individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function accepts its own specific parameters when you upload a file; all other steps are the same.

Intelligent Document Extraction:

json

{
    "keys": ["Title"],
    "tableHandles": ["Invoice Number"],
    "extractType": "0"
}

Required Parameters:

keys: Text fields to extract, e.g., ["Title"].

tableHandles: Table headers to extract, e.g., ["Invoice Number"].

extractType: Extraction type (0: default full text, 1: all text, 2: all tables).
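The `parameter` form field carries these three values as a single JSON string. The following is a minimal sketch, assuming plain Java with no JSON library, of how that string might be assembled; the class and method names (`ExtractParameter`, `build`) are hypothetical. It passes `extractType` as a number, as the request examples do, although the parameter example above shows it as a string.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExtractParameter {
    // Escapes backslashes and double quotes so a value can sit inside a JSON string literal.
    // Control characters are not handled; this is a sketch for simple field names.
    public static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders a list of strings as a JSON array, e.g. ["Title"]; an empty list gives [].
    public static String array(List<String> items) {
        return items.stream().map(ExtractParameter::quote)
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Builds the JSON string sent in the "parameter" form field.
    // extractType: 0 = default full text, 1 = all text, 2 = all tables.
    public static String build(List<String> keys, List<String> tableHandles, int extractType) {
        if (extractType < 0 || extractType > 2) {
            throw new IllegalArgumentException("extractType must be 0, 1 or 2");
        }
        return "{\"keys\":" + array(keys)
                + ",\"tableHandles\":" + array(tableHandles)
                + ",\"extractType\":" + extractType + "}";
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("Title"), List.of("Invoice Number"), 0));
    }
}
```

For real-world input (field names with unusual characters), a proper JSON library is the safer choice than hand-rolled escaping.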

Request Example:

Replace apiKey with the publicKey obtained from the console, file with the file you want to process, and language with the language to use for interface error messages.

curl

curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
--header 'x-api-key: apiKey' \
--header 'Accept: */*' \
--header 'Connection: keep-alive' \
--header 'Content-Type: multipart/form-data' \
--form 'file=@"file"' \
--form 'password=""' \
--form 'parameter="{ \"keys\":[], \"tableHandles\":[],\"extractType\":2}"' \
--form 'language="1"'

java

import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
  public static void main(String[] args) throws IOException {
    OkHttpClient client = new OkHttpClient().newBuilder()
      .build();
    // Multipart form: the file to process, the error-message language,
    // the file password (empty if none) and the extraction parameters as a JSON string
    RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
      .addFormDataPart("file", "{{file}}",
        RequestBody.create(MediaType.parse("application/octet-stream"),
          new File("<file>")))
      .addFormDataPart("language", "{{language}}")
      .addFormDataPart("password", "")
      .addFormDataPart("parameter", "{ \"ocrRecognitionLang\": \"AUTO\" , \"keys\":[], \"tableHandles\":[],\"extractType\":2}")
      .build();
    Request request = new Request.Builder()
      .url("https://api-server.compdf.com/server/v2/process/idp/documentExtract")
      .method("POST", body)
      .addHeader("x-api-key", "{{apiKey}}")
      .build();
    // Execute the request and print the JSON response body
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}
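If adding OkHttp is not an option, a similar request can be sketched with the JDK's built-in `java.net.http` client (Java 11 or later). The multipart body is assembled by hand here; the class name `ExtractRequest`, the command-line arguments, and the field values (taken from the curl example) are illustrative assumptions, not part of the API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;

public class ExtractRequest {
    static final String ENDPOINT =
        "https://api-server.compdf.com/server/v2/process/idp/documentExtract";

    // Builds a multipart/form-data body: plain text fields first, then one "file" part.
    public static byte[] multipartBody(String boundary, Map<String, String> fields,
                                       String fileName, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + f.getKey() + "\"\r\n\r\n"
                + f.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
            + "Content-Type: application/octet-stream\r\n\r\n");
        out.writeBytes(fileBytes);
        write(out, "\r\n--" + boundary + "--\r\n"); // closing boundary
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical usage: java ExtractRequest <file> <apiKey>
    public static void main(String[] args) throws IOException, InterruptedException {
        Path file = Path.of(args[0]);   // the file to process
        String apiKey = args[1];        // publicKey obtained from the console
        String boundary = "----" + UUID.randomUUID();
        byte[] body = multipartBody(boundary, Map.of(
                "language", "1",
                "password", "",
                "parameter", "{\"keys\":[],\"tableHandles\":[],\"extractType\":2}"),
            file.getFileName().toString(), Files.readAllBytes(file));
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
            .header("x-api-key", apiKey)
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Unlike the curl example, the `Content-Type` header here includes the boundary explicitly, since the JDK client does not generate multipart bodies on its own.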

Response Information:

A successful request returns an HTTP 200 OK status code and a JSON response body containing the task details.

Response type: application/json

Response Parameter	Data Type	Description
code	String	HTTP request status, "200" indicates success
message	String	Request message
data	Object	Return result
+taskId	String	Task ID
+taskFileNum	int	Number of files processed in the task
+taskSuccessNum	int	Number of files successfully processed in the task
+taskFailNum	int	Number of files that failed in the task
+taskStatus	String	Task status
+assetTypeId	int	Used asset type ID
+taskCost	int	Task cost
+taskTime	int	Task duration
+sourceType	String	Original format
+targetType	String	Target format
+fileInfoDTOList	Array	Task file information
++fileKey	String	File key
++taskId	String	Task ID
++fileName	String	Original file name
++downFileName	String	Download file name
++fileUrl	String	Original file URL
++downloadUrl	String	Processed result file download URL
++sourceType	String	Original format
++targetType	String	Target format
++fileSize	int	File size
++convertSize	int	Processed result file size
++convertTime	int	Processing time
++status	String	File processing status
++failureCode	String	File processing failure error code
++failureReason	String	File processing failure description
++fileParameter	String	Processing parameter
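The response fields above map naturally onto Java records (Java 16 or later). This sketch models only a subset of the fields, and the type and method names are hypothetical: `allFilesSucceeded` assumes a task is fully successful when `code` is "200" and `taskFailNum` is 0. Populating the records from the JSON body requires a JSON library of your choice.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TaskResponse {
    // Subset of the fields of one fileInfoDTOList entry.
    public record FileInfo(String fileKey, String fileName, String downloadUrl,
                           String status, String failureCode, String failureReason) {}

    // Subset of the fields of the "data" object.
    public record TaskData(String taskId, int taskFileNum, int taskSuccessNum,
                           int taskFailNum, String taskStatus, List<FileInfo> fileInfoDTOList) {}

    public record ApiResponse(String code, String message, TaskData data) {
        // Assumption: the task fully succeeded if code is "200" and no file failed.
        public boolean allFilesSucceeded() {
            return "200".equals(code) && data != null && data.taskFailNum() == 0;
        }

        // Download URLs of the processed result files; entries without a URL are skipped.
        public List<String> downloadUrls() {
            if (data == null || data.fileInfoDTOList() == null) {
                return List.of();
            }
            return data.fileInfoDTOList().stream()
                    .map(FileInfo::downloadUrl)
                    .filter(u -> u != null && !u.isEmpty())
                    .collect(Collectors.toList());
        }
    }
}
```

The record uses `message` as listed in the table, while the response example uses `msg`; check which name your responses actually contain before binding JSON to these types.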

Response Example:

json

{
    "code": "200",
    "msg": "success",
    "data": {
        "taskId": "f416dbcf-0c10-4f93-ab9e-a835c1f5dba1",
        "taskFileNum": 1,
        "taskSuccessNum": 1,
        "taskFailNum": 0,
        "taskStatus": "<taskStatus>",
        "assetTypeId": 0,
        "taskCost": 1,
        "taskTime": 1,
        "sourceType": "<sourceType>",
        "targetType": "<targetType>",
        "fileInfoDTOList": [
            {
                "fileKey": "<fileKey>",
                "taskId": "<taskId>",
                "fileName": "<fileName>",
                "downFileName": "<downFileName>",
                "fileUrl": "<fileUrl>",
                "downloadUrl": "<downloadUrl>",
                "sourceType": "<sourceType>",
                "targetType": "<targetType>",
                "fileSize": 24475,
                "convertSize": 6922,
                "convertTime": 8,
                "status": "<status>",
                "failureCode": "",
                "failureReason": "",
                "fileParameter": "<fileParameter>"
            }
        ]
    }
}

Result:

File Type	File Description
.json	JSON file containing the intelligent document extraction results

Return Data Structure Explanation:

JSON Content Explanation

Return Parameter	Data Type	Description
code	String	Error code, "200" indicates success
message	String	Error message
data	Object	Return result
+details	Object	Key information extraction result
++Page-index	Object	Extraction result for one page, keyed by page number (e.g., "Page-1")
+++key	String	Extracted value for a key field, as key:value
+++tables	Array	Extracted tables for the page, as tables:[ [table1], [table2] ]

JSON Structure Example:

json

{
    "code": "200",
    "msg": "success",
    "data": {
        "details": {
            "Page-1": {
                "Order Date": "xxx",
                "Order #": "xxx",
                "Quote#": "xxx",
                "Your estimated delivery date is": "xxx",
                "tables": null
            }
        }
    }
}
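Because key fields and the `tables` entry sit side by side inside each page object, client code usually has to separate them. The sketch below assumes the `details` object has already been parsed into nested `Map`s by a JSON library; the class and method names (`ExtractionResult`, `flattenFields`, `tableCount`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractionResult {
    // Flattens "details" into "Page-N / field" -> value pairs, skipping the "tables" entry.
    // Assumed input shape: page key -> (field name -> value), with "tables" holding a list or null.
    public static Map<String, String> flattenFields(Map<String, Map<String, Object>> details) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> page : details.entrySet()) {
            for (Map.Entry<String, Object> field : page.getValue().entrySet()) {
                if ("tables".equals(field.getKey())) {
                    continue; // tables are handled separately
                }
                out.put(page.getKey() + " / " + field.getKey(), String.valueOf(field.getValue()));
            }
        }
        return out;
    }

    // Counts the tables on one page; "tables" may be null when none were extracted.
    public static int tableCount(Map<String, Object> page) {
        Object tables = page.get("tables");
        return (tables instanceof List) ? ((List<?>) tables).size() : 0;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order and, unlike Map.of, accepts the null "tables" value.
        Map<String, Object> page1 = new LinkedHashMap<>();
        page1.put("Order Date", "xxx");
        page1.put("Order #", "xxx");
        page1.put("tables", null);
        Map<String, Map<String, Object>> details = new LinkedHashMap<>();
        details.put("Page-1", page1);
        flattenFields(details).forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("Tables on Page-1: " + tableCount(page1));
    }
}
```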

Asynchronous Request

To process files asynchronously, see the Asynchronous Request Instructions.
