
Intelligent Document Parsing Tool Guide

Output a .json file

Note: Before learning how to use the individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function accepts its own special parameters when you upload files. The other basic steps are the same.

PDF to JSON:

json
{
    "enableAiLayout": 1,
    "isContainImg": 1,
    "isContainAnnot": 1,
    "enableOcr": 0,
    "ocrRecognitionLang": "AUTO",
    "pageRanges": "1,2,3-5",
    "resolveType": "EXTRACT"
}

Required parameters

enableAiLayout: Whether to enable AI layout analysis (0: disabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: excluded; 1: included). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: excluded; 1: included). Default is 1.

enableOcr: Whether to use OCR (0: disabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

resolveType: Type of content to extract into the JSON output. One of TEXT, TABLE, EXTRACT, IMAGE. Default is EXTRACT.
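
The parameters above are sent as a single JSON string in the parameter form field. The following is a minimal sketch of one way to assemble that string, not an official client: the class ParseParameters and its methods are hypothetical names introduced for illustration, and the defaults are copied from the descriptions above. The OCR language and pageRanges are left out of the defaults and can be added with set when needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of any ComPDF SDK): builds the JSON string
// for the "parameter" form field, starting from the documented defaults.
final class ParseParameters {
    // LinkedHashMap keeps keys in insertion order, so output is predictable.
    private final Map<String, Object> values = new LinkedHashMap<>();

    ParseParameters() {
        // Defaults as listed in the parameter descriptions.
        values.put("enableAiLayout", 1);
        values.put("isContainImg", 1);
        values.put("isContainAnnot", 1);
        values.put("enableOcr", 0);
        values.put("resolveType", "EXTRACT");
    }

    ParseParameters set(String name, int value) {
        values.put(name, value);
        return this;
    }

    ParseParameters set(String name, String value) {
        values.put(name, value);
        return this;
    }

    // Serializes the map as a flat JSON object (integers and strings only).
    String toJson() {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            Object v = entry.getValue();
            String rendered = (v instanceof String)
                    ? "\"" + escape((String) v) + "\""
                    : String.valueOf(v);
            json.add("\"" + escape(entry.getKey()) + "\":" + rendered);
        }
        return json.toString();
    }

    // Minimal escaping: backslashes and double quotes only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String json = new ParseParameters()
                .set("enableOcr", 1)
                .set("pageRanges", "1,2,3-5")
                .toJson();
        System.out.println(json);
    }
}
```

The resulting string can be passed as the value of the parameter form field in the request. For anything beyond flat integer and string options, a JSON library is likely a better choice than hand-built strings.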

Request Example:

Replace apiKey with the publicKey obtained from the dashboard, file with the file you want to convert, and language with your preferred language for interface error messages.

curl
  curl --location --request POST 'https://api-server.compdf.com/server/v2/process/pdf/json' \
  --header 'x-api-key: apiKey' \
  --form 'file=@"test.pdf"' \
  --form 'language=""' \
  --form 'password=""' \
  --form 'parameter={  "enableOcr": 1 }'
java
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
   public static void main(String[] args) throws IOException {
      OkHttpClient client = new OkHttpClient().newBuilder()
              .build();
      // Multipart form body: the PDF file plus the conversion options.
      RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
              .addFormDataPart("file", "{{file}}",
                      RequestBody.create(MediaType.parse("application/octet-stream"),
                              new File("<file>")))
              .addFormDataPart("language", "{{language}}")
              .addFormDataPart("password", "")
              .addFormDataPart("parameter", "{\"enableOcr\":1}")
              .build();
      Request request = new Request.Builder()
              .url("https://api-server.compdf.com/server/v2/process/pdf/json")
              .method("POST", body)
              .addHeader("x-api-key", "{{apiKey}}")
              .build();
      // try-with-resources closes the response and releases the connection.
      try (Response response = client.newCall(request).execute()) {
         System.out.println(response.body().string());
      }
   }
}

Response Information:

A successful request returns an HTTP 200 OK status code and a JSON response body containing the task details.

Response type: application/json

Response Parameter   Data Type   Description
code                 String      HTTP request status, "200" indicates success
message              String      Request message
data                 Object      Return result
+taskId              String      Task ID
+taskFileNum         int         Number of files processed in the task
+taskSuccessNum      int         Number of files successfully processed in the task
+taskFailNum         int         Number of files failed in the task
+taskStatus          String      Task status
+assetTypeId         int         Used asset type ID
+taskCost            int         Task cost
+taskTime            int         Task duration
+sourceType          String      Original format
+targetType          String      Target format
+fileInfoDTOList     Array       Task file information
++fileKey            String      File key
++taskId             String      Task ID
++fileName           String      Original file name
++downFileName       String      Download file name
++fileUrl            String      Original file URL
++downloadUrl        String      Processed result file download URL
++sourceType         String      Original format
++targetType         String      Target format
++fileSize           int         File size
++convertSize        int         Processed result file size
++convertTime        int         Processing time
++status             String      File processing status
++failureCode        String      File processing failure error code
++failureReason      String      File processing failure description
++fileParameter      String      Processing parameter

Response Example:

json
{
  "code": "200",
  "msg": "success",
  "data": {
    "taskId": "f416dbcf-0c10-4f93-ab9e-a835c1f5dba1",
    "taskFileNum": 1,
    "taskSuccessNum": 1,
    "taskFailNum": 0,
    "taskStatus": "<taskStatus>",
    "assetTypeId": 0,
    "taskCost": 1,
    "taskTime": 1,
    "sourceType": "<sourceType>",
    "targetType": "<targetType>",
    "fileInfoDTOList": [
      {
        "fileKey": "<fileKey>",
        "taskId": "<taskId>",
        "fileName": "<fileName>",
        "downFileName": "<downFileName>",
        "fileUrl": "<fileUrl>",
        "downloadUrl": "<downloadUrl>",
        "sourceType": "<sourceType>",
        "targetType": "<targetType>",
        "fileSize": 24475,
        "convertSize": 6922,
        "convertTime": 8,
        "status": "<status>",
        "failureCode": "",
        "failureReason": "",
        "fileParameter": "<fileParameter>"
      }
    ]
  }
}
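
To act on a response like the one above, a client typically checks code and then reads downloadUrl for each file. The sketch below illustrates this with a regular expression so that it stays free of dependencies. The class ResponseFields and its methods are hypothetical names, the example.com URL is a placeholder, and the pattern only handles plain string fields without escaped quotes. A real JSON parser (for example Jackson or Gson) is the safer choice in production code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls simple string fields out of the response body.
final class ResponseFields {

    // Returns every value of "fieldName": "value" found in the text, in order.
    // Limitation: values containing escaped quotes are not supported.
    static List<String> stringValues(String json, String fieldName) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(fieldName) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(json);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    // The response table states that code "200" indicates success.
    static boolean isSuccess(String json) {
        List<String> codes = stringValues(json, "code");
        return !codes.isEmpty() && "200".equals(codes.get(0));
    }

    public static void main(String[] args) {
        // Shortened sample body with a placeholder download URL.
        String body = "{\"code\":\"200\",\"msg\":\"success\",\"data\":{"
                + "\"fileInfoDTOList\":[{\"downloadUrl\":"
                + "\"https://example.com/result.json\",\"failureCode\":\"\"}]}}";
        if (isSuccess(body)) {
            for (String url : stringValues(body, "downloadUrl")) {
                System.out.println("Result file: " + url);
            }
        }
    }
}
```

Because the opening quote is part of the pattern, looking up code does not match failureCode.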

Result:

File Type   Description
.json       Parameter (type) ∈ {0, 1}; the JSON file produced when processing is complete
.zip        Parameter (type) = 2; a ZIP file containing the extracted text, tables, and images

Content

Parameter     Description
rect          The position of the object on the page
page          The page number where the object is located
order_index   The position of the object in the reading order of the current page
type          Identifies the type of the object. Currently supported object types are:

Text: Ordinary text type object, containing text content.
Image: Image type object, containing the path of the image.
Table and UnstdTable: Table type object, containing the content and structure of the table.
Catalogue: Catalogue type object, containing the content of the catalogue.
List and UnorderedList: List type object, containing the content of the list.
Formula: Formula type object, containing the content of the formula.
Header: Header type object, containing the content of the header.
Footer: Footer type object, containing the content of the footer.
PageNumber: Page number type object, containing the content of the page number.
FigureTitle: Figure title type object, containing the content of the figure title.
FigureCaption: Figure caption type object, containing the content of the figure caption.

json
{
    "version": "1.0.0",
    "objects": [
        {
            "type": "Header",
            "rect": [
                49.0,
                43.5,
                171.5,
                76.0
            ],
            "text": "Intelligent Document Parsing",
            "page": 0,
            "order_index": 0
        }
    ]
}
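
Since each object carries page and order_index, one plausible way to rebuild the reading order of a document is to sort by page first and by order_index within a page. The sketch below assumes the objects array has already been deserialized. ParsedObject and ReadingOrder are hypothetical names, and rect is omitted because it is not needed for ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReadingOrder {

    // Hypothetical in-memory model of one entry in the "objects" array.
    static final class ParsedObject {
        final String type;
        final String text;
        final int page;
        final int orderIndex;

        ParsedObject(String type, String text, int page, int orderIndex) {
            this.type = type;
            this.text = text;
            this.page = page;
            this.orderIndex = orderIndex;
        }
    }

    // Returns a sorted copy: by page, then by order_index within the page.
    static List<ParsedObject> sorted(List<ParsedObject> objects) {
        List<ParsedObject> copy = new ArrayList<>(objects);
        copy.sort(Comparator.comparingInt((ParsedObject o) -> o.page)
                .thenComparingInt(o -> o.orderIndex));
        return copy;
    }

    // Joins the text of "Text" objects in reading order, skipping headers,
    // footers, page numbers, and other object types.
    static String bodyText(List<ParsedObject> objects) {
        StringBuilder builder = new StringBuilder();
        for (ParsedObject o : sorted(objects)) {
            if ("Text".equals(o.type)) {
                if (builder.length() > 0) {
                    builder.append('\n');
                }
                builder.append(o.text);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<ParsedObject> objects = new ArrayList<>();
        objects.add(new ParsedObject("Text", "Second page paragraph", 1, 0));
        objects.add(new ParsedObject("Text", "First page paragraph", 0, 1));
        objects.add(new ParsedObject("Header", "Intelligent Document Parsing", 0, 0));
        System.out.println(bodyText(objects));
    }
}
```

Filtering on type in this way makes it easy to drop page furniture such as Header, Footer, and PageNumber objects when only the body text is wanted.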

Asynchronous Request

If you need the asynchronous file processing flow, please read the Asynchronous Request Instructions.