Intelligent Document Parsing Tool Guide

Output a .json format file

Note: Before using individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function accepts its own special parameters, which you set when uploading the file. The other basic steps are the same for every function.

PDF to JSON:

json

{
    "enableAiLayout": 1,
    "isContainImg": 1,
    "isContainAnnot": 1,
    "enableOcr": 0,
    "ocrLanguage": 8,
    "pageRanges": "1,2,3-5",
    "resolveType": "EXTRACT"
}

Required parameters

enableAiLayout: Whether to enable AI layout analysis (0: disabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: exclude; 1: include). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: exclude; 1: include). Default is 1.

enableOcr: Whether to use OCR (0: disabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: The pages to convert, with page numbers starting from 1, for example "1,2,3-5". Default is empty.

resolveType: The type of content to extract into the JSON. One of TEXT, TABLE, EXTRACT, or IMAGE. Default is EXTRACT.
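To make the pageRanges syntax concrete, the following client-side sketch expands a value such as "1,2,3-5" into individual page numbers. It is only an illustration: the class name PageRanges and the method expand are hypothetical, and it assumes that a range like "3-5" is inclusive on both ends, which this guide does not state explicitly. How the server treats an empty value is not covered here, so the helper simply returns an empty list for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the API.
final class PageRanges {
    /** Expands a string such as "1,2,3-5" into the 1-based page numbers it names. */
    static List<Integer> expand(String pageRanges) {
        List<Integer> pages = new ArrayList<>();
        if (pageRanges == null || pageRanges.isBlank()) {
            return pages; // empty is the documented default value
        }
        for (String part : pageRanges.split(",")) {
            // Limit -1 keeps trailing empty strings, so "3-" is rejected below.
            String[] bounds = part.trim().split("-", -1);
            if (bounds.length > 2) {
                throw new IllegalArgumentException("Invalid page range: " + part);
            }
            int first = Integer.parseInt(bounds[0].trim());
            // Assumption: "3-5" means pages 3, 4 and 5 (inclusive).
            int last = bounds.length == 2 ? Integer.parseInt(bounds[1].trim()) : first;
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("Invalid page range: " + part);
            }
            for (int page = first; page <= last; page++) {
                pages.add(page);
            }
        }
        return pages;
    }
}
```

Validating the string this way before uploading can catch typos such as "5-3" or "0" locally instead of waiting for the request to fail.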

Java Example:

Replace apiKey with the publicKey obtained from the dashboard, file with the file you want to convert, and language with the language in which you want interface error prompts to be returned.

java

import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
   public static void main(String[] args) throws IOException {
      OkHttpClient client = new OkHttpClient().newBuilder()
              .build();
      // Multipart form: the PDF itself plus the conversion options.
      RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
              .addFormDataPart("file", "{{file}}",
                      RequestBody.create(MediaType.parse("application/octet-stream"),
                              new File("<file>")))
              .addFormDataPart("language", "{{language}}")
              .addFormDataPart("password", "")
              // JSON string holding the parameters described above.
              .addFormDataPart("parameter", "{\"enableOcr\":1}")
              .build();
      Request request = new Request.Builder()
              .url("https://api-server.compdf.com/server/v1/process/pdf/json")
              .method("POST", body)
              .addHeader("x-api-key", "{{apiKey}}")
              .build();
      // Close the response when done and print the server's reply.
      try (Response response = client.newCall(request).execute()) {
         System.out.println(response.body().string());
      }
   }
}
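The parameter form field in the example above is a hand-written JSON string. As a sketch of how that string could be assembled with basic validation, the helper below emits only the keys you set, on the assumption that omitted keys fall back to the documented defaults. ParseParameters and its method names are hypothetical and not part of any ComPDF SDK; only the JSON key names and allowed values come from this guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of any ComPDF SDK.
final class ParseParameters {
    private static final Set<String> RESOLVE_TYPES = Set.of("TEXT", "TABLE", "EXTRACT", "IMAGE");
    // Insertion-ordered so the JSON keys appear in the order they were set.
    private final Map<String, Object> values = new LinkedHashMap<>();

    ParseParameters enableAiLayout(boolean on) { return put("enableAiLayout", on ? 1 : 0); }
    ParseParameters containImages(boolean on) { return put("isContainImg", on ? 1 : 0); }
    ParseParameters containAnnotations(boolean on) { return put("isContainAnnot", on ? 1 : 0); }
    ParseParameters enableOcr(boolean on) { return put("enableOcr", on ? 1 : 0); }

    ParseParameters ocrLanguage(int code) {
        if (code < 1 || code > 8) {
            throw new IllegalArgumentException("ocrLanguage must be between 1 and 8");
        }
        return put("ocrLanguage", code);
    }

    ParseParameters pageRanges(String ranges) {
        // Only digits, commas and hyphens, so the value never needs JSON escaping.
        if (!ranges.matches("[0-9,\\-]*")) {
            throw new IllegalArgumentException("Invalid pageRanges: " + ranges);
        }
        return put("pageRanges", ranges);
    }

    ParseParameters resolveType(String type) {
        if (!RESOLVE_TYPES.contains(type)) {
            throw new IllegalArgumentException("Unknown resolveType: " + type);
        }
        return put("resolveType", type);
    }

    private ParseParameters put(String key, Object value) {
        values.put(key, value);
        return this;
    }

    /** Renders the values set so far as a compact JSON object. */
    String toJson() {
        StringJoiner json = new StringJoiner(",", "{", "}");
        values.forEach((key, value) -> {
            String rendered = value instanceof String ? "\"" + value + "\"" : String.valueOf(value);
            json.add("\"" + key + "\":" + rendered);
        });
        return json.toString();
    }
}
```

With this helper, the call `new ParseParameters().enableOcr(true).toJson()` yields the same string that the example passes as the parameter field.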

Result:

File Type	Description
.json	Parameter(type) ∈ {0, 1}; the JSON file produced when the conversion process is completed
.zip	Parameter(type) = 2; a zip file containing the extracted text, tables, and images

Content

Parameter	Description
rect	The position of the object on the page
page	The page number where the object is located
order_index	The reading order position of the object on the current page
type	Identifies the type of the object. The currently supported object types are listed below.

Text: Ordinary text type object, containing text content.

Image: Image type object, containing the path of the image.

Table and UnstdTable: Table type object, containing the content and structure of the table.

Catalogue: Catalogue type object, containing the content of the catalogue.

List and UnorderedList: List type object, containing the content of the list.

Formula: Formula type object, containing the content of the formula.

Header: Header type object, containing the content of the header.

Footer: Footer type object, containing the content of the footer.

PageNumber: Page number type object, containing the content of the page number.

FigureTitle: Figure title type object, containing the content of the figure title.

FigureCaption: Figure caption type object, containing the content of the figure caption.

json

{
    "version": "1.0.0",
    "objects": [
        {
            "type": "Header",
            "rect": [
                49.0,
                43.5,
                171.5,
                76.0
            ],
            "text": "Intelligent Document Parsing",
            "page": 0,
            "order_index": 0
        }
    ]
}
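Because each object carries both page and order_index, the text of a document can be put back into reading order by sorting on those two fields. The sketch below shows one way to do that once the "objects" array has been deserialized with a JSON library of your choice. ReadingOrder and ParsedObject are hypothetical names, and the sketch assumes that every object type you pass in carries a "text" field as the Header object in the sample does, which this guide does not confirm for all types.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the API.
final class ReadingOrder {
    /** One entry of the "objects" array, reduced to the fields used here. */
    static final class ParsedObject {
        final String type;
        final String text;
        final int page;
        final int orderIndex; // the "order_index" field

        ParsedObject(String type, String text, int page, int orderIndex) {
            this.type = type;
            this.text = text;
            this.page = page;
            this.orderIndex = orderIndex;
        }
    }

    /** Joins the text of objects of the given types, ordered by page and then by order_index. */
    static String textInReadingOrder(List<ParsedObject> objects, Set<String> types) {
        return objects.stream()
                .filter(o -> types.contains(o.type))
                .sorted(Comparator.comparingInt((ParsedObject o) -> o.page)
                        .thenComparingInt(o -> o.orderIndex))
                .map(o -> o.text)
                .collect(Collectors.joining("\n"));
    }
}
```

Passing a set such as Header and Text keeps objects like Image, whose payload is a path rather than text, out of the joined result.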
