Intelligent Document Parsing Tool Guide

Output a .json File

Note: Before learning how to use the individual functions, we recommend that you read the Request Workflow to understand the basic PDF processing flow. Each function has its own parameters, which you set when uploading the file. The other basic steps are the same for every function.

PDF to JSON:

java
{
    "enableAiLayout": 1,
    "isContainImg": 1,
    "isContainAnnot": 1,
    "enableOcr": 0,
    "ocrLanguage": 8,
    "pageRanges": "1,2,3-5",
    "resolveType": "EXTRACT"
}

Required parameters

enableAiLayout: Whether to enable AI layout analysis (0: disabled; 1: enabled). Default is 1.

isContainImg: Whether to include images in the conversion (0: excluded; 1: included). Default is 1.

isContainAnnot: Whether to include annotations in the conversion (0: excluded; 1: included). Default is 1.

enableOcr: Whether to use OCR (0: disabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, with page numbers starting from 1 (for example, "1,2,3-5"). Default is empty.

resolveType: Type of content to extract into the JSON output: TEXT, TABLE, EXTRACT, or IMAGE. Default is EXTRACT.
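
The parameters above are sent as a single JSON string, and writing that string by hand is easy to get wrong. The following is a minimal sketch of a helper that assembles and validates it using only the Java standard library. The class `JsonParameterBuilder` and its method names are hypothetical and not part of any ComPDF SDK; the validation rules are inferred from the value lists above, not taken from the server's actual behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of any ComPDF SDK): builds the JSON string
// for the "parameter" form field from the options documented above.
class JsonParameterBuilder {
    private static final List<String> RESOLVE_TYPES =
            Arrays.asList("TEXT", "TABLE", "EXTRACT", "IMAGE");

    // LinkedHashMap keeps keys in insertion order, so the output is stable.
    private final Map<String, Object> options = new LinkedHashMap<>();

    JsonParameterBuilder enableAiLayout(boolean on) { return put("enableAiLayout", on ? 1 : 0); }
    JsonParameterBuilder containImages(boolean on) { return put("isContainImg", on ? 1 : 0); }
    JsonParameterBuilder containAnnotations(boolean on) { return put("isContainAnnot", on ? 1 : 0); }
    JsonParameterBuilder enableOcr(boolean on) { return put("enableOcr", on ? 1 : 0); }

    // Accepts the documented language codes 1 through 8 (8 is AUTO).
    JsonParameterBuilder ocrLanguage(int code) {
        if (code < 1 || code > 8) {
            throw new IllegalArgumentException("ocrLanguage must be between 1 and 8");
        }
        return put("ocrLanguage", code);
    }

    // Assumed syntax: comma-separated pages or ranges, such as "1,2,3-5".
    JsonParameterBuilder pageRanges(String ranges) {
        if (!ranges.matches("\\d+(-\\d+)?(,\\d+(-\\d+)?)*")) {
            throw new IllegalArgumentException("Invalid pageRanges: " + ranges);
        }
        return put("pageRanges", ranges);
    }

    JsonParameterBuilder resolveType(String type) {
        if (!RESOLVE_TYPES.contains(type)) {
            throw new IllegalArgumentException("Unknown resolveType: " + type);
        }
        return put("resolveType", type);
    }

    private JsonParameterBuilder put(String key, Object value) {
        options.put(key, value);
        return this;
    }

    // String values are validated above and contain no characters that need
    // JSON escaping, so plain quoting is sufficient here.
    String build() {
        return options.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":"
                        + (e.getValue() instanceof String
                                ? "\"" + e.getValue() + "\""
                                : String.valueOf(e.getValue())))
                .collect(Collectors.joining(",", "{", "}"));
    }
}
```

For instance, `new JsonParameterBuilder().enableOcr(true).ocrLanguage(8).build()` yields `{"enableOcr":1,"ocrLanguage":8}`, which can be passed as the value of the `parameter` form field. Options that are never set are simply omitted, leaving the server to apply its defaults.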

Java Example:

Replace apiKey with the publicKey obtained from the dashboard, file with the file you want to convert, and language with the language in which you want interface error messages to be returned.

java
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
   public static void main(String[] args) throws IOException {
      OkHttpClient client = new OkHttpClient().newBuilder()
              .build();
      // Multipart form: the file to convert plus the request options.
      RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
              .addFormDataPart("file", "{{file}}",
                      RequestBody.create(MediaType.parse("application/octet-stream"),
                              new File("<file>")))
              .addFormDataPart("language", "{{language}}")
              .addFormDataPart("password", "")
              // Function-specific parameters, passed as a JSON string.
              .addFormDataPart("parameter", "{\"enableOcr\":1}")
              .build();
      Request request = new Request.Builder()
              .url("https://api-server.compdf.com/server/v1/process/pdf/json")
              .method("POST", body)
              .addHeader("x-api-key", "{{apiKey}}")
              .build();
      // try-with-resources closes the response and releases the connection.
      try (Response response = client.newCall(request).execute()) {
         System.out.println(response.code());
         System.out.println(response.body().string());
      }
   }
}

Result:

| File Type | Description |
| --- | --- |
| .json | Parameter(type)∈{0,1}; the JSON file produced after the conversion is complete |
| .zip | Parameter(type)=2; extracted zip file containing text, tables and images |

Content

| Parameter | Description |
| --- | --- |
| rect | The position of the object on the page |
| page | The page number where the object is located |
| order_index | The reading order position of the object on the current page |
| type | Identifies the type of the object. The currently supported object types are listed below. |

Text: Ordinary text type object, containing text content.
Image: Image type object, containing the path of the image.
Table and UnstdTable: Table type object, containing the content and structure of the table.
Catalogue: Catalogue type object, containing the content of the catalogue
List and UnorderedList: List type object, containing the content of the list
Formula: Formula type object, containing the content of the formula
Header: Header type object, containing the content of the header
Footer: Footer type object, containing the content of the footer
PageNumber: Page number type object, containing the content of the page number
FigureTitle: Figure title type object, containing the content of the figure title
FigureCaption: Figure caption type object, containing the content of the figure caption
java
{
    "version": "1.0.0",
    "objects": [
        {
            "type": "Header",
            "rect": [
                49.0,
                43.5,
                171.5,
                76.0
            ],
            "text": "Intelligent Document Parsing",
            "page": 0,
            "order_index": 0
        }
    ]
}
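
Because each object carries a `page` and an `order_index`, the reading order of a document can be recovered by sorting on those two fields. The sketch below assumes the `objects` array has already been parsed into plain Java objects with a JSON library of your choice. The `ReadingOrder` and `ParsedObject` classes are hypothetical illustrations, not part of the API, and `rect` is kept as the raw four-number array because the meaning of its individual values is not specified here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical post-processing helper for the result JSON shown above.
class ReadingOrder {

    // Mirrors the documented fields of one entry in the "objects" array.
    static class ParsedObject {
        final String type;
        final double[] rect;   // raw values, stored as returned
        final String text;
        final int page;
        final int orderIndex;

        ParsedObject(String type, double[] rect, String text, int page, int orderIndex) {
            this.type = type;
            this.rect = rect;
            this.text = text;
            this.page = page;
            this.orderIndex = orderIndex;
        }
    }

    // Returns a sorted copy: first by page, then by order_index within the page.
    static List<ParsedObject> inReadingOrder(List<ParsedObject> objects) {
        List<ParsedObject> sorted = new ArrayList<>(objects);
        sorted.sort(Comparator
                .comparingInt((ParsedObject o) -> o.page)
                .thenComparingInt(o -> o.orderIndex));
        return sorted;
    }

    // Joins the content of "Text" objects only, skipping headers, footers,
    // page numbers and every other object type.
    static String bodyText(List<ParsedObject> objects) {
        return inReadingOrder(objects).stream()
                .filter(o -> "Text".equals(o.type))
                .map(o -> o.text)
                .collect(Collectors.joining("\n"));
    }
}
```

Sorting a copy rather than the input list leaves the parsed data untouched, which is convenient if the same list is later filtered by other object types such as `Table` or `Image`.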