Intelligent Document Parsing Tool Guide
Output a .json File
Note: Before learning how to use the individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function accepts its own special parameters when you upload a file; the other basic steps are the same.
PDF to JSON:
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "resolveType": "EXTRACT"
}
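The parameter object above is sent as a JSON string, so it can help to assemble it from a map rather than by hand-escaping quotes. The following is a minimal sketch using only the Java standard library; the `ParameterJson` class and its `toJson` helper are hypothetical names introduced for illustration, and the serializer handles only flat maps of numbers and strings, which is all the example above needs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of any SDK: builds the "parameter" JSON string.
final class ParameterJson {

    /** Serializes a flat map of Number and String values into a JSON object string. */
    static String toJson(Map<String, Object> params) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : params.entrySet()) {
            Object value = entry.getValue();
            // Numbers are written bare; everything else is quoted and escaped.
            String rendered = (value instanceof Number)
                    ? value.toString()
                    : "\"" + value.toString().replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
            joiner.add("\"" + entry.getKey() + "\":" + rendered);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the keys in insertion order.
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("enableAiLayout", 1);
        params.put("isContainImg", 1);
        params.put("isContainAnnot", 1);
        params.put("enableOcr", 0);
        params.put("ocrLanguage", 8);
        params.put("pageRanges", "1,2,3-5");
        params.put("resolveType", "EXTRACT");
        System.out.println(toJson(params));
    }
}
```

The resulting string can be passed as the value of the `parameter` form field when uploading the file.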
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: disabled; 1: enabled). Default is 1.
isContainImg
: Whether to include images during conversion (0: exclude; 1: include). Default is 1.
isContainAnnot
: Whether to include annotations during conversion (0: exclude; 1: include). Default is 1.
enableOcr
: Whether to use OCR (0: disabled; 1: enabled). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: The pages to convert, with page numbers starting from 1 (for example, "1,2,3-5"). Default is empty.
resolveType
: The type of content to extract into the JSON output: TEXT, TABLE, EXTRACT, or IMAGE. Default is EXTRACT.
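Because pageRanges mixes single pages and ranges in one string, it can be useful to check a value locally before uploading. The sketch below expands a string such as "1,2,3-5" into individual page numbers and rejects malformed input. The `PageRanges` class is a hypothetical helper, and the rules it enforces (comma-separated entries, `first-last` ranges, numbering from 1) are inferred from the example value and the description above rather than from a formal grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side check; the server's own validation may differ.
final class PageRanges {

    /** Expands e.g. "1,2,3-5" into [1, 2, 3, 4, 5]; throws IllegalArgumentException on bad input. */
    static List<Integer> expand(String pageRanges) {
        List<Integer> pages = new ArrayList<>();
        if (pageRanges == null || pageRanges.trim().isEmpty()) {
            return pages; // empty is the documented default
        }
        // A limit of -1 keeps empty trailing parts, so "1,2," is rejected below.
        for (String part : pageRanges.split(",", -1)) {
            String[] bounds = part.trim().split("-", -1);
            if (bounds.length > 2) {
                throw new IllegalArgumentException("Invalid page range: " + part);
            }
            // Integer.parseInt throws NumberFormatException (an IllegalArgumentException)
            // for empty or non-numeric pieces.
            int first = Integer.parseInt(bounds[0].trim());
            int last = bounds.length == 2 ? Integer.parseInt(bounds[1].trim()) : first;
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("Invalid page range: " + part);
            }
            for (int page = first; page <= last; page++) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(expand("1,2,3-5"));
    }
}
```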
Java Example:
Replace apiKey with the publicKey obtained from the dashboard, file with the file you want to convert, and language with the language in which you want interface error messages to be returned.
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder()
                .build();
        // Multipart form: the file itself plus the language, password, and parameter fields.
        RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
                .addFormDataPart("file", "{{file}}",
                        RequestBody.create(MediaType.parse("application/octet-stream"),
                                new File("<file>")))
                .addFormDataPart("language", "{{language}}")
                .addFormDataPart("password", "")
                .addFormDataPart("parameter", "{ \"enableOcr\":1}")
                .build();
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/process/pdf/json")
                .method("POST", body)
                .addHeader("x-api-key", "{{apiKey}}")
                .build();
        // Close the response when done so the underlying connection is released.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.code());
        }
    }
}
Result:
File Type | Description |
---|---|
.json | Returned when the type parameter is 0 or 1: the JSON file produced once the conversion is complete |
.zip | Returned when the type parameter is 2: a zip archive containing the extracted text, tables, and images |
Content:
Parameter | Description |
---|---|
rect | The position of the object on the page |
page | The page number where the object is located |
order_index | The reading order position of the object on the current page |
type | Identifies the type of the object. Currently supported object types are: Text: ordinary text object, containing text content; Image: image object, containing the path of the image; Table and UnstdTable: table object, containing the content and structure of the table; Catalogue: catalogue object, containing the content of the catalogue; List and UnorderedList: list object, containing the content of the list; Formula: formula object, containing the content of the formula; Header: header object, containing the content of the header; Footer: footer object, containing the content of the footer; PageNumber: page number object, containing the content of the page number; FigureTitle: figure title object, containing the content of the figure title; FigureCaption: figure caption object, containing the content of the figure caption |
{
  "version": "1.0.0",
  "objects": [
    {
      "type": "Header",
      "rect": [
        49.0,
        43.5,
        171.5,
        76.0
      ],
      "text": "Intelligent Document Parsing",
      "page": 0,
      "order_index": 0
    }
  ]
}
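Since each object carries a page and an order_index, a client can rebuild the reading order of a document after parsing the output with a JSON library of its choice. The sketch below shows only that ordering step, using the standard library. The `ReadingOrder` and `ParsedObject` names are hypothetical stand-ins for already-parsed entries of the objects array, and two things are assumptions: that Text objects carry a "text" field like the Header in the sample above, and that headers, footers, and page numbers should be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-processing helper; JSON parsing itself is left to a library.
final class ReadingOrder {

    /** Minimal stand-in for one entry of the "objects" array. */
    static final class ParsedObject {
        final String type;
        final int page;
        final int orderIndex;
        final String text;

        ParsedObject(String type, int page, int orderIndex, String text) {
            this.type = type;
            this.page = page;
            this.orderIndex = orderIndex;
            this.text = text;
        }
    }

    /** Joins object text in reading order, skipping page furniture. */
    static String bodyText(List<ParsedObject> objects) {
        List<ParsedObject> sorted = new ArrayList<>(objects);
        // Sort by page first, then by reading position within the page.
        sorted.sort(Comparator.comparingInt((ParsedObject o) -> o.page)
                .thenComparingInt(o -> o.orderIndex));
        StringBuilder out = new StringBuilder();
        for (ParsedObject o : sorted) {
            // Assumption: these types are not wanted in the body text.
            if (o.type.equals("Header") || o.type.equals("Footer") || o.type.equals("PageNumber")) {
                continue;
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(o.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<ParsedObject> objects = Arrays.asList(
                new ParsedObject("Text", 0, 1, "First paragraph"),
                new ParsedObject("Header", 0, 0, "Intelligent Document Parsing"));
        System.out.println(bodyText(objects));
    }
}
```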