Intelligent Document Parsing Tool Guide
Note: Before using the individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function takes its own specific parameters, which you set when uploading the file; all other steps are the same.
Intelligent Document Parsing:
{
"getImage": "both",
"isAllowOcr": 1,
"imageOutputType": "base64str"
}
Required Parameters:
getImage
: Image extraction type. page: returns an image of each entire page. objects: returns the image objects within each page. both: returns both the entire-page images and the image objects.
isAllowOcr
: Whether to use OCR (0: Disable; 1: Enable).
imageOutputType
: Image output type: base64str or url. base64str: images are returned inline as base64 strings in the API result (this can make responses very large and is not recommended for long documents). url: images are returned as platform links, which you can download to local storage or upload to your own cloud storage.
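The three options are sent together as one JSON string in the `parameter` form field of the upload request, so it can help to build and check that string in one place. The sketch below is a hypothetical helper (`IdpParameters` is not part of any ComPDF SDK) that accepts only the values listed above and emits the JSON; isAllowOcr is written as 0/1, matching the example.

```java
import java.util.Set;

/** Hypothetical helper (not a ComPDF API) that builds the "parameter" form field. */
final class IdpParameters {
    private static final Set<String> GET_IMAGE = Set.of("page", "objects", "both");
    private static final Set<String> OUTPUT_TYPES = Set.of("base64str", "url");

    private IdpParameters() {}

    static String toJson(String getImage, boolean allowOcr, String imageOutputType) {
        // Reject values the guide does not list, before any request is sent
        if (!GET_IMAGE.contains(getImage)) {
            throw new IllegalArgumentException("getImage must be page, objects, or both: " + getImage);
        }
        if (!OUTPUT_TYPES.contains(imageOutputType)) {
            throw new IllegalArgumentException("imageOutputType must be base64str or url: " + imageOutputType);
        }
        // isAllowOcr is sent as 0/1, not as a JSON boolean
        return String.format("{\"getImage\":\"%s\",\"isAllowOcr\":%d,\"imageOutputType\":\"%s\"}",
                getImage, allowOcr ? 1 : 0, imageOutputType);
    }
}
```

For example, `IdpParameters.toJson("objects", true, "url")` yields the same string that the Java example passes to `addFormDataPart("parameter", ...)`.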
Java Example:
Replace {{apiKey}} with the publicKey obtained from the console, {{file}} and <file> with the name and path of the file you want to parse, and {{language}} with the language in which API error messages should be returned.
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient().newBuilder()
                .build();
        // Multipart form: the file, the error-message language, the file password (left empty here),
        // and the parsing options as a JSON string in the "parameter" field
        RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
                .addFormDataPart("file", "{{file}}",
                        RequestBody.create(MediaType.parse("application/octet-stream"),
                                new File("<file>")))
                .addFormDataPart("language", "{{language}}")
                .addFormDataPart("password", "")
                .addFormDataPart("parameter", "{\"getImage\":\"objects\",\"isAllowOcr\":1,\"imageOutputType\":\"url\"}")
                .build();
        Request request = new Request.Builder()
                .url("https://api-server.compdf.com/server/v1/process/idp/documentParsing")
                .method("POST", body)
                .addHeader("x-api-key", "{{apiKey}}")
                .build();
        // try-with-resources closes the response and releases the connection
        try (Response response = client.newCall(request).execute()) {
            System.out.println("HTTP " + response.code());
            System.out.println(response.body().string());
        }
    }
}
Result:
File Type | File Description |
---|---|
.json | JSON file containing the intelligent document parsing result |
Return Data Structure Explanation:
◆ code (integer) Operation status code
◆ message (string) Description message
◆ version (string) Version number
◆ duration (integer) Total processing time (in milliseconds)
◆ x_request_id (string) Request ID
◆ image_process (array) Whether there is a watermark
◆ msg (string) Description message
◆ result (object) Core data
├─ markdown (string) Markdown-formatted text of the entire document
├─ total_count (integer) Total number of pages in the PDF document
├─ total_page_number (integer) Total number of pages in the PDF document
├─ success_count (integer) Total number of successfully processed pages
├─ valid_page_number (integer) Number of successfully parsed valid pages
├─ excel_base64 (string) Excel file base64 encoding
├─ catalog (object) Table of contents tree structure
│ └─ toc (array)
│ ├─ pos (array): Coordinates of the four corners of the title area, in order: left-top, right-top, right-bottom, left-bottom.
│ ├─ paragraph_id (integer): ID of the paragraph where the title is located
│ ├─ page_id (integer): Page number where the title is located (minimum page number is 1)
│ ├─ hierarchy (integer): Title level, 1 for level 1 title, 2 for level 2 title, and so on
│ ├─ pos_list (array): If titles were merged, the coordinates of each original title before merging; otherwise, the coordinates of the title.
│ ├─ title (string): Title content
│ └─ sub_type (string): Title type: text_title, image_title, table_title
│
├─ pages (array) Paginated data container
│ ├─ status (string): Page processing status/error message
│ ├─ page_id (number): Current page number
│ ├─ durations (number): Page processing time (milliseconds)
│ ├─ image_id (string): Image address
│ ├─ width (integer): Document page width (pixels)
│ ├─ height (integer): Document page height (pixels)
│ ├─ angle (integer): Text orientation angle (0°: ▲ (upright)/90°: ▶ (right rotation)/180°: ▼ (inverted)/270°: ◀ (left rotation))
│ ├─ content (array): Basic data: text lines or images (see Textline Data and Image Data below)
│ └─ structured (array): Structured data; each item is a textblock, table, imageblock, footer, or header
│
└─ detail (array) Detailed information for the Markdown output (reuses the paragraph data model)
├─ page_id (integer): Current paragraph page number
├─ paragraph_id (integer): Current paragraph ID
├─ outline_level (integer): Title level: -1 = body text, 0 = level 1 title, 1 = level 2 title, and so on (up to five levels)
├─ text (string): Text
├─ type (string): Type, paragraph (paragraph type, including body text, titles, formulas, etc.), image (image type), table (table type)
├─ image_url (string): Image address
├─ content (integer): Content type: 0 = body content (paragraph, image, table); 1 = non-body content (header, footer, sidebar)
├─ position (array): Coordinates of the four corners of the paragraph area, in order: left-top, right-top, right-bottom, left-bottom.
├─ sub_type (string): Subtype. When type is paragraph, possible values are catalog (table of contents), header (page header), footer (page footer), sidebar (sidebar), text (body text), text_title (text title), image_title (image title), table_title (table title); when type is image, possible values are stamp (seal), chart (chart), qrcode (QR code), barcode (barcode); when type is table, possible values are bordered (bordered table), borderless (borderless table).
├─ tags (array): Special text found in the paragraph, if any; possible values are formula and handwritten.
├─ cells (array): Cell array, returned only when type is table
│ ├─ row_span (integer): Cell row span, default is 1
│ ├─ text (integer):
│ ├─ type (integer):
│ ├─ col (integer): Cell column number
│ ├─ col_span (integer): Cell column span, default is 1
│ ├─ page_id (integer):
│ ├─ position (array): Coordinates of the four corners of the cell, in order: left-top, right-top, right-bottom, left-bottom.
│ └─ row (integer): Cell row number
│
└─ caption_id (object): Original OCR text result
├─ page_id (integer): Page number where the title is located
└─ paragraph_id (integer): Paragraph ID where the title is located
◆ metrics (array) Page-level performance metrics
├─ page_image_width (integer): Current page rendering width (pixels)
├─ page_image_height (integer): Current page rendering height (pixels)
├─ dpi (integer): Image resolution
├─ durations (number): Page processing time (milliseconds)
├─ status (string): Page processing status
├─ page_id (number): Current page number
├─ angle (integer): Text orientation angle (0°: ▲ (upright)/90°: ▶ (right rotation)/180°: ▼ (inverted)/270°: ◀ (left rotation))
└─ image_id (string): Page image ID (download method same as pages.image_id)
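The two title-level fields use different bases: catalog.toc[].hierarchy starts at 1 for a level 1 title, while detail[].outline_level uses 0 for a level 1 title and -1 for body text. As a sketch of consuming the table of contents, the code below assumes the toc entries have already been parsed (with any JSON library) into a minimal, hypothetical `TocEntry` record holding title, hierarchy and page_id, and renders them as an indented outline.

```java
import java.util.List;

/** Hypothetical, minimal model of one catalog.toc entry (only the fields used here). */
record TocEntry(String title, int hierarchy, int pageId) {}

/** Hypothetical helper that renders toc entries as an indented text outline. */
final class TocPrinter {
    private TocPrinter() {}

    static String render(List<TocEntry> toc) {
        StringBuilder sb = new StringBuilder();
        for (TocEntry e : toc) {
            // hierarchy 1 = level 1 title, so indent two spaces per level below 1
            int depth = Math.max(0, e.hierarchy() - 1);
            sb.append("  ".repeat(depth))
              .append(e.title())
              .append(" (p. ").append(e.pageId()).append(")\n");
        }
        return sb.toString();
    }
}
```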
Structured Data Specification:
Content (Text Line/Image)
Image Data
Parameter | Type | Description |
---|---|---|
id | integer | Data ID |
type | string | Data type (fixed value: image) |
pos | array | Image four corner coordinates, in the format [top-left (x,y), top-right (x,y), bottom-right (x,y), bottom-left (x,y)] |
size | array | Image dimensions [width, height] |
data | object | Image content object |
↳ data.region | array | Image region coordinates on the page |
↳ data.path | string | Image file path |
↳ data.base64 | string | Image file (jpg/png) base64 string |
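When imageOutputType is base64str, the image bytes arrive inline in data.base64. A minimal sketch for writing one of them to disk with the JDK's `java.util.Base64`, assuming the value has already been extracted from the JSON as a string. `ImageSaver` is a hypothetical name, and the `data:` URI check is purely defensive: the format above does not say that such a prefix ever appears.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical helper: decodes a data.base64 value and writes the raw jpg/png bytes. */
final class ImageSaver {
    private ImageSaver() {}

    static Path save(String base64, Path target) throws IOException {
        // Defensive only: strip a "data:image/...;base64," prefix if one is present
        int comma = base64.indexOf(',');
        String payload = base64.startsWith("data:") && comma >= 0 ? base64.substring(comma + 1) : base64;
        // The MIME decoder tolerates line breaks inside long base64 strings
        byte[] bytes = Base64.getMimeDecoder().decode(payload);
        return Files.write(target, bytes);
    }
}
```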
Textline Data
Parameter | Type | Description |
---|---|---|
id | integer | Data ID (unique within the page) |
type | string | Data type (fixed value: line) |
text | string | Text line content (When sub_type=stamp, it is the seal text) |
pos | array | Text line four corner coordinates |
score | number | Character confidence (Generated only when OCR is performed on the input image) |
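Because score exists only for OCR output, code that reviews recognition quality should treat a missing score as "not OCR" rather than as zero. The sketch below assumes textline items have been parsed into a hypothetical `TextLine` record with a nullable score, and that scores lie between 0 and 1 (the range is not stated above); it returns the ids of lines below a caller-chosen threshold.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, minimal model of a textline item; score is null when no OCR was performed. */
record TextLine(int id, String text, Double score) {}

/** Hypothetical helper that flags low-confidence OCR lines. */
final class LowConfidence {
    private LowConfidence() {}

    static List<Integer> flag(List<TextLine> lines, double threshold) {
        return lines.stream()
                // Lines without a score were not OCR'd, so they are skipped, not flagged
                .filter(l -> l.score() != null && l.score() < threshold)
                .map(TextLine::id)
                .collect(Collectors.toList());
    }
}
```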
Structured Data
Textblock
Parameter | Type | Description |
---|---|---|
id | integer | Data ID |
type | string | Block type (fixed value: textblock) |
pos | array | Text block four corner coordinates |
content | array | Contained text line ID array |
sub_type | string | Subtype (title/list/formula, etc.) |
text | string | Block text content |
outline_level | integer | Title level: -1 = body text, 0 = level 1 title, 1 = level 2 title, and so on (up to five levels) |
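outline_level maps naturally onto Markdown heading depth. A hypothetical helper, assuming level 0 should become `#` and clamping anything beyond the fifth level to `#####` as a precaution, might look like this:

```java
/** Hypothetical helper: turns a textblock's text and outline_level into one Markdown line. */
final class OutlineToMarkdown {
    private OutlineToMarkdown() {}

    static String toMarkdown(String text, int outlineLevel) {
        if (outlineLevel < 0) {
            return text; // -1 = body text, emitted as a plain paragraph
        }
        // 0 = level 1 title -> "#", 1 = level 2 title -> "##", capped at five levels
        int hashes = Math.min(outlineLevel, 4) + 1;
        return "#".repeat(hashes) + " " + text;
    }
}
```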
Table Data
Parameter | Type | Description |
---|---|---|
id | integer | Data ID |
type | string | Block type (fixed value: table) |
sub_type | string | Table type: bordered by default; borderless tables are marked as borderless |
pos | array | Table four corner coordinates |
rows | integer | Total number of rows |
cols | integer | Total number of columns |
columns_width | array | Column width array |
rows_height | array | Row height array |
text | string | Table content (HTML/Markdown format) |
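columns_width and rows_height give sizes, not positions. Assuming they are in the same units as pos and that columns run left to right from the table's left edge (both assumptions, not stated above), the vertical grid lines are running sums of the widths; the same function works for rows_height starting from the table's top edge. `TableGrid` is a hypothetical name.

```java
/** Hypothetical helper: derives grid-line coordinates from columns_width or rows_height. */
final class TableGrid {
    private TableGrid() {}

    /** Returns sizes.length + 1 boundaries, starting at origin and accumulating each size. */
    static double[] columnBoundaries(double origin, double[] sizes) {
        double[] edges = new double[sizes.length + 1];
        edges[0] = origin;
        for (int i = 0; i < sizes.length; i++) {
            edges[i + 1] = edges[i] + sizes[i];
        }
        return edges;
    }
}
```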
Imageblock
Parameter | Type | Description |
---|---|---|
id | integer | Data ID |
type | string | Block type (fixed value: image) |
pos | array | Image block four corner coordinates |
text | string | Image annotation text (HTML/Markdown format) |
image_url | string | Image file path |
base64str | string | Image base64 encoded string |
Footer Block
Parameter | Type | Description |
---|---|---|
type | string | Block type (fixed value: footer) |
pos | array | Block four corner coordinates |
blocks | array | Content block array (Can contain textblock/imageblock/table) |
Header Block
Parameter | Type | Description |
---|---|---|
type | string | Block type (fixed value: header) |
pos | array | Block four corner coordinates |
image_url | string | Header image path |
base64str | string | Header image base64 encoding |
blocks | array | Content block array (Can contain textblock/imageblock/table) |
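Header and footer blocks nest other blocks in blocks, so collecting their text is naturally recursive. The sketch below uses a minimal, hypothetical `Block` record (type, text, and child blocks) rather than the full schema, and returns texts in document order; a caller that wants body text only could skip items whose type is header or footer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal model: a block has a type, optional text, and optional child blocks. */
record Block(String type, String text, List<Block> blocks) {}

/** Hypothetical helper that flattens nested blocks into a list of texts. */
final class BlockText {
    private BlockText() {}

    static List<String> collect(List<Block> blocks) {
        List<String> out = new ArrayList<>();
        for (Block b : blocks) {
            if (b.text() != null && !b.text().isEmpty()) {
                out.add(b.text());
            }
            // Descend into the blocks array of header/footer containers
            if (b.blocks() != null) {
                out.addAll(collect(b.blocks()));
            }
        }
        return out;
    }
}
```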