PDF to HTML Tool Guide
Note: Before learning how to use the individual functions, we recommend reading the Request Workflow to understand the basic PDF processing flow. Each function has its own special parameters, which you set when uploading files. The other basic steps are the same for every function.
PDF to HTML:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5",
"pageLayoutMode": "e_Flow",
"htmlOption": "e_SinglePage"
}
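
The `pageRanges` value in the sample above combines single pages and ranges in one comma-separated string. The following is a minimal sketch of how such a string could be checked and expanded on the client before uploading. It assumes that `3-5` means pages 3 through 5 inclusive; the class and method names are hypothetical and not part of the API.

```java
import java.util.ArrayList;
import java.util.List;

final class PageRanges {
    /** Expands a string such as "1,2,3-5" into individual page numbers (1-based). */
    static List<Integer> expand(String pageRanges) {
        List<Integer> pages = new ArrayList<>();
        if (pageRanges == null || pageRanges.trim().isEmpty()) {
            return pages; // empty input: nothing to expand
        }
        for (String part : pageRanges.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            // A token is either a single page ("2") or a range ("3-5", assumed inclusive).
            int first = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int last = dash < 0 ? first : Integer.parseInt(token.substring(dash + 1).trim());
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("Invalid page range: " + token);
            }
            for (int page = first; page <= last; page++) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(expand("1,2,3-5")); // prints [1, 2, 3, 4, 5]
    }
}
```

Validating the string locally catches typos such as `5-3` or `0` before a request is sent.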
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: disabled; 1: enabled). Default is 1.
isContainImg
: Whether to include images during conversion (0: disabled; 1: enabled). Default is 1.
isContainAnnot
: Whether to include annotations during conversion (0: disabled; 1: enabled). Default is 1.
enableOcr
: Whether to use OCR (0: disabled; 1: enabled). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, with page numbers starting from 1 (for example, "1,2,3-5"). Default is empty.
pageLayoutMode
: Layout mode, either e_Box or e_Flow. Default is e_Flow.
htmlOption
: HTML output option. Default is e_SinglePage. Possible values:
: e_SinglePage: Convert the entire PDF file into a single HTML file.
: e_SinglePageWithBookmark: Convert the PDF file into a single HTML file with an outline for navigation at the beginning of the HTML page.
: e_MultiPage: Convert the PDF file into multiple HTML files.
: e_MultiPageWithBookmark: Convert the PDF file into multiple HTML files. Each HTML file corresponds to a PDF page, and users can navigate to the next HTML file via a link at the bottom of the HTML page.
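
As an illustration of how the parameters listed above fit together, here is a minimal sketch that starts from the documented defaults and serializes them to the JSON string shown at the top of this guide. The class and method names are hypothetical, and representing the empty `pageRanges` default as an empty string is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class HtmlParameters {
    /** Documented defaults; the empty pageRanges default is assumed to be "". */
    static Map<String, Object> defaults() {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("enableAiLayout", 1);
        p.put("isContainImg", 1);
        p.put("isContainAnnot", 1);
        p.put("enableOcr", 0);
        p.put("ocrLanguage", 8);
        p.put("pageRanges", "");
        p.put("pageLayoutMode", "e_Flow");
        p.put("htmlOption", "e_SinglePage");
        return p;
    }

    /** Serializes the map as a flat JSON object: numbers bare, everything else quoted. */
    static String toJson(Map<String, Object> params) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : params.entrySet()) {
            Object v = e.getValue();
            String value = (v instanceof Number)
                    ? v.toString()
                    : "\"" + escape(v.toString()) + "\"";
            json.add("\"" + e.getKey() + "\":" + value);
        }
        return json.toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Object> p = defaults();
        p.put("enableOcr", 1);          // turn OCR on
        p.put("ocrLanguage", 3);        // 3: ENGLISH
        p.put("pageRanges", "1,2,3-5");
        System.out.println(toJson(p));
    }
}
```

The resulting string is the kind of value that would go into the `parameter` form field when uploading a file.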
Example
Authentication
Replace the publicKey and secretKey placeholders in the request body with the public key and secret key you get from the console. The response returns the accessToken used to authenticate the following steps.
curl

    curl --location --request POST 'https://api-server.compdf.com/server/v1/oauth/token' \
      --header 'Content-Type: application/json' \
      --data-raw '{ "publicKey": "publicKey", "secretKey": "secretKey" }'

java

    import java.io.*;
    import okhttp3.*;

    public class main {
        public static void main(String[] args) throws IOException {
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            // The body is JSON, so declare it as application/json (as in the curl example).
            MediaType mediaType = MediaType.parse("application/json");
            RequestBody body = RequestBody.create(mediaType,
                    "{\n \"publicKey\": \"{{public_key}}\",\n \"secretKey\": \"{{secret_key}}\"\n}");
            Request request = new Request.Builder()
                    .url("https://api-server.compdf.com/server/v1/oauth/token")
                    .method("POST", body)
                    .build();
            // Close the response when done and print the body, which carries the accessToken.
            try (Response response = client.newCall(request).execute()) {
                System.out.println(response.body().string());
            }
        }
    }

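
The later steps need the accessToken from this response. This guide does not show the response layout, so the sketch below makes only one assumption: that the body is JSON text containing a string field named `accessToken` somewhere. It searches for that field with a regular expression instead of a JSON library, and does not handle escaped characters inside the value. The sample body and the `JsonField` class are hypothetical. With OkHttp, the body text can be read with `response.body().string()`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsonField {
    /** Finds the first "name": "value" string pair in a JSON text, at any nesting depth. */
    static Optional<String> firstString(String json, String name) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"\\\\]*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical response body, for illustration only.
        String sample = "{\"data\":{\"accessToken\":\"abc123\"}}";
        System.out.println(firstString(sample, "accessToken").orElse("(not found)")); // prints abc123
    }
}
```

The same helper could be pointed at `taskId` after the Create Task step, under the same assumption about the response. For production use, a real JSON parser is the safer choice.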
Create Task
Replace accessToken with the access token obtained in the previous step, and replace language with the language type in which error information should be displayed. The response data then contains the taskId.
curl

    curl --location --request GET 'https://api-server.compdf.com/server/v1/task/pdf/html' \
      --header 'Authorization: Bearer accessToken'

java

    import java.io.*;
    import okhttp3.*;

    public class main {
        public static void main(String[] args) throws IOException {
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder()
                    .url("https://api-server.compdf.com/server/v1/task/pdf/html?language={{language}}")
                    // OkHttp rejects a GET that carries a request body, so use get() without one.
                    .get()
                    .addHeader("Authorization", "Bearer {{accessToken}}")
                    .build();
            // Close the response when done and print the body, which carries the taskId.
            try (Response response = client.newCall(request).execute()) {
                System.out.println(response.body().string());
            }
        }
    }

Upload Files
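Replace file with the file you want to convert, taskId with the taskId obtained in the previous step, language with the language type in which error information should be displayed, and accessToken with the access token obtained in the first step. The parameter field carries the conversion settings as a JSON string.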
curl

    curl --location --request POST 'https://api-server.compdf.com/server/v1/file/upload' \
      --header 'Authorization: Bearer accessToken' \
      --form 'file=@"test.pdf"' \
      --form 'taskId="taskId"' \
      --form 'password=""' \
      --form 'parameter="{ \"pageOptions\": 1 , \"isContainAnnot\": 1 , \"isContainImg\":0,\"isAllowOcr\":0,\"isContainOcrBg\":0,\"isOnlyAiTable\":0}"' \
      --form 'language=""'

java

    import java.io.*;
    import okhttp3.*;

    public class main {
        public static void main(String[] args) throws IOException {
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            // Multipart form: the file plus the task ID, language, password, and conversion parameters.
            RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
                    .addFormDataPart("file", "{{file}}",
                            RequestBody.create(MediaType.parse("application/octet-stream"), new File("<file>")))
                    .addFormDataPart("taskId", "{{taskId}}")
                    .addFormDataPart("language", "{{language}}")
                    .addFormDataPart("password", "")
                    .addFormDataPart("parameter", "{ \"pageOptions\": \"1\" , \"isContainAnnot\": \"1\" , \"isContainImg\": \"1\"}")
                    .build();
            Request request = new Request.Builder()
                    .url("https://api-server.compdf.com/server/v1/file/upload")
                    .method("POST", body)
                    .addHeader("Authorization", "Bearer {{accessToken}}")
                    .build();
            // Close the response when done and print the body for inspection.
            try (Response response = client.newCall(request).execute()) {
                System.out.println(response.body().string());
            }
        }
    }

Process Files
Replace taskId with the taskId obtained in Create Task, accessToken with the access token obtained in the first step, and language with the language type in which error information should be displayed.
curl

    curl --location -g --request GET 'https://api-server.compdf.com/server/v1/execute/start?taskId=taskId' \
      --header 'Authorization: Bearer accessToken'

java

    import java.io.*;
    import okhttp3.*;

    public class main {
        public static void main(String[] args) throws IOException {
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder()
                    .url("https://api-server.compdf.com/server/v1/execute/start?taskId={{taskId}}&language={{language}}")
                    // OkHttp rejects a GET that carries a request body, so use get() without one.
                    .get()
                    .addHeader("Authorization", "Bearer {{accessToken}}")
                    .build();
            // Close the response when done and print the body for inspection.
            try (Response response = client.newCall(request).execute()) {
                System.out.println(response.body().string());
            }
        }
    }

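
The examples above paste the taskId and language straight into the URL. A minimal sketch of assembling that URL with percent-encoding, so that unexpected characters in either value cannot break the query string, might look like this. The `ExecuteUrl` class is hypothetical, the language value `1` in the usage line is a made-up placeholder, and treating language as optional follows the curl example, which omits it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ExecuteUrl {
    static final String BASE = "https://api-server.compdf.com/server/v1/execute/start";

    /** Builds the start URL; language is appended only when it is non-empty. */
    static String build(String taskId, String language) {
        StringBuilder url = new StringBuilder(BASE)
                .append("?taskId=").append(encode(taskId));
        if (language != null && !language.isEmpty()) {
            url.append("&language=").append(encode(language));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "exampleTaskId" and "1" are placeholder values for illustration.
        System.out.println(build("exampleTaskId", "1"));
    }
}
```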
Get Task Information
Replace taskId with the taskId obtained in Create Task, and accessToken with the access token obtained in the first step.
curl

    curl --location -g --request GET 'https://api-server.compdf.com/server/v1/task/taskInfo?taskId=taskId' \
      --header 'Authorization: Bearer accessToken'

java

    import java.io.*;
    import okhttp3.*;

    public class main {
        public static void main(String[] args) throws IOException {
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder()
                    .url("https://api-server.compdf.com/server/v1/task/taskInfo?taskId={{taskId}}")
                    // OkHttp rejects a GET that carries a request body, so use get() without one.
                    .get()
                    .addHeader("Authorization", "Bearer {{accessToken}}")
                    .build();
            // Close the response when done and print the task information.
            try (Response response = client.newCall(request).execute()) {
                System.out.println(response.body().string());
            }
        }
    }

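
This guide does not say how often to query task information or which values mark a finished task, so the following is only a generic sketch of a bounded polling loop. The `Poller` class, the attempt limit, the delay, and the `"pending"` and `"finished"` strings are all hypothetical stand-ins; in practice the supplier would call the task information endpoint and the predicate would inspect its real response.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

final class Poller {
    /** Calls fetch until done accepts the result, waiting delayMillis between attempts. */
    static <T> T pollUntil(Supplier<T> fetch, Predicate<T> done, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        T last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = fetch.get();
            if (done.test(last)) {
                return last;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted while polling", e);
                }
            }
        }
        throw new IllegalStateException(
                "Not finished after " + maxAttempts + " attempts; last result: " + last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for the real request: reports "finished" on the third call.
        String result = pollUntil(
                () -> ++calls[0] < 3 ? "pending" : "finished",
                status -> status.equals("finished"),
                5, 10);
        System.out.println(result + " after " + calls[0] + " calls"); // prints finished after 3 calls
    }
}
```

Bounding the number of attempts keeps a stuck task from blocking the caller indefinitely.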
Result
File Type | Description |
---|---|
.zip | A ZIP archive containing the HTML output folder, available once the conversion is complete |
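
Because the result arrives as a .zip archive, a client usually needs to unpack it before the HTML can be opened. The sketch below uses only the Java standard library and assumes the archive has already been downloaded to a local path; how the download link is obtained is not covered here. The `ResultUnzipper` class is hypothetical. It rejects entries whose paths would land outside the target directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ResultUnzipper {
    /** Extracts zipFile into targetDir and returns the number of files written. */
    static int extract(Path zipFile, Path targetDir) throws IOException {
        int files = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Guard against entries such as "../x" that point outside the target.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the current entry only; the stream ends at the entry boundary.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ResultUnzipper <result.zip> <output directory>
        int count = extract(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Extracted " + count + " files");
    }
}
```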