I'm running Apache Tika Server in a Docker container and trying to extract the text from PDFs contained in a password-protected ZIP file.

I've tried passing the password in an HTTP header, as both 'Password' and 'X-Tika-Password', but all it does is list the files in the ZIP archive without extracting their text.

If I remove the password from the ZIP file, it extracts the text from the PDFs perfectly.

I've tried this:

curl --location --request PUT '127.0.0.1:9998/tika' \
--header 'Accept: text/plain' \
--header 'Password: 123456' \
--header 'Content-Type: application/zip' \
--data-binary '@file/path/to.zip'

And I just get back plain text listing the file names:

Name Of First File.pdf
Name of Second FIle.pdf
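
Since the unprotected ZIP works, one possible workaround is to decrypt the archive on the client and send each PDF to Tika on its own. The following is only a sketch under assumptions: `TIKA_URL`, `tika_text` and `extract_zip_text` are illustrative names (not part of Tika), the Info-ZIP `unzip` tool is installed, and the archive uses an encryption method that `unzip` can handle.

```shell
#!/bin/sh
# Sketch of a client-side workaround: decrypt the ZIP locally, then send
# each PDF to Tika Server separately. TIKA_URL and the function names are
# illustrative assumptions, not part of Tika.
TIKA_URL="${TIKA_URL:-http://127.0.0.1:9998/tika}"

# PUT one file to Tika and print the extracted plain text.
tika_text() {
    curl --silent --show-error --request PUT "$TIKA_URL" \
        --header 'Accept: text/plain' \
        --header 'Content-Type: application/pdf' \
        --data-binary "@$1"
}

# Unpack a password-protected ZIP into a temp dir and extract text per PDF.
extract_zip_text() {
    zip_path=$1
    password=$2
    workdir=$(mktemp -d) || return 1
    # Note: -P puts the password on the command line, where other local
    # users may be able to see it in the process list.
    unzip -q -P "$password" "$zip_path" -d "$workdir" || {
        rm -rf "$workdir"
        return 1
    }
    find "$workdir" -type f -name '*.pdf' | while IFS= read -r pdf; do
        # Print the path relative to the archive root as a separator.
        printf '=== %s ===\n' "${pdf#"$workdir"/}"
        tika_text "$pdf"
    done
    rm -rf "$workdir"
}
```

With the functions loaded, `extract_zip_text file/path/to.zip 123456` would print each PDF's name followed by whatever text Tika returns for it.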

Tags: How to use Apache Tika Server with password protected files, Stack Overflow