Apache Tika
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
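Because every format goes through the same interface, a running Tika Server can be queried the same way whatever the file type. The following is a minimal sketch. It assumes the container runs Tika Server on its default port 9998. `TIKA_URL` and the helper function names are hypothetical. `/tika`, `/meta`, and `/detect/stream` are standard Tika Server endpoints that accept the document as a PUT body.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for querying a Tika Server over HTTP.
# Assumption: the server listens on the default Tika Server port 9998.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Extract plain text from any supported file type (PDF, PPT, XLS, ...).
# curl -T uploads the file as the body of a PUT request.
tika_text() {
  curl -fsS -T "$1" -H "Accept: text/plain" "$TIKA_URL/tika"
}

# Return the document's metadata as JSON.
tika_meta() {
  curl -fsS -T "$1" -H "Accept: application/json" "$TIKA_URL/meta"
}

# Detect the MIME type from the file contents.
tika_detect() {
  curl -fsS -T "$1" "$TIKA_URL/detect/stream"
}
```

For example, `tika_text report.pdf` and `tika_text budget.xls` use the same call. Only the uploaded file changes.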
Installation
Default install:
bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/ct/apache-tika.sh)"
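After the script finishes, you can check whether the server answers. The following is a sketch only. `TIKA_HOST` is a placeholder for the container's IP address, and the port assumes the Tika Server default of 9998. The `/version` endpoint is part of the standard Tika Server REST API.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check for the Tika Server in the new container.
# Assumptions: TIKA_HOST is the container's IP (placeholder) and the server
# uses the default Tika Server port 9998.
TIKA_HOST="${TIKA_HOST:-localhost}"
TIKA_PORT="${TIKA_PORT:-9998}"

tika_check() {
  # GET /version prints the running Tika version; give up after 5 seconds.
  if curl -fsS --max-time 5 "http://${TIKA_HOST}:${TIKA_PORT}/version"; then
    echo
    echo "Tika Server is reachable at ${TIKA_HOST}:${TIKA_PORT}"
  else
    echo "Tika Server not reachable at ${TIKA_HOST}:${TIKA_PORT}" >&2
    return 1
  fi
}
```

Run it as `TIKA_HOST=<container-ip> tika_check`. A non-zero exit status means the server did not respond.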
Configuration
Config file:
/opt/apache-tika/tika-config.xml
Notes
The configuration file is not created at install time. An example configuration is available at:
https://cwiki.apache.org/confluence/display/TIKA/TikaServer+in+Tika+2.x
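Because the file is missing after install, you have to create it yourself. The following hypothetical helper writes a minimal starting point. The element names (`<properties>`, `<server>`, `<params>`, `<host>`, `<port>`, `<taskTimeoutMillis>`, `<forkedJvmArgs>`) follow the Tika 2.x server config format described on the linked page. The values are assumptions, so check them against that page before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper that writes a minimal Tika 2.x server config.
# Element names follow the TikaServer in Tika 2.x wiki; the values
# below are illustrative assumptions, not the installer's defaults.
write_tika_config() {
  local dest="${1:-/opt/apache-tika/tika-config.xml}"
  mkdir -p "$(dirname "$dest")"
  cat > "$dest" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <server>
    <params>
      <!-- Listen on all interfaces so other hosts can reach the container. -->
      <host>0.0.0.0</host>
      <port>9998</port>
      <!-- Abort parsing a single file after 5 minutes. -->
      <taskTimeoutMillis>300000</taskTimeoutMillis>
      <!-- JVM options for the forked parsing process. -->
      <forkedJvmArgs>
        <arg>-Xmx1g</arg>
      </forkedJvmArgs>
    </params>
  </server>
</properties>
EOF
}
```

Calling `write_tika_config` with no argument writes to `/opt/apache-tika/tika-config.xml`. Then restart the Tika service inside the container. This assumes the service is started with that file as its configuration.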