Apache Tika
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
Installation
Default install:
bash -c "$(wget -qLO - https://github.com/community-scripts/ProxmoxVE/raw/main/ct/apache-tika.sh)"
Configuration
Config file:
/opt/apache-tika/tika-config.xml
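As a rough illustration of what this file might contain, the sketch below writes a minimal Tika Server 2.x configuration through a shell heredoc. The function name `write_tika_config` is hypothetical, and the port and host values are assumptions (9998 is Tika Server's default port); adjust them to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tika or the install script):
# writes a minimal Tika Server 2.x config to the path given as $1.
write_tika_config() {
  cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <server>
    <params>
      <!-- Assumed values: default port, listen on all interfaces -->
      <port>9998</port>
      <host>0.0.0.0</host>
    </params>
  </server>
</properties>
EOF
}
```

It could be called as `write_tika_config /opt/apache-tika/tika-config.xml`. Tika Server only reads a config file when it is started with its `--config` option, so check how the service is launched before relying on this file.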
Notes
The configuration file is not created at install time. An example is available at:
https://cwiki.apache.org/confluence/display/TIKA/TikaServer+in+Tika+2.x
Web Interface
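Once the server is running, its HTTP endpoints can be called from any shell with curl. The following is a minimal sketch, assuming the install runs Tika Server and that it listens on the default port 9998 on the local host. The `TIKA_URL` variable and the `tika_*` function names are illustrative and not part of Tika; `/tika`, `/meta`, and `/version` are standard Tika Server endpoints.

```shell
#!/usr/bin/env bash
# Assumption: the server is reachable on Tika Server's default port 9998.
# Override by exporting TIKA_URL, e.g. http://<container-ip>:9998
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Print the plain text extracted from the file in $1 (HTTP PUT to /tika).
tika_text() {
  curl -sS -T "$1" -H "Accept: text/plain" "$TIKA_URL/tika"
}

# Print the metadata of the file in $1 as JSON (HTTP PUT to /meta).
tika_meta() {
  curl -sS -T "$1" -H "Accept: application/json" "$TIKA_URL/meta"
}

# Print the server's version string (HTTP GET to /version).
tika_version() {
  curl -sS "$TIKA_URL/version"
}
```

After sourcing these functions, `tika_version` is a quick check that the server responds, and `tika_text report.pdf` (with any local file in place of the hypothetical `report.pdf`) prints the extracted text to standard output.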
Links
This post is licensed under CC BY 4.0 by the author.