
Apache Tika

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.


Installation

Default install:

bash -c "$(wget -qLO - https://github.com/community-scripts/ProxmoxVE/raw/main/ct/apache-tika.sh)"
Default resources: CPU: 1 core, RAM: 2024 MB, Disk: 10 GB, OS: Debian 12

Configuration

Config file:

/opt/apache-tika/tika-config.xml

Notes

The configuration file is not created at install time, so create it yourself if you need one. An example is available at: https://cwiki.apache.org/confluence/display/TIKA/TikaServer+in+Tika+2.x
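Since the file has to be created by hand, here is a minimal sketch of one way to do it. The `write_tika_config` helper is hypothetical, and the single `<port>` setting is an illustrative assumption modeled on the Tika 2.x server example linked above, not something taken from the install script.

```shell
# Sketch: write a minimal Tika 2.x server config to the path given as $1.
# write_tika_config is a hypothetical helper; the <port> value is an
# illustrative assumption, not a setting taken from the install script.
write_tika_config() {
  cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <server>
    <params>
      <port>9998</port>
    </params>
  </server>
</properties>
EOF
}

# Example, run as root inside the container:
#   write_tika_config /opt/apache-tika/tika-config.xml
```

Whether the running service reads this file depends on how it is started. Tika Server loads a config file only when one is passed with its `-c` option, so check the service's start command before relying on it.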

Web Interface

Port: 9998
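Tika Server is used over HTTP on this port, so a few `curl` wrappers can serve as a quick check that the install works. This is a sketch under assumptions: `TIKA_HOST` and the function names are hypothetical, and the default address should be replaced with your container's IP. The `/tika`, `/meta`, and `/version` paths are standard Tika Server endpoints.

```shell
# Assumed defaults: replace TIKA_HOST with the address of your container.
TIKA_HOST="${TIKA_HOST:-127.0.0.1}"
TIKA_PORT="${TIKA_PORT:-9998}"

# Build the URL for a Tika Server endpoint such as "tika" or "meta".
tika_url() { printf 'http://%s:%s/%s' "$TIKA_HOST" "$TIKA_PORT" "$1"; }

# Extract plain text: upload the file with HTTP PUT to /tika.
tika_text() { curl -sS -T "$1" -H 'Accept: text/plain' "$(tika_url tika)"; }

# Extract metadata as JSON: upload the file with HTTP PUT to /meta.
tika_meta() { curl -sS -T "$1" -H 'Accept: application/json' "$(tika_url meta)"; }

# Print the server version string.
tika_version() { curl -sS "$(tika_url version)"; }
```

For example, `tika_version` should return a version string if the server is reachable, and `tika_text report.pdf` prints the extracted text of a local file (`report.pdf` is a placeholder name).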

This post is licensed under CC BY 4.0 by the author.