What is Tika Tuesdays?
Over the past few months I’ve finally accomplished a long-time personal goal: being able to easily search PDF documents with in-context hit highlighting, using only open source projects and minimal coding effort! I believe this is a common use case that many organizations could benefit from, but to date it has been too difficult for most of them to embrace.
This short series in the run-up to Christmas 2019 will share some of the lessons (both good and bad!) that I’ve learned. I will publish a new blog post every Tuesday and link them all back to this post as they are published.
To get you started, here is the first one, inspired by the Solr mailing list, which provoked some eye-rolling from longtime Tika folks:
- It’s okay to run Tika (and Tesseract!) inside of Solr 😉 If and Only If….
- Using Tika and Tesseract outside of Solr
- Using Tika and Tesseract as an API exposed by Solr
- Tesseract 3 and Tika
- Parsing Tika & Tesseract formatted hOCR output inside a Solr ingestion pipeline
- All the different ways of deploying Tika-Server