| Key | Summary | Assignee | Reporter | Status | Resolution |
|---|---|---|---|---|---|
| TIKA-3890 | Identifying an efficient approach for getting page count prior to running an extraction | Unassigned | Ethan Wilansky | Closed | Fixed |
| TIKA-3880 | Tika not picking-up setByteArrayMaxOverride from tika-config | Unassigned | Ethan Wilansky | Closed | Resolved |
| TIKA-3867 | Add pipes reporter that updates stats in a file on disk | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3866 | Update to PDFBox 2.0.27 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3865 | Add a composite PipesReporter | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3863 | Add a pipes reporter for OpenSearch | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3860 | Pull tesseract 5 in our docker image for the next 2.x release | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3859 | Wrong filename glob for Zstandard | Tim Allison | Robin Schimpf | Resolved | Fixed |
| TIKA-3856 | Upgrade to jempbox 1.8.17 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3855 | Implement upsert for OpenSearch emitter | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3854 | Bump main's development version to 2.5.0-SNAPSHOT | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3853 | Enable configuring digests via autodetectparserconfig | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3852 | Extract signature info from PDFs | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3851 | Add detection for e57 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3849 | Throw UnsupportedFormatException or similar for really old mdb files | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3848 | IllegalArgumentException in DBFColumnHeader.setType() | Unassigned | Tilman Hausherr | Resolved | Fixed |
| TIKA-3847 | NullPointerException when processing pdf document(Allow proceed on RuntimeException) | Tilman Hausherr | Yurii | Resolved | Fixed |
| TIKA-3846 | Improve JDBC emitter to handle attachments and batch updates | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3845 | Add a callable wrapper for the pipesiterator | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3844 | Improve extraction of PDF subset info | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3843 | use commons-io byte array streams | Unassigned | PJ Fanning | Resolved | Fixed |
| TIKA-3842 | Revert slf4j core back to 1.x? | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3840 | Add extraction of ODF version | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3839 | Property com.ctc.wstx.maxEntityCount is not supported | Unassigned | Lakatos Gyula | Resolved | Fixed |
| TIKA-3836 | Add initial jdbc emitter | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3833 | bzip2 MIME type is detected as bzip instead when using tika-core | Unassigned | Eduardas Kazakas | Resolved | Fixed |
| TIKA-3832 | Required array length is too large (OOM) error when reading a PDF file | Unassigned | Lakatos Gyula | Resolved | Fixed |
| TIKA-3831 | Allow for retries in S3Fetcher | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3825 | ForkParser allow shutdown | Unassigned | Ben Gilbert | Resolved | Fixed |
| TIKA-3824 | RegexCaptureParser should add metadata items, not set | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3820 | Kafka Tika Pipes Support | Unassigned | Nicholas DiPiazza | Resolved | Fixed |
| TIKA-3818 | Remove pdfdebugger from tika (2) | Tilman Hausherr | Tilman Hausherr | Resolved | Fixed |
| TIKA-3815 | Inconsistent Date/Time information extracted from Exif data | Luís Filipe Nassif | Luís Filipe Nassif | Resolved | Fixed |
| TIKA-3812 | Parser Order: image get parsed by GDALParser instead of TesseractOCRParser | Unassigned | Eugen Caruntu | Resolved | Fixed |
| TIKA-3810 | Vtt file (encoding UTF-8 with BOM) seen as text/plain | Unassigned | Giorgiana Ciobanu | Resolved | Fixed |
| TIKA-3804 | Improve configurability of renderers in the PDFParser | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-3800 | Consider wrapping 'unrar' commandline executable as a parser to handle rar v5 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3799 | Refactor FuzzingCLI to use PipesParser | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-3796 | IncludeHeadersAndFooters is not being passed through via tika-config to the MSOffice parser | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-3795 | General upgrades for 2.5.0 | Unassigned | Tilman Hausherr | Resolved | Fixed |
| TIKA-3794 | ocrImageType is not configurable via headers in tika-server | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-3767 | Use junit's @TempDir where possible | Unassigned | Tim Allison | Resolved | Fixed |