| Key | Summary | Assignee | Reporter | Status | Resolution |
|---|---|---|---|---|---|
| TIKA-2891 | ForkClient "fillBootstrapJar()" lacks a few "MANIFEST.MF" properties | Unassigned | Quentin Laville | Resolved | Fixed |
| TIKA-2884 | Tika Parse - Null Pointer | Unassigned | Ravi | Closed | Invalid |
| TIKA-2877 | Tika 1.20 suffers from 3 separate CVE vulnerabilities | Tim Allison | Pat cashman | Resolved | Fixed |
| TIKA-2873 | Some password-protected xlsx files no longer open with password | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2869 | Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479) | Tim Allison | Edans Sandes | Resolved | Fixed |
| TIKA-2868 | Fix DL4JVGG16Net to work with dl4j-beta3 | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2867 | EpubParser -- add check for null zipEntry | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2866 | EpubParser should allow .htm | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2865 | Parameterize minConfidence for csv detection; bump default higher | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2864 | Fix regression in RFC822 parsing time | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2863 | Add comparison reports for time to process per mime | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2854 | Upgrade out-of-date dependencies with outstanding CVEs | Unassigned | Andrew Pavlin | Resolved | Fixed |
| TIKA-2852 | Add reports for missing/unaligned files in tika-eval Compare mode | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2849 | TikaInputStream copies the input stream locally | Tim Allison | Boris Petrov | Resolved | Fixed |
| TIKA-2846 | Add per-page Unicode mapping stats to the metadata in the PDFParser | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2845 | Override processPages in PDFTextStripper | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2841 | Improve robustness of parsers of zip-based files on truncated files | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2840 | Windows batch file not detected | Tim Allison | chandra | Resolved | Fixed |
| TIKA-2838 | RTF document processing glues comment fields together with text without whitespace | Tim Allison | Karl Wright | Resolved | Fixed |
| TIKA-2836 | Tika core API | Tim Allison | chandra | Resolved | Fixed |
| TIKA-2835 | Upgrade to PDFBox 2.0.15 when available | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2834 | Upgrade to PDFBox 2.0.14 when available | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2827 | Improve tika-eval comparison reports to include mime types in A and B for diffs | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2826 | Add a csv/tsv parser | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2825 | Make interrupter in tika-batch's child process actually optional | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2823 | Remove printStackTrace in XMLReaderUtils | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2822 | Update common tokens files for tika-eval | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2819 | Update jaxb & activation | Tim Allison | Hans Brende | Resolved | Fixed |
| TIKA-2816 | Error when sending request to /tika with header X-Tika-OCRMinFileSizeToOcr | Tim Allison | Anssi Törmä | Resolved | Fixed |
| TIKA-2810 | Back off to tagsoup when xml parser fails on Tika xhtml in tika-eval | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2809 | Add reports for structure tags to tika-eval | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2807 | .docx text extract leaves out rich text content-control inside of a text box | Tim Allison | Claudia Mickiewicz | Resolved | Fixed |
| TIKA-2801 | Tika includes 2 vulnerable components | Tim Allison | Maxim Solodovnik | Resolved | Fixed |
| TIKA-2765 | Regression extracting text from corrupted docx files | Tim Allison | Luís Filipe Nassif | Resolved | Fixed |
| TIKA-2756 | Switch to commons-lang 3 | Tim Allison | Robert Munteanu | Resolved | Fixed |
| TIKA-2726 | Handle truncated ooxml more robustly | Tim Allison | Tim Allison | Resolved | Duplicate |
| TIKA-2601 | Invalid XHTML output (overlapping `<a>` and formatting tags) for some Word documents | Konstantin Gribov | Filip | Closed | Fixed |
| TIKA-2555 | Text with [underline] + [another format] in Word document generates overlapping HTML tags | Konstantin Gribov | Serban Alexe | Resolved | Fixed |
| TIKA-2310 | Try to order chapters in epub correctly | Tim Allison | Tim Allison | Resolved | Fixed |