| TIKA-3198 | Extracting ppt with chart gives excel in which data is missing | Unassigned | sagar | Open | Unresolved |
| TIKA-2635 | Require imageMagick path be specified on Windows OS | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2634 | Upgrade Jackson to 2.9.5 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2620 | Set sys property to get better rendering speed by default | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2618 | LabelRecord and LabelSSTRecord text can be overwritten in xls | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2617 | Ignore NPOIFS IOOBE in PPT attachments | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2616 | message/news now incorrectly identified as rfc822 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2614 | RFC822 treats non-multipart as attachment | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2613 | Tesseract 4.0 has removed -psm, so Tika must update | Unassigned | Ewan Mellor | Resolved | Fixed |
| TIKA-2607 | TIKA-2579 / Exchange levigo-jbig2-imageio with pdfbox-jbig2-imageio:3.0.0 | Unassigned | Andreas Meier | Resolved | Fixed |
| TIKA-2604 | Error with certain jar paths on OS X | Tim Allison | Sasha Goodman | Resolved | Fixed |
| TIKA-2600 | Don't use md5 checksum due to changes to the release distribution policy | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2598 | Fix dependency convergence | Tim Allison | Guillaume Smet | Resolved | Fixed |
| TIKA-2594 | Mail detected as application/xhtml+xml | Unassigned | Andreas Meier | Resolved | Fixed |
| TIKA-2592 | HTML with charset unicode handled as utf-16 instead of utf-8 | Unassigned | Andreas Meier | Resolved | Fixed |
| TIKA-2591 | Some tiffs (Big Endian with fax compression) are showing up as x-tar | Unassigned | daniel schmidt | Resolved | Fixed |
| TIKA-2590 | ExcelExtractor: cannot choose listening to the selected records only | Unassigned | Grigoriy Alekseev | Resolved | Fixed |
| TIKA-2588 | Tika detecting/parsing pptx with embedded Excel worksheet(s)... | Tim Allison | Brian McColgan | Closed | Fixed |
| TIKA-2587 | DKIM signed mails recognized as text/plain | Unassigned | Andreas Meier | Resolved | Fixed |
| TIKA-2585 | TikaInputStream support for resetting via a factory of InputStreams | Unassigned | Nick Burch | Resolved | Fixed |
| TIKA-2584 | Tika should have a way to pass arbitrary Tesseract options | Unassigned | Ewan Mellor | Resolved | Fixed |
| TIKA-2582 | Tesseract 4.0 includes a FF character by default, breaking parsers | Unassigned | Ewan Mellor | Resolved | Fixed |
| TIKA-2580 | SafeContentHandler documentation is incorrect about replacement character | Unassigned | Ewan Mellor | Resolved | Fixed |
| TIKA-2579 | Update to PDFBox 2.0.9 when available | Tim Allison | David Pilato | Closed | Fixed |
| TIKA-2578 | Mails not recognized when unknown X-headers are present | Tim Allison | Andreas Meier | Resolved | Fixed |
| TIKA-2576 | Add application/zstd detection and parser | Unassigned | Andreas Meier | Resolved | Fixed |
| TIKA-2571 | Swallows security exception and returns null | Unassigned | Nik Everett | Resolved | Fixed |
| TIKA-2570 | Tika 1.17 uses vulnerable Jackson version 2.9.2 | Unassigned | Julian Reschke | Resolved | Fixed |
| TIKA-2569 | Grouped Text boxes in .ppt | Tim Allison | Richard A | Resolved | Fixed |
| TIKA-2568 | Full encrypted 7Z file not detected as such | Luís Filipe Nassif | Luís Filipe Nassif | Resolved | Fixed |
| TIKA-2567 | Tika mistakenly determines mimetype of .min.js file as matlab | Unassigned | Anto | Resolved | Fixed |
| TIKA-2564 | Tika client cannot extract files from embedded archive formats | Tim Allison | Marc Prud'hommeaux | Resolved | Fixed |
| TIKA-2563 | Extract embedded objects in HTML and javascript | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-2561 | Tika Parser includes outdated/vulnerable version of JSoup | Unassigned | Asela | Resolved | Fixed |
| TIKA-2559 | Expose language metadata from PDF documents | Unassigned | Matt Sheppard | Resolved | Fixed |
| TIKA-2556 | org.json package clash | Unassigned | Andrei Rebegea | Resolved | Fixed |
| TIKA-2547 | RFC822 with multipart/mixed first text element should be treated as body, not attachment | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2541 | Referenced version of Apache SIS (org.apache.sis) is branch EOL | Unassigned | Richard Jones | Resolved | Fixed |
| TIKA-2535 | Use latest org.opengis:geoapi to avoid rejected/EOL'd jsr-275 dependency | Tim Allison | Richard Jones | Resolved | Fixed |
| TIKA-2528 | Fix key location, keys file and download link | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-2527 | Typos in tika-mimetypes.xml | Unassigned | Andreas Meier | Resolved | Fixed |
| TIKA-2524 | Create/integrate a parser for XPS | Tim Allison | Peter Davies | Resolved | Fixed |
| TIKA-2509 | TesseractOCRParser ignores configured ImageMagickPath in processImage method | Dave Meikle | Richard Jones | Resolved | Fixed |
| TIKA-2390 | Extract images embedded in Html | Unassigned | Luís Filipe Nassif | Resolved | Duplicate |
| TIKA-2338 | Change Scope of Jai-ImageIO-Core dependency | Luís Filipe Nassif | Luís Filipe Nassif | Resolved | Fixed |
| TIKA-1191 | ForkParser / ClassLoaderProxy does not define package | Unassigned | Nicolas Belisle | Resolved | Fixed |
| TIKA-879 | Detection problem: message/rfc822 file is detected as text/plain. | Unassigned | Konstantin Gribov | Closed | Duplicate |