|
|
TIKA-3478
|
Extract "desc" metadata field from AppleUserBox in MP4
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3477
|
Fix new closed channel exception in MSOffice files in 2.x
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3476
|
Remove tag reports from default tika-eval reports
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3475
|
General upgrades for 2.0.0
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3474
|
tika-eval in 2.x should handle the exception key from 1.x
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3473
|
Upgrade OpenSearch -- 1.0 GA is now available
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3472
|
SimpleDateFormat is not threadsafe
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3470
|
Push jpeg2000 warning to trigger only when necessary
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3469
|
Consume bytes until 'ready' ping to forked pipes processor
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3467
|
Clean up poms in main
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3463
|
Add FileListIterator as a pipes-iterator
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3462
|
Clean up module names
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3461
|
Create sub modules in tika-pipes-integration tests
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3449
|
Remove sannies mp4 isoparser from Tika 2.x
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3441
|
tika server stuck in loop trying to bind
|
Unassigned
|
Cristian Zamfir
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3440
|
Add emitter for OpenSearch
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3436
|
Add multi-release for 2.x
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3435
|
Allow fetchers only when enableUnsecureFeatures is true in tika-server 2.x
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3434
|
Document removal of urlenabledinputstream in 2.x
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3430
|
Create release subdirectories for different versions
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3424
|
tika-app in 2.x should log to stderr
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3413
|
Avoid ZipBomb detection in bookmark text extraction in PDFs
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3410
|
Clean up logging in PipesServer
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3406
|
Add timeout on the client side of async processor
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3403
|
Create example for Transcription
|
Lewis John McGibbney
|
Lewis John McGibbney
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3402
|
Remove Redundant Local Variables
|
Unassigned
|
Furkan Kamaci
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3401
|
Remove Pointless Bitwise Expressions
|
Unassigned
|
Furkan Kamaci
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3399
|
Fix Non-Atomic Operations on Volatile Fields
|
Unassigned
|
Furkan Kamaci
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3398
|
Tidy Up Code for Performance Improvements
|
Unassigned
|
Furkan Kamaci
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3396
|
Rename parser modules in 2.0
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3395
|
Make Inner Classes Static If Possible to Prevent Memory Leaks
|
Unassigned
|
Furkan Kamaci
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3393
|
Refactor metadata filters to use new ConfigBase in 2.x
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3391
|
Refactor fetchiterators to pipesiterators in 2.x, clean up pipesiteratormanager
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3390
|
Migrate Language Level to Java 8
|
Unassigned
|
Furkan Kamaci
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3389
|
Close Open Resources
|
Unassigned
|
Furkan Kamaci
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3386
|
Add "times" to MockParser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3382
|
Improve writelimitreached handling
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3378
|
Move tika-langdetect-commons to tika-langdetect-test-commons in 2.x
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3377
|
Remove pipes components from TikaConfig in 2.x
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3372
|
Fix writelimit in recursiveparserhandler
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3362
|
AsyncParser and EmitterResource have handler type hardcoded to text
|
Tim Allison
|
Giovanni De Stefano
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-3359
|
Extract swf from PDFs
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3343
|
Move Tika's legacy lang id to its own submodule for Tika 2.0
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3340
|
LanguageProfile for Myanmar
|
Unassigned
|
Arky
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3329
|
RTG Translator with many-to-eng translation
|
Chris Mattmann
|
Thamme Gowda
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3318
|
MP3 parser using wrong xmpDM:duration units (which aren't clearly documented)
|
Nick Burch
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3313
|
Improve performance and usability of RereadableInputStream
|
Unassigned
|
Peter Kronenberg
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3311
|
Add github workflows to Tika
|
Lewis John McGibbney
|
Lewis John McGibbney
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3310
|
MP4 video detected as application/mp4
|
Unassigned
|
Peter Kronenberg
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3301
|
Simplify forking/monitoring in tika-server for 2.x
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3298
|
Add a "preloadLangs" parameter to TesseractOCRParser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3297
|
Simplify parser configuration in 2.x
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3292
|
Remove GSON where possible in 2.x
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3287
|
Add http fetcher
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3286
|
Tika does not issue an error when language file doesn't exist; not supporting script files
|
Unassigned
|
Peter Kronenberg
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3283
|
Add an s3 emitter to tika-pipes
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3280
|
server-core not bundled w server-classic in 2.0.0-ALPHA
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3273
|
Further metadata cleanup for Tika 2.0.0
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3271
|
Change default image resize size in TesseractParser's pre-processing step
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3267
|
Method getEnableImageProcessing() in TesseractOCRConfig should be renamed
|
Tim Allison
|
Peter Kronenberg
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3266
|
Generalize OCRParser so that users can service load custom ocr parsers
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3259
|
Improve logging for TesseractOCRParser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3258
|
Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3256
|
Update maven and maven min version
|
Tilman Hausherr
|
Tilman Hausherr
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3255
|
Parsing MP3 file with record size > 100000 fails
|
Unassigned
|
Peter Kronenberg
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3253
|
improve "attachments" tika-eval report directory
|
Unassigned
|
Tilman Hausherr
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3248
|
ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification
|
Tilman Hausherr
|
Tilman Hausherr
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3247
|
Make spawnChild default mode for tika-server in 2.0
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3246
|
IllegalArgumentException when generation of appearances fails
|
Tilman Hausherr
|
Tilman Hausherr
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3244
|
General upgrades for 1.26
|
Unassigned
|
Tilman Hausherr
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3242
|
Allow users to send arbitrary metadata to tika-server per document
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3240
|
Modularize tika-eval into core and app for 2.0.0
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3237
|
Great optimization in ForkParser
|
Luís Filipe Nassif
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3226
|
Add custom connector endpoint
|
Tim Allison
|
Nicholas DiPiazza
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3218
|
Wrong comment for method sortLoadedClasses in ServiceLoaderUtils
|
Unassigned
|
Peter Lee
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3199
|
Improve fuzzing of PDF streams
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3196
|
PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor
|
Unassigned
|
Trevor Bentley
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3193
|
Add mime detection for avif
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3192
|
TIKA 2.0.0 -- after the dust has settled, rat-check
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3190
|
Tika 2.0.0 -- move tika-eval's language detector into a langdetect submodule
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3185
|
tika-parsers-integration-test fails on windows with File being used by another process.
|
Bob Paulin
|
Bob Paulin
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3180
|
Tika 2.0.0 -- Modularize tika-server
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3179
|
Tika 2.0.0 -- Clean up parser module hierarchy
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3178
|
Tika 2.0.0 -- Add back OSGi bundles for Tika parsers
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3176
|
Tika 2.0.0 -- Modularize language detectors
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3166
|
Actually maven-modularize the packages for 2.0
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3093
|
Enable tika-server to forward parse results to another endpoint
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-3025
|
Add a new pjepg parser
|
Unassigned
|
Shadow Liao
|
|
Closed |
Incomplete
|
|
|
|
|
|
|
TIKA-3004
|
OutlookPSTParser missing emails attached to other emails
|
Luís Filipe Nassif
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2972
|
Allow users to specify a list/map of ContentHandlerFactories in tika-config.xml
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2959
|
TabularFormatsTest test fails in Germany
|
Unassigned
|
Tilman Hausherr
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2958
|
XmlBeanDefinitionStoreException with SpringExample
|
Unassigned
|
Tilman Hausherr
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2949
|
Update Jackson to 2.9.10
|
Unassigned
|
Colm O hEigeartaigh
|
|
Resolved |
Duplicate
|
|
|
|
|
|
|
TIKA-2944
|
TikaConfig should support the parameters without XML type attribute
|
Sergey Beryozkin
|
Sergey Beryozkin
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2943
|
Modularize tika-parsers
|
Sergey Beryozkin
|
Sergey Beryozkin
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2892
|
ForkParser deadlock when InputStreamResource catches/returns IOException
|
Luís Filipe Nassif
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2841
|
Improve robustness of parsers of zip-based files on truncated files
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2838
|
RTF document processing glues comment fields together with text without whitespace
|
Tim Allison
|
Karl Wright
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2827
|
Improve tika-eval comparison reports to include mime types in A and B for diffs
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2826
|
Add a csv/tsv parser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2816
|
Error when sending request to /tika with header X-Tika-OCRMinFileSizeToOcr
|
Tim Allison
|
Anssi Törmä
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2810
|
Back off to tagsoup when xml parser fails on Tika xhtml in tika-eval
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2809
|
Add reports for structure tags to tika-eval
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2807
|
.docx text extract leaves out rich text content-control inside of a text box
|
Tim Allison
|
Claudia Mickiewicz
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2800
|
Include num of unique common/alphabetic tokens (types) in tika-eval
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2799
|
Consider reverting jackcess
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2798
|
Consider reverting junrar
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2795
|
Error starting Tika 2.0 server with -spawnChild on Ubuntu
|
Tim Allison
|
Mario Bisonti
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2791
|
Add structure tags to tika-eval
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2788
|
Upgrade to PDFBox 2.0.13 when available
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2787
|
Make WriteLimitReachedException public and not subclass of SAXException
|
Unassigned
|
Dmitry Goldenberg
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2785
|
Switch parent/child IPC to mmap file from stdout/stderr in tika-server
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2784
|
Add static grabbing of stdout/err to MockParser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2782
|
Protect IPC via stdout in child process in tika-server in -spawnChild mode
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2780
|
Intermittent failures in batch mode when STDIN = /tmp/null
|
Tim Allison
|
Jeroen
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2779
|
Integrate/parameterize new rotated text handling in PDFBox
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2778
|
Upgrade jaxb-runtime and javax.activation
|
Tim Allison
|
Hans Brende
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2777
|
Unbounded regex in Optimaize can lead to really, really slow processing
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2776
|
Tika server child restart
|
Tim Allison
|
Mario Bisonti
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2773
|
Upgrade Sqlite to 3.25.2
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2770
|
Convert EnviHeader "map info" from UTM to LatLon
|
Lewis John McGibbney
|
Kristen Cheung
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2765
|
Regression extracting text from corrupted docx files
|
Tim Allison
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2764
|
Allow configuration to include or exclude deleted text in WordPerfect 6.x files
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2762
|
Capture short fields (<150 chars) in EnviParserHeader Metadata
|
Lewis John McGibbney
|
Lewis John McGibbney
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2761
|
XML Structured Text Is Missing Metadata Fields for mp3 files
|
Tim Allison
|
Nick Sincaglia
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2759
|
ScriptsExtractor incorrectly reports Javascript to characters() in SAX ContentHandler
|
Tim Allison
|
Markus Jelsma
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2756
|
Switch to commons-lang 3
|
Tim Allison
|
Robert Munteanu
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2754
|
Log file name in tika-server on exception/error
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2753
|
ChildProcess does not use the JAVA_HOME
|
Tim Allison
|
Julien Massiera
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2751
|
Upgrade to POI 4.0.1 when available
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2748
|
trivial tika-server bug w -maxFiles in new -spawnChild mode
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2745
|
Upgrade to PDFBox 2.0.12 when available
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2743
|
Replace com.sun.xml.bind:jaxb-impl and jaxb-core by org.glassfish.jaxb:jaxb-runtime and jaxb-core
|
Tim Allison
|
Thomas Mortagne
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2742
|
Tika 1.19 triggers a dependency on slf4j-log4j12
|
Tim Allison
|
Thomas Mortagne
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2739
|
ForkParser child processes should be headless
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2738
|
tika-app's -f (ForkParser) option isn't working
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2736
|
improve tika-eval comparison reports to more clearly flag major regressions
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2732
|
Allow configuration of XMLReaderUtils via TikaConfig
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2730
|
parseToString fails for a simple mp3
|
Tim Allison
|
Boris Petrov
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2729
|
add -Djava.awt.headless=true to child process in tika-server
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2727
|
Parsing and detect mime type of XML file stuck in infinite loop
|
Tim Allison
|
Slava G
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2726
|
Handle truncated ooxml more robustly
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Duplicate
|
|
|
|
|
|
|
TIKA-2725
|
Make tika-server robust against ooms/infinite loops/memory leaks
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2721
|
Exclude Spring (transitive dependency) from tika-parsers
|
Konstantin Gribov
|
Konstantin Gribov
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2716
|
Sonatype Nexus auditor is reporting that the Spring Framework version used by Tika 1.18 is vulnerable
|
Konstantin Gribov
|
Abhijit Rajwade
|
|
Closed |
Won't Fix
|
|
|
|
|
|
|
TIKA-2707
|
Upgrade to commons-compress 1.18
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2706
|
Store exceptions from VBAMacroReader as we do other embedded exceptions
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2705
|
Allow configuration of TesseractOCRParser as we do for other parsers
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2704
|
MPEGStream should throw an EOF if appropriate in skipFrame
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2695
|
Upgrade Lucene in tika-eval and tika-example
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2693
|
Tika 1.17 uses the wrong classloader for reflection
|
Unassigned
|
Karl Wright
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2692
|
Blanket upgrades in prep for 1.19
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2691
|
Can't create an RPM
|
Tim Allison
|
Celpan Valeria
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2690
|
Exclude commons-logging & commons-logging-api from uimafit-core
|
Unassigned
|
Hans Brende
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2688
|
MBOX not recognized when unknown X-headers are present
|
Tim Allison
|
Yury Kats
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2687
|
Avoid potential to overwrite attachments
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2686
|
pdfbox fontbox 2.0.8 has security vulnerability CVE-2018-8036 and should be upgraded to 2.0.11
|
Unassigned
|
Abhijit Rajwade
|
|
Resolved |
Duplicate
|
|
|
|
|
|
|
TIKA-2682
|
Upgrade jempbox to 1.8.15
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2681
|
Upgrade to PDFBox 2.0.11
|
Konstantin Gribov
|
Konstantin Gribov
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2677
|
ConcurrentModificationException in org.apache.tika.mime.MediaTypeRegistry.getAliases
|
Tim Allison
|
Yuriy Koval
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2675
|
OpenDocumentParser should fail on invalid zip files
|
Tim Allison
|
Sebastian Nagel
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2673
|
HtmlEncodingDetector doesn't follow the specification
|
Tim Allison
|
Gerard Bouchar
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2672
|
Upgrade dl4j to 1.0.0-beta2
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2669
|
Tika JAX-RS PDF parser option / custom config issue
|
Tim Allison
|
Annie Didier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2668
|
Fix 'can't overwrite cause' exception in TaggedSAXException in Java 11-ea
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2667
|
Upgrade jmatio to 1.4
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2664
|
Upgrade junrar to 1.0.1
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2662
|
Add a streaming out option for the Json serialization
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2661
|
Upgrade commons-compress to 1.17
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2658
|
Add magic numbers of Olympus ORF Files
|
Unassigned
|
Selim Dincer
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2657
|
Add System.exit() and heavy gc hang to MockParser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2656
|
Allow users to specify timeout for parsing and/or waiting in ForkParser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2655
|
Allow the RecursiveParserWrapper to work with the ForkParser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2653
|
Allow users to specify a directory of jars for classloading in ForkParser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2647
|
Create a "security" page on our website
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2645
|
Reuse SAXParsers where possible
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2644
|
Improve RecursiveParserWrapper API
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2637
|
ParsingReader.read throws exception when no bytes are available
|
Tim Allison
|
Boris Petrov
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2635
|
Require imageMagick path be specified on Windows OS
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2634
|
Upgrade Jackson to 2.9.5
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2629
|
Add image/x-dpx media-type detection
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2628
|
Add image/aces media-type detection
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2620
|
Set sys property to get better rendering speed by default
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2618
|
LabelRecord and LabelSSTRecord text can be overwritten in xls
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2617
|
Ignore NPOIFS IOOBE in PPT attachments
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2616
|
message/news now incorrectly identified as rfc822
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2614
|
RFC822 treats non-multipart as attachment
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2613
|
Tesseract 4.0 has removed -psm, so Tika must update
|
Unassigned
|
Ewan Mellor
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2608
|
tika matlab parser incorrectly identifies content type of minified javascript file
|
Unassigned
|
pdwalker
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2607
|
Exchange levigo-jbig2-imageio with pdfbox-jbig2-imageio:3.0.0 (sub-task of TIKA-2579)
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2604
|
Error with certain jar paths on OS X
|
Tim Allison
|
Sasha Goodman
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2601
|
Invalid XHTML output (overlapping a and formatting tags) for some WORD documents
|
Konstantin Gribov
|
Filip
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2600
|
Don't use md5 checksum due to changes to the release distribution policy
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2598
|
Fix dependency convergence
|
Tim Allison
|
Guillaume Smet
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2594
|
Mail detected as application/xhtml+xml
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2592
|
HTML with charset unicode handled as utf-16 instead of utf-8
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2591
|
Some tiffs (Big Endian with fax compression) are showing up as x-tarr
|
Unassigned
|
daniel schmidt
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2590
|
ExcelExtractor: cannot choose listening to the selected records only
|
Unassigned
|
Grigoriy Alekseev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2588
|
Tika detecting/parsing pptx with embedded Excel worksheet(s)...
|
Tim Allison
|
Brian McColgan
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2587
|
DKIM signed mails recognized as text/plain
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2584
|
Tika should have a way to pass arbitrary Tesseract options
|
Unassigned
|
Ewan Mellor
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2582
|
Tesseract 4.0 includes a FF character by default, breaking parsers
|
Unassigned
|
Ewan Mellor
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2580
|
SafeContentHandler documentation is incorrect about replacement character
|
Unassigned
|
Ewan Mellor
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2579
|
Update to PDFBox 2.0.9 when available
|
Tim Allison
|
David Pilato
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2578
|
Mails not recognized when unknown X-headers are present
|
Tim Allison
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2576
|
Add application/zstd detection and parser
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2571
|
Swallows security exception and returns null
|
Unassigned
|
Nik Everett
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2570
|
Tika 1.17 uses vulnerable Jackson version 2.9.2
|
Unassigned
|
Julian Reschke
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2569
|
Grouped Text boxes in .ppt
|
Tim Allison
|
Richard A
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2568
|
Full encrypted 7Z file not detected as such
|
Luís Filipe Nassif
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2566
|
Move logging in tika-core to slf4j-api (with log4j in test scope) as we do in the rest of Tika (sub-task of TIKA-2085)
|
Konstantin Gribov
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2564
|
Tika client cannot extract files from embedded archive formats
|
Tim Allison
|
Marc Prud'hommeaux
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2563
|
Extract embedded objects in HTML and javascript
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2561
|
Tika Parser includes outdated/vulnerable version of JSoup
|
Unassigned
|
Asela
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2559
|
Expose language metadata from PDF documents
|
Unassigned
|
Matt Sheppard
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2556
|
org.json package clash
|
Unassigned
|
Andrei Rebegea
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2555
|
Text with [underline] + [another format] in word document generates overlapping html tags.
|
Konstantin Gribov
|
Serban Alexe
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2552
|
Upgrade to POI 4.0.0 when available
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2551
|
Tika Server uses HtmlParser for XML no matter what config is given, even if XML is disabled in Config
|
Unassigned
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2550
|
ToTextHandler includes <style/> element content
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2549
|
NoSuchMethodException "CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)" parsing certain .docx files
|
Unassigned
|
Adam Rauch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2548
|
Add Python Path configuration to TesseractOCRParser
|
Tim Allison
|
Dave Meikle
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2547
|
RFC822 w multipart/mixed first text element should be treated as body, not attachment
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2541
|
Referenced version of Apache SIS (org.apache.sis) is branch EOL
|
Unassigned
|
Richard Jones
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2535
|
Use latest org.opengis:geoapi to avoid rejected/EOL'd jsr-275 dependency
|
Tim Allison
|
Richard Jones
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2528
|
Fix key location, keys file and download link
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2527
|
Typos in tika-mimetypes.xml
|
Unassigned
|
Andreas Meier
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2524
|
Create/integrate a parser for XPS
|
Tim Allison
|
Peter Davies
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2479
|
Handle empty cells in tables uniformly
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2462
|
Add a parser for sas7bdat
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2446
|
Tainted Zip file can provoke OOM errors
|
Unassigned
|
Thorsten Schäfer
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2390
|
Extract images embedded in Html
|
Unassigned
|
Luís Filipe Nassif
|
|
Resolved |
Duplicate
|
|
|
|
|
|
|
TIKA-2385
|
Tesseract OCR rotation.py not run
|
Dave Meikle
|
Peter Weiss
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2354
|
Missing many embedded images in .doc files
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2352
|
Incorrect EOF exception in WordPerfect parser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2350
|
Add catch block when opening Action on document open in PDFParser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2349
|
Try to match digests when finding equivalent embedded files in tika-eval Compare
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2348
|
Improve error reporting in wmf/emf
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2343
|
--text-main in tika-server
|
Unassigned
|
Nino Skopac
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2339
|
Remove test file flagged by anti-virus code
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2338
|
Change Scope of Jai-ImageIO-Core dependency
|
Luís Filipe Nassif
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2329
|
Upgrade to POI 3.16-final
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2325
|
Allow specification of default lang for common words
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2323
|
Improve commandline parameterization of thresholds
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2314
|
Migrate logging to slf4j in master (2.x) branch
|
Konstantin Gribov
|
Konstantin Gribov
|
|
Resolved |
Resolved
|
|
|
|
|
|
|
TIKA-2311
|
Preserve "x-tika-ooxml" mime value for truncated ooxml files
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2309
|
New Detector and Parser classes for Time Stamped Data Envelope file format
|
Unassigned
|
Fabio
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2307
|
Accidentally swallowing UnsupportedZipFeatureException in rare cases
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2300
|
Can't tell if a zip file is encrypted
|
Tim Allison
|
Aeham Abushwashi
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2295
|
Image not extracted via -z or -J in ODT
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2290
|
PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS
|
Tim Allison
|
Kevin Oberlag
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2287
|
Allow general jdbc connectivity for tika-eval
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2286
|
Add parameterization for image quality when rendering PDF page for OCR
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2281
|
Let's extract the MAPI subtype (NOTE, STICKY, etc.) for msg files
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2279
|
Simplify token counting in tika-eval
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2276
|
Try to be more parsimonious creating TikaConfigs and ParseContexts
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2275
|
EmbeddedDocumentUtil should check parseContext for a TikaConfig
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2269
|
NPE with FeedParser
|
Unassigned
|
Julien Nioche
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2267
|
Add common tokens files for tika-eval
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2247
|
Extract text from WMF/EMF files
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2246
|
Extract files embedded within EMF files
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2244
|
excessive memory usage when parsing a large nested package file
|
Unassigned
|
Joshua Hight
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2242
|
opendocument parsing produces malformed xml
|
Tim Allison
|
Jan Van Raemdonck
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2240
|
MS Write File
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2238
|
Add mime detection for embedded MSEquation files
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2237
|
UnsupportedOperationException due to SingletonList.set in ProbabilisticMimeDetectionSelector
|
Unassigned
|
Jasper Hafkenscheid
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2236
|
Upgrade to PDFBox 2.0.5 when available
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2235
|
Use Tesseract's recommended DPI for PDF images
|
Unassigned
|
Matthew Caruana Galizia
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2234
|
Remove ThreadLocal from dateformat
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2232
|
Add JBIG2 image parsing support
|
Tim Allison
|
Pascal Essiembre
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2231
|
Invalid language code exception
|
Unassigned
|
Peter Weiss
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2230
|
Add paragraph markup to WordPerfect parser(s)
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2229
|
NullPointerException at org.apache.tika.parser.microsoft.ooxml.XWPFListManager.getFormattedNumber(XWPFListManager.java:64)
|
Unassigned
|
Jorge Spinsanti
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2228
|
WordPerfect parser update to support 5.x
|
Unassigned
|
Pascal Essiembre
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2226
|
Add UnsupportedFormatException (extends TikaException)
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2223
|
Extra ß characters in some WordPerfect files
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2221
|
poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException
|
Unassigned
|
Matthew Caruana Galizia
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2219
|
CharsetDetector no longer detects windows-1252 charset
|
Unassigned
|
Pascal Essiembre
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2218
|
Add a few more places where PPTX relationships might include an attachment
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2215
|
TikaException about "Invalid embedded resource" on a valid PPT file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2212
|
Update mimes for OOXMLParser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2211
|
ePub formatting instructions appear in plain text output
|
Unassigned
|
Adam Carroll
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2210
|
Add experimental SAX/Streaming XSLF/pptx extractor
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2209
|
Update PDFBox to 2.0.4
|
Konstantin Gribov
|
Konstantin Gribov
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2208
|
Catch missing libraries
|
Unassigned
|
David Pilato
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2207
|
ArrayIndexOutOfBoundsException on a valid Excel file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2204
|
IndexOutOfBoundsException on a valid Powerpoint file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2198
|
NullPointerException on a valid Word file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2192
|
Extract embedded files from headers, footers, footnotes, etc from docx/m
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2191
|
Apply current .docx unit tests to experimental SAX parser and fix or document as necessary
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2190
|
Add "preserve_interword_spaces" option of tesseract
|
Tim Allison
|
Bipul Kumar
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2187
|
Align default behavior of experimental docx parser with that of doc parser in handling delText
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2181
|
Upgrade to POI 3.16-beta2 when available
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2179
|
WordMLParser fails to parse a word xml file
|
Tim Allison
|
Sean Story
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2175
|
Enable extraction of inlined jp2/jpx from PDF
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2174
|
Too few formats in support declared by TesseractOCRParser
|
Unassigned
|
Matthew Caruana Galizia
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2170
|
Tika 1.13 ForkParser fails intermittently with very large MS Word docx
|
Unassigned
|
Tim Kingsbury
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2169
|
Fix xhtml in combination OCR+metadata extraction from images
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2167
|
Image processing causes OCR to fail
|
Unassigned
|
Matthew Caruana Galizia
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2166
|
TaggedIOException from a ZipException on a valid Word file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2164
|
HSLFException from ZipException "invalid stored block lengths" on a valid Powerpoint file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2162
|
"Unknown compression method" on a Powerpoint file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2161
|
EOFException on a valid Powerpoint file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2160
|
POIXMLException from NullPointerException on a valid Word file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2159
|
Handle pre-parse embedded object exceptions uniformly and more robustly
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2158
|
NullPointerException on a valid Word file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2155
|
IndexOutOfBoundsException on a valid Excel file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2153
|
TaggedIOException on a valid Powerpoint file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2152
|
NullPointerException on a valid Word file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2151
|
Imposed Write Limit Causes Lost Data With Pdfs
|
Unassigned
|
Josh Cummings
|
|
Resolved |
Duplicate
|
|
|
|
|
|
|
TIKA-2145
|
InvalidFormatException on a valid Word file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2142
|
ArrayIndexOutOfBoundsException
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2137
|
NullPointerException on a valid Word file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2136
|
External file links in PPTX misparsed
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2134
|
Different NullPointerException on a valid Excel file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2132
|
NullPointerException on a valid Excel file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2130
|
TaggedIOException from ZipException on a valid PowerPoint file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2129
|
IllegalArgumentException/"Unknown shape type" on a valid Powerpoint file
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2127
|
NullPointerException on a valid PPTX
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2125
|
XmlValueOutOfRangeException on a good Word document
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2123
|
CommonsDigester calculates wrong hashes on large files
|
Unassigned
|
Yahav Amsalem
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2122
|
Extract all email headers from Outlook .msg files into Metadata
|
Unassigned
|
Chris Knott
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2118
|
Misleading exception on a password protected XLS
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2117
|
NullPointerException on PDF (fixed in PDFBox)
|
Unassigned
|
Seva Alekseyev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2116
|
Upgrade to POI 3.16-beta1 when available
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2115
|
OOM caused by corrupt embedded OLE object
|
Unassigned
|
Thomas Galla
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2113
|
Upgrade metadata-extractor to 2.9.1
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2111
|
Executable Parser adds Content-Type instead of setting
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2109
|
OutOfMemory when parsing 5MB word document
|
Unassigned
|
Julian
|
|
Resolved |
Not A Bug
|
|
|
|
|
|
|
TIKA-2104
|
Upgrade to a version of POI that fixes common bugs in macro extraction, when available
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2100
|
Html Parser does not keep the html tag attributes
|
Unassigned
|
Gerard Bouchar
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2098
|
Tika.parseToString() with maxLength doesn't work correctly for PDF files
|
Tim Allison
|
Alexander Kazakov
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2097
|
Fix NPE in mbox parser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2096
|
Supply AutoDetectParser for embedded documents if user forgets to pass it in via ParseContext
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2095
|
Include version of Tika in tika-server's GREETING
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2090
|
Extract javascript from PDActions in PDFs
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2083
|
TIKA-2085
Tika 2.0 - Audit master branch against 2.x branch
|
Bob Paulin
|
Bob Paulin
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2082
|
Upgrade to PDFBox 2.0.3
|
Unassigned
|
Luís Filipe Nassif
|
|
Closed |
Duplicate
|
|
|
|
|
|
|
TIKA-2081
|
Add back 'fileUrl' functionality to TikaJAXRS Server subject to security controls
|
Tim Allison
|
John Dougrez-Lewis
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2078
|
Account for potentially multiple runs within a hyperlink in DOCX
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2069
|
Extract Macro text from Microsoft Office documents
|
Unassigned
|
Jeff Swindle
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2067
|
Upgrade maven plugin versions
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2066
|
Upgrade commons-io to 2.5
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2065
|
Upgrade forbiddenapis to 2.2
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2058
|
Memory Leak in Tika version 1.13 when parsing millions of files
|
Unassigned
|
Tim Barrett
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2057
|
Extract PDF DocInfo fields into separate metadata fields
|
Tim Allison
|
John Haynes
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2055
|
Exception on parsing .docx file
|
Unassigned
|
Sebastian Iturra
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2051
|
Upgrade to PDFBox 2.0.3 when available
|
Tim Allison
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-2048
|
Add space for <br/> elements in MSWord 2003XML
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2047
|
TXTParser overwrites mime type/masks types that are subtype of text
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2045
|
TIKA crashes / runs out of memory on simple PDF
|
Unassigned
|
Egbert
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2041
|
Charset detection doesn't appear to be thread-safe
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2040
|
OOM when parsing a corrupted CHM
|
Tim Allison
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2039
|
Upgrade jackcess to 2.1.4
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2026
|
Handle OLE 2.0 embedded non-Office document in PPT/X and XLSX
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2025
|
Extraction of long sequences of digits from Excel spreadsheets using Tika 1.13 doesn’t yield the expected results
|
Tim Allison
|
Aeham Abushwashi
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2024
|
Extract original filename/path when possible
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2022
|
Add applefile parser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2020
|
Tika 2.0 - remove AbstractParser's 3 parameter parse
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2019
|
WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2015
|
MAPIMessage String fileName constructor leaves file open
|
Unassigned
|
Tim Barrett
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2013
|
Upgrade to POI 3.15 when available
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2011
|
Add mime detection for Endnote Import File (PRONOM: fmt/328)
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2009
|
Add magic for djvu
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2008
|
Add mime detection (and parser?) for MSOffice Owner File (PRONOM fmt/473)
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2006
|
Add magic for vCalendar and iCalendar
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-2004
|
Add mime detection for Windows Media Metafile, PRONOM: application/x-puid-fmt-584
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1999
|
org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:58)
|
Tim Allison
|
Egbert
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1996
|
Upgrade to PDFBox 2.0.2 when available
|
Tim Allison
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1994
|
Integrate OCR with PDFParser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1990
|
Broken .jpg inline image from .pdf files
|
Tim Allison
|
Kukushkin Alexander
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1983
|
TIKA-2085
Tika 2.0 - remove tika-app's legacy server
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1980
|
HTML head tags found after first script not parsed by HtmlParser (regression)
|
Tim Allison
|
Joseph Naegele
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1978
|
Invocation of java.net.URL.equals(Object), which blocks to do domain name resolution, in org.apache.tika.parser.geo.topic.GeoParser.initialize(URL)
|
Lewis John McGibbney
|
Lewis John McGibbney
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1977
|
RFC822Parser 'adds' dc:title causing rare exceptions if > 1 'subject'
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1976
|
Add more robust date parsing fallbacks for RFC822 parser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1974
|
TIKA-2085
Tika 2.0 - remove deprecated metadata properties
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1971
|
Email saved as .eml with no body not detected as rfc822, while same email saved as plain txt is.
|
Unassigned
|
Philipp Steinkrueger
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1970
|
Date not extracted from email saved as plain txt
|
Unassigned
|
Philipp Steinkrueger
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1961
|
OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters
|
Tim Allison
|
Andrei Rebegea
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1959
|
Upgrade to PDFBox 2.0.1/JempBox 1.8.12
|
Unassigned
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1958
|
Add mime detection and lightweight parsers for Office 2003 Word and Excel formats
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1956
|
NPE in WordParser when trying to getPicOffset
|
Tim Allison
|
Ramit Wadhwa
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1949
|
Upgrade to Commons Compress 1.11
|
Tim Allison
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1948
|
Catch exceptions per page in PDFParser
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1946
|
Add mime detection and parser for WordPerfect
|
Unassigned
|
Nick C
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1938
|
HtmlParser drops <script> elements found inside <head>
|
Kenneth William Krugler
|
Joseph Naegele
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1937
|
LinkContentHandler skips script tags
|
Unassigned
|
Joseph Naegele
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1935
|
TIKA-1936
ISArchiveParser not releasing resources
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1934
|
TIKA-1936
GeographicInformationParserTest leaving behind temp file in trunk
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1932
|
TIKA-1936
Clear resources in ParserDecorator
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1924
|
Upgrade com.googlecode.mp4parser's isoparser to 1.1.18
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1918
|
Shouldn't have to specify outputSuffix in tika-batch
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1906
|
ExternalParser No Longer Supports Commands in Array Format
|
Ray Gauss II
|
Ray Gauss II
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1895
|
Upgrade to POI 3.15-beta1 when available
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1879
|
Extract recipient information in MSG files with more granularity
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1866
|
Out of memory error on Word document
|
Unassigned
|
Shawn Johnson
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1865
|
Save sender email address in Outlook MSG metadata
|
Unassigned
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1851
|
TIKA-1824
Tika 2.0 - Move test resources from core to test-resources
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Won't Fix
|
|
|
|
|
|
|
TIKA-1847
|
TIKA-1824
Tika 2.0 - Clean up tika-parsers pom dependencies and a few other things
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1846
|
Set up Hudson (or similar?) with new Git repo
|
Lewis John McGibbney
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1844
|
PooledTimeSeriesParser takes precedence over MP4Parser
|
Unassigned
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1822
|
NullPointerException when parsing a .doc file
|
Tim Allison
|
Panagiotis Mpailis
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1805
|
Default parser/detector loading should warn on missing/empty classes
|
Unassigned
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1706
|
Bring back commons-io to tika-core
|
Unassigned
|
Yaniv Kunda
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1658
|
unable to parse microsoft visio files with tika
|
Unassigned
|
senthil
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1513
|
Add mime detection and parsing for dbf files
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1473
|
Apache Tika is not working for .docx documents
|
Unassigned
|
Franco Catto
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1436
|
improvement to PDFParser
|
Unassigned
|
Stefano Fornari
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1332
|
TIKA-1302
Create tika-eval module
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1321
|
Add experimental SAX/Streaming XWPF/docx extractor
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1301
|
Establish TikaServer on Apache hosted VM
|
Lewis John McGibbney
|
Lewis John McGibbney
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1255
|
WordExtractor - bold hyperlink not closed properly
|
Tim Allison
|
Alan Hunter
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1195
|
XLSB support
|
Unassigned
|
Frederic Ronny
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-879
|
Detection problem: message/rfc822 file is detected as text/plain.
|
Unassigned
|
Konstantin Gribov
|
|
Closed |
Duplicate
|
|
|
|
|
|
|
TIKA-456
|
Support timeouts for parsers
|
Tim Allison
|
Kenneth William Krugler
|
|
Resolved |
Fixed
|
|
|
|
|