|
|
TIKA-1457
|
NullPointerException in tika-app, parsing PDF content
|
Unassigned
|
Tadeu Alves
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1407
|
Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@5d11346a
|
Unassigned
|
Matthieu Neamar
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1396
|
Embedded images in PDF documents
|
Unassigned
|
Damiano
|
|
Closed |
Not A Problem
|
|
|
|
|
|
|
TIKA-1381
|
Add Lingo24Translate implementation of Translate API
|
Dave Meikle
|
Dave Meikle
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1380
|
Upgrade to Apache POI 3.11 beta 1
|
Unassigned
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1378
|
MicrosoftTranslator setClient and setId NPE
|
Chris A. Mattmann
|
Chris A. Mattmann
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1376
|
Improve embedded file name extraction in PDFParser
|
Tim Allison
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1375
|
Decrease memory consumption when extracting images from PDFs
|
Tim Allison
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1374
|
Need to add code to look for OS-specific keys for embedded files within PDFs
|
Tim Allison
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1370
|
CachedTranslator Implementation
|
Tyler Bui-Palsulich
|
Tyler Bui-Palsulich
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1363
|
.mat files not parsing
|
Chris A. Mattmann
|
Ann Burgess
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1362
|
Add GoogleTranslate implementation of Translation API
|
Chris A. Mattmann
|
Chris A. Mattmann
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1361
|
Update MP4Parser to 1.0.2
|
Unassigned
|
Matthias Krueger
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1360
|
Update description and fix typos in site
|
Unassigned
|
Tyler Bui-Palsulich
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1359
|
Wrong getting started link on site
|
Unassigned
|
Tyler Bui-Palsulich
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1357
|
Buffered text in EnviHeaderParser
|
Tyler Bui-Palsulich
|
Ann Burgess
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1353
|
OpenDocumentParser doesn't correctly process metadata
|
Unassigned
|
Steve R
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1352
|
Upgrade to PDFBox 1.8.6
|
Unassigned
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1350
|
OutlookPSTParser: Unknown message type: IPM.Note
|
Unassigned
|
Jonathan Evans
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1341
|
double invocation of handler.endDocument() in PDFParser
|
Tim Allison
|
Christian Reuschling
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1339
|
Upgrade rome dependency to 1.0
|
Chris A. Mattmann
|
Pradeep Singh
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1338
|
Converted README to Markdown
|
Chris A. Mattmann
|
Chris A. Mattmann
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1337
|
LanguageProfile for Persian/Farsi
|
Chris A. Mattmann
|
Omid Pourhadi
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1336
|
Provide a Detector JAXRS endpoint
|
Chris A. Mattmann
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1335
|
mime type for CSV files incorrectly detected as text/plain
|
Chris A. Mattmann
|
Kaijian Xu
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1327
|
New parser for Matlab .mat files
|
Chris A. Mattmann
|
Ann Burgess
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1324
|
Use a common path for the Tika Server unpacker resources
|
Unassigned
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1322
|
XML file parse errors within archives trigger Zip bomb detection
|
Unassigned
|
Matthias Krueger
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1319
|
Translation
|
Chris A. Mattmann
|
Tyler Bui-Palsulich
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1317
|
Tika does not read text from Headers, Cover Pages, and SDT components of DOCX documents
|
Tim Allison
|
Vladimir Glina
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1316
|
Old Site Code in Trunk
|
Chris A. Mattmann
|
Tyler Bui-Palsulich
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1313
|
XSL-FO detection
|
Unassigned
|
Marco Quaranta
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1312
|
FDF files detection
|
Unassigned
|
Marco Quaranta
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1311
|
Centralize JSON handling of Metadata
|
Unassigned
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1306
|
ClassCastException WARN [main] (COSDocument.java:303) - java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName in o.a.t.parser.pdf.PDFParserTest
|
Unassigned
|
Lewis John McGibbney
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1305
|
New list processing changes appear to be causing RTFParser exception
|
Unassigned
|
Chris Bamford
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1303
|
Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten
|
Kenneth William Krugler
|
Hassan Akram
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1297
|
Images not being extracted from PDFs
|
Unassigned
|
James Baker
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1294
|
Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs
|
Tim Allison
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1292
|
Inconsistent priorities in bundled tika-mimetypes.xml
|
Unassigned
|
Tamas Cservenak
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1291
|
Invalid JSON output on CLI
|
Tim Allison
|
Steffen
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1284
|
TikaException for Microsoft Powerpoint Document [ ppt ]
|
Unassigned
|
Chetan Laddha
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1282
|
Additional Gzip types:
|
Unassigned
|
Avi
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1281
|
Additional XML type: application/x-xml
|
Unassigned
|
Avi
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1280
|
GZip now has an official mimetype
|
Unassigned
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1279
|
Missing return lines at output of SourceCodeParser
|
Hong-Thai Nguyen
|
Hong-Thai Nguyen
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1278
|
Expose PDF Avg Char and Spacing Tolerance Config Params
|
Ray Gauss II
|
Ray Gauss II
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1277
|
Magic bytes from Wikipedia
|
Jukka Zitting
|
Jukka Zitting
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1275
|
Upgrade Commons compress to 1.8.1
|
Unassigned
|
Fabian Lange
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1274
|
ENVI header parser
|
Chris A. Mattmann
|
Ann Burgess
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1272
|
tika-server version is incorrectly defined
|
Sergey Beryozkin
|
Lewis John McGibbney
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1271
|
Move TrackingHandler into TikaTest and add a few other helper classes for embedded document tests
|
Tim Allison
|
Tim Allison
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1270
|
JAX-RS server should have endpoints which are like the "--list-<>" options to the CLI
|
Unassigned
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1268
|
Extract images from PDF documents
|
Jukka Zitting
|
Jukka Zitting
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1265
|
[patch] Text output for NetCDF
|
Chris A. Mattmann
|
Ann Burgess
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1264
|
Improve PST file detection
|
Unassigned
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1263
|
Atom feed failed to detect
|
Unassigned
|
Sebastian Nagel
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1259
|
More ogg based mime entries
|
Unassigned
|
Nick Burch
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1258
|
Update NetCDF dependency
|
Unassigned
|
Konstantin Gribov
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1257
|
MS Word Filter out control characters on output
|
Unassigned
|
Hong-Thai Nguyen
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1251
|
RuntimeException when parsing word (.doc) documents. Works in Tika 1.4 but not 1.5
|
Tyler Bui-Palsulich
|
Andreas
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1249
|
Vcard files detection
|
Unassigned
|
Marco Quaranta
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1248
|
CharsetDetector.getReader method doesn't support empty/null declaredEncoding
|
Kenneth William Krugler
|
Nicolas Gavalda
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1244
|
Better parsing of Mbox files
|
Hong-Thai Nguyen
|
Luís Filipe Nassif
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1241
|
Tika does not recognise empty nor spanning ZIP files magic
|
Unassigned
|
Tamas Cservenak
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1232
|
Add PDF version to PDFParser output
|
Tim Allison
|
William Palmer
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1231
|
Safely handle null embedded files in PDFs
|
Tim Allison
|
Tim Allison
|
|
Closed |
Fixed
|
|
|
|
|
|
|
TIKA-1225
|
MDI files detection
|
Unassigned
|
Marco Quaranta
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1223
|
Extract thumbnail of OOXML Office files
|
Unassigned
|
Hong-Thai Nguyen
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1221
|
XPS detection
|
Unassigned
|
Marco Quaranta
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1205
|
Allow PDFParser to fallback to other parser if there is an exception
|
Tim Allison
|
Tim Allison
|
|
Closed |
Won't Fix
|
|
|
17/Jan/14
|
|
|
|
TIKA-1189
|
Fails to parse PPT file
|
Unassigned
|
Aimee Dev
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1182
|
Out of memory exception when parsing TTF file
|
Unassigned
|
Erik Hetzner
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1175
|
MS Money files wrongly detected as True Type Font
|
Unassigned
|
Boris Naguet
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1158
|
Wrong info on site for Container Aware Detector
|
Unassigned
|
Sasa Milenkovic
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1151
|
Maven Build Should Automatically Produce test-jar Artifacts
|
Ray Gauss II
|
Ray Gauss II
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1113
|
Parsing for OGV file results in java.lang.ClassCastException
|
Unassigned
|
Alexander Chow
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1112
|
Parsing for OGV file with invalid checksum
|
Unassigned
|
Alexander Chow
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1093
|
[OfficeParser] NullPointerException
|
Unassigned
|
Martin Kalcher
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-1050
|
Charset detection gives wrong results for GB18030 encoding
|
Tyler Bui-Palsulich
|
Amit Gupta
|
|
Closed |
Cannot Reproduce
|
|
|
|
|
|
|
TIKA-967
|
Tika comes with transitive Maven dependency to a test artifact of vorbis-java-core
|
Unassigned
|
Andreas Hubold
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-941
|
Detecting KML / KMZ files
|
Jukka Zitting
|
Marco Quaranta
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-674
|
CompositeParser should indicate which parser was actually selected for parsing
|
Chris A. Mattmann
|
Andrzej Bialecki
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-623
|
Add support for Outlook PST
|
Unassigned
|
Nam-Quang Tran
|
|
Resolved |
Fixed
|
|
|
|
|
|
|
TIKA-411
|
Generate list of supported and detected types automatically
|
Tyler Bui-Palsulich
|
Jukka Zitting
|
|
Resolved |
Fixed
|
|
|
|
|