ASF JIRA

Tika
1.6
Key descending
1-85 of 85 as at: 19/Apr/24 13:11
T Patch Info Key Summary Assignee Reporter P Status Resolution Created Updated Due Development
Bug TIKA-1457

NullPointerException in tika-app, parsing PDF content

Unassigned Tadeu Alves Major Resolved Fixed  
Bug TIKA-1407

Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@5d11346a

Unassigned Matthieu Neamar Major Resolved Fixed  
Bug TIKA-1396

Embedded images in PDF documents

Unassigned Damiano Critical Closed Not A Problem  
New Feature TIKA-1381

Add Lingo24Translate implementation of Translate API

Dave Meikle Dave Meikle Minor Closed Fixed  
Improvement TIKA-1380

Upgrade to Apache POI 3.11 beta 1

Unassigned Nick Burch Major Resolved Fixed  
Bug TIKA-1378

MicrosoftTranslator setClient and setId NPE

Chris A. Mattmann Chris A. Mattmann Major Resolved Fixed  
Improvement TIKA-1376

Improve embedded file name extraction in PDFParser

Tim Allison Tim Allison Trivial Closed Fixed  
Improvement TIKA-1375

Decrease memory consumption when extracting images from PDFs

Tim Allison Tim Allison Minor Closed Fixed  
Improvement TIKA-1374

Need to add code to look for OS-specific keys for embedded files within PDFs

Tim Allison Tim Allison Minor Closed Fixed  
Bug TIKA-1370

CachedTranslator Implementation

Tyler Bui-Palsulich Tyler Bui-Palsulich Major Resolved Fixed  
Bug TIKA-1363

.mat files not parsing

Chris A. Mattmann Ann Burgess Major Resolved Fixed  
Bug TIKA-1362

Add GoogleTranslate implementation of Translation API

Chris A. Mattmann Chris A. Mattmann Major Resolved Fixed  
Improvement TIKA-1361

Update MP4Parser to 1.0.2

Unassigned Matthias Krueger Major Resolved Fixed  
Improvement TIKA-1360

Update description and fix typos in site

Unassigned Tyler Bui-Palsulich Minor Resolved Fixed  
Bug TIKA-1359

Wrong getting started link on site

Unassigned Tyler Bui-Palsulich Minor Resolved Fixed  
Improvement TIKA-1357

Buffered text in EnviHeaderParser

Tyler Bui-Palsulich Ann Burgess Minor Resolved Fixed  
Bug TIKA-1353

OpenDocumentParser doesn't correctly process metadata

Unassigned Steve R Major Resolved Fixed  
Improvement TIKA-1352

Upgrade to PDFBox 1.8.6

Unassigned Tim Allison Minor Closed Fixed  
Bug TIKA-1350

OutlookPSTParser: Unknown message type: IPM.Note

Unassigned Jonathan Evans Major Resolved Fixed  
Bug TIKA-1341

double invocation of handler.endDocument() in PDFParser

Tim Allison Christian Reuschling Critical Resolved Fixed  
Improvement TIKA-1339

Upgrade rome dependency to 1.0

Chris A. Mattmann Pradeep Singh Major Resolved Fixed  
Improvement TIKA-1338

Converted README to Markdown

Chris A. Mattmann Chris A. Mattmann Major Resolved Fixed  
Improvement TIKA-1337

LanguageProfile for Persian/Farsi

Chris A. Mattmann Omid Pourhadi Major Resolved Fixed  
Improvement TIKA-1336

Provide a Detector JAXRS endpoint

Chris A. Mattmann Nick Burch Major Resolved Fixed  
Bug TIKA-1335

mime type for CSV files incorrectly detected as text/plain

Chris A. Mattmann Kaijian Xu Major Resolved Fixed  
Improvement TIKA-1327

New parser for Matlab .mat files

Chris A. Mattmann Ann Burgess Major Resolved Fixed  
Improvement TIKA-1324

Use a common path for the Tika Server unpacker resources

Unassigned Nick Burch Major Resolved Fixed  
Bug TIKA-1322

XML file parse errors within archives trigger Zip bomb detection

Unassigned Matthias Krueger Minor Resolved Fixed  
New Feature TIKA-1319

Translation

Chris A. Mattmann Tyler Bui-Palsulich Minor Resolved Fixed  
Bug TIKA-1317

Tika does not read text from Headers, Cover Pages, and SDT components of DOCX documents

Tim Allison Vladimir Glina Major Closed Fixed  
Improvement TIKA-1316

Old Site Code in Trunk

Chris A. Mattmann Tyler Bui-Palsulich Trivial Resolved Fixed  
Improvement TIKA-1313

XSL-FO detection

Unassigned Marco Quaranta Minor Resolved Fixed  
Improvement TIKA-1312

FDF files detection

Unassigned Marco Quaranta Minor Resolved Fixed  
Task TIKA-1311

Centralize JSON handling of Metadata

Unassigned Tim Allison Minor Closed Fixed  
Bug TIKA-1306

ClassCastException WARN [main] (COSDocument.java:303) - java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName in o.a.t.parser.pdf.PDFParserTest

Unassigned Lewis John McGibbney Minor Resolved Fixed  
Bug TIKA-1305

New list processing changes appear to be causing RTFParser exception

Unassigned Chris Bamford Minor Resolved Fixed  
Bug TIKA-1303

Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

Kenneth William Krugler Hassan Akram Minor Resolved Fixed  
Bug TIKA-1297

Images not being extracted from PDFs

Unassigned James Baker Major Resolved Fixed  
Improvement TIKA-1294

Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

Tim Allison Tim Allison Trivial Closed Fixed  
Bug TIKA-1292

Inconsistent priorities in bundled tika-mimetypes.xml

Unassigned Tamas Cservenak Major Resolved Fixed  
Bug TIKA-1291

Invalid JSON output on CLI

Tim Allison Steffen Major Closed Fixed  
Bug TIKA-1284

TikaException for Microsoft Powerpoint Document [ ppt ]

Unassigned Chetan Laddha Major Resolved Fixed  
Improvement TIKA-1282

Additional Gzip types:

Unassigned Avi Minor Resolved Fixed  
Improvement TIKA-1281

Additional XML type: application/x-xml

Unassigned Avi Minor Resolved Fixed  
Improvement TIKA-1280

GZip now has an official mimetype

Unassigned Nick Burch Major Resolved Fixed  
Bug TIKA-1279

Missing return lines at output of SourceCodeParser

Hong-Thai Nguyen Hong-Thai Nguyen Trivial Resolved Fixed  
Improvement TIKA-1278

Expose PDF Avg Char and Spacing Tolerance Config Params

Ray Gauss II Ray Gauss II Major Resolved Fixed  
Improvement TIKA-1277

Magic bytes from Wikipedia

Jukka Zitting Jukka Zitting Major Resolved Fixed  
Bug TIKA-1275

Upgrade Commons compress to 1.8.1

Unassigned Fabian Lange Major Resolved Fixed  
New Feature TIKA-1274

ENVI header parser

Chris A. Mattmann Ann Burgess Major Resolved Fixed  
Bug TIKA-1272

tika-server version is incorrectly defined

Sergey Beryozkin Lewis John McGibbney Trivial Closed Fixed  
Improvement TIKA-1271

Move TrackingHandler into TikaTest and add a few other helper classes for embedded document tests

Tim Allison Tim Allison Trivial Resolved Fixed  
Improvement TIKA-1270

JAX-RS server should have endpoints which are like the "--list-<>" options to the CLI

Unassigned Nick Burch Major Resolved Fixed  
New Feature TIKA-1268

Extract images from PDF documents

Jukka Zitting Jukka Zitting Major Resolved Fixed  
Improvement TIKA-1265

[patch] Text output for NetCDF

Chris A. Mattmann Ann Burgess Major Resolved Fixed  
Improvement TIKA-1264

Improve PST file detection

Unassigned Luís Filipe Nassif Trivial Resolved Fixed  
Bug TIKA-1263

Atom feed failed to detect

Unassigned Sebastian Nagel Minor Resolved Fixed  
Improvement TIKA-1259

More ogg based mime entries

Unassigned Nick Burch Major Resolved Fixed  
Improvement TIKA-1258

Update NetCDF dependency

Unassigned Konstantin Gribov Major Closed Fixed  
Bug TIKA-1257

MS Word Filter out control characters on output

Unassigned Hong-Thai Nguyen Major Resolved Fixed  
Bug TIKA-1251

RuntimeException when parsing word (.doc) documents. Works in Tika 1.4 but not 1.5

Tyler Bui-Palsulich Andreas Critical Resolved Fixed  
Improvement TIKA-1249

Vcard files detection

Unassigned Marco Quaranta Minor Resolved Fixed  
Bug TIKA-1248

CharsetDetector.getReader method doesn't support empty/null declaredEncoding

Kenneth William Krugler Nicolas Gavalda Minor Resolved Fixed  
Improvement TIKA-1244

Better parsing of Mbox files

Hong-Thai Nguyen Luís Filipe Nassif Major Resolved Fixed  
Improvement TIKA-1241

Tika does not recognise empty nor spanning ZIP files magic

Unassigned Tamas Cservenak Minor Resolved Fixed  
Improvement TIKA-1232

Add PDF version to PDFParser output

Tim Allison William Palmer Minor Resolved Fixed  
Bug TIKA-1231

Safely handle null embedded files in PDFs

Tim Allison Tim Allison Minor Closed Fixed  
Improvement TIKA-1225

MDI files detection

Unassigned Marco Quaranta Minor Resolved Fixed  
Improvement TIKA-1223

Extract thumbnail of OOXML Office files

Unassigned Hong-Thai Nguyen Minor Resolved Fixed  
Bug TIKA-1221

XPS detection

Unassigned Marco Quaranta Major Resolved Fixed  
Improvement TIKA-1205

Allow PDFParser to fallback to other parser if there is an exception

Tim Allison Tim Allison Trivial Closed Won't Fix 17/Jan/14
Bug TIKA-1189

Fails to parse PPT file

Unassigned Aimee Dev Major Resolved Fixed  
Bug TIKA-1182

Out of memory exception when parsing TTF file

Unassigned Erik Hetzner Major Resolved Fixed  
Bug TIKA-1175

MS Money files wrongly detected as True Type Font

Unassigned Boris Naguet Minor Resolved Fixed  
Bug TIKA-1158

Wrong info on site for Container Aware Detector

Unassigned Sasa Milenkovic Trivial Resolved Fixed  
Improvement TIKA-1151

Maven Build Should Automatically Produce test-jar Artifacts

Ray Gauss II Ray Gauss II Major Resolved Fixed  
Bug TIKA-1113

Parsing for OGV file results in java.lang.ClassCastException

Unassigned Alexander Chow Major Resolved Fixed  
Bug TIKA-1112

Parsing for OGV file with invalid checksum

Unassigned Alexander Chow Major Resolved Fixed  
Bug TIKA-1093

[OfficeParser] NullPointerException

Unassigned Martin Kalcher Major Resolved Fixed  
Bug TIKA-1050

Charset detection gives wrong results for GB18030 encoding

Tyler Bui-Palsulich Amit Gupta Critical Closed Cannot Reproduce  
Bug TIKA-967

Tika comes with transitive Maven dependency to a test artifact of vorbis-java-core

Unassigned Andreas Hubold Minor Resolved Fixed  
Improvement TIKA-941

Detecting KML / KMZ files

Jukka Zitting Marco Quaranta Minor Resolved Fixed  
Improvement TIKA-674

CompositeParser should indicate which parser was actually selected for parsing

Chris A. Mattmann Andrzej Bialecki Major Resolved Fixed  
New Feature TIKA-623

Add support for Outlook PST

Unassigned Nam-Quang Tran Major Resolved Fixed  
Improvement TIKA-411

Generate list of supported and detected types automatically

Tyler Bui-Palsulich Jukka Zitting Minor Resolved Fixed