ASF JIRA

Tika
1.15
Key descending
1-139 of 139 as at: 29/Mar/24 02:11
Type Key Summary Assignee Reporter Priority Status Resolution
Bug TIKA-2373

Fix licenses via rat before 1.15 release

Unassigned Tim Allison Blocker Resolved Fixed  
Bug TIKA-2370

Close Font in TrueTypeParser

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2367

Avoid npe in wmf

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2364

Clean up printstacktrace

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2361

Upgrade to PDFBox 2.0.6

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2360

Handle SentimentParser resource failure more robustly

Unassigned Tim Allison Blocker Resolved Fixed  
Improvement TIKA-2358

Avoid bundling dl4j with tika-app and tika-server

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2357

Allow Tesseract PSM up to 13

Dave Meikle Dave Meikle Minor Resolved Fixed  
Bug TIKA-2356

Temporarily prevent duplication of sheets in some xlsx POI-61034

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2354

Missing many embedded images in .doc files

Unassigned Tim Allison Blocker Resolved Fixed  
Bug TIKA-2352

Incorrect EOF exception in WordPerfect parser

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2350

Add catch block when opening Action on document open in PDFParser

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2349

Try to match digests when finding equivalent embedded files in tika-eval Compare

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2348

Improve error reporting in wmf/emf

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2345

TikaConfigSerializer should expose EncodingDetector details

Unassigned Nick Burch Major Resolved Fixed  
Improvement TIKA-2343

--text-main in tika-server

Unassigned Nino Skopac Major Resolved Fixed  
Improvement TIKA-2339

Remove test file flagged by anti-virus code

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2334

Upgrade SQLite to 3.16.1

Unassigned Tim Allison Trivial Resolved Fixed  
Sub-task TIKA-2333

TIKA-2330 Upgrade commons-compress to 1.13

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2331

Upgrade RTFParser to allow configuration of max bytes per embedded object

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2330

Prevent preventable OOM in CompressorInputStream

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2329

Upgrade to POI 3.16-final

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2325

Allow specification of default lang for common words

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2323

Improve commandline parameterization of thresholds

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2322

Video labeling using existing ObjectRecognition

Chris A. Mattmann Madhav Sharan Major Resolved Fixed  
Bug TIKA-2311

Preserve "x-tika-ooxml" mime value for truncated ooxml files

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2309

New Detector and Parser classes for Time Stamped Data Envelope file format

Unassigned Fabio Minor Resolved Fixed  
Bug TIKA-2307

Accidentally swallowing UnsupportedZipFeatureException in rare cases

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2305

REST api documentation can't be viewed on the website because your MireDot license has expired

Konstantin Gribov Laszlo Marai Major Closed Fixed  
Bug TIKA-2300

Can't tell if a zip file is encrypted

Tim Allison Aeham Abushwashi Major Resolved Fixed  
Improvement TIKA-2297

Add Lingo24 Language Detector

Dave Meikle Dave Meikle Major Resolved Fixed  
Bug TIKA-2295

Image not extracted via -z or -J in ODT

Tim Allison Tim Allison Minor Resolved Fixed  
Task TIKA-2292

Update CXF version to 3.0.12

Dave Meikle Sergey Beryozkin Minor Resolved Fixed  
Bug TIKA-2291

REST API documentation is down

Lewis John McGibbney Mike Liu Major Closed Fixed  
Bug TIKA-2290

PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS

Tim Allison Kevin Oberlag Major Resolved Fixed  
Improvement TIKA-2287

Allow general jdbc connectivity for tika-eval

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2286

Add parameterization for image quality when rendering PDF page for OCR

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2281

Let's extract the MAPI subtype (NOTE, STICKY, etc.) for msg files

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2279

Simplify token counting in tika-eval

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2277

Remove ParseContext field from AbstractParser

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2276

Try to be more parsimonious creating TikaConfigs and ParseContexts

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2275

EmbeddedDocumentUtil should check parseContext for a TikaConfig

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2269

NPE with FeedParser

Unassigned Julien Nioche Major Closed Fixed  
Improvement TIKA-2267

Add common tokens files for tika-eval

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2255

Test files for SAS mimetypes

Unassigned Nick Burch Major Resolved Fixed  
Task TIKA-2253

Obtain new Miredot license key and upgrade plugin version in tika-server

Lewis John McGibbney Lewis John McGibbney Minor Closed Fixed  
Bug TIKA-2250

Remove the x- prefix for some Microsoft image format mimetypes, eg BMP

Unassigned Nick Burch Major Resolved Fixed  
Improvement TIKA-2247

Extract text from WMF/EMF files

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2246

Extract files embedded within EMF files

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2245

Standardise logging

Konstantin Gribov Matthew Caruana Galizia Major Resolved Fixed  
Bug TIKA-2244

excessive memory usage when parsing a large nested package file

Unassigned Joshua Hight Minor Resolved Fixed  
Bug TIKA-2242

opendocument parsing produces malformed xml

Tim Allison Jan Van Raemdonck Major Resolved Fixed  
Improvement TIKA-2240

MS Write File

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2238

Add mime detection for embedded MSEquation files

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2237

UnsupportedOperationException due to SingletonList.set in ProbabilisticMimeDetectionSelector

Unassigned Jasper Hafkenscheid Major Resolved Fixed  
Improvement TIKA-2236

Upgrade to PDFBox 2.0.5 when available

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2235

Use Tesseract's recommended DPI for PDF images

Unassigned Matthew Caruana Galizia Minor Resolved Fixed  
Improvement TIKA-2234

Remove ThreadLocal from dateformat

Unassigned Tim Allison Trivial Resolved Fixed  
New Feature TIKA-2232

Add JBIG2 image parsing support

Tim Allison Pascal Essiembre Minor Resolved Fixed  
Bug TIKA-2231

Invalid language code exception

Unassigned Peter Weiss Minor Resolved Fixed  
Improvement TIKA-2230

Add paragraph markup to WordPerfect parser(s)

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2229

NullPointerException at org.apache.tika.parser.microsoft.ooxml.XWPFListManager.getFormattedNumber(XWPFListManager.java:64)

Unassigned Jorge Spinsanti Major Resolved Fixed  
Improvement TIKA-2228

WordPerfect parser update to support 5.x

Unassigned Pascal Essiembre Minor Resolved Fixed  
Improvement TIKA-2226

Add UnsupportedFormatException (extends TikaException)

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2223

Extra ß characters in some WordPerfect files

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2221

poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

Unassigned Matthew Caruana Galizia Minor Resolved Fixed  
Improvement TIKA-2220

Refactor/merge new experimental docx/pptx components

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2219

CharsetDetector no longer detects windows-1252 charset

Unassigned Pascal Essiembre Minor Resolved Fixed  
Improvement TIKA-2218

Add a few more places where PPTX relationships might include an attachment

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2215

TikaException about "Invalid embedded resource" on a valid PPT file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2212

Update mimes for OOXMLParser

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2211

ePub formatting instructions appear in plain text output

Unassigned Adam Carroll Major Resolved Fixed  
Improvement TIKA-2210

Add experimental SAX/Streaming XSLF/pptx extractor

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2209

Update PDFBox to 2.0.4

Konstantin Gribov Konstantin Gribov Trivial Closed Fixed  
Improvement TIKA-2208

Catch missing libraries

Unassigned David Pilato Major Resolved Fixed  
Bug TIKA-2207

ArrayIndexOutOfBoundsException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2204

IndexOutOfBoundsException on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2202

StringIndexOutOfBoundsException on a valid Word document

Unassigned Seva Alekseyev Major Resolved Cannot Reproduce  
Bug TIKA-2198

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Improvement TIKA-2195

Consolidate MockParser's service loading file and custom-mimetype entry into tika-core's tests jar

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2193

java.io.NotSerializableException while using ForkParser

Unassigned Michal Hlavac Major Closed Duplicate  
Improvement TIKA-2192

Extract embedded files from headers, footers, footnotes, etc from docx/m

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2191

Apply current .docx unit tests to experimental SAX parser and fix or document as necessary

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2190

Add "preserve_interword_spaces" option of tesseract

Tim Allison Bipul Kumar Major Resolved Fixed  
Bug TIKA-2189

Default value mismatch for "enableImageProcessing" in TesseractOCRConfig.properties and TesseractOCRConfig.java

Unassigned Bipul Kumar Minor Resolved Fixed  
Improvement TIKA-2187

Align default behavior of experimental docx parser with that of doc parser in handling delText

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2181

Upgrade to POI 3.16-beta2 when available

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2179

WordMLParser fails to parse a word xml file

Tim Allison Sean Story Minor Resolved Fixed  
Bug TIKA-2175

Enable extraction of inlined jp2/jpx from PDF

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2174

Too few formats in support declared by TesseractOCRParser

Unassigned Matthew Caruana Galizia Major Resolved Fixed  
Improvement TIKA-2171

Upgrade SQLite to 3.15.1

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2170

Tika 1.13 ForkParser fails intermittently with very large MS Word docx

Unassigned Tim Kingsbury Major Resolved Fixed  
Bug TIKA-2169

Fix xhtml in combination OCR+metadata extraction from images

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2167

Image processing causes OCR to fail

Unassigned Matthew Caruana Galizia Critical Resolved Fixed  
Bug TIKA-2166

TaggedIOException from a ZipException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2164

HSLFException from ZipException "invalid stored block lengths" on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2162

"Unknown compression method" on a Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2161

EOFException on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2160

POIXMLException from NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2159

Handle pre-parse embedded object exceptions uniformly and more robustly

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2158

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2155

IndexOutOfBoundsException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2153

TaggedIOException on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2152

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2145

InvalidFormatException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2142

ArrayIndexOutOfBoundsException

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2137

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2136

External file links in PPTX misparsed

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2134

Different NullPointerException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2132

NullPointerException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2130

TaggedIOException from ZipException on a valid PowerPoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2129

IllegalArgumentException/"Unknown shape type" on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2127

NullPointerException on a valid PPTX

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2125

XmlValueOutOfRangeException on a good Word document

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2118

Misleading exception on a password protected XLS

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2117

NullPointerException on PDF (fixed in PDFBox)

Unassigned Seva Alekseyev Major Resolved Fixed  
Improvement TIKA-2116

Upgrade to POI 3.16-beta1 when available

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2115

OOM caused by corrupt embedded OLE object

Unassigned Thomas Galla Major Resolved Fixed  
Bug TIKA-2111

Executable Parser adds Content-Type instead of setting

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2109

OutOfMemory when parsing 5MB word document

Unassigned Julian Major Resolved Not A Bug  
Bug TIKA-2104

Upgrade to a version of POI that fixes common bugs in macro extraction, when available

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2099

Tar files without magic bytes are sporadically detected as text

Tim Allison Robin Schimpf Major Resolved Fixed  
Improvement TIKA-2096

Supply AutoDetectParser for embedded documents if user forgets to pass it in via ParseContext

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2090

Extract javascript from PDActions in PDFs

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2056

Installing exiftool causes ForkParserIntegration test errors

Konstantin Gribov Chris A. Mattmann Major Resolved Fixed  
New Feature TIKA-2016

A parser that combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.

Chris A. Mattmann Anastasija Mensikova Major Resolved Fixed  
Improvement TIKA-1946

Add mime detection and parser for WordPerfect

Unassigned Nick C Major Resolved Fixed  
Improvement TIKA-1879

Extract recipient information in MSG files with more granularity

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-1865

Save sender email address in Outlook MSG metadata

Unassigned Luís Filipe Nassif Major Resolved Fixed  
Bug TIKA-1822

NullPointerException when parsing a .doc file

Tim Allison Panagiotis Mpailis Major Resolved Fixed  
Bug TIKA-1815

Text content from parser is empty when NamedEntityParser is enabled

Chris A. Mattmann Thamme Gowda Major Resolved Fixed  
Bug TIKA-1658

unable to parse microsoft visio files with tika

Unassigned senthil Major Resolved Fixed  
Bug TIKA-1631

OutOfMemoryException in ZipContainerDetector

Unassigned Pavel Micka Major Resolved Fixed  
Improvement TIKA-1508

Add uniformity to parser parameter configuration

Chris A. Mattmann Tim Allison Major Resolved Fixed  
New Feature TIKA-1343

Create a Tika Translator implementation that uses JoshuaDecoder

Lewis John McGibbney Chris A. Mattmann Major Resolved Fixed  
Sub-task TIKA-1332

TIKA-1302 Create tika-eval module

Tim Allison Tim Allison Major Resolved Fixed  
New Feature TIKA-1321

Add experimental SAX/Streaming XWPF/docx extractor

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-1195

XLSB support

Unassigned Frederic Ronny Major Resolved Fixed  
Improvement TIKA-456

Support timeouts for parsers

Tim Allison Kenneth William Krugler Major Resolved Fixed