ASF JIRA

Tika
1.17
Key descending
1-118 of 118 as at: 24/Apr/24 07:05
Type Patch Info Key Summary Assignee Reporter Priority Status Resolution Created Updated Due Development
Improvement TIKA-3029

Extract information from ppt formats along with tables and image content

Unassigned aashika Major Open Unresolved  
Bug TIKA-2723

Issue with parsing .mht container

Unassigned Ghenadie Major Open Unresolved  
Bug TIKA-2521

SAX-based docx/pptx should start a new line before second paragraph within a cell

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2519

Issue parsing multiple CHM files concurrently

Unassigned Eamonn Saunders Blocker Resolved Fixed  
Task TIKA-2516

Upgrade CXF version to > 3.0.13

Unassigned Julian Reschke Major Resolved Fixed  
Improvement TIKA-2512

Add underline and strikethrough to SAX-based docx/pptx parsers

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2511

Slowness parsing SQLite database file

Unassigned Eamonn Saunders Major Resolved Fixed  
Bug TIKA-2510

Embedded MP3 file in PPTX document no longer identified

Tim Allison Eamonn Saunders Minor Resolved Fixed  
Bug TIKA-2506

Nullpointer in tika-dl test on windows

Bob Paulin Bob Paulin Major Resolved Fixed  
Sub-task TIKA-2504

TIKA-2499 Upgrade or remove plexus-utils

Unassigned Tim Allison Major Resolved Fixed  
Sub-task TIKA-2503

TIKA-2499 Try to upgrade httpclient to >=4.5.3

Tim Allison Tim Allison Major Resolved Fixed  
Sub-task TIKA-2502

TIKA-2499 Upgrade OpenNLP to 1.8.3

Unassigned Tim Allison Major Resolved Fixed  
Sub-task TIKA-2501

TIKA-2499 Upgrade jackson to 2.9.2

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2499

Sonatype Nexus Auditor is reporting that Tika 1.13 is using a number of vulnerable Third party components.

Tim Allison Abhijit Rajwade Blocker Resolved Fixed  
Bug TIKA-2497

Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

Unassigned Advokat Major Resolved Fixed  
Improvement TIKA-2492

Remove pdfdebugger from tika

Unassigned Tilman Hausherr Minor Closed Fixed  
Bug TIKA-2491

Cannot use TikaConfig

Unassigned Markus Jelsma Trivial Resolved Fixed  
Bug TIKA-2490

Turn off stderr warnings in Tika-app

Tim Allison Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2489

Upgrade to PDFBox 2.0.8

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2486

Upgrade metadata-extractor to 2.10.1

Unassigned Julian Reschke Blocker Resolved Fixed  
Improvement TIKA-2485

EncodingDetectors markLimits to be configurable

Tim Allison Markus Jelsma Minor Resolved Fixed  
Bug TIKA-2483

Using PackageParser in ForkParser causes NPE

Unassigned TzeKai Lee Major Resolved Fixed  
Bug TIKA-2478

RFC822 includes redundant copies of the text

Tim Allison Robert Letzler Minor Resolved Fixed  
Improvement TIKA-2476

Metadata.toString always returns a trailing space

Sergey Beryozkin Sergey Beryozkin Trivial Resolved Fixed  
Improvement TIKA-2472

Implement Metadata.hashCode

Sergey Beryozkin Sergey Beryozkin Trivial Resolved Fixed  
Bug TIKA-2470

Illegal reflective Access -- more cleanup for Java 9

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2469

False positives with x-ms-owner detection

Tim Allison Luís Filipe Nassif Minor Resolved Fixed  
Improvement TIKA-2466

Remove JAXB usage

Unassigned Robert Munteanu Major Resolved Fixed  
Improvement TIKA-2465

Add explicit unit tests for xxe

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2464

No PIL found while running the docker image 'InceptionVideoRestDockerfile'

Chris A. Mattmann Aman R Mathur Minor Resolved Fixed  
Bug TIKA-2459

Missing text in .doc file (but can be extracted by POI)

Unassigned Dustin Spicuzza Major Resolved Fixed  
Bug TIKA-2456

Emails extracted from MBOX not detected as rfc822

Unassigned Luís Filipe Nassif Major Resolved Fixed  
Improvement TIKA-2455

Flag in metadata for alternative email bodies

Unassigned Matthew Caruana Galizia Minor Resolved Fixed  
Bug TIKA-2454

Emails extracted from PSTs detected as unexpected file types

Unassigned Matthew Caruana Galizia Major Resolved Fixed  
Improvement TIKA-2451

Detect image frame counts for tiff files

Unassigned Mike Cantrell Minor Resolved Fixed  
Bug TIKA-2450

OfficeParser.parse called for zero-byte file with .doc extension

Unassigned Matthew Caruana Galizia Minor Resolved Fixed  
Improvement TIKA-2449

Enabling extraction of standard references from text

Giuseppe Totaro Giuseppe Totaro Major Resolved Fixed  
Improvement TIKA-2448

Handle phonetic strings in the SAX docx parser

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2447

PSDParser creates unnecessary large byte array and discards it

Unassigned Jan Burkhardt Critical Resolved Fixed  
Bug TIKA-2445

Windows BAT / CMD detection

Unassigned Nick Burch Major Resolved Fixed  
Bug TIKA-2442

Non-terminal interactive form fields not handled recursively

Unassigned Christopher Creutzig Major Resolved Fixed  
Improvement TIKA-2440

Phonetic strings handling for multilingual environments.

Unassigned Takahiro Ochi Minor Resolved Fixed  
Improvement TIKA-2439

Avoid NullPointerException in org.apache.tika.langdetect.OptimaizeLangDetector if models haven't been loaded

Unassigned Karl-Philipp Richter Major Resolved Fixed  
Bug TIKA-2438

Test failure at OOXMLParserTest.testBigIntegersWGeneralFormat:1350->TikaTest.assertContains:102

Unassigned Karl-Philipp Richter Major Resolved Fixed  
Bug TIKA-2435

docx parser missing content when multiple body sections

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2433

Tika 1.16 - Nullpointer Exception after update - Asking for help

Unassigned Karl Buchta Major Resolved Fixed  
Improvement TIKA-2431

Upgrade to PDFBox 2.0.7

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2430

Add at least dev test capability to run Tika against fuzzed files

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-2429

Upgrade to POI 3.17-final when available

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2428

EMFParser loops forever with corrupted files

Unassigned Luís Filipe Nassif Major Resolved Fixed  
Bug TIKA-2426

Fix locale-dependent test in xlsb unit test

Unassigned Tim Allison Trivial Resolved Fixed  
Sub-task TIKA-2402

TIKA-2398 Support all image formats in Object Recognition REST Parser

Chris A. Mattmann Thejan Wijesinghe Minor Resolved Fixed  
Sub-task TIKA-2400

TIKA-2398 Standardizing current Object Recognition REST parsers

Chris A. Mattmann Thejan Wijesinghe Minor Resolved Fixed  
Wish TIKA-2389

Warn log level is pretty strong for missing JBIG2ImageReader

Unassigned Thomas Mortagne Major Resolved Fixed  
Bug TIKA-2385

Tesseract OCR rotation.py not run

Dave Meikle Peter Weiss Major Resolved Fixed  
Bug TIKA-2369

Define a clean Recogniser interface: for objects from binary data; and for text classification

Chris A. Mattmann Chris A. Mattmann Major Open Unresolved  
Improvement TIKA-2355

Cache trained mode while running ObjectRecognition server from Docker builds

Chris A. Mattmann Madhav Sharan Major Resolved Fixed  
Bug TIKA-2347

Underlined text is not decorated as such when extracting from word documents

Dave Meikle Stuart Hendren Major Closed Fixed  
Improvement TIKA-2346

Allow Office format parsers to exclude parsing shapes

Unassigned Nick Burch Major Reopened Unresolved  
Improvement TIKA-2340

Add explicit deps to tika-parsers which are currently used from transitive scope

Konstantin Gribov Konstantin Gribov Major Open Unresolved  
New Feature TIKA-2332

Output SNOMED codes for CUIs in CTAKES output?

Chris A. Mattmann Dillon Welch Major Resolved Fixed  
Improvement TIKA-2312

[Mp3Parser] expose fields from ID3TagsAndAudio

Unassigned Łukasz Ozimek Trivial Open Unresolved  
Improvement TIKA-2262

Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

Chris A. Mattmann Thamme Gowda Major Resolved Fixed  
Bug TIKA-2034

Upgrade XMPCore to 5.1.3

Tim Allison Tim Allison Blocker Resolved Fixed  
New Feature TIKA-1988

Age Detection Tika Recogniser

Chris A. Mattmann Madhav Sharan Major Reopened Unresolved  
Bug TIKA-1953

tika-server NullPointerException while processing rtfs

Tim Allison Ravi Major Resolved Fixed  
Bug TIKA-1952

Access Date is getting modified while capturing the MetaData information using AutoDetectParser

Unassigned RameshKalidindi Major Open Unresolved  
Improvement TIKA-1840

No way to link slide notes to slide in PPT output.

Chris A. Mattmann Sam H Major Reopened Unresolved  
Bug TIKA-1829

org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:92) NPE

Tim Allison frank Critical Open Unresolved  
Bug TIKA-1800

MediaType#parse does not decode escaped special characters

Unassigned Roberto Benedetti Major Open Unresolved  
Bug TIKA-1788

message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header

Tim Allison Sergey Tsalkov Major Resolved Fixed  
Bug TIKA-1738

ForkClient does not always delete temporary bootstrap jar

Unassigned Yaniv Kunda Minor Open Unresolved  
New Feature TIKA-1724

Create parser for .obo file format.

Lewis John McGibbney Lewis John McGibbney Major Open Unresolved  
Task TIKA-1705

Update ASM dependency to 5.0.4

Dave Meikle Uwe Schindler Major Reopened Unresolved  
New Feature TIKA-1697

Parser Implementation for AkomaNtoso Legal XML Documents

Lewis John McGibbney Lewis John McGibbney Major Open Unresolved  
Improvement TIKA-1688

Tika Version in Metadata

Unassigned Paul Ramirez Minor Open Unresolved  
New Feature TIKA-1674

Add example to show how to extract embedded files

Unassigned Tim Allison Minor Open Unresolved  
Improvement TIKA-1672

Integrate tika-java7 component

Unassigned Tyler Bui-Palsulich Major Open Unresolved  
Improvement TIKA-1640

Make ExternalParser support aliases for key names in extracted metadata

Chris A. Mattmann Chris A. Mattmann Major Open Unresolved  
New Feature TIKA-1616

Tika Parser for GIBS Metadata

Lewis John McGibbney Lewis John McGibbney Major Open Unresolved  
New Feature TIKA-1609

Leverage Google's LibPhonenumber for enhanced phone number extraction and metadata modeling

Lewis John McGibbney Lewis John McGibbney Major Open Unresolved  
Sub-task TIKA-1607

TIKA-2085 Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

Lewis John McGibbney Lewis John McGibbney Critical Open Unresolved  
New Feature TIKA-1598

Parser Implementation for Streaming Video

Lewis John McGibbney Lewis John McGibbney Major Open Unresolved  
Improvement TIKA-1577

NetCDF Data Extraction

Ann Burgess Ann Burgess Major Open Unresolved  
New Feature TIKA-1540

New Tika plugin for image based feature extraction using computer vision techniques

Lewis John McGibbney Aashish Chaudhary Major Open Unresolved  
New Feature TIKA-1518

Docker with Tika Server

Dave Meikle Paul Ramirez Major Reopened Unresolved  
Bug TIKA-1505

chmparser breaks down when extracting from file of CHM format v3

Unassigned Bin Hawking Major Open Unresolved  
Improvement TIKA-1465

Implement extraction of non-global variables from netCDF3 and netCDF4

Lewis John McGibbney Lewis John McGibbney Major Open Unresolved  
Bug TIKA-1456

Visual Sentiment API parser

Chris A. Mattmann Chris A. Mattmann Major Open Unresolved  
Bug TIKA-1454

Extracting as HTML loses links in xlsx, ppt, and pptx files

Tim Allison Chris Bryant Major Resolved Fixed  
Improvement TIKA-1425

Automatic batching of Microsoft service calls

Lewis John McGibbney Lewis John McGibbney Major Open Unresolved  
Improvement TIKA-1417

Create Extract Embedded Images from PDFs Example

Unassigned Tyler Bui-Palsulich Minor Open Unresolved  
Sub-task TIKA-1395

TIKA-1390 Create embedded image extraction example

Unassigned Tyler Bui-Palsulich Minor Open Unresolved  
Bug TIKA-1390

Create tika-example module

Unassigned Tyler Bui-Palsulich Major Open Unresolved  
Bug TIKA-1379

error in Tika().detect for xml files with xades signature

Unassigned Alessandro De Angelis Major Open Unresolved  
Improvement TIKA-1367

Tika documentation should list tika-parsers parser dependencies

Unassigned Sergey Beryozkin Major Resolved Invalid  
Improvement TIKA-1366

Update some of Tika Server services to support JAX-RS 2.0 AsyncResponse

Unassigned Sergey Beryozkin Minor Open Unresolved  
Sub-task TIKA-1329

TIKA-1390 Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

Unassigned Tim Allison Minor Reopened Unresolved  
New Feature TIKA-1328

Translate Metadata and Content

Unassigned Tyler Bui-Palsulich Major Open Unresolved  
Bug TIKA-1318

Use of Deprecated Word6Extractor.getParagraphText() Method

Unassigned Tyler Bui-Palsulich Minor Open Unresolved  
Improvement TIKA-1308

Support in-memory parse mode (don't create temp file) to support running Tika in GAE

Unassigned jefferyyuan Major Open Unresolved  
Bug TIKA-1295

Make some Dublin Core items multi-valued

Tim Allison Tim Allison Minor Open Unresolved  
Bug TIKA-1276

Missing embedded dependencies in tika-bundle

Unassigned Rupert Westenthaler Major Reopened Unresolved  
New Feature TIKA-1220

Parser implementation for IFC files

Lewis John McGibbney Lewis John McGibbney Minor Open Unresolved  
Sub-task TIKA-1208

TIKA-1207 Migrate Any23 mime contributions to Tika

Unassigned Lewis John McGibbney Major Open Unresolved  
Improvement TIKA-1108

Represent individual slides in pptx

Unassigned Daniel Bonniot de Ruisselet Major Open Unresolved  
Improvement TIKA-1059

Better Handling of InterruptedException in ExternalParser and ExternalEmbedder

Unassigned Ray Gauss II Major Open Unresolved  
Improvement TIKA-988

We don't extract a placeholder for a Word document embedded in an Excel document

Unassigned Michael McCandless Major Open Unresolved  
Bug TIKA-987

Embedded drawing (SHAPE MERGEFORMAT) sometimes not extracted

Unassigned Michael McCandless Major Open Unresolved  
Improvement TIKA-985

Support for HTML5 elements

Unassigned Markus Jelsma Major Open Unresolved  
New Feature TIKA-980

MicrodataContentHandler for Apache Tika

Kenneth William Krugler Markus Jelsma Major Open Unresolved  
Improvement TIKA-894

Add webapp mode for Tika Server, simplifies deployment

Unassigned Graham Charters Major Open Unresolved  
Improvement TIKA-891

Use POST in addition to PUT on method calls in tika-server

Chris A. Mattmann Chris A. Mattmann Trivial Open Unresolved  
New Feature TIKA-819

Make Option to Exclude Embedded Files' Text for Text Content

Unassigned Albert L. Major Open Unresolved  
New Feature TIKA-776

ExifTool Embedder

Chris A. Mattmann Ray Gauss II Major Open Unresolved  
New Feature TIKA-774

ExifTool Parser

Chris A. Mattmann Ray Gauss II Major Open Unresolved  
Bug TIKA-715

Some parsers produce non-well-formed XHTML SAX events

Unassigned Michael McCandless Major Open Unresolved  
Improvement TIKA-539

Encoding detection is too biased by encoding in meta tag

Kenneth William Krugler Reinhard Pötz Minor Reopened Unresolved