ASF JIRA

Tika
1.2
Resolved
Sorted by: Reporter ascending, then Priority descending
1–64 of 64 as at: 19/Apr/24 10:32
Type Key Summary Assignee Reporter Priority Status Resolution
Bug TIKA-873

Tika --extract fails for DOC

Unassigned Albert L. Major Resolved Fixed  
Bug TIKA-816

(XLS/XLSX) Improperly formatted date/time in text content.

Unassigned Albert L. Major Resolved Fixed  
New Feature TIKA-847

Add regular expression support to the MagicDetector

Jukka Zitting Andrew Jackson Major Resolved Fixed  
Bug TIKA-900

Tika fails to detect ISO9660 disk images

Jukka Zitting Andrew Jackson Minor Resolved Fixed  
Bug TIKA-863

MailContentHandler should not create AutoDetectParser on each call

Unassigned Andrzej Bialecki Major Resolved Fixed  
Improvement TIKA-561

Support EMLX file detection

Jukka Zitting Antoni Mylka Major Resolved Fixed  
Bug TIKA-945

Upgrade tika-server to CXF 2.6.1

Chris A. Mattmann Chris A. Mattmann Critical Resolved Fixed  
Improvement TIKA-892

Tika does not use the HTML5 meta charset tag when determining charset

Jukka Zitting Chris Jones Major Resolved Fixed  
Bug TIKA-877

Embedded document not extracted (regression)

Maxim Valyanskiy Daniel Bonniot de Ruisselet Blocker Resolved Fixed  
Bug TIKA-939

Windows Media Video file detected as Windows Media Audio

Unassigned Emil Burzo Minor Resolved Fixed  
Bug TIKA-431

Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.

Jukka Zitting Erik Hetzner Major Resolved Fixed  
Bug TIKA-923

iWork Keynote content on master slides is not being parsed

Michael McCandless Erik Peterson Critical Resolved Fixed  
Bug TIKA-924

iWork Numbers table titles not being parsed

Michael McCandless Erik Peterson Minor Resolved Fixed  
New Feature TIKA-876

Signed pdf parsing

Jukka Zitting Fausto Cruzeiro de Moraes Major Resolved Fixed  
Bug TIKA-910

Text contained in text boxes or shapes in Keynote docs runs together

Michael McCandless Gabriel Valencia Major Resolved Fixed  
Improvement TIKA-907

Comments embedded in Pages documents not supported

Unassigned Gabriel Valencia Major Resolved Fixed  
Improvement TIKA-906

Headers, footers, and footnotes not extracted from Pages documents

Unassigned Gabriel Valencia Major Resolved Fixed  
Improvement TIKA-905

Embedded text boxes and shapes with text not supported

Unassigned Gabriel Valencia Major Resolved Duplicate  
Improvement TIKA-904

Pages documents created in Layout mode not supported

Michael McCandless Gabriel Valencia Major Resolved Fixed  
Bug TIKA-834

Server problem: only the 1st result is correct; additional runs include data from the 1st run

Jukka Zitting George Kappel Major Resolved Fixed  
New Feature TIKA-901

Provide version number in tika-server

Chris A. Mattmann Ingo Renner Trivial Resolved Fixed  
New Feature TIKA-943

Add parameter to tika-app to supply password for decryption

Jukka Zitting Jan Høydahl Major Resolved Fixed  
Bug TIKA-827

ForkServer fails to report issues if an exception is not properly serializable

Unassigned Jerome Lacoste Major Resolved Fixed  
Improvement TIKA-832

ForkParser is unfriendly to code that prints things to its output

Jukka Zitting Jerome Lacoste Minor Resolved Fixed  
Bug TIKA-935

TikaException thrown when trying to parse archive (*.ar) files

Chris A. Mattmann John Mastarone Major Resolved Fixed  
Bug TIKA-853

java.io.IOException with TikaGUI and testMP4.m4a

Unassigned John Mastarone Major Resolved Fixed  
Improvement TIKA-896

OSGi deployment without declarative services

Jukka Zitting Jörg Ehrlich Major Resolved Fixed  
Improvement TIKA-908

Add XMP Specification Part 1 namespaces and properties

Jukka Zitting Jörg Ehrlich Major Resolved Fixed  
New Feature TIKA-507

Parser for font files

Unassigned Jukka Zitting Major Resolved Fixed  
Improvement TIKA-884

Dynamic loading of Parser and Detector services

Jukka Zitting Jukka Zitting Major Resolved Fixed  
Improvement TIKA-951

Bundle activation policy for Eclipse

Jukka Zitting Jukka Zitting Major Resolved Fixed  
New Feature TIKA-593

Tika network server

Chris A. Mattmann Jukka Zitting Major Resolved Fixed  
Improvement TIKA-932

Upgrade to Commons Compress 1.4.1

Jukka Zitting Jukka Zitting Minor Resolved Fixed  
Improvement TIKA-322

Improve encoding detection speed and accuracy

Jukka Zitting Jukka Zitting Minor Resolved Fixed  
Improvement TIKA-471

Avoid Charset name bottleneck when multiple threads are using HtmlParser

Jukka Zitting Kenneth William Krugler Minor Resolved Fixed  
Improvement TIKA-502

Add programming language mime-types

Jukka Zitting Kenneth William Krugler Minor Resolved Fixed  
Improvement TIKA-941

Detecting KML / KMZ files

Jukka Zitting Marco Quaranta Minor Resolved Fixed  
Improvement TIKA-940

Support detecting 7-zip format

Unassigned Marco Quaranta Minor Resolved Fixed  
Improvement TIKA-883

Extract embedded images in PPT

Maxim Valyanskiy Maxim Valyanskiy Major Resolved Fixed  
Bug TIKA-882

IllegalArgumentException: No part found for relationship

Maxim Valyanskiy Maxim Valyanskiy Minor Resolved Fixed  
New Feature TIKA-931

Tika's PDFParser fails to parse documents embedded in a PDF Package

Jukka Zitting Michael McCandless Major Resolved Fixed  
Improvement TIKA-757

Address TODOs when we upgrade to the next POI release (3.8 beta 5)

Unassigned Michael McCandless Major Resolved Fixed  
Bug TIKA-948

Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc

Michael McCandless Michael McCandless Minor Resolved Fixed  
Improvement TIKA-929

Consistent, namespaced definitions for office file related metadata

Jukka Zitting Nick Burch Major Resolved Fixed  
Improvement TIKA-890

Improve detection of Android Packages (APK)

Unassigned Nick Burch Major Resolved Fixed  
Bug TIKA-886

OOXMLExtractorFactory can leave files open

Nick Burch Nick Burch Major Resolved Fixed  
Improvement TIKA-949

Mimetype magic needed for mapping formats such as XMind Pro and MindMapper

Nick Burch Nick Burch Major Resolved Fixed  
Improvement TIKA-700

Upgrade to POI 3.8 when available

Nick Burch Nick Burch Minor Resolved Fixed  
Improvement TIKA-747

Ogg Vorbis and FLAC Parsers

Nick Burch Nick Burch Minor Resolved Fixed  
Bug TIKA-875

Temporary file leak in ImageParser

Michael McCandless Niels Beekman Major Resolved Fixed  
Improvement TIKA-874

Identify FITS (Flexible Image Transport System) files

Chris A. Mattmann Peter May Minor Resolved Fixed  
Improvement TIKA-930

Consolidation of Some Tika Core Properties

Unassigned Ray Gauss II Major Resolved Fixed  
Improvement TIKA-927

Composite Properties

Unassigned Ray Gauss II Major Resolved Fixed  
Improvement TIKA-926

Data Typed Metadata.set(...) Value Methods Should Call Metadata.set(Property...)

Unassigned Ray Gauss II Major Resolved Fixed  
Improvement TIKA-925

Remove DublinCore From Metadata and Deprecate String Properties

Unassigned Ray Gauss II Major Resolved Fixed  
Improvement TIKA-842

IPTC Properties Should be Defined Completely and Independently of the Drew Library

Unassigned Ray Gauss II Major Resolved Fixed  
Improvement TIKA-859

DublinCore Metadata Keys Should be Prefixed and Property Objects

Unassigned Ray Gauss II Major Resolved Fixed  
Bug TIKA-947

AbstractMetadataHandler addMetadata Does not Check Property.isMultiValuePermitted

Unassigned Ray Gauss II Major Resolved Fixed  
Bug TIKA-916

NullPointerException processing XPS file

Unassigned Rob Tulloh Major Resolved Fixed  
New Feature TIKA-861

Parse links in PDF

Unassigned Sasha Goodman Minor Resolved Fixed  
Improvement TIKA-870

Allow calling parseToString with an additional MaxStringLength parameter, so it can be changed per call

Michael McCandless Shay Banon Major Resolved Fixed  
Improvement TIKA-482

Refactor image and jpeg parsers for access to MetadataExtractor API

Unassigned Staffan Olsson Major Resolved Fixed  
Bug TIKA-913

MagicMime detection of msdos executables does not work

Unassigned Torsten Krah Major Resolved Fixed  
Bug TIKA-355

DublinCore constants should be prefixed with "dc."

Unassigned Vivek Magotra Major Resolved Fixed