ASF JIRA

Tika
2.0.0
Key descending
1–416 of 416 as at: 29/Mar/24 01:10
T Patch Info Key Summary Assignee Reporter P Status Resolution Created Updated Due Development
Task TIKA-3478

Extract "desc" metadata field from AppleUserBox in MP4

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-3477

Fix new closed channel exception in MSOffice files in 2.x

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-3476

Remove tag reports from default tika-eval reports

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-3475

General upgrades for 2.0.0

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-3474

tika-eval in 2.x should handle the exception key from 1.x

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-3473

Upgrade OpenSearch -- 1.0 GA is now available

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-3472

SimpleDateFormat is not threadsafe

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-3470

Push jpeg2000 warning to trigger only when necessary

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-3469

Consume bytes until 'ready' ping to forked pipes processor

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-3467

Clean up poms in main

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-3463

Add FileListIterator as a pipes-iterator

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-3462

Clean up module names

Unassigned Tim Allison Minor Resolved Fixed  
Task TIKA-3461

Create sub modules in tika-pipes-integration tests

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3449

Remove sannies mp4 isoparser from Tika 2.x

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-3441

tika server stuck in loop trying to bind

Unassigned Cristian Zamfir Major Resolved Fixed  
New Feature TIKA-3440

Add emitter for OpenSearch

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-3436

Add multi-release for 2.x

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-3435

Allow fetchers only when enableUnsecureFeatures is true in tika-server 2.x

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-3434

Document removal of urlenabledinputstream in 2.x

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-3430

Create release subdirectories for different versions

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-3424

tika-app in 2.x should log to stderr

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3413

Avoid ZipBomb detection in bookmark text extraction in PDFs

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-3410

Clean up logging in PipesServer

Tim Allison Tim Allison Minor Resolved Fixed  
Task TIKA-3406

Add timeout on the client side of async processor

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-3403

Create example for Transcription

Lewis John McGibbney Lewis John McGibbney Major Resolved Fixed  
Improvement TIKA-3402

Remove Redundant Local Variables

Unassigned Furkan Kamaci Minor Resolved Fixed  
Improvement TIKA-3401

Remove Pointless Bitwise Expressions

Unassigned Furkan Kamaci Minor Resolved Fixed  
Bug TIKA-3399

Fix Non-Atomic Operations on Volatile Fields

Unassigned Furkan Kamaci Major Resolved Fixed  
Improvement TIKA-3398

Tidy Up Code for Performance Improvements

Unassigned Furkan Kamaci Major Resolved Fixed  
Task TIKA-3396

Rename parser modules in 2.0

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-3395

Make Inner Classes Static If Possible to Prevent Memory Leaks

Unassigned Furkan Kamaci Major Resolved Fixed  
Task TIKA-3393

Refactor metadata filters to use new ConfigBase in 2.x

Tim Allison Tim Allison Minor Resolved Fixed  
Task TIKA-3391

Refactor fetchiterators to pipesiterators in 2.x, clean up pipesiteratormanager

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-3390

Migrate Language Level to Java 8

Unassigned Furkan Kamaci Minor Resolved Fixed  
Bug TIKA-3389

Close Open Resources

Unassigned Furkan Kamaci Major Resolved Fixed  
Task TIKA-3386

Add "times" to MockParser

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-3382

Improve writelimitreached handling

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3378

Move tika-langdetect-commons to tika-langdetect-test-commons in 2.x

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-3377

Remove pipes components from TikaConfig in 2.x

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3372

Fix writelimit in recursiveparserhandler

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-3362

AsyncParser and EmitterResource have handler type hardcoded to text

Tim Allison Giovanni De Stefano Major Closed Fixed  
Task TIKA-3359

Extract swf from PDFs

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-3343

Move Tika's legacy lang id to its own submodule for Tika 2.0

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-3340

LanguageProfile for Myanmar

Unassigned Arky Major Resolved Fixed  
Improvement TIKA-3329

RTG Translator with many-to-eng translation

Chris Mattmann Thamme Gowda Major Resolved Fixed  
Bug TIKA-3318

MP3 parser using wrong xmpDM:duration units (which aren't clearly documented)

Nick Burch Nick Burch Minor Resolved Fixed  
Improvement TIKA-3313

Improve performance and usability of RereadableInputStream

Unassigned Peter Kronenberg Major Resolved Fixed  
Improvement TIKA-3311

Add github workflows to Tika

Lewis John McGibbney Lewis John McGibbney Major Resolved Fixed  
Improvement TIKA-3310

MP4 video detected as application/mp4

Unassigned Peter Kronenberg Major Resolved Fixed  
Task TIKA-3301

Simplify forking/monitoring in tika-server for 2.x

Unassigned Tim Allison Minor Resolved Fixed  
Task TIKA-3298

Add a "preloadLangs" parameter to TesseractOCRParser

Unassigned Tim Allison Minor Resolved Fixed  
Task TIKA-3297

Simplify parser configuration in 2.x

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3292

Remove GSON where possible in 2.x

Tim Allison Tim Allison Minor Resolved Fixed  
Task TIKA-3287

Add http fetcher

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-3286

Tika does not issue an error when language file doesn't exist; not supporting script files

Unassigned Peter Kronenberg Major Resolved Fixed  
Task TIKA-3283

Add an s3 emitter to tika-pipes

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-3280

server-core not bundled w server-classic in 2.0.0-ALPHA

Tim Allison Tim Allison Blocker Resolved Fixed  
Improvement TIKA-3273

Further metadata cleanup for Tika 2.0.0

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-3271

Change default image resize size in TesseractParser's pre-processing step

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-3267

Method getEnableImageProcessing() in TesseractOCRConfig should be renamed

Tim Allison Peter Kronenberg Minor Resolved Fixed  
Improvement TIKA-3266

Generalize OCRParser so that users can service load custom ocr parsers

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-3259

Improve logging for TesseractOCRParser

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-3258

Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3256

Update maven and maven min version

Tilman Hausherr Tilman Hausherr Minor Resolved Fixed  
Bug TIKA-3255

Parsing MP3 file with record size > 100000 fails

Unassigned Peter Kronenberg Major Resolved Fixed  
Improvement TIKA-3253

improve "attachments" tika-eval report directory

Unassigned Tilman Hausherr Minor Resolved Fixed  
Bug TIKA-3248

ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification

Tilman Hausherr Tilman Hausherr Major Resolved Fixed  
Task TIKA-3247

Make spawnChild default mode for tika-server in 2.0

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-3246

IllegalArgumentException when generation of appearances fails

Tilman Hausherr Tilman Hausherr Major Resolved Fixed  
Task TIKA-3244

General upgrades for 1.26

Unassigned Tilman Hausherr Major Resolved Fixed  
Task TIKA-3242

Allow users to send arbitrary metadata to tika-server per document

Unassigned Tim Allison Minor Resolved Fixed  
Task TIKA-3240

Modularize tika-eval into core and app for 2.0.0

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-3237

Great optimization in ForkParser

Luís Filipe Nassif Luís Filipe Nassif Major Resolved Fixed  
New Feature TIKA-3226

Add custom connector endpoint

Tim Allison Nicholas DiPiazza Major Resolved Fixed  
Bug TIKA-3218

Wrong comment for method sortLoadedClasses in ServiceLoaderUtils

Unassigned Peter Lee Minor Resolved Fixed  
Task TIKA-3199

Improve fuzzing of PDF streams

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-3196

PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

Unassigned Trevor Bentley Major Resolved Fixed  
Task TIKA-3193

Add mime detection for avif

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3192

TIKA 2.0.0 -- after the dust has settled, rat-check

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-3190

Tika 2.0.0 -- move tika-eval's language detector into a langdetect submodule

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-3185

tika-parsers-integration-test fails on windows with File being used by another process.

Bob Paulin Bob Paulin Minor Resolved Fixed  
Improvement TIKA-3180

Tika 2.0.0 -- Modularize tika-server

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-3179

Tika 2.0.0 -- Clean up parser module hierarchy

Tim Allison Tim Allison Critical Resolved Fixed  
Improvement TIKA-3178

Tika 2.0.0 -- Add back OSGi bundles for Tika parsers

Unassigned Tim Allison Blocker Resolved Fixed  
Improvement TIKA-3176

Tika 2.0.0 -- Modularize language detectors

Tim Allison Tim Allison Blocker Resolved Fixed  
Bug TIKA-3166

Actually maven-modularize the packages for 2.0

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-3093

Enable tika-server to forward parse results to another endpoint

Tim Allison Tim Allison Major Resolved Fixed  
New Feature TIKA-3025

Add a new pjepg parser

Unassigned Shadow Liao Trivial Closed Incomplete  
Bug TIKA-3004

OutlookPSTParser missing emails attached to other emails

Luís Filipe Nassif Luís Filipe Nassif Major Resolved Fixed  
Improvement TIKA-2972

Allow users to specify a list/map of ContentHandlerFactories in tika-config.xml

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2959

TabularFormatsTest test fails in Germany

Unassigned Tilman Hausherr Minor Resolved Fixed  
Bug TIKA-2958

XmlBeanDefinitionStoreException with SpringExample

Unassigned Tilman Hausherr Trivial Resolved Fixed  
Bug TIKA-2949

Update Jackson to 2.9.10

Unassigned Colm O hEigeartaigh Major Resolved Duplicate  
Improvement TIKA-2944

TikaConfig should support the parameters without XML type attribute

Sergey Beryozkin Sergey Beryozkin Major Resolved Fixed  
Improvement TIKA-2943

Modularize tika-parsers

Sergey Beryozkin Sergey Beryozkin Critical Resolved Fixed  
Bug TIKA-2892

ForkParser deadlock when InputStreamResource catches/returns IOException

Luís Filipe Nassif Luís Filipe Nassif Major Resolved Fixed  
Task TIKA-2841

Improve robustness of parsers of zip-based files on truncated files

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2838

RTF document processing glues comment fields together with text without whitespace

Tim Allison Karl Wright Major Resolved Fixed  
Task TIKA-2827

Improve tika-eval comparison reports to include mime types in A and B for diffs

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2826

Add a csv/tsv parser

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2816

Error when sending request to /tika with header X-Tika-OCRMinFileSizeToOcr

Tim Allison Anssi Törmä Major Resolved Fixed  
Improvement TIKA-2810

Back off to tagsoup when xml parser fails on Tika xhtml in tika-eval

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-2809

Add reports for structure tags to tika-eval

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2807

.docx text extract leaves out rich text content-control inside of a text box

Tim Allison Claudia Mickiewicz Critical Resolved Fixed  
Improvement TIKA-2800

Include num of unique common/alphabetic tokens (types) in tika-eval

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-2799

Consider reverting jackcess

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-2798

Consider reverting junrar

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2795

Error starting Tika 2.0 server with -spawnChild on Ubuntu

Tim Allison Mario Bisonti Major Resolved Fixed  
Improvement TIKA-2791

Add structure tags to tika-eval

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2788

Upgrade to PDFBox 2.0.13 when available

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2787

Make WriteLimitReachedException public and not subclass of SAXException

Unassigned Dmitry Goldenberg Major Resolved Fixed  
Task TIKA-2785

Switch parent/child IPC to mmap file from stdout/stderr in tika-server

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2784

Add static grabbing of stdout/err to MockParser

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-2782

Protect IPC via stdout in child process in tika-server in -spawnChild mode

Tim Allison Tim Allison Blocker Resolved Fixed  
Bug TIKA-2780

Intermittent failures in batch mode when STDIN = /tmp/null

Tim Allison Jeroen Major Resolved Fixed  
Task TIKA-2779

Integrate/parameterize new rotated text handling in PDFBox

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2778

Upgrade jaxb-runtime and javax.activation

Tim Allison Hans Brende Major Resolved Fixed  
Task TIKA-2777

Unbounded regex in Optimaize can lead to really, really slow processing

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2776

Tika server child restart

Tim Allison Mario Bisonti Blocker Resolved Fixed  
Task TIKA-2773

Upgrade Sqlite to 3.25.2

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2770

Convert EnviHeader "map info" from UTM to LatLon

Lewis John McGibbney Kristen Cheung Major Resolved Fixed  
Improvement TIKA-2765

Regression extracting text from corrupted docx files

Tim Allison Luís Filipe Nassif Minor Resolved Fixed  
Task TIKA-2764

Allow configuration to include/not deleted text in WordPerfect 6.x files

Tim Allison Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2762

Capture short fields (<150 chars) in EnviParserHeader Metadata

Lewis John McGibbney Lewis John McGibbney Major Resolved Fixed  
Bug TIKA-2761

XML Structured Text Is Missing Metadata Fields for mp3 files

Tim Allison Nick Sincaglia Minor Resolved Fixed  
Bug TIKA-2759

ScriptsExtractor incorrectly reports Javascript to characters() in SAX ContentHandler

Tim Allison Markus Jelsma Major Resolved Fixed  
Improvement TIKA-2756

Switch to commons-lang 3

Tim Allison Robert Munteanu Major Resolved Fixed  
Task TIKA-2754

Log file name in tika-server on exception/error

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2753

ChildProcess does not use the JAVA_HOME

Tim Allison Julien Massiera Critical Resolved Fixed  
Task TIKA-2751

Upgrade to POI 4.0.1 when available

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2748

trivial tika-server bug w -maxFiles in new -spawnChild mode

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2745

Upgrade to PDFBox 2.0.12 when available

Tim Allison Tim Allison Major Resolved Fixed  
Improvement TIKA-2743

Replace com.sun.xml.bind:jaxb-impl and jaxb-core by org.glassfish.jaxb:jaxb-runtime and jaxb-core

Tim Allison Thomas Mortagne Major Resolved Fixed  
Bug TIKA-2742

Tika 1.19 trigger a dependency on slf4j-log4j12

Tim Allison Thomas Mortagne Major Resolved Fixed  
Task TIKA-2739

ForkParser child processes should be headless

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2738

tika-app's -f (ForkParser) option isn't working

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2736

improve tika-eval comparison reports to more clearly flag major regressions

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2732

Allow configuration of XMLReaderUtils via TikaConfig

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2730

parseToString fails for a simple mp3

Tim Allison Boris Petrov Major Resolved Fixed  
Task TIKA-2729

add -Djava.awt.headless=true to child process in tika-server

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2727

Parsing and detect mime type of XML file stuck in infinite loop

Tim Allison Slava G Blocker Resolved Fixed  
Task TIKA-2726

Handle truncated ooxml more robustly

Tim Allison Tim Allison Major Resolved Duplicate  
Task TIKA-2725

Make tika-server robust against ooms/infinite loops/memory leaks

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2721

Exclude Spring (transitive dependency) from tika-parsers

Konstantin Gribov Konstantin Gribov Minor Closed Fixed  
Bug TIKA-2716

Sonatype Nexus auditor is reporting that spring framework version used by Tika 1.18 is vulnerable

Konstantin Gribov Abhijit Rajwade Major Closed Won't Fix  
Task TIKA-2707

Upgrade to commons-compress 1.18

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-2706

Store exceptions from VBAMacroReader as we do other embedded exceptions

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2705

Allow configuration of TesseractOCRParser as we do for other parsers

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2704

MPEGStream should throw an EOF if appropriate in skipFrame

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2695

Upgrade Lucene in tika-eval and tika-example

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2693

Tika 1.17 uses the wrong classloader for reflection

Unassigned Karl Wright Major Resolved Fixed  
Task TIKA-2692

Blanket upgrades in prep for 1.19

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2691

Can't create a RPM

Tim Allison Celpan Valeria Major Resolved Fixed  
Improvement TIKA-2690

Exclude commons-logging & commons-logging-api from uimafit-core

Unassigned Hans Brende Major Resolved Fixed  
Bug TIKA-2688

MBOX not recognized when unknown X-headers are present

Tim Allison Yury Kats Major Resolved Fixed  
Task TIKA-2687

Avoid potential to overwrite attachments

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2686

pdfbox fontbox 2.0.8 has security vulnerability CVE-2018-8036 and should be upgraded to 2.0.11

Unassigned Abhijit Rajwade Major Resolved Duplicate  
Task TIKA-2682

Upgrade jempbox to 1.8.15

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2681

Upgrade to PDFBox 2.0.11

Konstantin Gribov Konstantin Gribov Major Closed Fixed  
Bug TIKA-2677

ConcurrentModificationException in org.apache.tika.mime.MediaTypeRegistry.getAliases

Tim Allison Yuriy Koval Major Resolved Fixed  
Bug TIKA-2675

OpenDocumentParser should fail on invalid zip files

Tim Allison Sebastian Nagel Major Resolved Fixed  
Bug TIKA-2673

HtmlEncodingDetector doesn't follow the specification

Tim Allison Gerard Bouchar Major Resolved Fixed  
Task TIKA-2672

Upgrade dl4j to 1.0.0-beta2

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2669

Tika JAX-RS PDF parser option / custom config issue

Tim Allison Annie Didier Major Resolved Fixed  
Task TIKA-2668

Fix 'can't overwrite cause' exception in TaggedSAXException in Java 11-ea

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-2667

Upgrade jmatio to 1.4

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2664

Upgrade junrar to 1.0.1

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2662

Add a streaming out option for the Json serialization

Unassigned Tim Allison Minor Resolved Fixed  
Task TIKA-2661

Upgrade commons-compress to 1.17

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2658

Add magic numbers of Olympus ORF Files

Unassigned Selim Dincer Minor Resolved Fixed  
Task TIKA-2657

Add System.exit() and heavy gc hang to MockParser

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2656

Allow users to specify timeout for parsing and/or waiting in ForkParser

Unassigned Tim Allison Major Resolved Fixed  
New Feature TIKA-2655

Allow the RecursiveParserWrapper to work with the ForkParser

Tim Allison Tim Allison Major Resolved Fixed  
New Feature TIKA-2653

Allow users to specify a directory of jars for classloading in ForkParser

Tim Allison Tim Allison Major Resolved Fixed  
New Feature TIKA-2647

Create a "security" page on our website

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2645

Reuse SAXParsers where possible

Tim Allison Tim Allison Major Resolved Fixed  
Task TIKA-2644

Improve RecursiveParserWrapper API

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2637

ParsingReader.read throws exception when no bytes are available

Tim Allison Boris Petrov Critical Resolved Fixed  
Task TIKA-2635

Require imageMagick path be specified on Windows OS

Unassigned Tim Allison Minor Resolved Fixed  
Task TIKA-2634

Upgrade Jackson to 2.9.5

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2629

Add image/x-dpx media-type detection

Unassigned Andreas Meier Minor Resolved Fixed  
Improvement TIKA-2628

Add image/aces media-type detection

Unassigned Andreas Meier Minor Resolved Fixed  
Task TIKA-2620

Set sys property to get better rendering speed by default

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2618

LabelRecord and LabelSSTRecord text can be overwritten in xls

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-2617

Ignore NPOIFS IOOBE in PPT attachments

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-2616

message/news now incorrectly identified as rfc822

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-2614

RFC822 treats non-multipart as attachment

Unassigned Tim Allison Blocker Resolved Fixed  
Improvement TIKA-2613

Tesseract 4.0 has removed -psm, so Tika must update

Unassigned Ewan Mellor Major Resolved Fixed  
Bug TIKA-2608

tika matlab parser incorrectly identifies content type of minified javascript file

Unassigned pdwalker Minor Closed Fixed  
Sub-task TIKA-2607

TIKA-2579 Exchange levigo-jbig2-imageio with pdfbox-jbig2-imageio:3.0.0

Unassigned Andreas Meier Major Resolved Fixed  
Bug TIKA-2604

Error with certain jar paths on OS X

Tim Allison Sasha Goodman Blocker Resolved Fixed  
Bug TIKA-2601

Invalid XHTML output (overlapping a and formatting tags) for some WORD documents

Konstantin Gribov Filip Major Closed Fixed  
Task TIKA-2600

Don't use md5 checksum due to changes to the release distribution policy

Tim Allison Tim Allison Blocker Resolved Fixed  
Improvement TIKA-2598

Fix dependency convergence

Tim Allison Guillaume Smet Blocker Resolved Fixed  
Bug TIKA-2594

Mail detected as application/xhtml+xml

Unassigned Andreas Meier Major Resolved Fixed  
Improvement TIKA-2592

HTML with charset unicode handled as utf-16 instead utf-8

Unassigned Andreas Meier Minor Resolved Fixed  
Bug TIKA-2591

Some tiffs (Big Endian with fax compression) are showing up as x-tarr

Unassigned daniel schmidt Major Resolved Fixed  
Bug TIKA-2590

ExcelExtractor: cannot choose listening to the selected records only

Unassigned Grigoriy Alekseev Critical Resolved Fixed  
Bug TIKA-2588

Tika detecting/parsing pptx with embedded Excel worksheet(s)...

Tim Allison Brian McColgan Major Closed Fixed  
Bug TIKA-2587

DKIM signed mails recognized as text/plain

Unassigned Andreas Meier Major Resolved Fixed  
Improvement TIKA-2584

Tika should have a way to pass arbitrary Tesseract options

Unassigned Ewan Mellor Minor Resolved Fixed  
Bug TIKA-2582

Tesseract 4.0 includes a FF character by default, breaking parsers

Unassigned Ewan Mellor Major Resolved Fixed  
Bug TIKA-2580

SafeContentHandler documentation is incorrect about replacement character

Unassigned Ewan Mellor Minor Resolved Fixed  
Improvement TIKA-2579

Update to PDFBox 2.0.9 when available

Tim Allison David Pilato Major Closed Fixed  
Bug TIKA-2578

Mails not recognized when unknown X-headers are present

Tim Allison Andreas Meier Major Resolved Fixed  
Improvement TIKA-2576

Add application/zstd detection and parser

Unassigned Andreas Meier Minor Resolved Fixed  
Bug TIKA-2571

Swallows security exception and returns null

Unassigned Nik Everett Minor Resolved Fixed  
Task TIKA-2570

Tika 1.17 uses vulnerable Jackson version 2.9.2

Unassigned Julian Reschke Minor Resolved Fixed  
Bug TIKA-2569

Grouped Text boxes in .ppt

Tim Allison Richard A Major Resolved Fixed  
Bug TIKA-2568

Full encrypted 7Z file not detected as such

Luís Filipe Nassif Luís Filipe Nassif Minor Resolved Fixed  
Sub-task TIKA-2566

TIKA-2085 Move logging in tika-core to slf4j-api (with log4j in test scope) as we do in the rest of Tika

Konstantin Gribov Tim Allison Minor Resolved Fixed  
Bug TIKA-2564

Tika client cannot extract files from embedded archive formats

Tim Allison Marc Prud'hommeaux Major Resolved Fixed  
Improvement TIKA-2563

Extract embedded objects in HTML and javascript

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2561

Tika Parser includes outdated/vulnerable version of JSoup

Unassigned Asela Major Resolved Fixed  
Improvement TIKA-2559

Expose language metadata from PDF documents

Unassigned Matt Sheppard Major Resolved Fixed  
Improvement TIKA-2556

org.json package clash

Unassigned Andrei Rebegea Major Resolved Fixed  
Bug TIKA-2555

Text with [underline] + [another format] in word document generates overlapping html tags.

Konstantin Gribov Serban Alexe Minor Resolved Fixed  
Improvement TIKA-2552

Upgrade to POI 4.0.0 when available

Tim Allison Tim Allison Blocker Resolved Fixed  
Bug TIKA-2551

Tika Server uses HtmlParser for XML no matter what config is given, even if XML is disabled in Config

Unassigned Nick Burch Major Resolved Fixed  
Bug TIKA-2550

ToTextHandler includes <style/> element content

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2549

NoSuchMethodException "CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)" parsing certain .docx files

Unassigned Adam Rauch Major Resolved Fixed  
Improvement TIKA-2548

Add Python Path configuration to TesseractOCRParser

Tim Allison Dave Meikle Minor Resolved Fixed  
Bug TIKA-2547

RFC822 w multipart/mixed first text element should be treated as body, not attachment

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2541

Referenced version of Apache SIS (org.apache.sis) is branch EOL

Unassigned Richard Jones Major Resolved Fixed  
Improvement TIKA-2535

Use latest org.opengis:geoapi to avoid rejected/EOL'd jsr-275 dependency

Tim Allison Richard Jones Major Resolved Fixed  
Improvement TIKA-2528

Fix key location, keys file and download link

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2527

Typos in tika-mimetypes.xml

Unassigned Andreas Meier Minor Resolved Fixed  
Improvement TIKA-2524

Create/integrate a parser for XPS

Tim Allison Peter Davies Major Resolved Fixed  
Bug TIKA-2479

Handle empty cells in tables uniformly

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2462

Add a parser for sas7bdat

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2446

Tainted Zip file can provoke OOM errors

Unassigned Thorsten Schäfer Major Resolved Fixed  
Improvement TIKA-2390

Extract images embedded in Html

Unassigned Luís Filipe Nassif Minor Resolved Duplicate  
Bug TIKA-2385

Tesseract OCR rotation.py not run

Dave Meikle Peter Weiss Major Resolved Fixed  
Bug TIKA-2354

Missing many embedded images in .doc files

Unassigned Tim Allison Blocker Resolved Fixed  
Bug TIKA-2352

Incorrect EOF exception in WordPerfect parser

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2350

Add catch block when opening Action on document open in PDFParser

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2349

Try to match digests when finding equivalent embedded files in tika-eval Compare

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2348

Improve error reporting in wmf/emf

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2343

--text-main in tika-server

Unassigned Nino Skopac Major Resolved Fixed  
Improvement TIKA-2339

Remove test file flagged by anti-virus code

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2338

Change Scope of Jai-ImageIO-Core dependency

Luís Filipe Nassif Luís Filipe Nassif Major Resolved Fixed  
Improvement TIKA-2329

Upgrade to POI 3.16-final

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2325

Allow specification of default lang for common words

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2323

Improve commandline parameterization of thresholds

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2314

Migrate logging to slf4j in master (2.x) branch

Konstantin Gribov Konstantin Gribov Major Resolved Resolved  
Bug TIKA-2311

Preserve "x-tika-ooxml" mime value for truncated ooxml files

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2309

New Detector and Parser classes for Time Stamped Data Envelope file format

Unassigned Fabio Minor Resolved Fixed  
Bug TIKA-2307

Accidentally swallowing UnsupportedZipFeatureException in rare cases

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2300

Can't tell if a zip file is encrypted

Tim Allison Aeham Abushwashi Major Resolved Fixed  
Bug TIKA-2295

Image not extracted via -z or -J in ODT

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2290

PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS

Tim Allison Kevin Oberlag Major Resolved Fixed  
Improvement TIKA-2287

Allow general jdbc connectivity for tika-eval

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2286

Add parameterization for image quality when rendering PDF page for OCR

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2281

Let's extract the MAPI subtype (NOTE, STICKY, etc.) for msg files

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2279

Simplify token counting in tika-eval

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2276

Try to be more parsimonious creating TikaConfigs and ParseContexts

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2275

EmbeddedDocumentUtil should check parseContext for a TikaConfig

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2269

NPE with FeedParser

Unassigned Julien Nioche Major Closed Fixed  
Improvement TIKA-2267

Add common tokens files for tika-eval

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2247

Extract text from WMF/EMF files

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2246

Extract files embedded within EMF files

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2244

excessive memory usage when parsing a large nested package file

Unassigned Joshua Hight Minor Resolved Fixed  
Bug TIKA-2242

opendocument parsing produces malformed xml

Tim Allison Jan Van Raemdonck Major Resolved Fixed  
Improvement TIKA-2240

MS Write File

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2238

Add mime detection for embedded MSEquation files

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2237

UnsupportedOperationException due to SingletonList.set in ProbabilisticMimeDetectionSelector

Unassigned Jasper Hafkenscheid Major Resolved Fixed  
Improvement TIKA-2236

Upgrade to PDFBox 2.0.5 when available

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2235

Use Tesseract's recommended DPI for PDF images

Unassigned Matthew Caruana Galizia Minor Resolved Fixed  
Improvement TIKA-2234

Remove ThreadLocal from dateformat

Unassigned Tim Allison Trivial Resolved Fixed  
New Feature TIKA-2232

Add JBIG2 image parsing support

Tim Allison Pascal Essiembre Minor Resolved Fixed  
Bug TIKA-2231

Invalid language code exception

Unassigned Peter Weiss Minor Resolved Fixed  
Improvement TIKA-2230

Add paragraph markup to WordPerfect parser(s)

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2229

NullPointerException at org.apache.tika.parser.microsoft.ooxml.XWPFListManager.getFormattedNumber(XWPFListManager.java:64)

Unassigned Jorge Spinsanti Major Resolved Fixed  
Improvement TIKA-2228

WordPerfect parser update to support 5.x

Unassigned Pascal Essiembre Minor Resolved Fixed  
Improvement TIKA-2226

Add UnsupportedFormatException (extends TikaException)

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2223

Extra ß characters in some WordPerfect files

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2221

poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

Unassigned Matthew Caruana Galizia Minor Resolved Fixed  
Bug TIKA-2219

CharsetDetector no longer detects windows-1252 charset

Unassigned Pascal Essiembre Minor Resolved Fixed  
Improvement TIKA-2218

Add a few more places where PPTX relationships might include an attachment

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2215

TikaException about "Invalid embedded resource" on a valid PPT file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2212

Update mimes for OOXMLParser

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2211

ePub formatting instructions appear in plain text output

Unassigned Adam Carroll Major Resolved Fixed  
Improvement TIKA-2210

Add experimental SAX/Streaming XSLF/pptx extractor

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2209

Update PDFBox to 2.0.4

Konstantin Gribov Konstantin Gribov Trivial Closed Fixed  
Improvement TIKA-2208

Catch missing libraries

Unassigned David Pilato Major Resolved Fixed  
Bug TIKA-2207

ArrayIndexOutOfBoundsException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2204

IndexOutOfBoundsException on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2198

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Improvement TIKA-2192

Extract embedded files from headers, footers, footnotes, etc from docx/m

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2191

Apply current .docx unit tests to experimental SAX parser and fix or document as necessary

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2190

Add "preserve_interword_spaces" option of tesseract

Tim Allison Bipul Kumar Major Resolved Fixed  
Improvement TIKA-2187

Align default behavior of experimental docx parser with that of doc parser in handling delText

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2181

Upgrade to POI 3.16-beta2 when available

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2179

WordMLParser fails to parse a word xml file

Tim Allison Sean Story Minor Resolved Fixed  
Bug TIKA-2175

Enable extraction of inlined jp2/jpx from PDF

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2174

Too few formats in support declared by TesseractOCRParser

Unassigned Matthew Caruana Galizia Major Resolved Fixed  
Bug TIKA-2170

Tika 1.13 ForkParser fails intermittently with very large MS Word docx

Unassigned Tim Kingsbury Major Resolved Fixed  
Bug TIKA-2169

Fix xhtml in combination OCR+metadata extraction from images

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2167

Image processing causes OCR to fail

Unassigned Matthew Caruana Galizia Critical Resolved Fixed  
Bug TIKA-2166

TaggedIOException from a ZipException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2164

HSLFException from ZipException "invalid stored block lengths" on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2162

"Unknown compression method" on a Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2161

EOFException on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2160

POIXMLException from NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2159

Handle pre-parse embedded object exceptions uniformly and more robustly

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2158

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2155

IndexOutOfBoundsException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2153

TaggedIOException on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2152

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2151

Imposed Write Limit Causes Lost Data With Pdfs

Unassigned Josh Cummings Critical Resolved Duplicate  
Bug TIKA-2145

InvalidFormatException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2142

ArrayIndexOutOfBoundsException

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2137

NullPointerException on a valid Word file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2136

External file links in PPTX misparsed

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2134

Different NullPointerException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2132

NullPointerException on a valid Excel file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2130

TaggedIOException from ZipException on a valid PowerPoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2129

IllegalArgumentException/"Unknown shape type" on a valid Powerpoint file

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2127

NullPointerException on a valid PPTX

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2125

XmlValueOutOfRangeException on a good Word document

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2123

CommonsDigester calculates wrong hashes on large files

Unassigned Yahav Amsalem Major Resolved Fixed  
Improvement TIKA-2122

Extract all email headers from Outlook .msg files into Metadata

Unassigned Chris Knott Minor Resolved Fixed  
Bug TIKA-2118

Misleading exception on a password protected XLS

Unassigned Seva Alekseyev Major Resolved Fixed  
Bug TIKA-2117

NullPointerException on PDF (fixed in PDFBox)

Unassigned Seva Alekseyev Major Resolved Fixed  
Improvement TIKA-2116

Upgrade to POI 3.16-beta1 when available

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-2115

OOM caused by corrupt embedded OLE object

Unassigned Thomas Galla Major Resolved Fixed  
Improvement TIKA-2113

Upgrade metadata-extractor to 2.9.1

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2111

Executable Parser adds Content-Type instead of setting

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2109

OutOfMemory when parsing 5MB word document

Unassigned Julian Major Resolved Not A Bug  
Bug TIKA-2104

Upgrade to a version of POI that fixes common bugs in macro extraction, when available

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2100

Html Parser does not keep the html tag attributes

Unassigned Gerard Bouchar Major Resolved Fixed  
Bug TIKA-2098

Tika.parseToString() with maxLength doesn't work correctly for PDF files

Tim Allison Alexander Kazakov Major Resolved Fixed  
Bug TIKA-2097

Fix NPE in mbox parser

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2096

Supply AutoDetectParser for embedded documents if user forgets to pass it in via ParseContext

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2095

Include version of Tika in tika-server's GREETING

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2090

Extract javascript from PDActions in PDFs

Unassigned Tim Allison Minor Resolved Fixed  
Sub-task TIKA-2083

TIKA-2085 Tika 2.0 - Audit master branch against 2.x branch

Bob Paulin Bob Paulin Blocker Closed Fixed  
Improvement TIKA-2082

Upgrade to PDFBox 2.0.3

Unassigned Luís Filipe Nassif Major Closed Duplicate  
Task TIKA-2081

Add back 'fileUrl' functionality to TikaJAXRS Server subject to security controls

Tim Allison John Dougrez-Lewis Minor Resolved Fixed  
Bug TIKA-2078

Account for potentially multiple runs within a hyperlink in DOCX

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2069

Extract Macro text from Microsoft Office documents

Unassigned Jeff Swindle Major Resolved Fixed  
Task TIKA-2067

Upgrade maven plugin versions

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2066

Upgrade commons-io to 2.5

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-2065

Upgrade forbiddenapis to 2.2

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-2058

Memory Leak in Tika version 1.13 when parsing millions of files

Unassigned Tim Barrett Major Resolved Fixed  
Improvement TIKA-2057

Extract PDF DocInfo fields into separate metadata fields

Tim Allison John Haynes Minor Resolved Fixed  
Bug TIKA-2055

Exception on parsing .docx file

Unassigned Sebastian Iturra Critical Resolved Fixed  
Improvement TIKA-2051

Upgrade to PDFBox 2.0.3 when available

Tim Allison Tim Allison Minor Closed Fixed  
Bug TIKA-2048

Add space for <br/> elements in MSWord 2003XML

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2047

TXTParser overwrites mime type/masks types that are subtype of text

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-2045

TIKA crashes / runs out of memory on simple PDF

Unassigned Egbert Major Resolved Fixed  
Bug TIKA-2041

Charset detection doesn't appear to be thread-safe

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2040

OOM when parsing a corrupted CHM

Tim Allison Luís Filipe Nassif Major Resolved Fixed  
Improvement TIKA-2039

Upgrade jackcess to 2.1.4

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-2026

Handle OLE 2.0 embedded non-Office document in PPT/X and XLSX

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-2025

Extraction of long sequences of digits from Excel spreadsheets using Tika 1.13 doesn’t yield the expected results

Tim Allison Aeham Abushwashi Major Resolved Fixed  
Improvement TIKA-2024

Extract original filename/path when possible

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-2022

Add applefile parser

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-2020

Tika 2.0 - remove AbstractParser's 3 parameter parse

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2019

WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-2015

MAPIMessage String fileName constructor leaves file open

Unassigned Tim Barrett Major Resolved Fixed  
Improvement TIKA-2013

Upgrade to POI 3.15 when available

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-2011

Add mime detection for Endnote Import File (PRONOM: fmt/328)

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2009

Add magic for djvu

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2008

Add mime detection (and parser?) for MSOffice Owner File (PRONOM fmt/473)

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-2006

Add magic for vCalendar and iCalendar

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-2004

Add mime detection for Windows Media Metafile, PRONOM: application/x-puid-fmt-584

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-1999

org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:58)

Tim Allison Egbert Major Resolved Fixed  
Improvement TIKA-1996

Upgrade to PDFBox 2.0.2 when available

Tim Allison Tim Allison Minor Closed Fixed  
Improvement TIKA-1994

Integrate OCR with PDFParser

Tim Allison Tim Allison Major Resolved Fixed  
Bug TIKA-1990

Broken .jpg inline image from .pdf files

Tim Allison Kukushkin Alexander Major Resolved Fixed  
Sub-task TIKA-1983

TIKA-2085 Tika 2.0 - remove tika-app's legacy server

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-1980

HTML head tags found after first script not parsed by HtmlParser (regression)

Tim Allison Joseph Naegele Major Resolved Fixed  
Bug TIKA-1978

Invocation of java.net.URL.equals(Object), which blocks to do domain name resolution, in org.apache.tika.parser.geo.topic.GeoParser.initialize(URL)

Lewis John McGibbney Lewis John McGibbney Critical Resolved Fixed  
Improvement TIKA-1977

RFC822Parser 'adds' dc:title causing rare exceptions if > 1 'subject'

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-1976

Add more robust date parsing fallbacks for RFC822 parser

Unassigned Tim Allison Minor Resolved Fixed  
Sub-task TIKA-1974

TIKA-2085 Tika 2.0 - remove deprecated metadata properties

Unassigned Tim Allison Blocker Resolved Fixed  
Bug TIKA-1971

Email saved as .eml with no body not detected as rfc822, while same email saved as plain txt is.

Unassigned Philipp Steinkrueger Minor Resolved Fixed  
Bug TIKA-1970

Date not extracted from email saved as plain txt

Unassigned Philipp Steinkrueger Minor Resolved Fixed  
Bug TIKA-1961

OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

Tim Allison Andrei Rebegea Major Closed Fixed  
Improvement TIKA-1959

Upgrade to PDFBox 2.0.1/JempBox 1.8.12

Unassigned Tim Allison Minor Closed Fixed  
Improvement TIKA-1958

Add mime detection and lightweight parsers for Office 2003 Word and Excel formats

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-1956

NPE in WordParser when trying to getPicOffset

Tim Allison Ramit Wadhwa Major Resolved Fixed  
Improvement TIKA-1949

Upgrade to Commons Compress 1.11

Tim Allison Nick Burch Major Resolved Fixed  
Improvement TIKA-1948

Catch exceptions per page in PDFParser

Tim Allison Tim Allison Minor Resolved Fixed  
Improvement TIKA-1946

Add mime detection and parser for WordPerfect

Unassigned Nick C Major Resolved Fixed  
Bug TIKA-1938

HtmlParser drops <script> elements found inside <head>

Kenneth William Krugler Joseph Naegele Major Resolved Fixed  
Bug TIKA-1937

LinkContentHandler skips script tags

Unassigned Joseph Naegele Major Resolved Fixed  
Sub-task TIKA-1935

TIKA-1936 ISArchiveParser not releasing resources

Tim Allison Tim Allison Trivial Resolved Fixed  
Sub-task TIKA-1934

TIKA-1936 GeographicInformationParserTest leaving behind temp file in trunk

Tim Allison Tim Allison Trivial Resolved Fixed  
Sub-task TIKA-1932

TIKA-1936 Clear resources in ParserDecorator

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-1924

Upgrade com.googlecode.mp4parser's isoparser to 1.1.18

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-1918

Shouldn't have to specify outputSuffix in tika-batch

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-1906

ExternalParser No Longer Supports Commands in Array Format

Ray Gauss II Ray Gauss II Major Resolved Fixed  
Task TIKA-1895

Upgrade to POI 3.15-beta1 when available

Unassigned Tim Allison Major Resolved Fixed  
Improvement TIKA-1879

Extract recipient information in MSG files with more granularity

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-1866

Out of memory error on Word document

Unassigned Shawn Johnson Major Resolved Fixed  
Improvement TIKA-1865

Save sender email address in Outlook MSG metadata

Unassigned Luís Filipe Nassif Major Resolved Fixed  
Sub-task TIKA-1851

TIKA-1824 Tika 2.0 - Move test resources from core to test-resources

Tim Allison Tim Allison Trivial Resolved Won't Fix  
Sub-task TIKA-1847

TIKA-1824 Tika 2.0 - Clean up tika-parsers pom dependencies and a few other things

Tim Allison Tim Allison Trivial Resolved Fixed  
Task TIKA-1846

Set up Hudson (or similar?) with new Git repo

Lewis John McGibbney Tim Allison Major Resolved Fixed  
Bug TIKA-1844

PooledTimeSeriesParser takes precedence over MP4Parser

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-1822

NullPointerException when parsing a .doc file

Tim Allison Panagiotis Mpailis Major Resolved Fixed  
Improvement TIKA-1805

Default parser/detector loading should warn on missing/empty classes

Unassigned Nick Burch Major Resolved Fixed  
Improvement TIKA-1706

Bring back commons-io to tika-core

Unassigned Yaniv Kunda Minor Resolved Fixed  
Bug TIKA-1658

unable to parse microsoft visio files with tika

Unassigned senthil Major Resolved Fixed  
Improvement TIKA-1513

Add mime detection and parsing for dbf files

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-1473

Apache Tika is not working for .docx documents

Unassigned Franco Catto Major Resolved Fixed  
Improvement TIKA-1436

improvement to PDFParser

Unassigned Stefano Fornari Major Resolved Fixed  
Sub-task TIKA-1332

TIKA-1302 Create tika-eval module

Tim Allison Tim Allison Major Resolved Fixed  
New Feature TIKA-1321

Add experimental SAX/Streaming XWPF/docx extractor

Tim Allison Tim Allison Minor Resolved Fixed  
Bug TIKA-1301

Establish TikaServer on Apache hosted VM

Lewis John McGibbney Lewis John McGibbney Major Resolved Fixed  
Bug TIKA-1255

WordExtractor - bold hyperlink not closed properly

Tim Allison Alan Hunter Minor Resolved Fixed  
Improvement TIKA-1195

XLSB support

Unassigned Frederic Ronny Major Resolved Fixed  
Bug TIKA-879

Detection problem: message/rfc822 file is detected as text/plain.

Unassigned Konstantin Gribov Major Closed Duplicate  
Improvement TIKA-456

Support timeouts for parsers

Tim Allison Kenneth William Krugler Major Resolved Fixed