| TIKA-2326 | java.lang.OutOfMemoryError: Java heap space | Unassigned | Md | Closed | Fixed |
| TIKA-2172 | can not read Arabic file | Unassigned | Ahmad Sawalhah | Resolved | Won't Fix |
| TIKA-2143 | POI deprecated method used in TIKA 1.13 | Unassigned | sbathrutheen | Open | Unresolved |
| TIKA-1967 | Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@10b8c32 | Unassigned | kostali | Resolved | Not A Problem |
| TIKA-1965 | Added types to Grobid quantities parser | Dave Meikle | Can Menekse | Resolved | Fixed |
| TIKA-1961 | OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters | Tim Allison | Andrei Rebegea | Closed | Fixed |
| TIKA-1960 | Put legacy language detection code back into 1.x=trunk | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-1959 | Upgrade to PDFBox 2.0.1/JempBox 1.8.12 | Unassigned | Tim Allison | Closed | Fixed |
| TIKA-1956 | NPE in WordParser when trying to getPicOffset | Tim Allison | Ramit Wadhwa | Resolved | Fixed |
| TIKA-1955 | MIME types updates and additions for Scientific Data based on TREC-DD-Polar | Chris A. Mattmann | Chris A. Mattmann | Resolved | Fixed |
| TIKA-1950 | Clean up jdom version conflict | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1949 | Upgrade to Commons Compress 1.11 | Tim Allison | Nick Burch | Resolved | Fixed |
| TIKA-1948 | Catch exceptions per page in PDFParser | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-1944 | Add mime magic for apple single/double files | Unassigned | Nick C | Resolved | Fixed |
| TIKA-1943 | Include support for Yandex Translate API | Chris A. Mattmann | Mark Duske | Resolved | Fixed |
| TIKA-1939 | Preparation for Tika 1.13 release | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1937 | LinkContentHandler skips script tags | Unassigned | Joseph Naegele | Resolved | Fixed |
| TIKA-1935 | TIKA-1936 ISArchiveParser not releasing resources | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-1934 | TIKA-1936 GeographicInformationParserTest leaving behind temp file in trunk | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-1933 | TIKA-1936 ForkParser leaves tmp jars behind on Windows (at least) | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1932 | TIKA-1936 Clear resources in ParserDecorator | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-1927 | NPE in JDBCTableReader | Tim Allison | Nick C | Resolved | Fixed |
| TIKA-1926 | JSON TEI Exception | Chris A. Mattmann | Ayesha Hasan | Resolved | Fixed |
| TIKA-1924 | Upgrade com.googlecode.mp4parser's isoparser to 1.1.18 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1918 | Shouldn't have to specify outputSuffix in tika-batch | Tim Allison | Tim Allison | Resolved | Fixed |
| TIKA-1917 | Just a quick fix to allow NLTK Parser extract measurement information from text | Chris A. Mattmann | Manali Shah | Resolved | Fixed |
| TIKA-1916 | NPE in OpenDocumentParser | Tim Allison | Nick C | Resolved | Fixed |
| TIKA-1914 | ExecutableParser doesn't call start document | Unassigned | Nick C | Resolved | Fixed |
| TIKA-1913 | Integrate MIT Information Extraction(MITIE) into Tika to perform Named Entity Recognition | Chris A. Mattmann | Manali Shah | Resolved | Fixed |
| TIKA-1906 | ExternalParser No Longer Supports Commands in Array Format | Ray Gauss II | Ray Gauss II | Resolved | Fixed |
| TIKA-1898 | backslashes in mime-type : application/vnd.mif are wrong | Unassigned | Steffen Netz | Resolved | Fixed |
| TIKA-1895 | Upgrade to POI 3.15-beta1 when available | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1894 | Add XMPMM metadata extraction to JempboxExtractor | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1893 | Add new mimetype for *.icns (Apple Icon Image Format) files | Unassigned | Manisha Kampasi | Resolved | Fixed |
| TIKA-1892 | Mime Magic for application/x-mobipocket-ebook and application/x-shapefile | Unassigned | Suman Kashyap | Resolved | Fixed |
| TIKA-1890 | Update mimetype for application/vnd.ms-cab-compressed | Unassigned | Ajay Kumar Loganathan Ravichandran | Resolved | Fixed |
| TIKA-1888 | TIKA-1955 Update mimetype for application/x-netcdf | Unassigned | Ajay Kumar Loganathan Ravichandran | Resolved | Fixed |
| TIKA-1886 | TIKA-1955 Updating tika-mimetypes.xml to detect .hfa files | Chris A. Mattmann | Nandan Chandrashekar | Resolved | Fixed |
| TIKA-1885 | TIKA-1955 Tika MIME updates for *.cdf and *.xar and custom zero length file detector based on TREC-DD-Polar | Chris A. Mattmann | Adesh Gupta | Resolved | Fixed |
| TIKA-1882 | TIKA-1955 Scientific MIME updates to .cab files, .xar and .mobi and .mov files based on TREC-DD-Polar analysis | Chris A. Mattmann | Manisha Kampasi | Resolved | Fixed |
| TIKA-1881 | TIKA-1955 Updates to MIME types for Postscript, WordPerfect, image and RSS based on Polar analysis | Chris A. Mattmann | Namitha Sanjeeva Ganiga | Resolved | Fixed |
| TIKA-1878 | Upgrade Apache SIS 0.6 | Unassigned | Hendy Irawan | Resolved | Fixed |
| TIKA-1877 | On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything | Unassigned | Prasad Nagaraj Subramanya | Resolved | Fixed |
| TIKA-1876 | Integrate Natural Language Toolkit (NLTK) into Tika to perform Named Entity Recognition | Chris A. Mattmann | Manali Shah | Resolved | Fixed |
| TIKA-1875 | Updating tika-mimetypes.xml to detect .NC files | Unassigned | Prasad Nagaraj Subramanya | Resolved | Fixed |
| TIKA-1872 | Backport tika-langdetect from 2.x branch to 1.13 branch | Chris A. Mattmann | Trevor Lewis | Resolved | Fixed |
| TIKA-1871 | Update Tika JAXRS wiki page with the info about multipart/form-data | Sergey Beryozkin | Sergey Beryozkin | Resolved | Fixed |
| TIKA-1870 | Relocating RichTextContentHandler into tika-core from tika-server | Unassigned | John Patrick | Resolved | Fixed |
| TIKA-1869 | Jackson update to latest version | Unassigned | John Patrick | Resolved | Fixed |
| TIKA-1868 | create clean tika-server jar and shaded classifier jar | Unassigned | John Patrick | Closed | Won't Fix |
| TIKA-1866 | Out of memory error on Word document | Unassigned | Shawn Johnson | Resolved | Fixed |
| TIKA-1862 | Exception in thread "Thread-9" java.lang.UnsatisfiedLinkError: /usr/lib/jvm/jre/lib/amd64/headless/libmawt.so: libcups.so.2: cannot open shared object file: No such file or directory | Unassigned | Avinash | Resolved | Invalid |
| TIKA-1861 | Upgrade to sqlite-jdbc 3.8.11.2 | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1857 | Enhance PDFParser to extract text from XFA forms | Unassigned | Pascal Essiembre | Resolved | Fixed |
| TIKA-1856 | Error while parsing an ogg file | Unassigned | Yash Tanna | Resolved | Fixed |
| TIKA-1846 | Set up Hudson (or similar?) with new Git repo | Lewis John McGibbney | Tim Allison | Resolved | Fixed |
| TIKA-1845 | Unable to extract content from certain RTFs using tika-server versions since 1.5 | Tim Allison | Ian Williams | Resolved | Fixed |
| TIKA-1844 | PooledTimeSeriesParser takes precedence over MP4Parser | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1836 | Convertion DOC->TXT failed due to POI issue | Unassigned | Jorge Spinsanti | Resolved | Fixed |
| TIKA-1830 | Upgrade to PDFBox 1.8.11 when available | Tim Allison | Tim Allison | Closed | Fixed |
| TIKA-1823 | Support detecting DWF format | Unassigned | Luca Moretti | Resolved | Fixed |
| TIKA-1821 | Problem in Tika().detect for xml file signed in CADES | Unassigned | Alessandro De Angelis | Resolved | Fixed |
| TIKA-1816 | Lenient testing for NamedEntityParser | Unassigned | Thamme Gowda | Resolved | Fixed |
| TIKA-1801 | Integrate MITIE Named Entity Recognition support | Chris A. Mattmann | Chris A. Mattmann | Resolved | Duplicate |
| TIKA-1723 | Integrate language-detector into Tika | Kenneth William Krugler | Kenneth William Krugler | Resolved | Fixed |
| TIKA-1696 | Language Identification with Text Processing Toolkit from MITLL | Chris A. Mattmann | Paul Ramirez | Resolved | Fixed |
| TIKA-1657 | Allow easier XML serialization of TikaConfig | Unassigned | Tim Allison | Resolved | Fixed |
| TIKA-1473 | Apache Tika is not working for .docx documents | Unassigned | Franco Catto | Resolved | Fixed |
| TIKA-1435 | Update rome dependency to 1.5 | Chris A. Mattmann | aoeu | Resolved | Fixed |
| TIKA-1285 | Upgrade to PDFBox 2.0.0 when available | Unassigned | Jeremy Anderson | Closed | Fixed |