  1. Lucene - Core
  2. LUCENE-9064

Can we remove the FST cache in Kuromoji and Nori analyzers?

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Won't Do
    • New

    Description

      Is the ~30k Han character cache in Kuromoji redundant after LUCENE-8920?

      https://github.com/apache/lucene-solr/blob/813ca77250db29116812bc949e2a466a70f969a3/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/TokenInfoFST.java#L35-L38

      The linked file exists almost entirely to implement this cache, so if it's no longer needed, removing it would be a nice cleanup. But the cache was definitely needed for good performance before, so we should be careful. The Nori analyzer has the exact same thing (in a file with the same name) for ~10k Hangul syllables.
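
      For readers unfamiliar with the pattern, here is a minimal sketch of what a root-level cache of this kind does. It is not the code from TokenInfoFST: the class name, the cached range (the basic CJK Unified Ideographs block, U+4E00 to U+9FFF), and the HashMap standing in for the automaton's root node are all assumptions made for illustration. The idea is that the first character of every lookup goes through a dense array instead of a search over the root node's outgoing arcs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for caching the first-level (root) transitions of a dictionary automaton. */
final class RootArcCacheSketch {
    // Assumed cache range for illustration: the basic CJK Unified Ideographs block.
    static final int CACHE_FLOOR = 0x4E00;
    static final int CACHE_CEILING = 0x9FFF;

    // Slow path: stands in for searching the root node's outgoing arcs.
    // Keys are code points, values are non-negative target node ids.
    private final Map<Integer, Integer> rootArcs = new HashMap<>();

    // Fast path: dense array indexed by (codePoint - CACHE_FLOOR); -1 means "no arc".
    private final int[] rootCache = new int[CACHE_CEILING - CACHE_FLOOR + 1];

    RootArcCacheSketch(Map<Integer, Integer> arcs) {
        rootArcs.putAll(arcs);
        Arrays.fill(rootCache, -1);
        // Precompute the answer for every arc whose label falls in the cached range.
        for (Map.Entry<Integer, Integer> e : arcs.entrySet()) {
            int cp = e.getKey();
            if (cp >= CACHE_FLOOR && cp <= CACHE_CEILING) {
                rootCache[cp - CACHE_FLOOR] = e.getValue();
            }
        }
    }

    /** Returns the target node id for the first character of a lookup, or -1 if there is no arc. */
    int findRootTarget(int codePoint) {
        if (codePoint >= CACHE_FLOOR && codePoint <= CACHE_CEILING) {
            return rootCache[codePoint - CACHE_FLOOR]; // one array read, no search
        }
        Integer target = rootArcs.get(codePoint);      // fall back to the slow path
        return target == null ? -1 : target;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> arcs = new HashMap<>();
        arcs.put(0x65E5, 7);     // a Han character inside the cached range
        arcs.put((int) 'a', 3);  // a character outside the cached range
        RootArcCacheSketch sketch = new RootArcCacheSketch(arcs);
        System.out.println(sketch.findRootTarget(0x65E5));
        System.out.println(sketch.findRootTarget('a'));
        System.out.println(sketch.findRootTarget(0x672C));
    }
}
```

      The trade-off in the sketch is one array slot per code point in the range, in exchange for skipping the root search. The question in this issue is whether, after LUCENE-8920, the uncached path is fast enough on its own that the array no longer pays for its memory; that would need benchmarking rather than inspection.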

          People

            broustant Bruno Roustant
            broustant Bruno Roustant
            Votes: 0
            Watchers: 3
