I am using the code below to add documents to a Lucene index. After indexing 23,425 documents, the folder that holds the index is 447.4 MB. By contrast, the same 23,425 records stored in a Parquet file take only 625 KB. The Lucene index folder seems excessively large. Could someone help identify why this happens and how to reduce it? This is the code I am using:

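    import java.io.IOException;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.Map;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.MMapDirectory;

    MMapDirectory indexDirectory = new MMapDirectory(Paths.get(directory));
    // Configure the IndexWriter with an analyzer
    StandardAnalyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter indexWriter = new IndexWriter(indexDirectory, config);

    for (Map.Entry<String, OperationAggregation> entry : operations.entrySet()) {
        Document doc1 = new Document();
        // StringField: indexed as a single untokenized term, and also stored
        doc1.add(new StringField("namespace", namespace, Store.YES));
        doc1.add(new StringField("type", "operations", Store.YES));
        doc1.add(new StringField("data", entry.getKey(), Store.YES));
        doc1.add(new StringField("serviceName", entry.getValue().getServiceName(), Store.YES));

        // StoredField: stored only, not indexed
        List<AggregationAttribute> attributes = entry.getValue().getOperationAttributes();
        for (int i = 0; i < attributes.size(); i++) {
            doc1.add(new StoredField(attributes.get(i).getName(),
                    String.valueOf(attributes.get(i).getValue())));
        }
        try {
            docCount.getAndIncrement();
            indexWriter.addDocument(doc1);
        } catch (IOException e) {
            logger.error("Error while adding document to index", e);
        }
    }
    indexWriter.commit();
    indexWriter.close();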
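
One way to narrow this down is to check which files in the index folder account for the 447.4 MB. The sketch below is not part of the code above and uses only the JDK; the class name `IndexSizeReport` and the method `sizeByExtension` are hypothetical names chosen for illustration. It sums file sizes per extension. In a Lucene index, stored field data typically ends up in `.fdt` files (or inside `.cfs` compound files), and a large number of differently prefixed segment files may suggest that several runs have appended to the same folder. That is only a possibility to check, not a confirmed cause.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper for illustration; it does not depend on Lucene.
public class IndexSizeReport {

    /** Sums the sizes of regular files directly inside dir, grouped by file extension. */
    public static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without a dot (e.g. segments_N) are grouped under "(none)"
                String ext = dot >= 0 ? name.substring(dot + 1) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index folder as the first argument; defaults to the current directory
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        sizeByExtension(dir).forEach((ext, bytes) ->
                System.out.printf("%-8s %,d bytes%n", ext, bytes));
    }
}
```

Running it against the index folder (the same path passed as `directory` above) prints one line per extension, which should show whether stored fields, the term dictionary, or leftover segments dominate the total.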

Tags: java, Folder Size is too Large of Lucene Documents, Stack Overflow