Segment info. This contains metadata about a segment, such as the number of documents, what files it uses, and information about how the segment is sorted.
Field names. This contains metadata about the set of named fields used in the index.
Stored Field values. This contains, for each document, a list of attribute-value pairs, where the attributes are field names. These are used to store auxiliary information about the document, such as its title, URL, or an identifier to access a database. The set of stored fields is what is returned for each hit when searching. This is keyed by document number.
Term dictionary. A dictionary containing all of the terms used in all of the indexed fields of all of the documents. The dictionary also contains the number of documents which contain the term, and pointers to the term’s frequency and proximity data.
Term Frequency data. For each term in the dictionary, the numbers of all the documents that contain that term, and the frequency of the term in that document, unless frequencies are omitted (IndexOptions.DOCS). These frequencies are mainly used for scoring: a term that occurs often in a given document but rarely across the index as a whole gives that document a high score for searches on that term.
Term Proximity data. For each term in the dictionary, the positions at which the term occurs in each document. Note that this will not exist if all fields in all documents omit position data.
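To make the relationship between the term dictionary, frequency data, and proximity data concrete, here is a minimal in-memory sketch. It is a toy model, not Lucene's on-disk format: the function `build_index`, the whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Build a toy inverted index: term -> {document number -> [positions]}."""
    postings = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Assumption: lowercase + whitespace split stands in for a real analyzer.
        for position, term in enumerate(text.lower().split()):
            postings[term].setdefault(doc_id, []).append(position)
    return postings

docs = ["the quick brown fox", "the lazy dog", "the fox and the dog"]
postings = build_index(docs)

# Term dictionary: every term, with the number of documents that contain it.
term_dictionary = {term: len(by_doc) for term, by_doc in sorted(postings.items())}

# Frequency data: for each term, the documents containing it and the count in each.
frequencies = {term: {doc_id: len(pos) for doc_id, pos in by_doc.items()}
               for term, by_doc in postings.items()}

# Proximity data: for each term, the positions at which it occurs in each document.
positions = {term: dict(by_doc) for term, by_doc in postings.items()}
```

Dropping `positions` entirely corresponds to the case where position data is omitted, and replacing each count in `frequencies` with a bare document number corresponds to omitting frequencies.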
Normalization factors. For each field in each document, a value is stored that is multiplied into the score for hits on that field.
Term Vectors. For each field in each document, the term vector (sometimes called document vector) may be stored. A term vector consists of term text and term frequency. To add Term Vectors to your index see the Field constructors. Storing them is optional; each vector is derived from the term frequencies within that one document.
StoredField: A stored-only value for retrieving in summary results; the value is stored but not indexed.
Per-document values. Like stored values, these are also keyed by document number, but are generally intended to be loaded into main memory for fast access. Whereas stored values are generally intended for summary results from searches, per-document values are useful for things like scoring factors.
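Live documents. An optional file indicating which documents are live. It is mainly used for deletions: the file is kept outside the segment's data and records the segment's latest state, that is, which documents have not been deleted.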
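The idea behind the live-documents file can be sketched as a per-document flag kept alongside data that is never rewritten. This is an illustrative toy, not Lucene code: the class `ToySegment` and its methods are hypothetical names.

```python
class ToySegment:
    """Hypothetical stand-in for a segment plus its live-documents flags."""

    def __init__(self, docs):
        self._docs = tuple(docs)          # segment data is never rewritten
        self.live = [True] * len(docs)    # one flag per document number

    def delete(self, doc_id):
        # A delete only clears the flag; the document's data stays in place.
        self.live[doc_id] = False

    def search(self, term):
        # Hits are filtered through the live flags at query time.
        return [doc_id for doc_id, text in enumerate(self._docs)
                if self.live[doc_id] and term in text.split()]

segment = ToySegment(["red apple", "green apple", "red pear"])
before = segment.search("apple")   # both apple documents match
segment.delete(0)
after = segment.search("apple")    # document 0 is now masked out
```

Keeping the flags separate means a deletion touches only a small side structure, while the segment's own data stays unchanged.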
Point values. Optional pair of files, recording dimensionally indexed fields, to enable fast numeric range filtering and large numeric values like BigInteger and BigDecimal (1D) and geographic shape intersection (2D, 3D).
Vector values. The vector format stores numeric vectors in a format optimized for random access and computation, supporting high-dimensional nearest-neighbor search.
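As a rough picture of what nearest-neighbor search over stored vectors means, the sketch below does an exact brute-force scan using Euclidean distance. This is a deliberately simpler technique than the optimized format described above, and the function `nearest` and the sample vectors are assumptions for illustration only.

```python
import math

def nearest(query, vectors, k=1):
    """Return the k document numbers whose vectors are closest to query."""
    # Brute force: compute the distance to every vector, then sort.
    ranked = sorted(range(len(vectors)),
                    key=lambda doc_id: math.dist(query, vectors[doc_id]))
    return ranked[:k]

# One vector per document number (toy 2-dimensional data).
vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
result = nearest((0.9, 1.2), vectors, k=2)
```

A full scan like this costs time proportional to the number of documents, which is why a format optimized for random access and computation matters at scale.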
In Lucene, fields may be stored, in which case their text is stored in the index literally, in a non-inverted manner. Fields that are inverted are called indexed. A field may be both stored and indexed.
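The distinction between stored and indexed fields can be illustrated with a small toy model. It is not the Lucene API: `ToyIndex`, its `add` and `search` methods, and the per-field `(value, stored, indexed)` tuples are hypothetical constructs chosen for this sketch.

```python
class ToyIndex:
    """Toy model: stored values are kept literally, indexed values are inverted."""

    def __init__(self):
        self.stored = []      # document number -> {field name: literal value}
        self.inverted = {}    # (field name, term) -> set of document numbers

    def add(self, fields):
        """fields maps name -> (value, stored flag, indexed flag)."""
        doc_id = len(self.stored)
        kept = {}
        for name, (value, stored, indexed) in fields.items():
            if stored:
                kept[name] = value                     # non-inverted, verbatim
            if indexed:
                for term in value.lower().split():     # inverted, by term
                    self.inverted.setdefault((name, term), set()).add(doc_id)
        self.stored.append(kept)
        return doc_id

    def search(self, field, term):
        hits = sorted(self.inverted.get((field, term.lower()), ()))
        return [(doc_id, self.stored[doc_id]) for doc_id in hits]

index = ToyIndex()
index.add({
    "title": ("Index file formats", True, True),      # stored and indexed
    "body": ("search engine library", False, True),   # indexed only
    "url": ("https://example.org/doc", True, False),  # stored only
})
hits = index.search("body", "engine")
```

Here a search on `body` finds the document, but the returned hit carries only `title` and `url`, because `body` was indexed without being stored. Conversely, `url` comes back with every hit but cannot itself be searched.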