...

  • Finding archives and/or participants based on alternative names.

  • Finding Aksesjons/Tilveksts using the Mottat-fra values.

  • Finding archives that have a Serie/Mappe with a particular name.

  • Finding archives and/or their descendants based on their creator values.

Indexing

...

Entities

Because Asta7 is a generic system, all entities are treated equally and each one is indexed in Elasticsearch separately. This does not work well in most cases: some entities (such as Alternativtnavn and Geografy in ISADG) should not be indexed separately but rather as part of their parent entity.

To address this, a new entity option has been added that controls whether the entity is searchable.
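A minimal sketch of the effect of this option, assuming a hypothetical `EntityDef` model with a boolean `searchable` flag (this is not Asta7's actual API): only searchable entities get their own index, while non-searchable ones are expected to be indexed as part of their parent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntityDef:
    # Hypothetical, simplified entity definition used only for illustration.
    name: str
    searchable: bool
    children: list[EntityDef] = field(default_factory=list)


def indexed_entities(root: EntityDef) -> list[str]:
    """Return, in pre-order, the names of entities that get their own index."""
    result = []
    stack = [root]
    while stack:
        entity = stack.pop()
        if entity.searchable:
            result.append(entity.name)
        # Push children in reverse so they are visited in declaration order.
        stack.extend(reversed(entity.children))
    return result
```

With `Alternativtnavn` and `Geografy` marked as not searchable, only their parent (and other searchable entities) end up with separate indices.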

...

Indexing

...

Related/Inherited System Entities

It is often necessary to search based on a related system entity (participant, restriction, or tag). However, because each entity resides in its own Elasticsearch index and joins/subqueries are not possible, this has not been supported so far.

To overcome this, new entity options make it possible to index the related, and even inherited, system entities with each object.
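The idea is denormalization: copy the related system entities into the object's own document so no join is needed at query time. A minimal sketch, assuming hypothetical field names `participants`, `restrictions` and `tags`, and an `inherited` marker that is an illustration rather than Asta7's actual schema:

```python
from __future__ import annotations

# Hypothetical document field names for the related system entity types.
SYSTEM_RELATIONS = ("participants", "restrictions", "tags")


def build_document(
    obj: dict,
    related: dict[str, list[dict]],
    inherited: dict[str, list[dict]] | None = None,
) -> dict:
    """Embed related (and optionally inherited) system entities in obj's document."""
    doc = dict(obj)  # copy so the source object is not mutated
    for kind in SYSTEM_RELATIONS:
        # Mark direct relations so they can be told apart from inherited ones.
        entries = [dict(e, inherited=False) for e in related.get(kind, [])]
        if inherited:
            entries += [dict(e, inherited=True) for e in inherited.get(kind, [])]
        doc[kind] = entries
    return doc
```

A query can then target a field such as `participants.name` in the same index as the object itself.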

...

Indexing

...

Member/Descendant Entities

Because Elasticsearch is a flat, document-based store that does not support joins or subqueries, the only way to search based on member/descendant entities is to index them with the parent. Every document therefore contains its required members/descendants and related system entities.
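If members are mapped with the Elasticsearch `nested` type (an assumption here), searching on a descendant field means wrapping the clause in one `nested` query per level, outermost first. A minimal sketch with a hypothetical helper `nested_match` and hypothetical field paths such as `serie.mappe.tittel`:

```python
from __future__ import annotations


def nested_match(path: list[str], field: str, text: str) -> dict:
    """Build a match query on a field that sits inside nested members."""
    query = {"match": {".".join(path + [field]): text}}
    # Wrap from the innermost nested level outwards: "serie.mappe", then "serie".
    for depth in range(len(path), 0, -1):
        query = {"nested": {"path": ".".join(path[:depth]), "query": query}}
    return query
```

For example, `nested_match(["serie", "mappe"], "tittel", "skole")` finds documents with a Mappe under a Serie whose title matches "skole". If members were mapped as plain objects instead, a single `match` on the dotted path would do, but conditions could then match across different members.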

...

  1. Total nesting limit: 10 (Arkiv → Arkivdel → Serie → Serie → Serie → Stykke → Mappe → Mappe → Mappe → Geografi)

  2. Self nesting limit: 2 (A → Aa → Aaa)

  3. File nesting limit: 1 (Arkiv → Fil)

  4. Nested member limit: 1000

  5. ES total field limit: 5000 (default: 1000)

  6. ES nested field limit: 500 (default: 50)

  7. ES nested object limit: 50,000 (default: 10,000)
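The three ES limits correspond to the standard Elasticsearch index settings shown below. The nesting check is a minimal sketch under an assumed interpretation: the total limit caps the length of an entity chain, and the self nesting limit caps how many times an entity may directly nest inside itself. The helper `within_nesting_limits` is hypothetical.

```python
from __future__ import annotations

# Elasticsearch index settings raised from their defaults (values from the list above).
INDEX_SETTINGS = {
    "index.mapping.total_fields.limit": 5000,
    "index.mapping.nested_fields.limit": 500,
    "index.mapping.nested_objects.limit": 50_000,
}

TOTAL_NESTING_LIMIT = 10
SELF_NESTING_LIMIT = 2


def within_nesting_limits(chain: list[str]) -> bool:
    """Check an entity chain such as ["Arkiv", "Arkivdel", "Serie"] (assumed semantics)."""
    if len(chain) > TOTAL_NESTING_LIMIT:
        return False
    repeats = 0
    for prev, cur in zip(chain, chain[1:]):
        # Count consecutive self-nesting steps (Serie → Serie → Serie is 2).
        repeats = repeats + 1 if cur == prev else 0
        if repeats > SELF_NESTING_LIMIT:
            return False
    return True
```

`INDEX_SETTINGS` could be passed as the settings of an index-creation request; the example chain from limit 1 passes both checks.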

...

  1. Level 1: All fields

  2. Level 2: All fields for leaf entities (Alternativtnavn, Geografy, etc.); only required fields for other entities (Serie, Stykke, etc.)

  3. Deeper levels (3 and beyond): Required fields only
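The level rules above can be expressed as a small selection function. This is a minimal sketch; the helper `fields_for_level` and its parameters are hypothetical, and levels are numbered as in the list.

```python
from __future__ import annotations


def fields_for_level(level: int, all_fields: list[str],
                     required: list[str], is_leaf: bool) -> list[str]:
    """Pick which fields of a member entity to index at a given nesting level."""
    if level <= 1:
        return list(all_fields)  # level 1: everything
    if level == 2 and is_leaf:
        return list(all_fields)  # level 2 leaf entities (e.g. Geografy): everything
    # Level 2 non-leaf entities and all deeper levels: required fields only,
    # keeping the original field order.
    return [f for f in all_fields if f in required]
```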

Note

Although it is possible to change the search settings at any time, even after project initialization, the changes do not take effect immediately. The project's search data has to be re-indexed after any settings change; otherwise the search might break or not work as expected. Changing these settings might also break existing saved searches.

Indexing Digital Files

Digital files associated with archive units can also be indexed together with them. When an entity with one or more fileName fields is made searchable, its associated files are indexed automatically; nothing else needs to be done. Within certain limits (see Limits below), files can also be indexed with member/descendant archive units.

File Fields

  1. ID: file id

  2. Name: file name

  3. Type: file mime type

  4. Length: file length in bytes

  5. Timestamp: file upload time

  6. Content: extracted text from the file
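A minimal sketch of how these six fields might be mapped and populated, assuming files are stored under a hypothetical nested field named `files`. The mapping types are standard Elasticsearch field types, but the layout and the helper `file_document` are illustrations, not the actual Asta7 schema.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical mapping for files embedded in an archive-unit document.
FILES_MAPPING = {
    "type": "nested",  # keeps each file's fields together when filtering
    "properties": {
        "id": {"type": "keyword"},
        "name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "type": {"type": "keyword"},    # MIME type
        "length": {"type": "long"},     # size in bytes
        "timestamp": {"type": "date"},  # upload time
        "content": {"type": "text"},    # extracted text
    },
}


def file_document(file_id: str, name: str, mime_type: str, length: int,
                  uploaded_at: datetime, content: str | None) -> dict:
    """Build one entry of the hypothetical `files` field."""
    doc = {
        "id": file_id,
        "name": name,
        "type": mime_type,
        "length": length,
        "timestamp": uploaded_at.isoformat(),
    }
    if content is not None:  # assumed: omit content when no text was extracted
        doc["content"] = content
    return doc
```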

Limits

  1. Supported types: TEXT, XML, HTML, PDF, CSV

  2. Max file size for text extraction: 2GB

  3. File nesting limit: 1 (Arkiv → Fil)
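The type and size limits can be checked before attempting text extraction. This is a minimal sketch: the MIME strings chosen for TEXT, XML, HTML, PDF and CSV, and reading 2GB as 2 × 1024³ bytes, are assumptions, and `should_extract_text` is a hypothetical helper.

```python
# Assumed MIME types for the supported file types listed above.
SUPPORTED_MIME_TYPES = {
    "text/plain",                   # TEXT
    "text/xml", "application/xml",  # XML
    "text/html",                    # HTML
    "application/pdf",              # PDF
    "text/csv",                     # CSV
}
MAX_EXTRACT_BYTES = 2 * 1024**3  # 2GB, assuming binary gigabytes


def should_extract_text(mime_type: str, size_bytes: int) -> bool:
    """Return True if the file's content would be eligible for text extraction."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    base = mime_type.split(";", 1)[0].strip().lower()
    return base in SUPPORTED_MIME_TYPES and size_bytes <= MAX_EXTRACT_BYTES
```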

Searching

Tree Search

The tree search has been updated with the following changes:

...

A default query type/operator is chosen based on the selected field's type; other operators available for that field type can be found in the advanced mode. The basic mode combines the rules in an AND query, while the advanced mode offers many more options.
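A minimal sketch of the basic mode, assuming a hypothetical mapping from field type to default clause (`match` for text, `term` for keywords/codes, `range` for dates and numbers). In the Elasticsearch query DSL, a `bool` query with `must` clauses requires every clause to match, which gives the AND behaviour.

```python
from __future__ import annotations


def default_clause(field: str, field_type: str, value) -> dict:
    """Pick a default clause for a field type (hypothetical type names)."""
    if field_type == "text":
        return {"match": {field: value}}
    if field_type in ("keyword", "code"):
        return {"term": {field: value}}
    if field_type in ("date", "number"):
        low, high = value  # expects a (from, to) pair
        return {"range": {field: {"gte": low, "lte": high}}}
    raise ValueError(f"unsupported field type: {field_type}")


def basic_mode_query(rules: list[tuple]) -> dict:
    """Combine (field, field_type, value) rules with AND via bool/must."""
    return {"bool": {"must": [default_clause(f, t, v) for f, t, v in rules]}}
```

Using `filter` instead of `must` would also give AND semantics, but without contributing to relevance scoring.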

File Search

Because digital files associated with archive units are indexed along with them, file metadata (name, type, length, etc.) and/or file content can be used to search for archive units.

If any file entities, or entities with file member entities (subject to the file nesting limit), are selected, the associated file content is also included in the free-text search.

File fields can also be selected as filter rules.
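A minimal sketch of file fields used as filter rules, assuming files are indexed under a hypothetical nested field `files` with `type` and `length` subfields. Putting all conditions inside one `nested` clause means a single file must satisfy all of them, rather than different files satisfying different conditions.

```python
from __future__ import annotations


def file_filter_rule(mime_type: str | None = None,
                     min_length: int | None = None,
                     max_length: int | None = None) -> dict:
    """Build a filter that matches archive units having at least one such file."""
    clauses = []
    if mime_type is not None:
        clauses.append({"term": {"files.type": mime_type}})
    bounds = {}
    if min_length is not None:
        bounds["gte"] = min_length
    if max_length is not None:
        bounds["lte"] = max_length
    if bounds:
        clauses.append({"range": {"files.length": bounds}})
    return {"nested": {"path": "files", "query": {"bool": {"filter": clauses}}}}


def with_filters(query: dict, *filters: dict) -> dict:
    """Attach filter rules to an existing query without affecting scoring."""
    return {"bool": {"must": [query], "filter": list(filters)}}
```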

...

Highlighting

In a free-text search, matched fragments are highlighted as follows. If entities are selected, all top-level text fields of those entities are highlighted. If file entities are selected, file content matches are highlighted in a separate section of the expanded view.
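One way to get the two kinds of highlights in Elasticsearch is a top-level `highlight` block for the entity's own text fields plus `inner_hits` with its own `highlight` on the nested file clause, whose results can be rendered separately. This is a minimal sketch; the field names and the `files` nested path are hypothetical, and the helper `highlighted_search` is not part of Asta7.

```python
from __future__ import annotations


def highlighted_search(text: str, top_level_fields: list[str],
                       include_files: bool) -> dict:
    """Build a search body that highlights entity fields and, optionally, file content."""
    should = [{"multi_match": {"query": text, "fields": list(top_level_fields)}}]
    if include_files:
        should.append({
            "nested": {
                "path": "files",
                "query": {"match": {"files.content": text}},
                # inner_hits returns the matching files with their own highlights,
                # which a UI can show in a separate section.
                "inner_hits": {"highlight": {"fields": {"files.content": {}}}},
            }
        })
    return {
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        "highlight": {"fields": {f: {} for f in top_level_fields}},
    }
```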

See Search Examples for some example searches.

...

  • Need to handle orphan members and system relations during sync.

  • More control over which members to include, e.g. Serie should be included with Arkiv but not with Arkivdel.

  • Control which fields are searchable and/or searchable as members. This should make the field limitations unnecessary; if not, the field limitations need to be made configurable instead of hard-coded.

  • Include members'/descendants' related system entities?

  • Although there are limits on the number of nested members, they are not applied during syncing at the moment. Need to fix this.

  • Need to use Asta7 models and properties in Essync instead of duplicating them.

  • Multi-level nesting makes the search quite complex and might not be needed for all projects. Should we consider adding support for flat nesting as well? Add support for a level-agnostic search using descendant entities.

  • When multiple entities have fields with the same name and type, but one of them has the CodeTableRef/FileName feature, the first field is used.

  • When multiple child entities have the same name, the first one is used. Note that any additional fields and/or child entities on the later entities will not be available.

  • Should file content indexing be configurable?