CSV files and XML files can be imported into the databases using the "Import" button in the footer of the pages. Alternatively, the API can be used to import data. An R package and a Python package simplify API operations. The import function allows both creating new articles, sections and categories and updating existing entities.

The Relational Article Model (RAM)

To prepare data imports, a basic understanding of the data model will be helpful. Epigraf implements the Relational Article Model (RAM) to store documents in tables for projects, articles, sections, items, links, footnotes and properties.

When it comes to data import, the most crucial aspect of the RAM is the IRI. IRIs are globally unique identifiers. In Epigraf databases, each entity has an IRI, and you use these IRIs to prepare an import table. For example, to import categories, the import table can be structured as follows:

| Id | Lemma |
|---|---|
| properties/categories/scifi | Science fiction |
| properties/categories/musical | Musical |
| properties/categories/drama | Drama |

In the example, the ID field is populated with IRI paths. Each IRI path consists of the target table (properties), the property type (categories) and a unique IRI fragment for the entity. The set of tables is fixed, whereas the types available within a table are defined in the types configuration of a database. The type used here refers to the movies sample database. The IRI fragment is an arbitrary identifier that can contain numbers or letters.
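As a minimal sketch, the category table can be generated with Python's standard csv module. The helper name iri_path, the lowercase column names and the comma delimiter are assumptions made for illustration; they are not part of Epigraf, so check them against what your import expects.

```python
import csv
import io


def iri_path(table: str, type_: str, fragment: str) -> str:
    """Join table, type and IRI fragment into an IRI path (hypothetical helper)."""
    return f"{table}/{type_}/{fragment}"


# IRI fragments and lemmas taken from the example table
categories = [
    ("scifi", "Science fiction"),
    ("musical", "Musical"),
    ("drama", "Drama"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# Lowercase column names are an assumption that follows the other import tables
writer.writerow(["id", "lemma"])
for fragment, lemma in categories:
    writer.writerow([iri_path("properties", "categories", fragment), lemma])

csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; replacing the buffer with an opened file produces a CSV file for upload.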

First, using IRI paths ensures that no duplicate entities are created when the same data is imported twice. The import function compares the IRIs with the database to determine whether an entity already exists. Entities with an existing IRI path are overwritten; otherwise, new entities are created.
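The effect of importing the same table twice can be illustrated with a toy model in which a plain dictionary stands in for the database. This is only a sketch of the described behavior, not Epigraf's actual implementation; the function import_rows and its summary keys are hypothetical.

```python
def import_rows(database: dict, rows: list) -> dict:
    """Overwrite entities whose IRI path already exists, create all others."""
    summary = {"created": 0, "overwritten": 0}
    for row in rows:
        key = row["id"]
        if key in database:
            summary["overwritten"] += 1
        else:
            summary["created"] += 1
        # In both cases the stored entity ends up with the imported values
        database[key] = dict(row)
    return summary


database = {}  # toy stand-in for the database, keyed by IRI path
rows = [
    {"id": "properties/categories/scifi", "lemma": "Science fiction"},
    {"id": "properties/categories/drama", "lemma": "Drama"},
]

first = import_rows(database, rows)   # creates both entities
second = import_rows(database, rows)  # finds both IRI paths and overwrites
print(first, second, len(database))
```

After the second run the toy database still holds two entities, which is the point of keying the import by IRI path.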

Second, IRI paths are used to link records to each other. For example, to import text into an article, the import table can be structured as follows:

| id | name | content | projects_id | articles_id | sections_id |
|---|---|---|---|---|---|
| projects/default/movies | Movies | | | | |
| articles/default/0001 | Chronicles of Narnia | | projects/default/movies | | |
| sections/text/0001 | Abstract | | projects/default/movies | articles/default/0001 | |
| items/text/0001 | | The Chronicles of Narnia is a series of films based on the novels by C.S. Lewis. | projects/default/movies | articles/default/0001 | sections/text/0001 |

In the example, an item entity with the description of a movie is created in the last line. The description ends up in an item of type text with the IRI fragment 0001. This item is inserted into the section with the IRI path sections/text/0001, which in turn is created in the article with the IRI path articles/default/0001.
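The linked table can be written with csv.DictWriter, which fills the cells a row does not use with empty strings. The sketch below also checks that every referenced IRI path is defined in the same table. That check is an illustrative convenience under the assumption that all link targets are part of the import; references to entities that already exist in the database would be reported as well, although they are legitimate.

```python
import csv
import io

columns = ["id", "name", "content", "projects_id", "articles_id", "sections_id"]

rows = [
    {"id": "projects/default/movies", "name": "Movies"},
    {"id": "articles/default/0001", "name": "Chronicles of Narnia",
     "projects_id": "projects/default/movies"},
    {"id": "sections/text/0001", "name": "Abstract",
     "projects_id": "projects/default/movies",
     "articles_id": "articles/default/0001"},
    {"id": "items/text/0001",
     "content": "The Chronicles of Narnia is a series of films "
                "based on the novels by C.S. Lewis.",
     "projects_id": "projects/default/movies",
     "articles_id": "articles/default/0001",
     "sections_id": "sections/text/0001"},
]

buffer = io.StringIO()
# restval="" leaves columns empty that are irrelevant for a row
writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
linked_csv = buffer.getvalue()

# Collect references that point to an IRI path not defined in this table
defined = {row["id"] for row in rows}
dangling = [
    (row["id"], column)
    for row in rows
    for column in ("projects_id", "articles_id", "sections_id")
    if row.get(column) and row[column] not in defined
]
print(linked_csv)
print("dangling references:", dangling)
```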

Such import tables following the RAM can be imported directly as CSV files. Alternatively, XML files mapping the very same table structure can be imported. The R and Python packages support the conversion of data into the RAM format and upload import tables directly from R or Python scripts with a single command.

Which fields are available?

In principle, all fields that are included in the entity export or that are documented in the development documentation are available for the import. All target tables share their fields when importing; for example, the name field holds the name of an article as well as the name of a section. Fields that are irrelevant for an entity, such as the content field for articles, simply remain empty.

For some fields, aliases can be used to keep the file clearer:

  • In the respective tables, the field type can be used instead of projecttype, articletype, sectiontype, itemtype, propertytype, usertype, scope or from_tagname.
  • In the items table, the field to_id can be used instead of links_tab and links_id. Usually it contains an IRI path of the target entity.
  • In the links and footnotes tables, the field root_id is sufficient instead of root_tab and root_id, provided that the table can be derived from the provided IRI path. The same applies to from_id and to_id.

See the aliases and their corresponding database fields as listed below.

Project import fields

| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | | |
| published | Publication state 0-4 | published | |
| type | Entity type | projecttype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| signature | Short title | signature | |
| name | Long title | name | |
| content | Project metadata in JSON format | description | |
| norm_data | Authority data | norm_data | |

Article import fields

| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | | |
| published | Publication state 0-4 | published | |
| type | Entity type | articletype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| signature | Article identifier (text) | signature | |
| name | Article title (text) | name | |
| status | Article status (text) | status | |
| norm_data | Authority data (text) | norm_data | |
| creator | Article author (IRI path) | created_by | |
| modifier | Article editor (IRI path) | modified_by | |
| project | Project (IRI path) | projects_id | |

Section import fields

| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | | |
| published | Publication state 0-4 | published | |
| type | Entity type | sectiontype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| number | Section number (number) | number | |
| name | Section name (text) | name | |
| signature | Alternative section name (text) | alias | |
| content | Section notes (text) | comment | |
| layout_cols | Number of columns in a grid (number) | layout_cols | |
| layout_rows | Number of rows in a grid (number) | layout_rows | |
| articles_id | Article (IRI path) | articles_id | |
| parent_id | Parent section (IRI path) | parent_id | |

Item import fields

| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | | |
| published | Publication state 0-4 | published | |
| type | Entity type | itemtype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| value | A single value (text) | value | |
| content | Text content | content | |
| translation | Translation text | translation | |
| property | Linked category (IRI path) | properties_id | |
| pos_x | Position in the grid (number) | pos_x | |
| pos_y | Position in the grid (number) | pos_y | |
| pos_z | Position in the grid (number) | pos_z | |
| sections_id | Section (IRI path) | sections_id | |
| articles_id | Article (IRI path) | articles_id | |

To be added: to_id, flagged, file_*, date_*, source_*

Types import fields

| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | | |
| type | Scope of the type configuration | scope | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| name | Type name | name | |
| caption | Type label (text) | caption | |
| mode | Mode (text) | mode | |
| category | Type category (text) | category | |
| description | Type description (text) | description | |
| config | Type configuration in JSON format | config | |

How are entities linked to each other?

Projects contain articles, articles consist of sections and sections contain items. The items in turn refer to properties. The link between all these entities is established during import via IDs. IDs can be created in three different ways:

  • IRI paths (Internationalized Resource Identifiers) are particularly flexible and recommended, as they allow data transfer between different databases without knowing the internal database IDs. They are formed according to the scheme <table>/<type>/<irifragment>. Example: properties/languages/iso-de-de.
  • Database IDs must correspond to an existing entity. They are used to overwrite existing data or to refer to existing data. These IDs are formed according to the scheme <table>-<id>, where the placeholder <id> contains an existing numeric ID. Example: articles-1.
  • Temporary IDs are not imported into the database, but are only used for linking within a CSV file. They are formed according to the scheme <table>-tmp<id>, i.e. the table name is followed by the prefix "tmp" after a hyphen and then a custom name, which can be composed of any letters and numbers. When importing entities using temporary IDs, database-specific IDs are automatically created and used for all fields with the same temporary ID. Example: articles-tmp123.
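The three ID schemes can be told apart with simple patterns. The following sketch classifies an ID string according to the schemes described above. The regular expressions and the function classify_id are illustrative assumptions; in particular, the characters Epigraf actually accepts in table names and IRI fragments may differ.

```python
import re

# <table>/<type>/<irifragment>
IRI_PATH = re.compile(r"^(?P<table>[a-z_]+)/(?P<type>[^/]+)/(?P<fragment>[^/]+)$")
# <table>-tmp<id>, with a custom name made of letters and numbers
TEMP_ID = re.compile(r"^(?P<table>[a-z_]+)-tmp(?P<name>[A-Za-z0-9]+)$")
# <table>-<id>, with an existing numeric ID
DB_ID = re.compile(r"^(?P<table>[a-z_]+)-(?P<id>\d+)$")


def classify_id(value: str):
    """Return the kind of ID and its components, or ('unknown', {})."""
    patterns = (("iri", IRI_PATH), ("temporary", TEMP_ID), ("database", DB_ID))
    for kind, pattern in patterns:
        match = pattern.match(value)
        if match:
            return kind, match.groupdict()
    return "unknown", {}


for example in ("properties/languages/iso-de-de", "articles-1", "articles-tmp123"):
    print(example, "->", classify_id(example))
```

The IRI path pattern is tested first because IRI fragments such as iso-de-de may themselves contain hyphens.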

In general, IRI paths are suitable for both importing new data and updating existing data. Database IDs only work for updating existing data. Temporary IDs are rarely useful: they only suit one-time initial imports, because new entities are created each time the import is repeated. Here is an example of an import table with temporary IDs instead of IRI paths:

| id | articles_id | sections_id | name | content |
|---|---|---|---|---|
| articles-tmp1 | | | An article | |
| sections-tmp1 | articles-tmp1 | | A section | Comment on the section |
| items-tmp1 | articles-tmp1 | sections-tmp1 | | The content of the section |

In the example, the section row specifies not only its own ID but also the ID of the associated article. Likewise, the item row specifies the IDs of its article and its section. The entities are therefore linked to each other during import.
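What happens to temporary IDs during import can be pictured as a consistent substitution: each temporary ID is replaced by a newly created database ID wherever it occurs. The sketch below simulates this for the table above. The function resolve_temporary_ids and the numbering from 1 per table are assumptions for illustration, since the real IDs are assigned by the database.

```python
def resolve_temporary_ids(rows: list) -> list:
    """Replace each temporary ID with a new ID, consistently across all rows."""
    counters = {}  # last numeric ID handed out per table (toy numbering from 1)
    mapping = {}   # temporary ID -> newly created ID

    def resolve(value: str) -> str:
        if "-tmp" not in value:
            return value
        if value not in mapping:
            table = value.split("-tmp", 1)[0]
            counters[table] = counters.get(table, 0) + 1
            mapping[value] = f"{table}-{counters[table]}"
        return mapping[value]

    id_columns = ("id", "articles_id", "sections_id")
    return [
        {key: resolve(val) if key in id_columns else val for key, val in row.items()}
        for row in rows
    ]


rows = [
    {"id": "articles-tmp1", "articles_id": "", "sections_id": "",
     "name": "An article", "content": ""},
    {"id": "sections-tmp1", "articles_id": "articles-tmp1", "sections_id": "",
     "name": "A section", "content": "Comment on the section"},
    {"id": "items-tmp1", "articles_id": "articles-tmp1", "sections_id": "sections-tmp1",
     "name": "", "content": "The content of the section"},
]

resolved = resolve_temporary_ids(rows)
for row in resolved:
    print(row["id"], row["articles_id"], row["sections_id"])
```

Because the mapping only lives for the duration of one run, repeating the import creates new entities again, which is why temporary IDs only suit one-time imports.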

In the import preview, you can see whether IDs in an import can be resolved to existing data. All entities found in the database, based on IRI paths or IDs, are highlighted in green. Unmarked entities are new to the database and will be created.

How are entities updated instead of newly created?

If IRIs or database IDs are used (see above) and an entity with the same IRI or ID already exists, it is not created again, but overwritten. Two variants for specifying the IRI paths are supported:

  • A complete IRI path is given in the id column, for example "properties/languages/iso-de-de".
  • The components of the IRI path are given in the respective columns. The table name "properties" results from the table column (or from a temporary ID). In addition, the entity type "languages" must be specified in the types column so that the IRI path can be derived. The IRI fragment is given in the iri column, for example "iso-de-de".

Further behavior can optionally be controlled via the _action and _fields columns:

  • _action=clear: Entities contained in the current entity are deleted to make room for the following entities. For example, a section can be cleared before new items are imported in the following steps.
  • _action=skip: The entity is not imported, i.e. it is not overwritten and not created. The entity is only included in the import data to serve as intermediary link target or link source.
  • _action=link: The entity is only created if it does not exist yet. You can use this option, for example, to create non-existing properties without overwriting content of existing properties.
  • _fields: All tables in an import file share their columns. For example, both the comment of a section and the text of an item are imported from the content field. By default, an empty field clears the corresponding value in the database. To prevent this, list the fields to be considered in the _fields column, separated by commas (note: do not forget the ID fields). This way, an existing section comment is not overwritten while the transcription in an item entity is updated, even though both rows share the content column and the section row leaves it empty. If the _fields column is missing or empty, all fields are considered.
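The interplay of _action and _fields can be sketched for a single entity. The function apply_row below is a hypothetical model of the rules listed above and not Epigraf's implementation. It leaves out _action=clear, which affects contained entities rather than the entity itself.

```python
def apply_row(existing, row: dict):
    """Return the entity after importing one row (None means it does not exist)."""
    action = row.get("_action", "")
    if action == "skip":
        return existing  # neither created nor overwritten
    if action == "link" and existing is not None:
        return existing  # only created if it does not exist yet

    # Control columns start with an underscore and are not stored
    data = {key: val for key, val in row.items() if not key.startswith("_")}

    # A non-empty _fields column restricts the import to the listed fields
    listed = [name.strip() for name in row.get("_fields", "").split(",") if name.strip()]
    if listed:
        data = {key: val for key, val in data.items() if key in listed}

    updated = dict(existing or {})
    updated.update(data)  # empty values overwrite, i.e. clear, the stored field
    return updated


section = {"id": "sections/text/0001", "name": "Abstract", "content": "Editor's comment"}
row = {"id": "sections/text/0001", "name": "Abstract (revised)", "content": "",
       "_fields": "id,name"}

protected = apply_row(section, row)                   # comment is kept
cleared = apply_row(section, {**row, "_fields": ""})  # comment is cleared
print(protected["content"])
print(repr(cleared["content"]))
```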