Documentation

CSV files and XML files can be imported into the databases using the "Import" button in the footer of the pages. Alternatively, the API can be used to import data. An R package and a Python package simplify API operations. The import function can both create new articles, sections and categories and update existing entities.

To prepare data imports, a basic understanding of the data model will be helpful. Epigraf implements the Relational Article Model (RAM) to store documents in tables for projects, articles, sections, items, links, footnotes and properties.

When it comes to data import, the most crucial aspect of the RAM is its use of IRIs. IRIs are globally unique identifiers. In Epigraf databases, each entity has an IRI, and you use these IRIs to prepare an import table. For example, to import categories, the import table can be structured as follows:

Id	Lemma
properties/categories/scifi	Science fiction
properties/categories/musical	Musical
properties/categories/drama	Drama

In the example, the ID field is populated with IRI paths. Each IRI path consists of the target table (properties), the property type (categories) and a unique IRI fragment for the entity. The table name is fixed, whereas the types available within a table are defined in the types configuration of a database. The type used here refers to the movies sample database. The IRI fragment is an arbitrary identifier that can contain numbers or letters.
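The following is a minimal sketch of how such a category table could be generated with Python's standard csv module. The helper name iri_path, the sample dictionary and the comma delimiter are assumptions made for illustration; they are not part of Epigraf or its packages.

```python
import csv
import io


def iri_path(table: str, type_name: str, fragment: str) -> str:
    """Join table, type and IRI fragment into an IRI path."""
    return f"{table}/{type_name}/{fragment}"


# Hypothetical source data: IRI fragment -> lemma
categories = {
    "scifi": "Science fiction",
    "musical": "Musical",
    "drama": "Drama",
}

# Write the table to an in-memory buffer; the delimiter is an assumption
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
writer.writerow(["Id", "Lemma"])
for fragment, lemma in categories.items():
    writer.writerow([iri_path("properties", "categories", fragment), lemma])

import_table = buffer.getvalue()
print(import_table)
```

Writing to a file instead of a buffer only requires replacing io.StringIO() with an opened file handle.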

First, using IRI paths ensures that no new entities are created when the same data is imported twice. Instead, the import function compares the IRIs with the database to determine whether an entity already exists. Entities with an existing IRI path are overwritten; otherwise, new entities are created.
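The effect of importing the same table twice can be illustrated with a small simulation. This is a simplified model, not Epigraf's implementation: the database is represented by a plain set of IRI paths, and plan_import is a hypothetical helper.

```python
def plan_import(rows, existing_iris):
    """Decide for each row whether it creates or overwrites an entity."""
    return [
        (row["id"], "overwrite" if row["id"] in existing_iris else "create")
        for row in rows
    ]


rows = [
    {"id": "properties/categories/scifi", "lemma": "Science fiction"},
    {"id": "properties/categories/drama", "lemma": "Drama"},
]

database = set()  # stands in for the IRI paths already stored

first_run = plan_import(rows, database)
database.update(row["id"] for row in rows)  # the first import stores the entities
second_run = plan_import(rows, database)

print(first_run)
print(second_run)
```

In the first run every row is planned as a creation; in the second run the same rows are planned as overwrites, so the number of entities stays the same.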

Second, IRI paths are used to link records to each other. For example, to import text into an article, the import table can be structured as follows:

id	name	content	projects_id	articles_id	sections_id
projects/default/movies	Movies
articles/default/0001	Chronicles of Narnia		projects/default/movies
sections/text/0001	Abstract		projects/default/movies	articles/default/0001
items/text/0001		The Chronicles of Narnia is a series of films based on the novels by C.S. Lewis.	projects/default/movies	articles/default/0001	sections/text/0001

In the example, an item entity with the description of a movie is created in the last line. The description ends up in an item of type text with the IRI fragment 0001. This item is inserted into the section with the IRI path sections/text/0001, which in turn is created in the article with the IRI path articles/default/0001.
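As a sketch, the table above could be assembled from Python dictionaries with csv.DictWriter, which leaves columns empty when a row does not set them. The comma delimiter and the variable names are assumptions for illustration.

```python
import csv
import io

FIELDS = ["id", "name", "content", "projects_id", "articles_id", "sections_id"]

rows = [
    {"id": "projects/default/movies", "name": "Movies"},
    {
        "id": "articles/default/0001",
        "name": "Chronicles of Narnia",
        "projects_id": "projects/default/movies",
    },
    {
        "id": "sections/text/0001",
        "name": "Abstract",
        "projects_id": "projects/default/movies",
        "articles_id": "articles/default/0001",
    },
    {
        "id": "items/text/0001",
        "content": "The Chronicles of Narnia is a series of films "
                   "based on the novels by C.S. Lewis.",
        "projects_id": "projects/default/movies",
        "articles_id": "articles/default/0001",
        "sections_id": "sections/text/0001",
    },
]

buffer = io.StringIO()
# restval="" fills every column a row does not define with an empty string
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Each row only names the parents it belongs to, so the hierarchy is expressed entirely through the IRI paths in the projects_id, articles_id and sections_id columns.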

Such import tables following the RAM can be imported directly as CSV files. Alternatively, XML files mapping the very same table structure can be imported. The R and Python packages support the conversion of data into the RAM format and upload import tables directly from R or Python scripts with a single command.

In principle, all fields that are included in the entity export or that are documented in the development documentation are available for the import. All target tables share the same columns when importing; for example, the name column holds both the name of an article and the name of a section. Fields that are irrelevant for an entity, such as the content field for articles, simply remain empty.

For some fields, aliases can be used to keep the file clearer:

In the respective tables, the field type can be used instead of projecttype, articletype, sectiontype, itemtype, propertytype, usertype, scope or from_tagname.
In the items table, the field to_id can be used instead of links_tab and links_id. Usually it contains an IRI path of the target entity.
In the links and footnotes tables, the field root_id is sufficient instead of root_tab and root_id, provided that the table can be derived from the provided IRI path. The same applies to from_id and to_id.
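Conceptually, the type alias can be thought of as a lookup from the target table to the table-specific field. The sketch below illustrates that idea only; resolve_type_alias and the mapping are hypothetical, cover just five of the tables, and say nothing about how Epigraf resolves aliases internally.

```python
# Table-specific type fields named in the text (subset for illustration)
TYPE_FIELD = {
    "projects": "projecttype",
    "articles": "articletype",
    "sections": "sectiontype",
    "items": "itemtype",
    "properties": "propertytype",
}


def table_of(iri_path: str) -> str:
    """Return the table name, i.e. the first component of an IRI path."""
    return iri_path.split("/", 1)[0]


def resolve_type_alias(row: dict) -> dict:
    """Rename the shared 'type' column to the field of the row's table."""
    resolved = dict(row)
    if "type" in resolved:
        table = table_of(resolved["id"])
        resolved[TYPE_FIELD[table]] = resolved.pop("type")
    return resolved


article = resolve_type_alias({"id": "articles/default/0001", "type": "default"})
item = resolve_type_alias({"id": "items/text/0001", "type": "text"})
print(article)
print(item)
```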

The tables below list the aliases and their corresponding database fields for projects, articles, sections, items and types, in that order.

Alias	Explanation	Field in the data model
id	IRI path
published	Publication state 0-4	published
type	Entity type	projecttype
iri	IRI fragment	norm_iri
sortno	Sort number	sortno
signature	Short title	signature
name	Long title	name
content	Project metadata in JSON format	description
norm_data	Authority data	norm_data

Alias	Explanation	Field in the data model
id	IRI path
published	Publication state 0-4	published
type	Entity type	articletype
iri	IRI fragment	norm_iri
sortno	Sort number	sortno
signature	Article identifier (text)	signature
name	Article title (text)	name
status	Article status (text)	status
norm_data	Authority data (text)	norm_data
creator	Article author (IRI path)	created_by
modifier	Article editor (IRI path)	modified_by
project	Project (IRI path)	projects_id

Alias	Explanation	Field in the data model
id	IRI path
published	Publication state 0-4	published
type	Entity type	sectiontype
iri	IRI fragment	norm_iri
sortno	Sort number	sortno
number	Section number (number)	number
name	Section name (text)	name
signature	Alternative section name (text)	alias
content	Section notes (text)	comment
layout_cols	Number of columns in a grid (number)	layout_cols
layout_rows	Number of rows in a grid (number)	layout_rows
articles_id	Article (IRI path)	articles_id
parent_id	Parent section (IRI path)	parent_id

Alias	Explanation	Field in the data model
id	IRI path
published	Publication state 0-4	published
type	Entity type	itemtype
iri	IRI fragment	norm_iri
sortno	Sort number	sortno
value	A single value (text)	value
content	Text content	content
translation	Translation text	translation
property	Linked category (IRI path)	properties_id
pos_x	Position in the grid (number)	pos_x
pos_y	Position in the grid (number)	pos_y
pos_z	Position in the grid (number)	pos_z
sections_id	Section (IRI path)	sections_id
articles_id	Article (IRI path)	articles_id

To be added: to_id, flagged, file_*, date_*, source_*

Alias	Explanation	Field in the data model
id	IRI path
type	Scope of the type configuration	scope
iri	IRI fragment	norm_iri
sortno	Sort number	sortno
name	Type name	name
caption	Type label (text)	caption
mode	Mode (text)	mode
category	Type category (text)	category
description	Type description (text)	description
config	Type configuration in JSON format	config

Projects contain articles, articles consist of sections and sections contain items. The items in turn refer to properties. The link between all these entities is established during import via IDs. IDs can be created in three different ways:

IRI paths (Internationalized Resource Identifiers) are particularly flexible and recommended, as they allow data transfer between different databases without knowing the internal database IDs. They are formed according to the scheme <table>/<type>/<irifragment>. Example: properties/languages/iso-de-de.
Database IDs must correspond to an existing entity. They are used to overwrite existing data or to refer to existing data. These IDs are formed according to the scheme <table>-<id>, where the placeholder <id> contains an existing numeric ID. Example: articles-1.
Temporary IDs are not imported into the database, but are only used for linking within a CSV file. They are formed according to the scheme <table>-tmp<id>, i.e. the table name is followed by the prefix "tmp" after a hyphen and then a custom name, which can be composed of any letters and numbers. When importing entities using temporary IDs, database-specific IDs are automatically created and used for all fields with the same temporary ID. Example: articles-tmp123.
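The three schemes can be told apart by their shape. The following sketch classifies an ID string with regular expressions; the exact character sets allowed in each part are assumptions, and classify_id is a hypothetical helper rather than an Epigraf function.

```python
import re

# <table>/<type>/<irifragment>
IRI_PATH = re.compile(r"^(?P<table>[a-z]+)/(?P<type>[^/\s]+)/(?P<fragment>[^/\s]+)$")
# <table>-tmp<id>, where <id> is a custom name of letters and numbers
TEMPORARY_ID = re.compile(r"^(?P<table>[a-z]+)-tmp(?P<name>[A-Za-z0-9]+)$")
# <table>-<id>, where <id> is an existing numeric ID
DATABASE_ID = re.compile(r"^(?P<table>[a-z]+)-(?P<id>\d+)$")


def classify_id(value: str) -> str:
    """Return which of the three ID schemes a value follows."""
    if IRI_PATH.match(value):
        return "iri_path"
    if TEMPORARY_ID.match(value):
        return "temporary_id"
    if DATABASE_ID.match(value):
        return "database_id"
    return "unknown"


for example in ["properties/languages/iso-de-de", "articles-1", "articles-tmp123"]:
    print(example, classify_id(example))
```

A check like this can be run over the id and *_id columns before uploading a table, to catch values that follow none of the schemes.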

In general, IRI paths are suitable both for importing new data and for updating existing data. Database IDs only work for updating existing data. Temporary IDs are rarely useful: they only suit one-time initial imports, because new entities are created each time the import process is repeated. Here is an example of an import table with temporary IDs instead of IRI paths:

id	articles_id	sections_id	name	content
articles-tmp1			An article
sections-tmp1	articles-tmp1		A section	Comment on the section
items-tmp1	articles-tmp1	sections-tmp1		The content of the section

In the example, the section row specifies not only its own ID but also the ID of the associated article; likewise, the item row references both its article and its section. The entities are therefore linked to each other during import.
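How temporary IDs turn into database IDs can be sketched with a small simulation. It is a simplified model under stated assumptions: IDs are handed out per table starting from an arbitrary first_id, and resolve_temporary_ids is a hypothetical helper, not part of Epigraf.

```python
import re
from collections import defaultdict

TMP_ID = re.compile(r"^([a-z]+)-tmp([A-Za-z0-9]+)$")


def resolve_temporary_ids(rows, first_id=1):
    """Replace each temporary ID with a newly assigned <table>-<id> value."""
    next_id = defaultdict(lambda: first_id)  # next free numeric ID per table
    assigned = {}                            # temporary ID -> assigned ID
    resolved_rows = []
    for row in rows:
        resolved = {}
        for field, value in row.items():
            is_id_field = field == "id" or field.endswith("_id")
            match = TMP_ID.match(value) if is_id_field else None
            if match:
                if value not in assigned:
                    table = match.group(1)
                    assigned[value] = f"{table}-{next_id[table]}"
                    next_id[table] += 1
                value = assigned[value]  # same temporary ID, same new ID
            resolved[field] = value
        resolved_rows.append(resolved)
    return resolved_rows


rows = [
    {"id": "articles-tmp1", "articles_id": "", "sections_id": "",
     "name": "An article", "content": ""},
    {"id": "sections-tmp1", "articles_id": "articles-tmp1", "sections_id": "",
     "name": "A section", "content": "Comment on the section"},
    {"id": "items-tmp1", "articles_id": "articles-tmp1", "sections_id": "sections-tmp1",
     "name": "", "content": "The content of the section"},
]

resolved = resolve_temporary_ids(rows, first_id=100)
for row in resolved:
    print(row["id"], row["articles_id"], row["sections_id"])
```

Because every run assigns fresh IDs, repeating the import would create the article, section and item again, which is why temporary IDs only suit one-time imports.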

In the import preview, you can see whether IDs in an import can be resolved to existing data. All entities found in the database, based on IRI paths or IDs, are highlighted in green. Unmarked entities are new to the database and will be created.

If IRIs or database IDs are used (see above) and an entity with the same IRI or ID already exists, it is not created again but overwritten. IRI paths can be specified in two ways:

A complete IRI path is given in the id column, for example "properties/languages/iso-de-de".
The components of the IRI path are given in the respective columns. The table name "properties" results from the table column (or from a temporary ID). In addition, the entity type "languages" must be specified in the types column so that the IRI path can be derived. The IRI fragment is given in the iri column, for example "iso-de-de".

Further behavior can optionally be controlled via the _action and _fields columns:

_action=clear: Entities contained in the current entity are deleted to make room for the following entities. For example, a section can be cleared before new items are imported in the following steps.
_action=skip: The entity is not imported, i.e. it is neither overwritten nor created. The entity is only included in the import data to serve as an intermediate link target or link source.
_action=link: The entity is only created if it does not exist yet. You can use this option, for example, to create non-existing properties without overwriting content of existing properties.
_fields: All tables in an import file share the same columns. This is why the comment of a section and the text of an item are both imported from the content field. By default, an empty field clears the corresponding value in the database. To restrict an import to certain fields, list them in the _fields column, separated by commas (note: do not forget the ID fields). For example, this ensures that a section comment is not overwritten while a transcription in an item entity is updated, even though the import file contains a shared content column whose empty value would otherwise clear the section comment. If the _fields column is missing or empty, all fields are considered.
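The interplay of _fields with the skip and link actions can be modelled in a few lines. This is a simplified sketch of the described behavior, not Epigraf's implementation: apply_row is a hypothetical helper, entities are plain dictionaries keyed by alias names, and the clear action is left out because it affects contained entities.

```python
def apply_row(existing, row):
    """Return the entity as it would look after importing one row.

    existing is the stored entity as a dict, or None if there is none.
    """
    action = row.get("_action", "")
    if action == "skip":
        return existing  # neither created nor overwritten
    if action == "link" and existing is not None:
        return existing  # only created if it does not exist yet

    # Drop the control columns, then restrict to _fields if it is set
    data = {key: value for key, value in row.items() if not key.startswith("_")}
    selected = [name.strip() for name in row.get("_fields", "").split(",") if name.strip()]
    if selected:
        data = {key: value for key, value in data.items() if key in selected}

    updated = dict(existing or {})
    updated.update(data)  # empty values overwrite, i.e. clear, stored values
    return updated


section = {"id": "sections/text/0001", "name": "Abstract", "content": "Existing comment"}
row = {"id": "sections/text/0001", "name": "Summary", "content": ""}

unrestricted = apply_row(section, row)
restricted = apply_row(section, {**row, "_fields": "id,name"})
print(unrestricted)
print(restricted)
```

Without _fields the empty content column clears the stored comment; with _fields set to id,name the comment survives while the name is updated.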

The Relational Article Model (RAM)

Which fields are available?

Project import fields

Article import fields

Section import fields

Item import fields

Types import fields

How are entities linked to each other?

How are entities updated instead of newly created?