Import Data
CSV files and XML files can be imported into the databases using the "Import" button in the footer of the pages. Alternatively, the API can be used to import data. An R package and a Python package simplify API operations. The import function can both create new articles, sections and categories and update existing entities.
The Relational Article Model (RAM)
To prepare data imports, a basic understanding of the data model will be helpful. Epigraf implements the Relational Article Model (RAM) to store documents in tables for projects, articles, sections, items, links, footnotes and properties.
When it comes to data import, the most crucial aspect of the RAM is its IRIs. IRIs are globally unique identifiers. In Epigraf databases, each entity has an IRI, and you use these IRIs to prepare an import table. For example, to import categories, the import table can be structured as follows:
| Id | Lemma |
|---|---|
| properties/categories/scifi | Science fiction |
| properties/categories/musical | Musical |
| properties/categories/drama | Drama |
In the example, the Id field is populated with IRI paths. Each IRI path consists of the target table "properties", the property type "categories" and a unique IRI fragment for the entity. The table names are fixed. Which types are available within a table is defined in the types configuration of a database. The type used here refers to the movies sample database. The IRI fragment is an arbitrary identifier that can contain numbers or letters.
First, IRI paths ensure that no duplicate entities are created when the same data is imported twice: the import function compares the IRIs to the database to determine whether an entity already exists. Entities with an existing IRI path are overwritten; otherwise, new entities are created.
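The overwrite-or-create behavior can be pictured as a lookup keyed by IRI path. The following is a minimal sketch, not Epigraf's actual import code: the dictionary `database` and the function `import_rows` are hypothetical stand-ins for the real tables and the import function.

```python
def import_rows(database, rows):
    """Overwrite entities whose IRI path exists, create the others.

    `database` is a plain dict keyed by IRI path (a stand-in for the
    real tables); `rows` is a list of dicts with an "id" key.
    Returns the lists of created and overwritten IRI paths.
    """
    created, overwritten = [], []
    for row in rows:
        iri_path = row["id"]
        if iri_path in database:
            overwritten.append(iri_path)
        else:
            created.append(iri_path)
        # In both cases the imported row becomes the stored entity.
        database[iri_path] = dict(row)
    return created, overwritten


categories = [
    {"id": "properties/categories/scifi", "lemma": "Science fiction"},
    {"id": "properties/categories/musical", "lemma": "Musical"},
    {"id": "properties/categories/drama", "lemma": "Drama"},
]

database = {}
first_run = import_rows(database, categories)   # creates three entities
second_run = import_rows(database, categories)  # overwrites the same three
print(len(database))  # 3
```

Running the import twice leaves three entities in the stand-in database, which is the property that makes repeated imports safe.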
Second, IRI paths are used to link records to each other. For example, to import text into an article, the import table can be structured as follows:
| id | name | content | projects_id | articles_id | sections_id |
|---|---|---|---|---|---|
| projects/default/movies | Movies | | | | |
| articles/default/0001 | Chronicles of Narnia | | projects/default/movies | | |
| sections/text/0001 | Abstract | | projects/default/movies | articles/default/0001 | |
| items/text/0001 | | The Chronicles of Narnia is a series of films based on the novels by C.S. Lewis. | projects/default/movies | articles/default/0001 | sections/text/0001 |
In the example, an item entity with the description of a movie is created in the last row. The description ends up in an item of type text with the IRI fragment 0001. This item is inserted into the section with the IRI path sections/text/0001, which in turn is created in the article with the IRI path articles/default/0001.
Such import tables following the RAM can be imported directly as CSV files. Alternatively, XML files mapping the very same table structure can be imported. The R and Python packages support the conversion of data into the RAM format and upload import tables directly from R or Python scripts with a single command.
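As an illustration, the article example above can be written to CSV with Python's standard library alone. This is a sketch under assumptions: it does not use the Epigraf Python package, the helper `to_csv` is hypothetical, and the comma delimiter is the `csv` module default rather than a documented Epigraf requirement.

```python
import csv
import io

FIELDS = ["id", "name", "content", "projects_id", "articles_id", "sections_id"]

rows = [
    {"id": "projects/default/movies", "name": "Movies"},
    {"id": "articles/default/0001", "name": "Chronicles of Narnia",
     "projects_id": "projects/default/movies"},
    {"id": "sections/text/0001", "name": "Abstract",
     "projects_id": "projects/default/movies",
     "articles_id": "articles/default/0001"},
    {"id": "items/text/0001",
     "content": "The Chronicles of Narnia is a series of films based on "
                "the novels by C.S. Lewis.",
     "projects_id": "projects/default/movies",
     "articles_id": "articles/default/0001",
     "sections_id": "sections/text/0001"},
]


def to_csv(rows, fieldnames=FIELDS):
    """Serialize rows to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Each row only sets the fields relevant to its entity; the remaining columns are written as empty cells, so all four entities fit into one shared table.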
Which fields are available?
In principle, all fields that are included in the entity export or that are documented in the development documentation are available for the import. All target tables share the columns of the import table; for example, the name column holds both the name of an article and the name of a section. Fields that are irrelevant for an entity – such as the content field for articles – simply remain empty.
For some fields, aliases can be used to keep the file clearer:
- In the respective tables, the field type can be used instead of projecttype, articletype, sectiontype, itemtype, propertytype, usertype, scope or from_tagname.
- In the items table, the field to_id can be used instead of links_tab and links_id. Usually it contains an IRI path of the target entity.
- In the links and footnotes tables, the field root_id is sufficient instead of root_tab and root_id, provided that the table can be derived from the provided IRI path. The same applies to from_id and to_id.
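How the type alias maps to a table-specific field can be sketched as follows. The function `resolve_type_alias` is hypothetical, and the mapping covers only the tables whose field lists appear in this document; how Epigraf resolves aliases internally is not shown here.

```python
# Mapping taken from the import field tables in this document;
# other tables (e.g. properties or users) are left out of this sketch.
TYPE_FIELDS = {
    "projects": "projecttype",
    "articles": "articletype",
    "sections": "sectiontype",
    "items": "itemtype",
    "types": "scope",
}


def resolve_type_alias(row):
    """Return a copy of row with `type` renamed to the table-specific field.

    The table is derived from the first segment of the IRI path in `id`.
    """
    table = row["id"].split("/", 1)[0]
    resolved = dict(row)
    if "type" in resolved and table in TYPE_FIELDS:
        resolved[TYPE_FIELDS[table]] = resolved.pop("type")
    return resolved


print(resolve_type_alias({"id": "sections/text/0001", "type": "text"}))
# {'id': 'sections/text/0001', 'sectiontype': 'text'}
```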
The aliases and their corresponding database fields are listed below.
Project import fields
| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | ||
| published | Publication state 0 to 4. | published | |
| type | Entity type | projecttype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| signature | Short title | signature | |
| name | Long title | name | |
| content | Project metadata in JSON format | description | |
| norm_data | Authority data | norm_data | |
Article import fields
| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | ||
| published | Publication state 0-4 | published | |
| type | Entity type | articletype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| signature | Article identifier (text) | signature | |
| name | Article title (text) | name | |
| status | Article status (text) | status | |
| norm_data | Authority data (text) | norm_data | |
| creator | Article author (IRI path) | created_by | |
| modifier | Article editor (IRI path) | modified_by | |
| project | Project (IRI path) | projects_id | |
Section import fields
| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | ||
| published | Publication state 0-4 | published | |
| type | Entity type | sectiontype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| number | Section number (number) | number | |
| name | Section name (text) | name | |
| signature | Alternative section name (text) | alias | |
| content | Section notes (text) | comment | |
| layout_cols | Number of columns in a grid (number) | layout_cols | |
| layout_rows | Number of rows in a grid (number) | layout_rows | |
| articles_id | Article (IRI path) | articles_id | |
| parent_id | Parent section (IRI path) | parent_id | |
Item import fields
| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | ||
| published | Publication state 0-4 | published | |
| type | Entity type | itemtype | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| value | A single value (text) | value | |
| content | Text content | content | |
| translation | Translation text | translation | |
| property | Linked category (IRI path) | properties_id | |
| pos_x | Position in the grid (number) | pos_x | |
| pos_y | Position in the grid (number) | pos_y | |
| pos_z | Position in the grid (number) | pos_z | |
| sections_id | Section (IRI path) | sections_id | |
| articles_id | Article (IRI path) | articles_id | |
To be added: to_id, flagged, file_*, date_*, source_*
Types import fields
| Alias | Explanation | Field in the data model | Example |
|---|---|---|---|
| id | IRI path | ||
| type | Scope of the type configuration | scope | |
| iri | IRI fragment | norm_iri | |
| sortno | Sort number | sortno | |
| name | Type name | name | |
| caption | Type label (text) | caption | |
| mode | Mode (text) | mode | |
| category | Type category (text) | category | |
| description | Type description (text) | description | |
| config | Type configuration in JSON format | config | |
How are entities linked to each other?
Projects contain articles, articles consist of sections and sections contain items. The items in turn refer to properties. The link between all these entities is established during import via IDs. IDs can be created in three different ways:
- IRI paths (Internationalized Resource Identifiers) are particularly flexible and recommended, as they allow data transfer between different databases without knowing the internal database IDs. They are formed according to the scheme <table>/<type>/<irifragment>. Example: properties/languages/iso-de-de.
- Database IDs must correspond to an existing entity. They are used to overwrite existing data or to refer to existing data. These IDs are formed according to the scheme <table>-<id>, where the placeholder <id> contains an existing numeric ID. Example: articles-1.
- Temporary IDs are not imported into the database, but are only used for linking within a CSV file. They are formed according to the scheme <table>-tmp<id>, i.e. the table name is followed by the prefix "tmp" after a hyphen and then a custom name, which can be composed of any letters and numbers. When importing entities using temporary IDs, database-specific IDs are automatically created and used for all fields with the same temporary ID. Example: articles-tmp123.
In general, IRI paths are suitable for both importing new data and updating existing data. Database IDs only work for updating existing data. Temporary IDs are rarely useful: they are suited only to one-time initial imports because new entities are created each time the import process is repeated. Here is an example of an import table with temporary IDs instead of IRI paths:
| id | articles_id | sections_id | name | content |
|---|---|---|---|---|
| articles-tmp1 | | | An article | |
| sections-tmp1 | articles-tmp1 | | A section | Comment on the section |
| items-tmp1 | articles-tmp1 | sections-tmp1 | | The content of the section |
In the example, the section row specifies not only its own ID but also the ID of the associated article; likewise, the item row specifies its own ID along with the IDs of its article and section. The entities are therefore linked to each other during import.
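The replacement of temporary IDs can be sketched as a consistent renaming across all ID columns. This is an illustration only: the function `resolve_temporary_ids`, the single shared counter and the starting number 1000 are hypothetical, and the real database assigns its own IDs.

```python
from itertools import count

REFERENCE_COLUMNS = ("id", "articles_id", "sections_id")


def resolve_temporary_ids(rows, start=1000):
    """Replace temporary IDs with freshly numbered IDs, consistently across rows."""
    next_id = count(start)
    assigned = {}  # temporary ID -> new "<table>-<number>" ID
    resolved_rows = []
    for row in rows:
        resolved = dict(row)
        for column in REFERENCE_COLUMNS:
            value = resolved.get(column, "")
            table, sep, rest = value.partition("-tmp")
            if sep and rest:
                # The same temporary ID always maps to the same new ID.
                if value not in assigned:
                    assigned[value] = f"{table}-{next(next_id)}"
                resolved[column] = assigned[value]
        resolved_rows.append(resolved)
    return resolved_rows


rows = [
    {"id": "articles-tmp1", "name": "An article"},
    {"id": "sections-tmp1", "articles_id": "articles-tmp1",
     "name": "A section", "content": "Comment on the section"},
    {"id": "items-tmp1", "articles_id": "articles-tmp1",
     "sections_id": "sections-tmp1", "content": "The content of the section"},
]
resolved = resolve_temporary_ids(rows)
for row in resolved:
    print(row["id"], row.get("articles_id", ""), row.get("sections_id", ""))
```

Because the mapping starts empty on every run, repeating the import assigns new IDs again, which is why temporary IDs create new entities each time.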
In the import preview, you can see whether IDs in an import can be resolved to existing data. All entities found in the database, based on IRI paths or IDs, are highlighted in green. Unmarked entities are new to the database and will be created.
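The three ID schemes described in this section can be told apart by their shape. The patterns below are assumptions made for this sketch (for instance, that table names consist of lowercase letters and underscores); the rules Epigraf applies may differ, and `classify_id` is a hypothetical helper.

```python
import re

# Assumed patterns: table names are lowercase letters or underscores;
# the exact rules Epigraf applies may be stricter or looser.
IRI_PATH = re.compile(r"^(?P<table>[a-z_]+)/(?P<type>[^/]+)/(?P<fragment>[^/]+)$")
TEMP_ID = re.compile(r"^(?P<table>[a-z_]+)-tmp(?P<name>[A-Za-z0-9]+)$")
DATABASE_ID = re.compile(r"^(?P<table>[a-z_]+)-(?P<id>[0-9]+)$")


def classify_id(value):
    """Return (kind, table) for an ID string, or (None, None) if no scheme matches."""
    schemes = (("iri", IRI_PATH), ("temporary", TEMP_ID), ("database", DATABASE_ID))
    for kind, pattern in schemes:
        match = pattern.match(value)
        if match:
            return kind, match.group("table")
    return None, None


for example in ("properties/languages/iso-de-de", "articles-1", "articles-tmp123"):
    print(example, classify_id(example))
```

A check like this can be run over an import table before uploading to catch malformed IDs early.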
How are entities updated instead of newly created?
If IRIs or database IDs are used (see above) and an entity with the same IRI or ID already exists, it is not created again but overwritten. Two ways of specifying IRI paths are supported:
- A complete IRI path is given in the id column, for example "properties/languages/iso-de-de".
- The components of the IRI path are given in the respective columns. The table name "properties" results from the table column (or from a temporary ID). In addition, the entity type "languages" must be specified in the types column so that the IRI path can be derived. The IRI fragment is given in the iri column, for example "iso-de-de".
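Both variants lead to the same IRI path, as the following sketch shows. The function `iri_path_from_row` is hypothetical, and it reads the entity type from a column named type, matching the alias in the field tables; adjust the column name if your import file uses a different one.

```python
def iri_path_from_row(row):
    """Return the IRI path of a row, from `id` or from its components."""
    identifier = row.get("id", "")
    if identifier.count("/") == 2:
        return identifier  # variant 1: complete IRI path in the id column
    # Variant 2: table, type and IRI fragment in separate columns.
    # The table may also be taken from a temporary ID such as "properties-tmp1".
    table = row.get("table") or identifier.split("-", 1)[0]
    entity_type = row.get("type")
    fragment = row.get("iri")
    if not (table and entity_type and fragment):
        return None  # the IRI path cannot be derived
    return f"{table}/{entity_type}/{fragment}"


print(iri_path_from_row({"id": "properties/languages/iso-de-de"}))
print(iri_path_from_row({"table": "properties", "type": "languages", "iri": "iso-de-de"}))
```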
Further behavior can optionally be controlled via the _action and _fields columns:
- _action=clear: Entities contained in the current entity are deleted to make room for the following entities. For example, a section can be cleared before new items are imported in the following steps.
- _action=skip: The entity is not imported, i.e. it is not overwritten and not created. The entity is only included in the import data to serve as intermediary link target or link source.
- _action=link: The entity is only created if it does not exist yet. You can use this option, for example, to create non-existing properties without overwriting content of existing properties.
- _fields: All tables in an import file share the columns. This is how both the comment of a section and the text of an item are imported from the content field. By default, an empty field in the import file clears the corresponding field in the database. To prevent this, the _fields column specifies which fields should be taken into account. List all fields to be considered, separated by commas (note: do not forget the ID fields). For example, this ensures that a comment for a section is not overwritten while a transcription in an item entity is updated, even though the import file contains a shared content column and an empty comment field would otherwise clear the comment of the section. If the _fields column is missing or empty, all fields are considered.
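The effect of the _action and _fields columns on a single row can be sketched as follows. This is a simplified model, not Epigraf's implementation: `apply_import_row` is a hypothetical function, entities are plain dicts, and _action=clear is left out because it affects contained entities rather than the row itself.

```python
def apply_import_row(entity, row):
    """Return the entity as it would look after importing `row`.

    `entity` is the existing record, or None if it does not exist yet.
    """
    action = row.get("_action", "")
    if action == "skip":
        return entity  # neither created nor overwritten
    if action == "link" and entity is not None:
        return entity  # existing entities stay untouched

    # Columns starting with an underscore control the import; they are not data.
    data = {key: value for key, value in row.items() if not key.startswith("_")}
    listed = [name.strip() for name in row.get("_fields", "").split(",") if name.strip()]
    if listed:
        data = {key: value for key, value in data.items() if key in listed}

    updated = dict(entity or {})
    updated.update(data)  # empty values overwrite, i.e. clear, existing content
    return updated


section = {"id": "sections/text/0001", "name": "Abstract", "content": "Editor's comment"}
row = {"id": "sections/text/0001", "name": "Summary", "content": ""}

cleared = apply_import_row(section, row)                        # comment is cleared
kept = apply_import_row(section, {**row, "_fields": "id,name"})  # comment is kept
print(repr(cleared["content"]), repr(kept["content"]))
```

Listing only id and name in _fields updates the section name while leaving its comment untouched, even though the shared content column is empty for that row.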