Documentation

Pipelines are used to transform data from the relational article model into other data formats. For example, they are used to create Word files, TEI documents, or HTML content for a website.

After constructing a pipeline, apply it to a selection of articles by clicking the export button. The export button is available in the footer of each article and in the footer of the articles table. To export an entire project, use the filters in the articles table to narrow the article list down to that project, then select the resulting articles.

Pipelines usually start with RAM data in XML format that is transformed by XSLT stylesheets. For complex output formats such as DOC or ODT files, several transformation steps can be processed in succession. To inspect the initial RAM data, a) open a single article as XML, b) use the default data pipeline with the full entity export option, or c) construct a simple pipeline that generates an XML document without any transformation. The XSLT stylesheets required to transform the raw data are usually stored in the pipelines folder.

Pipelines contain a sequence of tasks that are processed one after another. There are four types of tasks:

Retrieval tasks: A pipeline begins by querying article data and writing it to one or multiple intermediary files.
Preparation tasks: Intermediary files are merged into a common file or folder.
Transformation tasks: Intermediary files are transformed into the target format using XSLT stylesheets.
Output tasks: The resulting files are prepared for download.
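The sequential processing of tasks described above can be sketched in Python. This is not Epigraf's implementation: the job folder is modelled as a plain dictionary of file names to contents, and the task functions `retrieve` and `transform` are hypothetical stand-ins (the transformation step just wraps the text instead of running an XSLT stylesheet).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption for illustration: the job folder is modelled as a dict
# that maps file names to file contents.
JobFolder = Dict[str, str]

TASK_TYPES = ("retrieval", "preparation", "transformation", "output")


@dataclass
class Task:
    kind: str                         # one of TASK_TYPES
    run: Callable[[JobFolder], None]  # reads and writes files in the job folder


@dataclass
class Pipeline:
    tasks: List[Task] = field(default_factory=list)

    def execute(self) -> JobFolder:
        folder: JobFolder = {}
        # Tasks run one after another; each task sees the files
        # written by all preceding tasks.
        for task in self.tasks:
            if task.kind not in TASK_TYPES:
                raise ValueError(f"unknown task type: {task.kind}")
            task.run(folder)
        return folder


def retrieve(folder: JobFolder) -> None:
    # Hypothetical retrieval task: write article data to the default job file.
    folder["job_1234.xml"] = "<article>Text</article>"


def transform(folder: JobFolder) -> None:
    # Stand-in for an XSLT step: wrap the article in an HTML body.
    folder["job_1234.html"] = "<body>" + folder["job_1234.xml"] + "</body>"


pipeline = Pipeline([Task("retrieval", retrieve), Task("transformation", transform)])
result = pipeline.execute()
print(result["job_1234.html"])  # <body><article>Text</article></body>
```

Swapping the two tasks would raise a `KeyError`, because the transformation would look for a file that no retrieval task has written yet. This is why the order of tasks in a pipeline matters.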

UTF-8 is used as encoding in all tasks. Non-printable characters such as the Unit Separator control character are filtered out.
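A minimal sketch of the encoding and filtering step, assuming that all Unicode control characters (category `Cc`) except tab, line feed and carriage return count as non-printable. The exact set of characters Epigraf removes is not specified here, and the function name `clean_text` is hypothetical.

```python
import unicodedata

# Assumption: "non-printable" means Unicode control characters (category Cc),
# except the whitespace controls tab, line feed and carriage return.
KEEP = {"\t", "\n", "\r"}


def clean_text(raw: bytes) -> str:
    """Decode UTF-8 input and drop non-printable control characters."""
    text = raw.decode("utf-8")
    return "".join(ch for ch in text if ch in KEEP or unicodedata.category(ch) != "Cc")


# U+001F is the Unit Separator control character.
raw = "Grab\u001fplatte\n".encode("utf-8")
print(repr(clean_text(raw)))  # 'Grabplatte\n'
```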

The following retrieval tasks are available:

Options: Adds an options object to the output file. Options can be selected by a user when starting the export job.
Job parameters: Adds the job entity to the output file. The job contains, for example, the current date and time and information about the server.
Project data: Adds the project entity to the output file if a project was selected when starting the export job.
Article data: Adds the selected article entities to the output file.
Index data: Creates an index from all categories (=properties) used in the exported articles and adds it to the output file. The index only contains properties used in articles that were processed by preceding pipeline tasks, so place this task after the article data tasks.
Property data: Adds all categories of a selected category system (=property type). For example, the task can be used to export a complete list of literature stored as categories, regardless of whether any article cites it.
Types data: Adds the types configuration to the output file. This allows you, for example, to use the settings of specific property types when processing exported indexes, or to use link settings when rendering annotations.
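Why the index data task has to follow the article data tasks can be illustrated with a small sketch. The data model (articles listing properties with a `type` and a `lemma`) and the function `build_index` are hypothetical simplifications, not the structure of the exported RAM data.

```python
from typing import Dict, List, Set

# Simplified, hypothetical data model: every article lists the properties
# (categories) it uses, each with a property type and a lemma.
Article = Dict[str, List[Dict[str, str]]]


def build_index(processed: List[Article]) -> Dict[str, Set[str]]:
    """Collect the properties used by the articles processed so far, grouped by property type."""
    index: Dict[str, Set[str]] = {}
    for article in processed:
        for prop in article["properties"]:
            index.setdefault(prop["type"], set()).add(prop["lemma"])
    return index


articles: List[Article] = [
    {"properties": [{"type": "materials", "lemma": "sandstone"}]},
    {"properties": [{"type": "materials", "lemma": "bronze"},
                    {"type": "locations", "lemma": "Niederhausen"}]},
]

# Wrong order: the index task runs before any article data was processed.
processed: List[Article] = []
too_early = build_index(processed)

# Correct order: article data task first, then the index data task.
processed.extend(articles)
index = build_index(processed)

print(too_early)                   # {}
print(sorted(index["materials"]))  # ['bronze', 'sandstone']
```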

Exports are based on the articles selected when the job is started. Some data tasks provide additional settings:

Option	Description
Article types / scopes	The article types can be restricted using a comma-separated list, for example "epi-article". In the types data task, the scope can be restricted, for example to "properties".
All articles in selected projects	Also export articles that were not selected directly but belong to the selected project. You should restrict the article types as well; otherwise, all articles in the project are exported. The option is intended for including generic articles of a project. For example, a specific article type can hold the introduction of a printed volume that should be exported along with the other articles of the project.
Copy images	Copy images of the selected articles to the output folder.
Image item types	If article images are to be copied, you must provide a comma-separated list of item types containing the image file names.
Image metadata configuration	Based on the item type configuration, metadata is written into the image files. In addition, you can add metadata keys in the pipeline that complement or overwrite the item type settings.
Image folder in the current job folder	The target folder for images. By default, images are copied to the "images" folder within the job's folder.
Output file	By default, all article data is written to a single default job file (e.g. job_1234.xml). You can define a filename explicitly and refer to the file in subsequent pipeline tasks. To split the output into separate files, use placeholders in curly brackets in the output filename; the placeholders are replaced by entity data. Example: "article-{signature}.xml"
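The following sketch illustrates how a comma-separated article type restriction and an output filename with placeholders could decide which articles end up in which file. The field names `type` and `signature`, the default file name and the function `assign_output_files` are assumptions for illustration, not Epigraf's code.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches placeholders such as {signature} in the output filename.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def parse_list(setting: str) -> List[str]:
    """Split a comma-separated setting such as "epi-article, epi-intro"."""
    return [item.strip() for item in setting.split(",") if item.strip()]


def assign_output_files(articles: List[Dict[str, str]], output_file: str = "",
                        article_types: str = "",
                        default_file: str = "job_1234.xml") -> Dict[str, List[str]]:
    """Map each output file name to the signatures of the articles written to it."""
    allowed = parse_list(article_types)
    files: Dict[str, List[str]] = defaultdict(list)
    for article in articles:
        # An empty type setting means: no restriction.
        if allowed and article["type"] not in allowed:
            continue
        if output_file:
            # Placeholders are replaced by entity data, which splits the output.
            name = PLACEHOLDER.sub(lambda m: article[m.group(1)], output_file)
        else:
            name = default_file  # everything goes into one default job file
        files[name].append(article["signature"])
    return dict(files)


articles = [
    {"type": "epi-article", "signature": "di34-13"},
    {"type": "epi-article", "signature": "di34-14"},
    {"type": "epi-intro", "signature": "di34-intro"},
]
print(assign_output_files(articles))
# {'job_1234.xml': ['di34-13', 'di34-14', 'di34-intro']}
print(assign_output_files(articles, "article-{signature}.xml", "epi-article"))
# {'article-di34-13.xml': ['di34-13'], 'article-di34-14.xml': ['di34-14']}
```

A placeholder whose value is the same for several articles would collect those articles in one file, so the choice of placeholder determines how finely the output is split.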

The following preparation tasks are available:

Bundle files: Concatenate all files in a folder into a single output file.
Copy files: Copy files to the job folder, for example, a template for ODT files.

Depending on the task, further options are available:

Option	Description
Source root folder	The folder that contains the source files. Bundling is restricted to files in the job folder.
Source folder or file within the root folder	Either a file path or a folder path relative to the root folder. If nothing is specified, the bundle task concatenates all files in the job folder.
File with list of filenames	Optional. If the source is a folder, the file list can be filtered by providing a file with one file name per row. You could generate such a file list in an earlier stage of the pipeline.
Prefix and postfix	For file bundling, text that is added to the top or bottom of the output file. For example, open a root XML tag in the prefix setting: ``` <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <book> ``` Close the root tag in the postfix setting: ``` </book> ``` In the prefix and postfix, you can use placeholder strings to access job data. Example for the prefix in a TTL export pipeline: ``` a schema:Dataset ; a schema:DataFeed ; schema:dateModified "{created}"^^schema:Date ```
Target folder / output file	The target folder (for copy tasks) or output file (for bundle tasks) relative to the job folder.
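As a rough model of the bundle task, the sketch below concatenates files from a folder inside the job folder, optionally filtered by a file list, and wraps the result in a prefix and postfix. The function name and its parameters are hypothetical; the sorted file order and the filtering by bare file name are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def bundle_files(job_folder: Path, source: str = "", file_list: Optional[str] = None,
                 prefix: str = "", postfix: str = "",
                 output_file: str = "bundle.xml") -> Path:
    """Concatenate files from the job folder into one output file (sketch)."""
    root = job_folder.resolve()
    source_path = (root / source).resolve()
    # Bundling is restricted to files inside the job folder.
    if source_path != root and root not in source_path.parents:
        raise ValueError("source must be inside the job folder")
    target = root / output_file

    if source_path.is_file():
        parts: List[Path] = [source_path]
    else:
        parts = sorted(p for p in source_path.iterdir() if p.is_file() and p != target)
        if file_list is not None:
            # Optional filter: one file name per row, e.g. generated by an earlier task.
            rows = (root / file_list).read_text(encoding="utf-8").splitlines()
            wanted = {row.strip() for row in rows if row.strip()}
            parts = [p for p in parts if p.name in wanted]

    body = "".join(p.read_text(encoding="utf-8") for p in parts)
    target.write_text(prefix + body + postfix, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp)
    (job / "articles").mkdir()
    for n in (1, 2, 3):
        (job / "articles" / f"a{n}.xml").write_text(f"<article n='{n}'/>", encoding="utf-8")
    (job / "selection.txt").write_text("a1.xml\na3.xml\n", encoding="utf-8")
    out = bundle_files(job, "articles", file_list="selection.txt",
                       prefix='<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<book>\n',
                       postfix="\n</book>\n", output_file="book.xml")
    bundled = out.read_text(encoding="utf-8")

print(bundled)
```

Only `a1.xml` and `a3.xml` end up inside the `<book>` element, because the file list excludes `a2.xml`.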

The following transformation tasks are available:

Transform with XSL: Use an XSLT stylesheet to transform XML files generated in preceding tasks.
Search and replace: Use regular expressions to replace values in a file generated in preceding tasks.
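The search and replace task applies a regular expression to a file produced earlier in the pipeline. A minimal Python sketch, using a hypothetical file name and an illustrative pattern that turns TEI-style `<hi rend="bold">` elements into HTML `<b>` elements; the regular expression dialect used by Epigraf may differ from Python's `re` module.

```python
import re
import tempfile
from pathlib import Path


def search_and_replace(path: Path, pattern: str, replacement: str) -> int:
    """Apply a regular expression to a file written by a preceding task; return the number of replacements."""
    text = path.read_text(encoding="utf-8")
    new_text, count = re.subn(pattern, replacement, text)
    path.write_text(new_text, encoding="utf-8")
    return count


with tempfile.TemporaryDirectory() as tmp:
    html = Path(tmp) / "job_1234.html"
    html.write_text('<p><hi rend="bold">Anno</hi> domini <hi rend="bold">MCCC</hi></p>',
                    encoding="utf-8")
    # Non-greedy group so that each element is replaced separately.
    n = search_and_replace(html, r'<hi rend="bold">(.*?)</hi>', r"<b>\1</b>")
    result = html.read_text(encoding="utf-8")

print(n, result)  # 2 <p><b>Anno</b> domini <b>MCCC</b></p>
```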

The following output tasks are available:

Zip a file or folder: Zip a single file or a folder to prepare it for download.
Save to file: Send a file to the browser for download.
Show downloads: Copy result files to a specific destination and show a list of download links.
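A sketch of what the zip task does with a file or a folder inside the job folder, written with Python's standard `zipfile` module. The function name, the default job file name `job_1234.xml` and the rule of naming the archive after the job file when no output file is given are assumptions.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_job_content(job_folder: Path, source: str = "", output_file: str = "",
                    default_file: str = "job_1234.xml") -> Path:
    """Zip a file or the content of a folder inside the job folder (sketch)."""
    src = job_folder / (source or default_file)  # empty source: the default job file
    if not src.exists():
        # The file or folder has to be created by a preceding task.
        raise FileNotFoundError(src)
    # Assumption: without an output file name, the archive is named after the job file.
    target = job_folder / (output_file or Path(default_file).with_suffix(".zip").name)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        if src.is_file():
            zf.write(src, arcname=src.name)
        else:
            for path in sorted(src.rglob("*")):
                if path.is_file() and path != target:
                    # Store paths relative to the folder, i.e. zip its content.
                    zf.write(path, arcname=path.relative_to(src).as_posix())
    return target


with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp)
    (job / "images").mkdir()
    (job / "images" / "img1.jpg").write_bytes(b"placeholder")
    (job / "images" / "img2.jpg").write_bytes(b"placeholder")
    archive = zip_job_content(job, "images", "images.zip")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)  # ['img1.jpg', 'img2.jpg']
```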

Depending on the task, the following options are available:

Option	Description
Folder or file to be zipped	Leave empty to zip the default job file. Enter a file name to zip a single file. Enter the name of a folder within the current job folder to zip its content. Make sure the folder or file was created in preceding tasks.
Output file	The name of the resulting file. Leave empty to use the default job file.
Files	In the show downloads task, a list of file names, one file per line. The files must have been created by preceding tasks. You can prefix each file name with a caption for the download link, separated by an equal sign (caption=filename).
Download root folder	By default, tasks create files in the job folder. To copy the files to a destination outside the job folders, select the shared folder or the current database folder.
Download folder within the root folder	Enter the name of a subfolder. The result files are copied to this target. The name can contain placeholders in curly brackets that refer to job data. For example, name the job "awesomebook113" and copy the result to "data/books/{caption}" within the shared folder. The caption contains the job name or, if no name is set, the job ID. Alternatively, implement a name option in the pipeline and copy the result to "export/job-{config.options.custom.name}" within the database folder. Make sure the resulting folder name is valid: do not rely on placeholders alone, but add at least a fixed prefix. The caption placeholder is a good default because it falls back to the job ID if no name is set.
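The two string formats in this table, captions in the file list and placeholders in the download folder, can be sketched as follows. The dotted lookup, the caption fallback implementation, the link text used when no caption is given, and the validity check with its allowed-character pattern are assumptions for illustration, not Epigraf's actual rules.

```python
import re
from typing import Any, Dict, List, Tuple

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")
# Assumption: a conservative set of characters for folder names.
VALID_FOLDER = re.compile(r"[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)*")


def lookup(data: Dict[str, Any], path: str) -> str:
    """Resolve a dotted placeholder such as config.options.custom.name."""
    value: Any = data
    for key in path.split("."):
        value = value[key]
    return str(value)


def resolve_folder(template: str, job: Dict[str, Any]) -> str:
    data = dict(job)
    # The caption is the job name or, if no name is set, the job ID.
    data["caption"] = job.get("name") or str(job["id"])
    folder = PLACEHOLDER.sub(lambda m: lookup(data, m.group(1)), template)
    if not VALID_FOLDER.fullmatch(folder) or ".." in folder.split("/"):
        raise ValueError(f"invalid folder name: {folder!r}")
    return folder


def parse_downloads(setting: str) -> List[Tuple[str, str]]:
    """Parse one file per line, optionally prefixed with "caption="."""
    links = []
    for line in setting.splitlines():
        line = line.strip()
        if not line:
            continue
        caption, sep, filename = line.partition("=")
        # Assumption: without a caption, the file name is used as link text.
        links.append((caption.strip(), filename.strip()) if sep else (line, line))
    return links


job = {"id": 113, "name": "awesomebook113",
       "config": {"options": {"custom": {"name": "draft"}}}}
print(resolve_folder("data/books/{caption}", job))                     # data/books/awesomebook113
print(resolve_folder("export/job-{config.options.custom.name}", job))  # export/job-draft
print(resolve_folder("data/books/{caption}", {"id": 114}))             # data/books/114
print(parse_downloads("Book (ODT)=book.odt\nimages.zip"))
# [('Book (ODT)', 'book.odt'), ('images.zip', 'images.zip')]
```

A template consisting only of a placeholder that resolves to an empty string yields an empty folder name, which the check rejects; a fixed prefix such as "export/job-" avoids that case.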

Epigraf can be used to export images and automatically update their metadata. To store image metadata, usually the content field of an image item type is configured to hold several metadata values as JSON.

The following table shows typical image metadata fields. The attribute column refers to the Extensible Metadata Platform (XMP) standard.

Field	Attribute	Example
Image title	xmp:Title	DI 34, No. 13 - Niederhausen, Protestant parish church - 1st half of 13th century
Image caption	xmp:Headline	Epitaph of Johann von Schmidtburg
Creator	xmp:Creator	Thomas G. Tempel
Source	xmp:Source	Image archive ADW Mainz
Copyright information	xmp:Rights	Heidelberger Akademie
Terms of use	xmp:UsageTerms	CC BY 4.0
Credit / Provider	xmp:Credit	ADW Mainz, Inscription commission
Copyright status	xmp:Marked	True: Protected by copyright. False: Public domain.

To transfer metadata into image files, create an export pipeline that puts all images into a ZIP archive. In the item type configuration for images, you can explicitly define metadata input fields. In the article data task of the pipeline, you use placeholder strings to map article data (including image items) and project data to specific metadata attributes.
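The placeholder mapping can be pictured as follows: the JSON stored in the image item's content field is decoded, and each metadata attribute is filled from a template. The placeholder paths (`item.content.creator`, `article.signature`, and so on), the JSON keys and the function names are hypothetical; the sketch only builds the attribute values and does not write them into image files.

```python
import json
import re
from typing import Any, Dict

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def lookup(data: Dict[str, Any], path: str) -> str:
    """Resolve a dotted placeholder path such as item.content.creator."""
    value: Any = data
    for key in path.split("."):
        value = value[key]
    return str(value)


def image_metadata(mapping: Dict[str, str], item: Dict[str, Any],
                   article: Dict[str, Any], project: Dict[str, Any]) -> Dict[str, str]:
    """Fill metadata attributes from article, image item and project data."""
    context = {
        # The content field of the image item holds several values as JSON.
        "item": {**item, "content": json.loads(item["content"])},
        "article": article,
        "project": project,
    }
    return {attribute: PLACEHOLDER.sub(lambda m: lookup(context, m.group(1)), template)
            for attribute, template in mapping.items()}


item = {"content": json.dumps({"creator": "Thomas G. Tempel", "license": "CC BY 4.0"})}
article = {"signature": "DI 34, No. 13",
           "title": "Niederhausen, Protestant parish church",
           "date": "1st half of 13th century"}
project = {"credit": "ADW Mainz, Inscription commission"}

mapping = {
    "xmp:Title": "{article.signature} - {article.title} - {article.date}",
    "xmp:Creator": "{item.content.creator}",
    "xmp:UsageTerms": "{item.content.license}",
    "xmp:Credit": "{project.credit}",
}
meta = image_metadata(mapping, item, article, project)
print(meta["xmp:Title"])
# DI 34, No. 13 - Niederhausen, Protestant parish church - 1st half of 13th century
```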

Construction of pipelines

1. Retrieval tasks

2. Preparation tasks

3. Transformation tasks

4. Output tasks

Image export including metadata