Auto-Pilot Type-checking module

The type-checking module validates raw extracted data: it verifies the values initially extracted by the Machine Learning (or NER) model and checks them for errors and inconsistencies, so that the output meets the required standards and formatting. The module consists of the following 4 main components:

  • Required

  • Type

  • Additional Settings

  • Key name

In this article, we discuss how required fields work, the different data types available, and the importance of the key name. The additional settings for each data type are covered in separate articles, including a detailed step-by-step guide and examples. The links to these articles can be found under the heading for each data type.

Required entities & error messages:

If an entity is marked as required, it must have a value in the Output Window before the document can be processed. If the value is missing, the user receives an error message and the document cannot be processed. Below you can see how to mark an entity as required. A required entity can also be identified by the red dot that appears on its left side.



An error also occurs when the value extracted from a document does not match the configured Data Type (discussed later in this article) or does not comply with the Additional Settings. In all of these cases, the user sees an error notification and the document cannot be processed without manual adjustment.
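The two checks described above (a required value that is missing, and a value that does not match its data type) can be sketched in a few lines of Python. This is only an illustration of the idea, not Auto-Pilot's implementation: the names `EntityConfig` and `validate`, the error texts, and the reduced set of data types are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityConfig:
    """Hypothetical per-entity settings, mirroring the module's components."""
    key_name: str
    data_type: str = "any"  # this sketch only distinguishes "any" and "number"
    required: bool = False


def validate(config: EntityConfig, value: Optional[str]) -> list[str]:
    """Return a list of error messages; an empty list means the value passes."""
    errors: list[str] = []

    # Required check: an empty or missing value is only an error
    # when the entity is marked as required.
    if value is None or value.strip() == "":
        if config.required:
            errors.append(f"{config.key_name}: required value is missing")
        return errors

    # Type check: a "number" entity must be parseable as a number.
    if config.data_type == "number":
        try:
            float(value)
        except ValueError:
            errors.append(f"{config.key_name}: '{value}' is not a number")

    return errors
```

A document would only be processed when every entity returns an empty error list; otherwise the listed errors would be shown to the user for manual adjustment.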

Data types



Coming soon:

  • Address

  • Currency

  • Timespan

Type: Any

The data type "any" refers to a type that can be any value. This is useful when the value of a field is unknown or may vary, as it allows the type checker to accept any value without raising an error.

For example, if a field may contain both text and numerical values, it may be necessary to use the data type "any" to allow for both types of values.

Type: Calculated

The data type "calculated" refers to a type that is calculated based on the values of other fields. This allows users to automatically calculate values based on other data, such as totals or averages.

When using the data type "calculated", it is important to carefully define the calculation that will be used to determine the value of the field. This may involve using mathematical operators or functions, such as addition, subtraction, multiplication, or division.

When the data type of an entity is "calculated", make sure that the fields involved in the calculation have the data type "number".

Read this article for a thorough explanation.
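As a rough illustration of a calculated field, the sketch below derives a total from two number fields using one of the four basic operators. The field names (`quantity`, `unit_price`) and the `calculate` helper are hypothetical; how a calculation is actually configured in Auto-Pilot is covered in the linked article.

```python
import operator

# The four basic operators mentioned in the text.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(left: float, op: str, right: float) -> float:
    """Apply a single arithmetic operator to two number fields."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    return OPERATORS[op](left, right)


# Hypothetical extracted fields, both with data type "number".
fields = {"quantity": 3.0, "unit_price": 19.5}
total = calculate(fields["quantity"], "*", fields["unit_price"])
```

Because the inputs are used in arithmetic, they must already be numeric, which is why the fields involved need the data type "number".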

Type: Date

The data type "date" refers to a type that represents a calendar date. This data type is used to store and manipulate dates in a standardized format, such as "YYYY-MM-DD".

Auto-Pilot has an in-house date parser that converts unstructured dates to a single, standardized format.

For example, this parser can convert 24th of January '23 to 24.01.2023.

Read this article if you are solely interested in our date parser API.
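To give an idea of what such a conversion involves, here is a minimal sketch that handles only one family of inputs: a day, an optional "of", an English month name, and a two- or four-digit year. It is not the in-house parser, which covers far more formats. The function name `parse_date`, the dotted output format, and the rule that two-digit years fall in the 2000s are assumptions made for this example.

```python
import re
from datetime import date

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches e.g. "24th of January '23" or "3 March 2024".
DATE_PATTERN = re.compile(
    r"(?P<day>\d{1,2})(?:st|nd|rd|th)?\s+(?:of\s+)?"
    r"(?P<month>[A-Za-z]+)\s+'?(?P<year>\d{4}|\d{2})$"
)


def parse_date(text: str) -> str:
    """Convert a loosely written date to DD.MM.YYYY."""
    match = DATE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised date: {text!r}")

    month = MONTHS.get(match["month"].lower())
    if month is None:
        raise ValueError(f"unknown month name: {match['month']!r}")

    year = int(match["year"])
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s

    # date() raises ValueError for impossible dates such as 31 February.
    return date(year, month, int(match["day"])).strftime("%d.%m.%Y")
```

Constructing a `date` object before formatting means impossible dates are rejected instead of being passed through silently.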

Type: Date range

The data type "date range" refers to a type that represents a range of dates, such as a start date and an end date. This data type is used to store and manipulate date ranges in a standardized format, such as "YYYY-MM-DD to YYYY-MM-DD".

As with the date parser mentioned in the previous section, an in-house date range parser is also available.

For example, this parser can convert: 'END/JAN - 1H FEB' to '21.01.2023 - 10.02.2023'

The conversion rules can be tailored to meet specific customer requirements.

Type: Lookup API

The data type "Lookup API" refers to a type that retrieves data from an external API (Application Programming Interface) using a specific key or reference. This allows businesses to automatically retrieve and integrate data from other systems or databases into their own processes.

Here are some examples of when you might need the lookup API:

  • Look up if an extracted value exists in an external database, such as client name, product name or related fees.

  • Standardize different input forms that refer to the same concept or data point. For example, 'RTM', 'RDM', 'RDAM', and 'R'dam' all refer to 'Rotterdam', but the endpoint only allows the format 'Rotterdam'. This is also common in medical jargon.
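The standardization case can be pictured as a lookup from every known spelling to one canonical value. In the sketch below a local dictionary stands in for the external API, so that the example stays self-contained. The `ALIASES` table and the `lookup` function are hypothetical; a real setup would send the extracted value to the customer's own endpoint.

```python
from typing import Optional

# Stand-in for the external database: every known spelling, lower-cased,
# mapped to the only format the endpoint accepts.
ALIASES = {
    "rtm": "Rotterdam",
    "rdm": "Rotterdam",
    "rdam": "Rotterdam",
    "r'dam": "Rotterdam",
    "rotterdam": "Rotterdam",
}


def lookup(value: str) -> Optional[str]:
    """Return the canonical form of a value, or None if it is unknown."""
    return ALIASES.get(value.strip().lower())
```

A `None` result corresponds to the first use case in the list: the extracted value does not exist in the external source, so the entity would fail validation.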

Type: Number

The data type "number" refers to a type that represents a numerical value. This data type is used to store and manipulate numerical data, such as prices, quantities, or measurements.

When using this entity to calculate the value of another entity, make sure to set the data type of this entity to "number".

Type: Regular Expression

The data type "regular expression" refers to a type that uses a specific pattern or syntax to match and extract data from a string. This data type is commonly used to extract specific pieces of data from unstructured text, such as names, addresses, or phone numbers.

When using the data type "regular expression", it is important to specify the pattern or syntax that will be used to match the data. It is also important to consider the range of data that may be matched and ensure that the regular expression is flexible enough to handle variations in the data.

See a couple of specific examples in this article.
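As an illustration of a pattern that tolerates variations in the data, the sketch below matches a made-up invoice number format ("INV", four digits, four digits) regardless of letter case and of whether the parts are separated by hyphens, spaces, or nothing at all. The format, the pattern, and the function name are assumptions chosen for this example only.

```python
import re
from typing import Optional

# Optional hyphen or whitespace between the parts, case-insensitive prefix.
INVOICE_PATTERN = re.compile(r"\bINV[-\s]?(\d{4})[-\s]?(\d{4})\b", re.IGNORECASE)


def extract_invoice_number(text: str) -> Optional[str]:
    """Find the first invoice number in the text and normalise its format."""
    match = INVOICE_PATTERN.search(text)
    if match is None:
        return None
    return f"INV-{match.group(1)}-{match.group(2)}"
```

The capture groups make it possible to return one normalised format even though several input variations are accepted.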

Type: String

The data type "string" refers to a type that represents a sequence of characters, such as words or sentences. This data type is used to store and manipulate text data, such as names, addresses, or descriptions.

When do you use this?

Key name

This is the name that is used in the JSON file that is generated as output when a document is processed.
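The sketch below shows the role the key name plays: each entity's key name becomes a key in the generated JSON, with the validated value next to it. The entity names and the flat output structure are assumptions for illustration; the actual layout of the Auto-Pilot output file is not described in this article.

```python
import json

# Hypothetical validated entities, each with its configured key name.
entities = [
    {"key_name": "invoice_number", "value": "INV-2023-0042"},
    {"key_name": "total_amount", "value": 58.5},
]


def build_output(items: list) -> dict:
    """Map each entity's key name to its value."""
    return {item["key_name"]: item["value"] for item in items}


output_json = json.dumps(build_output(entities), indent=2)
```

Since downstream systems read values by key, a key name should stay stable once an integration depends on it.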