 
About IMAP Configuration
 
Subject Filters
Indexing Email Threads
Gmail
Subject Normalization
The IMAP connector indexes email attachments and links each one to the mail that contains it. Attachments in all supported document formats are indexed, and their metadata is extracted.
Add one IMAP connector source for each IMAP server you want to crawl.
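To illustrate how an attachment can be linked to its parent mail, here is a minimal Python sketch using the standard `email` package. It assumes the link is the parent's `Message-ID` header; the record field names (`parent_message_id` and so on) are hypothetical and do not describe the connector's actual metas.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_attachments(raw: bytes) -> list[dict]:
    """Parse a raw RFC 5322 message and return one record per attachment."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parent_id = str(msg.get("Message-ID", ""))
    records = []
    for part in msg.iter_attachments():  # skips the main body part
        records.append({
            # Hypothetical field linking the attachment back to its mail.
            "parent_message_id": parent_id,
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "data": part.get_content(),  # bytes for binary types, str for text
        })
    return records


# Build a small test message with one PDF attachment.
mail = EmailMessage()
mail["Subject"] = "Quarterly report"
mail["Message-ID"] = "<abc@example.com>"
mail.set_content("See attached.")
mail.add_attachment(b"%PDF-1.4 fake", maintype="application",
                    subtype="pdf", filename="report.pdf")

recs = extract_attachments(mail.as_bytes())
```

Each record can then be sent to text extraction according to its content type while keeping a pointer to the mail it came from.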
Subject Filters
Subject filters are regular expressions that exclude individual emails from indexing based on the words in their subject line. They work like anti-spam filters: an email whose subject matches a filter is not indexed. These filters apply to all emails.
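A minimal sketch of subject filtering, assuming each filter is a Python regular expression searched anywhere in the subject and that a single match is enough to exclude the email. The patterns in `SUBJECT_FILTERS` are hypothetical examples, not connector defaults.

```python
import re

# Hypothetical filter list; in practice these come from the connector configuration.
SUBJECT_FILTERS = [r"(?i)\bunsubscribe\b", r"(?i)^\s*\[spam\]", r"(?i)out of office"]
_compiled = [re.compile(p) for p in SUBJECT_FILTERS]


def should_index(subject: str) -> bool:
    """Return False if any filter matches the subject, i.e. the email is skipped."""
    return not any(p.search(subject) for p in _compiled)


print(should_index("Quarterly report"))            # True
print(should_index("[SPAM] Cheap watches"))        # False
print(should_index("Out of Office: back Monday"))  # False
```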
Indexing Email Threads
A thread is a set of emails that are replies to one another. The IMAP connector can index each complete thread as a single document. The document text includes the message body of each mail in the thread, with quoted text removed where possible to avoid redundancy.
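One way to build such thread text is sketched below. It assumes quoted text is marked with a leading `>` and that a reply may end with an attribution line such as `On ..., Alice wrote:`; real email clients vary, which is why quote removal only works when possible. The function names are hypothetical.

```python
def strip_quotes(body: str) -> str:
    """Drop '>'-quoted lines, then trailing blank or 'wrote:' attribution lines."""
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    while kept and (not kept[-1].strip() or kept[-1].rstrip().endswith("wrote:")):
        kept.pop()
    return "\n".join(kept)


def thread_text(bodies: list[str]) -> str:
    """Concatenate the de-quoted bodies of every mail in the thread."""
    return "\n\n".join(strip_quotes(b) for b in bodies)


bodies = [
    "Can we meet on Friday?",
    "Friday works for me.\n\nOn Mon, Alice wrote:\n> Can we meet on Friday?",
]
text = thread_text(bodies)
```

Here `text` contains each sentence once, because the second mail's quotation of the first is dropped.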
In addition, the connector automatically pushes the following metas for each thread:
unanswered: 0 if the thread has been answered, 1 otherwise. The connector considers a thread answered if someone other than the original sender replied to it; a thread in which sender A only replies to their own mail is considered unanswered.
rootsender: author of the thread's first mail.
participants: all the people who replied to the thread.
avgreplytime: average time (in seconds) between two mails in this thread.
firstReplier: first person to reply to the initial mail.
firstReplyTime: time elapsed between the initial mail and its first reply in the thread (in seconds).
mailnumber: number of mails in the thread.
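The thread metas listed above can be derived from the sender and sent date of each mail. This sketch assumes that mails are ordered by date, that the first reply is the second mail chronologically, and that `avgreplytime` averages the gaps between consecutive mails; the `Mail` type is hypothetical, and the connector's handling of edge cases may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Mail:
    sender: str
    sent: datetime


def thread_metas(mails: list[Mail]) -> dict:
    """Compute thread-level metas from (sender, date) pairs."""
    mails = sorted(mails, key=lambda m: m.sent)
    root, replies = mails[0], mails[1:]
    # Gaps in seconds between consecutive mails, used for avgreplytime.
    gaps = [(b.sent - a.sent).total_seconds() for a, b in zip(mails, mails[1:])]
    metas = {
        "rootsender": root.sender,
        "participants": sorted({m.sender for m in replies}),
        "mailnumber": len(mails),
        # Answered (0) only if someone other than the root sender replied.
        "unanswered": 0 if any(m.sender != root.sender for m in replies) else 1,
        "avgreplytime": sum(gaps) / len(gaps) if gaps else None,
    }
    if replies:
        metas["firstReplier"] = replies[0].sender
        metas["firstReplyTime"] = (replies[0].sent - root.sent).total_seconds()
    return metas


t0 = datetime(2024, 1, 8, 9, 0)
thread = [
    Mail("alice@example.com", t0),
    Mail("bob@example.com", t0 + timedelta(minutes=30)),
    Mail("alice@example.com", t0 + timedelta(minutes=90)),
]
metas = thread_metas(thread)  # avgreplytime is (1800 + 3600) / 2 = 2700.0
```

A thread where Alice only replies to herself yields `unanswered` = 1, matching the rule stated for that meta.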
Gmail
The IMAP connector automatically detects when the IMAP server is a Gmail server.
In this case, the connector pushes additional metas for the following Gmail-specific IMAP attributes:
X-GM-MSGID: Gmail unique mail identifier
X-GM-THRID: Gmail thread identifier
X-GM-LABELS: Gmail labels
The thread identifier lets you open a thread directly in the Gmail web interface with the following URL, where X-GM-THRID stands for the thread identifier: https://mail.google.com/mail/#inbox/X-GM-THRID
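According to Google's Gmail IMAP extensions documentation, X-GM-THRID and X-GM-MSGID are 64-bit unsigned integers returned over IMAP in decimal, while the web interface uses the hexadecimal form of the same value. The sketch below assumes that convention when building the thread URL; verify it against links from your own mailbox. It parses a sample FETCH response line and does not handle X-GM-LABELS, whose value is a parenthesized list. The function names are hypothetical.

```python
import re


def parse_gmail_ids(fetch_line: bytes) -> dict:
    """Extract X-GM-MSGID and X-GM-THRID from one IMAP FETCH response line."""
    text = fetch_line.decode("ascii", errors="replace")
    pairs = re.findall(r"(X-GM-(?:MSGID|THRID)) (\d+)", text)
    return {name: int(value) for name, value in pairs}


def gmail_thread_url(thrid: int) -> str:
    # Assumption: the web interface expects the hexadecimal form of the
    # decimal X-GM-THRID value returned over IMAP.
    return f"https://mail.google.com/mail/#inbox/{thrid:x}"


# Sample response line in the shape returned for a FETCH of
# (X-GM-THRID X-GM-MSGID); the numbers are illustrative.
line = b"1 (X-GM-THRID 1278455344230334865 X-GM-MSGID 1278455344230334865 UID 1)"
ids = parse_gmail_ids(line)
url = gmail_thread_url(ids["X-GM-THRID"])
```

With Python's `imaplib`, a call such as `conn.fetch(b"1", "(X-GM-THRID X-GM-MSGID)")` against a Gmail server should return response lines of this shape.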
Subject Normalization
To detect when emails are part of the same thread, the basic algorithm uses the References or In-Reply-To header fields. However, these fields are sometimes corrupted, which breaks the links between emails. To work around this, you can activate the Subject normalization option: emails and threads that share the same subject are then grouped into one thread, with reply prefixes such as Re: ignored in the comparison.
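The two grouping steps can be sketched as follows: mails are first linked through the message IDs found in their References and In-Reply-To headers, then, when subject normalization is on, mails whose subjects match after reply prefixes are stripped are merged. The union-find structure, the `Msg` type and the prefix pattern (covering forms such as `Re:`, `RE[2]:` and `R:`) are assumptions for illustration, not the connector's implementation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical reply-prefix pattern: Re:, RE[2]:, R: (repeated, case-insensitive).
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|r)\s*(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _REPLY_PREFIX.sub("", subject).strip().lower()


@dataclass
class Msg:
    message_id: str
    subject: str
    parents: list[str] = field(default_factory=list)  # In-Reply-To + References IDs


def group_threads(msgs: list[Msg], subject_normalization: bool) -> list[set[str]]:
    parent = {m.message_id: m.message_id for m in msgs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Step 1: header-based links (ignore IDs of mails that were not crawled).
    for m in msgs:
        for p in m.parents:
            if p in parent:
                union(m.message_id, p)
    # Step 2: optional merge on normalized subject.
    if subject_normalization:
        first_by_subject: dict[str, str] = {}
        for m in msgs:
            key = normalize_subject(m.subject)
            if key in first_by_subject:
                union(m.message_id, first_by_subject[key])
            else:
                first_by_subject[key] = m.message_id
    groups: dict[str, set[str]] = {}
    for m in msgs:
        groups.setdefault(find(m.message_id), set()).add(m.message_id)
    return list(groups.values())


msgs = [
    Msg("<1>", "Budget 2024"),
    Msg("<2>", "Re: Budget 2024", ["<1>"]),
    Msg("<3>", "RE: Budget 2024"),  # reply whose headers were lost
]
```

Without normalization, `<3>` ends up in its own thread; with it, all three mails form one thread.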
The Subject norm field is associated with the Subject normalization option. Use it when email subjects share a common tag, for example [Test]: some email clients detect this tag and insert the reply prefixes (Re:, R:, etc.) after it instead of at the start of the subject, which makes Subject normalization ineffective. To normalize subjects consistently, enter this tag in the Subject norm field so that it is excluded from the comparison.
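This self-contained sketch shows why excluding the tag helps. It assumes the tag is matched literally and case-sensitively and removed wherever it appears before reply prefixes are stripped; `subject_norm` is a hypothetical parameter standing in for the Subject norm field, and the prefix pattern is an assumption.

```python
import re

# Hypothetical reply-prefix pattern: Re:, RE[2]:, R: (repeated, case-insensitive).
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|r)\s*(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str, subject_norm: str = "") -> str:
    """Remove the shared tag (if any), then strip leading reply prefixes."""
    if subject_norm:
        subject = subject.replace(subject_norm, " ")  # literal, case-sensitive
    subject = " ".join(subject.split())  # collapse whitespace left by the tag
    return _REPLY_PREFIX.sub("", subject).strip().lower()


subjects = ["[Test] Budget", "[Test] Re: Budget", "Re: [Test] Budget"]
without_norm = {normalize_subject(s) for s in subjects}
with_norm = {normalize_subject(s, "[Test]") for s in subjects}
```

Without the tag, "[Test] Re: Budget" keeps its reply prefix because the prefix is not at the start, so it normalizes differently from "[Test] Budget"; with the tag excluded, all three subjects normalize to the same value.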