Named Entities Matcher

Named entities have several uses. For example, when mapped to facets, they provide useful navigation entry points. They can also serve as input for further processing, such as relation discovery, or be used to highlight relevant keywords.

Example of Named Entities with Filters
Text	Filter using POS	Use Known words	Both filters	No filter
J. Brown	NE.person	no annotation	no annotation	NE.person
J. Told	NE.person	no annotation	no annotation	NE.person
Mr Brown	NE.person	NE.person	NE.person	NE.person
Mr Told	no annotation	NE.person	no annotation	NE.person
Mr J. Brown	NE.person	NE.person	NE.person	NE.person
Mr J. Told	NE.person	NE.person	NE.person	NE.person
Teddy Brown	NE.person	NE.person	NE.person	NE.person
Teddy Told	no annotation	NE.person	no annotation	NE.person
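Reading the table row by row, the "both filters" column matches what you get by keeping a candidate only when each individual filter keeps it, and "No filter" keeps every candidate. The Python sketch below encodes the observed results and checks that reading. It does not model how the POS or known-words filters actually decide; the names `OBSERVED`, `BOTH_FILTERS`, and `keep` are illustrative only.

```python
from typing import Iterable

# Observed results from the table above:
# True means the filter keeps the candidate as NE.person,
# False means "no annotation".
OBSERVED = {
    "J. Brown":    {"pos": True,  "known_words": False},
    "J. Told":     {"pos": True,  "known_words": False},
    "Mr Brown":    {"pos": True,  "known_words": True},
    "Mr Told":     {"pos": False, "known_words": True},
    "Mr J. Brown": {"pos": True,  "known_words": True},
    "Mr J. Told":  {"pos": True,  "known_words": True},
    "Teddy Brown": {"pos": True,  "known_words": True},
    "Teddy Told":  {"pos": False, "known_words": True},
}

# The "both filters" column, copied from the same table.
BOTH_FILTERS = {
    "J. Brown": False, "J. Told": False, "Mr Brown": True, "Mr Told": False,
    "Mr J. Brown": True, "Mr J. Told": True, "Teddy Brown": True, "Teddy Told": False,
}


def keep(text: str, enabled_filters: Iterable[str]) -> bool:
    """Keep a candidate only if every enabled filter keeps it.

    With no filter enabled, all() over an empty sequence is True, which
    matches the "No filter" column (every row is NE.person).
    """
    return all(OBSERVED[text][name] for name in enabled_filters)


# Check that combining the two filters reproduces the "both filters" column.
for text, expected in BOTH_FILTERS.items():
    assert keep(text, ["pos", "known_words"]) == expected
    assert keep(text, []) is True
```

In other words, enabling both filters can only remove annotations, never add them.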


List of Named Entity Classes and Subclasses Detected by the Named Entities Matcher
NE Type	Annotations	Description	Examples
People	NE.person	Rule-based matching and an ontology of first names and titles.	"John Smith"
	subclasses:
	NE.famousperson	Exact name matching based on an ontology and rules	"Albert Einstein"
	NE.partialperson	Patterns in a rules matcher	"Mr Smith" or "J. Smith"
Organization	NE.organization	Based on ontology and rules	"EXALEAD" "Independent Human Rights Commission"
	subclasses:
	NE.organization.corporation		"EXALEAD" "Walt Disney Company" "Burger King"
	NE.organization.governmentorganization		"NATO" "Department of Defense" "The Supreme Court"
	NE.organization.nongovernmentorganization		"Greenpeace" "Sea Shepherd Conservation Society"
	NE.organization.educationalorganization		"Harvard" "MIT" "Sciences Po Paris"
	NE.organization.sportsteam		"Arsenal" "PSG" "Lakers"
	NE.organization.miscellaneousorganization		"PADI" "Ju-Jitsu Association"
Place	NE.place	Ontology-based matching	"New Orleans"
	subclasses:
	NE.place.city		"Cambridge"
	NE.place.country		"United Kingdom"
	NE.place.state		"California"
	NE.place.otheradministrativearea		"Greater London"
	NE.place.landform		"Mediterranean Sea" "The Highlands"
	NE.place.civicstructure		"Madison Square Garden" "Royal Albert Hall"
Event	NE.event	Rule-based matching	"2nd New York Jazz Festival" "London 2012"
	subclasses:
	NE.event.cultural		"Avignon Theater Festival" "Asian Regional Meeting" "Cuba's Bishops Conference"
	NE.event.military		"Falklands War" "World-War-II" "Battle of Waterloo"
	NE.event.natural		"Hurricane Katrina" "Blizzard of 1993"
	NE.event.political		"French presidential election" "Inauguration of Barack Obama"
	NE.event.religious		"Easter Monday" "Aïd el Kebir" "Pessah"
	NE.event.social		"Independence Day" "World Day for Migrants and Refugees"
	NE.event.sport		"2008 Summer Olympics" "Football World Cup" "Moto GP Championship"
	NE.event.security		"Suicide bombing" "Spinboldak attack"
Date	There are several annotations, see below	Rule-based matching
	NE.date	Normalized to the European numeric format "day month year", with two-digit days and months	"14 06 1982" "05 12 2003"
	NE.date.full	If found, the normalized day of the week is prepended	"Mon 13 02 1977" (English) "Lun 13 02 1977" (French)
	NE.date.uk, NE.date.us	For English text, two annotations are set for ambiguous dates. Use the annotation NE.date.uk for British texts and NE.date.us for American texts
Price	NE.money	Rule-based matching and an ontology of currencies.	"$2.73" "4,5€" "three hundred million dollars"
	subclasses:	These subclasses are intended to simplify currency conversion.
	currency.unity		"dollar US"
	currency.quantity		"150"
French postal address	NE.address.fr	Rule-based matching and ontology of French cities	"10 place de la Madeleine, 75008 Paris"
French phone number	NE.phone.fr	Rule-based matching	"(+33)6.82.33.15.12" "05 64 222 222"
Time, duration, and time ranges in French	NE.time	Rule-based matching	"13h45" "3 h 56 min 12 sec" "de 7h03 à 17h28"
Email	NE.email	Rule-based matching	"john.smith@gmail.com"
URL	NE.url	Rule-based matching	"https://www.exalead.com"
IP v4 address	NE.ip	Rule-based matching	"192.168.204.120"
Credit card	NE.creditcard	Generated by the Basis Tech tokenizer and rule-based matching	"378282246310005" (American Express) Note: The following formats are not supported: • Australian BankCard: 5610591081018250 • Some VISA number patterns: 4222222222222 • Dankort (PBS): 76009244561 • Dankort (PBS): 5019717010103742 • Switch/Solo (Paymentech): 6331101999990016
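The date annotations can be illustrated with a small Python sketch of their normalized output formats. It assumes the day, month, year, and any weekday have already been extracted from the text, and it reads "If found" as meaning the weekday is taken from the text rather than computed from the date. The weekday abbreviations and the functions `ne_date`, `ne_date_full`, and `uk_us_readings` are hypothetical and are not part of the product's API.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical weekday abbreviations, Monday first, chosen to match the
# "Mon ..." (English) and "Lun ..." (French) examples in the table.
WEEKDAYS = {
    "en": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "fr": ["Lun", "Mar", "Mer", "Jeu", "Ven", "Sam", "Dim"],
}


def ne_date(day: int, month: int, year: int) -> str:
    """NE.date-style value: "DD MM YYYY" with two-digit day and month."""
    date(year, month, day)  # raises ValueError for impossible dates
    return f"{day:02d} {month:02d} {year}"


def ne_date_full(day: int, month: int, year: int,
                 weekday_found: Optional[int], lang: str = "en") -> str:
    """NE.date.full-style value: prepend the weekday only if one was found
    in the text (0 = Monday). The weekday is not computed from the date."""
    base = ne_date(day, month, year)
    if weekday_found is None:
        return base
    return f"{WEEKDAYS[lang][weekday_found]} {base}"


def uk_us_readings(text: str) -> Dict[str, str]:
    """Both readings of an English numeric date such as "05/12/2003":
    UK order is day/month/year, US order is month/day/year. In this sketch,
    a reading that is not a valid date is simply dropped."""
    first, second, year = (int(part) for part in text.split("/"))
    readings = {}
    for annotation, (day, month) in (("NE.date.uk", (first, second)),
                                     ("NE.date.us", (second, first))):
        try:
            readings[annotation] = ne_date(day, month, year)
        except ValueError:
            pass
    return readings


print(ne_date_full(13, 2, 1977, 0, lang="fr"))  # Lun 13 02 1977
print(uk_us_readings("05/12/2003"))
# {'NE.date.uk': '05 12 2003', 'NE.date.us': '12 05 2003'}
```

For "05/12/2003" both readings are valid, so both annotations are produced; a downstream application would then keep NE.date.uk for British texts or NE.date.us for American texts.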

Extract entities with...	when...	for example
rules	• Entities are either numerical or textual, and their values are countless. • The context identifies them. For example, in "$100", the "$" symbol shows that 100 is a price. • Some parts of your entities are already annotated by the Named Entities Matcher or another resource.	dates, phone numbers, emails, URLs, addresses, prices, etc.
ontology resources	• Entities are textual and their values can be listed (a finite set). • A resource already exists (employees, categories). • The context does not help to identify them. • Listing them is not a big challenge. • You need to normalize output values.	• first names, cities, days, months, etc. • normalizing a group of variants such as USA, United States, United States of America, Etats Unis, Estados Unidos to United States of America
both rules and ontology resources	The values to extract are countless, but parts of these entities are clues for recognizing them.	"Mr Obama". This requires: • a resource to annotate Mr, Mister, Miss, etc. • a rule to extract a person's name when such an annotation is next to a capitalized word.
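The three approaches in the table can be contrasted in a short Python sketch. It uses plain regular expressions and dictionaries as stand-ins for the product's rule and ontology resources, whose actual syntax is not shown here; every name and entry below (`PRICE_RULE`, `COUNTRY_ONTOLOGY`, `TITLES`, and the `match_*` functions) is hypothetical.

```python
import re
from typing import List

# Rules: the values are countless, but the context gives them away.
# Hypothetical rule: a "$" sign followed by a number marks a price.
PRICE_RULE = re.compile(r"\$\d+(?:\.\d+)?")


def match_prices(text: str) -> List[str]:
    return PRICE_RULE.findall(text)


# Ontology: a finite list of surface forms, each normalized to one value.
# Hypothetical entries based on the example in the table.
COUNTRY_ONTOLOGY = {
    "USA": "United States of America",
    "United States": "United States of America",
    "United States of America": "United States of America",
    "Etats Unis": "United States of America",
    "Estados Unidos": "United States of America",
}
# Longest forms first, so "United States of America" wins over "United States".
COUNTRY_RULE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(form)
               for form in sorted(COUNTRY_ONTOLOGY, key=len, reverse=True))
    + r")\b"
)


def match_countries(text: str) -> List[str]:
    # Return the normalized value, not the surface form found in the text.
    return [COUNTRY_ONTOLOGY[m.group(0)] for m in COUNTRY_RULE.finditer(text)]


# Both: a title resource plus a rule on the following capitalized word.
TITLES = {"Mr", "Mister", "Miss", "Mrs", "Ms"}  # hypothetical title resource


def match_titled_persons(text: str) -> List[str]:
    tokens = text.split()
    persons = []
    for title, following in zip(tokens, tokens[1:]):
        name = following.strip(".,;:!?")
        if title.rstrip(".") in TITLES and name[:1].isupper():
            persons.append(f"{title} {name}")
    return persons


print(match_prices("It costs $100, not $2.73."))     # ['$100', '$2.73']
print(match_countries("USA vs Estados Unidos"))
print(match_titled_persons("Yesterday Mr Obama spoke."))  # ['Mr Obama']
```

The split mirrors the table: the price rule needs no list of values, the country lookup needs no context but does normalize, and the person rule only fires because a title from the resource precedes a capitalized word.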