The #attrsimilar function calculates similarity between a given vector and vectors in the index. For example, you can use it to detect 3D parts with similar shape or size.
#attrsimilar is a query node in the index, which returns all the documents matching the similarity query and calculates the similarity measure. As it does not filter search results at all, you must combine it with a #filter to return only the documents having a similarity higher than a given threshold value.
Note: Similarity is the complement of distance and is calculated as follows: similarity = 1 - distance
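To make the relationship between distance, similarity, and the threshold concrete, here is a minimal Java sketch. It is not product code: the class and method names are hypothetical, and it only mimics on the client side what the combination of #attrsimilar and a threshold filter is described as doing (keep the hits whose similarity is strictly higher than the threshold).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of any CloudView API.
final class SimilarityFilterSketch {

    /** Converts a distance into a similarity: similarity = 1 - distance. */
    static double similarity(double distance) {
        return 1.0 - distance;
    }

    /**
     * Returns the similarities that are strictly higher than the threshold,
     * in the same order as the input distances.
     */
    static List<Double> keepAboveThreshold(double[] distances, double threshold) {
        List<Double> kept = new ArrayList<>();
        for (double distance : distances) {
            double sim = similarity(distance);
            if (sim > threshold) {
                kept.add(sim);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] distances = {0.0, 0.25, 0.5, 1.0};
        // Similarities are 1.0, 0.75, 0.5 and 0.0; only the first two exceed 0.55.
        System.out.println(keepAboveThreshold(distances, 0.55));
    }
}
```

A distance of 0 gives a similarity of 1 (closest match), so raising the threshold narrows the result set.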
Important: The standard way to use #attrsimilar is inside a query template. See Defining Query Templates.
This section describes how to index and process signature values to be able to enter similarity queries in the Search API and calculate similarity measures.
Configure the Data Model and the Data Processing
The following procedure explains how to store a signature in an index field represented by the SIGNATURE_INDEX_FIELD variable.
Note: If you need to store multiple signatures, use a dynamic field. To do so, follow step 2 and, in the field's Advanced options, select the Multivalued and Store meta names properties.
1. In the Administration Console, go to Index > Data Model > Advanced Schema.
2. Add a SIGNATURE_INDEX_FIELD to store signature values.
a. Click Add field.
b. Enter a name, for example, my_signature_bin and set the type to Binary.
c. Set the new field as RAM-based for performance reasons.
3. Go to Index > Data Processing > Analysis pipeline > Document Processors.
4. Add a SimilarStringToPart document processor for part conversion to the pipeline, and in Input from, enter the name of the SIGNATURE_INPUT_META containing all values of the signature vector, for example, my_signature_meta.
This document processor can:
◦ Parse signature values and convert them into a binary blob that the index can use directly.
◦ Delete the meta to create a part with the same name.
5. In the Mappings tab, create the mapping between the SIGNATURE_INPUT_META and the SIGNATURE_INDEX_FIELD.
a. Add a mapping source. Give it a name, for example, my_signature_meta and set its type to Part.
b. Add the SIGNATURE_INDEX_FIELD as mapping target. For example, target the my_signature_bin index field.
6. Click Apply.
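The conversion performed in step 4 can be pictured with the following Java sketch. It is an illustration under stated assumptions, not the implementation of the SimilarStringToPart processor: the class and method names are hypothetical, and the binary layout (consecutive little-endian 32-bit floats) is assumed for the example because the actual format is not described here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; the real processor's output format may differ.
final class SignatureParsingSketch {

    /** Parses a whitespace-separated list of floats, for example "0.458 -1.68 2". */
    static float[] parse(String metaValue) {
        String[] tokens = metaValue.trim().split("\\s+");
        float[] vector = new float[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            vector[i] = Float.parseFloat(tokens[i]);
        }
        return vector;
    }

    /**
     * Packs the vector into a byte array.
     * ASSUMPTION: 4 bytes per value, little-endian; chosen for illustration.
     */
    static byte[] toBlob(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : vector) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        float[] vector = parse("0.458 -1.68 2");
        System.out.println(vector.length + " values, " + toBlob(vector).length + " bytes");
    }
}
```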
Test the Configuration
1. Go to the API Console to push a test document.
a. In URI, enter a document name, for example, doctest.
b. In Metas, add your SIGNATURE_INPUT_META in the Name column and a space-separated list of floats in the Value column.
For example, Name = my_signature_meta, Value = 0.458 -1.68 2
c. Click Push document.
The result must be "The document was successfully pushed."
2. Open the Search API and test the #attrsimilar function with the following syntax:
For more details about the use of #attrsimilar in the Search API, see the following section.
Use the #attrsimilar Function in the Search API
This section describes the use of the #attrsimilar function in the Search API, after the http://HOSTNAME:BASEPORT+10/search-api/search?eq=%23 part of the URL. Do not forget to remove the # before attrsimilar in the URL.
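The %23 in the URL is the percent-encoded form of the # character. As a sketch of how a client could build such a URL, the following Java helper encodes the whole query expression, so the leading # of #attrsimilar becomes %23 automatically. The class and method names are hypothetical, the eq parameter and the BASEPORT+10 port are taken from the URL above, and the function arguments are deliberately left out because the exact syntax is not reproduced here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the Search API client library.
final class SearchUrlSketch {

    /** Builds the search URL; the Search API port is the base port plus 10. */
    static String buildSearchUrl(String hostname, int basePort, String query) {
        int searchPort = basePort + 10;
        // URLEncoder turns '#' into "%23" (and would turn spaces into '+').
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://" + hostname + ":" + searchPort + "/search-api/search?eq=" + encodedQuery;
    }

    public static void main(String[] args) {
        // Placeholder query: only the function name, without its arguments.
        System.out.println(buildSearchUrl("localhost", 10000, "#attrsimilar"));
    }
}
```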
#attrsimilar Syntax
You can call the #attrsimilar function in a query using the following Search API syntax:
• the MULTICONTEXT_INDEX_FIELD variable corresponds to the dynamic field name containing the signatures.
• context_signature_1 is the name of a context in this dynamic field.
Similarity Functions
The similarity measure varies depending on the function used to compare vectors two by two.
Important: With most similarity functions, it is not possible to compare two vectors that do not have the same size. In that case, indexed documents whose signature vector does not have the same size as the query vector are not returned by the #attrsimilar node.
Similarity is calculated as follows: similarity = 1 - distance. For all _normed functions, we can summarize the calculation as:
similarity = 1 <--> close; similarity = 0 <--> far
dist = 1 <--> far; dist = 0 <--> close
For non-normed similarity functions (for example, Manhattan or Euclidean), the calculation is identical, but the distance range changes from [0;1] to [0;Infinity] and the similarity range becomes [-Infinity;1].
The cosine similarity function is the exception, with bounds of -1 (dissimilar) and 1 (similar). The angular similarity function maps cosine similarity to a value between 0 and 1, which makes it consistent with the other similarity functions.
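The difference between the cosine and angular ranges can be sketched in Java as follows. The formula used by the angular function is not given in this section; the sketch assumes the common definition 1 - arccos(cosine) / pi, which may differ from what the product computes. The class and method names are hypothetical, and the exceptions are a choice made for this example only.

```java
// Hypothetical illustration only; the angular formula is an assumption.
final class CosineAngularSketch {

    /** Cosine similarity: dot(x1, x2) / (|x1| * |x2|), between -1 and 1. */
    static double cosine(double[] x1, double[] x2) {
        if (x1.length != x2.length) {
            throw new IllegalArgumentException("Vectors must have the same size");
        }
        double dot = 0.0;
        double norm1 = 0.0;
        double norm2 = 0.0;
        for (int i = 0; i < x1.length; i++) {
            dot += x1[i] * x2[i];
            norm1 += x1[i] * x1[i];
            norm2 += x2[i] * x2[i];
        }
        if (norm1 == 0.0 || norm2 == 0.0) {
            throw new IllegalArgumentException("Cosine is undefined for a zero vector");
        }
        double cos = dot / Math.sqrt(norm1 * norm2);
        // Clamp to [-1, 1] to absorb rounding errors before calling acos.
        return Math.max(-1.0, Math.min(1.0, cos));
    }

    /** ASSUMED definition: 1 - angle / pi, which lies between 0 and 1. */
    static double angular(double[] x1, double[] x2) {
        return 1.0 - Math.acos(cosine(x1, x2)) / Math.PI;
    }
}
```

With this definition, opposite vectors have a cosine of -1 and an angular similarity of 0, while identical directions give 1 for both.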
Function | Use
manhattan (default function) | For L1-normalized vectors. Formula: sim = 1 - (Sum{abs(x1[i] - x2[i])}/2). The similarity is between 0 and 1.
manhattan_normed | Same as manhattan, but the vectors are L1-normalized first.
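The manhattan formula above translates directly into code. The following Java sketch is an illustration with hypothetical names, not the index implementation. It also shows why the result stays between 0 and 1 for L1-normalized vectors: each vector has an L1 norm of 1, so by the triangle inequality the sum of absolute differences is at most 2. Throwing an exception on a size mismatch is a choice made for this example; as noted earlier, the index simply does not return such documents.

```java
// Hypothetical illustration of the manhattan and manhattan_normed formulas.
final class ManhattanSimilaritySketch {

    /** sim = 1 - (Sum{abs(x1[i] - x2[i])} / 2); expects L1-normalized input. */
    static double manhattan(double[] x1, double[] x2) {
        if (x1.length != x2.length) {
            throw new IllegalArgumentException("Vectors must have the same size");
        }
        double sum = 0.0;
        for (int i = 0; i < x1.length; i++) {
            sum += Math.abs(x1[i] - x2[i]);
        }
        return 1.0 - sum / 2.0;
    }

    /** Returns a copy scaled so that the absolute values sum to 1. */
    static double[] l1Normalize(double[] vector) {
        double norm = 0.0;
        for (double value : vector) {
            norm += Math.abs(value);
        }
        double[] result = vector.clone();
        if (norm == 0.0) {
            return result; // A zero vector cannot be normalized; returned unchanged.
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= norm;
        }
        return result;
    }

    /** Same as manhattan, but both vectors are L1-normalized first. */
    static double manhattanNormed(double[] x1, double[] x2) {
        return manhattan(l1Normalize(x1), l1Normalize(x2));
    }
}
```

For example, the vectors (2, 2) and (1, 1) both normalize to (0.5, 0.5), so manhattanNormed treats them as identical, whereas plain manhattan on the raw values would not.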
This syntax allows you to keep only the documents with a similarity measure higher than (>) the SIGNATURE_SCORE_THRESHOLD. For example, you could use a float value like 0.55.
You can also combine several signature computations in one #filter expression. For example:
Code Samples to Create Similarity Query Prefix Handlers
The standard use of #attrsimilar is inside a query template using the ELLQL language. For advanced Exalead CloudView users who want to manage similarity queries in UQL, you can adapt the following code samples.
To create your similarity query prefix handler, adapt the following code samples to your use case and package your custom prefix handler as a CVPlugin. For more information, see the Exalead CloudView Programmer's Guide.
Code for simple attrsimilar prefix handler (SimpleAttrSimilarPrefixHandler.java)
@IsMandatory(false)
@PropertyDescription("The binary index field that contains the signatures. "
        + "If it's a dynamic field (multi valued and storing meta names) "
        + "use the following syntax: signatureName@indexFieldName. "
        + "This value can be overridden in query option \"index_field\".")
public void setIndexField(String indexField) {
    this.indexField = indexField;
}

public String getIndexField() {
    return indexField;
}

@PropertyDescription("The distance function to use. "
        + "Some possible values: manhattan, manhattan_normed, euclidian, "
        + "euclidian_normed, cosine. "
        + "Normed versions of the distance must be used when the signatures "
        + "in the index have not been normed before indexing. "
        + "This value can be overridden in query option \"function\".")
public void setDistance(String distance) {
    this.distance = distance;
}

public String getDistance() {
    return distance;
}

@IsMandatory(false)
@PropertyDescription("The minimum similarity score that a hit must have "
        + "to match the query. "
        + "This value will generally be between 0 and 1. "
        + "It can be overridden in query option \"filter_value\". "
        + "If empty, there will be no filtering based on the score.")
public void setFilterValue(Double filterValue) {
    this.filterValue = new LongOrDouble(filterValue);
}