Checklists for Production Deployments

This section describes best practices related to production environments and indexing speed.

Before going into production, review the following items:

• In the Administration Console > Index > Data Model menu:

◦ Delete the default data model if you do not use it.

◦ Disable the Trace all metas option if you used it to define your data model properties.

• Check the whole configuration to make sure that no debug tests are left behind.

See "General recommendations" in the Exalead CloudView Administration Guide.

• Secure directories containing sensitive data. See Secure Directories.

• Change the default login and password used to access the Administration Console. See Secure the Access to the Exalead CloudView Interfaces.

• Deactivate roles that are not used in production. See "Deactivating roles for production" in the Exalead CloudView Administration Guide.

• Enable Cross-Site Request Forgery (CSRF) protection. See "Enable Cross-Site Request Forgery Protection (CSRF)".

• Schedule a full compact. See "Configure Indexing" in the Exalead CloudView Administration Guide.

• Schedule backups. See "Backup/Restore Operations" in the Exalead CloudView Administration Guide.

For each of your connectors, check whether you really need to enable the Store in document cache option. Leaving it disabled when it is not required avoids cluttering the document cache and reduces its disk space consumption.

Remember that the document cache is mainly controlled by your build group configuration (Administration Console > Deployment > Build Groups > Document cache option). Therefore:

• If it is enabled in your build group configuration, the source connectors deployed afterward on this build group will also have the Store in document cache option enabled in their Deployment configuration tab.

• If it is disabled, the source connectors deployed on this build group will not be able to push to the document cache. In that case, the Store in document cache option is not displayed.

For more information, see "Using Document Cache" in the Exalead CloudView Connectors Guide.

• Specify an isAlive query to monitor search server health. See "Monitor search server health with load balancers" in the Exalead CloudView Administration Guide.

• Set the log level to INFO and configure log rotation/purge to reduce the disk space used by logs. See "Rotate and purge logs" in the Exalead CloudView Administration Guide.

• Review the reporters configuration and rotation frequency to avoid generating too much data. See "Analyzing User Queries with Reporters" in the Exalead CloudView Configuration Guide.

• If required, build your own monitoring using .RRD files. See "Build your own monitoring" in the Exalead CloudView Administration Guide.

• Define what the user sees in case of slice error or timeout in <DATADIR>/config/SearchLogicList.xml:

◦ sliceDownAction: when a slice is down.

◦ searchTimeoutAction: when a slice times out.

By default, these parameters are set to ignore, meaning that partial results are displayed. If they are set to error, an error page is displayed. See the Exalead CloudView XML Configuration Reference Guide.
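The difference between the two values can be illustrated with a small conceptual model. This is a sketch only, not CloudView code: the SliceFailurePolicy class, the SliceResult record, the Action enum and the merge method are hypothetical names, and the actual behavior is governed by the XML parameters described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Conceptual model only (hypothetical names, not CloudView code) of the two
// values described for sliceDownAction and searchTimeoutAction.
public class SliceFailurePolicy {

    // The two documented values: ignore (default) and error.
    enum Action { IGNORE, ERROR }

    // Hypothetical per-slice outcome: hits, or empty if the slice was down or timed out.
    record SliceResult(String slice, Optional<List<String>> hits) {}

    // IGNORE: skip failed slices and return partial results.
    // ERROR: any failed slice fails the whole query (the user sees an error page).
    static List<String> merge(List<SliceResult> results, Action action) {
        List<String> merged = new ArrayList<>();
        for (SliceResult r : results) {
            if (r.hits().isEmpty()) {
                if (action == Action.ERROR) {
                    throw new IllegalStateException("slice " + r.slice() + " failed");
                }
                continue;
            }
            merged.addAll(r.hits().get());
        }
        return merged;
    }

    public static void main(String[] args) {
        List<SliceResult> results = List.of(
                new SliceResult("slice0", Optional.of(List.of("doc1", "doc2"))),
                new SliceResult("slice1", Optional.empty()),  // down or timed out
                new SliceResult("slice2", Optional.of(List.of("doc3"))));

        System.out.println(merge(results, Action.IGNORE));  // prints [doc1, doc2, doc3]
        try {
            merge(results, Action.ERROR);
        } catch (IllegalStateException e) {
            System.out.println("error page: " + e.getMessage());  // prints error page: slice slice1 failed
        }
    }
}
```

The trade-off is availability versus completeness: ignore keeps answering with whatever slices respond, while error makes missing data visible to the user.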

Consider fine-tuning the following settings when the load is very high (above 20 queries per second):

• Define the maximum number of concurrent queries processed by the search server in <DATADIR>/config/SearchAPI.xml:

◦ maxConcurrentQueries: when this limit is reached, incoming queries are queued. The aim is to limit the number of parallel requests to minimize resource consumption. By default, this parameter is set to 0, which means twice the number of cores of the search server.

◦ See the Exalead CloudView XML Configuration Reference Guide.

• Review the Thread pool size parameter value specified in the Mashup Builder, under Application > General > API properties > Concurrency policies. If your application targets:

◦ Only one Search API, set Thread pool size equal to the maxConcurrentQueries parameter of that Search API.

◦ More than one Search API, set Thread pool size equal to the sum of the maxConcurrentQueries parameters of the different Search APIs.

• Review the Max connections parameter value specified in the Mashup Builder, under /search > Feeds > Search API. When using:

◦ Only one feed, the Max connections value must be equal to the maxConcurrentQueries parameter value of the Search API.

◦ Multiple feeds, lower the Max connections value to avoid server overload.
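The sizing rules above are simple arithmetic. The following Java sketch applies them to a hypothetical deployment with two Search APIs running on 8-core and 16-core search servers, both left at the default maxConcurrentQueries value of 0. The ConcurrencySizing class and its helper methods are illustrative only and are not part of any CloudView API.

```java
import java.util.List;

// Illustrative sizing helpers (hypothetical, not a CloudView API).
public class ConcurrencySizing {

    // maxConcurrentQueries = 0 means "2 x number of cores of the search server".
    static int effectiveMaxConcurrentQueries(int configured, int searchServerCores) {
        return configured == 0 ? 2 * searchServerCores : configured;
    }

    // Thread pool size = sum of maxConcurrentQueries over all targeted Search APIs.
    static int threadPoolSize(List<Integer> effectiveLimits) {
        int sum = 0;
        for (int limit : effectiveLimits) {
            sum += limit;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical deployment: default setting on 8-core and 16-core search servers.
        int api1 = effectiveMaxConcurrentQueries(0, 8);   // 16
        int api2 = effectiveMaxConcurrentQueries(0, 16);  // 32
        System.out.println("Thread pool size (one API):  " + threadPoolSize(List.of(api1)));
        System.out.println("Thread pool size (two APIs): " + threadPoolSize(List.of(api1, api2)));
        // Single feed on API 1: Max connections equals that API's maxConcurrentQueries.
        System.out.println("Max connections (single feed on API 1): " + api1);
    }
}
```

This prints 16, 48 and 16. The checklist gives no formula for multiple feeds, only the advice to lower Max connections, so the sketch does not model that case.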

Recommendation: When using streamed queries, use a dedicated search server.

• Run performance testing scripts with your own queries (using Apache JMeter or similar tools) to understand how your platform will behave under your anticipated load. Run these tests under realistic data volume and queries-per-second conditions.

• Run endurance tests, that is, long-running tests (for example, 24 hours) during which indexing is running and queries are being sent. Your endurance tests should cover all possible platform configuration scenarios.

• Run technical tests (monitoring, deployment procedure, High Availability).
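Apache JMeter is the tool suggested above. If you want a quick scripted check in Java instead, the following sketch shows the basic shape of a closed-loop load test: a fixed number of worker threads replay your queries for a given duration, and the driver reports throughput (queries per second) and average latency. The LoadDriver class is hypothetical, and sendQuery is a placeholder assumption: replace the stub with your own call to the search API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class LoadDriver {

    // Aggregated results of one run.
    record Result(long queries, long errors, double qps, double avgLatencyMs) {}

    // Closed loop: each of `workers` threads sends its next query as soon as the
    // previous one returns, until `durationMs` has elapsed.
    static Result run(List<String> queries, Consumer<String> sendQuery,
                      int workers, long durationMs) throws Exception {
        AtomicLong ok = new AtomicLong();
        AtomicLong errors = new AtomicLong();
        AtomicLong okNanos = new AtomicLong();
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMs);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int offset = w;
                futures.add(pool.submit(() -> {
                    int i = offset % queries.size();  // stagger workers across the query list
                    while (System.nanoTime() < deadline) {
                        long t0 = System.nanoTime();
                        try {
                            sendQuery.accept(queries.get(i));
                            ok.incrementAndGet();
                            okNanos.addAndGet(System.nanoTime() - t0);
                        } catch (RuntimeException e) {
                            errors.incrementAndGet();  // count failures, keep the load going
                        }
                        i = (i + 1) % queries.size();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();  // wait for every worker
            }
        } finally {
            pool.shutdown();
        }

        double elapsedSec = (System.nanoTime() - start) / 1e9;
        long n = ok.get();
        double avgMs = n == 0 ? 0.0 : okNanos.get() / 1e6 / n;
        return new Result(n, errors.get(), n / elapsedSec, avgMs);
    }

    public static void main(String[] args) throws Exception {
        // Stub standing in for a real search request: about 5 ms per query.
        Consumer<String> stub = q -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Result r = run(List.of("q1", "q2", "q3"), stub, 4, 1000);
        System.out.printf("queries=%d errors=%d qps=%.1f avg=%.2f ms%n",
                r.queries(), r.errors(), r.qps(), r.avgLatencyMs());
    }
}
```

For endurance tests, the same loop can simply run for hours while indexing is active; a real harness would also record latency percentiles rather than only the average.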

This section describes best practices to reach an indexing speed of more than 1000 documents per second.

If you have more than 50 properties, consider using dynamic properties. For more information, see "Creating Dynamic Properties" in the Exalead CloudView Configuration Guide.

• Do not set Compress content (in Index > Data Model > Properties > Other advanced options) for properties smaller than 200 bytes.

• Remove everything unnecessary in the analysis pipeline.

• By default, a LanguageDetector is applied to your metas. If you already know the language, push a meta named language containing the corresponding 2-letter ISO 639-1 code instead of relying on the LanguageDetector.

• By default, an NGramsExtractor in Semantic Processors is used for advanced multiword spell check. If you do not need advanced spell check, remove it to speed up indexing.

• In your custom Java document processor, use getMeta(name) rather than iterating over all metas.

• Check that your connector sends documents in batches of at least 20:

◦ Explicitly, by using the addDocumentList PAPI command.

◦ Or via a BufferedPushAPI.

◦ Or by using the Buffering PushAPI Filter.

• Multithread your connector, but keep the number of threads below the number of logical cores. More threads would cause excessive system CPU usage.

• If you have more than 20 metas and no parts, use the MetaData Compaction PushAPI Filter, which groups them into a binary blob.

◦ In this case, at the top of your Data Processing > Analysis pipeline, just after the MimeDetector, you must have the native_textextractor - Text Extractor (text,html,exalead), which uncompresses this blob.

◦ Also remove the Standard Parts Merger.

• If you have good I/Os, you can set more slices (up to the number of logical cores).

• Set the number of Analyzers to approximately the number of logical cores, in proportion to your number of slices.

• If you do not manage to consume all your cores on machines with more than 16 cores, consider splitting your content into 2 build groups.

• Trigger indexing jobs only on the RAM threshold, so that each job is as large as possible (avoid sync(), setCheckpoint(sync=true), and triggerIndexingJob).

• If you have made many updates, run cvadmin cvdebug indexing full-compact-dih after your large batches of removals.
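Several of the connector-side recommendations above (batches of at least 20 documents, fewer pushing threads than logical cores, pushing a known language as a meta) can be combined in one sketch. DocumentSink is a hypothetical interface standing in for whatever your connector actually uses to push documents (for example, the addDocumentList command or a BufferedPushAPI); its name and signature are assumptions, not the real Push API. Documents are modeled as plain maps of meta names to values for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchingConnector {

    // Hypothetical stand-in for the real push mechanism (e.g. addDocumentList
    // or a BufferedPushAPI). Name and signature are assumptions.
    // Implementations must be thread-safe: several workers call it concurrently.
    interface DocumentSink {
        void pushBatch(List<Map<String, String>> batch);
    }

    // The checklist asks for batches of at least 20 documents.
    static final int BATCH_SIZE = 20;

    // Keep the number of pushing threads strictly below the number of logical cores.
    static int workerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
    }

    // Splits docs into batches of BATCH_SIZE and pushes them from a bounded pool.
    static void pushAll(List<Map<String, String>> docs, DocumentSink sink, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int from = 0; from < docs.size(); from += BATCH_SIZE) {
                int to = Math.min(from + BATCH_SIZE, docs.size());
                List<Map<String, String>> batch = List.copyOf(docs.subList(from, to));
                pending.add(pool.submit(() -> sink.pushBatch(batch)));
            }
            for (Future<?> f : pending) {
                f.get();  // surface any push failure to the caller
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            // Language already known: pushed as a "language" meta with a 2-letter code.
            docs.add(Map.of("uri", "doc" + i, "title", "Document " + i, "language", "en"));
        }
        List<Integer> sizes = Collections.synchronizedList(new ArrayList<>());
        pushAll(docs, batch -> sizes.add(batch.size()), workerCount());
        System.out.println("workers=" + workerCount() + " batches=" + sizes.size());
    }
}
```

With 45 documents, the sketch pushes three batches (20, 20 and 5). In a continuously running connector, you would normally keep accumulating documents until a batch is full rather than flushing small remainders.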