A new component, the Consolidation Server, has been developed to replace the Structured Data Consolidation (SDC) connector. It lets you define consolidation rules for documents before they are pushed to the Indexing Server. The Consolidation Server therefore sits before the Indexing Server in a build group, and can be viewed as a transformation phase between the source connectors and the Indexing Server. It is better integrated than SDC and thus provides more visibility and information on processing operations and performance monitoring.
The Consolidation Server covers the following use cases:
• Manage incremental updates: Consolidation is very helpful for indexing relational data and for handling the flattening of that data during an incremental index build, that is to say, for taking updates as they come instead of rebuilding the entire index when an object changes. Incremental update is a complex task, as it requires calculating the impact of any change and building complete documents according to projection rules. To do so, the Consolidation Server keeps track of object relationships and stores the data needed to rebuild documents for CloudView.
• Create and manage links between documents: The Consolidation Server lets you easily create links between objects coming from various sources. Using Groovy pre-processors, you can now define rules based on document metas to create any links you need between objects. These processors can be defined directly within the Administration Console and offer more flexibility than what is available with SDC.
• Enrich business objects with linked objects: Relevant information can easily be added to a document by taking advantage of existing links, using aggregation processors. Multiple chained aggregation processors targeting multiple document types can be defined directly within the Administration Console. These processors currently support the Groovy language and the SDC-specific language. Java aggregation processors will be provided in the next release.
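The impact calculation described in the use cases above can be illustrated with a small sketch. This is not the Consolidation Server implementation or its API: the class `ConsolidationStore`, its methods, and the `linked_` meta prefix are all hypothetical names chosen for illustration. The sketch only shows the general idea of keeping object relationships so that a change to one object identifies which documents have to be rebuilt.

```python
from collections import defaultdict


class ConsolidationStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self):
        self.metas = {}                   # object id -> dict of metas
        self.links = defaultdict(set)     # parent id -> ids of linked children
        self.reverse = defaultdict(set)   # child id -> ids of parents embedding it

    def link(self, parent_id, child_id):
        """Record that the document built for parent_id embeds data from child_id."""
        self.links[parent_id].add(child_id)
        self.reverse[child_id].add(parent_id)

    def put(self, obj_id, metas):
        """Store or update an object and return the ids of documents to rebuild."""
        self.metas[obj_id] = dict(metas)
        return self.impacted(obj_id)

    def impacted(self, obj_id):
        """Return obj_id plus every object that directly or transitively embeds it."""
        seen, stack = set(), [obj_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.reverse[current])
        return seen

    def build_document(self, obj_id):
        """Flatten an object and its directly linked children into one document."""
        doc = dict(self.metas[obj_id])
        for child in sorted(self.links[obj_id]):
            for key, value in self.metas.get(child, {}).items():
                doc.setdefault(f"linked_{key}", []).append(value)
        return doc
```

With this sketch, updating a customer object that is linked from an order reports both objects as impacted, so only those two documents are rebuilt rather than the whole index.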
For more info: see the Consolidation Server Guide.
Warning: The Consolidation Server is still considered a Beta version, mainly because existing APIs and the storage format are still subject to change, which could require a full re-indexing when upgrading. Note also that Groovy code will not be migrated automatically; only migration procedures will be provided. The performance of this component will be improved in future releases.
What about the SDC Connector?
• SDC benefits from various bug fixes done for the Consolidation Server.
• It is now possible to declare a hierarchical type for SDC objects.
• SDC is still supported in R2015x but might be deprecated in the next release.
Refactoring of Related Terms Extraction
A deep refactoring of the Related Terms code has been done to improve quality and performance, and to unlock new usages.
Extraction:
• Extraction has been reviewed to increase quality of extracted terms (#21836).
• Double pass indexing is no longer needed to activate the related terms extraction.
• Single words can now be eligible as related terms.
• Users can add custom annotations on tokens to enrich the related-term dictionary. This allows you to express your own similarity vector (#13672).
• You can now also add a Related Term Blacklist as a resource. This resource can be managed through the Business Console. Once defined, you can declare it on a RelatedTerms processor using the new blacklistResource parameter.
• Blacklisting terms at indexing time helps you reduce the index memory footprint by excluding irrelevant terms that occur too frequently.
• New parameters have been added to the Search Logic XML configuration to enhance the quality and performance of Related Terms calculation. In the API Console, click Manage and search for SetSearchLogicList. The <RelatedTermsSynthesisConfig> node now has the following two parameters: maxSliceCategories and maxRelatedTermsHitsPerSlice. For more information, see the Technical Reference Guide > Configuring Search Queries > Related Terms > Configure Related Terms.
Search: Related terms now use standard storage in the index (ValueFacet), which unlocks several behaviors (#14859, #22060):
• Possibility to parallelize the synthesis between slices, which gives better performance in multi-slice environments.
• Possibility to have multiple related terms fields, for example, related terms coming from the title, related terms coming from documents of a certain type, etc.
Value: improve the quality and usability of extracted Related Terms
Business Console
• User Interface:
o Revamping of the UI for semantic resources (screens, labels and tooltips have been simplified and harmonized).
o Wizards have been added to easily import resources from CSV or Excel files (#18790).
o The Test tabs have been improved to test the behavior of ontologies, fast rules and semantic extraction. They allow you to enter a sample text and check the annotations generated by your configuration.
o While editing an ontology, the search tool now retrieves matching results for both form and entry values (#22782).
• Alerting: users can create custom alerting publishers and use them in the Business Console (#22651). For more info: see the CloudView Programmer Guide and the Business Console User Guide.
Supported platforms
R2014x was the last CloudView version to support RedHat 5. R2015x supports RedHat 6 only. CentOS 6 is now also supported. The full list of supported platforms for R2015x is given below.
OS | Additional information
Windows 2008 Server |
Windows 2012 Server |
Red Hat Linux version 6 | 6.1 to 6.5
SUSE Linux Enterprise 11 | SP3 only
CentOS version 6 | 6.1 and higher
Solaris 10 | Some document conversion features are not available (preview, thumbnails)
Other Improvements
Performance
• Much lower RAM consumption on the index and search servers under high QPS on Linux machines with many cores, thanks to a drastic reduction of memory fragmentation (#21503).
• Alphanum sort performance on Windows servers can be improved by choosing the "Case Sensitive" policy in the "Sort & Relevance" configuration (#20791).
• Fast TCP loopback is now enabled on Windows 8/Server 2012 and later (#22575), which improves inter-process communication performance (#22321).
• Date Facet performance is now equivalent to Value Facet performance (#20756).
• Faster query initialization when doing synthesis on hundreds of categories (#22483).
• Faster queries when retrieving thousands of different metas from dynamic fields (#22482).
Connectors
• Abort operation on CSV connector is now taken into account during the processing of a file, and not only between the processing of two files (#22076).
• It is now possible to clear connector documents without clearing them from the build group cache. They can then be re-pushed from the cache if needed (#22483).
• The JDBC Connector has been improved for trigger-based incremental synchronization mechanisms. More configuration options are now available to better manage incremental operations (#22280). For more information, see CloudView Connectors guide > JDBC Database Connector > Examples of JDBC Database Connector configurations > Configuring trigger-based synchronization with two checkpoints.
• Connector status is now set to "working" when a clear documents operation is launched (#20841).
Platform
• CVDiag now includes all plugins deployed on the instance, to simplify the EXALEAD Support team investigations (#20365).
• CloudView probes are now published over JMX so that other monitoring systems can easily retrieve them (#21756).
• Every JVM process that runs out of memory (OutOfMemory) is now killed, and a heap dump is written to the run folder (#22689).
• A new connector/indexing server crash recovery utility has been introduced (#22505). When the connector server or the indexing server crashes while scan operations are running, the connector server can automatically relaunch a scan a few minutes afterwards. This keeps CloudView indexing up and running. The behavior of this utility is the following:
o Initial waiting period before relaunching a scan: 30s.
o The waiting period is doubled for each retry.
o The maximum waiting period is limited to 1 hour.
The maximumNumberOfScanRetries property can be set in the Connectors.xml file or through the MAMI Console. This utility is enabled by default (the property is set to 16). To disable it, set the value to 0.
• Better multi-host monitoring for installations deployed on EC2 or on a cluster (more generally, wherever the hostname is not public) (#21531).
• The Push API can now provide the list of checkpoints with their synchronized states (see the Connector Programmer Guide > enumerateCheckpointInfo) (#21692)
• The indexing status of documents can be checked using the Push API. The setCheckpoint method now returns a serial that can be used with the areDocumentsSearchable(serial, onAllReplicas, ignoreDetachedReplicas) method (see the Connector Programmer Guide > areDocumentsSearchable & setCheckpoint) (#21289). This serial can even be obtained without requesting an explicit index synchronization, and therefore without any impact on indexing performance.
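The retry schedule of the crash recovery utility described above (30s initial wait, doubled at each retry, capped at 1 hour, 16 retries by default) can be sketched as follows. This is only an illustration of the arithmetic, not the product's code: the function name `retry_delays` and its parameters are hypothetical, and the sketch assumes the cap is applied simply by clamping each doubled value to one hour.

```python
def retry_delays(max_retries=16, initial_seconds=30, max_seconds=3600):
    """Return the waiting period, in seconds, before each scan retry.

    max_retries mirrors the maximumNumberOfScanRetries property:
    16 by default, 0 disables retries (empty schedule).
    """
    delays = []
    delay = initial_seconds
    for _ in range(max_retries):
        # Assumption: the 1 hour limit clamps the doubled waiting period.
        delays.append(min(delay, max_seconds))
        delay *= 2
    return delays


schedule = retry_delays()
print(schedule[:8])  # [30, 60, 120, 240, 480, 960, 1920, 3600]
```

Under these assumptions the one-hour ceiling is reached at the eighth retry, and every later retry waits one hour.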
Document format
• MPEG files are now supported. Basic meta-data (custom meta-data and common fields) are extracted. Thumbnails and preview are also available (#17893).
• The AT&T Troff format (http://en.wikipedia.org/wiki/Troff) is now supported (#22713). You can test this feature by indexing /usr/share/man on any Unix machine. Only ASCII and UTF-8 are supported. This feature requires the groffer product (roff2pdf) to be installed on the platform.
• PostScript formats (PS, EPS) are now supported (#22641). This feature requires the Ghostscript product (ps2pdf) to be installed on the platform.
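Since the Troff and PostScript formats above depend on external converters, it can be useful to check that they are available before indexing. The sketch below is a generic check written for illustration, not a CloudView utility: it assumes the converters are invoked as `roff2pdf` and `ps2pdf` (the names given above) and that they must be reachable through the PATH of the account running CloudView.

```python
import shutil

# Converters named in the notes above, keyed by the format that needs them.
REQUIRED_TOOLS = {"troff": "roff2pdf", "postscript": "ps2pdf"}


def missing_converters(which=shutil.which):
    """Return {format: tool} for every converter that is not found on the PATH."""
    return {fmt: tool for fmt, tool in REQUIRED_TOOLS.items() if which(tool) is None}


if __name__ == "__main__":
    for fmt, tool in missing_converters().items():
        print(f"{fmt}: '{tool}' was not found on the PATH")
```

The `which` argument exists only so that the lookup can be replaced in tests; by default the standard `shutil.which` is used.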
Indexing
• The Dictionary Builder now explicitly releases native memory, which reduces the risk of memory allocation failures for large dictionaries that are built often (#22327).
• A new processor FormatCheckerDate is automatically generated for Date/Time fields declared in the DataModel using the format defined directly on the field. It replaces the former Date Formatter processor, kept for backward compatibility (#21159).
Search
• Dynamic field content can now be used in virtual expressions through the #extract function (#15903).
• Dynamic Properties now support date/time types (#19768).
• The Search API parameter add_hit_metas targeting a dynamic field now provides a syntax to select, based on patterns, only a subset of the metas stored inside that dynamic field. See Search API Parameters Reference (#22555).
• The unclipped number of categories for a facet is now provided in the answer group info (#16755).
• Native support for binary retrievable fields (#21456)
• Cache expiration time for ResourceFilter and FilePackerController is configurable (#22044)
• Streaming output mode respects the searchlogic limits (number of hits) (#17153)
• Virtual fields language is now able to use alphanumerical and value fields (see Virtual Fields Language Reference) (#22170)
• The #attrnum ELLQL function on numerical retrievable fields now supports the same range operator features as the #num ELLQL function (#20509).
Semantics
• Enhancement of Business Console UI - the configuration of semantic resources has been reviewed to simplify and normalize the creation of annotations and rules. See "Resources > Synonyms | Ontology Matcher | Semantic Extractor"
Value: simplify semantic resources management
For more info: see "Working with Business Console > Semantic Resources" in the Business Console User Guide.
• Enhancement of currency extraction - In addition to NE.currency, there are now two new annotations: currency.unity and currency.quantity
<Annotation displayForm="150 dollar US" displayKind="exact" tag="NE.money" nbTokens="3" trustLevel="100" />
Value: simplify the mapping of extracted information
For more info: see "Named Entities Matcher" in the CloudView Technical Reference Guide.
• Enhancement of Geolocated Named Entities - Geolocated Named Entities benefit from a new annotation: exalead.geoloc
It is now possible to generate GPS points related to extracted named entities at indexing time. The resource has also been improved to offer a better unified result. For example, countries are now summarized under their English form (#15076).
Value: no need to call an external geolocation API to display hits on widget maps.
For more info: see "Configuring Search Queries > Geographic Search > Use geolocation based on place detection" in the CloudView Technical Reference Guide.
• Use a resource of known words to disambiguate NE candidates (#22774) - this option is only available for English and French. It uses pre-compiled dictionary resources of known words to disambiguate named entity candidates.
Value: useful to avoid getting too much Named Entity "noise".
For more info: see "Appendix: Semantic Processors > Information Extraction Processors > Named Entities Matcher" in the CloudView Technical Reference Guide.
• Clearer syntax for Synonyms equivalence classes (#21562) - It is now possible to define left, right or symmetric expansions for synonyms with the equivalenceClass attribute.
For more info: see "Configuring Search Queries > Configuring Query Expansion > Synonyms > To create a synonym resource file" in the CloudView Technical Reference Guide.
• The Lemmatization expansion module is no longer applied to synonyms by default (#21426). If you have a Query expansion config on a prefix handler that combines the Synonyms and Lemmatization modules, performance will improve because lemmatization is no longer applied to synonym forms.
You can change this default behavior by setting the lemmatize_synonyms option to true in the lemmatization module configuration.
• Keep a single annotation entry for multiple matching forms (#18076). An annotation entry may sometimes be generated by multiple forms of the same word. For example, if you have the following entries:
<Entry display="anything">
<Form value="whatever" level="normalized"/>
<Form value="whatever" level="phonetic"/>
</Entry>
The word whatever will match both forms and therefore generate the annotation entry twice. In advanced use cases, when annotation entries are mapped to categories, these extra annotation entries can lead to incorrect counts. It is now possible to limit generation to a single annotation per entry using the trustLevelBasedDedup option.
For more info: see "Configuring Semantic Resources > Adding Ontology Resources > Configure ontology annotations > To keep annotations with the highest trust level only" in the Business Console User Guide.
• In the NamedEntitiesMatcher processor, the "Set of Rules to use" field now lists the available sets of rules and displays, for each of them, a tooltip with its output annotations (#20265).
• The FrequencyQuery used by the DictionaryClient to retrieve word frequencies now has an extra setNGram() method, which allows you to request an ngram frequency instead of a plain word frequency (with the setWord() method) (#22167).
• Transliteration is now activated by default at search time (for example, Ł -> L) (#21425).
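The effect of the trustLevelBasedDedup option described above (keeping only the annotation with the highest trust level when several forms of the same entry match) can be sketched as follows. This is an illustration of the idea, not the product's implementation: the dictionary layout, the `offset` key and the trust level values given to the normalized and phonetic forms are all assumptions made for the example.

```python
def dedup_by_trust_level(annotations):
    """Keep one annotation per (offset, entry): the one with the highest trust level."""
    best = {}
    for ann in annotations:
        key = (ann["offset"], ann["entry"])
        if key not in best or ann["trustLevel"] > best[key]["trustLevel"]:
            best[key] = ann
    return list(best.values())


# The word "whatever" at token offset 0 matches both forms of the "anything" entry.
# Trust levels are hypothetical values chosen for the example.
annotations = [
    {"offset": 0, "entry": "anything", "level": "phonetic", "trustLevel": 80},
    {"offset": 0, "entry": "anything", "level": "normalized", "trustLevel": 100},
]

kept = dedup_by_trust_level(annotations)
print(len(kept), kept[0]["level"])  # 1 normalized
```

Counting categories on the deduplicated list then yields one hit for the entry instead of two.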
Documentation
• Business Console User Guide:
o Documentation of new semantic resources configuration. Description of required CSV, XML files for each type of resource (see "Working with Business Console > Semantic Resources")
o Plan entirely reviewed for more readability and consistency.
• Technical Reference Guide: All semantic processors are now documented. Several have also been reviewed to include better explanations (see "Ontology Matcher")
• Installation & Administration Guide: New checklist added for "Best practices to increase indexing speed" (see "Deploying CloudView > Checklists")
Over 640 technical Questions & Answers are now on the Dassault Systèmes Knowledge Base. You can now find CloudView how-tos and troubleshooting tips by searching the DS Support Knowledge Base (powered by CloudView). To log in, use your DS Passport (for customers or partners) or trigram (for DS employees).
Upgrade notes
• If you modified the default DateTimeFormatter generated by the data model, you need to manually map your output context to the corresponding data model field. A new FormatCheckerDate processor is now automatically generated for Date/Time fields declared in the Data Model, using the format defined directly on the field (#21159).
• Mashup applications no longer have a warmup query configuration by default (#22413). Note that warmup queries on secured pages are not available.
• Index content is changed during migration, so you must detach slave slices before migrating.
Changes that can break things
• Related terms built in a previous CloudView version will be lost. You will have to reindex your data to get them once again.
• If you parse the XML output of the Search API, Related Terms are no longer in the <keyword> tag. They are now in the <Facet name=rt_keyword> tag. Note: If you use the Java Search API client, the getRelatedTerms function still has the old behavior but is deprecated.
• Date/Time fields created without the Data Model (directly in the Administration Console > Data Model > Advanced Schema) no longer benefit from a default Date/Time Formatter. You need to define an output format for these fields in your search logic(s) or by using the parameter syntax of the Search API. Example:
add_hit_meta=analysisdate,index_field:analysisdate
&hit_meta.analysisdate.operation.format_analysisdate.type=time_format
&hit_meta.analysisdate.operation.format_analysisdate.property.outputformat="%m/%d/%Y %H:%M:%S"
Note: if no output format is defined, the value returned by default is a timestamp in an internal format. A valid Unix timestamp should be returned in a future release (#22936).
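The Search API parameters from the Date/Time example above contain characters (%, /, :, spaces, quotes) that have to be URL-encoded when the query string is built by a program. The sketch below shows one way to do this with the Python standard library. The helper name `date_format_params` is hypothetical, and the double quotes around the output format are simply copied from the example above; whether the server requires them is not something this sketch can confirm.

```python
from urllib.parse import parse_qs, urlencode


def date_format_params(field, output_format):
    """Build the Search API parameters that attach a time_format operation to a field."""
    operation = f"format_{field}"
    return [
        ("add_hit_meta", f"{field},index_field:{field}"),
        (f"hit_meta.{field}.operation.{operation}.type", "time_format"),
        (f"hit_meta.{field}.operation.{operation}.property.outputformat", output_format),
    ]


# Same field and format as in the release note example (quotes kept as shown there).
query = urlencode(date_format_params("analysisdate", '"%m/%d/%Y %H:%M:%S"'))
print(query)

# Decoding the query string restores the original parameter values.
decoded = parse_qs(query)
```

The resulting string can be appended to the search URL of your own deployment; the endpoint itself is deliberately left out because it depends on the installation.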
Major fixed issues
Ticket | Title | Existing since | Component
#22124 | Monitoring Console is broken when 2 SDC connectors or more are deployed | V6R2014 | Monitoring Console
#22873 | Alerting (RealTime) : alerts not triggered | V6R2014x.SP3 | Alerting
#22781 | Business Console : RulesMatcher edition - SUBPART element | V6R2014x.SP3 | Business Console
#22198 | CSV Editor (SpellCheck, suggest, ...) does not work on Business Console | V6R2014x.SP2 | Business Console
#21026 | Apply changes with 30 connectors can lead to product instability | Prehistoric defect | Config system / APIs
#20830 | JDBC incremental scan issue | V6R2014 SP2 | Connectors / Connector JDBC
#20587 | Dropbox connector: config modification not effective without "clear documents" action | V6R2014x | Connectors / Connector Dropbox
#22355 | Some tar file makes FS connector crash | V6R2014x.SP3 | Connectors / Connector File Systems
#21088 | "aborting command" blocks "apply configuration" | V6R2014 SP2 | Connectors / Connector JDBC
#20798 | Missing security tokens with XML Connector | V6R2014x | Connectors / Connector XML
#20943 | It should not be possible to add SDC plugin for V6R2014x.SP1 and newer | V6R2014x.SP1 | Connectors / Structured Data Consolidation
#22372 | Cannot enrich documents pushed by ENOVIA SBA Connector with SDC directives | V6R2013x.SP2 | Connectors / Structured Data Consolidation
#22750 | GDS compact fail under windows with SDC | V6R2014x.SP4 | Connectors / Structured Data Consolidation
#21992 | Multi-sheet autoCAD conversion issues | V6R2014x.SP2 | Convert
#21382 | Random corrupted strings inside MIME message unserialization | Prehistoric defect | Core
#22156 | Almost Crawler loop crashing when scheduled through SchedulingConfig | V6R2013.SP2 | Crawler
#22370 | Indexing server with many slots | V6R2014x.SP3 | Generic
#21651 | V6R2014SP2 - index6 loop-crashing GDS lib | V6R2014 SP2 | Generic
#21503 | Huge fragmentation even with mmap_threshold=4K | V6R2014x.SP1 | Generic
#21698 | Indexing server stability (suddenly went to high kernel time) | V6R2014x.SP1 | Generic
#19470 | Gateway stuck with lot of thread | V6R2013x.SP3 | Generic
#22795 | Warmup crashlooping BG | V6R2014x.SP4 | Generic
#22687 | Unable to detect langue on attached documents | V6R2014x.SP3 | Indexing
#21232 | Crash of indexing server, assert on url failure | Prehistoric defect | Indexing
#22608 | Blacklist mechanism not working | V6R2014 GA | Indexing
#22566 | Resource Manager not usable in the Semantic Query Analyzer for nested processors | V6R2014x.SP3 | Search
#21993 | Searchserver consuming all available memory on OnePart app | V6R2014x.SP2 | Search
#21699 | Searchserver export timeout / cancel | V6R2014x.SP1 | Search
#22751 | Security influences the score of results | V6R2013x.SP2 | Search
#20574 | Slice crash on getValueFromVId | V6R2013 | Search
#21867 | #sequence operator does not work alongside #split operator | Prehistoric defect | Search
#20722 | When securing cookie on WAS, the enovia security source is not working anymore | V6R2013x | Security
#22142 | Issue with Bengali and Sinhala Tokenization | V6R2013x.SP3 | Semantic
#22289 | Named entities performance | V6R2014x | Semantic
Other fixed issues
Ticket | Title | Existing since | Component
#13298 | Cannot export PDF in a secured Mashup UI | V6R2012x | 360/Mashup UI
#17129 | Tags widget does not work if loaded by AJAX trigger | Prehistoric defect | 360/Mashup UI & API
#20848 | Advanced Table widget configuration should not have the same storage key by default | V6R2014 SP2 | 360/Mashup UI & API
#22304 | Mashup packed js cache not regenerated after config update | Prehistoric defect | 360/Mashup UI & API
#22725 | Issue with highcharts resize with layout in % | Prehistoric defect | 360/Mashup UI & API
#22883 | UI bug when scrolling on Facet dropdown with pointer (not scroll) | V6R2014x | 360/Mashup UI & API
#14791 | Tag Cloud widget does not work with multidims (works only with h2d) | V6R2013 | 360/Widgets
#20942 | "Drag and drop plugins" is not working with Chrome > 2012R2 | V6R2014x.SP1 | Admin-ui
#21089 | Change the way the log messages are displayed in Admin UI | V6R2014x | Admin-ui
#21474 | Semantic Analysis: chunker is only available in 3 languages but Admin UI proposes more | V6R2014x.SP1 | Admin-ui
#21555 | Api-ui xml formatter adds whitespaces when CDATA contains "\n" | Prehistoric defect | Admin-ui
#21557 | Expansion control of dynamic property display doesnt work | V6R2014x | Admin-ui
#21693 | Dynamic prop : hit content is not correctly configured | V6R2014x.SP1 | Admin-ui
#21971 | Disable normalizer through Admin UI is not working | V6R2014 SP2 | Admin-ui
#22256 | We don't know what is used in the Connector Scheduling Admin UI | V6R2014x.SP2 | Admin-ui
#22257 | Wrong URL to access MAMI wsdl | V6R2014x.SP3 | Admin-ui
#22455 | The sort order part of a facet inside h2d facet is not moved when you move them | Prehistoric defect | Admin-ui
#21590 | Alerting - Alert Group configuration: problem setting Result mode | V6R2014x.SP1 | Alerting
#22496 | Alerting: wrong charset in notification (http POST) | V6R2014x.SP3 | Alerting
#20930 | Incorrect cron expression goes undetected and prevents gateway to start | V6R2014x.SP2 | Alerting
#21001 | Wrong cron syntax for Alerting (documentation & UI) | V6R2014x | Alerting
#22169 | Ontology matchers and annotation managers are not identified by their full name in test interface | V6R2014x | Business Console
#22171 | Slow perfs in Annotate Text in Semantic Console | V6R2014x | Business Console
#1561 | Invalid cron expressions error messages are not properly thrown to end user | Prehistoric defect | Config system / APIs
#20023 | Resource manager does not return full exceptions | V6R2014 SP1 | Config system / APIs
#21522 | Missing config check for alert group without name | Prehistoric defect | Config system / APIs
#22654 | ApplyConfig is not synchronized | V6R2013 | Config system / APIs
#17473 | Apply change does not work when unchecking "Follow directory symlinks" option | V6R2013x | Connector/Java-Filesystem
#22166 | Filesystem connector preview with huge container files lead to clobber the server | Prehistoric defect | Connectors / Connector File Systems
#22443 | Notes connector does not extract correctly numerical fields | Prehistoric defect | Connectors / Connector IBM Lotus Notes
#22476 | Websphere connector seems to trigger after scan even with flag set to false | Prehistoric defect | Connectors / Connector IBM Web Content Manager & Websphere Portal
#21320 | Sharepoint connector201X - Changed Site title not recognized | V6R2013x.SP2 | Connectors / Connector Microsoft SharePoint
#20424 | Multiple authentication : error (403) Forbidden | V6R2013x.SP3 | Connectors / Connector Microsoft SharePoint
#19813 | Sharepoint2013 blog comment: author name is badly handled | V6R2014 SP2 | Connectors / Connector Microsoft SharePoint
#17597 | Preview of OCR ised PDF file has a superposition effect on the text | V6R2014x.SP2 | Convert
#20605 | Remove broken indexRedirectSources option | V6R2013x.SP2 | Crawler
#18722 | Box3.deleteDocument() cannot remove doc from different source | V6R2013x.SP2 | Crawler
#21797 | Bad text formatting in the "Test rules" dialog box (<b> tag displayed) | V6R2014x.SP2 | Crawler
#22533 | getDictionnaryResourceInfo doesn't work | V6R2014x.SP3 | Dictionaries
#22266 | Preview command Search API | V6R2013x.SP2 | Documentation
#17136 | No reference to index kind in documentation | V6R2013x.SP1 | Documentation
#20911 | SXI: incorrect examples in synonym documentation | V6R2013.SP2 | Documentation
#22759 | Epoch inputFormat for DateFormatter in the XML Configuration Reference | V6R2014x.SP4 | Documentation
#22686 | Change of "build after import" option needs a restart to be taken into account | V6R2013x.SP2 | Generic
#22717 | V6R2013xSP2 issue with antislash in filesystem connector's rootpath | V6R2013x.SP2 | Generic
#18610 | Warnings "[ Process has been restarted or died. Try to get new pid]" spam the logs | V6R2013x.SP2 | Generic
#19713 | Crash of connectors-java0 process on a fetch operation on AIX | V6R2014 GA | Generic
#19843 | Cvdiag generation may fail just after an applyConfig | V6R2014 GA | Generic
#19854 | Too short and useless log message when failed to load a suggest (ss0 side) | V6R2014 GA | Generic
#22000 | Commit conditions documentation can lead to some misundersandings in taskqueue mode | V6R2014x | Generic
#22116 | Ensure suggestion permutations are not computed when computePermutation="false" | V6R2014 SP2 | Generic
#22263 | JRDS doesn't gather probes from processes launched via cvconsole | Prehistoric defect | Generic
#22675 | Crash of ss0 : /%NETHOOD%/: java.lang.NumberFormatException: NE at org.eclipse.jetty.util.TypeUtil.parseInt(TypeUtil.java:320) | V6R2014x.SP3 | Generic
#22870 | Remove superUserPassword in gateway log when set on installer command line | Prehistoric defect | Generic
#17492 | Forever MEMLEAK reporting | V6R2013x.SP2 | Index6/Search
#20711 | Bottleneck in UnqueuedBuildGroupManager.allocateDidsSlices | V6R2014x | Indexing
#18613 | Convert may not be available if docs are pushed too fast after a process restart | V6R2014 GA | Indexing
#19164 | Connector scan blocked after data model change | V6R2012x.SP1 | Indexing
#21473 | Failed requests for MetaFinder API console services | V6R2014x.SP1 | Indexing
#21665 | Option "use separate metas for each coordinate" is not correctly working | V6R2014x.SP1 | Indexing
#22335 | Buildgroup status returns no documents if timeout is set to 0 | Prehistoric defect | Indexing
#20266 | Logging filters issue | V6R2014 SP2 | Logging
#17354 | cvconsole generate-static-report does not work on aix | V6R2013x.SP1 | Mercury/CVConsole
#21451 | PerfUI => Convert cpu usage probe is broken | V6R2014x.SP1 | Monitoring Console
#21876
suggestSyncedEntries with limit=0 should not return any entry