Solr | Enterprise Search Engine

EAS Search Powered by Solr

We are proud of the many technical and end-user experience enhancements that have been made to EAS since we assumed custodianship of the software in 2014. Extending EAS to index into Solr remains one of the most strategic and beneficial of them. If you’re unfamiliar with Solr, it is an Apache project that powers enterprise search for many leading companies, such as Goldman Sachs, Netflix, Salesforce, Apple, and eBay.

Solr was a natural fit for our index platform portfolio given its scalability and reliability. This technical brief covers the benefits of the Solr platform, the steps involved in upgrading, and sample sizing.

 

Solr is highly reliable, scalable and fault-tolerant, providing distributed indexing, replication, and load-balanced querying, automated failover and recovery, centralized configuration and more.  Solr powers the search and navigation features of many of the world's largest internet sites.  

   ---   Apache Lucene 

 

Benefits

 

Verification

As an open-source platform, Solr gives Sceven, Unified Global Archiving’s software development organization, the opportunity to integrate Solr more deeply with EAS and to develop the two in parallel.

 

A good example of this is index verification. The verification process confirms that all items that have been archived have also been indexed. Historically, in IDOL-based deployments, administrators had to export message IDs from the index and compare them against SQL [index by table] manually. Any messages that were not indexed then had to be re-added to the FT Notify table to be queued for indexing. This can be a time-intensive process.

 

Within a Solr-based EAS deployment, we have added a button that automatically compares what has been indexed into Solr against what EAS deems to be indexed. This reduces the time administrators need to identify variances between the archive and the index, and therefore the time needed to bring them back into sync.
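The comparison behind this kind of verification can be pictured as a set difference between the message IDs the archive holds and the IDs found in the index. The sketch below is illustrative only: the function name and the sample IDs are hypothetical, and it does not represent how EAS implements the check.

```python
def find_index_variances(archived_ids, indexed_ids):
    """Compare archive and index ID collections (hypothetical helper)."""
    archived = set(archived_ids)
    indexed = set(indexed_ids)
    return {
        # Archived but never indexed: candidates to queue for indexing.
        "missing_from_index": sorted(archived - indexed),
        # Indexed but unknown to the archive: worth investigating.
        "unexpected_in_index": sorted(indexed - archived),
    }


# Made-up IDs, purely for illustration.
report = find_index_variances(
    archived_ids=["msg-001", "msg-002", "msg-003", "msg-004"],
    indexed_ids=["msg-001", "msg-003", "msg-999"],
)
print(report["missing_from_index"])   # IDs to re-queue for indexing
print(report["unexpected_in_index"])  # IDs present only in the index
```

Using sets keeps the comparison fast even for large ID lists, since each membership check takes roughly constant time.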

 

Validation

Within a base Solr deployment, content engines are referred to as shards. PremCloud Services recommends that Solr be deployed in a highly available configuration, in which each shard has a primary and a replica. If the primary shard fails, Solr [internally] will rebuild the shard from the replica. This self-healing capability is built into Solr and occurs without user intervention.

 

Solr simultaneously writes to both the primary and replica shards. In IDOL-based deployments, high availability is created via a mirrored instance, where the secondary environment is a copy of the primary. The issue with this configuration is that if the primary content engines become corrupted, the mirrored instance also becomes corrupted, because it is a copy of the primary. The validation process within Solr confirms that the primary and replica shards are in sync. If the process detects a variance, Solr will automatically begin to repair the shard that is out of sync, providing an added layer of quality assurance for the indexes.
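Replica health can also be inspected from outside Solr. SolrCloud's Collections API offers a CLUSTERSTATUS action that reports a state for each replica; the sketch below walks a response of that general shape and lists replicas that are not active. The sample dictionary is hand-written and abbreviated, the collection and node names are hypothetical, and this is not the internal repair logic that Solr itself runs.

```python
def replicas_needing_attention(cluster_status, collection):
    """Return (shard, replica, state) for every replica not in the 'active' state."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    problems = []
    for shard_name, shard in shards.items():
        for replica_name, replica in shard["replicas"].items():
            state = replica.get("state")
            if state != "active":
                problems.append((shard_name, replica_name, state))
    return sorted(problems)


# Abbreviated, hand-written stand-in for a CLUSTERSTATUS response (assumption).
sample_status = {
    "cluster": {
        "collections": {
            "eas_archive": {  # hypothetical collection name
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"state": "active", "leader": "true"},
                        "core_node2": {"state": "active"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"state": "active", "leader": "true"},
                        "core_node4": {"state": "recovering"},
                    }},
                }
            }
        }
    }
}

out_of_sync = replicas_needing_attention(sample_status, "eas_archive")
print(out_of_sync)
```

A replica reported as recovering is typically one that Solr is already bringing back in line with its shard leader, so an empty result is the steady state to expect.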

 

Additional Benefits:

  • Solr can be deployed on virtualized hardware.

  • Highly scalable, with rapid indexing speeds [we have achieved indexation of 12 MM items per day]. The more computing resources you provide, the faster indexing can take place.

  • Shards hold more objects [15-20 MM] than IDOL content engines [5 MM].

  • A simple UI gives administrators clear, real-time visibility into the health and status of the indexer.

  • Less administrative burden and management.
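To put the capacity and throughput figures above in perspective, the following back-of-the-envelope sketch compares how many shards or content engines a given item count would need, and how long a full re-index might take if throughput matched the 12 MM items per day figure. The 28 million item count is an illustrative input, the helper name is hypothetical, and real throughput depends on the computing resources available.

```python
def units_needed(item_count, items_per_unit):
    """Ceiling division: how many shards/engines are needed to hold item_count."""
    return -(-item_count // items_per_unit)


ITEMS = 28_000_000                 # illustrative archive size (assumption)
SOLR_SHARD_CAPACITY_LOW = 15_000_000
SOLR_SHARD_CAPACITY_HIGH = 20_000_000
IDOL_ENGINE_CAPACITY = 5_000_000
ITEMS_PER_DAY = 12_000_000         # best-case rate quoted in the list above

shards_max = units_needed(ITEMS, SOLR_SHARD_CAPACITY_LOW)
shards_min = units_needed(ITEMS, SOLR_SHARD_CAPACITY_HIGH)
idol_engines = units_needed(ITEMS, IDOL_ENGINE_CAPACITY)
reindex_days = ITEMS / ITEMS_PER_DAY

print(f"Solr shards: {shards_min}-{shards_max}")
print(f"IDOL content engines: {idol_engines}")
print(f"Re-index time at the quoted rate: {reindex_days:.1f} days")
```

For this input the shard count comes out at 2 against 6 IDOL content engines, which is one concrete reading of the reduced administrative burden.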

 

If you already have an IDOL index today, PremCloud Services can implement Solr, re-index all data from IDOL and simply cut over to Solr once the index has caught up with the live data. 

   ---   Peter Mellett, PremCloud President

 

 

Solr Setup

Below is a high-level overview of the steps PremCloud will take to implement the Solr infrastructure, configure EAS for Solr, index the data, validate the data, and cut over:

 

  • Set up a new SolrCloud infrastructure.

  • Install all SolrCloud components.

  • Add the index to the EAS configuration.

  • Review and confirm mailboxes to index. This would typically be all users in the organization, both enabled and disabled users. 

  • Start the re-indexation process.

  • Once indexing is complete, PremCloud will validate that all messages have been indexed. PremCloud will also validate that “net-new” messages are being indexed.

  • PremCloud will perform additional testing to ensure searches are working as expected.  

  • Retire the legacy IDOL infrastructure. 
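As an illustration of what the infrastructure step can involve, SolrCloud collections are commonly created through the Collections API using the CREATE action with numShards and replicationFactor parameters. The sketch below only builds such a request URL and does not send it. The host, collection name, and shard count are hypothetical, a replication factor of 2 is assumed to mirror the primary-and-replica layout described earlier, and this is not a description of PremCloud's actual tooling.

```python
from urllib.parse import urlencode


def build_create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Solr Collections API CREATE request URL (not executed here)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = build_create_collection_url(
    base_url="http://solr.example.internal:8983/solr",  # hypothetical host
    name="eas_archive",                                 # hypothetical collection
    num_shards=2,
    replication_factor=2,  # one primary plus one replica per shard (assumption)
)
print(url)
```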

 

Sample Customer Sizing

When you evaluate a re-index to Solr, the PremCloud Services team will perform a formal sizing exercise to provide you with exact sizing requirements. As part of that exercise, PremCloud Services will collect data on your existing EAS archives, such as message count and average message size, and will forecast annual growth over the next 3-5 years. In the meantime, the sample customer environment below gives a sense of the hardware requirements. As a general rule of thumb, you can expect your Solr index to be 10-15% of your archive size.
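The rule of thumb translates into a simple range calculation. In the sketch below, the function name is hypothetical and the 2,000 GB archive is a made-up input rather than a figure from any customer; a formal sizing exercise would replace both.

```python
def estimate_index_size_gb(archive_size_gb, low_pct=10, high_pct=15):
    """Apply the 10-15% rule of thumb to an archive size given in GB."""
    low = archive_size_gb * low_pct / 100
    high = archive_size_gb * high_pct / 100
    return low, high


# Hypothetical archive size, used only to show the arithmetic.
low_gb, high_gb = estimate_index_size_gb(2_000)
print(f"Expected index size: {low_gb:.0f}-{high_gb:.0f} GB")
```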

 

Sample Customer:

  • Approximately 500 employees.

  • 28 million messages archived today in EAS.

  • 47 messages per user per day.

  • 109 KB average message size [typical average message size is 180-250 KB per message].

  • Annual growth rate of 15% in message count.

  • Annual growth rate of 10% in employee count.

  • 5-year growth rate accounted for in sizing.

  • Forecast of 65 million messages in 5 years.
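One way to arrive at a forecast of this order is to add each year's new mail to today's total while headcount grows annually. The sketch below is an assumption-laden reconstruction rather than the PremCloud sizing method: it assumes 260 working days per year, applies the 10% employee growth from the second year onward, and leaves the separate 15% message-count growth figure unused.

```python
def forecast_messages(current_total, employees, msgs_per_user_per_day,
                      employee_growth, years, working_days_per_year=260):
    """Project the archive message count by accumulating yearly new mail."""
    total = current_total
    headcount = employees
    for _ in range(years):
        # Mail generated this year by the current headcount.
        total += headcount * msgs_per_user_per_day * working_days_per_year
        # Headcount grows before the next year's mail is counted.
        headcount *= 1 + employee_growth
    return round(total)


forecast = forecast_messages(
    current_total=28_000_000,
    employees=500,
    msgs_per_user_per_day=47,
    employee_growth=0.10,
    years=5,
)
print(f"{forecast:,} messages after 5 years")  # roughly 65.3 million
```

Changing the working-day assumption moves the result noticeably, which is one reason a formal sizing exercise is worth doing before hardware is ordered.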

 

Proposed Solr Environment: 

  • Server Count: Single Server

  • Memory: 20 GB of RAM

  • Storage: 250 GB of Index Data

NA  |  5110 Main Street  |  Williamsville, New York 14221  |  USA

EMEA  |   2 London Wall Place, 6th floor  |  London EC2Y 5AU  |  UK

Copyright © 2020 Unified Global Archiving.  All rights reserved.