Understanding the Value of Data Archiving

 

Introduction

In this white paper I explain the value that a correctly implemented data archiving system provides, for both email and file system data.

Audience

This paper is intended for IT personnel who have a basic understanding of email systems, storage systems and networking. If you are not an IT professional and are using it to make a decision, some of the terms and the lack of detailed explanation will hinder you. You should consult with one of your engineers after reading it.

Why do you want to Archive Data?

There are many good reasons for archiving the email and file data located in your corporate messaging systems and storage platforms, including:

  • Reduced storage needs (on your messaging servers and high speed storage platforms).
  • Faster searching and retrieval of email and file data for legal discovery and other purposes.
  • Faster backups of your information stores (Exchange Server).
  • Faster migrations of mailboxes from one server to another.
  • Increasing email storage for the humans without increasing storage for your mail databases.
  • Increasing file data storage for the humans without buying more disks for your high speed (read “expensive”) storage devices.
  • Regulatory compliance.
  • Better overall management of your email and file storage.

There are more, to be sure, but these are the most common. I cover these points in more detail below.

 

Before we get started…

We should get some of the terminology out of the way:

Single instancing is the process in which duplicate data is identified, one copy is retained and the rest are deleted. The actual process is much more technical than this, but that is the basic function.
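As a rough illustration of the idea, here is a minimal sketch of a single-instance store. It assumes duplicates are detected by hashing the content with SHA-256; the class and method names are my own, and real products are far more involved.

```python
import hashlib


class SingleInstanceStore:
    """Keeps one physical copy per unique payload, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}       # digest -> payload bytes (stored once)
        self.references = {}  # item id -> digest of the payload it points to

    def add(self, item_id, payload):
        digest = hashlib.sha256(payload).hexdigest()
        # Store the payload only if this exact content has not been seen before.
        self.blobs.setdefault(digest, payload)
        self.references[item_id] = digest
        return digest

    def get(self, item_id):
        # Every reference resolves to the single retained copy.
        return self.blobs[self.references[item_id]]


store = SingleInstanceStore()
attachment = b"Q3 budget spreadsheet" * 1000  # same attachment sent to three people
for mailbox in ("alice", "bob", "carol"):
    store.add(f"{mailbox}/msg-1", attachment)

print(len(store.references), "references,", len(store.blobs), "stored copy")
```

Three mailboxes reference the attachment, but only one copy takes up space in the store.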

A shortcut or stub is a small item placed in your email database (in a user’s mailbox) that points back to the message in the archive system. The stub is usually distinguished by an icon that differs from the standard email message icon. In most archiving systems, double-clicking the icon will retrieve the message for the user to review; it does not restore the message to the mailbox.

The “Archive” is a storage location controlled by the archive server that contains the stored messages. Most enterprise systems will compress the original message to save disk space. As a best practice, the archiving system should be able to utilize low-cost storage for the archive.

De-duplication is not the same as “single instance”, although some vendors mistakenly use the two terms interchangeably. It is a term more commonly used with flat file systems, and it often controls data redundancy at the “byte” level rather than just at the file level.
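The difference between the two levels can be sketched in a few lines. This example assumes fixed-size 4 KB blocks and SHA-256 hashes, both of which are my choices for illustration; storage products choose their own block sizes and algorithms.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed block size, for illustration only


def unique_bytes_stored(files, chunk_size=CHUNK_SIZE):
    """Bytes kept when identical blocks are stored only once across all files."""
    seen = {}  # block digest -> block length
    for data in files:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            seen.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    return sum(seen.values())


original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
revised = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # only the last block changed

# File level: the two files are not identical, so both are kept whole.
file_level = len(original) + len(revised)
# Block level: the two unchanged blocks are shared between the files.
block_level = unique_bytes_stored([original, revised])

print(file_level, block_level)
```

File-level single instancing saves nothing here because the files differ, while block-level de-duplication stores the shared blocks only once.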

 

Reduced storage needs (on your messaging and file servers)

No brainer. The email is moved to your less expensive secondary storage, the “archive”, where it is compressed and “single-instanced”. A “shortcut” or “stub”, which is a pointer to the actual message, is left in the original message’s place. The stub is usually just a few kilobytes in size, much smaller than the original email, so the size of the data in your email databases is reduced.
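The move-and-stub step can be sketched like this. It is a toy model under my own assumptions: the mailbox and archive are plain dictionaries, compression uses zlib, and the function names and archive ID format are hypothetical rather than taken from any product.

```python
import json
import zlib

archive = {}  # archive id -> compressed message body (stands in for cheap storage)


def archive_message(mailbox, message_id):
    """Move a message body into the archive and leave a small stub behind."""
    message = mailbox[message_id]
    archive_id = f"arch-{len(archive) + 1}"
    archive[archive_id] = zlib.compress(message["body"])
    # The stub keeps just enough to display the item and find the archived copy.
    mailbox[message_id] = {
        "subject": message["subject"],
        "stub": True,
        "archive_id": archive_id,
    }
    return archive_id


def open_message(mailbox, message_id):
    """Retrieve the body for viewing; the mailbox still holds only the stub."""
    item = mailbox[message_id]
    if item.get("stub"):
        return zlib.decompress(archive[item["archive_id"]])
    return item["body"]


body = b"lorem ipsum " * 50000
mailbox = {"m1": {"subject": "Quarterly report", "body": body}}
size_before = len(mailbox["m1"]["body"])
archive_message(mailbox, "m1")
size_after = len(json.dumps(mailbox["m1"]))

print(size_before, "bytes before,", size_after, "bytes of stub after")
```

Opening the stub with `open_message` returns the full original body, while the mailbox itself keeps only the small pointer.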

This applies as well to your file system data. The process is the same, only faster, as you are accessing flat files. Most SANs and even NAS systems utilize high-speed SCSI or Fibre Channel disks. The secondary storage you are moving data to can easily be SATA drives or drives of equal performance. The archive system you choose should be able to utilize this kind of “cheap” storage without issue.

 

Faster searching and retrieval of email for legal discovery and other purposes

Now that the email has been archived, you can use the tools that should come with any good enterprise archiving solution to search rapidly through the “archive” of messages. These tools will usually have much greater search capabilities than any search tool available natively on your email servers. You should also be able to place “holds” on messages so that they cannot be deleted until you have released them.
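The interplay between searching and holds can be sketched as follows. This is a simplified model with hypothetical names; it uses a plain substring search where a real product would use a full-text index.

```python
class ArchiveIndex:
    """Toy archive with keyword search and named legal holds."""

    def __init__(self):
        self.messages = {}  # message id -> text
        self.holds = {}     # message id -> set of hold names

    def add(self, message_id, text):
        self.messages[message_id] = text

    def search(self, term):
        term = term.lower()
        return sorted(mid for mid, text in self.messages.items() if term in text.lower())

    def place_hold(self, hold_name, message_ids):
        for mid in message_ids:
            self.holds.setdefault(mid, set()).add(hold_name)

    def release_hold(self, hold_name):
        for mid in list(self.holds):
            self.holds[mid].discard(hold_name)
            if not self.holds[mid]:
                del self.holds[mid]

    def delete(self, message_id):
        # A message under any hold cannot be deleted.
        if message_id in self.holds:
            return False
        self.messages.pop(message_id, None)
        return True


index = ArchiveIndex()
index.add("m1", "Notes on the merger timeline")
index.add("m2", "Lunch on Friday?")

hits = index.search("merger")
index.place_hold("case-001", hits)
blocked = index.delete("m1")   # refused while the hold is in place
index.release_hold("case-001")
deleted = index.delete("m1")   # allowed once the hold is released
```

The hold is attached to the search results, so retention or manual deletion cannot remove those messages until the hold is released.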

 

Faster backups of your information stores (Exchange)

Your email databases will shrink as you archive the original messages and leave smaller “placeholders” behind, which makes your backups faster. This is true for any email system, not just Exchange. It should be noted that with Exchange you will need to perform an offline defrag to reclaim the “white space” so that your .EDB files will be smaller.

Faster migrations of mailboxes from one server to another

Again, “smaller database = faster to move”. For example, suppose a mailbox is 1.5 gigs and takes 40 minutes to move to another server. You archive its contents, leaving stubs behind. Now the mailbox is 300 Megs and it takes only 10 minutes to move. So if you are moving a lot of mailboxes to another server in a mailbox migration, shrinking the amount of data you have to move will drastically speed up the process.
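To put rough numbers on the example above, here is a back-of-the-envelope sketch. It assumes move time scales linearly with mailbox size, which is a simplification: real moves carry fixed per-mailbox overhead, which is one reason the 10 minutes quoted above is a little higher than the straight-line figure. The function name is mine, and I treat 1.5 gigs as 1500 MB.

```python
def estimated_move_minutes(mailbox_mb, throughput_mb_per_min):
    """Linear estimate: time is proportional to the amount of data moved."""
    return mailbox_mb / throughput_mb_per_min


# Throughput implied by the example: 1500 MB moved in 40 minutes.
throughput = 1500 / 40

before_archiving = estimated_move_minutes(1500, throughput)
after_archiving = estimated_move_minutes(300, throughput)

print(f"before: {before_archiving:.0f} min, after: {after_archiving:.0f} min")
```

Multiply the difference by the number of mailboxes in the migration to see how quickly the savings add up.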

 

Increasing email storage for the humans without increasing storage for your mail databases

This is one purpose that is a win-win for you and the people using your email system. Once you have implemented the archiving system, you can use it as a huge “blob” of extended storage for the email system. I have seen it set up many ways; I will outline this use from one of the more successful implementations I worked with.

Users have a quota of 200 Megs on their mailbox. They are constantly running out of space, storing email off to PST files (which is unsafe) and complaining to you. You implement an email archiving system. You set your policy to let users keep 1 month of email in their mailbox and archive everything else. You set your archive retention period to 3 years. You also stop the use of PSTs through policy (GPOs). Now you lift mailbox limits. Next you perform a migration to move all PST data into your archive and delete the PSTs afterwards. Your end users have 1 month of email in their inbox and the rest is stubbed. They now have access to 3 years’ worth of email by accessing the archive (which in any archive system worth its salt is a very easy, intuitive thing to do). You have given them effectively unlimited mailbox storage and gained greater control of your email system.
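The policy in this scenario boils down to two age thresholds. Here is a minimal sketch of that decision, assuming “1 month” means 30 days and “3 years” means 1095 days; the function name and the three labels are hypothetical.

```python
from datetime import date, timedelta

KEEP_IN_MAILBOX = timedelta(days=30)   # "1 month", approximated as 30 days
RETENTION = timedelta(days=3 * 365)    # "3 years", approximated as 1095 days


def classify(received, today):
    """Decide what the policy does with a message of a given age."""
    age = today - received
    if age > RETENTION:
        return "expire"    # past the archive retention period
    if age > KEEP_IN_MAILBOX:
        return "archive"   # moved to the archive, stub left in the mailbox
    return "keep"          # recent enough to stay in the mailbox as-is


today = date(2024, 6, 1)
recent = classify(date(2024, 5, 20), today)
older = classify(date(2024, 1, 15), today)
ancient = classify(date(2020, 1, 1), today)

print(recent, older, ancient)
```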

 

High Level View

archive1

Litigation and Email Searches

There are many good reasons to implement an archive system, but the number one reason I have seen in the last 4 years is to meet the requirements of the legal department during litigation, when a judge orders email to be produced for “discovery”. This can be very time sensitive, and if you have a large amount of email it can be almost impossible without some kind of effective system in place to deal with it. With a well-designed archiving system you should be able to produce any ordered search results in hours or days rather than weeks or months. In many cases the money saved on search time, whether by IT administrators or outsourced searching or both, will easily pay for an enterprise-level archiving system.

Also, having access to all email during searches, and being able to show that you effectively control your email system through policy, will help to counter any effort by opposing counsel to challenge your searches and results as incomplete.

 

Storage Vendors

One very positive result of the explosion of archiving systems in the software market is that many of the more mature systems have partnered with storage vendors to improve the archive storage platform and its performance. Storage vendors such as Data Domain, Hitachi and NetApp are working with archive vendors to provide a “matched” solution that allows better interaction with their products, so the archiving solution can store data more effectively.

A good example of this is Data Domain. Because their focus is on compression and de-duplication, they have far superior algorithms and processes to handle the compression and single instancing of data. They have partnered with some of the enterprise archive software companies, and those products now allow you to turn off their built-in compression in favor of the storage provider’s. This is important to look for in a product when you start looking to make a purchase.

 

Lower Cost Storage

Some of the enterprise archival software vendors have worked to make sure they can run effectively on lower cost storage. I don’t mean a USB drive from your local electronics store. I’m talking about NAS storage using SATA drives or better, or internal storage on a server. This is a good place to get ROI because most email systems (especially MS Exchange) require fast SCSI or better disk subsystems for the email databases. By archiving to low-cost storage you can avoid growing your email server storage and focus on NAS storage instead; this can benefit the rest of your organization, because that storage can be used for more than just archived email. NOTE: beware, there are a few archival manufacturers whose products must have high-speed storage to function, and they will tell you otherwise in their demo or during the sales cycle. Make sure you understand the performance requirements before you settle on a product.

How do email and data archiving work?

Below is a diagram (high level) that shows the basic functions of an archiving system.

 

archive2

Conclusion

Hopefully I have given you some insight into the great benefits of data archiving. Please remember there are even more benefits than I have listed here. Each archiving vendor has different technologies and methods that may fit your environment well, so make sure you investigate several before you make a choice. Also (as if I need to say it), don’t make a purchase decision based solely on one slick demo; install the product in a lab and test it.

Thanks for taking the time to read this article and I hope it has helped you.

Document Links

Cornerstone Technologies - Data Archiving Consulting Firm

http://www.cornerstonetechnologies.com

Outlook performance – too many items in the inbox:

http://support.microsoft.com/kb/905803

Single instance storage explained (Wikipedia):

http://en.wikipedia.org/wiki/Single_Instance_Storage

Evaluation of Symantec Enterprise Vault and Mimosa Nearpoint

Introduction

In this “white paper” I will articulate the differences between the Symantec Enterprise Vault and Mimosa Nearpoint email archiving systems. My intention is not to decry one product over the other but rather to point out key differences in their architecture and functionality. My goal is to provide a better means by which to make a decision in the vast email archiving arena.
You may wonder why I have chosen these two products; quite simply, these are Gartner’s “Magic Quadrant” top players. 

Additionally, I have extensive experience with both products in the real world and I have tested or evaluated several of the other “players” in this market. I am asked often to explain the differences between these two systems.
Symantec’s Enterprise Vault is a vast product with many features that Mimosa’s Nearpoint system does not cover; this is due mostly to the fact that Enterprise Vault is a much more mature product. So I will highlight the features that are of significant value in each product.
Audience
This paper is intended for IT personnel who have a basic understanding of email systems, storage systems and networking. If you are not an IT professional and are using it to make a decision, some of the terms and the lack of detailed explanation will hinder you. You should consult with one of your engineers after reading it.
Experience
I have been an IT professional for 15 years: three years as a consultant, which is my current role, 10 years as a systems administrator at several Bay Area corporations, and several years running my own IT services business.
My experience with Mimosa is 1 1/2 years running it, a (editorial note - removed per request).
My experience with Symantec Enterprise Vault is as a consultant deploying it. I am Symantec Certified in Enterprise Vault 7.5. I now consult for an IT engineering organization, Cornerstone Technologies, LLC.
NOTE: I wrote extensively about Mimosa and how my evaluation was performed in a “production copy” lab. That article is located at my own personal website. I will supply the link in the “supporting documents” section of this paper.
Mimosa Nearpoint Architecture
 
(editorial note - removed per request)
 
You will see this throughout the article, as Mimosa has requested that I remove any information about their product. It’s funny: I wrote another article about their product that was on my site for 2 years, they sent customers there all the time to read it, and it had pretty much the same technical info in it. They even had me do an in-depth interview with an eWeek reporter and a webinar for their customers. But the new article, which was not derogatory in any way and was completely fact based, completely freaked them out. I guess they don’t want their customers to know the truth? Well, you may see the unedited version back here again, maybe…


Symantec Enterprise Vault Architecture
    Detailed below in “Diagram 2” you will see the basic high level layout of Symantec’s Enterprise Vault. There are 3 servers (to start) used in the EV system. You may or may not also need an external storage system such as a SAN (Storage Area Network) or NAS (Network Attached Storage) for the storage needs of the system.
    • MS SQL Server
    • Enterprise Vault Server
    • Your Exchange Server
    • Storage

    Diagram 2 – Enterprise Vault
    Architecture

    • Services
      • The EV server is running IIS services, as all client access is through HTTP or HTTPS.
      • There are EV services running as well; Enterprise Vault Admin Service, Enterprise Vault Directory Service, Enterprise Vault Indexing Service, Enterprise Vault Shopping Service, Enterprise Vault Task Controller Service, Enterprise Vault Storage Service.
    • Storage requirements
      • Volume for the Archive.
      • Volume for the Index.
      • Index disks must be high performance, fiber-channel preferred, no NAS!
    • SQL database sizes vary amongst installations. Below are the database sizes for my Lab installation with 50 mailboxes and total information store sizes of 35 gigs (some whitespace).
      • EnterpriseVaultDirectory - 35 MB
      • EnterpriseVaultMonitoring (optional) - 180 MB
      • EnterpriseVault - 180 MB
    Basic Function
      By design, Enterprise Vault sets up “Targets”, which are your Exchange servers. This basically adds your Exchange server to the Enterprise Vault Directory so it can be archived.

      • Archiving
        • First an “Archive” is set up. This includes the creation of the “Store” database. The archive can reside on any disk sub-system, SAN or NAS, as high performance is not required for archive storage.
        • Next, policies are set up for archiving based on date/time, size or watermark. The Exchange server is accessed by MAPI and mailboxes are scanned for items that should be archived. Items that meet the policy are removed from the Exchange server using MAPI and replaced with a shortcut. This “shortcut” is an Outlook form that is stored in the ORG forms library on the Exchange server.
        • It should be noted that an Outlook Add-in must be deployed to end users before they can access the archive.
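The three policy triggers mentioned above (date/time, size and watermark) can be sketched as a single selection function. This is my own simplified model, not Enterprise Vault’s actual policy engine: all names and parameters are hypothetical, and I assume the watermark rule archives the oldest remaining items until mailbox usage falls to a given fraction of the quota.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    item_id: str
    received: date
    size_kb: int


def select_for_archiving(items, today, max_age_days=None, min_size_kb=None,
                         quota_kb=None, watermark=None):
    """Return the ids of items a date, size or watermark rule would archive."""
    selected, remaining = [], []
    for item in items:
        too_old = max_age_days is not None and (today - item.received).days > max_age_days
        too_big = min_size_kb is not None and item.size_kb >= min_size_kb
        (selected if too_old or too_big else remaining).append(item)

    if quota_kb is not None and watermark is not None:
        limit = quota_kb * watermark
        used = sum(item.size_kb for item in remaining)
        # Archive oldest-first until usage drops to the watermark.
        for item in sorted(remaining, key=lambda i: i.received):
            if used <= limit:
                break
            selected.append(item)
            used -= item.size_kb
    return [item.item_id for item in selected]


items = [
    Item("old", date(2023, 1, 1), 100),
    Item("big", date(2024, 5, 25), 20000),
    Item("a", date(2024, 5, 1), 3000),
    Item("b", date(2024, 5, 10), 3000),
    Item("c", date(2024, 5, 20), 3000),
]
chosen = select_for_archiving(items, date(2024, 6, 1), max_age_days=90,
                              min_size_kb=10000, quota_kb=10000, watermark=0.5)
print(chosen)
```

In this run the age rule picks “old”, the size rule picks “big”, and the watermark rule then archives “a” and “b” to bring the rest of the mailbox under half of its quota.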

        Feature Comparison

        • Archive Storage
          • (editorial note - removed per request) versus Symantec Enterprise Vault’s “Vault”. So one of the key differences here is that the (editorial note - removed per request) is put in here, whereas Symantec only stores messages that meet the policy criteria during an archive run. This means that the Mimosa (editorial note - removed per request) and Enterprise Vault will have only the messages that meet the archive policy, so Symantec will need less storage.
          • Mimosa has a (editorial note - removed per request)
          • Mimosa stores (editorial note - removed per request) 
          • Both products have the ability to close off a storage area, the IOR (Mimosa) or the Archive (EV), and open a new one while maintaining access to all of them. This would be done when an archive area grows too large.
          • Symantec has the ability to gather the “dvs” files and place them in a .CAB file. The size of the .CAB file is controlled through policy. The .CAB files can be moved off to another storage system such as tape. Symantec does integrate with NetBackup. Mimosa (editorial note - removed per request).

          • Archiving
            • Mimosa has a feature called (editorial note - removed per request)
            • Symantec uses “Journaling” to do a complete capture of all messages that traverse the email system, or you can just choose mailboxes to archive. Mimosa does not use “Journaling” as they (editorial note - removed per request) 
              • The scenario would go like this: (editorial note - removed per request)
              • With Journaling it doesn’t matter if Symantec loses connectivity, as all mail is delivered to the Journal mailbox. So when connectivity is regained whatever is in the journal mailbox is pulled.
            • Mimosa captures (editorial note - removed per request) With Symantec you will only be able to retrieve what has been archived.
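The journaling behavior described above works like a queue that keeps filling while the archive server is unreachable. Here is a minimal sketch with hypothetical class names; it models only the buffering, not how Exchange or Enterprise Vault actually implement it.

```python
from collections import deque


class JournalMailbox:
    """Receives a copy of every message, whether or not the archiver is reachable."""

    def __init__(self):
        self.pending = deque()

    def deliver(self, message):
        self.pending.append(message)


class Archiver:
    def __init__(self, journal):
        self.journal = journal
        self.connected = True
        self.archived = []

    def poll(self):
        """Pull everything waiting in the journal; do nothing while disconnected."""
        if not self.connected:
            return 0
        pulled = 0
        while self.journal.pending:
            self.archived.append(self.journal.pending.popleft())
            pulled += 1
        return pulled


journal = JournalMailbox()
archiver = Archiver(journal)

journal.deliver("m1")
first = archiver.poll()

archiver.connected = False          # connectivity is lost
journal.deliver("m2")
journal.deliver("m3")
during_outage = archiver.poll()     # nothing is pulled, but nothing is lost

archiver.connected = True           # connectivity is regained
after_outage = archiver.poll()      # the backlog is drained
```

Messages delivered during the outage simply wait in the journal mailbox and are picked up, in order, on the next successful poll.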

          • Single Instancing
            • Both products offer “single instancing”. When more than one copy of an attachment has been archived and the system determines that the copies are identical, one is kept and the others are deleted. Exchange does this as well but only amongst storage groups, whereas the Mimosa product will (editorial note - removed per request). The Symantec product has this same functionality, but in their new release they can perform this across all archives, giving them an edge in cutting down on storage consumption.
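Why the scope of single instancing matters can be shown with a small count. This sketch assumes content is matched by SHA-256 hash and compares a per-archive scope with a scope spanning all archives; the function and archive names are hypothetical and do not describe either product’s internals.

```python
import hashlib


def stored_copies(items, scope):
    """Count physical copies kept. items: (archive_name, payload) pairs.

    scope="archive" shares copies only within one archive;
    scope="global" shares them across all archives.
    """
    seen = set()
    for archive_name, payload in items:
        digest = hashlib.sha256(payload).hexdigest()
        seen.add(digest if scope == "global" else (archive_name, digest))
    return len(seen)


attachment = b"company-wide policy PDF" * 500
items = [
    ("archive-1", attachment),
    ("archive-1", attachment),
    ("archive-2", attachment),
]

per_archive = stored_copies(items, scope="archive")
across_all = stored_copies(items, scope="global")
print(per_archive, across_all)
```

With a per-archive scope the attachment is stored once in each archive; with a global scope it is stored once in total.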

              • Legal Discovery
                • Both products can perform searches through their archives for the purpose of legal discovery. In fact this is the driving force behind sales of email archiving solutions nowadays. The Mimosa product (editorial note - removed per request). The interface is drastically different from the Symantec product. The Mimosa product (editorial note - removed per request).
                • The Symantec product has the capability for advanced complex searches as well. In use both products are complex and you will need training to use them but the Symantec product really shows its maturity here as it has a wealth of tools and customizations for the search. Also Symantec has some very granular controls for who can do what, “roles assignment” as it is called (actually Mimosa uses the same terminology) but Mimosa has (editorial note - removed per request) only.
                • The eDiscovery product from Mimosa has (editorial note - removed per request). The Symantec product uses a web interface that can be accessed by any computer and is controlled by domain account login.
                • In the Symantec product, searches can be scheduled for off hours to reduce the workload on the SQL server and EV server.
                • Both products can mark items for “hold” so that retention policies do not delete messages found in the search.

                • Outlook
                  • Both products have Outlook client functionality. The Mimosa product does not install any applications on the client for its basic functions. The Symantec product needs to install a small client in Outlook.
                  • Mimosa uses an (editorial note - removed per request).
                  • Both products have an “offline” feature that can be deployed to mobile users. This feature gives you access to your archived email when you do not have access to your company’s network.
                  • Both products have OWA access to the archive as well. This feature works well in both products.

                      Well, I could go on, but I think I have covered the functionality that most IT personnel will be looking for when they start the decision-making process to purchase an email archiving product. It should be noted that I covered only the features that are shared between the products. Where Symantec starts to show its “enterprise” face is in the features not covered here: File System archiving, SharePoint archiving, Lotus Domino archiving, IM archiving and more.

                        Document Links
                        Email archiving:
                        http://www.arconi.com/index.php?option=com_content&view=article&id=60:emailarchiving&catid=47:white-papers&Itemid=79

                        Outlook performance – too many items in the inbox:
                        http://support.microsoft.com/kb/905803

                        Single instance storage explained (Wikipedia):
                        http://en.wikipedia.org/wiki/Single_Instance_Storage

                        Gartner’s “Magic Quadrant” document on email archiving products. (You will need to register)
                        http://www.gartner.com/DisplayDocument?ref=g_search&id=674418&subref=advsearch