Many of our customers have more than one content repository, so we often find ourselves in situations that call for:
- Coexistence: The business requires these multiple repositories to exist side by side. This could be because different applications serve different requirements, or because the migration effort is so large that one system cannot be retired immediately. So there is a need for a common interface through which business users can access all the repositories without knowing they are different. They should be able to check out a document from one repository, check it in to another, and generally work across multiple repositories as if there were a single back-end system.
- Migration: For various reasons (licensing, satisfaction, etc.), content has to be moved from one system to another. A related case is deploying content from a content repository to a delivery channel.
- Consolidation: To save costs (licensing, training, infrastructure), customers want to consolidate to fewer repositories.
Obviously, in any of these scenarios, content inventory, content analysis and mapping, taxonomy assessment and so on become very important. However, when the volume of content is huge (some of our customers have terabytes of content or more, much of it stored as very large documents), it becomes important to automate the migration process as far as possible. Essentially, this means you need an intermediate layer that can talk to the source and target repositories and move content between them. Depending on whether you want coexistence or migration, this layer needs to be two-way (read and write) or just one-way (import or export). I can think of three ways to achieve this:
- Roll your own: This probably offers the most flexibility but takes the longest to develop. You essentially write your own code that exports content from the source, transforms and cleanses it, and finally imports it into the target repository. Most decent content management systems provide APIs that your code can call to do this.
- Use connectors/features provided by CMS vendors: Many CMS vendors provide some mechanism for importing and exporting. They might even provide some way of importing content from “specific” systems.
- Use third-party tools
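For the roll-your-own option, the core of the intermediate layer is an export, transform, import loop. The following is a minimal Python sketch under stated assumptions: `Document`, `InMemoryRepository`, `migrate` and `cleanse` are hypothetical names, and the in-memory repository stands in for calls to each CMS's real API. Splitting a connector into an export side and an import side also mirrors the one-way vs two-way distinction above: a pure migration needs only the export side of the source and the import side of the target, while coexistence needs both sides on every repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Protocol


@dataclass
class Document:
    """Hypothetical repository-neutral representation of one content item."""
    path: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class SourceRepository(Protocol):
    """One-way export side: all a pure migration needs from the source."""
    def export_documents(self) -> Iterable[Document]: ...


class TargetRepository(Protocol):
    """One-way import side: all a pure migration needs from the target."""
    def import_document(self, doc: Document) -> None: ...


class InMemoryRepository:
    """Stand-in for a real CMS connector; implements both sides (two-way)."""
    def __init__(self, docs: Iterable[Document] = ()):
        self.docs: Dict[str, Document] = {d.path: d for d in docs}

    def export_documents(self) -> Iterable[Document]:
        return list(self.docs.values())

    def import_document(self, doc: Document) -> None:
        self.docs[doc.path] = doc


def migrate(source: SourceRepository,
            target: TargetRepository,
            transform: Callable[[Document], Document]) -> int:
    """Export from source, transform/cleanse, import into target; return the count."""
    count = 0
    for doc in source.export_documents():
        target.import_document(transform(doc))
        count += 1
    return count


def cleanse(doc: Document) -> Document:
    # Example cleansing rule (assumed): trim metadata values and drop empty ones.
    cleaned = {k: v.strip() for k, v in doc.metadata.items() if v.strip()}
    return Document(doc.path, doc.content, cleaned)


if __name__ == "__main__":
    src = InMemoryRepository([
        Document("/policies/travel.doc", b"...", {"title": " Travel policy ", "owner": ""}),
    ])
    dst = InMemoryRepository()
    print(migrate(src, dst, cleanse))                 # 1
    print(dst.docs["/policies/travel.doc"].metadata)  # {'title': 'Travel policy'}
```

In a real project, most of the effort goes into the parts this sketch glosses over: the connector implementations and the transformation rules.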
I have been doing some research and have come across these vendors who can help you automate this process to a great extent:
EntropySoft
EntropySoft provides an amazing collection of two-way connectors for 30 or so different repositories. These connectors can both read from and write to these repositories. EntropySoft essentially provides two mechanisms:
A Content Federation Server, which is a web application. It lets you configure these repositories and exposes their functionality through a web interface, so your business users can work with all of them without knowing that they are different repositories. For example, you can check out a policy document from Documentum and check it in to FileNet. The same interface also lets you migrate content from one repository to another and create tasks that automatically migrate a document whenever it is updated in one of them. In the screenshot below, you can see Livelink, Alfresco, FileNet and some other repositories.


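The tasks that migrate a document whenever it is updated amount to incremental synchronization. I don't know how EntropySoft implements them internally; the sketch below is a hypothetical polling approach that copies only documents modified since the last run. `VersionedDoc` and `sync_changed` are illustrative names, and plain dictionaries stand in for the repositories.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class VersionedDoc:
    path: str
    content: bytes
    modified: datetime  # last-modified timestamp as reported by the source (assumed)


def sync_changed(source: Dict[str, VersionedDoc],
                 target: Dict[str, VersionedDoc],
                 last_run: datetime) -> List[str]:
    """Copy every source document modified after last_run into the target.

    A polling stand-in for a 'migrate on update' task; a real product might
    react to change events or read a change log instead of polling.
    """
    copied = []
    for path, doc in source.items():
        if doc.modified > last_run:
            target[path] = doc
            copied.append(path)
    return sorted(copied)


if __name__ == "__main__":
    t0 = datetime(2009, 1, 1)
    source = {
        "/a.doc": VersionedDoc("/a.doc", b"v1", datetime(2008, 12, 31)),
        "/b.doc": VersionedDoc("/b.doc", b"v2", datetime(2009, 1, 2)),
    }
    target: Dict[str, VersionedDoc] = {}
    print(sync_changed(source, target, last_run=t0))  # ['/b.doc']
```

The caller is expected to record the time of each run and pass it as `last_run` next time, so unchanged documents are not copied again.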
This interface is simple in the sense that it supports only as-is migration. For more complex migrations, where you have to transform content and map metadata, permissions, users and roles from source to destination, EntropySoft provides an Eclipse-based ETL product. Using this ETL tool, you can build complex migration processes with drag and drop.
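To give a feel for what mapping metadata, permissions, users and roles involves, here is a minimal, tool-agnostic sketch of table-driven mapping. It does not reflect how EntropySoft's ETL works. The attribute, principal and role names are Documentum-style and Alfresco-style examples chosen purely for illustration, and the tables themselves are hypothetical; in practice they come out of the content analysis and mapping exercise and should be checked against your actual content models. Anything the tables don't cover is returned rather than silently dropped, so it can be reviewed.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping tables, for illustration only.
FIELD_MAP: Dict[str, str] = {
    "object_name": "cm:name",
    "title": "cm:title",
    "r_creator_name": "cm:creator",
}
PRINCIPAL_MAP: Dict[str, str] = {"dm_world": "GROUP_EVERYONE", "jsmith": "john.smith"}
PERMISSION_MAP: Dict[str, str] = {"READ": "Consumer", "WRITE": "Editor"}


def map_metadata(source_meta: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename source fields to target fields; return (mapped, unmapped field names)."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for name, value in source_meta.items():
        if name in FIELD_MAP:
            mapped[FIELD_MAP[name]] = value
        else:
            unmapped.append(name)
    return mapped, unmapped


def map_acl(source_acl: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Translate principal -> permission entries; report entries that can't be mapped."""
    mapped: Dict[str, str] = {}
    problems: List[str] = []
    for principal, permission in source_acl.items():
        target_principal = PRINCIPAL_MAP.get(principal)
        target_role = PERMISSION_MAP.get(permission)
        if target_principal is None or target_role is None:
            problems.append(f"{principal}:{permission}")
        else:
            mapped[target_principal] = target_role
    return mapped, problems
```

For example, `map_acl({"dm_world": "READ", "ghost": "WRITE"})` yields `{"GROUP_EVERYONE": "Consumer"}` plus `["ghost:WRITE"]` as an entry to review.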

EntropySoft also works with many search engines for creating federated search applications.
The best thing about EntropySoft is how easy it is to set up. You can get up and running and start an as-is migration (meaning no transformation and no mapping) in about 15-20 minutes (about 7-10 minutes each to set up the source and the destination). Where I think they may lag is in connectors for web content management systems.
OpenMigrate
OpenMigrate is an open-source alternative from TSG. Currently they have adaptors for Documentum, Alfresco, JDBC and FileSystem. I believe they are also working on SharePoint and FileNet connectors.
Vamosa
Vamosa is also a good alternative. To me, their strength appears to lie in web content management. Here’s a list of their connectors. I think Vamosa’s differentiator is that they don't just focus on connectors but look at migration holistically. They have some good products that help you with all the steps I mentioned above (content inventory, analysis, etc.) as well as the migration itself.
Many people have said that something like CMIS (if and when it becomes a standard) will have an adverse impact on the connector industry. I actually think it will be good for these connector vendors, because they would be able to use CMIS instead of relying on each repository's proprietary APIs. Besides, connecting to a repository is only one aspect of migration, albeit an important one. A lot more goes along with it: transformation, the ability to map source data to the target repository, reporting, exception management and so on, and that is where such products add a lot of value.
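To make exception management and reporting concrete, here is a minimal sketch of a batch runner that migrates each item independently, records failures instead of aborting, and produces a summary. It is an illustration of the general idea, not how any of the products above implement it; `MigrationReport`, `run_migration` and the `migrate_one` callback are hypothetical names, and `migrate_one` stands in for whatever connector call does the actual copy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class MigrationReport:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # path -> error message

    def summary(self) -> str:
        return f"{len(self.succeeded)} migrated, {len(self.failed)} failed"


def run_migration(paths: Iterable[str],
                  migrate_one: Callable[[str], None]) -> MigrationReport:
    """Migrate each item independently so one bad document does not abort the batch."""
    report = MigrationReport()
    for path in paths:
        try:
            migrate_one(path)
        except Exception as exc:  # record the failure and keep going; retry later
            report.failed[path] = f"{type(exc).__name__}: {exc}"
        else:
            report.succeeded.append(path)
    return report


if __name__ == "__main__":
    def fake_migrate(path: str) -> None:
        # Hypothetical stand-in for a connector call that rejects some files.
        if path.endswith(".iso"):
            raise ValueError("file exceeds target size limit")

    report = run_migration(["/a.doc", "/big.iso", "/b.pdf"], fake_migrate)
    print(report.summary())  # 2 migrated, 1 failed
    print(report.failed)     # {'/big.iso': 'ValueError: file exceeds target size limit'}
```

Keeping the failed paths together with their error messages makes it straightforward to fix the cause and re-run just those items.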
Do you know of any other products in this space? What do you think of these?
Any ideas on a good approach for consolidating repositories? Are there any third-party products that can help with this process?
-Vinz