Whether you are implementing a new system or moving from one system to another, content migration is always an important aspect. Each situation is unique, so each migration scenario will have its own roadmap. However, some common factors are present in every migration and can determine how long the migration will last. I’m listing a few here. If you think there are other factors as well, please feel free to comment.
To take stock of these factors, one needs to follow a good migration approach and spend a decent amount of time on analysis. I will not go into the details of such an approach; there are quite a number of good articles on these approaches, and in particular I like this one by James Robertson. This post is also not about the importance of content analysis, governance and other best practices 🙂
So here are some factors. At this point, these are only high-level thoughts, and I will probably expand on them as and when I get time. Feedback is most welcome.
Source(s) of Content
Where and how the original content lives is probably the most important factor defining a migration. It could be in a flat file system, a database, another content management system, another application or somewhere else. It is important to understand whether it is stored in a proprietary format and how easy it is to access. Obviously, content stored in a table in a relational database is easier to access than something stored in a proprietary format.
Type of Content
The content type could be media (images, video, audio), documents (DOC, XLS, PDF), text (HTML, XML), database records or something else. Migration timelines are hugely dependent on this: migrating X-ray images, where each file could be a couple of MB or more, poses different challenges than migrating small text files. And when done across multiple environments, the effort only multiplies.
Closely related to this is what the content actually contains. For example, do you need to migrate multiple languages, encodings and character sets?
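To make the encoding question concrete, here is a minimal Python sketch of decoding source files whose encoding is not known up front. The function name and the candidate list (UTF-8, then Windows-1252) are assumptions for illustration; a real migration would pick candidates based on where the source content came from.

```python
from typing import Iterable, Tuple


def decode_with_fallback(
    raw: bytes, encodings: Iterable[str] = ("utf-8", "cp1252")
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying each candidate encoding in order.

    The candidate list is an assumption; adjust it to the source system.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a character, so it never fails,
    # but the text may be wrong and should be flagged for manual review.
    return raw.decode("latin-1"), "latin-1 (unverified)"
```

Anything that falls through to the last branch is a good candidate for the manual-review pile rather than for automated import.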
Quality of content
Pranshu gave me this example on Twitter the other day. He was involved in a migration scenario in which the headlines of news articles were actually images. So even though migration of the body and other fields could be automated, a good amount of manual intervention was required to convert image headlines to text headlines for the destination. Some other examples could be:
- do all html files follow a template?
- content with inconsistent metadata (like Male/Female vs. M/F)
- content with missing metadata that could be mandatory on destination system
- How much of the content is still relevant?
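Checks like the ones above can often be scripted during analysis, before any content is moved. The following Python sketch normalizes the Male/Female vs. M/F case and reports missing mandatory metadata. The field names (`gender`, `title`, `author`) and the list of mandatory fields are hypothetical; they stand in for whatever the destination system actually requires.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table for the Male/Female vs. M/F inconsistency.
GENDER_ALIASES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

# Hypothetical list of fields that are mandatory on the destination system.
MANDATORY_FIELDS = ("title", "author")


def audit_item(item: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Return a cleaned copy of the item plus a list of problems found."""
    cleaned = dict(item)
    problems: List[str] = []

    raw = str(item.get("gender", "")).strip().lower()
    if raw:
        if raw in GENDER_ALIASES:
            cleaned["gender"] = GENDER_ALIASES[raw]
        else:
            problems.append(f"unrecognised gender value: {item['gender']!r}")

    for field in MANDATORY_FIELDS:
        # Treat absent, None and blank values as missing.
        if not str(item.get(field) or "").strip():
            problems.append(f"missing mandatory field: {field}")

    return cleaned, problems
```

Running this over an export of the source gives an early count of how many items can be fixed automatically and how many need a person to look at them.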
Amount of Content
The number of items as well as the size of the files is very relevant. In document scenarios, it matters because huge files take time to move across; in web content scenarios, it matters especially when items need manual intervention.
The destination system matters as well. Is the target system a content management system or is it something else? Does it support the fields that you require or do you need workarounds? Does it provide some mechanism to ingest content? Does it support your metadata and permissioning requirements?
Transformation or Value Add required
In my opinion, this is probably the most important factor. The amount of transformation required between source and destination defines how much automation is possible and how much manual intervention is required. An “as is” migration would possibly be trivial, but most migrations need at least some of the following:
- Title field in Source needs to be mapped to headline field in destination
- Do all source fields need to be migrated?
- Is there a need to define additional fields?
- Is there a need to transform fields based on constraints (for example, an ID in the source CMS would be stored as “123-45-6789”, whereas in the new CMS “-” would not be permitted and it needs to be stored as “123.45.6789”)?
- Data cleansing
- Do you need other value adds (like SEO, copywriting and so on)?
- Do you need to repurpose the same content for say delivery to mobile devices?
- Are there links between files that need to be preserved? (like an XLS embedded within a Doc)
- Do you want to migrate only the latest version or all versions? What happens to content that is part of an incomplete workflow?
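The first few items in this list (field mapping, dropping fields, constraint-based transforms) are the ones that usually lend themselves to automation. A minimal Python sketch follows, using the title-to-headline mapping and the ID example from the list. The mapping table, the `body` field and the function names are assumptions for illustration, not part of any particular CMS.

```python
from typing import Callable, Dict

# Source field -> destination field. Source fields not listed are not migrated.
# Only title -> headline and the id example come from the text; the rest is
# hypothetical.
FIELD_MAP: Dict[str, str] = {"title": "headline", "body": "body", "id": "id"}


def transform_id(value: str) -> str:
    """Replace '-' with '.', since the destination does not permit '-'."""
    return value.replace("-", ".")


# Per-field transforms, keyed by source field name.
TRANSFORMS: Dict[str, Callable[[str], str]] = {"id": transform_id}


def map_record(source: Dict[str, object]) -> Dict[str, object]:
    """Build the destination record from a source record."""
    dest: Dict[str, object] = {}
    for src_field, dest_field in FIELD_MAP.items():
        if src_field not in source:
            continue
        value = source[src_field]
        transform = TRANSFORMS.get(src_field)
        dest[dest_field] = transform(value) if transform else value
    return dest
```

For example, `map_record({"title": "Hello", "id": "123-45-6789", "legacy_flag": True})` yields a record with `headline` and a dotted `id`, and silently drops `legacy_flag`. Keeping the mapping in a table rather than in code makes it easier to review with the content owners.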
Users and Roles
The difference in how users, roles and the whole permissioning system work in the source as compared to the destination also plays an important role. This depends on the capabilities of each system as well as on how comprehensively your organization has defined these. In some cases, just like data mapping, you might also need to map permissions: a Read permission in the source could be mapped to a view permission in the destination. There will also be cases where there is no one-to-one mapping of permissions between source and destination.
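A permission mapping can be sketched in the same table-driven way. In the Python example below, only the Read-to-view pair comes from the text above; the `Write` to `edit` pair and the function name are hypothetical. Permissions with no counterpart are collected separately so that someone can decide what to do with them.

```python
from typing import Dict, Iterable, List, Tuple

# Source permission -> destination permission.
# "Write" -> "edit" is a hypothetical entry added for illustration.
PERMISSION_MAP: Dict[str, str] = {"Read": "view", "Write": "edit"}


def map_permissions(source_perms: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (mapped destination permissions, unmapped source permissions)."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for perm in source_perms:
        if perm in PERMISSION_MAP:
            mapped.append(PERMISSION_MAP[perm])
        else:
            # No one-to-one equivalent: needs a manual decision.
            unmapped.append(perm)
    return mapped, unmapped
```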
Amount of automation possible
Based on some of the above factors, you will have an idea of how much of the migration can be automated. The extent of automation is dependent on source as well as destination systems:
- Does source allow export of content?
- Does destination allow import of content?
- Are third-party products for analysis and migration being used?
- Do these products allow ETL-style activities?
How you want to roll out the new system also impacts your timelines. In scenarios where multiple geographies or multiple business units are involved, it could be tricky. The reasons for this are more organizational and less related to technology. So whether you do a big bang rollout or a phased rollout impacts the migration process.
This is in some way related to the point above. Will the source and destination systems be required to run in parallel? If yes, content will possibly have to reside in both places, and if users continue to modify content during the migration, you have to consider doing multiple iterations.
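When multiple iterations are needed, each pass after the first usually only has to pick up what changed since the previous pass. A minimal Python sketch, assuming each exported item carries a `modified` timestamp (the field name and the item structure are assumptions; many systems expose something similar, but not all do):

```python
from datetime import datetime
from typing import Dict, Iterable, List


def changed_since(items: Iterable[Dict], last_run: datetime) -> List[Dict]:
    """Return the items modified after the previous migration pass.

    Assumes every item has a 'modified' datetime comparable with last_run.
    """
    return [item for item in items if item["modified"] > last_run]
```

Note that this only catches additions and edits. Content deleted from the source between passes needs separate handling, for instance by comparing the sets of IDs on both sides.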
Infrastructure and Connectivity
The speed at which content can be moved across, exported, imported or ingested also depends on the connectivity between the source, the destination, the databases and so on.
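A back-of-the-envelope estimate of raw transfer time is easy to compute and helps set expectations early. The Python sketch below assumes decimal units (1 GB = 1000 MB) and an `efficiency` factor for protocol overhead and contention; the default of 0.7 is an arbitrary placeholder, not a measured value.

```python
def estimate_transfer_hours(
    total_gb: float, link_mbps: float, efficiency: float = 0.7
) -> float:
    """Estimate hours needed to move total_gb over a link of link_mbps.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    total_megabits = total_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = total_megabits / (link_mbps * efficiency)
    return seconds / 3600
```

This covers only the time on the wire. Export, import and ingestion on either end typically add to it, so treat the result as a lower bound.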
So do you have similar experiences? Are there any other factors that can impact migration timelines?
(Thanks to @pranshuj, @rikang and @lokeshpant for inputs)