OpenText acquires Hightail (formerly, YouSendIt)

ECM vendor OpenText has acquired Hightail. Hightail, formerly called YouSendIt, offers a file-sharing and sync service.

OpenText now has at least four file-sharing and sync services. In addition to Hightail, they have OpenText Core and OpenText Tempo Box. The Documentum acquisition also gave them EMC LEAP, which overlaps with cloud-based file-sharing and collaboration services.

OpenText has a history of acquiring multiple overlapping products and services. So nothing new or surprising there.

Here’s a quick summary, though, of how these products differ.

Overlapping products, but with some differences

Hightail is offered as a public cloud-based SaaS service. It focuses on two major aspects:

  1. Sending large files via an email-like interface. In fact, its rather simple file-sharing interface mimics how users send an email.
  2. Serving creative teams.

Both of these make it suitable for multimedia use cases. It also has a separate product called “Creative Collaboration” that provides collaborative features for creative teams.

OpenText Core is also a public cloud-based SaaS service. It integrates with both Documentum and Content Suite (OpenText’s two ECM offerings), meaning you can access it from within Content Server’s or Documentum’s user interfaces. So if you want to share content stored in, say, your on-premises Content Server with external users, you can do it via OpenText Core.

You can of course use Core as a stand-alone file sharing service without using Content Server.

Finally, OpenText Tempo Box provides similar file-sharing capabilities but is based on the OpenText Content Suite platform. You can deploy it on-premises or in a cloud-hosted environment. You can use it with an existing Content Server repository too, and you can take advantage of all the sophisticated ECM features provided by the underlying Content Server. The key point to remember is that it is based on, and needs, OpenText’s Content Suite. As a result, it is probably overkill for relatively simple file-sharing use cases.


Figure: User interfaces of OpenText Core and Hightail. Source: OpenText and Hightail

If you are evaluating OpenText’s file-sharing, sync and collaboration offerings, you will find many overlapping products and services. However, not all file-sharing services are the same: they differ in the use cases they target, the functionality they offer, as well as other aspects such as their deployment model. Also remember that you have several other options for file-sharing and sync services. If you’d like help navigating the ECM, Document Management or Enterprise File-sharing marketplaces, please feel free to email me.


ECM and Machine Learning – What are Box, IBM, OpenText and other Vendors doing?

There are many use cases in Enterprise Content Management (ECM) for which Machine Learning can be deployed. In fact, I’d argue that you can apply machine learning in all stages of the content life cycle. You can apply:

  • Supervised learning, e.g., to automatically classify images, archive documents, delete files no longer required (and not likely to be required in future), classify records and much more
  • Unsupervised learning, e.g., to tag audio and videos, improve your business processes (e.g., approve a credit limit based on a machine learning algorithm instead of fixed rules), bundle related documents using clustering and so on
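To make the unsupervised case concrete, here's a tiny sketch of grouping related documents by the cosine similarity of their word-count vectors. This is plain Python with invented document names and text, not any vendor's API:

```python
# Toy sketch: find related documents by comparing bag-of-words vectors.
# All document names and text below are made up for illustration.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = {
    "policy1": "credit limit approval risk score",
    "policy2": "credit risk limit assessment",
    "media1": "video transcript audio tagging",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}

# policy1 and policy2 share vocabulary, so they score as more related.
print(cosine(vectors["policy1"], vectors["policy2"]) >
      cosine(vectors["policy1"], vectors["media1"]))  # → True
```

A clustering algorithm would then bundle documents whose pairwise similarity crosses some threshold.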

What are ECM vendors currently offering?

Not much, I’d say. These are still early days.

To be fair, Artificial Intelligence and Machine Learning have been used in enterprise applications for a long time, but mainly for complicated scenarios such as enterprise search (e.g., for proximity, sounds, etc.) or sentiment analysis of social media content. It has never been easy to use machine learning for relatively simpler use cases. Additionally, no vendor provided SDKs or APIs with which you could use machine learning on your own for your specific use cases.

But things are gradually changing and vendors are upping their game.

In particular, the “infrastructure” ECM vendors – IBM, Oracle, OpenText and Microsoft — all have AI and ML offerings that integrate with their ECM systems to varying degrees.

OpenText Magellan is OpenText’s AI and ML engine, based on open source technologies such as Apache Spark (for data processing), Spark ML (for machine learning), Jupyter and Hadoop. Magellan is integrated with other OpenText products (including Content Suite, Experience Suite and others) and offers some pre-integrated solutions. Specifically for ECM, you can apply machine learning algorithms to find related documents, classify them, do content analysis and analyse patterns. You can of course create your own machine learning programs using Python, R or Scala.


Figure: Predictive analytics using OpenText Magellan. Source: OpenText

IBM’s Watson and Microsoft’s Azure Machine Learning integrate with several other enterprise applications and also have connectors for their own repositories (FileNet P8 and Office 365).

Amongst the specialised ECM vendors, Box is going to make its offerings generally available this year.

Box introduced Box Skills in October 2017. It’s still in beta but appears promising. You can apply machine learning to images, audio and video stored in Box to extract additional metadata, create transcripts (for audio and video files), use facial recognition to identify people, and so on. In addition, you will also be able to integrate with external providers (e.g., IBM’s Watson) to create your own machine learning use cases with content stored in Box.


Figure: Automatic classification (tags) using image recognition in Box.

Finally, there are some service providers such as Zaizi who provide machine learning solutions for specific products (Zaizi is an Alfresco partner).

Don’t wait for your vendors to start offering AI and ML

Given the rate at which content repositories are exploding, you will need to resort to automatic ways of classifying content and automating other aspects of the content life cycle. It will soon be impossible to do all of that manually, and Machine Learning provides a good alternative for those types of functionality. If your ECM vendor provides AI/ML capabilities, that’s excellent, because you not only need access to machine learning libraries but also need to integrate them with the underlying repository, security model and processes. An AI/ML engine that is pre-integrated will be hugely useful. But if your vendor doesn’t provide these capabilities yet, you still have alternatives. I’ve said this before and it applies to ECM as well:

There is no need to wait for your vendors to start offering additional AI/ML capabilities. Almost all programming languages provide APIs and libraries for all kinds of machine learning algorithms: clustering, classification, prediction, regression, sentiment analysis and so on. The key point is that AI and ML have now evolved to a point where entry barriers are really low. You can start experimenting with simpler use cases and graduate to more sophisticated ones once you are comfortable with the basics.
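As a minimal illustration of how low the entry barrier is, here is a naive Bayes-style text classifier in plain Python, standard library only. The document categories and training snippets are invented for illustration:

```python
# A tiny naive Bayes-style classifier, standard library only.
# The labels and training text are hypothetical examples.
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: iterable of (text, label). Returns per-label word counts."""
    counts = defaultdict(Counter)
    for text, label in samples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose word distribution best explains the text."""
    vocab = {w for words in counts.values() for w in words}
    best_label, best_score = None, -math.inf
    for label, words in counts.items():
        total = sum(words.values())
        # Log-likelihood with add-one smoothing to avoid log(0).
        score = sum(math.log((words[w] + 1) / (total + len(vocab)))
                    for w in text.lower().split())
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    ("invoice payment due purchase order", "invoice"),
    ("employment agreement salary benefits", "contract"),
])
print(classify(model, "please pay the attached invoice"))  # → invoice
```

A real deployment would of course use a proper library and far more training data, but the structure of the problem is no more complicated than this.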

If you would like more information or advice, we’d be happy to help. Please feel free to fill in the form below or email.


Blockchain for Information Management


Blockchain is best known for its use in the alternative and controversial cryptocurrency market, most notably by Bitcoin. But blockchain is not Bitcoin, nor is blockchain a cryptocurrency. Rather, blockchain is an underlying distributed ledger technology (DLT) that can be used in many contexts other than Bitcoin.

In fact, we are already seeing the experimental use of blockchain in many non-currency situations, for example the management of healthcare records and the processing of shipping manifests. Specifically, in the context of Information Management, blockchain can address a number of use cases. However, as with anything new in the tech industry, blockchain is also being used for information management use cases for which it is not a suitable platform.

This recently released report by analyst firm Deep Analysis (written by Alan Pelz-Sharpe and me) looks at the use of blockchain for information management. It explores the structure of this market and its future impact and growth potential.

The report’s ToC is as follows:

  • About this report
  • Methodology
  • Introduction and a brief history
  • Executive Summary
  • How does a blockchain actually work?
  • The key attributes of blockchain
  • Blockchains in action
  • Public versus Private Blockchains
  • The Market Structure
  • Market Drivers
  • Market Realities
  • Our Advice
  • Summary

You can read more details, including the executive summary, and purchase the report here.


What’s Up?

Happy New Year everyone. Here’s wishing you a great twenty-ten.

I have renamed this blog to just “Random Thoughts”. Of course, I’d still be writing about Content Technologies, but on the CMS Watch blog now.

In fact, I’ve been enjoying my work at CMS Watch, where I cover the Portals, Enterprise Content Management and Web Content Management marketplaces. I haven’t been able to post to this blog for a while, but I’ve written a few posts on the CMS Watch blog.

An important component of our research is based on feedback from actual users of the technologies we cover. So I’m always interested in hearing your experiences (both good and bad). If you would like to share something, please leave a comment here or send me an email at adurga (at) cmswatch (dot) com.

I will hopefully be writing more often from now on, both on this blog and at CMS Watch.

Factors Impacting Content Migration

Whether one is implementing a new system or moving from one system to another, content migration is always an important aspect. Each situation is unique, and so each migration scenario will have its own roadmap. However, there are some common factors present in every migration that can determine how long the migration will take. I’m listing a few down below. If you think there are other factors as well, please feel free to comment.

In order to take stock of these factors, one needs to follow a good migration approach and spend a decent amount of time in analysis. I will not go into the details of such an approach; there are quite a number of good articles on these approaches, and in particular I like this one by James Robertson. This post is also not about the importance of content analysis, governance and other best practices 🙂

So here are some factors. At this point, these are only high-level thoughts and I will probably expand them as and when I get time. Feedback is most welcome.

Source(s) of Content

Where and how the original content lives is probably the most important factor defining a migration. It could be in a flat file system, a database, another content management system, another application or somewhere else. It is important to understand whether it is stored in a proprietary format and how easy it is to access. Obviously, content stored in a table in a relational database is easier to access than something stored in a proprietary format.

Type of Content

The content type could be media (images, video, audio), documents (DOC, XLS, PDF), text (HTML, XML), database records or something else. Migration timelines are hugely dependent on this: migrating X-ray images, where each file could be a couple of MBs or more, has different challenges than migrating small text files. And when done across multiple environments, the effort only multiplies.

Closely related to this is what the content actually contains. For example, do you need to migrate multiple languages, encodings and character sets?

Quality of content

Pranshu gave me this example on Twitter the other day. He was involved in a migration scenario in which the headlines of news articles were actually images. So even though migration of the body and other fields could be automated, a good amount of manual intervention was required to convert image headlines to text headlines for the destination. Some other examples:

  • Do all HTML files follow a template?
  • Content with inconsistent metadata (like Male/Female vs M/F)
  • Content with missing metadata that may be mandatory on the destination system
  • How much of the content is still relevant?
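A quick sketch of what cleaning up such metadata might look like in practice; the field names, value map and mandatory-field rules below are invented for illustration:

```python
# Normalize inconsistent metadata values (Male/Female vs M/F) and flag
# records missing fields that are mandatory on the destination system.
GENDER_MAP = {"male": "M", "m": "M", "female": "F", "f": "F"}
MANDATORY_FIELDS = ("title", "gender")

def clean(record):
    """Return (normalized record, list of missing mandatory fields)."""
    fixed = dict(record)
    if "gender" in fixed:
        key = str(fixed["gender"]).strip().lower()
        fixed["gender"] = GENDER_MAP.get(key, fixed["gender"])
    missing = [f for f in MANDATORY_FIELDS if not fixed.get(f)]
    return fixed, missing

fixed, missing = clean({"title": "Annual report", "gender": "Female"})
print(fixed["gender"], missing)  # → F []
```

Records that come back with missing mandatory fields are the ones that will need manual intervention, which is exactly the cost you are trying to estimate at this stage.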

Amount of Content

Both the amount of content and the size of individual files are relevant. In document scenarios, this matters because huge files take time to move across; in web content scenarios, it matters especially when content needs manual intervention.

Destination System

Is the target system a content management system or is it something else? Does it support the fields that you require or do you need workarounds? Does it provide some mechanism to ingest content? Does it support your metadata and permissioning requirements?

Transformation or Value Add required

In my opinion, this is probably the most important factor. The amount of transformation required between source and destination determines how much automation is possible and how much manual intervention is required. If you were to do an “as is” migration, things would possibly be trivial. Some examples:

  • Does the Title field in the source need to be mapped to a headline field in the destination?
  • Do all source fields need to be migrated?
  • Is there a need to define additional fields?
  • Is there a need to transform fields based on constraints (for example, an ID stored as “123-45-6789” in the source CMS, whereas the new CMS does not permit “-” and needs it stored as “123.45.6789”)?
  • Is data cleansing required?
  • Do you need other value adds (like SEO, copywriting and so on)?
  • Do you need to repurpose the same content for, say, delivery to mobile devices?
  • Are there links between files that need to be preserved (like an XLS embedded within a DOC)?
  • Do you want to migrate only the latest version or all versions? What happens to content that is part of an incomplete workflow?
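Several of the items above boil down to a field-mapping table plus per-field transforms. As a sketch (the Title-to-headline mapping and the ID format come from the examples in this post; everything else is hypothetical):

```python
# Map source fields to destination fields and apply per-field transforms,
# e.g. rewriting an ID like "123-45-6789" as "123.45.6789" because the
# destination CMS does not permit "-" in that field.
FIELD_MAP = {"Title": "headline", "DocID": "doc_id"}
TRANSFORMS = {"doc_id": lambda value: value.replace("-", ".")}

def migrate_record(source):
    """Build a destination record from a source record, field by field."""
    dest = {}
    for src_field, dest_field in FIELD_MAP.items():
        if src_field in source:
            value = source[src_field]
            transform = TRANSFORMS.get(dest_field)
            dest[dest_field] = transform(value) if transform else value
    return dest

print(migrate_record({"Title": "Launch notes", "DocID": "123-45-6789"}))
# → {'headline': 'Launch notes', 'doc_id': '123.45.6789'}
```

The size of that mapping table, and how many of its entries need a transform rather than a straight copy, is a reasonable proxy for how much of the migration can be automated.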

Users and Roles

The difference in how users, roles and the whole permissioning system work in the source as compared to the destination also plays an important role. This depends on the capabilities of the systems as well as how comprehensively your organization has defined them. In some cases, just as with data mapping, you might also need to map permissions: a Read permission in the source could be mapped to a View permission in the destination. There will also be cases where there is no one-to-one mapping of permissions between source and destination.
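Permission mapping can follow the same table-driven pattern, with anything that has no one-to-one equivalent flagged for manual review. The permission names here are illustrative, not from any specific product:

```python
# Map source permissions to destination permissions; collect anything
# without a one-to-one equivalent for manual review.
PERMISSION_MAP = {"Read": "View", "Write": "Edit", "Delete": "Delete"}

def map_permissions(source_perms):
    """Return (mapped permissions, permissions needing manual review)."""
    mapped, review = [], []
    for perm in source_perms:
        if perm in PERMISSION_MAP:
            mapped.append(PERMISSION_MAP[perm])
        else:
            review.append(perm)
    return mapped, review

mapped, review = map_permissions(["Read", "Write", "Annotate"])
print(mapped, review)  # → ['View', 'Edit'] ['Annotate']
```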

Amount of automation possible

Based on some of the above factors, you will have an idea of how much of the migration can be automated. The extent of automation is dependent on source as well as destination systems:

  • Does source allow export of content?
  • Does destination allow import of content?
  • Are 3rd party products for analysis and migration being used?
  • Do these products allow ETL kind of activities?
  • And so on

Roll out

How you want to roll out the new system also impacts your timelines. In scenarios where multiple geographies or multiple business units are involved, it could be tricky. The reasons for this are more organizational and less related to technology. So whether you do a big-bang roll out or a phased roll out impacts the migration process.

Parallel Run

This is in some ways related to the point above. Will the source and destination systems be required to run in parallel? If yes, content will possibly have to reside in both places, and if users continue to modify content during the migration, you have to consider doing multiple iterations.

Infrastructure and Connectivity

The speed at which content can be moved across, exported, imported or ingested also depends on the connectivity between the source, destination, databases and so on.

So do you have similar experiences? Are there any other factors that can impact migration time lines?

(Thanks to @pranshuj, @rikang and @lokeshpant for inputs)

Fatwire “Rescues” Interwoven and Vignette

Forrester recently named Fatwire a Leader in their WCM for External Sites evaluation. And the folks at Fatwire have already called two of their fellow-quads (for lack of a better term), Interwoven and Vignette, legacy WCM products! Incidentally, Interwoven sits nicely in the Leader quadrant in the same report and was also named the fastest growing ECM vendor by rival analyst firm Gartner. (Yeah, yeah, I know: the Forrester report is on WCM and the Gartner one is on ECM.)

On a more serious note, though, Fatwire has been making some news in recent times. Among other things, they recently announced a rescue program for “legacy” Interwoven and Vignette customers: an offer to move to Fatwire at no license cost (only the support costs). They have announced this offering in partnership with Vamosa and Kapow. Vamosa and Kapow both have content migration offerings and compete in this space; Fatwire says they both add value to this proposition. I suspect they have partnered with both because Vamosa, along with expertise in many aspects of content migration, has connectors for Interwoven and Vignette, while Kapow has connectors for Fatwire. Any content migration scenario will require both sets of connectors: one that exports from Interwoven or Vignette and one that imports into Fatwire. You could obviously roll your own migration scripts by publishing from Interwoven/Vignette as XML and then using Fatwire’s XMLPost or BulkLoader to import into Fatwire. But then the offer for free licenses wouldn’t be free, or would it?

BTW, even though Fatwire’s release mentions these as partners, neither of the two has issued its own press release nor mentioned it on its site. I think that’s natural, because they probably have partnerships with those “legacy” vendors 🙂

This is an interesting and, I’d say, aggressive move by Fatwire. After all, there are only a few niche WCM vendors remaining, and they are one of them. There is a clear divergence happening in the marketplace: on the one hand, there are more web-oriented scenarios (Web Content Management, Site Management, Portals, Web Sites and so on), and on the other hand, more repository/management-oriented scenarios (Document Management, Records Management). The requirements, challenges and decision makers (and stakeholders) for these two areas are usually different. Fatwire, for one, has been focusing on the needs of interactive marketers, which usually fall under the former category of web-oriented scenarios (or Web Experience Management, as they like to call it), while many other products have been diversifying horizontally. Call it vertical vs. horizontal diversification if you will.

If there was ever a time to go aggressive, this was possibly it, with the two other big ones having just been acquired. Interwoven and Vignette, though, can by no means be called “legacy” just because they have been acquired. There are probably a few customers out there who are not convinced about Interwoven’s and Vignette’s future after their acquisitions by Autonomy and OpenText respectively. But then, as Forrester’s Tim Walters says on his blog, there are many customers out there, including Fatwire customers, who are unhappy with their current implementation. So nothing stops the other vendors from coming out with this kind of offer for existing Fatwire customers. In fact, as Tony Byrne says, there’s nothing new in these kinds of competitive upgrades.

If you do take up this offer, remember that even though there is no license cost, there are quite a few other costs apart from the support costs that you would have paid to Vignette or Interwoven. Here’s Irina’s post on the real costs of implementation.

For one, you will have to work with Fatwire’s “proven migration tools and services”, which probably means you will need to work with Fatwire’s, Vamosa’s and Kapow’s professional services. All three products (Interwoven, Vignette and Fatwire) have decent mechanisms for importing and exporting content, so content migration per se is certainly not the most challenging aspect. In particular, when you migrate from Interwoven to Fatwire, there are many other challenges depending on what version of TeamSite you are using. TeamSite’s delivery templates are totally different from Fatwire’s. If you are using the Perl-based PTs (Presentation Templates) and doing static publishing, your challenges are even bigger. There are many other issues as well: different ways of defining assets, all the complex customizations, different storage (XML vs. database), workflows and so on. Vignette, although more similar to Fatwire than Interwoven in terms of architecture, will have similar challenges. Apart from the technical challenges, any content management implementation and content migration has its own set of challenges in terms of user training, ensuring content quality (Vamosa has some useful offerings here as well), different skill sets and so on. Here’s a nice take on the different issues by Jon.

I could write a big article on just the differences between Fatwire and Vignette/Interwoven and the resulting challenges, but the point is: don’t assume it is only about “content” migration. You will need to budget for many other things as well.

Random Notes on EMC World

These are some observations, in no particular order. I will possibly post some “more sensible” posts on specific topics later.

  • It was my first time at EMC World, and I thought the focus was much more on storage and infrastructure than on content management. They certainly did much better, though, in terms of integrating CMA (Content Management and Archival) with the overall EMC World. A lot of people I talked to thought it was actually much better than in the past, when CMA folks felt quite out of place.
  • A big theme at the conference was building social communities. Joe Tucci, the EMC Chairman, started his keynote with some statistics on tweets about EMC World. He spoke about how EMC is working to give its customers more choice, better control and improved efficiency. There was a dedicated bloggers’ lounge, set up by Len Devanna and his team, which provided a great informal environment for bloggers and tweeps to come together and socialize. I am glad I was able to meet Laurence (pie), Len and Stu. There were other lounges on similar lines; in particular, the Momentum lounge provided a good place for Documentum users to meet.
  • Then there was CMA president Mark Lewis’ keynote, in which he talked of ROI as “return on information”.
  • I was particularly interested in EMC’s initiatives around Customer Communication Management (or rather around their xPression product, which came via the acquisition of Document Sciences). Although there were a few (and good) sessions on this, I was hoping for a bigger presence. They had a small, not-so-prominent booth within the large EMC booth.
  • Another interesting announcement (although this was made a couple of days before EMC World) was the free availability of the developer edition of Documentum. I think this is a great move to increase usage and acceptance of Documentum. EMC claims it takes 23 minutes to get up and running with Documentum, although I suspect it will take much longer to download: it is almost a 2 GB download and has steep RAM requirements (4 GB recommended, although 3 GB would work too), so it will not be as easy to run on a laptop as some other products. This will essentially enable developers to get their hands dirty, which in turn will help spread Documentum further. The developer edition comes bundled with JBoss and a SQL Server Express database.
  • Some claimed that there were 7,000 attendees, but I felt the number was lower. I also think the number of customers, especially those interested in content management, was far smaller than in previous years. Although there were quite a few partners, the big partners were notable by their absence.
  • CMIS was reasonably well covered. There was a dedicated session by Laurence and Karin Ondricek, and Victor Spivak also covered it in his session on the D6.5 architecture. Laurence demoed the federated CMIS sample application, and according to him, the fact that Alfresco and Nuxeo allowed their servers to be up for a Documentum conference shows the high degree of cooperation happening around CMIS.
  • Victor was quite clear about the scope of CMIS and, more importantly, what it is not. According to him, “I” is the most important letter in the acronym, and in that sense the objective is to provide interoperability, not to implement more sophisticated features. So the focus is only on basic services and mashup-type applications, not real business applications, which are best handled by proprietary APIs (like DFS) or CMS-specific features. He also said that if you were to describe the 6.5 release in one sentence, it would be “high volume services”.
  • There were quite a few sessions on WCM and the more “delivery-oriented” aspects like dynamic delivery, site management, Web 2.0, RIAs and so on. EMC has also latched on to the term Web Experience Management (WEM), something that Vignette and Fatwire have been using for some time. Web Publisher is not yet that sophisticated a platform for WCM, and it remains to be seen how they do it.
  • Most of the sessions were EMC-specific and given by EMC, and I think the number of independent sessions should be increased. I attended the one by Jeetu Patel of Doculabs, in which he talked about different types of ROI modeling for ECM projects.
  • There were quite a few sessions on CenterStage. Victor talked about the philosophy behind CenterStage, which was to separate the front end completely from the business logic and back end, because front-end technologies change quite often. I think this is an obvious approach and wonder why it was not done in Webtop. He also explained the increasing support for RESTful APIs, etc. (See Pie’s post here.)
  • There were also a few discussions about Lucene replacing FAST search in EMC’s products.

Hello (EMC) World

I’ll be traveling over the next couple of weeks, mainly to attend EMC World in Orlando and to meet customers in the US. We have been working on some interesting concepts (Hosted Document Services, a Factory Model and an ECM Maturity Model), and I will take this opportunity to socialize these and get feedback from our customers.

This is my first time at EMC World and I’ve heard good things about it. I’m especially looking forward to the Bloggers Lounge.

By the way, we (Wipro) also have a booth at the conference. So if you are there and want to say hello, do drop in.

Open Text acquires Vignette

After Autonomy/Interwoven and Oracle/Sun news, here comes the third big news of the year.

If Unilever can have multiple soaps and GM can have multiple car models, why can’t a Content Management vendor have multiple products? OT’s acquisition of Vignette points to this increasing “commoditization” of Content Management marketplace.

There may be a lot of overlap in products across OT and Vignette, but we all know that one size does not fit all, so why not have different products for different scenarios, different price points, different technology stacks and different requirements? OT now has multiple options for Document Management, DAM, WCM, etc., plus a bonus portal server that they lacked before. They had a portal integration kit (PIK) that exposed LiveLink’s functionality as portlets that could be deployed on some of the portal servers (but not VAP and Sun, as far as I know).

There’s some good analysis here and here.

On a side note, I think people who worked closely with Vignette saw it coming. A colleague of mine told me this:

One Singapore-based Vignette customer we were talking to suddenly went quiet, and our sales guy spotted him meeting OpenText. Another one we were talking to suddenly decided not to continue with Vignette and decided to migrate to Day Communiqué. A senior person in Vignette Singapore joined OpenText about 2-3 months back and was not replaced. There were many other signs, in the way Vignette was handling people and partnerships, that showed something was up.

I always considered Interwoven, Vignette and Fatwire (Open Market, Divine and FutureTense before that) as the leaders and pioneers in pure play Web Content space. With Interwoven and Vignette gone, what does this mean for the WCM marketplace? An end of the era?

Oracle buys Sun

Oracle announced it will acquire Sun.

Another big Portal/Content Management vendor is now an infrastructure vendor. Sometimes I wonder if everything will soon become an appliance: you buy a Solaris box and it comes bundled not only with the OS (obviously) but also with WebCenter (or one of the numerous Oracle Portal-type products), Content Server and so on. IBM, EMC and Microsoft can do this already, in some sense.

Sun had open sourced its entire JES or Java ES (Java Enterprise System) some time back, and more recently dropped the JES Portal Server in favor of a partnership with Liferay. The result was WebSynergy, Sun’s branded portal based on Liferay’s codebase. It is not clear whether Oracle will continue this partnership, and frankly they already have too many portal offerings to continue with it. However, I think Liferay has a strong offering (and recently opened a new office in India) and will continue to be a good open source alternative whether or not Oracle continues the partnership.

The other component of JES that might have some relevant features is probably the Sun Java Communications Suite, which has features for collaboration: calendar, messaging, instant messenger, as well as support for mobile communications. Some of these could be good additions to Oracle’s Fusion.

On a different note, though, Janus had this to say on Twitter:

Oracle buys sun – now Oracle has 5 enterprise portals! a new commercial for Larry: 5 out of 12 most significant portals are powered by ORCL

In spite of that, they had to resort to static pages!?