OpenText acquires Hightail (formerly YouSendIt)

ECM vendor OpenText has acquired Hightail. Hightail, formerly called YouSendIt, offers a file-sharing and sync service.

OpenText now has at least four file-sharing and sync services. In addition to Hightail, it has OpenText Core and OpenText Tempo Box. The Documentum acquisition also gave it EMC Leap, which overlaps with cloud-based file-sharing and collaboration services.

OpenText has a history of acquiring multiple overlapping products and services. So nothing new or surprising there.

Here’s a quick summary of how these products differ, though.

Overlapping products but there are some differences

Hightail is offered as a public cloud-based SaaS service. It focuses on two major aspects:

  1. Sending large files, using an email-like interface. In fact, its rather simple file-sharing interface mimics how users send an email.
  2. Targeting creative teams.

Both of these make it suitable for multi-media use cases. In fact, it has a separate product called “Creative Collaboration” that provides collaborative features for creative teams.

OpenText Core is also a public cloud-based SaaS service. It integrates with both Documentum and Content Suite (OpenText’s two ECM offerings), meaning you can access it from within Content Server’s or Documentum’s user interfaces. So if you want to share content stored in, say, your on-premises Content Server with external users, you can do it via OpenText Core.

You can of course use Core as a stand-alone file sharing service without using Content Server.

Finally, OpenText Tempo Box provides similar file-sharing capabilities but is based on the OpenText Content Suite platform. You can deploy it on-premises or in a cloud-hosted environment, and you can use it with an existing Content Server repository. You can also take advantage of all the sophisticated ECM features provided by the underlying Content Server. The key point to remember is that it is based on and needs OpenText’s Content Suite. As a result, it is probably overkill for relatively simple file-sharing use cases.


Figure: User interfaces of OpenText Core and Hightail. Source: OpenText and Hightail

If you are evaluating OpenText’s file-sharing, sync and collaboration offerings, you will find many overlapping products and services. However, not all file-sharing services are the same: there are differences in the use cases they target, the functionality they offer, and other aspects such as deployment model. Also remember that you have several other options for file-sharing and sync services. If you’d like help navigating the ECM, Document Management or Enterprise File-sharing marketplaces, please feel free to email me.

 

ECM and Machine Learning – What are Box, IBM, OpenText and other Vendors doing?

There are many use cases in Enterprise Content Management (ECM) for which Machine Learning can be deployed. In fact, I’d argue that you can apply machine learning in all the stages of the content life cycle. You can apply:

  • Supervised learning, e.g., to automatically classify images, archive documents, delete files that are no longer required (and not likely to be required in future), classify records and many more
  • Unsupervised learning, e.g., to tag audio and video, improve your business processes (e.g., approve a credit limit based on a machine learning algorithm instead of fixed rules), bundle related documents using clustering (see the sketch just below this list) and so on
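To make the clustering idea concrete, here is a minimal sketch in Python using scikit-learn. The sample documents, the choice of TF-IDF plus k-means, and the number of clusters are all illustrative assumptions on my part, not a reference to any particular ECM product.

# Minimal, illustrative sketch: group documents by textual similarity.
# The sample documents and the cluster count are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "invoice for consulting services rendered in march",
    "purchase order for office supplies and stationery",
    "employment contract for a new hire in the sales team",
    "non-disclosure agreement between the two parties",
    "invoice for software licenses purchased last quarter",
]

# Convert each document into a TF-IDF vector
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)

# Group the documents into two clusters based on similarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(vectors)
for doc, cluster in zip(documents, kmeans.labels_):
    print(cluster, "-", doc)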

What are ECM vendors currently offering?

Not much, I’d say. These are still early days.

To be fair, Artificial Intelligence and Machine Learning have been used for a long time in enterprise applications, but mostly for complicated scenarios such as enterprise search (e.g., for proximity, sounds, etc.) or sentiment analysis of social media content. It has never been easy to use machine learning for relatively simple use cases. Additionally, vendors did not provide SDKs or APIs with which you could apply machine learning on your own for your specific use cases.

But things are gradually changing and vendors are upping their game.

In particular, the “infrastructure” ECM vendors – IBM, Oracle, OpenText and Microsoft — all have AI and ML offerings that integrate with their ECM systems to varying degrees.

OpenText Magellan is OpenText’s AI + ML engine based on open source technologies such as Apache Spark (for data processing), Spark ML (for machine learning), Jupyter and Hadoop. Magellan is integrated with other OpenText products (including Content Suite, Experience Suite and others) and offers some pre-integrated solutions. Specifically for ECM, you can apply machine learning algorithms to find related documents, classify them, do content analysis and analyse patterns. You can, of course, create your own machine learning programs using Python, R or Scala.
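Since Magellan is built on Spark and Spark ML, the programs you write look much like a standard Spark ML pipeline. The sketch below is generic PySpark with made-up training data, not Magellan-specific code; treat it as an illustration of the kind of document-classification pipeline you could build.

# Generic Spark ML text-classification pipeline (illustrative data only).
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("doc-classification").getOrCreate()

# Hypothetical labelled documents: 0.0 = finance, 1.0 = legal
train = spark.createDataFrame([
    ("invoice for services rendered in march", 0.0),
    ("purchase order for office supplies", 0.0),
    ("employment contract for new hire", 1.0),
    ("non-disclosure agreement draft", 1.0),
], ["text", "label"])

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf"),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(maxIter=10),
])

model = pipeline.fit(train)
model.transform(train).select("text", "prediction").show(truncate=False)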


Figure: Predictive analytics using OpenText Magellan. Source: OpenText

IBM’s Watson and Microsoft’s Azure Machine Learning integrate with several other enterprise applications and also have connectors for their respective repositories (FileNet P8 and Office365).

Amongst the specialised ECM vendors, Box is going to make its offerings generally available this year.

Box introduced Box Skills in October 2017. It’s still in beta but appears promising. You can apply machine learning to images, audio and video stored in Box to extract additional metadata, create transcripts (for audio and video files), use facial recognition to identify people and so on. In addition, you will be able to integrate with external providers (e.g., IBM’s Watson) to create your own machine learning use cases with content stored in Box.


Figure: Automatic classification (tags) using image recognition in Box. Source: Box.com

Finally, there are some service providers such as Zaizi who provide machine learning solutions for specific products (Zaizi is an Alfresco partner).

Don’t wait for your vendors to start offering AI and ML

Given the rate at which content repositories are growing, you will need automatic ways of classifying content and automating other aspects of the content life cycle. It will soon be impossible to do all of that manually, and machine learning provides a good alternative for these kinds of tasks. If your ECM vendor provides AI/ML capabilities, that’s excellent: you not only need access to machine learning libraries but also need to integrate them with the underlying repository, security model and processes, and a pre-integrated AI/ML engine is hugely useful. But if your vendor doesn’t provide these capabilities yet, you still have alternatives. I’ve said this before and it applies to ECM as well:

There is no need to wait for your vendors to start offering additional AI/ML capabilities. Almost all programming languages provide APIs and libraries for all kinds of machine learning algorithms for clustering, classifications, predictions, regression, sentiment analysis and so on. The key point is that AI and ML have now evolved to a point where entry barriers are really low. You can start experimenting with simpler use cases and then graduate to more sophisticated use cases, once you are comfortable with basic ones.

If you would like more information or advice, we’d be happy to help. Please feel free to fill in the form below or email.

 

Machine Learning for Personalizing Digital Experiences

Personalization has always been a key aspect of almost all kinds of digital experiences. Some common personalization use cases are: allowing users to customize their dashboards or user interfaces, showing content based on explicit user-defined criteria, and showing content based on implicit criteria or on user behaviour. All of these have required complex personalization systems, with processing and rules engines for creating and managing personalization rules. As a result, it has always been a non-trivial exercise to implement personalization in a resource-effective way.

Artificial Intelligence (AI) and Machine Learning (ML) techniques have evolved, and it is now easier than ever to use them to implement personalization. At a very simplistic level, personalization is about “predicting” what a user would like to see and then offering that to the user. You can make this prediction based on a complex hierarchy of rules, or you can use historical data. The latter is exactly what machine learning based techniques can do for you.

Delivering the Right Content to the Right People

Consider this common scenario: you want to show content that is relevant to the user. For example, let’s say you run an events site and want to show events that are relevant to each user. To do this, you could create multiple rules, such as rules that match a user’s and an event’s locations, or rules that show events based on user interests, and so on. This works great with maybe five rules. But consider a scenario where your users have hundreds of profile and behavioural attributes and your events have a similarly large number of attributes. As you come up with more criteria, this rules-based approach becomes messy and difficult to manage.

But with machine learning based techniques, you now have alternatives, and you no longer have to procure sophisticated personalization systems. Instead, you can start with very simple programs that predict what kinds of events a user would like to view, based on the events that other users with similar profiles viewed. You could use the same logic to display targeted news, movie recommendations or books. Some of these machine learning techniques are really simple and you can get started very easily.
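As an illustration, here is a minimal “users with similar viewing histories” sketch in Python. The user-event matrix is made up; a real system would build it from viewing or registration history.

# Minimal "users who viewed similar events" sketch (hypothetical data).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = events; 1 means the user viewed that event
views = np.array([
    [1, 1, 0, 0, 1],   # user 0
    [1, 0, 0, 1, 1],   # user 1
    [0, 0, 1, 1, 0],   # user 2
])

def recommend(user_idx, views, top_n=2):
    sims = cosine_similarity(views)[user_idx]
    sims[user_idx] = 0                      # ignore the user's own row
    scores = sims @ views                   # weight events by user similarity
    scores[views[user_idx] == 1] = -1       # drop events already viewed
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0, views))   # event indices to suggest to user 0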

Here’s another example for the same events website. As an event organizer, you create a new event but are not sure what pricing would work best. If you think of this as a prediction problem, as in “predict the price of a new event given the pricing of past events”, you can use a simple prediction algorithm to recommend pricing based on data for past events. The same logic applies to pricing any new offering. In addition, once the event runs, you can feed that new data point back in as input for your next prediction.
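Here is a quick sketch of that pricing idea, again with made-up numbers: predict a new event’s price from the prices of the most similar past events. This uses a nearest-neighbours regressor, which is just one of several techniques you could pick.

# Predict a new event's price from similar past events (illustrative data).
from sklearn.neighbors import KNeighborsRegressor

# Features of past events: [expected attendance, duration in hours]
past_events = [[100, 2], [500, 8], [50, 1], [300, 4]]
past_prices = [20.0, 150.0, 10.0, 75.0]

model = KNeighborsRegressor(n_neighbors=2).fit(past_events, past_prices)

new_event = [[250, 3]]
print(model.predict(new_event))   # suggested price for the new event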

Start Small and Experiment

In addition to personalization, Digital Experience Management use cases can have several aspects for which you can start using machine learning. And there is no need to wait for your vendors to start offering additional AI/ML capabilities. Almost all programming languages provide APIs and libraries for all kinds of machine learning algorithms for clustering, classifications, predictions, regression, sentiment analysis and so on. The key point is that AI and ML have now evolved to a point where entry barriers are really low. You can start experimenting with simpler use cases and then graduate to more sophisticated use cases, once you are comfortable with basic ones.

If you would like more information or advice, we’d be happy to help. Please feel free to fill in the form below or email.

 

Blockchain for Information Management


Blockchain is best known for its use in the alternative and controversial currency market, most notably Bitcoin. But Blockchain is not Bitcoin, nor is it a crypto currency. Rather, Blockchain is an underlying distributed ledger technology (DLT) that can be used in many contexts other than Bitcoin.

In fact, we are already seeing the experimental use of Blockchain in many non-currency situations – for example, the management of healthcare records and the processing of shipping manifests. Specifically, in the context of Information Management, Blockchain can address a number of use cases. However, as with anything new in the tech industry, Blockchain is also being applied to information management use cases for which it is not a suitable platform.

This recently released report by analyst firm Deep Analysis (written by Alan Pelz-Sharpe and me) looks at the use of blockchain for information management. It explores the structure of this market and its future impact and growth potential.

The report’s ToC is as follows:

  • About this report
  • Methodology
  • Introduction and a brief history
  • Executive Summary
  • How does a blockchain actually work?
  • The key attributes of blockchain
  • Blockchains in action
  • Public versus Private Blockchains
  • The Market Structure
  • Market Drivers
  • Market Realities
  • Our Advice
  • Summary

You can read more details, including the executive summary, and purchase the report here.

 

Using Factor Analysis to reduce the number of attributes

In my last post on using machine learning for everyday use cases, I’d mentioned factor analysis as a way to reduce a large number of items (e.g., news articles’ attributes) into a smaller set of variables. Some people asked me for examples, so this post is an attempt to explain how factor analysis can be used for what is known as dimensionality reduction.

Issues with a large number of attributes

Let’s say you have a list of customers and you want to analyze some aspect of it. It’s quite easy to analyse the list if the customers have a relatively small number of attributes – say 10. What if the number of attributes increases to 20? 100? Still manageable. What about 1,000 or 10,000 or more? Or what about attributes that are not directly observable (e.g., intention to watch a movie)?

Recall that in a typical machine learning algorithm, these attributes form the input matrix from which you predict an outcome. So as the number of attributes increases, your algorithm becomes computationally expensive and more difficult to program (and debug). There is also the issue of overfitting — meaning your machine learning model fits your training set extremely well but may not predict well on new data.

One way to address this is to group some of the related attributes together and run your algorithm with that “grouped” attribute as input. In some cases, it’s easy to group attributes because the grouping is obvious.

For example, let’s say you have attributes that describe a customer’s height and weight. Are they directly proportional to each other? Probably not. But are they correlated? Probably yes. Many such correlations are not obvious, though, and there could be hidden underlying patterns.

Factor Analysis to reduce the number of variables

Factor analysis is a technique to reduce the number of attributes when the relationships between those attributes are not obvious. Essentially, factor analysis analyzes the interrelationships (or correlations) among a large number of items and reduces them to a smaller set of factors. This smaller set of factors can then be used in further analysis — e.g., in a logistic regression or a neural network to predict your outcome.
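As a small illustration, here is how this kind of dimensionality reduction might look in Python. scikit-learn provides both FactorAnalysis and PCA; the synthetic data below simply stands in for real survey or customer attributes, and the numbers of items and factors are assumptions chosen to mirror the example that follows.

# Reduce many correlated attributes to a few factors (synthetic data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_respondents, n_items = 200, 31

# Synthetic responses: 31 items driven by a handful of hidden factors plus noise
hidden = rng.normal(size=(n_respondents, 8))
loadings = rng.normal(size=(8, n_items))
responses = hidden @ loadings + rng.normal(scale=0.5, size=(n_respondents, n_items))

fa = FactorAnalysis(n_components=8, random_state=0)
factor_scores = fa.fit_transform(responses)   # 200 x 8 instead of 200 x 31

print(responses.shape, "->", factor_scores.shape)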

Here is another concrete example. This study analysed how social media is used within organizations and came up with a list of 31 activities, i.e., examples of organisational processes that can benefit from the use of social media. Of course, there could be many more activities depending on the scenario. The linked post has a chart that shows these activities. Now, any analysis I did meant creating a model and analysing the impact on all 31 variables. A factor analysis (Principal Component Analysis, to be precise) was carried out on these 31 variables and it grouped them into 8 factors. For example, the factor analysis suggested that the following variables from those 31 be grouped together:


Fig: Multiple attributes grouped together by factor analysis

You will probably agree that all these activities appear to be correlated, as all of them relate to sales and marketing. So instead of analysing these variables separately, you can think of “Sales and Marketing” as one factor that encompasses all seven activities (variables). The other groupings followed a similar pattern, and I ended up with 8 high-level variables in place of the original 31.

Okay, so once you have a smaller, more manageable set of attributes, you can use the grouped variables in your machine learning algorithms for further analysis. This will not only improve performance but also result in simpler models and better predictions. In this study, I eventually used these 8 variables for further analysis using Confirmatory Factor Analysis and SEM. But more about that later.

Machine Learning as an alternative to rule based processes

There’s a lot of discussion about machine learning these days, and pretty much everyone (vendors and users alike) is talking about it.

I remember attending courses on Artificial Intelligence, Machine Learning and even Artificial Neural Networks back in 1998. So what’s new?

How have AI and ML evolved?

I think a big reason everyone is talking about machine learning now is that it’s much simpler to use it for everyday business use cases. Earlier, machine learning was mostly used for complicated scenarios – think enterprise search (with advanced capabilities for proximity, sounds, etc.) or content analytics for sentiment analysis. These were useful but required expensive software and resources.

Not anymore. It has become far easier to use machine learning for simpler problems. In fact, for a lot of scenarios that required complex rules, you can now use machine learning to make decisions. Let’s take an example. You are building a website that allows users to sell their old mobile phones. The website should be able to suggest a price based on a series of questions that a user answers. So you could have a set of rules that “rule-fy” each question.

For example:

Question 1: Phone model

If phone == A, price = P

If phone == B, price = Q

Question 2: Age of phone

If phone == A, and bought within last year, price = P

If phone == A and bought more than one year ago but less than 2 years ago, price = 0.9 P

Question 3: Color

If phone == A, and bought within last year and color == black, price = P

If phone == A, and bought within last year and color == silver, price = 0.95 P

If phone == A and bought more than one year ago but less than 2 years ago, and color == black, price = 0.9 P

And so on. You can add more rules depending on questions about age, colour, defects, screen quality and so forth, and your rules become increasingly complex. And what happens if a user enters a value that the rules don’t handle?

Of course, in real life, you wouldn’t write rules like this. You will probably have a rules engine that combines multiple rules, but you get the idea.

Machine Learning as an alternative to Rules-based processing

Here’s how machine learning can replace a complex rules-based application.

Let’s say you have historical data about phone sales. Yes, I admit this is a big assumption, but if you are creating rules and deciding prices, then you probably have some historical data anyway. So assume you have data such as this (this is just a sample; the more data you have, the better):


Fig: Second hand phone sales data

Now your original problem can be stated as a machine learning problem as follows:

How do you predict the price of a phone that is not already in the sample (or training set) above, based on the features and data available in that training set?

Essentially, instead of you or your application making decisions based on pre-defined rules, you are now relying on your application to make decisions based on historical data. There are many techniques that can help you achieve this.

One relatively simple technique is linear regression. Linear regression is a statistical technique to predict an outcome (or dependent variable) based on one or more independent variables. Based on the example above, you can describe the price P as a function of the variables model, age, colour, etc. In linear regression, it can be expressed as:

P = b0 + b1*model + b2*age + b3*colour + b4*condition + …

The machine learning algorithm then estimates the values of b0, b1, b2, etc. based on historical data, and you use this equation to predict the price of an item that was not in the training set. So if a new user comes along and offers a phone for sale on your site, you can recommend a price based on past sales.
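Here is a minimal sketch of this idea in Python. The sales data is made up, and the categorical features (model, colour) are one-hot encoded before fitting the regression.

# Fit a linear regression on (made-up) past phone sales and predict a new price.
import pandas as pd
from sklearn.linear_model import LinearRegression

sales = pd.DataFrame({
    "model":     ["A", "A", "B", "B", "A"],
    "age_years": [0.5, 1.5, 0.5, 2.0, 1.0],
    "colour":    ["black", "silver", "black", "black", "silver"],
    "price":     [300, 250, 400, 280, 270],
})

# One-hot encode the categorical columns so they can be used as regressors
X = pd.get_dummies(sales[["model", "age_years", "colour"]])
y = sales["price"]

reg = LinearRegression().fit(X, y)

# Predict the price for a phone that is not in the training set
new_phone = pd.DataFrame({"model": ["B"], "age_years": [1.0], "colour": ["silver"]})
new_X = pd.get_dummies(new_phone).reindex(columns=X.columns, fill_value=0)
print(reg.predict(new_X))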

Okay, that was a rather simplistic machine learning example, and you can use many other, more sophisticated techniques. For example, you can do a factor analysis or Principal Component Analysis (PCA) to reduce a large number of items (e.g., news articles’ attributes) into a smaller set of variables, or use logistic regression instead of linear regression where the outcome is a category. The key point is that it is now much easier to use machine learning for everyday use cases without spending a lot on expensive software or resources. Pretty much all programming languages and development platforms have machine learning libraries or APIs that you can use to implement these algorithms.

The main drawback of this approach (as in this example) is that the results might not always be as good as those from a rules-based technique. The quality of the results depends heavily on the training set, and as the training set improves (in quality as well as quantity), the results will improve.

Are you using machine learning for your applications? If yes, what techniques are you using?

 

Some data from my social media survey

In my last post, I provided an overview of how to measure the impact of a digital or social media initiative. But before you can measure the impact, you need to create an inventory of which activities in your organization can use these media.

In this post, let’s look at the findings of a survey I conducted in order to identify how organizations use social media.

First, let’s look at some profile data. There is a lot more than what’s shown below; I’m sharing only a subset.

Profile of Participants

In terms of verticals or industry segments, the participants’ organizations were spread across all major industries. In fact, the “All others” category shown in the chart below accounted for more than a quarter of participants, and pretty much all industries were represented. However, the bulk of participants were from service-oriented industries such as IT (21%), BFSI (8%), Consulting (11%) and Technology (10%).

Fig 1: Organizations’ primary business

About 37% of participants reported that they worked for organizations with more than 10,000 employees. Another 20% said their organizations employed between 500 and 9,999 employees. These are representative of large organizations, which typically have complex processes and challenges.


Fig 2: Total number of employees in organization

Usage Analysis

Participants were asked to select the activities for which they could use social media. Results for the activities that made the most use of social media are shown below. The figure shows the top 15 activities by number of responses. Most marketing-oriented activities figure in this list, which is not surprising. Many use cases from other areas also appear. In particular, social hiring (55%), customer support (53%) and knowledge management (52%) are amongst the most common activities for which social media is used.


Fig 3: Social media usage in value chain activities: Top 15 activities

Another aspect of usage is the social media tools themselves. Users employ these tools across social media activities, so in addition to measuring the usage of social media within activities, this is another way to measure usage. The figure below shows the percentage of participants using different social media tools. Social networking sites such as Facebook and LinkedIn were at the top, with 86% of participants using them. Facebook’s use for social media marketing and LinkedIn’s for social hiring are well-known use cases, so this result is not unexpected. These were followed by blogs (64%), video/presentation sharing such as YouTube (59%), instant messaging such as WhatsApp (58%), microblogs such as Twitter (56%) and document and file sharing such as Dropbox (55%).


Fig 4: Social media tools usage

There’s a lot more, and I’ll share more findings in future posts. Meanwhile, I’m working on a set of tools that uses all this data as a basis for further analysis — something that organizations will be able to use for their own digital initiatives.

How do you Measure the Impact of a Social Media Initiative?

While there is widespread usage of digital and social media within organizations’ value chains, there are several questions that I hear often, e.g.:

  1. How do you measure the impact of these initiatives?
  2. There are several use cases for which social media can be used. How do you actually prioritize these and decide what you want to use social media for?

Of course, there are many mechanisms – such as measuring financial impact (RoI, etc.) – to measure the impact of any initiative, but it’s not trivial to apply these measures to something like social media.

One way to measure the impact or to be able to prioritize is by using some other outcome – a performance outcome or something similar.

 “Competitive Advantage” is one such outcome

A social media initiative can be analyzed in the context of whether or not you are able to achieve competitive advantage by using social media. There are many ways to define competitive advantage, but for the purpose of this post, let us consider the concept of competitive advantage based on Michael Porter’s Value Chain Analysis (VCA).

There are numerous examples of social media usage in an organization’s value chain activities. Some of these are shown in the figure below.

Figure: Social media usage and its impact on different sources of competitive advantage

The figure above also provides an example of what an analysis of social media for competitive advantage could look like. For each activity in the value chain, the table shows examples of how social media can contribute to that activity and the resulting impact on cost, differentiation and focus (the factors of competitive advantage, as described by Porter) for an example organization. Note that this figure only shows the summary of a sample analysis. In practice, an organization will typically use a detailed methodology and tools to assess the cost, differentiation and focus impact of each activity, and the outcome will vary from one organization to another.

Once such an analysis is done, an organization can identify the most important activities in terms of their impact on different sources of competitive advantage. Based on the organization’s own strategy and vision, appropriate activities can then be chosen as use cases for social media implementation.

In the next few posts, I’ll go into detail and describe how to analyze social media in the context of competitive advantage. It’s all part of my PhD work, in which I define several constructs (or factors) that have an impact on the usage of social media and ultimately on competitive advantage. We’ll also explore how these constructs interact with each other, and finally look at a framework (and an Excel-based tool) that can help you with digital initiatives in your organization.

PhD update – invitation to participate in my research

As you might be aware, I am currently doing my PhD at the Faculty of Management Studies, University of Delhi. The topic of my PhD is Social Media for Competitive Advantage. Basically, the idea is to understand the usage of social media and map it to Porter’s concept of competitive advantage. Here are some details.

I’d like to invite you to participate in my survey.

The objectives of this survey are to understand:

  • How social media is used within organisations.
  • Social media’s impact on an organisation’s competitive advantage.

Your responses will remain confidential. Data from this research will be kept securely and reported only in aggregate. No one other than me will know your individual answers to this questionnaire. And if you like, I’d be happy to share my findings.

If you agree to participate in this project, please answer the questions as best you can. It will take approximately 15-20 minutes to complete the survey. Here is the link to the survey:

http://www.surveygizmo.com/s3/2320776/smblog

Thank you for your assistance in this important endeavour.

Categorizing IoT Devices and Wearables Part 2 – Screen Dependency

In an earlier post, we looked at how digital workplace and marketing professionals can categorize the IoT marketplace based on device dependencies.

Another way to categorize the marketplace is in terms of device screens.

Absence or presence of a screen

Most Internet devices that users have so far interacted with — computers, mobile phones, tablets and even handheld devices — have a scr…


Post abstract cross-posted from Real Story Group.