![]() |
Articles |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
By Neil Raden The analysts are in agreement that Microsoft's long-awaited OLAP offering, codenamed Plato, will cause a shake-out in the OLAP market. Although some vendors will clearly be affected, the OLAP space is evolving rapidly into two distinct markets, and Plato appears to address only one of them--at least for the time being. (Indeed, the whole concept of OLAP has become somewhat less adhesive to the page, with each new definition slipping away before the ink is dry.) Essentially, because of Microsoft's marketing clout and bundling strategy, existing OLAP players in the lower end of the OLAP market will likely suffer, but the higher-end players will be unaffected. Microsoft's Data Warehousing Strategy Microsoft desperately wants to expand its desktop hegemony to the enterprise level. The first step is to establish NT as a viable operating system for enterprise tasks--a reputation that is beginning to gain momentum, largely as a result of increased confidence in the capabilities of its flagship database server product, SQL Server. By attracting some of the most capable database scientists in the industry, Microsoft has spent the last three years improving the product's scalability and performance, an effort that will only begin to bear fruit with the release of version 7.0 (codenamed Sphinx) in late 1998. While great strides have apparently been made, especially in transaction processing scalability (database size, number of users, and parallelism), Microsoft initially missed OLAP as a crucial element and is only now catching up with the rest of the market. After jump-starting its OLAP effort by acquiring Tel Aviv-based Panorama Software Systems in 1996, Microsoft quickly announced two major initiatives: the OLE DB for OLAP (codenamed Tensor; see "Attack of the Killer APIs,") specification and the Plato project. Tensor Tensor is an extension of the overall OLE DB API set that Microsoft foresees as the standard set of interfaces to databases, both SQL and non-SQL, in its COM architecture. Tensor, however, is the part of OLE DB API set that Microsoft foresees as the standard set of interfaces to databases, both SQL and non-SQL, in its COM architecture. Tensor, however, is the part of OLE DB that specifically addresses multidimensional data access. It isn't the first proposed standard API for OLAP data: The OLAP Council proposed the MD-API in 1996, but nearly every vendor in the market has already announced support for Tensor, so MD-API is effectively dead. The two holdouts--who are, incidentally, the market leaders--are Arbor Software and Oracle. The first obvious question is: Why is there a need for a separate API for multidimensional data at all? Ultimately, there may not be one, but for the time being, SQL-based relational databases have been too slow to provide OLAP capabilities natively. As shown in Figures 1 and 2, two of the most basic requirements for an OLAP server are the need to add up numbers by business rules and display the results in either tiering reports or active displays enabled with drill-down capabilities, and the need to display the data in cross tabulations as opposed to simple lists. Even the aggregation function itself is severely limited in SQL databases because not all aggregations are additive. Queries frequently need to summarize data by ranking, tiling, running, or moving averages and/or percent totals, such as market share. With the exception of some specialized relational databases (Red Brick and Teradata, for example), none of the major database products perform even these simple OLAP tasks smoothly.
This gap is filled by multidimensional database vendors such as Arbor Software (Essbase), Oracle (Express), Applix Inc. (TM1), Seagate Software (Holos), Gentia Software (GQL), and others at the database server level by providing their own proprietary multidimensional databases. Query-tool vendors (such as Brio Technology and Business Objects) address the problem by reading relational data more or less on-the-fly into their own temporary multidimensional databases or by creating small "cubes" of multidimensional data, as in Cognos' PowerPlay. The relational OLAP vendors--Information Advantage, MicroStrategy Inc., and Platinum Technology--provide many more robust capabilities to interact with very large relational databases and provide advanced OLAP capabilities, though each one uses a somewhat different model. (See Table 1 for a summary of features.)
Table 1. OLAP product capabilities. A powerful motivation exists for relational database vendors to incorporate these capabilities into their core engines. All the major vendors, with the exception of Microsoft, improved their products by, at a minimum, adding support for dimensional schemas in their query optimizers, advanced indexing techniques, and much-improved bulk loading and indexing routines, as well as by dramatically increasing the scale they can address, mostly through the application of parallelism. None of the vendors have released true OLAP capabilities embedded into the server kernel, but they're all working on it: Oracle will eventually embed the Express OLAP engine, Informix is doing the same with MetaCube, Sybase is still equivocating with Sybase IQ and Whitelight (but both products are moving along well), and IBM made what appears to be a poor choice in attempting to meld the Essbase engine with DB2 Universal Database. As I'll explain, Microsoft is poised to leapfrog the competition on this score by delivering true OLAP engine capabilities as part of a relational database kernel. Plato Most of what we know about Plato derives from the excellent Panorama technology Microsoft acquired last year. We can presume that, at a minimum, Plato will be a hybrid OLAP product--it will operate either with a temporary multidimensional database built from SQL queries, or with a persistent multidimensional cache similar to that used in multidimensional OLAP (MOLAP) tools. Precisely how Microsoft will implement Plato is the subject of much speculation; but we do know that it will include several components in a multitiered architecture consisting of data warehouse, OLAP server, and client tiers (see Figure 3). The "glue" that binds these tiers together is Microsoft's COM architecture--specifically, OLE DB/ODBC between the data warehouse and the OLAP server, and OLE DB/ActiveX between the OLAP server and the desktop. Obviously, at the data warehouse tier, SQL Server is being positioned as the database of choice. In an address at the DCI Data Warehousing Conference in Phoenix in October 1997, Steve Ballmer, Microsoft's executive vice president, pointed out that most data warehouses are less than 100GB in size--a scale he claims can be easily managed by SQL Server. (In my opinion the actual figure differs by at least an order of magnitude, but vendors tend to make claims one or two versions early.) According to this line of reasoning, it makes sense to embed certain OLAP capabilities directly into the relational database kernel, such as a multidimensional cache that can be accessed by the Tensor API. The cache is a memory-resident structure that is optimized for analysis, built either all at once in response to a query or incrementally as requests are serviced. This approach is similar to that taken by many of the high-end OLAP vendors, except that they build this structure outside the relational database. Presumably, through Tensor, tools will be able to query this cache directly. At the OLAP server tier, Microsoft will provide the kind of "persistent multidimensional cache" (which is just another term for a proprietary database) that is currently all the rage (such as in Arbor Essbase, a database designed solely for the analytical processing of OLAP tool queries). What isn't clear, however, are the specific characteristics of this server: How well will it scale? What kinds of calculations will it support? Will the link between the OLAP server data and the finer-grained relational database detail be seamless? Will it include advanced calculation capabilities such as financial, time-series, and statistical functions? These questions remain largely unanswered, but we can look to Panorama itself for some clues: no OLAP "groundbreaking" here; just an economically packaged set of capabilities that address the most frequently used aspects of OLAP. The client for Plato is a container such as Excel, Access, or Internet Explorer holding ActiveX controls for viewing and manipulating OLAP data. Excel 98 will almost certainly provide built-in support for Plato. With Tensor, we can expect all the desktop OLAP tools to provide expanded capabilities over those bundled with other Microsoft products, but the cost differential will become critical: Microsoft's tool will be free, at least marginally. Market Model Microsoft will offer a compelling strategy to encourage market acceptance of Plato. As a savvy player in the new information economy, Microsoft learned long ago that the best strategy for increasing brand equity is simple: Give it away. Nigel Pendse, coauthor of The OLAP Report, sums it up: "Microsoft is concentrating on deployability rather than high-end functionality, so the product will be particularly easy to install and tune. Thus, not only will the software be effectively free, but the other costs of getting applications working on it will be much lower than with today's complex OLAP servers." Although Microsoft may decide to price Plato separately, expect the price to be so much lower that competing products will seem inordinately expensive in comparison. You will probably see Plato bundled with SQL Server version 6.5 or even with a BackOffice application if Sphinx is delayed again. (At press time, although no final decisions about packaging had been made, Microsoft insisted that Sphinx and Plato will be released at the same time.) The initial release will include ActiveX controls and an Excel viewer at the very least; most of the desktop OLAP vendors will have their products hooked up as well. Those whose needs are met by the initial functionality of Plato will flock to it in droves. Not everyone, however, will fit into this category. The Emerging Two-Tier OLAP Market The terminology used in the data warehousing industry is more confusing than it is helpful. This fact is nowhere more obvious than in the terms used to describe the capabilities of OLAP tools. For example, the ability to perform "complex calculations" is frequently cited as a needed feature, but what exactly does that mean? There are at least four separate types of "complex calculations" that I encounter in my practice, and each one poses challenges to most of the tools. In addition, the ability to perform one of these calculations is no guarantee that the others can be addressed. The four types are:
Although all these features are useful, most of them could be implemented as add-ons to almost any OLAP tool; none of the underlying data models preclude the ability to perform them. Therefore, whether or not a vendor chooses to implement them depends on how it positions its product, not necessarily on technology. Latent Value and the Question Gap There are, however, a set of analytical manipulations that are not as appealing or exotic as the four I've listed here but are considerably more complex to implement. As the data warehousing industry matures, it is becoming abundantly clear that most tools cannot exploit the volume of managed data, even though there are compelling business reasons to do so. Sanju Bansal, COO and executive vice president of Microstrategy, refers to this untapped potential as the "latent value" in data warehouses. One type of analysis that illustrates this point is that based on the customer-centric data warehouse. Multidimensional databases involve models containing a handful of dimensions and databases of less than 10GB of raw data, usually less than even 1GB. This "reach" is usually adequate for modeling at the product level of detail, but not at the customer level. Adding customer granularity can explode the model by one or more orders of magnitude and add many more dimensions, exacerbating the already difficult challenge of managing sparsity. ("Sparsity" refers to the condition where the "facts" being measured--such as sales--do not exist at most combinations of dimensions; for example, sales of product A don't occur in every time period to every customer at every location.) In addition to simple scaling up, these proliferating databases give rise to different relationships that just can't be addressed by the data models of most OLAP tools--including explicit many-to-many relationships, recursive hierarchies and multidimensional attributes, and qualities and roles. In fact, even the recently respectable star schema is severely limited in economically accommodating these types of models. For example, in the star schema, "attributes" of a dimension are represented as columns. When the largest dimension has a few thousand members, such as a product dimension, carrying redundant attributes for members at every level of every hierarchy is manageable overhead. However, when a customer dimension has millions of members and hundreds of attributes, the basic star schema becomes increasingly unwieldy. In a banking schema, say, where you have attributes at each of three levels--Customer, Household, and Zip 5--every record in the table carries fields for levels other than its own. At a certain scale, this structure becomes uneconomical in terms of storage as well as performance. Promotional analysis is another example of a complex function that is not well addressed by current tools. The promotion of a product is an inherently cross-dimensional problem that depends on time, product, and location. Consider this hypothetical question: "Did our recent promotion improve sales and profitability in those areas where our market-share growth is more than one standard deviation lower than our top three competitors?" In answering this question, being able to compute the standard deviation on a set of 10, 20, or even 1,000 numbers is one thing, but calculating it against a 1TB database is a different matter altogether. Issues of this nature demand a different approach to scalability. When the actual size of the database, including the "grain" of the data, is an issue, data "depth" is the concern. The inability to answer these exceedingly difficult questions creates a gap--a question gap--that is only now being addressed by a few tools. Data "breadth" is the issue here, and it can only be solved by improving tools to address with these kinds of analysis. This problem will require support for, at the very least, many different types of relational schema designs, including operational ones. This market doesn't have a universally accepted name yet, but the ones I've heard include "true ROLAP," "operational OLAP," and even "transactional OLAP." Whatever name you prefer, only those vendors that move toward the upper right quadrant in Figure 4 will be players.
Where does Plato Fit? In its current form, Plato is targeted directly at the lower-level OLAP market, and especially at the OLAP server market. Considering that Plato is likely to offer 70 to 80 percent of the functionality of Arbor Essbase at perhaps 10 percent of the cost, OLAP vendors are already scrambling to reposition themselves. No consensus exists about the prospects of front-end tools such as PowerPlay and Business Objects, however. According to Pendse, "The desktop OLAP vendors should have a ball as Microsoft hugely increases the number of OLAP servers in the market, many of which will need client software from companies like Cognos, Brio, and Business Objects." Bansal feels that all the low-end OLAP vendors are in for a challenge because Microsoft will quickly add "viewer" support to both Plato and any other OLAP source that is OLE DB compliant. As for the OLAP server vendors, Pendse sees several options:
Plato Approaches Plato will offer only minimal OLAP capability, but the "80/20" rule applies: Most customers use only the most basic OLAP capabilities anyway, and that is precisely the kind of mass market that Microsoft likes to exploit. For serious analytical work, especially in data-intensive industries like retail, telecommunications, financial services, and healthcare, the high-end vendors can discount Microsoft's impact, at least for the next two years. Plato will cause the most pain in entry-level sales opportunities, which until now have often resulted in the selection of an overpowered tool. Plato is Microsoft's first foray into the data warehousing space. Its success is not guaranteed, but there is no doubt that it will capture significant market share. Neil Raden is the president and founder of Hired Brains Inc., an international data warehousing consultancy with offices in California, Chicago, Toronto, and New York. He has authored dozens of magazine articles, books, and white papers on information technology implementation, decision support, data warehousing, and OLAP, and is a frequent speaker on these topics. You can reach him at nraden@hiredbrains.com . Copyright 1997 Miller Freeman Inc. All Rights Reserved Redistribution without permission is prohibited. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © Hired Brains
Incorporated, 2002. All rights reserved. |