![]() |
Articles |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Merging Business and IT Architecture Perspectives to Evaluate Query and Analysis Tools Successfully By Neil Raden (This article appeared in the August, 1997 issue of DBMS magazine.) The contemporary decision-support architecture (DSA), including data warehouses and OLAP tools, should be designed to accommodate the widest range of sources and uses of information. The kind of flexibility and speed demanded by today's business environments require that these architectures embrace change as a design principle, rather than trying to iterate to a set of fixed requirements. This acknowledgment of the sheer power of contingency is the driving force behind the explosion of data warehousing. The most tangible result of these efforts are the so-called "front-end" tools that face the clients (especially nontechnical end users), but unfortunately, the ultimate effect of these much-anticipated programs often leaves customers with the feeling of anesthetized anticlimax. So what does all of this have to do with selecting the query and/or analysis tools for a data warehouse? Everything. In an earlier attempt to provide some framework for selecting OLAP tools in my article "Data, Data Everywhere" (Information Week, October 25, 1995, pages 60-74), I suggested that five qualities neatly partitioned the evaluation space: functionality, fit, performance, scalability, and future. Although much has changed in the past two years, those categories as axes of evaluation are holding up. Architectural issues are still necessary criteria for evaluation (it's rather pointless to choose a tool that cannot connect to your database, for example), but they are not sufficient. More than anything else, the often competing issues of performance and flexibility are the critical decision points. In this article, I attempt to segment the offerings in this area based on how people use them (a cognitive approach). For example, when evaluating the issue of scalability from a technical perspective, we can make a pretty clear case that a two-tier SQL generator will hit the wall at some point when the number of simultaneous users increases. A cognitive approach deals with scalability from the application perspective: Can I expand my range of analysis from daily sales reporting to event- or alert-driven notification, without having to rearchitect the entire system? Although both approaches are necessary, the cognitive approach is designed to meet the requirements of the business. The Current Evaluation Approach Carefully deployed decision-support architectures can genuinely dissolve the boundaries between information systems and the analytical work that people do, while radically challenging the boundarizing and conceptual structure of the old organizational hierarchies. By providing an environment where reasonably well-qualified workers can react to business issues in real time and make the right decisions most of the time (rather then the reliance on "a few supermen," as Peter Drucker would say), organizations can realize the benefits these systems promise but rarely deliver. If I could fix the blame for this failing on a single cause, it would be the almost universal lack of a "big picture" in the groups entrusted with the purchase decisions and program management of DSA. IT-led initiatives provide invaluable ideas and assistance and are often the catalysts for productive change in organizations. The problem is that the nature of work today is too complex for simple solutions and rules of thumb. Worthy goals of standardized access, data quality, consistent development methodologies, and scalable architecture float to the top in the projects, but the business objectives are poorly understood and often missed completely. Content to decompose these complex issues into arbitrary categories of components (typically given currency by aggressive marketing of the vendors) -- such as data warehouse/data mart, MOLAP/ROLAP, relational/multidimensional, and thin/fat -- purchase decisions are based on these technical issues instead of their use and impact in an organization. Too many products, approaches, methodologies, and service offerings in this area are nothing more than repackaged jetsam of last year's latest thing. Because they aren't designed specifically to support a networked community of knowledge workers, the vendors create new categories of one and demonstrate how their product can fill it. Any categorization of the market is arbitrary, because the tools are evolving rapidly. For example, Express (Oracle Corp.), Holos (Holistic Systems Inc., a subsidiary of Seagate Software-Information Management Group), Essbase (Arbor Software Corp.), and Pilot (Pilot Software Inc., a Cognizant Corp. company) have moved aggressively to position their multidimensional database (MDD) products as true hybrids, combining the features of a MDD with those of a relational OLAP (ROLAP) tool such as those of MicroStrategy Inc. or Information Advantage Inc. Also moving up the food chain are managed query tools such as Business Objects (Business Objects Inc.) and Esperant (Software AG of North America Inc.), that now offer some measure of desktop OLAP capabilities formerly the realm of PowerPlay (Cognos Corp.) or Brio (Brio Technology Inc.). The area is trendy, and new entrants are proliferating riotously. And, of course, everyone is waiting for the next shoe to drop, which is most likely to be a big entrance by Microsoft Corp. Microsoft acquired technology from Panorama Software Systems of Tel Aviv, Israel in the fall of 1996 and has been integrating it into what will be the company's first OLAP tool. Currently code-named Plato, the tool is expected to be released by the end of this year. An Alternative Evaluation Method Available information about the tools is contradictory; in-depth accounting of the strengths and weaknesses of the products is rare; and making sense of the claims by vendors, analysts, and "gurus" depends entirely on your perspective. Nowhere do the conflicting requirements and viewpoints of information technology (IT) and its clients become more obvious, and the conflicts more emotional, than in the selection of the technology that a client faces. In most cases, these viewpoints end up colliding rather than collaborating. The technical perspective is concerned with issues such as data organization (relational, multidimensional, "other"), processing location (server-based, desktop, cooperative, n-tier, "fat," or "thin"), standards compliance (DCOM vs. CORBA, 3NF vs. Star Schema, Notes vs. Web, ActiveX vs. Java), and openness. This leads to distinctions such as MOLAP vs. ROLAP, "fat" vs. "thin" client, or "open" vs. "proprietary." Although these distinctions are useful for making decisions about the technical architecture, how they clarify the decision-making process for business purposes is less clear. In Table 1, I compare 10 differentiating categories. There is a yawning gap in the interpretation of apparently simple concepts such as openness, scalability, and security. Rather than inform and educate, the currently used terminology in IT just confuses businesspeople in the worst possible way -- it leaves them with the presumption of understanding where, in fact, there is none. I'll examine a few of these criteria in more depth. Database Organization IT organizations rarely look at the "container" of business intelligence data with a separate set of requirements from those of any other database application. In most cases, a "standard" database is supported, based on the assumption that some sort of economy of scale can be realized by standardizing on a single technology. For this reason, databases that are optimized for decision-support applications, such as multidimensional databases (Arbor Essbase, Oracle Express, Holos, Pilot, Gentia [Gentia Software], for example), relational databases tuned specifically for these applications (Red Brick [Red Brick Systems Inc], NCR Teradata [NCR Corp.]), or other, specialized structures (Sybase IQ [Sybase Inc.], HOPS [HOPS International Inc.], and AnalytiX [Customer Insight Co.]), are either shunned completely or later dismissed as "proprietary." To a nontechnical knowledge worker, all databases are proprietary. What the "consumers" of these databases need is reliable, timely, clean, and understandable information that is presented through a medium that facilitates and enhances the consumers' ability to perform their jobs. Specifically, they require performance in loading, querying, and updating. The last aspect is especially worthy of note. Most data warehouse designs are serial and unidirectional, overlooking completely the value of data generated at the desktop, such as scenarios, what-if analysis, commentary, discussion, and even frameworks, models, and calculations. The notion that only operational systems capture data is a limiting factor for data warehouses, and most of the tools offered in the analytical space extend this notion. The only exceptions are the multidimensional databases, in varying degrees of strength. Because of their positional arrangement of data, with relatively coarse indexing, it is rather easy to add or update data to most multidimensional databases, but it's correspondingly difficult to change the structure of a model. Table 2 (page 50) demonstrates that a "hypercube" design such as Essbase can accommodate desktop data input effortlessly; changing the arrangement of variables or structure can only be accomplished by an administrator. Gentia, with a more EIS-type of approach, requires much more effort to create the tableau for data entry, but its ability to construct new models on the fly gives it a real advantage. Holos combines the aforementioned elements of Essbase (namely its ability to capture values that fit into a preexisting structure) and Gentia (the ability to create fairly complex models on the fly) and can accommodate both types of updates nicely. In fact, its ability to separate "rules" from the rest of a structure makes it quite powerful as a generator and collector of on-the-fly scenarios. Express, based on a multicube model, is extremely flexible at viewing data but quite cumbersome at updating or calculating on the fly without extensive development in either the Express 4GL or Express Objects. Relational databases, with few exceptions, can be modeled to accept input data effortlessly (the OLTP schema for transaction processing) or to provide good query performance (the "star" schema that is common in data warehousing), but not both. The approaches are mutually exclusive at this point, or else they require extensive behind-the-scenes work to perform. Because of their general-purpose approach, they provide no native support at all for modeling or what-if analysis. There has been some movement in this area, especially from Informix Software Inc.'s DataBlades and from Oracle, with impending integration of the Oracle database server with the Express OLAP server. Efficiency In the two decades I've been involved in DSS, sooner or later in every project someone will assess the relative merits of every tool with a single, overarching evaluation. Transcendent in its simplicity, oddly almost comforting in its familiarity, it is offered with an air of qualified omniscience, despite its complete lack of meaning: "It's a hog, it'll bring the machine to its knees." The sublime irony is that the evaluator is often a serial offender who most likely draws insight from the same well that brought you the DP backlog, the not-invented-here syndrome, and the Year 2000 problem. How something so central to the well-being of a firm as DSS can remain such a mystery to IT practitioners is beyond my understanding. Performance should be defined in terms of output, not throughput. To businesspeople, efficiency of the solution has nothing to do with machine throughput, storage density, cache hits, or clever use of threads. Instead, the desired effect is to improve turnaround time, create a more fact-based domain for decision making, and turn personal discovery into organizational learning. Slice-and-dice and drill-down tools are only a piece of the puzzle. The different perspectives provide a good example of "doing things right or doing the right thing." Unless the tools can contribute to a meaningful change in the nature of the work, they accomplish little in the long run. One of the current industry buzzwords is "closed-loop decision support," which is meant to describe the interoperability of analytic processes with operational ones -- for example, the ability to explore a customer database to uncover desirable (or undesirable) attributes and to immediately dispatch an action, such as creation of a mailing list. In our practice, we consider all decision-support systems incomplete unless they can "close the loop." Analysis of sales results is only useful to a point. The engine of value creation in DSS is the linking of the analysis to discussion, consensus, action, and results in a systematic way. Only those tools that can cooperate with workflow, the new "push" technologies, and alternative access methods and operational systems are capable. Single-platform technologies; closed, proprietary front ends; arcane 4GL languages; weak platform support; and obsolete architectures are the enemy. Pretty screens, nice charts, simple navigation, and adherence to "standards" can easily divert your attention. Scalability Scalability is a perfect example of different conclusions being drawn from different perspectives. Consider Figure 1 in assessing scalability. If the business process to support is one of unguided data exploration of a limited set of data, fat OLAP may be a workable model. For example, data separated by time and territory, so that no single desktop has access to more than a fraction of the total, lends itself very well to this approach, especially if there is an additional requirement of disconnected "takeaway" of the data. In essence, relatively simple SQL is dispatched to the database, which causes the database to return a large amount of data. The data is "cubed" on the client, and after a delay ranging from seconds (unusual) to minutes (more likely) to hours (occasionally), the structure is ready for slice and dice. There are many variations on this theme, and the database may not be accessed by SQL at all, as is the case with a PowerPlay or Brio interface to Essbase. More traditional examples are Business Objects, Impromptu, and Esperant accessing a relational database. Figure 1. Fat vs. thin OLAP In the thin OLAP model, with each request for data -- whether an original template or a drill, pivot, or slice operation -- a new request is dispatched to the database for processing, and the resulting data for display is returned to the client. This avoids building the "cube" on the client completely, requiring instead that the database perform the joins, aggregations, and calculations, returning only the final result for display. DSS/Agent (in two-tier mode) is a good example of this arrangement. Obviously, the "fat" OLAP model can look very much like this when the query is simple, but the difference between the approaches becomes more marked as the magnitude of the sampled data increases. If viewed from the technical perspective, fat OLAP may look like a better bet with a growing community of clients, relieving the database server from escalating loads of complex queries. In fact, the criticism most often applied to the thin OLAP model is that it can crush the database over time. But this analysis is too limited. The fat OLAP model is very useful for certain kinds of analysis, but it falls apart completely for some very common problems. Unfortunately, the distinction between these types of queries is subtle, and the lack of understanding stems from taking a purely technical perspective. In practice, many seemingly innocent queries cannot be run in the fat OLAP model. Regardless of the size of the final result set, any analysis that needs to "touch" a great deal of data to reach a conclusion will create scalability problems, because all of the data must be transferred to the client first. An example of this is queries that compare variable time periods, where selection is based on the existence of a record, such as, "Show the growth, by product group, by territory, during the last two Christmas selling seasons, for only stores that carried the products in both periods."
Figure 2 depicts the more robust three-tier OLAP models. The most common implementation is the fat/thin model, employed for the most part by hybrid OLAP tools (HOLAP) such as Holos, Express, and Pilot. Hybrid OLAP creates multidimensional models on the fly from data stored in relational databases, combining aspects of ROLAP and MOLAP. Other MOLAP tools, such as Essbase, can extract data from relational databases as source data, but all reading and writing from the Essbase "cube" must be from a persistent Essbase database -- thus I can't include it here in the hybrid mode. However, if we used a middle-tier server to build the cube from an Essbase database, the combination would qualify as a "fat/thin" model. The client in this arrangement sends thin requests to the server, which dispatches "thin" SQL (namely a simple select statement) to the database, returning large results of usually raw data. The fat OLAP server performs most of the calculation and filtering work, and requests for information from the clients are dispatched by the OLAP server to the database server only as needed. In the thin/thin model, the thin server creates complex requests that drive the database server in precisely the same manner as the two-tier thin OLAP model. Although caches of result sets may be stored on the thin OLAP server, most data requests flow back to the database engine. ROLAP tools use this model for the most part, although Information Advantage's tools can be tuned somewhat to act as either a thin or fat server; the same is true for Platinum Technology Inc.'s InfoBeacon product. Only MicroStrategy's DSS/Agent and DSS/Server suite adhere strictly to the thin/thin model. A new entry on the scene, InfoSpace Inc., appears to work in the same way, providing a full set of OLAP functions through a 100-percent Java implementation on both server and applet. (For more information on InfoSpace's SpaceOLAP product, see this month's review on page 27.) The issues in thin/thin vs. fat/thin are somewhat different from those in two-tier mode. The fat/thin model allows for much more complex modeling, rather than relying on the limited capabilities of SQL, although MicroStrategy has some ingenious ways to circumvent this problem. In general, however, fat OLAP servers add capabilities that can't be easily duplicated by thin/thin models without further integration at either the server or client level. But the same scalability problems apply in three-tier systems that hold in two-tier: If your application requires the manipulation of much source data, fat/thin architectures have built-in overhead that is absent in the thin/thin models. Again, make sure that you understand the full range of analyses that are required. Underestimating this is a major cause of failure. Trends Three factors at work will have a dramatic effect on the query and analysis tools market. First, just as in every other area of information technology, is the pervasive influence of the Web. Java may be immature and ActiveX slow and insecure, but client software as we know it will be based on one or both of these approaches going forward. Those vendors without network-ready architectures (desktop OLAP, report writers, point-and-click query tools) are at a severe disadvantage, and even the strong vendors have new entrants to worry about, from both startups such as InfoSpace and Sagent as well as movement up and down the food chain, with Microsoft leading the pack. Some OLAP vendors jumped in front with Web offerings early, notably Arbor, Information Advantage, and MicroStrategy -- but those early, heroic efforts were mostly based on HTML and are being quickly replaced by Java and/or ActiveX-based products. The feature gap between the Web products and the existing client tools will disappear as well. The second factor affecting the query and analysis tools market is the rising popularity of data mart suites. These bundled packages of software, hardware, and services will take a major part of the market. This will tend to put pressure on many vendors to join a coalition, but the lower margins will not be able to support their direct sales forces, and many vendors will disappear or be merged into larger organizations. Pay particular attention to the midterm viability of any vendor you are evaluating. The newest, most promising technology is the "push" broadcasting started by PointCast Inc. Some industry analysts estimate that push technology will account for $6 billion per year by 2000, or almost one-third of the entire Web market. Decision-support applications are particularly well suited for this approach, with the ability to integrate OLAP applets, discussion groups, customizable agents, alert reporting, and, finally, a way to integrate operational and analytical workflows in a single network view. Pay close attention to vendors that are making progress in integrating with these technologies. Going Forward Front-end tools for data warehousing no longer exist as a standalone category. Any analysis of their strengths and weaknesses is based on separating the field into categories that are ultimately defined by the features of the tools themselves, leading to an infinite regress of interpretations. Good information is difficult to assemble and has a half-life of only a few days. Organizations that foot the bill for all of this have not been served well. As happened to other good ideas before it, data warehousing is in danger of losing relevance unless it can produce sustainable results. Decision support requires a focused view on businesspeople, not just processes. IT is generally more skilled at specifying and developing systems that perform predictably under a finite number of conditions. When this approach is used in decision support, the results are always suboptimal. These highly structured solutions, though derived of good intentions, leave end users, in the etymologically radical sense of the word, a-stone-ished rather than empowered. Taking a cognitive perspective will pay dividends by helping you make the right choices in query and analysis tools. The even more valuable byproduct will be your improved insight into how your organization really works. ------------------------------------------------------------------------ TABLE 1. Differentiating Categories for Query Tools
------------------------------------------------------------------------ TABLE 2. Interactive Input Characteristics of Databases
------------------------------------------------------------------------ Neil Raden is president and founder of Hired Brains Inc., a consulting company that specializes in data warehousing and business intelligence. He is well known in decision support and information delivery, widely published, and a frequent speaker on these topics. You can email Neil at nraden@hiredbrains.com . Reprinted by permission from DBMS Magazine, August 1997, (Vol. 10, No. 9) (C) Copyright 1997 Miller Freeman Inc. ALL RIGHTS RESERVED |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © Hired
Brains Incorporated, 2002. All rights reserved. |