Defining Data, Information, and Content

Computers were built to process data. Data consists of small snippets of information that have all the human meaning squeezed out of them. Today, people call on computers to process content. Like data, content is also information, but it retains its human meaning and context.

In this white paper I lay out one of the basic challenges of content management: Computers are designed to deal with data that's stripped of any context and independent meaning. Users want computers to deal with content, however, which is rich in context and meaning. How can you use data technologies to manage and deliver content that is so unlike data? This challenge isn't easy. If you err toward making your information too much like data, it looks mechanical and uninteresting to consumers. If you make your information too rich, varied, and context-laden, then you can't get a computer to automate its management.

The compromise, as you see in this white paper, is to wrap your information in a data container (known as metadata). The computer manages the data and the interesting, meaningful information goes along for the ride.

Questioning Your Users

This paper includes the portions of a CMS audience analysis that you might include in a user survey for any software application.

I've never heard programmers use the word audience, but as they talk about users, programmers are using the same concept. Users are the consumers of computer applications. Users access an application through a user interface. To be successful, a user interface must be usable. Usability testers recruit representatives of user groups and watch them use the application to see whether it works well for them. What are these user groups if they're not audiences?

Today's popular design notation, the Unified Modeling Language (UML), makes the link to audiences even more tangible. Programmers use UML to model the way that you use an application before they put any effort into programming it. UML defines roles as the types of people who are likely to use an application. In UML, you create a set of "use cases" that define what a type of person wants to accomplish and how you may expect that person to go about accomplishing it.

For an electronic publication, audiences are users. In fact, I call audience members users throughout this white paper as I discuss people interacting with Web sites and other electronic publications. Thus application usability, user groups, and use cases literally apply to much of what a CMS produces.

The Branches of Content Management

One of today's content management companies is fond of saying that content management is the operating system for e-business. Hyperbole aside, this statement does contain some truth. An operating system is the infrastructure that lies below applications. It provides a common set of services that applications draw on. Similarly, content management can underlie many of the Web technologies and applications that constitute e-business. In this white paper I discuss how content management can underlie key business applications.

Staffing a CMS

In this white paper I use the last system here to categorize the jobs in a CMS because it balances being comprehensive with dividing the jobs into more than just a couple of categories.

Your CMS may affect the jobs of numerous people, in diverse areas, throughout your organization. Few of the tasks that need to be accomplished to start up or run a CMS require full-time long-term staff. Rather, you start up your system with a large short-term staff and run it with a small full-time staff and a larger casual and part-time staff.

Content management is a difficult task. Not only does it bring together a large number of people from quite diverse backgrounds, but it also requires individuals who are personally split in their skills and attitudes between different, often conflicting, disciplines. You will find that almost all of the jobs in this white paper call for at least two widely different skills.

The array of jobs I present assumes a very large organization with a big team. I do this to show the most complete picture of the jobs that need to be accomplished. Obviously, in smaller organizations, or in large projects in earlier stages, one person will do many of these jobs. Understand, however, that each job does need to get done in its entirety, even if there is only one person to do them all.

Getting Ready for a CMS

Your organization creates and distributes content today. Before planning an entirely new process, you are well-advised to study the current process. Before designing your new system, come to understand the ways your organization creates and publishes information and functionality, and what constraints they will put on the system you want to create. Your goal here is to work outward, from your CMS project team, through to the sponsors of the project within your organization, to the audiences you hope to reach, and to the contributing groups in your organization, in an effort to find their needs, constraints, assumptions, and blind spots.

In this white paper I'll provide an overview of the CMS project process and discuss how you might go about getting yourself and your organization ready for a CMS.

Doing Requirements and Logical Design

After you have secured a project mandate, you can begin to collect all of the information that you need to design the system, with your sponsors and other stakeholders in mind. You can begin by gathering your organization's requirements for content, publications, and CMS infrastructure. From there, you can create a product-independent (or logical) design for your CMS that defines exactly how you intend to collect, manage, and publish information.

In this white paper I'll lay out some project techniques and deliverables you can use to collect and organize requirements and construct a logical design.

Selecting Hardware and Software

In years past, there were no CMS products on the market. In the future, you'll no more think of creating your own CMS than you'd think of creating your own ERP system. Today, it's likely that you can get most of what you need from one of the commercially available CMS products. Even so, you will probably need to do a fair amount of custom programming to get the results that you desire.

In this white paper I provide an overview of the build vs. buy decision, giving you some basis on which to decide whether you're better off building or buying your own system. Assuming that most organizations prefer to start from a commercial product, I spend the bulk of the white paper discussing the process for selecting the product most suitable to your needs.

Working with Metadata

Metadata consists of the small snippets of information (or data) that you attach to content so that you can more easily catalog, store, and retrieve it. A well-designed system of metadata draws diverse classes of content into a coherent scheme in which content components relate to each other as well as to the collection, management, and publishing systems that you devise.

In this white paper, I dive into the concept of metadata, discussing its meaning, types, and uses.
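
To make the idea concrete, here is a minimal Python sketch of metadata attached to content so that it can be cataloged and retrieved. The `Component` class, the `find` helper, and the field names `type` and `audience` are all assumptions made up for illustration; they are not taken from any CMS product.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A chunk of content: a free-form body plus the metadata attached to it."""
    body: str
    metadata: dict = field(default_factory=dict)


def find(components, **criteria):
    """Return the components whose metadata matches every key/value pair given."""
    return [
        c for c in components
        if all(c.metadata.get(key) == value for key, value in criteria.items())
    ]


# A tiny hypothetical catalog. The computer never reads the bodies;
# it only manages the metadata wrapped around them.
catalog = [
    Component("Our new widget ships in May.", {"type": "press_release", "audience": "public"}),
    Component("How to reset the widget.", {"type": "faq", "audience": "customer"}),
    Component("Widget pricing for partners.", {"type": "faq", "audience": "partner"}),
]

faqs = find(catalog, type="faq")
partner_faqs = find(catalog, type="faq", audience="partner")
```

The retrieval logic touches only the metadata dictionary, which is the point: the human-readable body goes along for the ride.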

What Are Content Markup Languages?

Almost all text that is authored and most that you want to acquire is provided in some sort of markup language (ML). A markup language wraps the content with formatting and structural codes. To understand markup languages is to understand how format and structure can be represented in text. (It also goes a long way toward helping you understand how these qualities are represented in other media.)

In this white paper I illustrate the concepts of markup languages, drawing primarily from the most familiar markup language, HTML. I also present examples from XML and the Microsoft markup language RTF to give you a full picture of what markup languages are and what they do.
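
As a small illustration of markup wrapping content, the following Python sketch separates the codes from the text in a snippet of HTML. It uses the standard library's `html.parser`; the `TagLister` class and the sample string are invented for this example.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect the markup tags and the text they wrap, separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag, such as <h1> or <b>.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between tags; ignore pure whitespace.
        if data.strip():
            self.text.append(data.strip())


sample = "<h1>Quarterly Report</h1><p>Sales rose <b>sharply</b> in March.</p>"
lister = TagLister()
lister.feed(sample)
lister.close()

print(lister.tags)            # the structural and formatting codes
print(" ".join(lister.text))  # the content with the markup removed
```

Here `h1` and `p` carry structure (a heading, a paragraph), while `b` carries format (bold). The text itself is untouched by either.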

XML and Content Management

In this white paper I move from talking generally about markup languages to talking specifically about XML. My intention is to give you a conceptual overview of XML that enables you to see how it works and how you may use it in a CMS. My intention isn't to teach you the hard facts, syntax, and programming behind XML. Plenty of other resources do that.

I warn you ahead of time that XML gets pretty hard pretty fast. I try to keep the story at a general readership level, but a few places crop up where it gets a bit thick for the nonprogrammer. For the programmers, your challenge is to move beyond the mechanics of XML to understand the content management concept that the XML in this white paper demonstrates.

Processing Content

With a good grasp of markup languages, and especially XML, you can begin to dive into the mechanics of content processing. Loosely speaking, content processing is conversion. To do conversion you need to be able to fully parse (that is, get at) each markup tag of the source files. You must know exactly what markup you want in the target files or database, and you must devise a feasible plan for the transformation.

In this white paper I work through many of the issues that you may need to confront as you plan and implement a content processing project. I try to stay as non-technical as possible in the beginning of the white paper but end with a lot of programming code for the benefit of those who may need to implement content processing systems.
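
The three steps named above (parse the source tags, know the target markup, plan the transformation) can be sketched in a few lines of Python. This is only an illustration under generous assumptions: the source is well-formed, and the `TAG_MAP` table and its target tag names are hypothetical. Real source files are rarely this tidy.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan: each source (presentation) tag and the
# target (structural) tag that should replace it.
TAG_MAP = {"h1": "title", "p": "para", "b": "emphasis"}


def convert(source):
    """Parse well-formed source markup and rename each tag per TAG_MAP."""
    root = ET.fromstring(source)
    for element in root.iter():
        # Tags that are not in the plan pass through unchanged.
        element.tag = TAG_MAP.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


source = "<body><h1>Widget FAQ</h1><p>Reset takes <b>ten</b> seconds.</p></body>"
target = convert(source)
print(target)
```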

Understanding Content Management

I assume that most people come to this white paper because they want to know how to make large and well-managed Web sites. You can learn that here. In the process, I hope that you also find that content management isn't about Web sites, although that's where it's mostly practiced today.

Content management is about gaining control over the creation and distribution of information and functionality. It's about knowing what value you have to offer, who wants what parts of that value, and how they want you to deliver it. Knowing that, you can build a CMS machine to help you get the right stuff to the right people in the right way.

In this white paper I put a definition around the phrase content management, relate it to the very young industry with the same name, and link it to some of the Web technologies that you perhaps are now using to deal with sites that have gotten out of control.

Introducing the Major Parts of a CMS

A CMS is a system that collects, manages, and publishes information and functionality. In this white paper, I present a top-level view of content management that's partly hardware and software, partly process, and partly an organizational vehicle.

Knowing When You Need a CMS

Any organization that creates publications practices some form of content management. Even the sole proprietor organizes files on her hard drive and tries to keep track of her content and share it across publications. The sole proprietor, however, has little need for the formality and tight structure that I present here. But, if the sole proprietor grows to a small organization and then to a large organization, then file-system directories and informal content sharing begin to cost too much and take too long. A content management system (CMS) may then become necessary to help organize and automate the process.

You need a CMS if your collection, management, and publishing processes are too complex to manage informally.

CM System Design Process

This paper presents the standard application design process and contrasts it with the process you might use in a CMS.

From Data to Wisdom

This paper describes the increasing level of abstraction from the most concrete notion of information (data) to its most ethereal (wisdom).

The contrast between data and information is all you really need to know to manage content. You take the methodologies of data processing and wrap them around human-created information to create information methodologies. Still, to put content in the context of the wider world of communication and meaning, I'd like to reach beyond the basics, moving from data, the most concrete communication, to wisdom, the most abstract.

Where to Start with Content Management

This paper gives an overview of how to approach a content management project. It is most applicable to a large organization but would work with little modification in a smaller organization.

Content Management and the Information Age

This paper juxtaposes a set of excerpts from "Content Management Bible." Taken together, the excerpts argue that content management helps to define the Information Age and provides new concepts for working with information.

Content and Management

Content is information plus metadata. Management is a process of control. Taken together, content management is the process of controlling your information through the application and use of metadata.

In this presentation we explore the meaning and applications of this concept.


A template adds presentation to the content you manage. You store content apart from presentation so that you can later apply a number of different presentations to it.

Templates are an excellent way to get into the heart of CM with the least effort. We will use the idea of templates to begin the descent into CM.
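
A minimal sketch of the idea, using Python's standard `string.Template`: one piece of stored content, two presentations. The content fields and both templates are invented for illustration.

```python
from string import Template

# Content is stored once, with no presentation attached.
content = {"title": "Widget FAQ", "body": "Reset takes ten seconds."}

# Two hypothetical templates: one for a Web page, one for plain text.
web_template = Template("<h1>$title</h1>\n<p>$body</p>")
text_template = Template("$title\n\n$body")

web_page = web_template.substitute(content)
plain_text = text_template.substitute(content)

print(web_page)
print(plain_text)
```

Changing the look of every Web page now means editing one template, not every piece of content.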

Content Types

While templates are an easy way into CM, content types are at its heart.

Simply, content types are the kinds of content you want to deliver. However, deciding what your types are or should be is not at all simple. In this presentation we will work with the idea of types in order to bring you as close as possible to the center of the discipline.

CM in Organizations

While you might love to avoid dealing with your organization as you prepare for a CMS, you do so at grave risk. More CM projects fail due to political and financial issues than to technical ones. In this presentation, we will look at how you might go about assessing the atmosphere for CM in your organization and how you can be sure it has the greatest chance of non-technical success.

XML Basics

Working directly with markup code can be trying at best and completely baffling at worst. It takes a certain attitude and a few key concepts to get past the complexity and to work successfully with markup code.

In this presentation, we will try to get you comfortable with the structure and uses of XML.

XML Schemas

XML is a wonderful way to express any sort of content structure you can imagine. But, if you can make up a tag for anything, what keeps XML from getting out of hand? How can you ensure that tags are spelled correctly and that they fall in the right place under only particular parent elements? How can you ensure that only certain attributes and attribute values are allowed? In short, how can you ensure that the wonderful structure that you create can ever be enforced?

This was a major issue for the creators of XML's parent, SGML. They ruled, so to speak, that all SGML documents would follow the structure that was defined in a Document Type Definition (DTD). DTDs list, in exacting detail, all the rules behind a particular set of tags. A DTD is meta markup; it's not the markup itself, but rather markup about markup.

XML schemas have mostly replaced DTDs as the major way that meta markup is defined. In this presentation, we will go over the rules and visual symbols of schemas so you can work with them in the rest of the series.
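
To show what "markup about markup" buys you, here is a hand-rolled Python sketch that enforces a few rules of the kind a DTD or schema would state: which child elements and which attributes each element may have. It is not a DTD or XML Schema validator (Python's standard library does not include one), and the `RULES` table and the `faq` vocabulary are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: for each element, the children and attributes it allows.
RULES = {
    "faq": {"children": {"question", "answer"}, "attributes": {"audience"}},
    "question": {"children": set(), "attributes": set()},
    "answer": {"children": set(), "attributes": set()},
}


def check(xml_text):
    """Return a list of rule violations; an empty list means the document conforms."""
    problems = []
    for element in ET.fromstring(xml_text).iter():
        rule = RULES.get(element.tag)
        if rule is None:
            problems.append(f"unknown element <{element.tag}>")
            continue
        for child in element:
            if child.tag not in rule["children"]:
                problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        for name in element.attrib:
            if name not in rule["attributes"]:
                problems.append(f"attribute '{name}' not allowed on <{element.tag}>")
    return problems


good = '<faq audience="customer"><question>How?</question><answer>Like so.</answer></faq>'
bad = '<faq colour="red"><qestion>How?</qestion></faq>'

print(check(good))
print(check(bad))
```

The misspelled tag and the stray attribute in `bad` are exactly the kinds of errors the questions above worry about.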

XML Transforms

XML is a structural representation of content, which is a good thing for managing that content. You need to know the structure of content to track and manipulate it. Because XML isn't a presentation language, it must be translated into one when it comes time to view it.

There are three basic nonexclusive approaches to adding formatting to XML:

  • You can cheat by putting formatting tags into the XML file.
  • You can write a custom program. It can read the XML file and transform it.
  • You can use Extensible Stylesheet Language Transformations (XSLT). XSLT stylesheets are add-ons to XML that transform it into whatever other markup you want.

In this presentation, we will study the use of XSLT to transform XML into an end-user-friendly format.
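
For contrast with XSLT, the second approach in the list (a custom program that reads the XML and transforms it) can be sketched in Python with the standard library alone. This is not XSLT. The `faq` document structure and the HTML it produces are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def faq_to_html(xml_text):
    """Render a hypothetical <faq> document as an HTML fragment."""
    faq = ET.fromstring(xml_text)
    div = ET.Element("div", {"class": "faq"})
    for item in faq.findall("item"):
        # Structure in the source (question, answer) becomes
        # presentation in the target (heading, paragraph).
        ET.SubElement(div, "h2").text = item.findtext("question")
        ET.SubElement(div, "p").text = item.findtext("answer")
    return ET.tostring(div, encoding="unicode")


xml_text = (
    "<faq><item>"
    "<question>How do I reset it?</question>"
    "<answer>Hold the button.</answer>"
    "</item></faq>"
)
html = faq_to_html(xml_text)
print(html)
```

An XSLT stylesheet expresses the same mapping declaratively, as a set of rules rather than a program, which is one reason it is often preferred when many output formats are needed.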

Readiness Assessments

We cover the concepts and deliverables that let you see how ready your organization is for a particular information system.

A good way to start an information project is by getting a firm feel for what the organization has accomplished so far. Such an assessment gives the project team an immediate, action-oriented task. Go through the whole organization and find out what has been done and what the current assumptions are.

In the process of assessing the current situation, the team will become acquainted with all the players and significant documents. The people with whom you interact get the chance to assess you informally and see that you are ready and interested in what they have to offer. Conversely, you can assess the various organizational contributors and decide what offers are worth following up on. If you do this first job well, you build an enormous amount of brand equity for your project team within the organization and initiate just the relationships needed to continue and complete the project. Of course, the main reason for this task is to uncover a lot of great information to be used in the coming project.

Information Audits

We discuss the concepts and deliverables behind assessing the information situation within an organization.

No organization would allow its financial resources to be managed as haphazardly as we currently manage our information resources. If, as everyone seems to be fond of saying, information is power and information assets have real value, then why shouldn't we audit our information resources and systems with the same rigor that we use to audit our financial resources and systems?

Mandates Part 1

We will discuss the concepts and deliverables that allow you to build organizational consensus behind an idea for an information system.

A typical information management project begins with wide agreement that there is a problem—too much information to manage informally. There is tacit approval from organizational decision makers that some solution must be found, and there are the beginnings of a project team. On the other hand, the project begins with little or no understanding (let alone agreement) about what the solution will be, what parts of the organization will be affected, how long the project will take, or how much it will cost. Without compelling answers to these and many other questions, the project will flounder from the start.

Mandates Part 2

We will discuss the concepts and deliverables that allow you to build organizational consensus behind an idea for an information system.

A typical information management project begins with wide agreement that there is a problem—too much information to manage informally. There is tacit approval from organizational decision makers that some solution must be found, and there are the beginnings of a project team. On the other hand, the project begins with little or no understanding (let alone agreement) about what the solution will be, what parts of the organization will be affected, how long the project will take, or how much it will cost. Without compelling answers to these and many other questions, the project will flounder from the start.

Information Inventories

We discuss the idea and techniques of information inventorying.

Information Architecture Part 1

We discuss some of the tools and methods that information architects (IAs) use to develop an information strategy.

Information Architecture is an umbrella term for the various professionals and processes we use to organize information and make it accessible to consumers. In this part of the course, we will focus on tools IAs use to decide, model, and tag information.

Information Architecture Part 2

We discuss some of the tools and methods that information architects (IAs) use to develop an information strategy.

Information Architecture is an umbrella term for the various professionals and processes we use to organize information.