Theoretical Framework

The most important theoretical point here is the identification of conditions under which a capacity limit can be observed (see Section 1); reasons for this limit also are proposed. The theoretical model in this section provides a logical way to understand the empirical results presented in Section 3. A fuller analytic treatment, consideration of unresolved issues, and comparison with other approaches are provided in Section 4.

The basic assumptions of the present theoretical framework are (1) that the focus of attention is capacity-limited, (2) that the limit in this focus averages about four chunks in normal adult humans, (3) that no other mental faculties are capacity-limited, although some are limited by time and susceptibility to interference, and (4) that any information that is deliberately recalled, whether from a recent stimulus or from long-term memory, is restricted to this limit in the focus of attention. This last assumption depends on the related premise, from Baars (1988) and Cowan (1988, 1995), that only the information in the focus of attention is available to conscious awareness and report. The identification of the focus of attention as the locus of the capacity limit stems largely from a wide variety of research indicating that people cannot optimally perceive or recall multiple stimulus channels at the same time (e.g., Broadbent, 1958; Cowan, 1995), although most of that research does not provide estimates of the number of chunks from each channel that occupy the focus of attention at any moment. There is an additional notion that the focus of attention serves as a global workspace for cognition, as described for example by Cowan (1995, p. 203) as follows:

Attention clearly can be divided among channels, but under the assumption of the unity of conscious awareness, the perceived contents of the attended channels should be somehow integrated or combined. As a simple supporting example, if one is instructed to divide attention between visual and auditory channels, and one perceives the printed word "dog" and the spoken word "cat," there should be no difficulty in determining that the two words are semantically related; stimuli that can be consciously perceived simultaneously can be compared to one another, as awareness serves as a "global workspace" (Baars, 1988).

Cowan (1995) also suggested two other processing limits. Information in a temporarily heightened state of activation, yet not in the current focus of attention, was said to be time-limited. Also, the transfer of this activated information into the focus of attention was said to be rate-limited. Importantly, however, only the focus of attention was assumed to be capacity-limited. This assumption differs from approaches in which there are assumed to be multiple capacity limits (e.g., Wickens, 1984) or perhaps no capacity limit (Meyer & Kieras, 1997).

The assignment of the capacity limit to the focus of attention has parallels in previous work. Schneider and Detweiler (1987) proposed a model with multiple storage buffers (visual, auditory, speech, lexical, semantic, motor, mood, and context) and a central control module. They then suggested (p. 80) that the control module limited the memory that could be used: "50 semantic modules might exist, each specializing in a given class of words, e.g., for categories such as animals or vehicles. Nevertheless, if the controller can remember only the four most active buffers, the number of active semantic buffers would be effectively only four buffers, regardless of the total number of modules...Based on our interpretations of empirical literature, the number of active semantic buffers seems to be in the range of three to four elements."

The present analysis, based on Cowan (1988, 1995), basically agrees with Schneider and Detweiler, though with some differences in detail. First, it should be specified that the elements limited to four are chunks. (Schneider and Detweiler probably agreed with this, though it was unclear from what was written.) Second, the justification for the particular modules selected by Schneider and Detweiler (or by others, such as Baddeley, 1986) is dubious. One can always provide examples of stimuli that do not fit neatly into the modules (e.g., spatial information conveyed through acoustic stimulation). Cowan (1988, 1995) preferred to leave open the taxonomy, partly because it is unknown and partly because there may in fact not be discrete, separate memory buffers. Instead, there could be the activation of multiple types of memory code for any particular stimulus, with myriad possible codes. The same general principles of activation and de-activation might apply across all types of code (e.g., the principle that interference with memory for an item comes from the activation of representations for other items with similar memory codes), making the identification of particular discrete buffers situation-specific and therefore arbitrary. Third, Cowan (1995) suggested that the focus of attention and its neural substrate differ subtly from the controller and its neural substrate, though they usually work together closely. In particular, for reasons beyond the scope of this target article, it would be expected that certain types of frontal lobe damage can impair the controller without much changing the capacity of the focus of attention, whereas certain types of parietal lobe damage would change characteristics of the focus of attention without much changing the controller (see Cowan, 1995). In the present analysis, it is assumed that the capacity limit occurs within the focus of attention, though the control mechanism is limited to the information provided by that focus.

In the next section, so as to keep the theoretical framework separate from the discussion of empirical evidence, I will continue to refer to evidence for a "capacity-limited STM" without reiterating that it is the focus of attention that presumably serves as the basis of this capacity limit. (Other, non-capacity-limited STM mechanisms that may be time-limited contribute to compound STM measures but not to capacity-limited STM.) Given the usual strong distinction between attention and memory (e.g., the absence of memory in the central executive mechanism as discussed by Baddeley, 1986), the suggested equivalence of the focus of attention and the capacity-limited portion of STM may require some getting used to by many readers. With use of the term "capacity-limited STM," the conclusions about capacity limits could still hold even if it were found that the focus of attention is not, after all, the basis of the capacity limit.

A further understanding of the premise that the focus of attention is limited to about 4 chunks requires a discussion of working assumptions including memory retrieval, the role of long-term memory, memory activation, maintenance rehearsal, other mnemonic strategies, scene coherence, and hierarchical shifting of attention. These are discussed in the remainder of Section 2. In Section 3, categories of evidence will be explained in detail. Finally, in Section 4, on the basis of the evidence, the theoretical view will be developed and evaluated more extensively with particular attention to possible reasons for the capacity limits.

2.1. Memory retrieval. It is assumed here that explicit, deliberate memory retrieval within a psychological task (e.g., recall or recognition) requires that the retrieved chunk reside in the focus of attention at the time immediately preceding the response. The basis of this assumption is considerable evidence, beyond the scope of this article, that explicit memory in direct memory tasks such as recognition and recall requires attention to the stimuli at encoding and retrieval, a requirement that does not apply to implicit memory as expressed in indirect memory tasks such as priming and word fragment completion (for a review see Cowan, 1995). Therefore, any information that is deliberately recalled, whether it is information from a recent stimulus or from long-term memory, is subject to the capacity limit of the focus of attention. In most cases within a memory test, information must be recalled from both the stimulus and long-term memory in order for the appropriate units to be entered into the focus of attention. For example, if we attempt to repeat a sentence we do not repeat the acoustic waveform; we determine the known units that correspond to what was said and then attempt to produce those units, subject to the capacity limit.

A key question about retrieval in a particular circumstance is whether anything about the retrieval process makes it impossible to obtain a pure capacity-based STM estimate. A compound STM estimate can result instead if there is a source of information that is temporarily in a highly accessible state, yet outside of the focus of attention. This is particularly true when a subject's task involves the reporting of chunks one at a time, as in most recall tasks. In such a situation, if another mental source is available, the subject does not need to hold all of the to-be-reported information in the focus of attention at one time. In a trivial example, a compound, supplemented digit capacity limit can be observed if the subject is trained to use his or her fingers to hold some of the information during the task (Reisberg, Rappaport, & O'Shaughnessy, 1984). The same is true if there is some internal resource that can be used to supplement the focus of attention.

2.2. The role of long-term memory. Whereas some early notions of chunks may have conceived of them as existing purely in STM, the assumption here is that chunks are formed with the help of associations in long-term memory, although new long-term memory associations can be formed as new chunks are constructed. It appears that people build up data structures in long-term memory that allow a simple concept to evoke many associated facts or concepts in an organized manner (Ericsson & Kintsch, 1995). Therefore, chunks can be more than just a conglomeration of a few items from the stimulus. Gobet and Simon (1996, 1998) found that expert chess players differ from other chess players not in the number of chunks but in the size of these chunks. They consequently invoked the term "template" to refer to large patterns of information that an expert can retain as a single complex chunk, with reference to expert information in long-term memory (see also Richman et al., 1995).

The role of long-term memory is important to keep in mind in understanding the size of chunks. When chunks are formed in the stimulus field on the basis of long-term memory information, there should be no limit to the number of stimulus elements that can make up a chunk. However, if chunks are formed rapidly through new associations that did not exist before the stimuli were presented (another mechanism suggested by McLean & Gregg, 1967), then it is expected that the chunk size will be limited to about four items because all of the items (or old chunks) that will be grouped to form a new, larger chunk must be held in the focus of attention at the same time in order for the new intra-chunk associations to be formed (cf. Baars, 1988; Cowan, 1995). This assumption is meant to account for data on limitations in the number of items per group in recall (e.g., see Section 2.7). It theoretically should be possible to increase existing chunk sizes endlessly, little by little, because each old chunk occupies only 1 slot in the capacity-limited store regardless of its size.
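The argument about rapidly formed chunks can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a model proposed in this article: the names FOCUS_CAPACITY, flatten, form_chunk, and grow_chunk are hypothetical, the capacity is fixed at exactly four slots for simplicity, and a chunk is represented as a nested tuple. The sketch shows that if every old chunk occupies one slot, a single chunk can be enlarged step by step even though no single binding step involves more than four parts.

```python
FOCUS_CAPACITY = 4  # assumed average limit of the focus of attention, in chunks


def flatten(chunk):
    """Return the stimulus elements inside a (possibly nested) chunk."""
    if isinstance(chunk, tuple):
        return [element for part in chunk for element in flatten(part)]
    return [chunk]


def form_chunk(parts, capacity=FOCUS_CAPACITY):
    """Bind the parts currently held in focus into one new chunk.

    Each part, whether a single item or an old chunk, occupies one slot.
    """
    if len(parts) > capacity:
        raise ValueError("parts exceed the capacity of the focus of attention")
    return tuple(parts)


def grow_chunk(items, capacity=FOCUS_CAPACITY):
    """Build one chunk from a list of items, a little at a time."""
    chunk = form_chunk(items[:capacity], capacity)
    rest = items[capacity:]
    while rest:
        # The old chunk takes 1 slot, leaving capacity - 1 slots for new items.
        new, rest = rest[:capacity - 1], rest[capacity - 1:]
        chunk = form_chunk([chunk] + new, capacity)
    return chunk


letters = list("ABCDEFGHIJKL")
big = grow_chunk(letters)
print(len(big), len(flatten(big)))  # slots at the top level, total elements
```

In this sketch the final chunk never has more than four parts at its top level, yet it contains all twelve letters, which is the sense in which chunk size could grow without a fixed bound.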

2.3. Memory activation. It is assumed that there is some part of the long-term memory system that is not presently in the focus of attention but is temporarily more accessible to the focus than it ordinarily would be, and can easily be retrieved into that focus if it is needed for successful recall (Cowan, 1988, 1995). This accessible information supplements the pure capacity limit and therefore must be understood if we are to determine that pure capacity limit.

According to Baddeley (1986) and Cowan (1995), when information is activated (by presentation of that information or an associate of it) it stays activated automatically for a short period of time (e.g., 2 to 30 s), decaying from activation unless it is reactivated during that period through additional, related stimulus presentations or thought processes. In Baddeley's account, this temporary activation is in the form of the phonological buffer or the visuospatial sketch pad. As mentioned above, there is some question about the evidence for the existence of that activation-and-decay mechanism. Even if it does not exist, however, there is another route to temporary memory accessibility, described by Cowan et al. (1995) as "virtual short-term memory" and by Ericsson and Kintsch (1995), in more theoretical detail, as "long-term working memory." For the sake of simplicity, this process also will be referred to as activation. Essentially, an item can be tagged in long-term memory as relevant to the current context. For example, the names of fruits might be easier to retrieve from memory when one is standing in a grocery store than when one is standing in a clothing store because different schemas are relevant and different sets of concepts are tagged as relevant in memory. Analogously, if one is recalling a particular list of items, it might be that a certain item from the list is out of the focus of attention at a particular point but nevertheless is temporarily more accessible than it was before the list was presented. For example, if one is buying groceries based on a short list that was not written down, a fruit forgotten from the list might be retrieved with a process resembling the following stream of thought: "I recall that there were three fruits on the list and I already have gotten apples and bananas...what other fruit would I be likely to need?" The data structure in long-term memory then allows retrieval. 
One difference between this mechanism and the short-term decay and reactivation mechanism is that it is limited by contextual factors rather than by the passage of time.

If there is no such thing as time-based memory decay, the alternative assumption is that long-term working memory underlies phenomena that have been attributed to the phonological buffer and visuospatial sketchpad by Baddeley (1986). In the present article, the issue of whether short-term decay and reactivation exists will not be addressed. Instead, it is enough to establish that information can be made temporarily accessible (i.e., in present terms, active) by one means or another, and that this information is the main database for the focus of attention to draw upon.

2.4. Maintenance rehearsal. In maintenance rehearsal, one thinks of an item over and over and thereby keeps it accessible to the focus of attention (Baddeley, 1986; Cowan, 1995). One way in which this could occur, initially, is that the rehearsal could result in a recirculation of information into the focus of attention, reactivating the information each time. According to Baddeley (1986), the rehearsal loop soon becomes automatic enough so that there is no longer a need for attention. A subject in a digit recall study might, according to this notion, rehearse a sequence such as "2, 4, 3, 8, 5" while using the focus of attention to accomplish other portions of the task, provided that the rehearsal loop contains no more than could be articulated in about 2 s. In support of that notion of automatization, Guttentag (1984) used a secondary probe task to measure the allocation of attention and found that as children matured, less and less attention was devoted to rehearsal while it was ongoing.

It appears from many studies of serial recall with rehearsal-blocking or "articulatory suppression" tasks, in which a meaningless item or short phrase is repeated over and over, that rehearsal is helpful to recall (for a review, see Baddeley, 1986). Maintenance rehearsal could increase the observed memory limit as follows. An individual might recall an 8-item list by rehearsing, say, 5 of the items while holding the other 3 items in the focus of attention. Therefore, maintenance rehearsal must be prevented before pure capacity can be estimated accurately.
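The way rehearsal inflates an observed limit can be sketched as simple arithmetic. The following is a minimal illustration under assumptions that are not taken from the studies cited: the function names are hypothetical, each digit is assumed to take 400 ms to articulate, the rehearsal loop is assumed to hold whatever fits in 2000 ms (following the figure of about 2 s mentioned in Section 2.4), and the focus of attention is assumed to hold up to four further items. Durations are given in whole milliseconds to avoid rounding problems.

```python
def rehearsable_items(item_durations_ms, loop_limit_ms=2000):
    """Count the leading items whose summed spoken duration fits in the loop."""
    total_ms, count = 0, 0
    for duration in item_durations_ms:
        if total_ms + duration > loop_limit_ms:
            break
        total_ms += duration
        count += 1
    return count


def compound_span(item_durations_ms, focus_capacity=4, loop_limit_ms=2000):
    """Items recalled when rehearsal supplements the focus of attention."""
    rehearsed = rehearsable_items(item_durations_ms, loop_limit_ms)
    remaining = len(item_durations_ms) - rehearsed
    # Items not covered by rehearsal must fit in the capacity-limited focus.
    return rehearsed + min(focus_capacity, remaining)


eight_digits = [400] * 8  # hypothetical articulation time per digit
print(rehearsable_items(eight_digits), compound_span(eight_digits))
```

With these assumed values, five items are carried by rehearsal and the remaining three by the focus of attention, so the whole 8-item list is recalled. Setting loop_limit_ms to 0 mimics the case in which rehearsal is blocked, and the estimate falls back to the capacity of the focus alone.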

2.5. Other mnemonic strategies. With the possible exception of maintenance rehearsal, other well-known mnemonic strategies presumably involve the use of long-term memory. In recoding, information is transformed in a way that can allow improved associations. For example, in remembering two lines of poetry that rhyme, an astute reader may articulate the words covertly so as to strengthen the temporary accessibility of a phonological or articulatory code in addition to whatever lexical code already was strong. This phonological code in turn allows the rhyme association to assist retrieval of activated information into the focus of attention. Another type of recoding is the gathering of items (i.e., chunks corresponding to stimuli as intended by the experimenter) into larger chunks than existed previously. This occurs when an individual becomes aware of the associations between items, such as the fact that the 12-letter string given above could be divided into four 3-letter acronyms. Elaborative rehearsal involves an active search for meaningful associations between items. For example, if the items "fish, brick" were presented consecutively, one might form an image of a dead fish on a brick, which could be retrieved as a single unit rather than two unconnected units. Recoding and elaborative rehearsal are not intended as mutually exclusive mechanisms, but slightly different emphases on how long-term memory information can be of assistance in a task in which memory is required. These, then, are some of the main mechanisms causing compound STM limits to be produced instead of pure capacity-based STM limits.

2.6. Scene coherence. The postulation of a capacity of about 4 chunks appears to be at odds with the earlier finding that one can comprehend only one stream of information at a time (Broadbent, 1958; Cherry, 1953) or the related, phenomenologically based observation that one can concentrate on only one event at a time. A resolution of this paradox was suggested by Mandler (1985, p. 68) as follows:

The organized (and limited) nature of consciousness is illustrated by the fact that one is never conscious of some half dozen totally unrelated things. In the local park I may be conscious of four children playing hopscotch, or of children and parents interacting, or of some people playing chess; but a conscious content of a child, a chessplayer, a father, and a carriage is unlikely (unless of course they form their own meaningful scenario).

According to this concept, a coherent scene is formed in the focus of attention and that scene can have about 4 separate parts in awareness at any one moment. Although the parts are associated with a common higher-level node, they would be considered separate provided that there are no special associations between them that could make their recall mutually dependent. For example, four spices might be recalled from the spice category in a single retrieval (to the exclusion of other spices), but salt and pepper are directly associated and so they could only count as a single chunk in the focus of attention.

This assumption of a coherent scene has some interesting implications for memory experiments that may not yet have been conducted. Suppose that a subject is presented with a red light, a spoken word, a picture, and a tone in rapid succession. A combination of long-term memory and sensory memory would allow fairly easy recognition of any of these events, yet it is proposed that the events cannot easily be in the focus of attention at the same time. One possible consequence is that it should be very difficult to recall the serial order of these events because they were not connected into a coherent scene. They can be recalled only by a shifting of attention from the sensory memory or the newly formed long-term memory representation of one item to the memory representation of the next item, which does not result in a coherent scene and is not optimal for serial recall.

2.7. Hierarchical shifting of attention. Attentional focus on one coherent scene does not in itself explain how a complex sequence can be recalled. To understand that, one must take into account that the focus of attention can shift from one level of analysis to another. McLean and Gregg (1967, p. 459) described a hierarchical organization of memory in a serial recall task with long lists of consonants: "At the top level of the hierarchy are those cueing features that allow S to get from one chunk to another. At a lower level, within chunks, additional cues enable S to produce the integrated strings that become his overt verbal responses." An example of hierarchical organization was observed by Graesser and Mandler (1978) in a long-term recall task. The assumption underlying this research was that, like perceptual encoding, long-term recall requires a limited-capacity store to operate. It was expected according to this view that items would be recalled in bursts as the limited-capacity store (the focus of attention) was filled with information from long-term memory, recalled, and then filled and recalled again. Studies of the timing of recall have indeed found that retrieval from long-term memory (e.g., recall of all the fruits one can think of) occurs in bursts of about 5 or fewer items (see Broadbent, 1975; Mandler, 1975). Graesser and Mandler (1978, Study 2) had subjects name as many instances of a semantic category as possible in 6 min. They used a mathematical function fit to cumulative number of items recalled to identify plateaus in the response times. These plateaus indicated about 4 items per cluster. They also indicated, however, that there were lengthenings of the inter-cluster interval that defined superclusters. Presumably, the focus of attention shifted back and forth between the supercluster level (at which several subcategories of items are considered) and the cluster level (at which items of a certain subcategory are recalled). 
An example would be the recall from the fruit category as follows: "apple - banana - orange - pear (some common fruits); grapes - blueberries - strawberries (smaller common fruits); pineapple - mango (exotic fruits); watermelon - cantaloupe - honeydew (melons)." By shifting the focus to higher and lower levels of organization it is possible to recall many things from a scene. I assume that the capacity limit applies only to items within a single level of analysis, reflecting simultaneous contents of the focus of attention.
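The idea that recall proceeds in bursts separated by longer pauses can be illustrated with a simple sketch. It is not the procedure of Graesser and Mandler (1978), who fit a mathematical function to the cumulative number of items recalled; here a fixed pause threshold is swapped in as a cruder stand-in. The function name, the threshold of 3 s, and the recall times are all hypothetical values chosen only to reproduce the fruit example.

```python
def split_into_clusters(items, times_s, pause_threshold_s):
    """Group recalled items into clusters wherever a long pause occurs."""
    if not items:
        return []
    clusters = [[items[0]]]
    for previous_t, current_t, item in zip(times_s, times_s[1:], items[1:]):
        if current_t - previous_t > pause_threshold_s:
            clusters.append([item])  # long pause: a new cluster begins
        else:
            clusters[-1].append(item)  # short pause: same cluster continues
    return clusters


fruits = ["apple", "banana", "orange", "pear",
          "grapes", "blueberries", "strawberries",
          "pineapple", "mango",
          "watermelon", "cantaloupe", "honeydew"]
# Hypothetical recall times in seconds, invented for this illustration.
times = [1, 2, 3, 4, 9, 10, 11, 17, 18, 25, 26, 27]

clusters = split_into_clusters(fruits, times, pause_threshold_s=3)
print([len(cluster) for cluster in clusters])
```

With these invented times, the clusters contain 4, 3, 2, and 3 items, none exceeding the assumed limit of about 4 chunks at a single level of analysis.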