USING NEMP TO INFORM THE TEACHING OF SCIENTIFIC SKILLS
 

SECTION THREE: THE TASK ANALYSES


This section reports the initial findings from the analysis of the NEMP tapes. The analysis is divided into 3 main sub-sections. The first reports on children’s actions as they were introduced to the tasks and asked to plan their investigation. The second sub-section discusses events at the “carrying out” stage and the third reports on the reflection/discussion stage that completed each task.

PLANNING STAGE
In all 3 tasks, children were asked to plan what they would do before they began the practical part of their investigation. The observation schedules were designed to capture comments about all the key variables that would need to be managed in each task. Those that were actually mentioned by one or more groups of children are reported next.

Truck Track
Fifty-one groups of Year 4 students were observed as they carried out the Truck Track task. It is very evident from the results in Table 5 that there was very little discussion of the management of variables at the planning stage of this task, although children had been asked to “think about things you will need to keep the same”. Just 4 variables were identified at this stage and one group mentioned 2 of these. Those variables that were most commonly discussed were concrete features of the context – things that these younger children could readily observe and anticipate:

Keep the rug straight and the lines lined up.

One person takes the bumps out of the mat before each turn, and then hold it flat so surface is smoother.

Measure from the mat edge to where the truck stops.

Table 5
Management of variables/task discussed by groups of Year 4 children when planning for the
Truck Track task

Item                                             Frequency of mentions (N = 51)
Consistent ramp and/or mat set up                1
Accurate measurement of truck travel             2
Consistent set up and release of the truck       1
Total mentions                                   4

Ball Bounce
Fifty-two groups of Year 8 students completed this task. As Table 6 shows, there were many more instances of explicit discussion of variables amongst these older children than we saw in the Truck Track discussions amongst the Year 4 children. Thirty-three groups were able to identify at least one variable that should be controlled, and one group identified 4 variables. Consistency in release of a ball (31 mentions) is a comparable factor to consistency in release of a truck (1 mention) in the Truck Track task. In this respect, at least, more than half of the Year 8 groups showed an awareness of the need to manage at least one variable in “fair testing”.

Table 6
Management of variables/task discussed by groups of Year 8 children when planning for the Ball Bounce task

Item                                                                      Frequency of mentions (N = 52)
Consistent height from which to drop balls                                31
Ruler held straight vertically                                            5
Strategy for determining the top point of bounce                          5
Other strategies for anticipating/accurately measuring bounce             1
Ruler oriented the same way each time (different scales on either side)   1
Checking consistency by replicating                                       1
Total mentions                                                            44

Emptying Rate
Forty-eight Year 4 groups and 53 Year 8 groups were observed carrying out this task. It was the first task that each group completed, and so it may have been intended as a “warm up” for the second task that followed. This second task was Truck Track for the Year 4 children, and Ball Bounce for the Year 8 children.

Perhaps because the NEMP designers anticipated that this task would be less familiar to the children than either the Ball Bounce or Truck Track tasks, each group was initially shown a video of the task set-up, and then they were asked to plan their own investigation. They were not specifically asked to think about things to keep the same. Indeed there was little that they could decide in this respect since the task was already tightly defined. Nearly half the groups discussed the requirement to vary the volume of liquid to be tested, although this was more often in the nature of repeating/clarifying task instructions than planning for accuracy by taking specific care with the actual measuring process.

Table 7
Management of variables/task discussed by groups of Year 4 and Year 8 children when planning for the Emptying Rate task

Item                                                      Frequency of mentions (N = 101)
Aspects of measuring to the unit marks on the bottle      22
Aspects of using stopwatch, measuring time                4
Keeping bottle level                                      3
Avoiding cross-contamination of water/detergent           1
Totals                                                    30

Nine Year 4 groups identified 1 of these variables and one group identified 2. Fifteen Year 8 groups identified at least 1 variable to be controlled, with 3 of these groups identifying 2 or more variables. Thus at both year levels, the groups who did identify and discuss variables at the planning stage were in the minority.

Children’s seeming lack of planning skills
Children’s planning skills appear to be very context sensitive. As we watched the tapes it seemed to us that 4 factors constrained children’s ability to demonstrate such “fair testing” planning skills as they may actually have had.

1. The type of instructions given
At the beginning of each activity, for all 3 tasks, teachers read a set script of instructions to the students. This is the Emptying Rate script, read out after children watched the video that modelled the inquiry procedure:

…plan how you will do your experiment. Do this now and when you are ready I will ask you to tell me your plan. Remember you need to plan your activity so everyone has a job to do.

This is the Ball Bounce script, read out after the context was introduced, and before students were given the folding ruler with which to measure:

You are to work as a team, and try to make sure that everyone helps. First you should plan how you will do the experiment. Think about what things you will need to keep the same. Think about what you will need to measure. Think about how you will use numbers to say how bouncy each ball is. Sort out who is going to do the measurements and who will do the other jobs. Everyone should have a job.

The introduction to the Truck Track task was very similar. There is an important similarity, and an important difference in the emphasis given by each of these sets of instructions. The difference is that the Ball Bounce and Truck Track tasks gave direct guidance about the fair testing aspects. “What to keep the same” introduces an important aspect of fair testing — the control of variables. Although both age groups exhibited little unprompted awareness of the need to “keep things the same” when talking about the Emptying Rate task, many of the Year 8 groups were able to anticipate at least one way to do so for the Ball Bounce task that they subsequently completed. In contrast, the Year 4 children were able to generate more ideas about variables for Emptying Rate than for Truck Track – perhaps because the Emptying Rate video instructions allowed them to anticipate more of the actions they were about to undertake.

Table 8
The contrast in types of discussion topics raised during the planning conversations of Year 4 and Year 8 children

Year Level and Task              Social (roles)    Management of variables/task
Emptying Rate (Yr. 4, N=48)      47                11
Truck Track (Yr. 4, N=51)        45                4
Emptying Rate (Yr. 8, N=53)      50                19
Ball Bounce (Yr. 8, N=52)        50                47

All 3 scripts emphasised that each student should have a part to play. Consequently, ideas about “what to keep the same” were often displaced by conversations in which children determined the roles they would play:

I will hold the ruler.

Can I do the stopwatch?

In the light of the scripted emphasis on role allocation, this is perhaps not surprising. As shown in Table 8, the younger children focused on this aspect of their planning almost to the exclusion of actual fair testing planning. Many of these Year 4 children occupied their planning time in playing games such as “Rock, Scissors, Paper” to determine the allocation of their roles.

2. Private and public planning conversations
We gathered some evidence that aspects of the assessment context could contribute to an under-reporting of children’s ability to plan for managing variables in an investigation. During our observations of the planning stage, we recorded a number of instances where children did not report back to the teacher all that they had actually discussed amongst themselves. Typically, the allocation of roles was what children reported at the end of the planning time. Where students had talked amongst themselves about keeping the bottle straight for Emptying Rate, or holding the ruler straight for Ball Bounce, the teacher was likely to be told something along the following lines: John will hold the ruler, Sarah will drop the balls, and Amy will record the results. The children were most likely to report role decisions even when their private discussions had included decision-making about aspects of control of variables.

After detecting this pattern in the first 2 tasks observed, we decided to record instances of private/public ideas for Ball Bounce, the final task to be analysed. In this task, not all students spoke within their group, and in some instances it was hard to hear what students said to each other because they spoke quietly or the microphone was placed too far away. Nevertheless, the themes of the conversations we heard encompassed all those aspects reported in Table 6 – 47 ideas in all. Teachers were only told about 25 of these 47 ideas, not least because 27 of the 52 teachers we observed in the Ball Bounce task did not ask students to report back on their planning before commencing the task itself.

3. The absence of a meaningful sense of purpose?
One important aspect of the investigative context is the sense of meaning or purpose that children bring to their investigations. Children are typically encouraged to begin their investigations from questions they have generated either individually or as a group. This approach is clearly elaborated in curriculum support materials such as the Making Better Sense of… primary science series produced by New Zealand’s Ministry of Education to support Science in the New Zealand Curriculum. However within a national assessment context such an approach would create issues of comparability and validity. Children “investigate” a task of which they have had no previous ownership. Can they develop a sense of purpose in such contexts? If not, how might this lack of purpose impact on their ability to demonstrate planning skills?

The script for the Emptying Rate task instructed the students to “Do this now [the planning] and when you are ready I will ask you to tell me your plan.” What sense did the children make of the purposes for doing this task, beyond an imperative to do as they had been told? The implicit purpose – to compare the emptying rates of liquids of differing viscosity/density – did not emerge until the “discussion” stage at the end of the experiment.

The task script for both the Truck Track and the Ball Bounce tasks requested children to “Plan your experiment now, and tell me when you have finished your planning.” With this small shift in emphasis, a number of the teachers, especially for Ball Bounce, did not ask the students to report back about their planning before moving on to the carrying out stage. While it may seem that the purposes of the Truck Track and Ball Bounce tasks were self-evident, their theoretical underpinnings were never discussed (and indeed are not easy to determine for Ball Bounce since the balls appeared to vary in a number of their material features). We wondered whether and how children would address this aspect at the planning stage. We found that just one group discussed the purpose of the Truck Track task, 3 groups discussed this for Emptying Rate and 8 groups discussed purposes for Ball Bounce. However the sense of purpose expressed was task-orientated rather than related to a conceptual science idea/question:

We need to measure bounce [of the balls].

In Ball Bounce 3 groups also expressed an opinion about what they thought may happen:

The orange ball [the smallest] will bounce the highest.

The smallest [ball] will bounce the highest.

These are seemingly guesses although it is possible that children think smallness is the property that confers bounce. Because their causal theorising was not probed we cannot know for certain. Whether and how this absence of conceptual links impeded planning is something about which we can only speculate. Indeed the literature reported in Section Four would suggest that in the absence of such links children are not really planning in a scientific sense at all.

4. The necessity to become familiar with the task at hand
We have already noted that children’s task-related planning for Emptying Rate was weighted towards familiarisation with the requirements of the task rather than management of variables for fair testing. While there was more discussion of variables for Ball Bounce, the lack of ownership of purpose, combined with the presentation of a set of sometimes unfamiliar equipment, meant that practicalities often dominated the so-called “planning” time.

After the children had completed the task, ideas related to its management and its purpose did emerge in the group discussions. We wonder if “planning” would be more appropriately assessed if placed at the end of an initial familiarisation task. Since children were asked to make predictions for new situations at the end of all 3 tasks, this would be relatively easy to do, although it would make the task time longer.

Being systematic about sequencing experimental tests
The literature that we read in conjunction with our observations of the NEMP tapes has led us to reflect on why (Section Four) and how (Section Six) teachers should and can improve children’s investigative skills by helping them to visualise the whole “experimental space”. While such visualisation should accompany a coherent sense of purpose, which we have established seemed to be lacking, we nevertheless watched to see whether and how children organised and sequenced the separate test episodes in a pre-planned way.

The Emptying Rate demonstration video and the accompanying instruction card both stated that students must time all 3 water levels before those of the detergent. We wondered if the children would discuss this sequencing issue prior to carrying out the task. They were also requested to pour the water back into the bottle between tests, doubtless for purely practical reasons. The Truck Track task required a series of tests, with 4 ramp positions and 2 truck orientations to be individually tested. Would the Year 4 students discuss how they would systematically keep track of all 8 possible tests? Ball Bounce featured a series of balls ranging in size, with no inherently obvious sequencing criteria. Would students even consider the order in which they should test these and, if so, how would they decide? Table 9 reports the differences between instances of sequencing discussions for the 3 tasks.

Table 9
Instances of sharing planning ideas on how to sequence separate test events within a task

Task/Level              Sequencing mentions    Comment
Emptying Rate Y.4/8     12 Year 4 (N=48)       Sequencing comments largely a reiteration
                        15 Year 8 (N=53)       of task instructions.
Truck Track Y.4         3 (N=51)               Mainly comments to do with starting with a
                                               different number of corks.
Ball Bounce Y.8         1 (N=52)               One group decided to “do small balls first”.
                                               Most groups followed the sequence on the task sheet.

Did the more detailed instructions (video/directive to “do water first”) divert children from other possible planning topics when talking about how to carry out the Emptying Rate task? Or was it that this task had a greater number of unfamiliar contextual details to which children felt they needed to attend? Or should we reflect that children moved into the tasks, mostly ignoring sequencing implications, because in the absence of a sense of ownership of the initial question they did not visualise the experimental space beyond “one step at a time”? The literature reported in Section Four suggests that these are not trivial questions. In that section we draw the conclusion that supporting children to visualise the whole planning space may be a very important aspect of actively teaching science investigation skills.

THE CARRYING OUT STAGE
In this part of the report, we discuss the observations we made as children carried out each of the 3 tasks in the manner they had planned. During the preliminary observations it became very evident that children often do more than they say. To capture this tendency we created observation schedules that could distinguish between intuitive and explicit actions. “Intuitive” actions were deemed to be those where children simply carried out an action without any comment being made by anyone in the group (for example straightening up the improvised PET bottle funnel for the Emptying Rate task). “Explicit” actions were those that were accompanied by a specific explanatory comment:

We need to keep this bottle straight.

As the following results illustrate, we found that actions to do with management of variables were more likely to be intuitive, while those that concerned measurement processes were more typically explicit. These 2 aspects of each task are reported next.

Truck Track task
Management of variables
Table 10 records the numbers of Year 4 groups who took action to manage variables as they carried out their successive Truck Track tests. There was some inconsistency between actions taken in the “truck forward” and “truck backwards” series of tests in a few groups. This pattern has been reported by listing the number of truck forward actions, followed in the same cell by the matching number of truck backward actions. (Usually, students tested the trucks in a forward-facing position first.) While there were 51 groups in total, some groups did not carry out both series of tests, or the camera was positioned such that not all actions could be observed, and so not all data sets are complete. The overall tendency, as noted, was for children to do much more than they could say.

Table 10
Truck Track variables attended to by Year 4 children when testing trucks facing forward and then backwards

Truck forwards / truck backwards                 Intuitive    Explicit    Ignored
Consistent positioning of ramp and mat           31 / 30      4 / 3       16 / 17
Ramp straight onto mat                           33 / 27      0 / 3       17 / 19
Consistent set up and release of the truck       29 / 25      10 / 11     10 / 12
Total instances                                  93 / 82      14 / 17     43 / 48

Trucks were intended to roll onto a standard type of mat at the end of the ramp. These mats typically had bumps along their set folds that were hard for the children to remove. While many children intuitively or explicitly devised ways of dealing with this, others chose to ignore the bumps and to continue on with their task. Over half the teams had trucks that ran off the mat onto one or more other types of surface. In one instance the teacher had used masking tape to secure the edge of the mat, so that in a short distance the truck travelled over 3 different surfaces. Groups who attempted to manage this complexity typically positioned the ramp as close to one edge of the mat as possible, to lengthen the run room.

Aspects of measuring
While actions to manage the positioning of the ramp and mat, and the consistent release of the truck, were often intuitively kept the same for both the forward and backward tests of the truck, the students tended to explicitly comment on their actions when measuring was involved. In part, this may be because of the novelty of the measurement tool. Many children had clearly not seen a three-fold carpenter’s rule before and there was much “playing around” with this at the start of the task. As a result some groups did look closely and discovered that there were separate scales for centimetres and millimetres on the opposite edges of the same side of the rule. However, for some groups, this made the ruler difficult to use consistently.

Table 11
Aspects of measuring attended to by Year 4 children during the Truck Track testing stage

Truck forwards / truck backwards                 Intuitive    Explicit    Ignored
Accurate measurement of truck travel             5 / 6        31 / 28     11 / 11
Awareness of measuring units e.g., cm/mm         3 / 3        22 / 22     24 / 23
Total instances                                  8 / 9        53 / 50     35 / 34

Several aspects of the context made it easier for children to carry out and record measurements in the Truck Track task than in the other 2 tasks. The ruler was positioned horizontally on the ground and the truck had stopped moving before the students measured its travelling distance. In comparison, Ball Bounce required the students to hold the ruler erect and then ascertain the top point of the bounce while the ball was still moving. However the measurement aspect of Truck Track was not without its management challenges. Children needed to decide where to fix their measure of the truck’s position. In most cases they measured from the back of the truck, and if not, they tended to be consistent in the alternative measurement position they had determined. However some groups remained unaware of the necessity to control this aspect of the task, measuring from a random selection of parts of the truck.

The collective travel trajectories of the individual truck runs made for some interesting patterns. Trucks sometimes slowed down when they hit the side of the ramp. On 2 occasions trucks stopped completely at a bump on the mat. They frequently careered sideways, and 21 groups set the ruler up in line with the ramp to attempt to keep the truck on a straight line or path of travel. This again slowed the truck when it hit the ruler. In these cases children typically released the truck again, and moved the ruler sideways slightly in an attempt to prevent further collisions. Some groups improvised various strategies to use the inflexible ruler to measure curved travel trajectories. Some used the lines on the mat to help them align the truck with the ruler, others moved the truck across to the ruler in line with its resting position, and some used paper or fingers as alternative measures. A few groups positioned the ruler in line with the corks at the back of the ramp. This meant that trucks usually outran the length of ruler available and the children had to devise a method for estimating. One team denoted the area after the ruler as “past the mark”, and they recorded this on their result sheet.

Twenty-nine groups repeated test runs so that they could replace “faulty” runs with fresh data. On the other hand, some groups were observed to ignore measurement anomalies altogether. Many groups who did attempt to be consistent would nevertheless have recorded very compromised measurements. We note this here because Section Four cites literature which recommends that children at earlier stages of investigative skills development are likely to recognise meaningful data patterns more easily when the data are relative or categoric rather than absolute or continuous. This task seems well suited to the implementation of a simpler, more visual, form of data gathering. In Section Six we describe a strategy that teachers can easily use to help young children see the data patterns in the trucks’ capricious travel runs. In the light of the literature discussed in Section Four, we think that developing awareness of patterns of data variability is more important than requiring children to make judgments about the “most accurate” measurement.

Ball Bounce task
Management of variables
Many of the Year 8 students continued to discuss their plans for managing the Ball Bounce task at the carrying out stage. Table 12 shows the aspects to which the students attended. Comparing these data with those from the planning stage (Table 6) we see that just 2 additional groups explicitly recognised the need to drop the ball from a consistent height. However, faced with the practicalities of the situation in action, 9 more groups now explicitly identified the orientation of the ruler as an aspect to be managed, and 11 more groups began to discuss strategies for determining the top of the bounce. For these groups, “planning” was facilitated by action. There were more instances of explicit management of variables by these older students but many intuitive actions were still observed, and a number of groups continued to ignore key variables.

Table 12
Comparison of Ball Bounce variables attended to by Year 8 children at the planning and carrying out stages

Action                                             Planning    Carrying out stage
                                                               Explicit   Intuitive   Ignored
Keeping the ruler vertical                         5           14         20          18
Consistent height for dropping balls               31          33         12          6
Strategy for determining the top point of the
bounce trajectory                                  5           16         4           32
Ball bounced on same place on surface
(to avoid cracks between desks)                    -           1          41          10
Total instances                                    41          64         77          66

Aspects of measuring
Several contextual features made measurement challenging in this task. Table 13 summarises aspects of measuring that were taken into account as the task was carried out. As was the case in Truck Track, aspects of measuring were more likely than management of variables to be explicitly discussed. Again, the challenges presented were strongly linked to the contextual specifics of the investigation.

As already noted, the dependent variable (height of the bounce) posed considerable difficulty because students needed to find a way to “stop” the ball at the very top of its bounce, and the measurement point was fleeting. Most groups recognised that they then had an issue with accuracy and up to 3 students sometimes attempted to read each bounce. However this created new issues when, as often happened, different readings for the same trial resulted. Reading the vertically oriented scale at eye level was seen as one solution to this dilemma by 16 groups, while in other groups children of differing heights took measurements that we could clearly see included errors of parallax. Some groups simply ignored such differences, opting for whichever measurement they thought seemed right. Other groups adopted an “averaging” strategy. Some took the median measurement of their conflicting readings – a crude type of instant averaging process. Some repeated bounces of each ball several times to try and better establish the most commonly occurring measurement – again an attempt at averaging, albeit via a more explicit process. Students were not consistent in these averaging attempts. In many groups really bouncy balls were dropped more times than less bouncy balls.
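The 2 informal averaging strategies described above amount to taking the median of simultaneous conflicting readings, or the most frequently occurring (modal) value across repeated drops. The short sketch below, written in Python, is our own illustration rather than anything the children or teachers used, and the example readings (in millimetres) are invented:

    from statistics import median, mode

    # Three children read the same bounce and disagree (illustrative values, in mm).
    simultaneous_readings = [410, 425, 400]
    instant_average = median(simultaneous_readings)   # "crude instant averaging": take the middle value

    # The same ball is dropped several times and the most common reading is kept.
    repeated_readings = [420, 410, 420, 430, 420]
    most_common = mode(repeated_readings)             # the "most commonly occurring measurement"

    print(instant_average)   # 410
    print(most_common)       # 420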

Seven groups attempted to count the number of times each ball bounced in addition to taking a height measurement. This simultaneous measuring and counting was not easy, particularly with the balls that bounced a lot, or bounced more rapidly towards the end of the bounce, as did the table tennis ball. Since “bounciness” was not defined in the task instructions, this appeared to be a legitimate interpretation of the task that complicated the challenge for these groups. No groups tested the balls in an order that suggested thinking about causes for bounciness. In fact most simply worked their way down the order of balls listed on the Results Sheet.

These Year 8 students, like the Year 4 Truck Track groups, found the folding ruler a novelty. Although the ruler had different scales on either side, 29 groups kept the ruler consistently oriented, and if at times the ruler did twist, members in the group would bring this to the attention of the person holding the ruler.

Table 13
Aspects of measuring that were attended to by Year 8 children during the carrying out stage of the Ball Bounce task

Item - Ball Bounce                                          Intuitive    Explicit    Ignored
Keeping the ruler vertical                                  20           14          18
Discuss need for accurate measuring                         0            43          9
Checking consistency by replicating                         12           21          19
Managing data variation – e.g., by use of an
averaging strategy                                          11           13          28
Measures number of times ball bounces                       0            7           45
Trials done to decide likely range of the bounce            1            0           51
Total instances                                             44           98          170

Three groups devised novel scales to measure the ball bounce. Two groups designated readings between 0–100 mm as a “1”, 100–200 mm as a “2”, 200–300 mm as a “3” and so on. The third group devised essentially the same solution, but used centimetres. This creative solution to a tricky set of problems interested us because, in effect, these groups devised a way to collect categoric rather than continuous data, thereby avoiding the vexed accuracy issues with which other groups had to contend. They were still able to compare bounciness, and to quickly run repeat trials to be sure of their emerging data patterns. In Section Six we suggest a modified version of this strategy as an effective means of helping students to recognise data patterns and data variation in “noisy”6 tasks such as this.
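To make this categoric strategy concrete, the sketch below shows how a continuous bounce height (in millimetres) maps onto the 100 mm bands these groups described. The code, written in Python, and the example readings are our own illustration, not a record of the groups’ working:

    def bounce_band(height_mm: float) -> int:
        # Convert a continuous bounce height into a 100 mm category:
        # 0-100 mm -> "1", 100-200 mm -> "2", 200-300 mm -> "3", and so on.
        return int(height_mm // 100) + 1

    # Illustrative readings for three hypothetical balls (mm).
    for ball, height in [("golf", 320), ("tennis", 455), ("table tennis", 180)]:
        print(ball, bounce_band(height))
    # golf 4, tennis 5, table tennis 2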

Repetition
Typically, a ball bounce was repeated when children perceived there had been an error in the specific test. For example, a ball might hit the hand of the child holding the ruler or the ruler itself, either on its descent, or after the bounce. Some groups also re-tested when they had been unable to determine the height of a bounce in relation to the ruler, for practical reasons already outlined. As already noted, some groups did attempt a type of averaging strategy for repeat bounces. The end result of this repetition was always to obtain a single “best bounce” measurement. The significance of this type of thinking about data variability is explored in Section Four. For now, we note again that, as in the Truck Track task, children could have collected categoric data to make the comparisons required if the task had been structured differently for them.

Emptying Rate task
Management of variables

In this more tightly prescribed and unfamiliar task it was difficult for children to identify variables to control, even once the actual task was underway. This is reflected in Table 14, which reports just 2 variables either explicitly discussed or intuitively managed as the task proceeded. The first number in each case represents children’s actions taken while water flows were timed; the second figure represents actions taken while timing detergent flows. Interestingly, one of the two aspects most consistently “managed” is actually only marginally relevant once the test run is underway. While accurate measuring did require that children hold the bottle-funnel level, the tilt of the bottle makes no discernible difference to the rate of flow of this small volume of water.

Table 14
Emptying Rate variables attended to at the carrying out stage

Water test / detergent test                                 Intuitive    Explicit    Ignored
Consistent level orientation of bottle                      92 / 86      8 / 12      1 / 1
Consistent measurement of liquid to be emptied
(“up to the mark”)                                          7 / 4        87 / 88     7 / 7
Total instances                                             99 / 90      95 / 100    8 / 8

Aspects of measuring
This task required children to carry out a measuring action at both the beginning and the end of the investigation. Most children took explicit care in attempting to measure water/detergent volumes to the pre-specified marks on the improvised bottle funnel. Similarly, most groups explicitly devised a simple protocol to co-ordinate release of the water/detergent flow with starting the stopwatch, and the end of the flow with stopping the watch. Perhaps because the stopwatch was another relatively unfamiliar piece of equipment7, children’s discussions typically explicitly differentiated the flow time as measured in seconds, not minutes. Younger children sometimes struggled to determine which of these units of time measurement to employ; otherwise there were no clear Year 4/Year 8 differences in the children’s investigative actions reported here.

Table 15
Aspects of measuring that were attended to during the carrying out stage of the Emptying Rate task

Water test / detergent test                                 Intuitive    Explicit     Ignored
Protocol for use of stopwatch button                        4 / 6        97 / 92      0 / 1
Measuring to the mark/adjusting to the mark                 7 / 4        87 / 88      7 / 7
Measuring units e.g., seconds or minutes                    2 / 4        95 / 94      4 / 1
Total instances                                             13 / 14      279 / 274    11 / 9

Children tended to repeat a measurement only when they had made an error such as forgetting to reset or start the stopwatch, or when they experienced problems in co-ordinating this with the timing of uncovering the hole in the bottle funnel. No groups at either year level repeated measurements for accuracy or “fair testing” purposes. Fifty-eight groups actually chose to ignore errors they had made, seemingly in the interests of task completion: “It doesn’t matter, come on.” Seventy-seven groups (76 percent) sequenced the water/detergent test series as demonstrated on the video and explained on the instruction cards.

The issue of an endpoint
When to stop measuring was an issue in this task, especially when children moved from the water to the detergent tests. Because the water flowed quickly and consistently, only 3 groups of students discussed ways to define an endpoint, debating whether it was when the water had completely passed through the hole (when looking down into the bottle) or when the water stopped dripping. When detergent was tested, the slower rate of flow and increased adhesion to the container sides made dripping a much more obvious feature of the system. Some groups now talked about the thickness of the detergent as the factor that slowed draining rate and about how it stuck to the sides of the bottle. Consequently, 14 groups now recognised that there was an issue with the endpoint. However, as the following quotes show, the children did not necessarily have strategies to deal with this in a controlled way:

It had not stopped.

But it was just bubbles.

It’s still going.

Doesn’t matter. It’s only drips.

Nor did any of these groups backtrack to discuss the same aspect of the water task. Seemingly this was treated as an isolated feature of this series of 3 detergent trials only. This would appear to suggest that these children were not working with an overall “fair test” plan in mind, but rather were moving from one test episode to the next. The significance of such a view is discussed in Section Four.

In 37 groups we observed instances of comments that anticipated the results of the trial that was about to be run:

Detergent will take ages.

That [3 cm of water] will take 3 seconds. [And it did!]

Some groups drew on the results from the water series to predict draining times for the detergent series:

It will take double the time as it [detergent] is twice as thick.

It’s going to take ages, squeeze hard, it must be two times slower than water.

Some teams felt that making a correct prediction indicated that they understood the purpose of the task:

Detergent will take longer as it is thicker than water. I think I am getting this.

We saw flip-flopping predictions in one group who were surprised that the detergent initially came out of the bottle more quickly than they had thought it would. Their initial prediction: “It will go slower” became: “No faster, no slower. It will stick to the sides.”

DISCUSSION AND REFLECTION STAGE
At the end of each task, teachers asked questions that challenged children to think about the “patterns” that emerged from the task they had just carried out – that is, to make simple inductive generalisations. In all 3 tasks they then asked the children to make a prediction about a novel aspect of the situation (emptying tomato sauce, testing a ramp raised on 5 corks, and bouncing a squash ball respectively). The intention seemed to be to probe for each group’s ability to transfer their generalisations to similar contexts. The youngest children were also asked what they could have done to make their Truck Track tests “more accurate”. Since many of these children did not know what “accurate” meant, teachers often clarified at this point. However this word did not necessarily have the same meaning for the teachers themselves. Some defined accurate as “exact”, “perfect”, or “more correct” while some emphasised “so it is fair” or “a fairer test”. Presumably the teacher who said, “so you don’t cheat” also intended to emphasise the fair testing aspect.

Truck Track discussion
Table 16 shows that almost all groups could interpret their results to see a pattern. Most described a qualitative relationship between ramp height and the distance travelled by the truck. Only a few groups made reference to the quantitative data they had so laboriously collected as they struggled with the measuring demands that have been described above.

Table 16
Year 4 children’s interpretation of the Truck Track task

Recognising Patterns and Trends                             Frequency (n)
Patterns in words only                                      38
Pattern refers to comparison of numbers/measurements        11

Reflecting on accuracy
Table 17 summarises children’s responses to the question: “Could you have done anything to make your results more accurate?”

Table 17
Year 4 children’s reflective ideas for making Truck Track more accurate

How they would make it more accurate                            Frequency (n)
Better management of truck release and/or uneven mat surface    12
Measuring: accuracy and/or placement of ruler                   10
Contextual comment on ramp arrangement: longer, straighter      10
Confounding factors: suggest changes to make test less “fair”   5
Replication to “double check”                                   1
Social: sharing, having turns                                   1
Total no. of responses                                          39

Comments relating to better management of variables and/or more accurate data gathering obviously addressed the question of improving accuracy directly. However some of the comments the children made focused on the context of the investigation:

Use a bigger ruler.

Make the ruler more straight.

While these suggestions related to the overall experimental situation, their implementation would not necessarily have made the test any “fairer” or the data recording any more accurate. Some children, struggling to find something to say about the situation, described contextual changes that could make the test less fair – the opposite of what was intended. For example, some children suggested pushing the corks further under the ramp to make it steeper, but did not say if this positioning should be controlled to be the same for each test. A few children made suggestions that would definitely confound the fair test, for example blowing or pushing the truck down the ramp. One group, reflecting on the challenge of accurate data gathering, responded:

Measure really, really, carefully.

In view of the measuring challenges outlined above, this would have been a hard ideal to live up to. The comment also appears to imply that with sufficient care it is possible to get one “right” reading. It seems likely that this belief is reinforced when teachers focus on the development of young children’s measuring skills as the key aspect of “scientific” data collection. However this type of thinking is described in the literature as becoming a hindrance to the ultimate development of scientific thinking about ways to manage data variability. We return to this issue in Sections Four and Six.

Making predictions
When the perceived pattern had been described, the children were asked to predict how far the truck would travel both forwards and backwards if there were 5 corks under the ramp. Once they had made their predictions, the children carried out the test to see whether or not they were right. Some groups needed teacher support to do this:

Teacher: Do you understand what I am asking you? Ok, I will show you again — so with one cork have it [the ramp] very low to the ground, so not steep, but with four corks the slope is a lot higher. So what did you find?
Children remain silent.
Teacher: How does the slope affect how far it [the truck] goes?
She uses the cork to show what she means as she asks the question.
Children: When it was on the higher slope it went further. When it was backwards it went faster.
Teacher: I wonder why? Any ideas?
Children: Heavier. More weight in front. We couldn’t push it as it would go faster. So we didn’t because it is cheating.
Teacher: You mean not fair?
Children: Yes.

Without such support, some other groups remained unable to clearly articulate the thoughts they may have had.

Table 18
Year 4 children’s predictions for the Truck Track task

Prediction Data                         Frequency (n=51)
Accurate prediction                     47
Incorrect prediction                    4
Results used to justify prediction      29
Mention of features of context          17

While most groups correctly predicted that the truck should go considerably further when facing backwards down a “5-cork” ramp, some groups had their prediction compromised by obstacles or off-track runs. All trucks were slowed on their forward run because, at this steep ramp angle, the front bumper connected briefly with the ground. Backward runs were not affected because there was no back bumper. One group who initially made an incorrect prediction reasoned that coming off the steep ramp the truck “would hit the ground and not go anywhere”. This did in fact happen when this group released the truck to go forwards.

Some teachers drew a range of ideas from the children as they shaped their prediction via a group discussion. As Table 18 illustrates, just over half of the groups used their data pattern to justify their prediction. Seventeen groups discussed their prediction in relation to features of the context such as the slope of the ramp and the distribution of weight in the truck:

The truck will go further with 5 corks because the ramp is steeper.

The ramp is steeper so when it hits the bottom it hits the front bumper so the truck slows down.

The truck has 4 wheels on the back and the front has 2. The back of the truck is lighter than the front.

It has more weight on that side [the back of the truck] (prediction that it would go further when released backwards).

Some groups also introduced contextual knowledge from other sources at this point. They related the truck patterns to other moving objects such as cars, bikes, and trucks:

Trucks have to put their brakes on when they go down a hill because they go fast.

Unsurprisingly, since the investigation began from a “ready-made” question, no groups related their contextual knowledge (slope, “weight”) to conceptual ideas of cause and effect (i.e. mass, gravity). At no stage did children plan or carry out this activity to test a causal theory of their own, or a scientist’s theory that they had discussed in advance. In this sense, it could be argued that they were not actually given an opportunity to plan “scientifically” at all. This lack of a theoretical component became a significant issue in the Ball Bounce task, as outlined next.

Ball Bounce discussion
At the discussion stage 3 groups noticed they had made a mistake with their task, or had not tested all of the balls. These groups started again, and this time paid more careful attention to control of variables and taking measurements. However only one of these groups attempted to use a strategy for determining the top point of the bounce — clearly a very challenging aspect of this task. As one group said:

They [the balls] bounce high and fast so we can’t get an exact measurement.

Two other groups also identified mistakes in their technique that could explain their results. One group said:

It depends on where the ball was bounced on the desk. When it is on a join it does not go very well.

Another group thought they had measured incorrectly, but they declined the teacher’s invitation to re-do the task.

Despite these difficulties, many groups were able to describe patterns with respect to bounciness. Qualitative descriptions were again favoured by a majority of groups. The data in Table 19 are incomplete because some teachers did not ask about patterns of results as the basis for predictions.

Table 19
Children’s interpretation of the Ball Bounce task

Recognising Patterns and Trends                             Frequency (n)
Patterns in words only                                      25
Pattern refers to comparison of numbers/measurements        15

Qualitative comments often made reference to personal theories of causality. The discussions of these Year 8 children, like those of the Year 4 children, focused on observable features of the context:

The least is the tennis ball, so the biggest was the less bouncy.

Small balls bounce more than heavier balls.

Small ones bounce the highest. The smaller the ball the bigger the bounce, and softness of the ball does not bounce much.

A difficulty with this contextual approach is immediately apparent. There were several material characteristics that could be implicated in bounce and the children often conflated several of these into one personal theory. Section Four identifies the understanding that there can be interactions between variables as a significant developmental step in learning to investigate scientifically. However these children were not given an opportunity to disentangle their multiple theories of causality, nor to discuss testing strategies to see which really were making the difference. It is highly likely that several of these variables do indeed interact but we do not know for certain which and how. Nor, we suspect, did the teachers have resolved and coherent ideas about this.

The variables that were collectively implicated in bounciness are listed in Table 20. While size and weight were most often mentioned, some groups suggested that solid balls would bounce better than those that had air in them or were hollow because they would not compress or lose their shape when they hit a surface. Rubber was seen as a bouncier material than foam. This posed challenges at the prediction stage because the small, heavy, rubber squash ball presented a combination of variables that generated conflict amongst common personal theories of causality.

Table 20
Children’s personal theories of the causes of bounce

Cause for the difference in bounce                   Frequency (n)
Size                                                 21
Weight                                               16
Material composition                                 10
Springiness (elasticity when squeezing the ball)     9
Solid or hollow                                      6
Other                                                16

Via extensive studies of children’s science investigations, researchers in the UK have identified and described 6 broad types of investigations (Watson, Goldsworthy and Wood-Robinson 1999). “Fair testing” is one of these. Another is “Exploring”. It seems to us that this rich Ball Bounce context presents great exploratory material. The identification of all the various possible causes of bounciness, and clear clarification of personal theories concerning these, would be an engaging focus for an investigation in its own right. Only once that stage has been carried out would children be ready to shape fair tests to begin to differentiate amongst the many possible combinations of variables. As the literature in Section Four makes clear, this process presents many intellectual challenges, but also rich opportunities to teach in ways that could help children actively learn to think more conceptually and metacognitively about what investigation actually entails. We return to the challenge in Section Seven.

Making predictions
After the discussion of their results (if this actually took place) students were shown a squash ball and asked to predict how high it would bounce. They were able to touch the ball but not drop it during this discussion. Fifteen groups immediately recognised the ball and drew their prediction from their contextual knowledge of the game:

Squash balls don’t really bounce, you have to hit it hard.

The squash ball should be warmed up for it to bounce high.

One teacher made similar links after the group had tested their predictions and found they were wrong:

In squash, you have to hit the ball hard for it to bounce. (For this task, this was the only teacher reference to contextual knowledge that we observed.)

Students often referred to the previous tests to place the squash ball in order of bounciness with the other balls. Some gave numerical estimates as well as reasons for their predictions:

Student: Not very high — about 80–90 mm.
Teacher: Why?
Student: Not very high because it’s quite heavy.

The students were confronted with a dilemma at this stage. Depending on the personal theories they had espoused during their reflection on the results, it was possible to justify both “high” and “low” predictions. Those groups who attended to the weight of the ball tended to correctly predict a lower bounce. Those who focused on the material composition (rubber) or the hardness of the ball typically predicted a higher bounce. Those who selected squashiness as the influencing property sometimes said this made balls more bouncy and sometimes said it made them less bouncy:

It would bounce higher because the ball can be squashed and bounce back to its shape.

Won’t bounce very high as it is soft and squashy.

After making their predictions, most groups tested these with one drop only and many did not attempt to take a measurement as the ball bounced so little. Two groups identified possible errors in their technique as an explanation for the unexpected result. One of these groups felt their measuring was at fault and that they had used the ruler incorrectly, while the other group said the squash ball landed on a crack and they re-did their test. However we noticed that this stage was usually rushed. Once the prediction was tested the task was over, no matter how astonished the children seemed if they had predicted incorrectly. Thus the conceptual dilemmas we have just described were never addressed.

Emptying Rate discussion
At the completion of the Emptying Rate task, the teacher made the scripted response: “Now I would like you to tell me what you found out.” At this point 84 groups could successfully describe a pattern in words. Some also attempted to identify quantitative patterns, saying for example that the detergent took “double the time of water”. Some groups simply read their data aloud without attempting to extract a pattern at all.

As part of this reporting back, some groups also gave reasons for the patterns they described:

It was the pressure and weight of the detergent.

The detergent is heavier than water.

Water went faster because it was not as thick as detergent.

Some groups recognised that the size of the hole was implicated in the draining time. Did they register that this key variable had been controlled for them in the provision of the ready-to-use equipment?

If the hole was bigger, it [the detergent] would have gone faster.

This comment seems to suggest that this group only considered this variable in relation to the detergent tests, although in the absence of any clarification via teacher probing we cannot be sure. The next comment also suggests that other groups were not thinking about the whole “experimental space” of the entire test series:

The bubbles from the detergent clog the hole. So we need time for the bubbles to pop and turn into liquid before they go through.

On the other hand some groups seemed surprised at the speed with which the detergent had drained. Here, as in the tasks they were to do next, personal theories of cause and effect were inherent in the discussion. Teachers did not probe or elaborate on these, perhaps because this was the opening task. They had scripts to follow and processes to complete.

Reflecting on accuracy
During the discussion of their results, 5 groups identified an error that they believed had affected their results. For some the issue was determining an endpoint:

It depends on when we stop it.

Stopped the detergent too early.

One group then re-did the detergent tests at the 6 and 9 cm marks, this time correcting for an endpoint. Others recognised errors with the use of the stopwatch, starting it too early and/or stopping it a bit too late at times. One group simply said:

Did it wrong, we didn’t have a plan.

Now, belatedly, some groups recognised why the order of the tests had been so carefully prescribed:

You should tip the water out so it won’t mix [with the detergent].

At this point, one group which was alternating between the water and the detergent realised that they had to wash out their container between tests.

Making predictions
The children were next asked to use their results to make a prediction about whether tomato sauce would be faster or slower at emptying from the bottle-funnel. Unlike the tasks that were to follow, students did not test tomato sauce after they made their predictions, nor was tomato sauce available for observation. Fortunately, most groups could draw on a rich contextual knowledge of the relevant material properties of tomato sauce. They combined this knowledge with their test findings, and their personal theories about the cause of those findings, in a number of interesting ways. Table 21 shows that 5 groups confused units of time when making their predictions, but most groups at both year levels were able to predict correctly.

Table 21
Patterns in predictions made for the Emptying Rate task

Prediction Data                                             Frequency        Frequency
                                                            Year 4 (n=48)    Year 8 (n=53)
Word prediction accurate with appropriate time estimate     39               46
Word prediction accurate but time units confused
(e.g., seconds rather than minutes)                         2                3
Incorrect prediction                                        7                4

Most students predicted correctly that the tomato sauce would take longer to drain:

Tomato sauce would take twice as long as the detergent.

Material properties mentioned to justify this prediction can be grouped in 3 main categories, although students used a wide variety of adjectives for 2 of these:

• Texture (thick, fat, like a jelly, slimy, sloppy, chunky, has blobs in it)
• Viscosity (sticky, less water, more liquidy)
• Weight (heavier)

Some groups who predicted correctly revealed interesting conceptions about the nature of the material world:

It [detergent] is mixed with chemicals so it takes a longer time and is thicker.

There was a sense in a number of the comments made that the more “chemicals” a material contains, the heavier it will be.

One Year 8 girl rehearsed a helpful “thought experiment” to visualise the comparative viscosity of detergent and tomato sauce:

If you pour detergent on your plate it would run all over it so it [tomato sauce] is thicker.

Five groups correctly predicted that the tomato sauce would take longer to drain, but got confused over units for the measurement of time and so specified, for example, 10 seconds instead of 10 minutes.

Some incorrect predictions also involved interesting reasoning:

It [tomato sauce] has no bubbles in it, so no air in it. No bubbles means it can come out faster than detergent.

Other incorrect predictions seemed to be made to avoid a clear decision:

Between the two.

A SHORT COMMENT ON TEACHER ACTIONS
Although teachers are trained to be NEMP facilitators, we saw differences in approach that seemed to us to affect the progress of the children’s actions and thinking. While the focus of their work was on assessing children’s abilities with respect to the planned tasks, it became very evident that some teachers’ actions supported children to reveal what they could do and/or what they knew, and that not all groups got that type of support.

Table 22
Patterns in teacher interactions with groups/tasks

Category                                               Emptying Rate    Truck Track    Ball Bounce
                                                       (n=101)          (n=51)         (n=52)
Interruptions to re-emphasise instructions             26               5              5
Focus of the discussion is on results                  28               24             10
Focus of the discussion is on meaning or purpose       3                3              0
Focus is on language (both difficult words, e.g.
accurate, and science words, e.g. viscosity)           1                0              1
Teacher makes links to other relevant contextual
knowledge                                              8                0              1
Extra variables introduced by the teacher              No               Yes            Yes

We have already noted that the preliminary planning discussion emphasised roles for children and that this was a scripted feature of each task. Perhaps because it was the first task, and because the method was clearly specified in advance, teachers made many more interruptions to re-emphasise method in the Emptying Rate task. Some teachers used hand gestures and/or demonstration runs to give both visual and verbal instructions. This was clearly helpful for some more hesitant groups. Some asked children to demonstrate the use of the stopwatch, even when the children had said they could do this. It was apparent that, in fact, many did need help with this.

In all 3 tasks teacher-student discussions were seldom focused on the purposes of the tasks, or on the meaning children drew from their results. Table 22 also shows that a focus on the language demands inherent in the tasks (whether less familiar everyday words, or the specialist language of scientific terms) was almost never a feature of the teachers’ task facilitation. Teachers who asked probing questions such as “Why do you think that?” or “I wonder why…” drew more responses from children.

Some teachers did make links to children’s contextual knowledge, especially in the Emptying Rate task when predictions about tomato sauce were to be made:

We put it [tomato sauce] on our chips.

What we put onto our hamburgers.

Some children did indicate that this allowed them to compare the substances mentally:

Wattie’s tomato sauce. Oh yeah, it would be thicker.

Some teachers introduced extra variables by the manner in which they facilitated the task. This was particularly an issue for the Truck Track task. As already noted, the mat often had semi-permanent bumps or creases because it was folded in the same way after the conclusion of each testing episode. The ramp was not always set up in relation to the mat as directed by the photograph provided. However some teachers actually had the foresight to modify this arrangement to maximise the mat distance for the truck’s longer runs. Sometimes the mat was set up over a join between two or more tables so that trucks hitting this bump stopped or slowed down. In some cases a teacher moved the students to the floor when this happened. Other teachers ignored this issue. Ten groups of Ball Bounce students also had to contend with cracks between desks where the experimental space was set up over 2 or more tables.


6. By this we mean the number of distractions presented by the busy context and the problematic data, not the physical noise of the classroom setting.

7. Some teachers modelled the use of the stopwatch and some did not.

 
