### Participants

Seventh-grade students from two intact classes at a suburban, English-speaking public high school in Quebec, Canada (see Footnote 1), were initially invited to participate in the study: 24 students from the first class and 28 students from the second. The final sample consisted of 20 students from the first class (hereafter, the Intervention First group) and 24 students from the second class (the Intervention Second group). Of the eight students who were excluded, one was transferred to another class, two did not return the consent form, one did not have parental consent, and the remaining four did not complete all of the assessments.

We did not have access to the students’ exact ages, but all participants were at the government-mandated age for seventh graders, that is, between 11 years 11 months and 12 years 11 months. All students were enrolled in the school’s French Immersion program, but all mathematics instruction was in English. The Intervention First group comprised 12 males and 8 females, and the Intervention Second group comprised 15 males and 9 females.

The high school served middle- to high-income families and was rated at the higher end of the school board’s socio-economic index, which is measured by family income levels and mothers’ education [29]. Both classes followed the same seventh-grade mathematics curriculum, as mandated by the Quebec Education Plan of the Ministère de l’éducation, enseignement supérieur, et recherche [30], and *Canadian Mathematics 7* [35] was used as the textbook for delivering the curriculum.

### Design

We used a multiple baseline design, presented in Fig. 1. A paper-and-pencil test of prior mathematics knowledge (PK) was first administered to verify that the two classes did not differ at the outset. The next day, at Time 1, two assessments, the equivalence test (ET) and the relational thinking test (RTT), were administered in that order to both classes, before any intervention. The following day, the mental mathematics (MM) intervention began in the Intervention First group. The MM instructional sessions took place during the first 20 min of each of 15 mathematics classes over a 4-week period. During these 4 weeks, the Intervention Second group received no MM instruction and instead followed the regular seventh-grade curriculum. The day after the MM intervention ended in the Intervention First group, isomorphic versions of the ET and RTT were administered to both classes at Time 2. The following day, the MM intervention began in the Intervention Second group using the same procedures, while the Intervention First group received the regular curriculum. The day after this second delivery of the intervention ended, isomorphic versions of the ET and RTT were administered a third time, at Time 3, to all students in both classes.
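The logic of the staggered schedule can be made concrete by tracking how many MM sessions each group has completed at each assessment point. The sketch below is only an illustration of the design as described in the text; the phase labels, the `PHASES` list, and the function name are hypothetical and do not come from the study materials.

```python
# Hypothetical encoding of the multiple baseline schedule; the phase labels
# and data layout are illustrative, not taken from the study materials.
GROUPS = ("Intervention First", "Intervention Second")
SESSIONS_PER_BLOCK = 15  # 20-min MM sessions delivered over about 4 weeks

# Chronological phases: ("assess", time label) or ("teach", group receiving MM).
PHASES = [
    ("assess", "Time 1"),
    ("teach", "Intervention First"),
    ("assess", "Time 2"),
    ("teach", "Intervention Second"),
    ("assess", "Time 3"),
]


def sessions_before_each_assessment():
    """Map each time point to the MM sessions each group has completed."""
    completed = {group: 0 for group in GROUPS}
    snapshot = {}
    for kind, label in PHASES:
        if kind == "teach":
            completed[label] += SESSIONS_PER_BLOCK
        else:
            snapshot[label] = dict(completed)
    return snapshot


if __name__ == "__main__":
    for time_point, counts in sessions_before_each_assessment().items():
        print(time_point, counts)
```

At Time 2 only the Intervention First group has received MM instruction, so the Intervention Second group serves as the comparison; by Time 3 both groups have received all 15 sessions.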

### Measures

We administered three assessments: the prior knowledge test (PK), the equivalence test (ET), and the relational thinking test (RTT). The PK test was designed to assess the students’ procedural knowledge of arithmetic and their skills in working with exponents, converting decimals to fractions, and ordering rational numbers. The ET was designed to measure the students’ understanding of the equal sign; it was based on Sherman and Bisanz [39] and has been used by Falkner et al. [11] and, with older students, by Perry [36]. The RTT was designed to assess the students’ ability to determine the truth value of equations using relational thinking. Students who were absent on an assessment day completed their assessments immediately upon their return, up to three days after the official assessment day, provided that their class’s intervention had not yet begun.

At the start of each testing day (day 1: PK; day 2: ET and RTT), the teacher told the students that they would be completing tests. She explained that the assessments would not count toward their mathematics course grade, but that they should complete them to the best of their ability and take the exercise seriously. No feedback or clarification was provided to any student at any time during the assessments, and the students completed the tests independently.

#### Prior knowledge test (PK)

The PK was a 16-item paper-and-pencil multiple-choice measure used to assess the students’ knowledge of the sixth-grade mathematics curriculum. It consisted of procedural items because these were similar to the types of mental computation skills the students would need during the intervention. Specifically, the assessment comprised two whole-number computations requiring knowledge of the order of operations (e.g., 14 + 3 · 8), six decimal computation questions (e.g., .46 + .72 = __), two fraction computation questions (e.g., 5/8 + 5/12 = __), two exponent questions (e.g., 2⁴ = __), two questions requiring conversion from decimals to fractions (e.g., Which of the following decimals is equivalent to 6/25?), and two questions asking students to select the greatest number from a set of four rational numbers. All computation questions were presented horizontally, and the students were required to circle their response from a list of four choices.
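As a check on the sample items, the sketch below evaluates each one exactly with Python's `fractions` module, reading the exponent example as 2 to the fourth power (2⁴). Only the correct answers are computed; the multiple-choice distractors are not reported and are not modeled.

```python
from fractions import Fraction

# Exact evaluation of the sample PK items quoted above (answers only; the
# multiple-choice distractors are not reported and are not modeled here).
order_of_operations = 14 + 3 * 8                  # multiply first: 14 + 24
decimal_sum = Fraction("0.46") + Fraction("0.72")
fraction_sum = Fraction(5, 8) + Fraction(5, 12)   # common denominator 24
exponent = 2 ** 4                                 # reading the item as 2^4
as_decimal = Fraction(6, 25)                      # 6/25 = 24/100

print(order_of_operations)              # 38
print(float(decimal_sum))               # 1.18
print(fraction_sum)                     # 25/24
print(exponent)                         # 16
print(as_decimal == Fraction("0.24"))   # True
```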

At the start of the PK test, the teacher placed a paper copy face down on each student’s desk. Once every student had a test, she asked the students to turn it over and told them that they had 30 min to complete the PK. The students were permitted to use the margins for computations and other written work, but the teacher told them that only their multiple-choice selections would be graded. She also instructed them to look over their answers once they had finished and to remain silent for the duration of the assessment. Calculators were not permitted. When the 30 min had elapsed, the students were asked to turn over their papers, and the teacher collected them.

Correct answers received 1 point and incorrect answers 0 points. The points were summed to obtain a total PK score, which was out of a possible 16 points. The scores were then converted to percent. Cronbach’s alpha reliability estimate for the PK was .67.
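The scoring and reliability steps can be sketched as follows, assuming the item scores are arranged as a students-by-items matrix of 0/1 values. The formula is the standard Cronbach's alpha; the function names and the small response matrix are hypothetical illustrations, not the study's data or analysis code.

```python
from statistics import pvariance


def percent_score(item_scores):
    """Sum 0/1 item scores and express the total as a percentage of the maximum."""
    return 100 * sum(item_scores) / len(item_scores)


def cronbach_alpha(matrix):
    """Cronbach's alpha for a students-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).
    Population variances are used throughout; the ratio is the same with
    sample variances as long as one convention is used consistently.
    """
    k = len(matrix[0])
    item_variances = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses (4 students x 3 items), not study data.
responses = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
print([percent_score(row) for row in responses])
print(round(cronbach_alpha(responses), 2))  # 0.75
```

For the PK, each row would hold a student's 16 item scores, so a student with 12 correct answers would receive 75%.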

#### Equivalence test (ET)

The ET is a paper-and-pencil test created by Sherman and Bisanz [39] to assess students’ interpretation of the equal sign. The test consisted of 29 problems, 9 canonical and 20 noncanonical, all of which contained only single-digit numbers. The canonical items involved addition and subtraction, and the noncanonical items involved only addition. Examples of noncanonical problems are 7 + 8 = 6 + __, 7 + 3 = 7 + __, and 4 + 7 = 7 + __. A different isomorphic version of the ET was used at each time point. Each version contained the same number of problems of each type (i.e., four identity, *a* + *b* = *a* + __ or *a* + *b* = __ + *b*; four commutativity, *a* + *b* = *b* + __ or *a* + *b* = __ + *a*; eight part–whole, *a* + *b* = *c* + __ or *a* + *b* + *c* = __ + *d*; and four combination, *a* + *b* + *c* = *a* + __ or *a* + *b* + *c* = __ + *a*; see Sherman and Bisanz [39] for a description of these problem types). The numbers in the problems and the order of the problem types varied across versions.
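For readers unfamiliar with these categories, the sketch below encodes one template per noncanonical problem type and computes the value that makes both sides equal. The template dictionary and function name are hypothetical; because every item involves only addition, the position of the blank on the right-hand side does not change its value.

```python
# Hypothetical templates for the noncanonical ET problem types described
# above. Each returns the addends on the left and the known addend(s) on
# the right of the blank.
TEMPLATES = {
    "identity": lambda a, b, c, d: ([a, b], [a]),                   # a + b = a + __
    "commutativity": lambda a, b, c, d: ([a, b], [b]),              # a + b = b + __
    "part-whole": lambda a, b, c, d: ([a, b], [c]),                 # a + b = c + __
    "part-whole (3 addends)": lambda a, b, c, d: ([a, b, c], [d]),  # a + b + c = __ + d
    "combination": lambda a, b, c, d: ([a, b, c], [a]),             # a + b + c = a + __
}


def solve_blank(left, right_known):
    """Value of the blank that makes both sides of the equal sign equal."""
    return sum(left) - sum(right_known)


# The three sample items quoted above, plus one combination item.
print(solve_blank([7, 8], [6]))   # 9  (part-whole)
print(solve_blank([7, 3], [7]))   # 3  (identity)
print(solve_blank([4, 7], [7]))   # 4  (commutativity)
print(solve_blank(*TEMPLATES["combination"](2, 3, 4, None)))  # 7
```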

At the start of testing, the teacher placed the ET face down on each student’s desk. Once every student had a test, the teacher went over the instructions orally with the class. Specifically, she stated that the class had 15 min to complete the ET and that they were to write their answer on the blank line provided in each equation. She also instructed the students to look over their answers once they had finished and to remain silent for the duration of the assessment. When the 15 min had elapsed, the students were asked to turn over their papers, and the teacher collected them.

Only the responses to the 20 noncanonical problems were used in the analyses. Correct answers received 1 point and incorrect answers 0 points. The points were summed to obtain a total ET score out of a possible 20 points, which was then converted to a percentage. Cronbach’s alpha reliability estimates for the ET were .99 at Time 1, .98 at Time 2, and .98 at Time 3.
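Because the canonical items are dropped before scoring, the ET percentage depends only on the 20 noncanonical items. A minimal sketch, assuming each response is recorded as a hypothetical `(is_noncanonical, is_correct)` pair:

```python
def et_percent(items):
    """Percent correct on noncanonical items only.

    `items` is a list of (is_noncanonical, is_correct) pairs covering all
    29 ET problems; canonical items are dropped before scoring.
    """
    scored = [is_correct for is_noncanonical, is_correct in items if is_noncanonical]
    return 100 * sum(scored) / len(scored)


# Hypothetical student: all 9 canonical items correct, 15 of 20 noncanonical correct.
student = [(False, True)] * 9 + [(True, True)] * 15 + [(True, False)] * 5
print(et_percent(student))  # 75.0
```

A perfect score on the canonical items therefore contributes nothing to the ET score.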

#### Relational thinking test (RTT)

The RTT is a paper-and-pencil test based on Osana et al. [34] and Carpenter et al. [8] and designed to assess the degree to which students use relational thinking, as opposed to computation, to judge the truth value of noncanonical equations. The RTT had four items, one for each operation (addition, subtraction, multiplication, and division), and each item consisted of a number sentence. The students were asked to indicate whether the sentence was true or false by circling the word “true” or “false” on the test paper, and then to provide a written justification for their response in the blank space provided. Examples of RTT items are 65 + 36 = 67 + 38, 105 − 45 = 106 − 46, 228 ÷ 6 = 456 ÷ 12, and 29 × 52 = 28 × 53.
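The contrast between computing and reasoning relationally can be illustrated on the four example items. The sketch below judges each sentence in two ways: by evaluating both sides, and by comparing how the operands change. The compensation rules and function names are our own illustration of what a relational justification reasons about; they are not the RTT's scoring key, which coded students' written justifications.

```python
from fractions import Fraction
from operator import add, mul, sub

# Computing both sides exactly (Fraction keeps division results exact).
OPS = {"+": add, "-": sub, "*": mul, "/": Fraction}


def by_computation(a, op, b, c, d):
    """Judge a op b = c op d by evaluating both sides."""
    return OPS[op](a, b) == OPS[op](c, d)


def by_relations(a, op, b, c, d):
    """Judge the same sentence from how the operands change (nonzero operands)."""
    if op == "+":  # changes to the two addends must cancel out
        return (c - a) + (d - b) == 0
    if op == "-":  # minuend and subtrahend must shift by the same amount
        return c - a == d - b
    if op == "*":  # one factor scaled by k requires the other scaled by 1/k
        return Fraction(c, a) == Fraction(b, d)
    if op == "/":  # dividend and divisor must be scaled by the same factor
        return Fraction(c, a) == Fraction(d, b)
    raise ValueError(f"unknown operator {op!r}")


ITEMS = [(65, "+", 36, 67, 38), (105, "-", 45, 106, 46),
         (228, "/", 6, 456, 12), (29, "*", 52, 28, 53)]
for item in ITEMS:
    print(item, by_computation(*item), by_relations(*item))
```

Both checks agree on every item (false, true, true, false), but the relational check never forms the sums, differences, products, or quotients themselves, which is the distinction the RTT was designed to capture.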

Three isomorphic versions of the RTT were created for the repeated administrations across the study. Each version included all four operations, but the order of the items varied across versions. The numbers also differed across versions, but the structure of the numerical relationships for each operation remained the same (e.g., Version 1: 67 + 48 = 65 + 46; Version 2: 55 + 36 = 53 + 34; Version 3: 73 + 57 = 71 + 55).
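What stays constant across versions can be expressed as a "signature": how each right-hand operand differs from its left-hand counterpart. The sketch below checks this for the three addition items quoted above; the `make_addition_item` generator and its `shift` parameter are hypothetical and only show how a new isomorphic item could be produced.

```python
# Addition items from the three RTT versions quoted above: a + b = c + d.
VERSIONS = {1: (67, 48, 65, 46), 2: (55, 36, 53, 34), 3: (73, 57, 71, 55)}


def signature(a, b, c, d):
    """How each right-hand operand differs from its left-hand counterpart."""
    return (c - a, d - b)


def make_addition_item(a, b, shift=-2):
    """Hypothetical generator: new numbers, same numerical relationship."""
    return (a, b, a + shift, b + shift)


print({v: signature(*item) for v, item in VERSIONS.items()})
# {1: (-2, -2), 2: (-2, -2), 3: (-2, -2)}
print(make_addition_item(81, 39))  # (81, 39, 79, 37)
```

Because both addends decrease by 2 in every version, each addition item is false by the same margin, so a student reasoning relationally faces the same structure at each time point.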

Immediately after collecting the ET, the teacher delivered the RTT to each student. The students were given 20 min to complete the RTT because they were required to provide written justifications for their answers. Again, the teacher reviewed the instructions orally with the class: she instructed the students to indicate for each question whether the number sentence was true or false and to justify their answers in the space provided. The students were also told that they were not permitted to communicate during the test. After 20 min, the students were asked to turn over their tests, and the teacher collected them.

Only the students’ written justifications were coded, using the following rubric: (a) Category 1: relational thinking, either without computation or with computation used only to justify a written relational response; (b) Category 2: computational; (c) Category 3: other. Examples of students’ responses for each category can be found in Fig. 2. Responses placed in Category 1 showed that the student engaged in relational thinking, considering the relationships between the numbers rather than computing the quantities on both sides of the equal sign to determine the truth value of the equation. Computation was permitted in this category only if the student had first justified the response relationally and used the computation to support or illustrate that relational justification. Responses placed in Category 2 showed that the student understood the equal sign and could determine whether the equation was true or false, but did so using computation only. Category 3 responses were those in which the student showed an operator view of the equal sign, did not supply any justification, or provided a response that was impossible to interpret.

Category 1 responses received 2 points, Category 2 responses 1 point, and Category 3 responses 0 points. Category 1 responses were awarded the most points because they indicated that the students reasoned relationally and did not need to compute to arrive at their answer. Category 2 responses received 1 point because, although the students used computation to justify their responses, they appeared to understand the meaning of the equal sign, reflecting a relational rather than an operational view of the symbol. Category 3 responses received 0 points because they contained no evidence of relational thinking or of an understanding of the equal sign.

Item scores were summed to yield a total RTT score ranging from 0 to 8, which was then converted to a percentage. A second rater coded a random sample of 20% of the responses, and inter-rater agreement was 90%. Cronbach’s alpha reliability estimates for the RTT were .83 at Time 1, .84 at Time 2, and .80 at Time 3.
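The rubric-to-score conversion and the double-coding check can be sketched as below. The point values (2, 1, 0), the 0 to 8 range, and the 20% sample come from the text; the function names, the random seed, and the rater codes are hypothetical examples, not study data.

```python
import random

POINTS = {1: 2, 2: 1, 3: 0}  # rubric category -> points per item


def rtt_score(categories):
    """Total (0-8 for four items) and percentage from coded justifications."""
    total = sum(POINTS[c] for c in categories)
    return total, 100 * total / (2 * len(categories))


def reliability_sample(response_ids, fraction=0.20, seed=0):
    """Random subset of responses for double coding (seed is arbitrary)."""
    k = round(len(response_ids) * fraction)
    return random.Random(seed).sample(response_ids, k)


def percent_agreement(codes_a, codes_b):
    """Share of responses on which both raters assigned the same category."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)


print(rtt_score([1, 1, 2, 3]))  # (5, 62.5)
rater_1 = [1, 2, 2, 3, 1, 1, 2, 3, 1, 2]
rater_2 = [1, 2, 2, 3, 1, 2, 2, 3, 1, 2]
print(percent_agreement(rater_1, rater_2))  # 90.0
```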

### Mental mathematics intervention

#### Intervention tasks

Before the study began, we created 26 sets of expressions for the mental mathematics intervention. Each set contained four expressions, one for each operation. The first set, for example, contained the expressions 62 + 38, 73 − 31, 21 × 9, and 225 ÷ 25. The second set contained the same four operations in a different order (77 − 26, 17 × 5, 600 ÷ 4, and 42 + 58). Each subsequent set presented the four operations in a different order from the preceding set. This ensured that the discussion time for each operation over the 15-day intervention was as similar as possible within and across classes. The teacher began the intervention in each class with Set 1 and continued across the 15 sessions through as many of the 26 sets as possible. In each session, she worked through the sets until 20 min had elapsed; in the next session, she picked up where she had left off and again continued through the sets in the specified order. Six or seven expressions were computed mentally and discussed in each session. Across the 15-day intervention, 92 expressions were used with the Intervention First group (23 addition, 22 subtraction, 23 multiplication, and 24 division problems), and 94 expressions were used with the Intervention Second group (24 addition, 23 subtraction, 23 multiplication, and 24 division problems).
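One simple way to guarantee that consecutive sets differ in operation order is a cyclic rotation. The sketch below assumes such a rotation because it reproduces Sets 1 and 2 as described; the orders actually used for Sets 3 to 26 are not reported, so the function is a hypothetical illustration rather than the study's scheme.

```python
OPERATIONS = ["+", "-", "*", "/"]


def operation_order(set_number):
    """Operation order for Set n (1-based), assuming a left rotation per set.

    Assumption: the rotation reproduces Sets 1 and 2 as described above;
    the orders actually used for Sets 3-26 are not reported.
    """
    k = (set_number - 1) % len(OPERATIONS)
    return OPERATIONS[k:] + OPERATIONS[:k]


for n in range(1, 5):
    print(n, operation_order(n))
```

Under this assumed rotation, every set contains each operation exactly once, and over any four consecutive sets each operation appears once in each position.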

#### Instructional procedure

The intervention in both classes was delivered by the first author, who was the students’ regular mathematics teacher. At the beginning of each MM instructional session, the students were seated at their desks, each with an individual white board and dry-erase marker; no other materials were provided. The teacher began the session by writing a mathematical expression on the board (e.g., 37 + 58) and then gave the students 30 s to compute the answer mentally. The students were instructed to remain silent during the 30 s and to write nothing on their white boards except their final answer. Once the 30 s had elapsed, the teacher asked the students to hold up their white boards so that she could see their answers.

After the students raised their white boards, they shared their strategies in whole-class discussions guided by the teacher. First, the teacher asked a student with an incorrect answer to describe how he or she had reached it. She then asked a student who had computed the correct answer to describe his or her strategy. The discussion centered on both incorrect and correct responses, with students discussing the merits of one strategy relative to another. The discussion of each expression lasted no more than 4 min, and the equal sign was never displayed during this time. The teacher then erased the expression from the board, and the session continued with the next expression.

During the discussions, the teacher pointed out how the students rearranged, transformed, and substituted numbers to make the computations easier to carry out mentally. She also pointed out that certain strategies were particularly suitable for specific operations. For example, she illustrated how dividing large numbers by factors of the divisor made mental division easier, and she encouraged students to estimate their answer before applying any strategy. In addition, the teacher highlighted applications of the fundamental properties of arithmetic without naming them directly. For example, while discussing 22 × 6, the teacher underscored how the distributive property was used in one student’s strategy. Specifically, she explained how 4 + 2 had been substituted for the 6, yielding 22 × (4 + 2), which was then expressed as the sum of two products, 22 × 4 and 22 × 2, giving 88 + 44. The class also discussed this latter computation: the student had transformed it into the equivalent expression 80 + 40 + 8 + 4, ultimately arriving at 132 as the final answer.
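The two strategies discussed above can be written out as small functions: splitting a factor for the distributive property, and dividing by the factors of a divisor one at a time (applied here to 600 ÷ 4 and 225 ÷ 25 from the intervention sets). The function names are hypothetical; the sketch illustrates the arithmetic, not the teacher's instructional script.

```python
def distributive_split(a, b, parts):
    """a * b computed as a * p1 + a * p2 + ..., where the parts sum to b."""
    if sum(parts) != b:
        raise ValueError("parts must sum to b")
    return sum(a * p for p in parts)


def divide_by_factors(n, factors):
    """Divide n by each factor of the divisor in turn (n / (f1 * f2 * ...))."""
    for f in factors:
        if n % f:
            raise ValueError(f"{n} is not divisible by {f}")
        n //= f
    return n


print(distributive_split(22, 6, [4, 2]))   # 22*4 + 22*2 = 88 + 44 = 132
print(sum([80, 40, 8, 4]))                 # 88 + 44 regrouped by place value: 132
print(divide_by_factors(600, [2, 2]))      # 600 / 4 -> 300 -> 150
print(divide_by_factors(225, [5, 5]))      # 225 / 25 -> 45 -> 9
```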