First, we make it into a general fact:

**Fact 1:** *Whenever n is a positive real number, 1/(n+1) < 1/n.*

Furthermore, since 1/7 < 1/6, we can subtract each fraction from 1 to see that 6/7 is closer to 1 than 5/6 is; we easily deduce that 5/6 < 6/7 and get a new fact! Another tool for the toolbox of the intuition:

**Fact 2: ***Whenever n is a positive real number, (n-1)/n < n/(n+1).*

Now for a more exciting question:

*Find integers a and b such that*

*5/6 < a/b < 6/7.*

The most common (reflex) reaction is to find the common denominator (42), and to find a fraction between 35/42 and 36/42 (for example, 71/84).

*But let’s make this problem much more interesting* by requiring that you find the fraction a/b with the smallest possible integer denominator. A denominator much smaller than 84 would surely mean a numerator much smaller than 71.

Thus, inspired by our two Facts, (and eager to use them wherever we can) let’s focus on the numerator.

**Warm-up question 1**:* Is it possible that a=b-1?* (can our in-between fraction a/b have a numerator that’s 1 less than its denominator? Recall *5/6 < a/b < 6/7.)*

Fact 2 is very well-suited for assisting with the answer this question! Indeed, for a/b to satisfy both inequalities, b would need to be between 6 and 7. Our first warm-up question was even a bit silly. Undeterred we ask:

**Warm-up question 2**: *Is it possible that a=b-2? (Recall 5/6 < a/b < 6/7.)*

As an example, we consider b=8. One of the inequalities holds, namely 6/8 < 6/7, which we see by using Fact 1 with n=7. However, the left inequality does not hold: 5/6 is not smaller than 6/8. Why? Because 6/8 = 3/4 and 3/4 < 5/6 (by Fact 2).

Let’s try another example, b=9. By way of analogy, we expect 7/9 < 6/7 but we don’t believe the other inequality (i.e. we expect that 5/6 > 7/9). Last time, we reduced a fraction, noticing that 6/8 = 3/4, so that we could use Fact 2. It doesn’t seem that we have such luck here; but we do: 7/9 = 3.5/4.5, and Fact 2 works for all real numbers; we see exactly where 3.5/4.5 fits!

If we want to use Fact 2 on the fraction a/b we would, in the same way, “reduce” it by using half of the original numerator a and half of the original denominator b. The advantage is that Fact 2 now explains everything. In particular, the new denominator b/2 must be strictly between 6 and 7. Since b is an integer, b/2=6.5. Thus, b=13. Our fraction is thus found:

5/6 < 5.5/6.5 < 6/7… Whoops, I mean:

*5/6 < 11/13 < 6/7*.

But is 13 the *smallest* possible (positive integer) denominator? The answer lies with:

**Warm-up question 3**: *Is it possible for a < b-2? (Recall 5/6 < a/b < 6/7.)*

At this point, the question is hardly a “warm-up” but I’m calling it that, since it’s similar to the other warm-up questions.

Another way of asking the same thing: Is 2 the smallest option for *b-a*? We know *b-a* is at least 2 and that 13 is the smallest denominator with *b-a=2*. Could *b-a=3*? At this point the same style of argument works. Thus, we “reduce” the fraction by replacing the numerator a with a new numerator *(1/3)a* and replace the denominator *b* with new denominator *b/3*. Even if the denominator is no longer an integer, the numerator is 1 less than it, so we can still use Fact 2. With *b/3 > 6*, we see that *b > 18*. If *b-a > 3* then *b* would be even bigger. Thus, if we are looking for the smallest denominator, then *b-a=2*.

Now for a similar question, from the 54^{th} William Lowell Putnam Mathematical Competition for Undergraduate students:

**Question from the 54 ^{th} Putnam:**

*Find the smallest positive integer b such that for every integer m with 0<m<1993, there exists an integer a for which m/1993 < a/b < (m+1)/1994.*

It’s not quite the same as the question we’ve been considering, but if we alter the question by fixing m=1992 then it’s fairly close to it!

Indeed, we just finished a miniature version of the m=1992 special case, by discovering that *5/6 < a/b < 6/7* has a solution with *b=6+7*. Thus, we might guess that *b=1993+1994=3987* is the answer for the Putnam Question. It’s suggested by the special case where m=1992. But will this choice of b work for every m?

Let’s return to our smaller question and make the comparison with the Putnam question more exact.

**Small version of Putnam Question:**

*Find the smallest positive integer b such that for every integer m with 0<m<6, there exists an integer a for which m/6 < a/b < (m+1)/7.*

We know from our work that the answer is at least b=13 (to accommodate m=5) and we expect the answer is b=13. To be sure, we need to know that there are integer answers to the following questions:

**Question 1***: 1/6 < ?/13 < 2/7 (corresponds with m=1)*

**Question 2***: 2/6 < ?/13 < 3/7 (m=2)*

**Question 3***: 3/6 < ?/13 < 4/7 (m=3)*

**Question 4***: 4/6 < ?/13 < 5/7 (m=4)*

We’ll focus on Question 1 and the rest will fall into place. We want to know if:

*1/6 < a/13 < 2/7* has an integer solution for *a*. We need a slightly improved version of Fact 1:

**Fact 1 (improved):** *If m > n > 0 then 1/m < 1/n. *(The previous version essentially had m=n+1. We will need m=n+0.5).

We use this by observing that 2/13 = 1/6.5; hence we see that 2/13 is too small (i.e.* a>2*). On the other hand, since 4/13 = 2/6.5, we know that 4/13 is too big and hence *a<4*. (We used the improved version of Fact 1 again). Hence *a=3* is the only possibility. We check if it works:

1/6 < 3/13 < 2/7. This may look more appealing to us (based on our work so far) if we write:

1/6 < 1.5/6.5 < 2/7. Perhaps one more equivalent form:

*2/12 < 3/13 < 4/14.*

This looks very compelling and suggests another fact:

**Fact 3:** *Assume y > x > 0. Then x/y < (x+1)/(y+1).*

This can be very intuitive. If you follow any sports team who wins x of their first y games, and they win their next game, their proportion of games won has surely gone up! The proof is almost as easy. Read it backwards:

**Proof of Fact 3:**

*x<y*

*x+yx<y+yx*

*(1+y)x<y(1+x)*

*x/y < (x+1)/(1+y)*

**End of proof.**

Question 2 is now easy, since Fact 3 yields

* 4/12 < 5/13 < 6/14.*

Question 3 and 4 are almost the same.

Thus, while we are still far away from a formal proof; our made-up “small version” of the Putnam Question is not that hard; the actual Putnam Question works on the same principles (try it!). It’s fun if we’ve been thinking about the ideas. A formal solution and full proof can be quite short and elegant. Three such solutions are given in “The William Lowell Putnam Mathematical Competition 1985-2000” by Kedlaya, Poonen and Vakil. Our above discussion is an attempt to make the problem seem like a natural extension of very elementary ideas.

**Notes:**

- If a, b, c and d are all positive real numbers, and if a/b < c/d then a/b < (a+c)/(b+d) < c/d. This is a popular, more general version of Fact 3 and just as easy to prove. We will explore this further in an upcoming follow-up post, “Fun with Fractions—from elementary arithmetic to the Putnam Competition—the second 1/2.”
- Among many (social) circles, Putnam Questions have a reputation of being too difficult to be useful. My attitude is that on every Putnam Competition, there is at least one question per session where a less experienced problem-solver can play with the ideas and learn a great deal, even without solving it. This may be such a problem.

The meeting is so big, it’s almost too big for some of us shy people. It’s surprisingly easy to get lost in a crowd. Thus, it had been more than 10 years since I had been to one of these mammoth events. To my surprise, I found it friendly and inviting. I knew several people, accidentally ran into a few others (this was wonderful) and met many people as well. The exhibits were excellent and inspiring. The undergraduate poster session had more than 400 posters. The students I talked to about their work were all impressive, even outstanding.

I was there for a purpose, giving a talk at a special session (on this preprint). At this kind of conference, there are always many, many things going on at the same time. It’s possible to go to your little area and not much else. Fortunately, my stay outlasted my special session, so I had lots of opportunities to look around. There are plenary talks and award winning talks, intended for the widest possible audience. But instead of going to these, I deliberately went to small sessions; talks here and there, partly randomly, partly to see what other people in completely different areas talk about, and how they talk about it. By way of analogy, consider a tourist, who can certainly do touristy things, but it’s much more fun to mingle with the local people. The small sessions gave me some nice insights as to what people are doing and how hard everyone is trying. Some of these sessions were close to my interests and others were completely new. This type of exploration is rarely possible at a small, specialized conference. I wish I had talked to more people after their talks. I won’t wait ten years again!

P.S. On the day I returned home, my flight was delayed by a few hours. I decided to just stay at the airport. I had work to do, and no distractions. The airport was crowded (many flights were delayed). To my pleasant surprise, I met three people at the airport who had been at the meetings, including two people who sat down beside me at different times, asking if I was a mathematician–two of them were people whose talks I had attended!

]]>- Consider the sequence, 12, 121, 1211, 12111, 121111, … etc. Are they all composite? Well the first 100 are. Is that enough evidence to be confident that they are all composite? Nope—you can find the first prime number in the sequence yourself, or see this post by Alon Amit, with this and other much better examples.
- Primes at most x are more often congruent to 2 mod 3 (as opposed to 1 mod 3) until x=608, 981, 813, 029. (I.e., more than 600 billion. See page 203 of “A Friendly Introduction to Number Theory” by Silverman, 4
^{th}edition). - Fermat numbers, of the form, 2^(2^i) + 1 were known (by Fermat) to be prime for i=0, 1, 2, 3 and 4. So that’s only 5 numbers. But the fifth one, 65 537 is so big! Perhaps this influenced Fermat’s unfortunate (or perhaps fortunate?) decision to conjecture that they are all prime. Perhaps he even had a guess regarding the significance Gauss would find for his primes 150 or so years later. Well the next (at least) 28 Fermat numbers aren’t prime. Maybe there’s none others? (How embarrassing would that be?) Here is “compelling evidence” by Boklan and Conway that the ONLY Fermat numbers that are actually prime may be the first 5 Fermat numbers. Of course, since the paper of doesn’t prove that there are no more Fermat primes, it’s inclusion here is perhaps ironic. So it is currently conceivable that Fermat’s conjecture was not merely incorrect, but spectacularly so… Naturally it was the great Euler who originally factored the first composite Fermat number; 2^32+1=641×6700417.
- However, even the great Euler had his own, almost maximally incorrect conjecture. Indeed, he hypothesized that there are mutually orthogonal Latin squares of order n if and only if n is not 2 mod 4. It turns out that there are none for n= 2 (this is trivial), 6 (this is hard, and surely where Euler put in some effort); but there is one for all other values of n. (Together, Bose, Parker and Shrikhande showed there are mutually orthogonal Latin squares for all even n congruent to 2 mod 4, n>6). (See p. 122 “Introduction to Combinatorial Designs”, Wallis, 2
^{nd }edition). - Graph labeling problems are a great place to find other potential “maximally incorrect” conjectures. Gallian has a wonderful survey of recent work, progressing so fast that it would otherwise be quite difficult to keep track of it.

Here’s one of personal significance: Conjecture 1 from this beautiful paper of Gray and MacDougall is very reminiscent of Euler’s conjecture.

The main part of this conjecture is that certain graph labelings exist; but the negative part–that some don’t–fail, completely, as soon as they stopped checking! Here’s the constructive proof. (I’m still proud of this contribution, with Jeremy Holden and James McQuillan.)

- Another example (this one famous) involves the product of sums of squares with a strikingly wrong conjecture. After Degen provided a formula for showing that the sum of 8 squares multiplied by the sum of 8 squares is again the sum of 8 squares, he believed, quite incorrectly, that there were similar larger formulas for sums of 2^k squares for any k.

Hurwitz proved that the (bilinear) formulas for sums of 2, 4, and 8 squares known to Degen were the only ones. Oops. (The analogous formulas for sums of 2 squares and sums of 4 squares are also significant for complex numbers and quaternions, respectively

- The circle division problem is often used as an excellent example of how small examples may suggest a striking, yet incorrect answer.

Given n points on a circle, a chord for each pair; chords divide the circle into how many regions? (The maximum number possible). For n=2, 3, 4, 5, 6 it’s 2, 4, 8, 16 and— 31. That’s 31 regions, not 32. The answer is the sum of binomial coefficients (n,4) + (n,2) + 1. It’s so tempting to think it’s a power of 2.

Here is an excellent video for this example by Grant Sanderson.

8. So now we wonder about our research:

The crossing number of the complete graph with n vertices is conjectured to be the square of a particular binomial coefficient for n odd. This seems very compelling, partly just because a binomial coefficient is a really simple expression for such a complicated question.

More precisely, if n=2m+1, crossing # of K_n is conjectured to be the square of the binomial (m,2). It’s true for many drawings, e.g., these ones.

And yet, the smallest unknown case is just m=6. With Pan and Richter, we provide a few clues for the first unknown case. My best shot at understanding so far, (with Richter, and work that I’m very proud of) is summarized here.

I expect this conjecture will be verified for m=6, but (personally) feel very unsure of the conjecture beyond m=6.

So in conclusion…

I hope you get the idea. When mathematicians seem obsessed with rigor, it is at least in part due to our history of making mistakes, seemingly every time we tried to jump to a conclusion. But perhaps there’s something more. So for that, we end with a quote from William Thurston:

*what we are doing is finding ways for people to understand and think about mathematics… what they really want is usually not some collection of of ‘answers’-what they want is understanding.*

This post gives one example of such a perfect formula along with the question: what might we learn from starting with a much weaker version of it?

**Geometric Series**

The formula for the sum of a geometric series with first term *a* and common ratio *r*, is well known. Perhaps you can write it down. The formula is perfect—it provides answers efficiently and the proof is easy to understand.

** Now pretend that you don’t know it.** Instead you are familiar with just one infinite sum:

Perhaps you discovered it on your own and don’t quite know what it means (suppose you haven’t yet learned about convergence). Nevertheless, let’s see how far you can get with it—an imaginary contest, if you will, between the use of the standard perfect formula and your smaller formula (the one above).

The most famous geometric series is, of course:

Eg.1:

Your formula, with *x=2*, wins—no contest. (I’m deliberately not providing answers). How about:

Eg. 2:

Well there are two options for how to adapt your formula. Either add 1 to the answer from eg.1, or else multiply every term from eg. 1, (and hence also the answer).

After *experimenting on your own examples*, you may find that multiplying every term (and hence the answer) by a constant is very useful!

Now for an alternating series:

Eg. 3:

How about using x=–2 and then multiplying the terms (and hence the answer) by –1? Are there other strategies? Your formula still seems to have the advantage. Let’s try ** finite** sums. This one has only 5 terms:

Eg. 4:

Piece of cake! Use the difference of two infinite sums.

Now for another finite sum and something really cool:

Eg 5:

**2+4+8+16+32**

Do we dare use the difference of two infinite series again? Let’s try:

Hmm. Using *x=1/2* gives the right answer but some of the steps don’t seem right!

Here’s another option:

Easy—we’ve seen the sum in the brackets before.

The first attempt was still useful. It may help us to understand this one:

Eg 6:

Here set* x=3/2* and the answer comes out right away.

Can you find an example where the standard perfect formula is really better? (Hint: I can’t). The restriction required for convergence (x>1) seems very natural. The process of adapting the formula by shifting it in various ways is familiar to a mathematician and it is fun; and now you have your own questions to answer: what question might lead a student to discover this simpler formula in the first place? Do you have a direct simple proof? Try it!

]]>Thank you for everything you put into good teaching. I know that’s it’s rare to see solid evidence that your extra efforts are really working. So here are two tiny examples of ways that teachers helped me, decades ago, probably dramatically, but probably in ways that would not be detected by any study on teaching effectiveness. I intersperse these examples with two equally tiny suggestions for mathematics teaching. My hope is that these suggestions may also be helpful.

- How does one read out loud? How does one become prepared for calculus without learning calculus?

I might have been 6 years old. My classmates and I were challenged to read passages from a book. It sounded like we would read a word, then, eventually, read the next word. One day there was a fleeting comment from the teacher—telling us to have our eyes a little bit ahead of the word we were saying. Was the start of great reading? It probably took years to get good enough at it for it all to pay off. I don’t believe my teacher got better reviews for this transformational suggestion, and I don’t even remember her name. But the idea works everywhere. Sight reading on the violin?—easier. Preparing for your future?—it’s just more natural to anticipate the next step.

I would love to see more of these fleeting comments in grade school mathematics classes! Here is a possible example involving the formulas for the area of a triangle and the area of a trapezoid. To make it easier, we’ll use a right angle triangle. One commonly used way to approach a trapezoid, is to view it as a difference of two triangles. (See the green trapezoid in figure 1).

*The green area represents the difference in area between two triangles—both triangles have a vertex at the origin.*

Now take it further—imagine a growing right angle triangle, one side formed by part of the x-axis, another side formed from part of the line y=cx, shown in figure 1. (The third side is parallel to the y-axis). Ask about the last bit of growth in the last second—measure the change. Calculus is all about measuring change, anticipating growth, looking slightly ahead, and having a measurement for it. I believe that middle school students will be inspired by gaining a sense of what calculus is, even without having to learn any of the details beyond the ones they are already learning anyway. I have no proof that this line of discussion would prepare any student for calculus courses 10 years later; but surely it can’t hurt; my sense is that many university students pass courses deprived of really knowing or caring why they’re doing what they’re doing. So your role in this process, and therefore your potential influence on your young students, may be much bigger than it appears. Much of this piece, especially “Example 1, Section 3” is accessible without too much prior knowledge of mathematics. While it was originally intended for calculus teachers, I now believe all math teachers could use a small part of it to help their super young students prepare for *the idea of calculus*.

- How do we memorize? Why do we memorize? How do we generalize?

“Repeat it here, repeat it there, repeat it everywhere. Write it down, compare it with the correct version and ask yourself if the differences are important. Eventually you will discover what it all means.”

I don’t actually know if anyone told me exactly this, or where I learned to memorize this way; but almost for sure I was given advice like this, from somewhere, when I was very young. Don’t just assume that your students will simply pick this up. Tell them, show them, and it will help (although maybe not soon enough to help your teacher evaluations—sorry!). Here’s an example of how memorization in mathematics led to a possibly new, definitely elegant proof of an old theorem (stated near the end of the post). It’s another triangle fact—simple, yet sophisticated.

**Memorize this fact**: The distance from the vertex of an equilateral triangle to the opposite side does not depend on the choice of vertex.

What does this mean? Look at triangle ABC in figure 2.

*Figure 2. An equilateral triangle.*

It means the (drawn) distance from B to AC is equal to the (not drawn) distance from C to AB.

“Repeat it here, repeat it there, repeat it everywhere…” What does our fact mean in figure 3?

It could mean that the distance from P to QB is the same as the distance from B to QP, provided that QBP is an equilateral triangle (so let’s assume that QP is parallel to AC).

*Figure 3. Two equilateral triangles.*

But equilateral triangle ABC is still in this picture, and guess what else is equal to the distance from B to AC?—the sum of distances from P to AB and from P to AC (see figure 4—can you see why?)

*Figure 4. The sum of the lengths of the two dotted blue lines is equal to the altitude of triangle ABC.*

By repeating our old fact in a different place, and with a bit of thinking, we now have a new, **second fact:** If P is on a side of an equilateral triangle (in this case side BC on triangle ABC), the sum of the two distances to the other two sides is the altitude of the equilateral triangle.

Repeat this second fact, looking at figure 5. “Eventually you will discover what it means.”

*Figure 5. Viviani’s theorem.*

This second fact—applied to equilateral triangle MBN and with P lying on MN—means that the sum of two distances from P to MB and P to BN is equal to the altitude of MBN. (As you guessed, MN is parallel to AC).

Now switch to triangle ABC—the sum of the three distances—from P to AB, from P to BC and from P to AC gives the altitude of ABC (see figure 5). Beautiful; we have:

**Third fact:** If P is any point in an equilateral triangle, then the sum of the three distances, from P to each of the three sides is equal to the altitude of the equilateral triangle.

This is known as **Viviani’s theorem**. For a more detailed discussion regarding this proof, please see this preprint.

Memorization leads to generalization, which is essential for understanding how modern mathematics works. The importance of generalization, and therefore of memorization, can be communicated at very early stages of mathematical development.

Notes:

- I first saw Viviani’s theorem in “A Friendly Mathematical Competition: 35 years of teamwork in Indiana,” a problem book edited by Rick Gillman, published by the MAA. Our proof of Viviani’s theorem and the resulting preprint mentioned above was a result of discussions with Dr. Armstrong after Putnam training sessions, using this book.

- I’ve suggested “Example 1” from my paper (coauthored by Darlene Olsen). However, example 2 from the same paper addresses two issues. The first is that it provides a simple calculus-based explanation of the fact that the ratio of area of a circle to the square of its radius is the same as the ratio of its circumference to its diameter. However it also prepares students for the method of cylindrical shells—a common difficult part of a standard calculus course. In fact, an informal preview, 10 years before a calculus class, of the intuitive idea of example 2 could help (we have absolutely no statistical evidence of this). It does seem to me though, that this is one of the possible benefits of society asking that the people who teach math know something about where their students may be headed. I sincerely hope that our society asks for even more mathematics knowledge and passion from our mathematics teachers.

- I recommend that you continue to follow your own thoughtful judgment regarding pedagogy. Research on pedagogical techniques (just like research on which foods are good for you) may take a lot of time to settle. Please keep in mind that research suggesting that teaching via memorization is not effective, may subtly assume that students already know how to memorize; if one experimental class focuses on something students aren’t getting elsewhere then it may appear to be far more valuable than it would be if, say, all classes were focusing on that same thing. Perhaps memorization will be considered far more valuable in the future when fewer teachers are focusing on it! My suggestion is that you continue to try to do everything well and with a purpose, whether it is memorization, inquiry based learning or some combination of many techniques. Thanks!

]]>

For example, Tevian Dray suggests:

“Skip the fine print. Emphasize examples…Emphasize the need to be fluent with multiple representations…[emphasize] geometric reasoning as the key to conceptual understanding.”

Another recent post by Katie Courage summarizes a study featuring the lack of *understanding *of calculus I (not a lack of *performance* in Calculus I!) as a major reason cited by women for dropping out of STEM majors.

So, in order to facilitate an engagement of our full-brains, I would like to encourage the use of imagination and do a better job promoting conceptual understanding.

My main effort in this direction resulted in the (free and open-access Journal of Humanistic Mathematics) paper entitled, “A Truly Beautiful Theorem: Demonstrating the Magnificence of the Fundamental Theorem of Calculus” co-authored with Darlene Olsen, who is a Homer L. Dodge award winner for University Teaching Excellence at Norwich University. Her efforts, and excellence as an expositor ensured the writing of that paper is of high quality!

Before reading it, you may ask yourself (or your students) why the derivative of the area of a circle should be the circumference of the circle, or if you really could convincingly explain the formula for the volume of a cone, or if you’re happy with how students absorb related rates. Do students think of the chain rule as merely a method for calculating derivatives, or can they use it backwards? Do they really get the First Part of the Fundamental Theorem of Calculus? Can you think of a useful classroom example that can be used in the first week of the course, which can illustrate all of the essential features of the course? Ok don’t wait—please read our paper now!

One thing we *challenge* is the following dogma:

“Limits must be taught before derivatives. Derivatives must be taught before integrals. Riemann sums must be taught before definite integrals.”

For this dogma is simply not true. It’s not true historically and it is very strange indeed that our method of teaching calculus is so far removed from the human thinking that led to its discovery.

In fact I see no advantage in hiding derivatives from students while teaching limits. I see plenty of advantages to revisiting the Fundamental Theorem of Calculus many times in different contexts throughout an introductory course. Our paper suggests very minor changes to an introductory course—so they would be very easy to implement for most experienced teachers. We believe such changes would substantially alter the perception that a student would have of calculus.

A few carefully chosen examples may help to connect limits, derivatives and integrals, leading to a more enriched view of the subject.

So instead of introducing the main topics strictly sequentially, please try—at least in a few more places—to introduce them simultaneously. In other words, *let’s integrate calculus!* *Please* also provide me with feedback—I’d love to know how it works out.

Pick an odd positive integer, n. Draw the “complete graph” with n vertices (you may think of the vertices as dots, and between any two of them we draw a line. The lines do not need to be straight lines. The lines are called *edges*. The complete graph with n vertices is denoted K_n). In figure 1 we see two drawings of K_5, the complete graph with 5 vertices. Notice that both drawings (figure 1a and figure 1b) have an odd number of crossings.

In figure 2a there is drawing of K_5 with 4 crossings. However figure 2a has something that’s deemed a bit silly—two of the crossing edges (the red edges) both share a vertex (the bottom right vertex).

In general, there is a technical definition of a “good drawing” that prevents exactly this. There is a reason for this restriction: if two crossing edges are incident with the same vertex, then the drawing can be slightly modified to remove that crossing, without introducing any new crossings in the rest of the drawing, shown abstractly in figure 3a. It is common to ignore a drawing if it is “not good” (a drawing that is not good cannot be part of a drawing of K_n with the *fewest number of crossings*, for example).

Goodness also requires that edges don’t touch each other tangentially (figure 3b), a pair of edges don’t cross each other more than once (figure 3c), and that no edge crosses itself. If three edges happen to cross at the same point we consider it to be 3 crossings (some will simply insist that 3 edges do not cross at the same point, via a slight transition shown in figure 4, that would not affect any other crossings in the drawing).

Figure 2b shows what happens to the drawing in figure 2a after removing the crossing that is not “good,” and guess what? There are an *odd number* of crossings in the *good* drawing shown in figure 2b.

We will soon show that every good drawing of K_5 has an odd number of crossings. First, an easy fact: Every *good* drawing of K_3 must be without any edge crossings. Furthermore every good drawing of K_3 has an “inside” and an “outside”. If beyond the K_3 there are two other vertices, u and v, such that u and v are both on the same side of the K_3 (in figure 5 they are both *outside* the K_3), then an edge joining u to v must cross the edges of the K_3 *an even number of times in total* (possibly zero times if uv doesn’t cross the red K_3 at all. However, if you start to draw uv starting from u, and if your edge uv enters the *inside* of the red K_3 by crossing a red edge, then it must also return to the *outside* of the red K_3 to get to v—resulting in a second crossing involving uv. So if there is a crossing involving uv, there must be at least 2 crossings of uv with the red K_3, etc…) Similarly, if u and v are on opposite sides of the K_3 (one is outside but the other is inside) then the edge uv crosses the K_3 *an odd number of times*. (This simple idea will come in handy when we analyze figure 7 below).

Next, to review and reinforce your understanding of the “goodness” criteria, convince yourself that every *good* drawing of K_4 has at most 1 crossing. If you try to draw K_4 with two or more crossings, there will be a goodness violation (probably there will be one vertex incident with 2 crossing edges—there are only 4 vertices, so check them all!)

In any case, the upshot is that each crossing requires 4 vertices and actually, the crossings in a drawing of K_5 can be counted by *counting the K_4’s that have a crossing*.

This may seem much harder than just counting the crossings, *but it works well on an abstract level—that is, when we don’t know what the drawing looks like, but still have to prove things about it*. This is what makes the theorem and its proof beautiful—we will prove there is something common to infinitely many different drawings of K_5.

To make this abstract approach more concrete, look at figure 1b–if we remove the top vertex and its incident edges, we end up with a K_4 with a crossing in it (figure 1c). If on the other hand, we start with figure 1b and remove the bottom vertex and its incident edges, we get a K_4 that does not have a crossing (figure 1d).

There are three other vertices we could have deleted to get a K_4 (try to draw them!) Only one of the five vertex-deleted K_4’s has a crossing. We see in a different way (other than directly counting crossings) that there is one crossing in the K_5.

Now consider figure 1a. If we delete the bottom vertex and all its incident edges, we are left with a K_4 with 1 edge crossing–the crossing highlighted in red.

As there are 5 vertices you could have deleted, there are 5 different drawings you need to make from figure 1a, each with a different crossing (try it, at least mentally), so we see in a different way that there are 5 crossings in figure 1a.

For a given drawing, if all five of the vertex-deleted K_4’s have a crossing, then the K_5 has 5 crossings, which is an odd number. A good drawing of K_5 either has 5 crossings or else we can find a K_4 inside it so that *if we just look at that K_4*, it does not have crossings. This motivates:

**Lemma:** *Suppose a drawing D of K_4 has no edge crossings. Suppose that D is part of a good drawing D’ of K_5. Then D’ has an odd number of crossings.*

We have an example of a drawing of K_4 without crossings (see figure 6). There is an extra vertex v that we wish to use to extend the K_4 to a drawing of K_5. Although this is a *specific picture*, there are features of it that are *always true*. In particular, deleting the edges incident with a specified vertex of the K_4 will result in a K_3 with an inside and an outside (we obtain figure 7a from figure 6 by deleting the edges incident with the vertex ‘a’. Both ‘a’ and ‘v’ are outside the remaining K_3 with vertices b,c and d). This looks like figure 5!

If we wish to count the total number of crossings of a K_5 that involve the edge va, then we should not look at the other edges incident with ‘a’, since the edge va can’t cross them anyway (by goodness–two edges incident with ‘a’ cannot cross each other). Also, va can’t cross the other edges incident with ‘v’ (also by goodness–two edges incident with ‘v’ cannot cross each other), so we won’t worry that we can’t see the edges incident with ‘v’ in figure 7a.

When we count the number of times va crosses anything in the K_5, it must be an even number since ‘v’ and ‘a’ are both on the outside of the K_3 with vertices b,c and d. A similar argument (looking at figure 7b and the K_3 with vertices a,c and d) shows that vb crosses an even number of times. (Note that the va crossings are all different from the vb crossings, by goodness again!).

The number of times vc crosses in D’ is similarly even (see figure 7c); but the number of times vd crosses in D’ is odd, since v and d are on opposite sides of the K_3 made up of a,b and c as shown in figure 7d.

To summarize: All of the crossings of D’ will involve v, since D has no crossings. We add together the number of times each of the edges va, vb, vc and vd cross anything in D’ to get the total number of crossings in D’. Exactly three of these four numbers are even, so the sum is odd. By “goodness,” these numbers represent distinct crossings, (no single crossing can involve two edges incident with ‘v’) and as such these crossings can be added together meaningfully.

All of these facts depended only on the fact that we started with a drawing of K_4 with no crossings, and not the particular drawing in figure 6. (Imagine ‘v’ had been in a different location, for example, very close to ‘d’ inside the K_3 bounded by a,b and c. Check that the argument still works!)

This essentially proves the lemma. Together with the discussion before the lemma, we now know that every good drawing of K_5 has an odd number of crossings. We’ve also seen that every good drawing of K_3 has zero crossings (and zero is even, so “every good drawing of K_3 has the same parity as every other good drawing of K_3” is a true statement) and we know that K_4 has a good drawing with 0 crossings and another good drawing with 1 crossing (so no parity theorem for good drawings of K_4).

Based on these 3 (measly) examples, n=3, n=4 and n=5, we are now ready to state the theorem. For it’s short proof, miraculously, no more geometry is needed:

**Theorem** (Kleitman): *Let n be an odd integer, and D be a good drawing of K_n. Then the number of crossings in D has the same parity as the binomial coefficient “n choose 5”.*

Before getting to the proof, here is an example with n=7 to illustrate the idea of the proof. If you have 7 objects, there are 21 ways to select exactly 5 of them (try it). So we say “7 choose 5” is 21. In our context, we count crossings in a K_7 by separately counting the number of crossings in each of the twenty-one K_5’s in the K_7. (This counts each crossing more than once—hmm…but we don’t want the *number* of crossings, we want the *parity of the number* of crossings… More explanation to follow!)

Each of these twenty-one numbers is odd (since each of the twenty-one K_5’s has an *odd number* of crossings). So the sum of the 21 numbers is also odd. When we add these 21 numbers, *every crossing would have been counted 3 times* (a crossing requires 4 vertices; when you look at a single crossing it therefore belongs to 7-4=3 different K_5’s). So our odd sum of 21 numbers, divided by three, is the number of crossings in the K_7. *We don’t know what the number of crossings is (that’s why we didn’t try to count the exact number of crossings)*, but we know the number is *odd*, since an integer that is obtained from odd integer divided by 3 is necessarily odd.

In general, to find the number of crossings in any good drawing of K_n, we add “n choose 5” different (unknown, but definitely odd) numbers together and divide by the odd number (n-4) to get the exact number of crossings…And we don’t know what the exact number is but we also don’t care; the sum of “n choose 5” odd numbers must have the same parity as “n choose 5” (regardless of what integer “n choose 5” turns out to be). If this sum is divided by an odd integer and the quotient happens to be another integer, then this quotient also has the same parity as “n choose 5.” But we divide by n-4, which is odd since n is odd. So this completes the proof.

Remarks:

- This presentation is a first attempt at a response to the criticism: “Why don’t mathematicians or scientists ever write the way they think?” In fact, this “unnecessarily” long exposition does not mean that it is difficult; my hope is quite the opposite. The length of this post is an attempt make it as lucid as possible, for as large an audience as possible. I welcome feedback to improve the clarity, and to reduce the prior expectations on the reader.
- This discussion is much different from Kleitman’s original approach. For a full discussion, please see the paper: A Parity Theorem for Drawings of Complete and Complete Bipartite Graphs, by Dan McQuillan and R. Bruce Richter, in the American Mathematical Monthly, Vol. 117, No. 3 (March 2010), pp. 267-273. Furthermore, there are several other topics in the paper not covered in this post. Please check it out and enjoy!

Except that this strategy lowers your chances of scoring a run. We can prove it, and we did so in the April 2016 edition of Math Horizons, a publication of the Mathematical Association of America (MAA). (Our article, *Run for Third! A Defense of Aggressive Base Running* is free to members of the MAA, and most university libraries will subscribe to the journal *Math Horizons*).

The fact is that at the major league level, your chances of scoring from third base with one out are so much higher than your chances of scoring from second base with one out, that you should run even if you don’t know whether or not the ball will be caught.

To be more specific, if I (an aggressive base runner) see the ball hit to right field, and don’t know if the outfielder will get there on time, I will run and I might make it to third base instead of second base. Sometimes this aggressiveness will result in a double play. If you play the conventional way, you may end up at second base. Sometimes (if the ball is caught) you will end up at first base with TWO outs.

The analysis is therefore simple. Compare the advantage of my way versus the advantage of your way. If the ball falls in for a hit, I have the advantage. I’m at third base, where major leaguers score (with one out) more than 60% of the time. You are at second, where major leaguers score (with one out) less than 40% of the time.

If the ball is caught, you have the advantage. But it turns out this advantage is small. At the major league level, a base runner scores from first base with two outs about 14% of the time (only 13% in high leverage situations, like a tie game in the bottom of the ninth).

Each strategy has an advantage—mine when the ball falls in for a hit, and yours when the ball is caught. We can compare these two advantages by using a little bit of undergraduate mathematics (hence our choice of a famously student-accessible journal). It turns out that my advantage is greater than yours. It’s not close. As long as the game situation suggests that playing for one run is the correct strategy, then this version of aggressive base running is also the correct strategy.

This has been missed by modern statisticians simply because it hasn’t been tried. So please try it! *Please* let me know if you see a game where it was used, or a game where it should have been used and wasn’t.

Thanks to my coauthors, Peter Macdonald and Ian McQuillan. We’ve been talking about this for some time. We’ve tried posts before (like this one), as well as other preliminary preprint versions of the paper that were not as clear as this publication. So please read this article for the details!

]]>

Our example for this post appeared on the 2014 Collegiate Mathematics Competition, sponsored by the Northeastern Section of the Mathematical Association of America:

*A husband and wife invited four married couples to a housewarming party. Just before everyone left, the husband asked each person (including his wife) how many people they had shaken hands with. Their responses were 8, 7, 6, 5, 4, 3, 2, 1, and 0. Given that no one shook hands with his or her own spouse and no pair of people shook hands more than once, determine how many times the husband must have shaken hands with a guest.*

In fact this little puzzle is not too difficult, and you may wish to try it for a few minutes before reading further! (Can you do it in your head?)

— Empty space to encourage playing/thinking —

*Congratulations!* By now you’ve either solved the problem or else realized that it’s not that easy. Either way it’s a win—in both cases the next question is:

Can we make up an analogous question with fewer than four married couples invited to the party? Does the question make sense with just one married couple? With two married couples? With n married couples?

(If you haven’t solved the problem, the one-couple version is a very nice playground, which will hopefully help with the original. If you have solved it, then you may be in store for a better theorem and a different proof).

We don’t know yet whether or not this problem will actually generalize nicely. Nevertheless, for the one-couple version, let’s say the responses to the handshake question are 2, 1, and 0. With only 4 people present at this party, the person shaking 2 hands—not shaking the hand of the 0-handshake person—must shake everyone else’s hand. Hence the 2-handshake and 0-handshake people are married to each other. The 1-handshake person shakes hands with the 2-handshake person and nobody else. Now we have a complete picture of everyone’s handshakes except the husband. Fortunately that’s enough to recover all needed information about the husband, who shakes the hand of one person. His wife also shakes hands with one person.

Now for the two-couple version of the problem: assume the responses to the handshake question are 4, 3, 2, 1 and 0. We could proceed in the same way, forming a complete picture of who must shake everyone else’s hand, starting from the 4-handshake and 0-handshake people, who must be married to each other, etc. But here’s a different tact: once we realize that the 4-handshake and 0-handshake people form a married couple, we remove this couple—induction! Note that this removal necessarily reduces the number of handshakes by 1 for each remaining person. We are in fact reduced to the one-couple version of the problem. This suggests both a general theorem and its proof:

**Theorem**: *Let G be a simple graph with 2n+2 vertices. Assume the degrees of 2n+1 of the vertices are given by the numbers 0, 1, 2, … , 2n. Then the last vertex must have degree n. Moreover, for i=1,2,…,n, the vertex of degree n−i is not adjacent to the vertex of degree n+i, and the two vertices of degree n are not adjacent to each other.*

There are a couple of important objections. The first one is that this doesn’t look like the original problem; unless a vertex represents a person and an edge represents a handshake. We’ll address the other objection after the proof.

**Proof**: Let x be the unknown degree. We wish to prove x=n. We proceed by induction on n, the case n=1 being not too hard (and discussed above). We now assume the result true for n−1. Consider a graph G with 2n+2 vertices as in the hypothesis of the theorem. Delete the vertices of degrees 0 and 2n. This reduces the degrees of each remaining vertex by 1, leaving 2n vertices with degrees:

0, 1, … , 2n−2 and x−1.

By the inductive hypothesis x−1=n−1, the two vertices of degree n−1 are not adjacent to each other, and a vertex of degree n−1+i is not adjacent to the vertex of degree n−1−i (for i=1, 2, … n−1). Replacing the two deleted vertices from G gives the desired result. QED.

Now for the other objection: the hypothesis that no one shook the hand of his or her own spouse seems to have disappeared, suggesting this assumption was not necessary to solving the problem. Yet the conclusion of the theorem includes a description of vertices not adjacent to each other. So our theorem is better—it has a weaker hypothesis and it guarantees that the n-couple version of the problem describes a situation that could actually happen. If we do assume that no one shakes the hand of his or her own spouse, then one can show—by induction on i—that the person who shakes 2n-i hands is married to person who shakes i hands.

So this was much more involved then it had to be. Yet it is satisfying on many levels. We made a small (n=1) toy version of the problem, we provided a generalization and we showed that the hypothesis of the general problem is realizable.

**Remarks**:

1. The n=3 version of this problem appears in Loren C. Larson’s outstanding book, Problem-Solving Through Problems, in a section entitled, “Draw a Figure.” We chose this complementary, algebraic approach. It’s certainly instructive to compare the two approaches, to see exactly how the use of induction on n gets the same conclusion without *explicitly* drawing the entire picture.

2. Another written solution (closer to the one in Larson’s book) is available from the Collegiate Mathematics Competition webpage by looking up the practice problems from 2014, question 4. Many thanks to my colleague Rob Poodiack for bringing my attention to these excellent problems.

3. This problem is perfectly adaptable to many different levels of difficulty and discussion. So it’s a perfect question to consider, even on a lazy summer afternoon. Enjoy!

]]>In the post on the Mean Value Theorem (MVT), we proved as a warm-up example that:

7^(1/3)+9^(1/3) < 4

As a quick review, MVT states that if f(x) is a differentiable function on the open interval (a,b) and continuous on the closed interval [a,b], then there is at least one point z in (a,b) such that f′(z)=[f(b)−f(a)]/(b−a). We rewrite this as:

f(b)−f(a)=(b−a)f′(z) (MVT)

Now set f(x)=x^(1/3) observe that f′(x) is decreasing when x>0 and use MVT twice to get:

f(9)−f(8)<f(8)−f(7). Since f(8)=2, this last inequality gives the desired answer.

These two applications of MVT use different choices for b and a, but in both applications

b−a= 1. As a generalization, we imagine

b−a=d>0, with a+d≤b−d and we consider any function whose derivative is decreasing:

**Which is bigger, f(a)+f(b) or f(a+d)+f(b−d)?**

By comparison, our original question had:

a=7, b=9 and d=1.

From two applications of MVT we see that

f(b)−f(b−d) < f(a+d)−f(a)

Hence we have our generalization:

f(a)+f(b)<f(a+d)+f(b−d)

**But is this generalization useful?**

Here’s where the post, “On the value of believing that you know the answer” comes in to play. The goal of that post was to prove the arithmetic-geometric mean inequality. However, instead of directly proving that the arithmetic average of n numbers was bigger than the n-th root of their product, we instead replaced two of the n numbers, a and b say, with two new numbers a+d and b−d (choosing d so that one of these new numbers was the arithmetic average). This change maintained the same arithmetic average, and we showed that the n-th root of the product increased. Finitely many of these two-at-a-time changes produces n equal numbers, and the result follows from comparisons.

The same process works for a very analogous application; given n numbers, whose (arithmetic) average is A, then Jensen’s inequality states that

if f′(x) is decreasing then f(A)≥Z

where Z is the arithmetic average of the n outputs

i.e. Z = [f(x_1)+f(x_2)+…+f(x_n)]/n and

A=[x_1 + x_2 +…+x_n]/n).

If the n numbers are all equal, there is nothing to prove. Otherwise, let a be the smallest one, a<A and there must be another one b say, which is bigger than A. If A−a<b−A then set d=A−a. Otherwise set d=b−A. This guarantees that either a+d=A or b−d=A. To prove Jensen’s inequality, replace the two numbers a and b with a+d and b−d. The average of the n new numbers will still be A, but the average of the n outputs will be larger—*Our generalization is useful!* Hooray!

Fewer of the n numbers will be different from A. We repeat, similarly altering numbers different from A until there are none remaining.

**Remarks**

- This proof is well-known, but this discussion suggests a natural way to find it.
- It is also well-known (and by now probably not surprising) that setting f(x)=log x in Jensen’s inequality immediately yields the AM-GM inequality.
- In “It’s a Mean Value Theorem” the main example was to show that in any acute triangle with angles A, B and C, that sinA+sinB+sinC>2. Let f(x)=sin x. Notice f′(x) is decreasing for acute x. Hence by Jensen’s inequality, f((A+B+C)/3)≥(sinA+sinB+sinC)/3. Rearranging slightly gives us:

sinA+sinB+sinC≤3sin(π/3).

This is a nice upper bound to go with the lower bound of the previous post.

]]>f′(z)=[f(b)—f(a)]/(b—a). In our applications below, we prefer to write this last equation as:

f(b)—f(a)=(b—a)f′(z) (MVT).

The purpose of this post is to see the Mean Value Theorem at work in a couple of places where we may not expect to see it. The second example is much more interesting than the first.

**Example 1. (Warm-up example):** *Which is bigger*

*7^(1/3) + 9^(1/3) or 4?*

If you have a calculator or a phone handy, it will probably tell you that the first expression is approximately 3.993. But that’s cheating. Instead consider the function

f(x)=x^(1/3). A straightforward application of the power rule shows that f′(x) is decreasing for x>0. Hence (using MVT twice):

f(9)—f(8)<f(8)—f(7).

But since f(8)=2, this last inequality gives the desired answer.

**Example 2:** *Prove that in any acute triangle with angles A, B and C, that*

*sinA+sinB+sinC>2.*

I have a vague recollection of trying this problem as part of a Putnam training session as an undergraduate student. Recently I saw it again as a post on Stephen Cavadino’s blog.

In fact I recognize Stephen’s clever solution as being the same one presented at my Putnam club decades ago. It’s such a slick solution that it’s definitely worth reading.

Nevertheless, I’m not sure that I could have come up with it. So here’s an alternative, using MVT.

This solution hopefully also highlights the value of chosing good examples to help understand the problem.

**How to solve it:**

I experimented with the conditions of the problem, quickly realizing why the “acute” hypothesis is necessary (otherwise sinA+sinB+sinC can be made arbitrarily close to 0). However, the biggest angle could be a right angle and still have the inequality holding.

*In fact, a right triangle with hypotenuse 1 makes for an extremely interesting special case.* Indeed if C=π/2 (radian measure) then sinC=1, and our required inequality reduces to a special case of the triangle inequality.

Inspired by the simplicity of this example, I decided to compare an arbitrary acute triangle to it. Since we will be comparing different triangles T and T′, we

let f(T) be the function that adds the sines of the three angles of a triangle.

First let’s rank the acute angles; without loss of generality A≤B≤C. Also, let

D=(π/2)—C. Observe that D<A. (This is because D+C+B<π, but A+C+B=π).

So consider a new triangle T′ with angles A—D, B and C+D (the side lengths won’t matter). The new triangle has a right angle (C+D=π/2) and so by our previous discussion f(T′)>2. It remains to show that

f(T) ≥ f(T′). This follows from:

**Lemma:** *Let D>0 and assume that D ≤ A ≤ π/2—D. Then:*

*sin(A—D)+1≤sinA+sin(π/2—D) —D[cosA—cos(π/2—D)]*

Assuming the lemma is true, we have:

sin(A—D)+1≤sinA+sin(π/2—D)

i.e., we imagine replacing two of the summands in f(T′) with the corresponding summands from f(T).

Hence f(T′) ≤ f(T).

To prove the lemma, we use the mean value theorem twice. Once to show:

sin(A)—sin(A—D) ≥ DcosA (exercise!)

and the other time to show:

sin(π/2)—sin(π/2—D) ≤ Dcos(π/2—D).

Combining these last two inequalities yields the lemma.

Notice that in our application of the lemma, we disregarded the term:

—D[cosA—cos(π/2—D)]. This extra term could easily be used to sharpen the inequality.

Another way to sharpen it is to compare our right triangle T′ with another

right triangle T′′ where the angle B in T′ is replaced by an angle approaching π/2.

This means that f(T′′) is approaching 2.

At the beginning, searching for an example where f would approach 2 was part of the process. Realizing how the inequality may fail to hold for obtuse triangles also suggested having a right triangle. So I see this solution as a natural consequence of playing with the conditions of the problem and experimenting with the right examples.

]]>*Unwritten Rule: With fewer than 2 outs, a base-runner on first base shall be certain that a ball hit in the air will fall in for a hit before straying too far away from first base.*

This is a natural consequence of a written rule allowing for double plays when the ball is caught and the base-runner doesn’t “tag up” before advancing.

So…the purpose of this post is to identify one situation where the unwritten rule is wrong—so wrong that it costs the game.

**One runner, on first base with 1 out**

Imagine there is a runner on first base with 1 out. It’s a tie game in the bottom of the ninth inning and the outfielders are playing back to prevent a double. The ball is hit in the air, but off the end of the bat—a blooper into right field. The right fielder runs; no one can tell if he’ll get there in time to catch it, but if he does the base-runner better get back to first base to prevent a double play. If the ball falls in for a hit there will only be 1 out but the runner only makes it to second base, since he was standing and waiting to see if the outfielder would catch it.

**Should the base-runner stand and wait to see if the ball is caught?**

The answer is “no”, and it’s not even close. This is the risk/reward analysis that is apparently not understood. But it’s a really simple argument. When the ball is hit in the air, let’s say the runner makes a snap judgment that it’s “better than even” that the blooper will fall in for a hit. In other words, let p be the probability that the blooper will be a hit and the runner quickly decides p>0.5. Our analysis involves two possible base-runners, A and B. Player A (A for “aggressive”) breaks protocol and just runs without waiting to see if the ball is caught. His aggressiveness either gets him to third base or else gets him thrown out. Player B (for “baseball professional”) follows protocol, and waits.

**Who wins more often, A or B?**

Remember it’s tied in the bottom of the ninth, so one run wins the game. If the probability of scoring from third base with 1 out is T (T for “third”) then the product pT is the probability that player A scores. This product captures the situation that the blooper falls in for a hit, and then somehow the base-runner eventually scores. (We don’t care how this happens, we just assume it happens with probability T). He won’t score if the blooper is caught since he’ll be doubled up and the inning will be over.

For player B there are two cases. The first case is that the blooper is a hit, putting him at second base. He could eventually score from second—again we don’t care how he might score from second, but we assume he does so with probability S. The second case is that B could score if the blooper is caught (although it would be much harder since there would be two outs and he would be at first base). His probability of scoring is therefore pS+(1-p)F where F is the probability of eventually scoring from first base with TWO outs. So what’s bigger pT or pS+(1-p)F?

**The punchline**

The quantities T,S and F above can all be calculated! (We know how likely it is that a runner will score from third base with 1 out, etc). The short answer is that

pT>pS+(1-p)F whenever p>0.38.

So if you think there is a 38% chance or better that the blooper will be a hit, RUN!

Many more details are available from our preprint:

http://arxiv.org/pdf/1505.00456.pdf

**Rally killers and Mariano Rivera**

Here is the general justification for running like base-runner B (from http://baseball.isport.com/baseball-guides/cardinal-rules-of-base-running):

*“There’s no worse rally killer than a double play. Moreover, there are few plays more devastating for an offense than a line drive with runners on that turns into a double play. To avoid such a huge momentum shift, you need to make it a priority not to get doubled off on the bases.”*

Well I don’t think it’s much of a rally if there are 2 outs and only one runner, on first base. In other words the rally is on life-support as soon as the blooper is caught. Furthermore, if it’s a tie game in the bottom of the ninth, you just need one run, not a rally. In http://arxiv.org/pdf/1505.00456.pdf we consider “high-leverage innings” defined as the 8^{th} or 9^{th} inning where the difference in score is at most 1. Between the years of 2003 and 2008, Mariano Rivera pitched in 58 different high-leverage innings where there were 2 outs and one runner, on first. None of those runners scored. Unlike most pitchers (where 38% certainty is enough) one only needed to be 14% sure that the blooper would be a hit before running aggressively when Rivera pitched. This is not just because Rivera’s pitching record is better than other pitchers (although it certainly is!) but more importantly it is because he was only slightly better than average at preventing a runner scoring from third with 1 out. This is the point—you don’t need a hit to score from third with one out. In high-leverage situations, one is often facing the pitchers against whom hits are most difficult to obtain, and these are the situations that determine the outcomes of games.

**Why don’t they do it this way?**

The simple answer is that there is no *direct* data to show that player A wins more games, because *no one ever does it that way*. If a team tried it this way, then data could be collected to bear out the conclusions of our indirect arguments. We continue to wait for someone brave…

UPDATE: I’ve written a new post about this topic, in part because we’ve improved it enough that it ended up published in the April 2016 edition of the journal Math Horizons. The new blog post, and even better, the published paper, do a better job of focussing on the main point. Enjoy!

]]>The average of the three numbers 4, 5, 6 is clearly 5. A consequence of AM-GM is that the cube root of the product (4)(5)(6)=120 must be less than 5. Since the cube root of 120 is less than the 5 (i.e. the cube root of 125), we gain some confidence that it works. As a natural-born-devil’s-advocate, we try to change the example so as to give the geometric mean a better chance. Keeping the average at 5, the best choice for maximizing the geometric mean seems to be: 5, 5, 5. Thus, all the numbers are equal. In this extreme case, both the arithmetic and the geometric means must be the same. In fact this suggests a proof.

Keep the arithmetic average the same and make the geometric average as big as possible. We suspect that the only way to achieve this is to make all the numbers equal. Let’s just *believe* this and see what happens. Notice that “keeping the average at 5” really means keeping the sum at 15; also “maximizing the geometric mean” actually suggests maximizing the product. When we replaced “4, 5, 6” with “5, 5, 5” we really needed to compare the product (4)(6) with (5)(5). This is a consequence of the difference of squares formula

**Difference of Squares**: *(x — a)(x + a) = x^2 — a^2*

with *x=5* and *a=1*. In this example we effectively only replace two numbers—the 4 and the 6—with their average—5 with another 5.

So let’s try a more challenging example—three numbers 4, 4 and 7. If we try the same trick, perhaps we replace the 4 and 7 with a couple of 5.5’s. This gets us a bigger product, just as in the above example, but we don’t seem to be getting closer to proving that having all 5’s yields the maximum product.

*Here’s where the belief in the answer helps us:* Instead of moving the 4 and the 7 to

There are many ways to see why the product (5)(6) should be bigger than the original (4)(7), but we defer to the difference of squares formula once again. This time let *x=5.5* and compare what we get with *a=1.5* versus *a=0.5* to see exactly what’s happening. (Don’t skip over this step—it’s the heart of the matter).

Now we have three numbers 5, 4 and 6 with the same sum, but a bigger product. This triple should look familiar (our first example!)—we remember that the next step is to replace the 4 and 6 with a couple of 5’s.

The experienced mathematician now instantaneously fills in all the details of a proof, arguing that *if* there is any number other than the arithmetic average, then there must be one both less than the average and another one more than average. So if any of the numbers are other than average we can make the product bigger, and bigger and bigger…etc…

However the main idea can be understood (perhaps at a different level) by elementary school students, provided they are comfortable with the difference of squares formula.

Toward that end, let’s run through a little game on five numbers 8, 8, 9, 9, 10. The total is 44 and the average is 8.8. We repeatedly replace this list of five numbers with another list, always with total 44, and at each stage the product is bigger. Also at each stage, at least one of the numbers is replaced with the average. Thus:

**8**, 8, 9, 9, **10**—>**8.8**, 8, 9, 9, **9.2**

We replaced the first 8 and the last 10. Now the product is bigger—why? What choice of *x*? What choices for *a*?

Keep going, replacing the two bold numbers by moving the closer one to *the average* 8.8 and moving the other one the same amount in the other direction:

8.8,** 8**, 9, 9,** 9.2**—>8.8, **8.4**, 9, 9, **8.8 **

8.8, **8.4, 9,** 9, 8.8—>8.8, **8.6, 8.8**, 9, 8.8

8.8, **8.6**, 8.8, **9**, 8.8—>8.8, **8.8**, 8.8, **8.8**, 8.8

It’s just crazy! If any of the numbers are other than average, we can compare it to the situation where all the numbers are average, in a small number of steps. It’s slick, it’s fun and it works.

**Remarks:**

1. We just had our last Putnam session at Norwich this semester. We solved a problem using the AM-GM and just for fun tried to prove AM-GM. This post came out of that discussion. Thanks to Wise D. and Jessica F. for their role in this discussion! I asked Prof. Rob P. afterwards if he knew of it and he directed me towards this article:

*An inductive proof of the AM-GM inequality*, by Kong-Ming Chong, *American Mathematical Monthly*, Vol. 83, No. 5 (May, 1976), p. 369.

As is often the case with mathematical expositions, the problem is that you need to be something of an expert in mathematics to even realize that the article is saying the same thing as our exposition in this post! So we hope you enjoyed it and can do something like this with your students.

2. Originally in our Putnam group, we used calculus, instead of the difference of squares formula. We considered two positive numbers with a fixed sum of *S*, trying to maximize the product by using the function *f(x)=x(S-x)*. The breakthrough came when we realized that it’s not just about maximizing *f*, but rather knowing the intervals of increase, since our “belief” in this post led us to increasing *f *rather than maximizing it. In this discussion I replace calculus with the most elementary idea possible, the difference of squares, to make it as clear, accessible and fun as possible.

3. “Fix the arithmetic average and make the geometric average as big as possible” means the same thing as “Keep the arithmetic average the same and make the geometric average as big as possible.” At the last minute I opted for the latter phrasing.

]]>**Mathematical induction-the champion of efficiency.**

*The beautiful idea:*

Induction is often used (and usually taught) to prove formulas that hold for all non-negative integers. For example, the sum of the first n non-negative powers of 2, is equal to one less than the next power of 2. If n=4 it looks like this:

1+2+4+8=16-1

Our next instructions are to add 16 to both sides:

1+2+4+8+16=(16+16)-1

and we see that the pattern continues!

Let’s do it one more time. Start with our newly discovered equation:

1+2+4+8+16=32-1

add 32 to both sides, and voila:

1+2+4+8+16+32=64-1.

It’s pretty neat, and also really clear from this kind of discussion…unless of course…

*Efficiency:*

Unless of course, our discussion only uses one equation, which corresponds to n=1, the so-called *base-case*, and you write:

1=2-1.

The base case is just too small to see what’s going on or to notice any patterns. Your students may even feel stupid writing this down, as if it’s a secret handshake and they really don’t know why it’s necessary.

You guessed it; this is the way it’s almost always taught. But wait!–it gets worse; the instructions for seeing how the “pattern” continues are too cryptic for many beginners to follow. They are hidden in the formal details of the so-called *inductive step*. So it’s a pedagogical lose-lose.

It’s done this way because the base case is the smallest example that will make the proof work both completely and rigorously. After all, a full proof would have to include the case where n=1, and a formal proof works just fine with that as our main example. So let’s be efficient and … actually let’s not.

*Focus on the beautiful idea*

The beautiful idea behind the principle of mathematical induction should be given priority. That’s the first part of this post, and the first thing to be communicated regarding mathematical induction. Formal proof-writing etiquette can come a little later at the university level, or not at all in elementary school.

(Yes—I’m suggesting that elementary teachers should introduce math induction!)

So let’s use n=4 and n=5 for our examples, then look at the inductive step, keeping in mind how they relate to these examples, and then rewrite the proof the next day.

In other words, let’s give up some of the efficiency requirement.

]]>*A little calculation *

“10 is the greatest common divisor, or greatest common factor, of the two numbers 130 and 50.” We may abbreviate this as:

gcd(130,50)=10.

We see this by inspection, by factoring our numbers. There is a harder way to find the answer, called the “Euclidean Algorithm.” It turns out that this method will usually be much easier for large numbers where factoring is difficult. Let us use this example to introduce the Euclidean Algorithm. Consider this little calculation:

130=2(50)+30 … gcd(130,50)=gcd(50,30) (why?)

50=(30)+20 … gcd(50,30)=gcd(30,20) (why?)

30=(20)+10 … gcd(30,20)=gcd(20,10) (why?)

and finally, gcd(20,10)=10, since 10 divides 20.

*What is the main pedagocial point?*

The question “why?” appears three times in the little calculation. If we’re clever, we can answer all three questions all at the same time, with a little lemma—and we should do this (if you don’t know the answer, try it!). The answer constitutes the central idea behind the Euclidean Algorithm. The pedagogical problem is that the required lemma is usually discussed first; so that the chain of equalities: gcd(130,50)=gcd(50,30)=gcd(30,20)=gcd(20,10) can be suppressed!

*Conclusion*

My suggestion to the teacher is to introduce the lemma only after a few examples have been worked by the student in this inefficient way. Thus, the lemma is an *eventual recognition of a pattern*—a celebration of just how useful it appears to be, rather than an efficient shortcut in order to avoid it.