149 points by sdenton4 a year ago
Axler seems to have won that battle. His textbook Linear Algebra Done Right is widely used at 308 universities including Berkeley, Stanford and MIT. He has a PDF available without proofs, videos, etc. 3blue1brown likes the book.
I suffered through determinants.
Axler's book is fantastic. But sadly Springer altered the typesetting on the 3rd edition. A really classic and clear LaTeX layout got turned into something much less clear. This freaked me out. Look inside and compare:
* Second edition: https://www.amazon.com/dp/0387982582
* Third edition: https://www.amazon.com/dp/3319307657
I wonder whether widespread adoption of his book pushed editors to make it look flashier and watered down. The contents are the same though.
Wow this is tragic. I'm guessing it serves whatever market that Springer has identified. But I'm not sure that's such a good thing: at some point the more details you add to the exposition the less clear it becomes. The reader needs to stand on their own two feet, especially in mathematics. Some people seem to be good at memorizing endless rules and details, so I can see this serving those people. But those are the people that can just follow the "determinants path" that this book was originally meant to disavow. Sigh.
Doesn't seem that bad, I just checked inside both and the content does look identical other than the visual style and some examples added in the 3rd.
This was one of my favorite books as a math undergraduate. I'm sad to see the highly legible and clear layout has been replaced by something so gratuitous and distracting.
For better or worse, the 3rd edition formatting and styling is something I've come to mentally associate with low-quality cash-grab big-lecture-hall tomes designed and written by committee over the course of a dozen editions. I wonder if I would have written off the book when I was a student if I'd seen it in such a form.
I looked: you are so, so right. The 3rd edition, with all the ugly colors and drop shadows, looks like a middle school textbook. If I ever buy this book, and chances are I will, I'm going to pick up a copy of the 2nd edition. Thanks for the warning!
On this topic, I love older textbooks that read closer to prose than whatever passed for flashy textbooks (at least 10 years ago; I can only imagine things have gotten worse).
Along these lines, I prefer black and white matte rather than glossy color.
Indeed. I read this piece mainly as a primary source on recent math (education) history. From that perspective, it's very interesting.
Which course at Berkeley uses this textbook? When I took the class 2 years ago we were using something else.
It depends on the professor, but Axler and Friedberg are both used for Math 110:
Determinants have a very basic intuition behind them: it's the stretch factor of the n-volume of a linearly transformed unit n-cube (area of a linearly transformed unit square in 2D, volume of a linearly transformed unit cube in 3D, etc.) Why would one want to banish them from linear algebra?
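To make the stretch-factor picture concrete, here is a minimal Python sketch, assuming a 2x2 real matrix acting on the plane: it pushes the unit square through the matrix and compares the signed (shoelace) area of the image with ad - bc. The helper names `det2`, `apply` and `signed_area` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def det2(m):
    """Determinant ad - bc of the 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def apply(m, p):
    """Image of the point p = (x, y) under the linear map m."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y, c * x + d * y)

def signed_area(points):
    """Shoelace formula: signed area of a polygon whose vertices are listed in order
    (positive for counterclockwise, negative for clockwise)."""
    total = 0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return Fraction(total, 2)

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # counterclockwise, area 1

for m in ([[3, 1], [1, 2]], [[0, 1], [1, 0]]):
    image = [apply(m, p) for p in unit_square]
    print(signed_area(image), det2(m))
# 5 5    (area of the image parallelogram = stretch factor)
# -1 -1  (a reflection flips orientation, so the signed area is negative)
```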
The author alludes to that by pointing out that they're needed for jacobians, which are essentially also stretch factors.
However, a nice geometric interpretation does not nice math make. If you remember their definition, the one with the sub-products and the alternating +/-es, and then imagine trying to prove that it has that geometric interpretation - you've arrived at a huge pain for undergrad math students that are just being introduced to matrices.
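For comparison, the definition being alluded to is the Leibniz formula, det(M) = Σ_σ sign(σ) Π_i M[i][σ(i)]. A short Python sketch of it (function names are illustrative) makes the problem visible: it is a sum of n! signed products, and nothing in it looks like a volume.

```python
from itertools import permutations
from math import prod

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz_det(m):
    """Sum over all n! permutations s of sign(s) * M[0][s(0)] * ... * M[n-1][s(n-1)]."""
    n = len(m)
    return sum(sign(s) * prod(m[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(leibniz_det([[1, 2], [3, 4]]))                   # -2 (= 1*4 - 2*3)
print(leibniz_det([[2, 0, 1], [1, 3, 2], [1, 1, 2]]))  # 6
```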
> then imagine trying to prove that it has that geometric interpretation
Exactly. We leaned heavily on determinants in my freshman linear algebra class but I went at least an additional year before I even heard the interpretation, much less could prove it from the standard terrible definition.
> However, a nice geometric interpretation does not nice math make.
I keep seeing proponents of the Clifford Algebra as a much more elegant type of maths that supersedes linear algebra and also makes it much more intuitive, but I haven't really found a clear source on it yet.
For me the definition is precisely the volume.
Yes, they are intuitive, but it is pretty hard to define them rigorously in a way that captures that intuition. Also, every way I've seen to define them requires choosing a basis (or even an orthonormal basis), which isn't that nice. If you read the PDF, you'll notice that Axler's proofs completely avoid relying on the fact that every finite-dimensional vector space has a basis, or on choosing one.
Defining the determinant definitely does not require choosing a basis (though I don't know how you'd define it without at least knowing that a basis exists, for reasons you can see below).
Let's say you've got a linear transformation T:V->V, and suppose V is n-dimensional. Consider exterior powers of V, Λ^k(V); for each one we naturally get a linear transformation Λ^k(T):Λ^k(V)->Λ^k(V). In particular take k=n, so we get a linear transformation Λ^n(T):Λ^n(V)->Λ^n(V). But Λ^n(V) is one-dimensional, so Λ^n(T) must be multiplication by some constant factor. That factor is the determinant.
That's basically the most "natural" definition of the determinant (notice how multiplicativity immediately falls out of it, assuming of course you already know that Λ^k is a functor). You need the idea of a basis in order to (a) make sense of the statement "V is n-dimensional" and (b) prove that Λ^n(V) is 1-dimensional, but that's it, and you certainly never need to choose one in order to define the determinant.
Probably not a great definition for beginning students of linear algebra, but I do have to correct this idea that defining the determinant requires choosing a basis...
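A concrete, and necessarily coordinate-based, Python sketch of this definition: the hypothetical `wedge` helper below represents an element of Λ^k(R^n) as a dictionary from increasing index tuples to coefficients. Λ^n(T) sends e_1∧...∧e_n to Te_1∧...∧Te_n, and the coefficient of e_1∧...∧e_n in the result is det T. The standard basis is only used to write the vectors down; that is an assumption of the sketch, not of the definition. Multiplicativity can then be checked on an example.

```python
from fractions import Fraction

def wedge(vectors):
    """v_1 ∧ ... ∧ v_k for vectors in R^n, as {(i_1, ..., i_k): coefficient}
    with i_1 < ... < i_k standing for e_{i_1} ∧ ... ∧ e_{i_k}."""
    result = {(): Fraction(1)}
    for v in vectors:
        nxt = {}
        for idx, c in result.items():
            for j, vj in enumerate(v):
                if vj == 0 or j in idx:
                    continue  # zero coefficient, or e_j ∧ e_j = 0
                # Moving e_j into sorted position passes every larger index once.
                sign = -1 if sum(1 for i in idx if i > j) % 2 else 1
                key = tuple(sorted(idx + (j,)))
                nxt[key] = nxt.get(key, 0) + sign * c * vj
        result = {k: c for k, c in nxt.items() if c != 0}
    return result

def det_top_power(m):
    """Scalar by which Λ^n(T) acts on the one-dimensional space Λ^n(R^n)."""
    n = len(m)
    images = [[m[i][j] for i in range(n)] for j in range(n)]  # T e_j = column j
    return wedge(images).get(tuple(range(n)), 0)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
print(det_top_power(A), det_top_power(B), det_top_power(matmul(A, B)))  # -2 -5 10
```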
A completely different basis free definition, that works whenever you are working over the complex numbers or any other algebraically-closed field:
The determinant is uniquely determined by the following two axioms:
1. The determinant of the "multiply by lambda" operation on a one-dimensional vector space is lambda.
2. If you have a linear operator T on a vector space V, and a T-invariant subspace W, the determinant of T is the product of the determinant of T restricted to W and the determinant of the operator that T induces on the quotient space V / W.
This actually kinda intuitively meshes with the volume-stretching property: if you are stretching the volume by the factor lambda_1 along one subspace and by the factor lambda_2 along some complementary subspace, clearly the overall stretch factor is lambda_1 * lambda_2.
If you aren't working over an algebraically closed field, you can just tensor with the algebraic closure of whatever field you're working with and take the determinant there. There is also a way of adapting the definition so you don't have to do this, but it makes axiom (1) a bit more complicated.
Another bonus is this definition makes the Cayley-Hamilton theorem completely trivial.
Also, you can give an analogous definition of the trace if you replace multiplication with addition, and of the characteristic polynomial if you replace lambda in axiom 1 with x - lambda.
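A small Python check of what these axioms give, under one assumption worth stating: over an algebraically closed field every operator has a basis in which its matrix is upper triangular, so the spans of the first k basis vectors form a chain of invariant subspaces. Peeling off one-dimensional quotients with axiom 2 makes the determinant the product of the diagonal entries, the trace their sum, and the characteristic polynomial Π(x - a_ii). Substituting the matrix itself into that product gives the zero matrix, which is Cayley-Hamilton. The matrix and helper names are illustrative.

```python
from itertools import permutations
from math import prod

def leibniz_det(m):
    """Reference determinant (Leibniz formula), used only to cross-check."""
    n = len(m)
    def sign(p):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        return -1 if inv % 2 else 1
    return sum(sign(s) * prod(m[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def minus_scalar(a, lam):
    """A - lam * I."""
    return [[a[i][j] - (lam if i == j else 0) for j in range(len(a))] for i in range(len(a))]

A = [[2, 5, -1],
     [0, 3, 4],
     [0, 0, 7]]                       # invariant flag: span(e1) ⊂ span(e1, e2) ⊂ R^3
diag = [A[i][i] for i in range(len(A))]

print(leibniz_det(A), prod(diag))     # 42 42  (multiplicative axiom along the flag)
print(sum(diag))                      # 12     (additive version: the trace)

# Cayley-Hamilton: the characteristic polynomial Π(x - a_ii) evaluated at A.
P = [[1 if i == j else 0 for j in range(len(A))] for i in range(len(A))]
for lam in diag:
    P = matmul(P, minus_scalar(A, lam))
print(P)                              # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```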
I think you get a basis of an n-dimensional vector space by the definition of dimension (if e.g. the dimension is the maximum size of a set of linearly independent vectors).
It's a bit trickier when the dimension is infinite, but again most definitions of dimension require there to be a basis of a particular size; the difficult part is proving that this makes sense (i.e. that the dimension is well-defined and unique).
> It's a bit trickier when the dimension is infinite
For those who are wondering, 'every vector space has a basis' is equivalent to the Axiom of Choice.
I mean, in order to define dimension, we have to know that bases exist (and all have the same cardinality). Which means that before I can use the notion of "dimension", I have to know about bases.
I mean I suppose you could just restrict the definition of dimension to vector spaces that have bases, and then you wouldn't have to. I guess that's what you're implicitly suggesting, that dimension would just not apply to vector spaces that don't have bases. That would make sense. But I'm used to thinking of dimension as, well, a function of vector spaces, not a partial function, so as I was thinking of it, you have to prove they all have bases before you can use it!
Yes, it is easy to prove that "a basis exists" (almost by definition depending on how exactly you define dimension) and of course it's legal to use it in proofs, but I think proofs that avoid using this fact directly are more elegant.
> almost by definition depending on how exactly you define dimension
What definition of _dimension_ is there that does not rely upon the existence of a basis?
You could define the dimension to be the supremum of the cardinality of all linearly independent sets.
In the infinite case this does not trivially give you a basis, as 1) the supremum need not be attained: it could be strictly larger than the cardinality of every linearly independent set, and 2) adding an extra vector to an infinite linearly independent set doesn't increase its cardinality, hence there is no reason for a largest independent set to span the entire space.
You could also take the infimum of the cardinalities of all sets that span the entire space, but you run into similar problems.
Does "maximum size of a linearly independent set" count? You would have to prove something about spanning sets to get to the existence of a basis
Ah cool, I had not seen this before. I stand corrected.
Are there any books that use this definition? I like this style of linear algebra.
And how do you define exterior powers without using a basis?
The definition of the exterior power doesn't rely on a basis. The k'th exterior power is the quotient of the k'th tensor power by the subspace spanned by elements of the form v_1 ⊗ v_2 ⊗ ... ⊗ v_k where two of the v_i are equal. I'm assuming you know how to define a tensor product without resorting to bases. If not, see: https://en.wikipedia.org/wiki/Tensor_product#Definition (it's not the clearest exposition of it, but it'll do)
(Or, if you like, you could define the tensor algebra, take a quotient of that to get the exterior algebra, and then restrict to the image of the k'th tensor power to get the k'th exterior power.)
Again, obviously you need to use bases to prove how to compute the dimension of an exterior power. But you don't need them just to define it.
I thought a bit more about this. Strictly speaking, the construction of the tensor product that you link to does use a basis: the free vector space it starts from has the Cartesian product of the original two vector spaces as its basis. Also, the relations that you quotient by are used as a spanning set for a subspace. We didn't need a basis for the original spaces, but we ended up using bases elsewhere.
You can define tensor products and quotient spaces without bases! (It's a bit of work though)
That's certainly true, but that's also clearly not what was being talked about. The problem was (easier) avoiding choosing bases to perform a construction, and (harder) avoiding assuming all vector spaces have bases. Using bases that you construct directly isn't something anyone ever really has reason to avoid. (And really, neither is the second in the finite-dimensional case, but hey, may as well if you can, right?)
Right. It's kind of interesting how in all of these definitions the "easy" way involves using a basis. (Although we could argue about what the "easy" way is.)
If anyone's looking for a (high level) overview of linear algebra, I'd highly recommend 3blue1brown's video series: https://www.youtube.com/watch?v=kjBOesZCoqc&list=PLZHQObOWTQ...
It's mostly graphical, and is really helpful in forming and cementing an intuition for linear algebra.
I really hate the way mathematics is taught in most school systems: it's all machinery, with none of the beauty and underlying understanding. I was lucky enough to maintain an interest in mathematics sufficient to allow me to basically teach myself, relying heavily on conceptual understanding. A lot of subjects like calculus and linear algebra are mostly about understanding the basic concepts and then building up some experience working with them. There's no need to memorize gazillions of formulae or anything like that; you just need to actually know what you're doing when you do the work, and for anything but very simple work you will inevitably rely on tables and references.
Which is to say, I could not recommend 3blue1brown's videos more highly; they are an invaluable aid to learning linear algebra and to actually understanding what it is you're doing when you perform these various operations to "solve problems".
Ran across these videos a few months ago; at one video a night (~15-20 minutes a day over two weeks), it gave me a better intuitive understanding of linear algebra than a full semester in college. These videos should be required before diving into the math in class, I'd've definitely done better had I seen these beforehand.
This is a great paper.
As a counterpoint, one place where determinants are incredibly useful is Hartree-Fock theory, where they effectively encode the Pauli exclusion principle (the antisymmetry requirement on a many-electron wavefunction built from atomic orbitals).
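A toy Python illustration of that antisymmetry, assuming two electrons in one dimension, spin ignored, and two made-up unnormalized orbitals (`phi_a` and `phi_b` are hypothetical, not a real basis set). The 2x2 Slater determinant Ψ(x1, x2) = φa(x1)φb(x2) - φb(x1)φa(x2) changes sign when the electrons are swapped and vanishes identically when both occupy the same orbital.

```python
import math

def phi_a(x):
    return math.exp(-x * x / 2)          # toy orbital (unnormalized Gaussian)

def phi_b(x):
    return x * math.exp(-x * x / 2)      # toy orbital with one node

def slater2(orb1, orb2, x1, x2):
    """det [[orb1(x1), orb2(x1)], [orb1(x2), orb2(x2)]], normalization omitted."""
    return orb1(x1) * orb2(x2) - orb2(x1) * orb1(x2)

psi = slater2(phi_a, phi_b, 0.3, 1.2)
swapped = slater2(phi_a, phi_b, 1.2, 0.3)
print(math.isclose(psi, -swapped))       # True: antisymmetric under exchange
print(slater2(phi_a, phi_a, 0.3, 1.2))   # 0.0: the same orbital twice gives nothing
```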
Also: Cross products.
I'm sure it's out there somewhere, but it would be an appropriate corollary to the OP if there was a "Down with Cross Products", which argues that multivectors and wedge products should be taught instead of cross products in multivariable calculus. Then, determinants are "the wedge product of N linearly independent vectors", and cross products are "the wedge product of 2 vectors in 3 dimensions", which gives a bivector and trivially encodes their pseudovector properties.
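A minimal Python sketch of that correspondence in R^3, assuming the Euclidean metric and a right-handed orientation (helper names are hypothetical): the bivector u ∧ v has three components, its Hodge dual is exactly u × v, and wedging with a third vector yields the 3x3 determinant. Under the point reflection x → -x the bivector is unchanged, which is the "pseudovector" behavior of the cross product.

```python
def wedge(u, v):
    """Bivector u ∧ v in R^3, as its components on e1∧e2, e2∧e3, e3∧e1."""
    return {"e12": u[0] * v[1] - u[1] * v[0],
            "e23": u[1] * v[2] - u[2] * v[1],
            "e31": u[2] * v[0] - u[0] * v[2]}

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dual(b):
    """Hodge dual in right-handed Euclidean R^3: e23 -> e1, e31 -> e2, e12 -> e3."""
    return (b["e23"], b["e31"], b["e12"])

def top(u, v, w):
    """Coefficient of e1∧e2∧e3 in u ∧ v ∧ w, i.e. det of the matrix with rows u, v, w."""
    b = wedge(u, v)
    return b["e23"] * w[0] + b["e31"] * w[1] + b["e12"] * w[2]

def neg(x):
    return tuple(-c for c in x)

u, v = (1, 2, 3), (4, 5, 6)
print(dual(wedge(u, v)), cross(u, v))          # (-3, 6, -3) (-3, 6, -3)
print(wedge(neg(u), neg(v)) == wedge(u, v))    # True: unchanged under x -> -x
print(top((1, 0, 0), (0, 1, 0), (0, 0, 1)))    # 1
```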
(Also, surface normals in integrals are bivectors, the 'i' of complex analysis is the bivector resulting from the wedge product x^y, and e^(i theta) is the exponential map applied to the i operator, and (del wedge vector-function f) is the (bivector-valued) curl while (del wedge bivector-function g) is the (trivector-valued, i.e. pseudoscalar) divergence (and that's why del^(del^f) = 0).)
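To see the claim about 'i' concretely, here is a minimal 2D geometric-algebra sketch in Python, assuming the Euclidean metric and basis blades stored as bitmasks (helper names are hypothetical). The unit bivector e1e2 = e1∧e2 squares to -1, right-multiplying a vector by it rotates the vector by 90 degrees, and because I² = -1 the exponential series gives exp(θI) = cos θ + I sin θ, which rotates by θ.

```python
import math

def reorder_sign(a, b):
    """Sign picked up when the product of basis blades a, b (bitmasks) is
    rearranged into canonical order, by counting the transpositions needed."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps % 2 else 1

def gp(x, y):
    """Geometric product of multivectors {blade bitmask: coefficient};
    bit 0 is e1, bit 1 is e2, and e_i e_i = +1 (Euclidean metric)."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            out[a ^ b] = out.get(a ^ b, 0) + reorder_sign(a, b) * ca * cb
    return {k: c for k, c in out.items() if c != 0}

e1, e2 = {0b01: 1}, {0b10: 1}
I = gp(e1, e2)                      # the bivector e1∧e2 (e1 ⟂ e2, so e1 e2 = e1∧e2)
print(I, gp(I, I))                  # {3: 1} {0: -1}: I squared is -1
print(gp(e1, I), gp(e2, I))         # {2: 1} {1: -1}: e1 -> e2, e2 -> -e1

theta = math.pi / 6
rotor = {0: math.cos(theta), 0b11: math.sin(theta)}   # exp(theta I) = cos θ + I sin θ
print(gp(e1, rotor))                # e1 rotated by 30 degrees: {1: cos θ, 2: sin θ}
```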
(But differential forms should probably be omitted in a first course, because they get hairy quickly and are hard to wrap one's head around. It's enough to know that dxdy in integrals is actually dx^dy, and therefore the Jacobian appears when changing variables: writing dx' = (∂x'/∂x)dx + (∂x'/∂y)dy and dy' = (∂y'/∂x)dx + (∂y'/∂y)dy, and using dx^dx = dy^dy = 0 and dy^dx = -dx^dy, gives dx'^dy' = (∂x'/∂x ∂y'/∂y - ∂x'/∂y ∂y'/∂x) dx^dy.)
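A short numerical Python sketch of that change-of-variables factor, assuming the polar map (r, θ) ↦ (r cos θ, r sin θ) and central finite differences (`jacobian_det` and `polar` are hypothetical helpers). Here the new variables are r and θ: expanding dx ∧ dy in terms of dr and dθ, the dr ∧ dr and dθ ∧ dθ terms vanish, and the surviving coefficient is the 2x2 Jacobian determinant, which for polar coordinates is r.

```python
import math

def jacobian_det(f, u, v, h=1e-6):
    """Central-difference estimate of det ∂(f1, f2)/∂(u, v) at the point (u, v)."""
    fu1, fu0 = f(u + h, v), f(u - h, v)
    fv1, fv0 = f(u, v + h), f(u, v - h)
    d1_du = (fu1[0] - fu0[0]) / (2 * h)   # ∂f1/∂u
    d2_du = (fu1[1] - fu0[1]) / (2 * h)   # ∂f2/∂u
    d1_dv = (fv1[0] - fv0[0]) / (2 * h)   # ∂f1/∂v
    d2_dv = (fv1[1] - fv0[1]) / (2 * h)   # ∂f2/∂v
    # df1 ∧ df2 = (d1_du du + d1_dv dv) ∧ (d2_du du + d2_dv dv)
    #           = (d1_du * d2_dv - d1_dv * d2_du) du ∧ dv, since du ∧ du = dv ∧ dv = 0
    return d1_du * d2_dv - d1_dv * d2_du

def polar(r, theta):
    """(r, θ) -> (x, y) = (r cos θ, r sin θ)."""
    return (r * math.cos(theta), r * math.sin(theta))

print(jacobian_det(polar, 2.0, 0.7))   # approximately 2.0: dx ∧ dy = r dr ∧ dθ
```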
Which books would you recommend to learn this wedge product / differential forms approach to linear algebra and complex numbers?
Doesn't Spivak get into them?
Michael Spivak has written multiple books. Which one do you mean?
Calculus on Manifolds. I'd recommend Hubbard and Hubbard over that as it's a little easier read with the same material.
> I'd recommend Hubbard and Hubbard over that as it's a little easier read with the same material.
John Hamal Hubbard, Barbara Burke Hubbard - Vector Calculus, Linear Algebra and Differential Forms: A Unified Approach
Based on this article, I would venture to guess that Axler is also not a fan of the cross product, though, so he wouldn't feel that much is lost. Cross products are practically a vector-calculus hack that lack generality and risk obscuring intuition at the mathematical level.
One problem with the cross product: it only exists in three dimensions.
No, it exists in dimensions 3 and 7:
EDIT: More precisely: a common way to axiomatize the cross product yields cross products exactly in dimensions 3 and 7.
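A wedge product exists in all dimensions, but only in 3 dimensions is the wedge of two vectors identifiable with a single vector (bivectors in n dimensions form a space of dimension n(n-1)/2, which equals n only for n = 3); the 7-dimensional cross product arises differently, from the octonions.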
I had the same uneasy feeling about determinants when I was studying linear algebra at the university. Years later I found Sheldon Axler’s “Linear Algebra done right”, and I loved it!
Maybe I need a little more handholding than the average linear algebra student, but if the language in that paper made any sense to me at all, I probably wouldn’t need any instruction on determinants.
Unfortunately I think the paper is written for an audience of other math lecturers, to convince them not to use determinants in their own classes, and not for beginning linear algebra students.
> A complex number λ is called an eigenvalue of T if T −λI is not injective.
Uhm, what's the intuition behind _that_?
It's another way of saying that T - λI is not a one-to-one map, meaning that (T - λI) is non-invertible.
So there are vectors x and y such that (T - λI)x = (T - λI)y for y ≠ x.
Which means that (T - λI)(x - y) = 0, and in general because of linearity every scalar multiple of (x - y) also maps to zero.
Letting (x - y) = v, we could get (T - λI)v = 0 ==> Tv = λv, which is perhaps a more familiar definition.
So, it's a neat way of expressing the concept, but I'm not sure what it buys you in terms of improving one's intuition.
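The chain of reasoning above, run on a concrete (hypothetical) 2x2 matrix in Python: two distinct inputs that T - λI sends to the same output, and their difference turning out to be an eigenvector. The matrix T = [[4, 1], [2, 3]] is chosen so that λ = 5 is an eigenvalue (its eigenvalues are 2 and 5); the helper names are illustrative.

```python
def apply(m, v):
    """Matrix-vector product for small integer matrices."""
    return tuple(sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m)))

def minus_lambda(m, lam):
    """T - λI."""
    return [[m[i][j] - (lam if i == j else 0) for j in range(len(m))] for i in range(len(m))]

T = [[4, 1], [2, 3]]
lam = 5
S = minus_lambda(T, lam)            # [[-1, 1], [2, -2]]

x, y = (1, 0), (2, 1)               # two different inputs ...
print(apply(S, x), apply(S, y))     # (-1, 2) (-1, 2)  ... same output: S is not injective

v = tuple(a - b for a, b in zip(x, y))         # v = x - y = (-1, -1), nonzero
print(apply(S, v))                             # (0, 0)
print(apply(T, v), tuple(lam * c for c in v))  # (-5, -5) (-5, -5): Tv = λv
```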
You can think of it in these two steps:
1. λI stretches every vector (the whole space, really) by a factor λ.
2. Saying that the function is not injective means you lose information: when you apply it on some object and get a result, you can't trace back what was the original object, as there may be several. (There is no inverse function, then). In linear algebra, this only happens because there is some direction of space where all the vectors get collapsed to zero.
In short, T-λI collapses some line of vectors to zero.
So, when you took the effect of λI from T, you make it a lossy transformation in some direction. This means that _in that direction_ T had the effect of stretching all vectors by a factor of λ.
You gain some geometric understanding of T.
It is sort of intuitive, but the language may obscure it a little if you are not used to it.
> So, when you took the effect of λI from T,
If I understand right, you’re saying that there’s an interpretation in terms of the geometry of the T transformation, of subtracting this diagonal matrix from T. Multiplication of matrices is composition of transformations, I get that, but I’m not so sure what addition/subtraction is.
Yes, that's right. Addition is just applying the transformations separately to the same vector and adding the results. So what this is saying is that if you apply λI to a vector in that particular direction, then there is nothing left to add to get the effect of T.
Ideally you would like to do this for all n directions of space, and that way you completely describe what T does in simpler terms: it just stretches things differently in different directions. It's not always possible though. The matrices that allow this are called diagonalizable and the process of finding the stretch factors (eigenvalues) is called diagonalization.
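A minimal Python sketch of diagonalization, assuming the example matrix T = [[4, 1], [2, 3]], whose eigenvalues are 2 and 5 with eigenvectors (1, -2) and (1, 1). Putting the eigenvectors in the columns of P and the stretch factors on the diagonal of D gives TP = PD, and since P is invertible, T = PDP^{-1}. Helper names are illustrative; exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse2(m):
    """Inverse of an invertible 2x2 matrix, in exact rational arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

T = [[4, 1], [2, 3]]
P = [[1, 1], [-2, 1]]       # columns: eigenvectors (1, -2) for λ = 2 and (1, 1) for λ = 5
D = [[2, 0], [0, 5]]        # stretch factors along those two directions

print(matmul(T, P) == matmul(P, D))               # True: T stretches each column of P
print(matmul(matmul(P, D), inverse2(P)) == T)     # True: T = P D P^{-1}
```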
Just a caveat: if an eigenvalue is complex, the effect is not as simple as a stretch, but the interpretation is very similar.
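A small Python check of that caveat, assuming a plane rotation by an angle θ (the `rotation` and `apply` helpers are hypothetical). No real direction is merely stretched, but the complex vector (1, -i) is an eigenvector with eigenvalue e^{iθ}: a "stretch" by a complex factor of modulus 1 whose argument is the rotation angle.

```python
import cmath
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(m, v):
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]

theta = 0.6
R = rotation(theta)
v = [1, -1j]                      # complex eigenvector, the same for every theta
lam = cmath.exp(1j * theta)       # eigenvalue e^{iθ}: modulus 1, argument θ
print(all(cmath.isclose(a, lam * b) for a, b in zip(apply(R, v), v)))  # True
```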
Thanks very much. I'm glad I asked. Clearly, I had failed to really internalize what it means to be a linear operator!
It is equivalent to "A complex number λ is called an eigenvalue of T if Tx = λx for some nonzero x"
Thanks! Looking up Wikipedia: "In mathematics, an injective function or injection or one-to-one function is a function that preserves distinctness". From that definition, "T −λI is not injective" means that there exist two distinct vectors, y and z, such that (T −λI)y = (T −λI)z. That gives T(y - z) = λI(y - z), and setting x = (y - z), which is nonzero, I get to your (usual) definition.
The question I have is _why_ use the injective-based definition instead of the usual well-known one? Is there some further insight down the road?
With the injectivity-based definition, you can ask "how injective is it?" in various ways (eg, what is the dimension of the nullspace of T - λI; the Ax = λx definition is rather binary and corresponds to simply "T - λI has nullity > 0"). This is what is meant by "eigenvalue multiplicity" (there are actually at least two different inequivalent ways to measure this, geometric multiplicity and algebraic multiplicity).
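A small Python sketch of the two multiplicities, with exact rational arithmetic and hypothetical helper names. The Jordan block J = [[2, 1], [0, 2]] and the scalar matrix 2I have the same characteristic polynomial (x - 2)², so λ = 2 has algebraic multiplicity 2 for both, but the nullity of T - 2I (the geometric multiplicity) differs.

```python
from fractions import Fraction

def rank(m):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(m, lam):
    """dim null(T - λI) = n - rank(T - λI)."""
    n = len(m)
    shifted = [[m[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

J = [[2, 1], [0, 2]]   # characteristic polynomial (x - 2)^2
D = [[2, 0], [0, 2]]   # same characteristic polynomial
print(geometric_multiplicity(J, 2), geometric_multiplicity(D, 2))  # 1 2
```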
Also, for finite-dimensional vector spaces a lot of concepts are equivalent, e.g. "a linear operator L is invertible <=> L is injective", which are not equivalent in infinite-dimensional vector spaces (e.g. https://math.stackexchange.com/a/2447563/21437). Studying T - λI turns out to be more useful in this case (e.g. https://en.wikipedia.org/wiki/Decomposition_of_spectrum_(fun...)
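The standard infinite-dimensional counterexample can be sketched in Python, assuming sequences with finitely many nonzero terms stored as lists without trailing zeros (the helpers are hypothetical). The right shift S(a0, a1, ...) = (0, a0, a1, ...) is injective, since the left shift undoes it, but nothing maps to (1, 0, 0, ...). So S - 0·I is injective (0 is not an eigenvalue) and yet not invertible, which cannot happen in finite dimensions.

```python
def normalize(seq):
    """Drop trailing zeros so each finitely supported sequence has one representation."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return seq

def right_shift(seq):
    """(a0, a1, ...) -> (0, a0, a1, ...)."""
    seq = normalize(seq)
    return [0] + seq if seq else []

def left_shift(seq):
    """(a0, a1, a2, ...) -> (a1, a2, ...)."""
    return normalize(seq)[1:]

print(left_shift(right_shift([3, 1, 4])))  # [3, 1, 4]: a left inverse exists, so right_shift is injective
print(right_shift([5, 0, 0]))              # [0, 5]: every output starts with 0 (or is the zero sequence)
print(left_shift([7]), left_shift([9]))    # [] []: the left shift is surjective but not injective
```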
The merit in using the purely functional language is in that it opens the door for application of powerful methods and concepts of Category Theory. (Note also that even the very notion of a function being injective can be defined in a purely functional manner, i.e. without any reference to sets or their elements.)
Linear Algebra is a great place to get exposed to some category-theoretical concepts (maybe not when first learning linear algebra, though). My linear algebra prof used the category-theoretical definitions of subspace and quotient space (e.g. S is a subspace of L if there exists an injective linear map from S to L, and a quotient space of L if there exists a surjective linear map from L onto S) and of the tensor product (via its universal property).
It’s equivalent to the usual definition, but spares you a few steps when proving the existence of eigenvalues, i.e. you only need to prove that T - λI maps some nonzero vector to zero.
Axler's book is lovely, but my (amateur) opinion is that determinants are pretty damn intuitive and useful in the applied world. They appear quite naturally in the systems of equations I have worked with.
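A toy Python version of the determinant-free route to an eigenvalue, specialized to a 2x2 complex matrix with the simplifying assumption that e1 and Te1 are linearly independent (the function name is hypothetical). Three vectors v, Tv, T²v in a 2-dimensional space must be dependent, which yields a polynomial p with p(T)v = 0; factoring p over C then exhibits a T - λI that sends a nonzero vector to zero. No determinant is computed.

```python
import cmath

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def eigenpair_without_det(T):
    """Determinant-free eigenvalue search for a 2x2 complex matrix T, assuming
    v = e1 and Tv are linearly independent (i.e. T[1][0] != 0)."""
    v = [1, 0]
    w = apply(T, v)                     # Tv
    u = apply(T, w)                     # T^2 v
    if w[1] == 0:
        raise ValueError("sketch assumes e1 and T e1 are linearly independent")
    # v, Tv, T^2 v live in a 2-dimensional space, so T^2 v = a v + b Tv.
    # Solve for a and b using v[1] = 0.
    b = u[1] / w[1]
    a = u[0] - b * w[0]
    # p(z) = z^2 - b z - a has p(T) v = 0; factor over C: p(z) = (z - lam1)(z - lam2).
    root = cmath.sqrt(b * b + 4 * a)
    lam1, lam2 = (b + root) / 2, (b - root) / 2
    # (T - lam1 I)(T - lam2 I) v = 0, and x = (T - lam2 I) v = Tv - lam2 v is nonzero
    # because v and Tv are independent, so T - lam1 I is not injective.
    x = [wi - lam2 * vi for wi, vi in zip(w, v)]
    return lam1, x

lam, x = eigenpair_without_det([[4, 1], [2, 3]])
print(lam, x)   # λ = 5 (as a complex number), x proportional to (1, 1)
```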
Furthermore, some would argue that mathematics has lost its way as it becomes dedicated to abstraction alone.
I don't understand, can you give an example?
For most of the classical applications determinants are computationally terrible compared to factorization methods, e.g. for matrix inverse elimination is O(n^3) and Cramer's rule is something like O(n!).
I think it's false to equate determinants with "determinants computed by cofactor expansion". One can compute determinants efficiently through Gauss elimination, too.
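A minimal Python sketch of the elimination approach, using exact `Fraction` arithmetic (the function name is illustrative): reduce to upper-triangular form, flip the sign on each row swap, and multiply the pivots. This takes O(n^3) arithmetic operations, versus the n! terms of a cofactor expansion.

```python
from fractions import Fraction

def det_gauss(m):
    """Determinant via row reduction to upper-triangular form."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)        # no pivot in this column: the columns are dependent
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det                # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            # Adding a multiple of one row to another leaves the determinant unchanged.
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return det

print(det_gauss([[2, 0, 1], [1, 3, 2], [1, 1, 2]]))  # 6
```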
Fair, but I'm still not aware of any practical applications for "systems of equations" like the person I responded to mentioned. If you know any please share.
The determinant intuition for me is the signed volume factor for a change of basis. I've seen the combinatorial lattice path application and I'm sure there are more in other fields.
But not much reason I can see to have them feature so prominently in an intro linear algebra class. Better to spend more time with SVD for instance, which wasn't even covered in the first linear algebra class I took.
Obligatory mention that geometric algebra, an alternative to linear algebra, doesn't need determinants while maintaining all the power.
Physicist here, don't worry if you don't understand that PDF, it is a pretty terrible explanation.
For one thing, it uses extensive mathematical jargon that won’t make any sense to beginners... or even to advanced students other than math majors.
> For one thing, it uses extensive mathematical jargon that won’t make any sense to beginners
In Germany this is the usual style for lectures for absolute beginners from the 1st semester on, commonly even for people who don't major in math. This style is not uncommon even in 1st-semester math lectures for students who don't major in mathematics or physics.
Hardly any faculty has a problem with this - they love it that the math departments weed out "unsuitable" students in their lectures so they don't have to.
If you don't believe me and know a little German, here are two common German textbooks about linear algebra covering about 1.5 semesters of linear algebra for math majors:
- Gerd Fischer - Lineare Algebra: Eine Einführung für Studienanfänger (note the title "Linear Algebra: An introduction for freshmen" - I am really not kidding)
- Siegfried Bosch - Lineare Algebra
Even more: I know a lecturer from Hungary who was very direct about how relaxed he considers the curriculum for math majors in Germany (he is used to a Soviet-Russian-inspired math program).
Yeah, I think it's written for an audience of other math lecturers, to convince them not to use determinants in their own classes.
I got lost in 2.2: I can't work out how applying the transformation leads to the result, which is frustrating since it's the only non-trivial line in the proof, lol. Also, after applying the transformation, the author states that "a1(λ1 − λ2)(λ1 − λ3)...(λ1 − λm)v1 = 0" => "a1 = 0". But he never says why we know "λa != -λb for all a, b in 1..m" -- that seems non-obvious to me.
If v is an eigenvector of T with eigenvalue λ, then (T - bI)v = λv - bv = (λ - b)v. The image is a rescaling of v (and in particular has the same eigenvalue). Therefore
(T - λ_2 I) ... (T - λ_m I) v_k =
(T - λ_2 I) ... (T - λ_(m-1) I) (λ_k - λ_m) v_k =
(T - λ_2 I) ... (T - λ_(m-2) I) (λ_k - λ_(m-1)) (λ_k - λ_m) v_k =
(λ_k - λ_2) ... (λ_k - λ_(m-1)) (λ_k - λ_m) v_k
The eigenvalues are all distinct by hypothesis: "Non-zero eigenvectors corresponding to distinct eigenvalues...".
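A quick Python check of this telescoping computation on a concrete, hypothetical example: an upper-triangular T with distinct eigenvalues 1, 2, 3 and eigenvectors v_1 = (1, 0, 0), v_2 = (1, 1, 0), v_3 = (1, 2, 2). Applying (T - λ_3 I) and then (T - λ_2 I) kills v_2 and v_3 and multiplies v_1 by (λ_1 - λ_2)(λ_1 - λ_3) = 2, which is nonzero exactly because the eigenvalues are distinct; that is the step that forces a_1 = 0.

```python
def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def minus_lambda(m, lam):
    """T - λI."""
    return [[m[i][j] - (lam if i == j else 0) for j in range(len(m))] for i in range(len(m))]

T = [[1, 1, 0],
     [0, 2, 1],
     [0, 0, 3]]
eigenpairs = [(1, [1, 0, 0]), (2, [1, 1, 0]), (3, [1, 2, 2])]   # (λ_k, v_k)

def product_of_factors(v):
    """(T - λ_2 I)(T - λ_3 I) v, applying the rightmost factor first."""
    for lam, _ in reversed(eigenpairs[1:]):
        v = apply(minus_lambda(T, lam), v)
    return v

for lam, v in eigenpairs:
    print(lam, product_of_factors(v))
# 1 [2, 0, 0]   (λ_1 - λ_2)(λ_1 - λ_3) v_1 = (-1)(-2) v_1
# 2 [0, 0, 0]
# 3 [0, 0, 0]
```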
Great explanation, thanks.
And the "distinct eigenvalues" part is obvious in hindsight. For some reason my brain thought that we were adding them, not subtracting.
There are more intuitive ways to explain all that. I once saw a webpage explaining all of it with just graphics, but I cannot find it anymore. If I find it, I will update this comment.
Is that what it's supposed to be?
I don't mind the votes, but just to be clear, this isn't snark. My understanding of this paper is that it's an appeal to other linear algebra educators, not a conceptual introduction.
It is not an appeal; all the stuff it describes is pretty simple and obvious, but it is done in the most pedantic way.
I never understood this kind of racism against determinants.
They are very useful and intuitive, especially in 2D and 3D, where they represent areas and volumes. For example, they give an intuitive meaning to the notion of linear independence of 3 spatial vectors: they are independent when they span a non-zero volume.
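In coordinates this is the 3x3 determinant (the scalar triple product). A minimal Python sketch with illustrative vectors: the first triple spans a parallelepiped of volume 1, while in the second the third vector is 2·(4, 5, 6) - (1, 2, 3), so the volume collapses to zero.

```python
def det3(u, v, w):
    """Signed volume of the parallelepiped spanned by u, v, w (rows of a 3x3 matrix)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

print(det3((1, 0, 0), (1, 1, 0), (1, 1, 1)))  # 1: independent
print(det3((1, 2, 3), (4, 5, 6), (7, 8, 9)))  # 0: (7, 8, 9) = 2*(4, 5, 6) - (1, 2, 3)
```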