Popping the Hood
Lots of things appear to be magic: Computers, the Banach-Tarski Paradox, cars, the phenomenal success of companies like Facebook, airplanes, the Internet, the Central Limit Theorem, and the fact that...
Finding Isomorphisms Between Finite Groups
One of the most interesting problems I came across as I was building my Abstract Algebra package was that of finding an isomorphism between two finite groups G and H, represented by their Cayley...
How My Chess Engine Works
I have always had a lot of respect for chess, despite the fact that I'm not very good at it myself. As I learned more about the game, I also heard about the successes of computer chess AIs, in...
CSS Gotchas
I'm still learning HTML and CSS. As I debug the simple websites I write (including this one), I've encountered a lot of behavior that seemed very counterintuitive to me. I thought I'd share some of this...
Incompetence as Encouragement
A few weeks ago, I was looking for a good Python package that could convert user-written strings into numbers. I thought for sure a ton of people would have done this, but I was actually unable to find...
Being the First
I was at the climbing gym a few weeks ago with a friend of mine. We were just messing around, making up bouldering problems for ourselves to do. One of the problems my friend came up with was a...
Don't Trust Asymptotics: Part I
Suppose I give you a sequence of real numbers $x_n$, and tell you that $\lim_n x_n = \infty$. What can you tell me about $x_{100}$? How about $x_{1,000,000}$?
Goldbach's Conjecture and Coding Length
Goldbach's conjecture is that every even integer greater than or equal to four can be written as the sum of two prime numbers. (Try it: $4 = 2 + 2$, $6 = 3 + 3$, $8 = 3 + 5$, $10 = 3 + 7$, $12 = 5 + 7$...)...
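To try the "Try it" step by machine, here is a minimal sketch that searches for a Goldbach decomposition by brute force. The helper names `is_prime` and `goldbach_pair` are made up for this illustration and are not taken from the article.

```python
from math import isqrt


def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def goldbach_pair(n):
    """Return the pair (p, q) of primes with p <= q, p + q == n, and p as
    small as possible, or None if no such pair exists."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None


for n in range(4, 14, 2):
    print(n, goldbach_pair(n))
```

For the even numbers 4 through 12 this reproduces the decompositions listed in the excerpt. A search like this can only check finitely many cases, so it says nothing about whether the conjecture holds in general.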
The Hottest Person in the Group
Suppose you think you're pretty good-looking. In fact, you think you're at about the 90th percentile--you're hotter than 90% of people and not as hot as the other 10%. If I put you with another nine...
Memory Locality and Python Objects
I've been obsessed with sorting over the last few weeks as I write a Python C-extension implementing a lazily-sorted list. Among the many algorithms that this lazily-sorted list implements is...
Visualizing the James-Stein Estimator
In the words of one of my professors, "Stein's Paradox may very well be the most significant result in Mathematical Statistics since World War II." The problem is this: You observe $X_1, \ldots, X_n...
Markov Chain Implications Graph
I've been studying for quals for the last several weeks. Today, I was reviewing basic Markov Chain theory, and decided to understand it by drawing a graph of various statements about Markov Chains....
Martingale Implications Graph
Here's another directed graph of statement implications that I used to study for quals. This one is about convergence of stochastic processes with martingales. Like my Markov Chain Implications Graph,...
The Zero Times Infinity Problem
There are two ways to keep yourself safe while rock climbing: The first option is to protect yourself carefully with ropes and gear, so that if you fall you won't fall too far or hard. The second...
The Ten Best Ideas in Statistics
I've been studying Statistics for six years now, seriously for the last four years, and as my main focus for the last three. Now that I've finished the core PhD curriculum at Stanford, I've spent some...
The LaTex Numbers
Let's define the LaTex numbers to be the set of all real numbers that can be unambiguously expressed with the LaTex type system. This set of numbers has a few fun properties, not least of which, as...
How I Got 2x Speedup with One Line of Code
If you had asked me whether or not it was possible to get a 2x speedup for my LazySorted project by adding a single line of code, I would have told you "No way, substantial speedups can really only...
Visualizing K-Means Clustering
Suppose you plotted the screen width and height of all the devices accessing this website. You'd probably find that the points form three clumps: one clump with small dimensions, (smartphones), one...
How to Solve Problems
I spend a lot of my time solving problems: I solve well-defined math problems in grad school, open-ended problems in statistical consulting, physical puzzles when I'm rock climbing, architectural...
A Statistical Analysis of Climbing
Recently, 28 of us on the Stanford Climbing Team completed a short survey on our climbing abilities. Although the survey was intended to assess our interest in different clinics, the answers to the...
Don't Double Major
Double majoring in college is a very suboptimal strategy. The reason is simple: It adds a substantial set of constraints to the courses you can take, but in return gives you only a very modest extra...
A College Waitlisting Model
Suppose a selective college wants $N_0$ students in their freshman class. How many students should they admit, and what's the distribution of the number of students they'll admit off the waitlist? Of...
College Interview Tips
The college admissions interview is a valuable component of college applications because it provides admissions officers with a holistic evaluation from a source that has no vested interest in your...
Visualizing Lasso Polytope Geometry
Some recent research about the lasso exploits a beautiful geometric picture: Suppose you fix the design matrix X and the regularization parameter $\lambda$. For a particular value of y, the...
Sensitivity of Independence Assumptions
Recently I was considering an interesting problem: Several people interview a potential job candidate, and each of them scores that candidate numerically on some scale. What's the variation associated...
Python Subclass Relationships Aren't Transitive
Subclass relationships are not transitive in Python. That is, if A is a subclass of B, and B is a subclass of C, it is not necessarily true that A is a subclass of C. The reason for this is that with...
Robust Machine Learning
Real data often has incorrect values in it. Origins of incorrect data include programmer errors, ("oops, we're double counting!"), surprise API changes, (a function used to return proportions, suddenly...
T-Tests Aren't Monotonic
R. A. Fisher and Karl Pearson play a heated round of golf. Being Statisticians, they agree before the round to run a two-sided paired T-test to see if either of them is statistically significantly...
How to Forge an Email
Most people don't realize how easy it is to forge an email. Say my brother John Doe uses the email address john.doe@example.com. If I get an email from that address, it's natural to assume that John...
Half the Decimal Trick
If something happened 1,234 out of 10,000 times, we'd estimate that the true probability of occurrence is about 0.1234. Of course, we wouldn't expect the true probability to be exactly 0.1234, and to...
You Can't Predict Small Counts
A small restaurant is interested in predicting how many customers will come in on a given night. This is valuable information to know ahead of time, for example, so that the restaurant can figure out...
Visualizing DBSCAN Clustering
A previous post covered clustering with the k-means algorithm. In this post, we consider a fundamentally different, density-based approach called DBSCAN. In contrast to k-means, which modeled clusters...
Machine Learning over JSON
Supervised machine learning is the problem of approximating functions X -> Y from many example (x, y) pairs. Now, the vast majority of supervised learning algorithms assume that X is p-dimensional...
OHMS Lessons Learned
Note: I found the following post as an almost complete draft as I was reading some of my unpublished posts. I wrote it around October 1st, 2013, at the beginning of what would end up being my last year...
Desperation Motivated Creativity
I am not the strongest climber. Some of the people I've climbed with are so strong that they can do a one-arm pull-up, and then--while locking off with one arm--sing the "Head, Shoulders, Knees and...
Why I'm Making Tauthon
For the past two months I've been spending half my time on Tauthon. Tauthon is a backwards-compatible Python interpreter that runs Python 2 code and C-extensions exactly as-is, while also allowing...
An Easy Chess Puzzle
I was looking through Markovian, my old chess engine, recently, and came across the first game it won against another chess engine. Stepping through the game, it seems that both engines actually played...
Continuous Time Lending
Assume a borrower takes out an installment loan of size $1$ and makes continuous-time payments on it. The installment loan starts at time $0$, ends at time $T$, and has an interest rate of $r$,...
Day-to-Day Operations of Palo Alto
Palo Alto runs a pretty open city government, with a number of interesting documents available for download on their website. Of particular interest are their annual budgets and annual financial...
Implementing "nonlocal" in Tauthon: Part I
Tauthon is a fork of Python 2.7 with syntax, builtins, and libraries backported from Python 3. It aspires to be able to run all valid Python 2 and 3 code. In this article, I begin discussing how I was...
Nontrivial: Exception Handling in Python
Most code can fail in multiple places and for multiple reasons. Handling these failures seems pretty trivial, something you'd cover in the basic tutorial to your programming language. Actually, I think...
Logistic Regression Isn't Interpretable
Suppose two events A and B are independent, with the odds of A occurring being 4, and the odds of B being 5. What are the odds of both A and B occurring? I'll give you a hint: it's not 20.
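A minimal sketch of the arithmetic behind that hint, assuming the usual definition of odds as $p / (1 - p)$. The helper names `odds_to_prob` and `prob_to_odds` are made up for this illustration.

```python
from fractions import Fraction


def odds_to_prob(odds):
    """Convert odds o = p / (1 - p) back to the probability p."""
    return odds / (1 + odds)


def prob_to_odds(p):
    """Convert a probability p to odds p / (1 - p)."""
    return p / (1 - p)


odds_a, odds_b = Fraction(4), Fraction(5)

# Independence lets us multiply probabilities, not odds.
p_both = odds_to_prob(odds_a) * odds_to_prob(odds_b)
odds_both = prob_to_odds(p_both)

print(p_both)     # 2/3
print(odds_both)  # 2
```

Here $P(A) = 4/5$ and $P(B) = 5/6$, so $P(A \cap B) = 2/3$ and the odds of both occurring are 2, well short of the naive product of 20.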
The Communication Loss Function
My first ever task writing software professionally was to make some small change to the Kaggle server. I spent a day or so painstakingly moving down the call stack from the API endpoint to...
Hypothesis Tests for Machine Learning
Statisticians have spent a lot of time attempting to do complicated inference for various machine learning models. In fact, there's an enormously simple and naive way to do this in complete generality:...
Style for Python Multiline If-Statements
PEP 8 gives a number of acceptable ways of handling multiple line if-statements in Python. But to be honest, most of the styles I've seen--even those that conform with the PEP--seem ugly and hard to...
Map Transformation
Note: I was describing my map transformation project to Sasha Trubetskoy recently, who is even more into maps than I am. This is a project I completed in March 2011 with Shir Yehoshua for a CS class,...
Why I've Liked Chess So Much
I've liked chess off and on for twenty years. I don't remember learning to play but I do remember an after school class and losing repeatedly to my dad and cousin when I was little. When I taught...
Gross Receipts Taxes as Payroll Taxes
A common critique of gross receipts taxes (where the government charges a small fixed percent of a business's total revenue) is that they have a disproportionate impact on low margin businesses. (For...