The Strange Math That Predicts (Almost) Anything - Markov Chains

Brief Summary

This video explores the history and applications of Markov chains, starting with a math feud in Russia between Andrey Markov and Pavel Nekrasov. Markov developed the concept of Markov chains to challenge Nekrasov's views on probability and free will. The video then transitions to the Manhattan Project, where Markov chains were used in the Monte Carlo method to simulate neutron behavior in nuclear fission. Later, it discusses how Google's PageRank algorithm used Markov chains to revolutionize search engine technology. The video also touches on predictive text and the memoryless property of Markov chains, ending with a discussion on how many shuffles are needed to randomize a deck of cards.

  • Andrey Markov created Markov chains to disprove Pavel Nekrasov's views on probability and free will.
  • Markov chains were used in the Manhattan Project's Monte Carlo simulations.
  • Google's PageRank algorithm used Markov chains to revolutionize search engine technology.

The Law of Large Numbers

In the early 1900s, Russia was politically divided between Tsarists and socialists, which influenced even mathematicians like Pavel Nekrasov and Andrey Markov. Nekrasov, a religious man, argued that the law of large numbers implied free will by suggesting that consistent statistical averages in social behaviors indicated independent decisions. Markov, an atheist, criticized this view, arguing that math should not be linked to free will or religion. The law of large numbers, proven by Jacob Bernoulli, states that as the number of independent trials increases, the average outcome approaches the expected value.
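
Stated precisely (a standard formulation, not quoted from the video): if X₁, X₂, … are independent, identically distributed trials with expected value μ, the weak law says the sample mean converges in probability to μ:

```latex
% Weak law of large numbers for i.i.d. trials with E[X_i] = mu:
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\lim_{n\to\infty} \Pr\!\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0
\quad \text{for every } \varepsilon > 0.
\]
```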

What is a Markov Chain?

Markov challenged Nekrasov by demonstrating that dependent events can also obey the law of large numbers. He analyzed Alexander Pushkin's verse novel "Eugene Onegin", counting how vowels and consonants follow one another to show that successive letters in a text are dependent. From these counts he built a simple predictive machine with two states (vowel and consonant) and transition probabilities between them. Text generated by this machine, despite the dependence between letters, still converged to the expected ratio of vowels to consonants, refuting Nekrasov's claim that convergence implies independence and hence free will. This work introduced the Markov chain, which extends probability calculations to dependent events.
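
The effect is easy to reproduce. Below is a minimal Python sketch of a two-state vowel/consonant chain; the transition probabilities are illustrative stand-ins, not Markov's measured values:

```python
import random

# Illustrative two-state Markov chain over {vowel, consonant}.
# These transition probabilities are made up for demonstration;
# Markov estimated the real values by tallying letters of the text by hand.
P = {
    "vowel":     {"vowel": 0.13, "consonant": 0.87},
    "consonant": {"vowel": 0.66, "consonant": 0.34},
}

def simulate(steps, start="consonant", seed=0):
    rng = random.Random(seed)
    state, vowels = start, 0
    for _ in range(steps):
        # The next state depends only on the current one (the Markov property).
        state = "vowel" if rng.random() < P[state]["vowel"] else "consonant"
        vowels += state == "vowel"
    return vowels / steps

# Despite the dependence between consecutive letters, the running vowel
# fraction still settles toward a fixed long-run value, just as the law
# of large numbers predicts for independent trials.
for n in (100, 1_000, 100_000):
    print(n, simulate(n))
```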

Ulam and Solitaire

Markov's breakthrough was not immediately recognized for its practical applications. That changed during the Manhattan Project. While recovering from encephalitis, mathematician Stanislaw Ulam passed the time playing Solitaire and began to wonder about the probability of winning a randomly shuffled game. The number of possible arrangements (52 factorial, roughly 8 × 10^67) made an analytical answer hopeless, so Ulam conceived of approximating it instead: play hundreds of games and count the wins.
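
In code, Ulam's idea is just sampling. A hedged sketch follows, where play_random_game is a hypothetical stand-in for actually dealing and playing out a game:

```python
import random

def play_random_game(rng):
    # Hypothetical stand-in for dealing and playing one Solitaire game;
    # here it just flips a biased coin so the sketch is runnable.
    return rng.random() < 0.2  # assumed win rate, purely illustrative

def estimate_win_probability(trials=10_000, seed=0):
    rng = random.Random(seed)
    wins = sum(play_random_game(rng) for _ in range(trials))
    # By the law of large numbers, wins/trials converges to the true
    # win probability as the number of sampled games grows.
    return wins / trials

print(estimate_win_probability())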

Nuclear Fission

Upon returning to Los Alamos, Ulam realized his Solitaire approach could be applied to simulating neutron behavior in a nuclear core. He shared this idea with John von Neumann, who recognized its potential but noted that neutron behavior is dependent, unlike the independent games of Solitaire. Von Neumann realized that Markov chains were needed to model the chain of events where each step influences the next.

The Monte Carlo Method

Ulam and von Neumann created a simplified Markov chain to model neutron behavior, with states representing a neutron traveling through the core. At each step the neutron could scatter, be absorbed, or cause fission, each with an associated probability. They ran this chain repeatedly on the ENIAC, the first general-purpose electronic computer, tracking the average number of new neutrons produced per starting neutron (the multiplication factor k). The approach let them approximate the solutions of differential equations too complex to solve analytically, and it was named the Monte Carlo method after the casino where Ulam's uncle liked to gamble.
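
A toy version of such a simulation is sketched below; the probabilities and the 2.5 neutrons-per-fission figure are invented assumptions, not real cross-section data:

```python
import random

# A toy neutron history as a Markov chain. At each step in the core a
# neutron either scatters (keeps going), is absorbed (history ends), or
# causes fission (history ends, releasing new neutrons).
P_SCATTER, P_ABSORB, P_FISSION = 0.6, 0.3, 0.1   # illustrative values
NEUTRONS_PER_FISSION = 2.5                        # illustrative value

def neutron_history(rng):
    """Follow one neutron until it is absorbed or causes fission.
    Returns the number of new neutrons it produces."""
    while True:
        r = rng.random()
        if r < P_SCATTER:
            continue                      # scatter: take another step
        elif r < P_SCATTER + P_ABSORB:
            return 0.0                    # absorbed: no new neutrons
        else:
            return NEUTRONS_PER_FISSION   # fission: release new neutrons

def estimate_k(histories=100_000, seed=0):
    rng = random.Random(seed)
    total = sum(neutron_history(rng) for _ in range(histories))
    return total / histories   # average new neutrons per starting neutron

k = estimate_k()
print(f"k = {k:.3f} ->", "supercritical" if k > 1 else "subcritical")
```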

The first search engines

The Monte Carlo method quickly spread to other labs and was used to study nuclear reactor designs. In 1993, the World Wide Web was opened to the public, leading to an explosion of new web pages and a need for effective search engines. Early search engines like Yahoo, founded by Jerry Yang and David Filo, ranked pages by keyword frequency, which was easy to manipulate. Masayoshi Son invested heavily in Yahoo, betting that marketing, rather than superior technology, would determine the leading search engine.

Google is born

Yahoo's keyword-based ranking was vulnerable to manipulation, so a better way to rank pages by relevance and quality was needed. Sergey Brin and Larry Page at Stanford developed PageRank, an algorithm that treats each link to a page as an endorsement. They modeled the web as a Markov chain in which web pages are states and links are transitions, then simulated a random surfer clicking from page to page; the long-run fraction of time the surfer spends on a page measures its importance. To keep the surfer from getting trapped in dead ends or closed loops, they added a damping factor: with some probability the surfer jumps to a random page instead of following a link, ensuring every part of the web is explored.
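
The long-run distribution of that chain can be computed by power iteration. Here is a minimal sketch on an invented four-page web; the 0.85 damping factor is the value commonly cited for PageRank:

```python
# A minimal PageRank sketch via power iteration on a tiny made-up web.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],   # nobody links to D, so it should rank lowest
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}                 # surfer starts anywhere
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new[target] += share                   # each link passes on rank
        rank = new
    return rank

for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```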

How does predictive text work?

Google's PageRank produced markedly better search results, driving its rise as the dominant search engine. Brin and Page's project, initially called BackRub for its analysis of backlinks, launched as Google in 1998; the name is a misspelling of "googol" (10^100), reflecting their ambition to index all the pages on the internet. Google's success is rooted in its focus and in the effectiveness of its Markov chain-based algorithm. Decades earlier, Claude Shannon had extended Markov's ideas by using letter and word statistics to generate text, and the same principle is the basis of the predictive text algorithms used today.
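
In Shannon's spirit, a word-level predictor can be built by counting which words follow which. A minimal sketch with a throwaway training text:

```python
import random
from collections import defaultdict

# A word-level Markov chain: the next word is drawn based only on the
# current word. The training text here is a toy example.
text = ("the cat sat on the mat and the dog sat on the rug "
        "and the cat saw the dog").split()

# Count how often each word follows each other word.
successors = defaultdict(list)
for current, following in zip(text, text[1:]):
    successors[current].append(following)

def generate(start, length=8, seed=1):
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length):
        if word not in successors:
            break                                 # dead end: no known successor
        word = rng.choice(successors[word])       # sample the next word
        output.append(word)
    return " ".join(output)

print(generate("the"))
```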

Are Markov chains memoryless?

A Markov chain is memoryless: the probability of the next state depends only on the current state, not on the path taken to reach it. Modern large language models go beyond this, using attention mechanisms to weigh the importance of many previous tokens, which improves prediction accuracy. A concern, however, is that as AI-generated text feeds back into training data, models may drift toward repetitive, dull language. Systems with feedback loops, such as global warming, are also hard to capture with Markov chains because their future depends on accumulated history, not just the current state. Despite these limitations, Markov chains remain powerful tools for simplifying complex systems and making meaningful predictions from the current state alone.

How to perfectly shuffle a deck of cards

The video concludes by answering the question of how many shuffles are needed to randomize a deck of cards: seven riffle shuffles produce a nearly random arrangement (a simulation sketch follows below). The video then promotes Brilliant, a learning app offering interactive lessons and challenges in math, physics, programming, and AI. Brilliant helps users understand the math behind problems such as card shuffling, and offers a 30-day free trial and 20% off its annual premium subscription.
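
For readers who want to experiment, the seven-shuffle figure can be explored empirically. The sketch below simulates an imperfect riffle using the standard Gilbert-Shannon-Reeds model; it is a simplified illustration, not code from the video:

```python
import random

# One riffle shuffle under the Gilbert-Shannon-Reeds model: cut the deck
# at a binomially distributed point, then interleave by dropping cards
# from each half with probability proportional to its remaining size.
def riffle(deck, rng):
    cut = sum(rng.random() < 0.5 for _ in deck)   # binomial cut point
    left, right = deck[:cut], deck[cut:]
    shuffled = []
    while left or right:
        if rng.random() < len(left) / (len(left) + len(right)):
            shuffled.append(left.pop(0))
        else:
            shuffled.append(right.pop(0))
    return shuffled

rng = random.Random(0)
deck = list(range(52))
for _ in range(7):      # seven riffles: close to a uniformly random order
    deck = riffle(deck, rng)
print(deck[:10])
```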
