GSoC 2015 Application Lokesh Sharma: Adding PDF's and PMF's

Title

SymPy: Extending SymPy Stats Module

About Me

Basic Information

Name: Lokesh Sharma

University: National Institute of Technology, Hamirpur

Degree: Bachelor of Technology (Undergraduate)

Major: Computer Science and Engineering

Contact Info:

Background and Programming Skills

I am a first-year undergraduate pursuing a Bachelor's degree in Computer Science & Engineering. I am passionate about Artificial Intelligence and Data Science. I love the idea of using computation to solve real-life problems.

I have taken courses that aim to use computers to model real-life scenarios, build simulations, and solve them. I am involved with Mozilla as a Student Ambassador. I do some competitive programming and have been given the role of HackerEarth Campus Ambassador at our university.

I am also a great fan of physics and have spent a great deal of time studying quantum physics and particle physics in some depth. Because of my involvement in physics, I was invited by Adventures Of The Mind to a week-long summit.

Courses Taken/Taking

  • 6.00.1x Introduction to Computer Science and Programming Using Python (from edX)
  • 6.00.2x Introduction to Computational Thinking and Data Science (from edX)
  • CS188.1x Artificial Intelligence (from edX)
  • Mathematics for Computer Science (at my current university)
  • 6.041x Introduction to Probability - The Science of Uncertainty (from edX)

The Idea

The ability to think probabilistically is a fundamental component of scientific literacy. Modelling uncertainty with PDFs and PMFs (probability density functions and probability mass functions) can be quite useful in the study of many areas of physics and mathematics. Many phenomena in physics involve the analysis of probability distributions.

Analyzing such data can be valuable in many ways, and this is where SymPy can help: a platform for manipulating these PDFs and PMFs makes it possible to extract a great deal of useful information.

SymPy already has a 'stats' module for this kind of work. The following features are currently missing, and I plan to implement them:

  1. Finding the PDF of the sum of two random variables from their given PDFs.

  2. Returning marginal PDFs from joint distributions.

  3. Finding the covariance and correlation between two or more given random variables.

  4. Creating derived distributions.

  5. Plotting PDFs and PMFs.
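To make the first item concrete, here is a minimal sketch of the underlying math, written with plain symbolic integration rather than any particular stats API. It assumes two independent exponential variables with a common rate; the symbol names and the helper `exp_pdf` are hypothetical and chosen only for illustration. The PDF of the sum is the convolution of the two PDFs.

```python
from sympy import symbols, exp, integrate, simplify, oo

# Assumed for illustration: a common positive rate `lam`, the value z of the
# sum, and a dummy integration variable t.
lam, z, t = symbols("lambda z t", positive=True)


def exp_pdf(v):
    """PDF of an exponential variable with rate lam, valid for v >= 0."""
    return lam * exp(-lam * v)


# Convolution of the two PDFs. Both variables live on [0, oo), so the
# integrand is non-zero only for 0 <= t <= z.
integrand = simplify(exp_pdf(t) * exp_pdf(z - t))
pdf_sum = simplify(integrate(integrand, (t, 0, z)))

# Sanity check: a valid PDF integrates to one over its support.
total_mass = integrate(pdf_sum, (z, 0, oo))

print(pdf_sum)
print(total_mass)
```

The result is the density of a gamma distribution with shape 2, which is the kind of closed form a dedicated "sum of random variables" feature would be expected to return.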

Motivation

Exponential Distribution

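Suppose the lifetime of a particle follows an exponential distribution with rate parameter λ, given by the following PDF:

$$f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

We might be interested in knowing its CDF, which is obtained by integrating the PDF:

$$F_X(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}, \quad x \ge 0$$

We might also be interested in its expected value or its variance, which are captured by these formulas:

$$E[X] = \frac{1}{\lambda}$$

and

$$\operatorname{Var}(X) = \frac{1}{\lambda^2}$$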
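As a rough illustration of how these quantities can be obtained symbolically, the sketch below uses the existing `sympy.stats` functions `Exponential`, `density`, `cdf`, `E` and `variance`. The rate symbol `lam` and the evaluation point `x` are assumptions made for the example.

```python
from sympy import Symbol, exp, simplify
from sympy.stats import Exponential, density, cdf, E, variance

# Assumed symbols: a positive rate and a positive evaluation point.
lam = Symbol("lambda", positive=True)
x = Symbol("x", positive=True)

# Lifetime of the particle modelled as an exponential random variable.
X = Exponential("X", lam)

pdf_x = density(X)(x)   # probability density evaluated at x
cdf_x = cdf(X)(x)       # cumulative distribution evaluated at x
mean_x = E(X)           # expected lifetime
var_x = variance(X)     # variance of the lifetime

for name, value in [("pdf", pdf_x), ("cdf", cdf_x),
                    ("mean", mean_x), ("variance", var_x)]:
    print(name, "=", simplify(value))
```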

Stick-Breaking Example

Suppose we are given a joint distribution of two random variables X and Y:

$$f_{X,Y}(x, y) = \frac{1}{l\,x}, \quad 0 \le y \le x \le l$$

Suppose we need to find the expected value of Y under this distribution.

We can proceed as follows:

First, we find the marginal PDF of Y from the joint PDF above:

$$f_Y(y) = \int_y^l \frac{1}{l\,x}\,dx = \frac{1}{l}\ln\frac{l}{y}, \quad 0 < y \le l$$

Now that we have the PDF of Y, it is simple to find its expected value:

$$E[Y] = \int_0^l y\,f_Y(y)\,dy = \frac{l}{4}$$

[This example is adapted from the stick-breaking problem. A stick of given length l is equally likely to be broken at any point; call the length of the piece that is kept x. That piece is then broken again at a point chosen uniformly along it, and the length of the piece kept this time is called y. We want the expected value of y, that is, the length left over after the stick has been broken at two randomly selected points, one at a time. The answer is l/4 (1/4 for a stick of unit length), as given by the expected value of the marginal distribution of y calculated above.]
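The stick-breaking calculation can be checked with plain symbolic integration. The sketch below assumes the joint density implied by the description (first break uniform on [0, l], second break uniform on [0, x]); the variable names are illustrative and do not correspond to any proposed API.

```python
from sympy import symbols, integrate, simplify

# Assumed symbols: stick length l, first piece x, second piece y (all positive).
l, x, y = symbols("l x y", positive=True)

# Joint density of (X, Y): X is uniform on [0, l] and, given X = x,
# Y is uniform on [0, x]. It is non-zero only for 0 <= y <= x <= l.
joint = 1 / (l * x)

# Marginal PDF of Y: integrate x out over the region x >= y.
marginal_y = integrate(joint, (x, y, l))

# Expected value of Y from its marginal PDF.
expected_y = simplify(integrate(y * marginal_y, (y, 0, l)))

print(marginal_y)
print(expected_y)
```

A "marginal from joint" feature would automate the middle step, including working out the integration limits from the support of the joint distribution, which is done by hand here.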

Conclusion

Examples like these occur throughout physics and mathematics wherever we need to deal with uncertainty, and having these tools in SymPy can prove very fruitful. Such situations can be modelled computationally in a clean way. Further, the laws that govern PDFs and PMFs are well formulated and nicely captured in mathematical form, and hence easy to implement. Thanks to the existing symbolic machinery in SymPy, the new features can build on it, so much of the work is already accomplished.