GSoC 2014 Application Shipra Banga Building the New Assumptions Module - sympy/sympy GitHub Wiki

Introducing Myself

I am Shipra Banga and I am pursuing Computer Science Engineering in International Institute Of Information Technology Hyderabad. I am currently in my Sophomore year.

Email : [email protected] , IRC nick : shippy

Github : shiprabanga

Programming Experience

I am about to finish my second year in IIIT-Hyderabad . Coding and development on versatile platforms constitutes the major half of my programming experience . Something I looked into beyond this was Image Processing and Language Processing.
I have quite a lot of experience with python and its multi-faceted libraries. A few I worked with and found interesting are kivy,matplotlib,sympy,numpy,nltk,etc . I have previous experience with SVN and am in the process of getting acquainted with all the features git has to offer.

My Project Idea

The assumptions module is a core part of the CAS system. It basically performs two features . One being that given an expression it records the properties of that expression . Also , it help in inferring results out of the known facts. For instance, given x is positive we can infer that 3*x + 2 will also be positive. Sympy alread has an assumptions module in existence , however this module needs to be wiped out due to the following issues:

Limited expressiveness :

For instance, It's impossible to write a handler that says "x is zero or y is zero, but we don't know which one when x*y is 0".

Hard to extend
Tightly coupled to the core
Complicated working
Complicated organisation structure

Hence there is a need to wipe out the old assumptions system and bring in the new one. Work is already in process on the new assump module that aims to make the entire computation easy,clean and easier to understand. This new-assump module can be extended to include a vast arena of facts enhancing the functionality of this module. Also, there is a need to organise the work in a way to minimise coupling and overheads. In the past , there have been various attempts on wiping out the existing old assumptions system . I plan to use my free time in the summer in finally completing this major work.

Contribution

Contribution To New-Assumptions Module

Merged

New Fact added Real*Imaginary = Imaginary

All Args are real implies that multiplication and addition of args is also real

Tentative Schedule

First Part : Solving Performance Issues

week 1 : Restructuring the code in the form of modules for better understanding

week 2 : Organisation of the code to reduce coupling with core

week 3+4 : Organisation of the fact registry and how its accessed in order to reduce time - overhead

week 5+6 :Implement a caching system

Second Part : Adding Facts

week 7 : Implementation of Odd_no_of_Args() and Even_no_of_Args()

week 8 : Work on exponential of imaginary bases

( Relational queries if time permits)

Third Part : Making the final switch

week 10 : Getting out redundant code in old assumptions module which is not referred

week 11 + 12 + 13 : Refactoring the dependent modules to work on the new-assump module step-by-step

The End

week 14 : Documentation and Evaluation

Implementation Details :

The main aim of this project is to build the new-assump module , replace the old assumptions module with this without affecting the functionality of the system.

The complete replacements of the existent assumptions module in three months time, I realise , is a mammoth task . But I will aim towards this and would continue work on this even after the GSOC period comes to an end. Firstly , I would like to explain how the current new-assump system works and then progress to implementing essential changes in it.

The new-assump ( work in progress at #2508) module currently uses a system of predicate and args to evaluate the predicate on the arguments and return True or False . This makes computation easier and also makes extending it possible.

Generally how this works is from the entire set of facts that are known , by forward chaining on facts we obtain more facts. For instance, X is an integer >> X is real

The current implementation basically includes three major parts :

1.The newask System : Getting all relevant facts from the fact registry

def newask(proposition, assumptions=True, context=global_assumptions,
    use_known_facts=True, iterations=oo):
    relevant_facts = get_all_relevant_facts(proposition, assumptions, context,
        use_known_facts=use_known_facts, iterations=iterations)

    can_be_true = satisfiable(And(proposition, assumptions,
        relevant_facts, *context))
    can_be_false = satisfiable(And(~proposition, assumptions,
        relevant_facts, *context))

    if can_be_true and can_be_false:
        return None

    if can_be_true and not can_be_false:
        return True

    if not can_be_true and can_be_false:
        return False

    if not can_be_true and not can_be_false:
        raise ValueError("Inconsistent assumptions")

2.The Handler :

 arg = _sympify(arg)
        predicates = arg.atoms(Predicate)
        applied_predicates = arg.atoms(AppliedPredicate)
        if predicates and applied_predicates:
            raise ValueError("arg must be either completely free or singly applied")
        if not applied_predicates:
            obj = BooleanFunction.__new__(cls, arg)
            obj.pred = arg
            obj.expr = None
            return obj
        predicate_args = set([pred.args[0] for pred in applied_predicates])
        if len(predicate_args) > 1:
            raise ValueError("The AppliedPredicates in arg must be applied to a single expression.")
        obj = BooleanFunction.__new__(cls, arg)
        obj.expr = predicate_args.pop()
        obj.pred = arg.xreplace(Transform(lambda e: e.func, lambda e:
            isinstance(e, AppliedPredicate)))
        applied = obj.apply()
        if applied is None:
            return obj
        return applied

    def apply(self):
        return

3.The Fact Registry :

(Add, Implies(AllArgs(Q.positive), Q.positive)),
    (Mul, Implies(AllArgs(Q.positive), Q.positive)),
    (Mul, Implies(AllArgs(Q.commutative), Q.commutative)),
    (Mul, Implies(AllArgs(Q.real), Q.commutative)),
    # This one can still be made easier to read. I think we need basic pattern
    # matching, so that we can just write Equivalent(Q.zero(x**y), Q.zero(x) & Q.positive(y))
    (Pow, CustomLambda(lambda power: Equivalent(Q.zero(power), Q.zero(power.base) & Q.positive(power.exp)))),
    (Integer, CheckIsPrime(Q.prime)),
    # Implicitly assumes Mul has more than one arg
    # Would be AllArgs(Q.prime | Q.composite) except 1 is composite
    (Mul, Implies(AllArgs(Q.prime), ~Q.prime)),
    # More advanced prime assumptions will require inequalities, as 1 provides
    # a corner case.
    (Mul, Implies(AllArgs(Q.imaginary | Q.real), Implies(ExactlyOneArg(Q.imaginary), Q.imaginary))),
    (Mul, Implies(AllArgs(Q.real), Q.real)),
    (Add, Implies(AllArgs(Q.real), Q.real))

So the newask system obtains the query and calls for getting the relevant facts and evaluate the query.
The handler handles the queries in the form of predicate and args.
The fact registry is used to get all relevant facts

Issues with This Implementation :

Organisation
Scalability
Limited to a certain number of facts right now.

I will hope to solve all these issues in my project .

First Part : Improving the Scalability of The Module And Organisational Aspects

Organisation in Form Of Modules : The current code is scattered and all the handlers for every functionality are defined together in one single code . The first essential step is to organise the code in the form of modules in the best way possible . My thought on this was that basically the modules in the existent system can be differentiated on a major quality :
A few like sets,ntheory are highly coupled with the core module
Some like matrices are not dependent on the core module

So this major difference can form the basis for organisation . The plan would be to group all the modules that call on core together and keep the independent modules seperate.

Calling The Fact Registry :

The current new-assump system is based on building a fact registry that is called on extensively when querying. A major method to let down the time in solving queries would be to implement this fact registry in an organised fashion. For instance, my query is

True```

This query should call on the facts related with Mul and Q.positive . This would highly reduce the time taken to evaluate a query and increase scalability.
 
* Caching System :

There should be a way to memorize the query results of all the recent queries in the assumptions modules so that when a query is repeatedly asked , the cycle of forward chaining does not happen again and again , and instead the cache is directly referred to for providing the query solution.
So, a highly integral and complicated part of the work would be to come up with a local efficient caching system and looking at the possibility of extending it to other modules.

### Second Part : Thinking About Adding New and Interesting Facts

After the organisational bit is done ,  the part corresponding to implementing interesting and challenging facts comes to mind. On a few of them work is already in progress . Some of them are :

* The existing code base implements this :
`Implies(AllArgs(Q.imaginary | Q.real), Implies(ExactlyOneArg(Q.imaginary), Q.imaginary))`
 
We need something of the form : 

`Implies(AllArgs(Q.imaginary | Q.real), Implies(OddArgs(Q.imaginary), Q.imaginary))` 

For this OddArgs needs to be implemented which basically tells us that the no. of a=imaginary args in the expression are odd or even .For this , one way would be to use xor and look for all the possible CNF forms we get or use solvers for this . 

* Another fact which interested me on which work is also going on is exponential of imaginary bases.Work is already in progress on this fact . There are several issues and also ambiguities which sprouted out when working on this fact . One of them is the ambiguity on the whole '0' aspect : Is it imaginary or real? I can look to solving all these problems if time permits.

* Again, if time permits , another fact which is to be implemented are the relational facts.

`>>> newask(Q.greater_than(x,2),Q.positive(x))`

`True`

### Third Part : Making The Final Switch
I am dubious as to the fact if I will be able to finish the switch in my GSOC period but I will continue my work even after GSOC gets over . Making the final switch is a mammoth task and needs work on a lot of modules in the sympy repository so that they become compatible with the new-assump patch.
So, in order to address this mammoth task, lets start small!!

* There are many aspects of the old assumptions module that aren't being used . The first step could be to clean up all the redundant code that is there.
* Also, simple replacements can be done in the entire code one by one instead of making the switch in one big step.

I believe , step by step , if an issue is addressed any huge challenging task can be finished and thats what I plan to do .

# Extended Research 

Another thing that I plan on is researching on better methods to replace the current methods of knowledge base representation and querying structures. I looked into the Mathematical Knowledge Management Literature and read papers on Interactive Theorem Proving , Deductions and Ontological Engineering . One suggested model that can be implemented :

* Representation :  The knowledge base can be depicted in the form of simple conceptual graphs that depict relations between the entities involved.
  For instance, 

  A ----> Between----->B&C--->Belongs_to---->StraightLine D

  Here A,B,C,D are entities and Between and Belongsto are the relationships.So our facts can be represented in this form.

* Query Answering : Query answering involves  verifying whether an assertion is true in a particular situation. Within the conceptual graph model, this can be done  by taking  the projection of a graph representing a query on a graph representing a particular geometric scene.
For instance here, If the Query is whether 3 points that are aligned exist , we can make a conceptual graph to represent this query and take a projection of this on our initial model.

This is a very general and a theoretical idea and I will work hard on implementing this and integrating it with the current working after discussing it with my mentor.

 
# References  
1. Tom Bachmann's Proposal

(https://github.com/sympy/sympy/wiki/GSoC-2013-Application-Tom-Bachmann%3A-Removing-the-old-assumptions-module)

2. Work going on at #2508

(https://github.com/sympy/sympy/pull/2508)

3. The old assumptions module

(https://github.com/sympy/sympy/tree/master/sympy/assumptions)

# Disclaimers

I will devote my full time and energy in order to accomplish what I propose here.I will continue my work on my project even after GSOC ends in order to complete what I proposed here if I am not able to finish the task .