Sunday 30 March 2014

Sorting and Efficiency

     When writing code, there are many factors to consider about the data one is working with. One important thing to consider is whether the order matters. When storing a list of items, the most convenient data type to use is usually Python's built-in list. If, however, one wants specific keys instead of just indexes, a dictionary comes in handy. This is also an efficient way to store data, except the items are unordered. This is where sorting comes in.
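For instance (with a made-up dictionary just for illustration), sorted() can impose an order on a dict whenever one is needed:

```python
# A dictionary of grades (hypothetical data, purely for illustration).
grades = {"carol": 81, "alice": 92, "bob": 75}

# The dict itself has no guaranteed order, but sorted() works on any
# iterable; applied to a dict it returns the keys in sorted order.
for name in sorted(grades):
    print(name, grades[name])
```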

     There are lots of times when one needs to sort the data in a list in order for the data to be useful. Python's built-in sort, Timsort, is a hybrid of merge sort and insertion sort, and runs in O(n log n) time. This is the optimal efficiency for a comparison-based sorting algorithm, and it is often achieved by repeatedly splitting the list in half (causing the log n), with a linear amount of work at each level of splitting (causing the n). This type of behaviour is displayed in both quick sort and merge sort; both are O(n log n) on average, but merge sort takes O(n) extra storage to split the lists and then merge them back together, while quick sort can partition the list in place.

     There are many different sorting algorithms used in Python, some better than others. The worst efficiencies to have are O(n^n) and O(n!) (and of the two, O(n^n) grows faster). The hierarchy as listed in lecture is as follows:


O(1) <= O(lgn) <= O(n) <= O(n^2) <= O(n^3) <= O(2^n) <= O(n^n)

    The worst sorting algorithm (in terms of efficiency) we've studied is bubble sort, which is O(n^2). Bubble sort starts at the beginning of the list, compares adjacent items, and bubbles an item up the list until it reaches an item bigger than it. Eventually it bubbles the largest item to the end of the list, and after enough passes the list is sorted. Although selection and insertion sort are a little bit faster in practice, they are also O(n^2): their algorithms involve n - 1 passes, each making up to n comparisons. However, the exact amount of work varies from list to list. For example, since selection sort makes fewer exchanges, on the list below bubble sort makes 20 exchanges while selection sort only makes 8 (http://interactivepython.org/courselib/static/pythonds/SortSearch/sorting.html):

[54,26,93,17,77,31,44,55,20]
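To see this concretely, here is a rough sketch (my own versions in the style of the textbook's, not its exact code) that counts the exchanges each algorithm makes on that list:

```python
def bubble_sort_swaps(lst):
    """Bubble sort a copy of lst; return the number of exchanges."""
    a, swaps = lst[:], 0
    for passnum in range(len(a) - 1, 0, -1):
        for i in range(passnum):
            if a[i] > a[i + 1]:          # swap every adjacent inversion
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
    return swaps

def selection_sort_swaps(lst):
    """Selection sort a copy of lst; one exchange per pass."""
    a, swaps = lst[:], 0
    for fill in range(len(a) - 1, 0, -1):
        max_pos = 0
        for i in range(1, fill + 1):     # find the largest unsorted item
            if a[i] > a[max_pos]:
                max_pos = i
        a[fill], a[max_pos] = a[max_pos], a[fill]
        swaps += 1
    return swaps

data = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(bubble_sort_swaps(data))     # 20
print(selection_sort_swaps(data))  # 8
```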

The other approach to sorting is often called "divide and conquer". This involves repeatedly dividing the list in half (log n splits) and sorting it by either recursively merging the sorted halves back together (merge sort) or using a pivot and partition process (quick sort). These sorting algorithms perform at O(n log n) even for large values of n. Since the recursion only goes about log2(n) levels deep, Python's maximum recursion depth (1000 by default) is not a practical limit here; it would only matter for a recursion that went one level per element. The fastest sort in practice, as stated at the beginning, is Python's built-in Timsort.
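A minimal sketch of the merge sort half of that idea (my own illustration, not the lecture code):

```python
def merge_sort(lst):
    """Return a new sorted list using divide and conquer."""
    if len(lst) <= 1:                  # base case: already sorted
        return lst
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])       # the log n levels of splitting...
    right = merge_sort(lst[mid:])
    merged = []                        # ...with a linear merge per level
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([54, 26, 93, 17, 77, 31, 44, 55, 20]))
```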

If there's one thing I've learned from this course, it's to always stick to the built-ins. Even if you think you can write something that works really fast and it seems to work, chances are there is something you didn't consider and your code is too specialized. This is definitely applicable to sorting algorithms, as Python's sort is both efficient and general because it works well in all cases. Although understanding the behaviour of sorting algorithms and how to achieve n log n was difficult at first, once I learned that the divide and conquer method gets its log n factor from the repeated halving, it became simpler to understand. Sorting is a very important thing to understand when using Python, and can be much more complex than someone who doesn't study computer science would expect.

Thursday 20 March 2014

Epiphany!

As I was going through the slogs of some of my classmates, I realized I'm not the only person who feels this way:

A lot of times with complex examples of recursion I will end up writing out the whole code, testing it, and not even fully understanding what I've written until after the fact. Not necessarily through trial and error, but just by writing code for less complex cases based on what I think works, and then realizing it works for examples that are more complex than I can even process. I often find myself having to correct my expected output in docstrings for complicated test cases. As phrased ever so accurately on c3reiner's slog: http://c3reiner.blogspot.ca/ under the post about recursion woes: "it's interesting to see that something I programmed seems to understand how to solve the problem better than I do."

A2

This post is not really about any theories or coding methods we learned in class; it's mostly about the stress of assignment 2. Firstly, when going through part 1, I clearly read the instructions wrong and feverishly began developing a series of classes whose __eq__ methods were for the string matching of regexes. After realizing my mistake, I was able to finish part 1, just in time for the extended deadline. Then I was greeted with Exercise 3, which was also no walk in the park. Finally I was able to create a successful submission. Then, bombarded with midterms and other work, I began to tackle part 2. After being stuck on is_regex for a couple of days, I rediscovered the magic of helper functions.

The take-home message from my struggle through this assignment was that it really helps to write things out and break them up into parts before writing any code. Sometimes thinking about the task at hand before coding seems too abstract, so what I like to do is start writing the code and think through comments. And then once I get stuck, that's when I pull out the paper and pen and really break down what the problem is, and why I can't get to where I want to be. That's how I tackled finding the index of the root of regex strings. Once I realized I had no idea how to do this on my own, off to the help centre I went! haha

Saturday 8 March 2014

LinkedList

Linked lists can be thought about in two ways: as a list consisting of an item and the rest of the list, or as objects which each reference the next object in the list. They introduce a new way of creating, viewing, and mutating a list. For example, in a traditional list you can use methods such as pop or append to mutate the end of the list. For linked lists, however, there's no pop method and one can't simply get rid of an item. In a method called "delete" from lecture, we discovered how to do something similar to pop. When comparing linked lists with a Python list, the delete method is more like reassigning a list. For example:

>>> lst = [1, 2, 3]
>>> lst
[1, 2, 3]
>>> lst = lst[1:]
>>> lst
[2, 3]

Since we used the name "lst" for both lists, lst now points to the new list [2, 3].

In a similar way, to delete the first item of a LinkedList, we make its attribute front point to the second item instead of the first, and set size = size - 1. So, similar to the new lst, we don't know where the first item went, but it doesn't matter.
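A minimal sketch of that idea (the names front and size follow the lecture; the node details here are my own assumptions, not the lecture code):

```python
class LinkedList:
    """A bare-bones linked list with a front reference and a size."""

    class _Node:
        """One item plus a reference to the next node."""
        def __init__(self, value, next_=None):
            self.value = value
            self.next_ = next_

    def __init__(self, *values):
        self.front, self.size = None, 0
        for v in reversed(values):           # prepend so order is kept
            self.front = LinkedList._Node(v, self.front)
            self.size += 1

    def delete_front(self):
        """Like lst = lst[1:]: point front at the second node."""
        self.front = self.front.next_
        self.size -= 1

lst = LinkedList(1, 2, 3)
lst.delete_front()
print(lst.front.value, lst.size)   # 2 2
```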

So far, linked lists are interesting but I think they are overcomplicated lists and can't see the benefit of them yet. Hopefully after working with them more my opinion will change.

Sunday 16 February 2014

Recursion

The key to writing recursive code is understanding that recursion is "defining something in terms of itself." So the best way to approach recursive code is by finding the base case of what you're trying to do and writing code for that. From then on, the trick is to have the function call itself on smaller and smaller inputs until it reaches the base case; each call does a small piece of the work and trusts the recursive call to handle the rest, with the base case being where the chain of calls finally stops.
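A tiny example of that pattern (factorial, which wasn't from lecture, just the standard illustration):

```python
def factorial(n):
    """Base case first, then the function calling itself on a
    smaller input until the base case is reached."""
    if n == 0:                        # base case: answer known directly
        return 1
    return n * factorial(n - 1)       # recursive step toward the base case

print(factorial(5))  # 120
```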

Throughout the weeks of learning recursion it seemed simple enough. Tracing through code was tedious, but eventually a pattern could be found that convinced me the code was just repeating the base case multiple times. But actually writing recursive code was a whole different ball game!

While doing A1, and reading through the notes on recursion, I learned that there are two ways of viewing recursion: the low-level operational view, and the high-level view. For example, step 5 of the assignment, writing a recursive method of solving the Tour of Anne Hoy game for n cheeses on 4 stools, can be represented by both views.
The low-level view can be represented by breaking the problem into steps, thereby eliminating recursion. This would involve writing code for moving 1 cheese, then calling that to move 2 cheeses in a new function, then calling the 2-cheese function to move 3 cheeses, etc. This method would be impossible to write for an arbitrary number of cheeses, though. So the objective was to use the high-level view, which involves "taking a leap of faith." This is where you write code for 1 cheese and convince yourself it works for 1 cheese, within the main function "tour_of_four_stools". Then you can call this function within itself for any number of cheeses larger than 1. For n cheeses, by calling this function within itself on n - 1 whenever n > 1, eventually n = 1, and at that point the code you wrote for n = 1 is executed. This happens repeatedly until all n cheeses are moved, and the puzzle is solved!
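The four-stool version from the assignment is more involved, but the leap-of-faith pattern shows up clearly in the simpler three-stool version, sketched here (my own illustration, not the assignment code):

```python
def move_cheeses(n, source, intermediate, destination):
    """Return the list of moves to shift n cheeses from source to
    destination on three stools, trusting the recursive calls."""
    if n == 1:
        # base case: the one move the code ever directly performs
        return [(source, destination)]
    # leap of faith: assume the n - 1 case works, and build on it
    return (move_cheeses(n - 1, source, destination, intermediate)
            + move_cheeses(1, source, intermediate, destination)
            + move_cheeses(n - 1, intermediate, source, destination))

for move in move_cheeses(3, "stool 1", "stool 2", "stool 3"):
    print("move top cheese from {} to {}".format(*move))  # 7 moves total
```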

Although recursion seems complicated it is actually a very effective and useful way of simplifying a situation into smaller parts and makes solving coding problems much easier.

Monday 3 February 2014

Import Statements

After spending an entire week buried in my psychology textbook, cramming 4 weeks of work into 6 days, I realized on Wednesday night that Exercise 2 was due on Thursday... good thing I realized when I did. Anyways, I took a break from psych and tackled the assignment. I finished it pretty quickly and felt confident about my code, but after looking at the autotester results the next day, I was in a state of panic.

I had no clue what was wrong with the code, so I went to office hours for guidance. In my attempt to finish the exercise as fast as possible, I missed out on important details in the handout, the key one being to import e2a using the form from e2a import xyz. I had no idea the format of an import statement had such a big impact on code, not to mention I'd never seen that form before. That form also comes with other small changes, such as no longer having to call functions from e2a as e2a.x; they can just be called x.
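The same contrast, shown with a standard-library module instead of e2a:

```python
import math              # binds the module; functions need the prefix
print(math.sqrt(16))     # 4.0

from math import sqrt    # binds the name directly; no prefix needed
print(sqrt(16))          # 4.0
```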

If only I had learned all these useful coding strategies before the deadline....
lesson: never leave things until the last minute...speaking of which I should probably start A1 hahaha

Object Oriented Programming continued...

Regarding my question from last week: after re-analyzing the code we looked at in lecture, I realized the problem I had was more with exceptions than with classes. In a try/except statement, when an exception is raised in the try clause, the except clauses are checked in the order they're written. So if a certain exception is a subclass of another exception, it can get caught by the superclass's except clause first, and Python stops checking the remaining except clauses. For example, in the code below the SpecialException would be caught as an Exception, and the clause that catches it as a SpecialException would never be reached.

class SpecialException(Exception):
    pass

if __name__ == "__main__":
    try:
        raise SpecialException
    except Exception as e:
        print("caught as Exception")
    except SpecialException as se:
        print("caught as SpecialException")

This is why it's very important to keep in mind the order of except clauses and write them from most specific to most general (where specific refers to a subclass and general to a superclass).
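With the except clauses reordered, the more specific handler runs first:

```python
class SpecialException(Exception):
    pass

if __name__ == "__main__":
    try:
        raise SpecialException
    except SpecialException as se:   # most specific (subclass) first
        print("caught as SpecialException")
    except Exception as e:           # most general (superclass) last
        print("caught as Exception")
```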