I am trying to understand how the in operator and the index() method of the list data structure work internally.
When I say:
if something not in some_list:
    print("do something")
Does it traverse the whole list internally, like a for loop, or does it use a better approach such as a hash table?
Also, index() on a list raises an error if the item is not present. Do in and index() work the same way internally? If index() is better, is it possible to catch the error when the item is not present, and if so, is that good programming practice?
Solution
Good question! Yes, both in and index() necessarily iterate over the list. Python does not use hash tables for lists because list elements are not required to be hashable.
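On the part of the question about catching the error: list.index() raises ValueError when the item is absent, and catching it is ordinary Python style. Here is a minimal sketch; the helper name find_position is hypothetical and exists only for illustration.

```python
def find_position(items, target):
    """Return the index of target in items, or None if it is absent."""
    try:
        # list.index scans from the front, just like `in` does.
        return items.index(target)
    except ValueError:
        # Raised by list.index when target is not in the list.
        return None


some_list = ["spam", "eggs", "ham"]

print("potato" not in some_list)            # True
print(find_position(some_list, "eggs"))     # 1
print(find_position(some_list, "potato"))   # None
```

Use in when you only need a yes/no answer, and index() when you also need the position. Checking with in first and then calling index() would scan the list twice, so the try/except form is usually the better fit when the position is needed.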
If you know about "Big O" notation, the list data structure is designed for O(1) access by looking up a known index, e.g. my_list[13]. It is O(n) for membership testing.
There are other data structures which are optimised for O(1) average-case membership testing (i.e. __contains__), namely set and dict. These are implemented with hash tables, so their elements (or keys) must be hashable.
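If many membership tests are run against the same data, one option is to build a set (or a dict mapping item to position) once and query that instead. This is a sketch under the assumption that every element is hashable; the variable names seen and positions are illustrative.

```python
some_list = ["spam", "eggs", "ham", "spam"]

# One O(n) pass to build the set; later lookups are O(1) on average.
seen = set(some_list)
print("potato" not in seen)      # True

# A dict can stand in for repeated index() calls.
# setdefault keeps the first position of each item, matching list.index.
positions = {}
for i, item in enumerate(some_list):
    positions.setdefault(item, i)

print(positions.get("spam"))     # 0
print(positions.get("potato"))   # None
```

Building the set or dict costs a full pass over the list, so this only pays off when the container is queried more than a handful of times.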
Here is an example of how you can use IPython to verify the time-complexity of sets and lists, to confirm these claims:
In [1]: short_list, long_list = list(range(1000)), list(range(10000))
In [2]: timeit 'potato' not in short_list
10000 loops, best of 3: 40.9 µs per loop
In [3]: timeit 'potato' not in long_list
1000 loops, best of 3: 440 µs per loop
In [4]: small_set, big_set = set(short_list), set(long_list)
In [5]: timeit 'potato' not in small_set
10000000 loops, best of 3: 72.9 ns per loop
In [6]: timeit 'potato' not in big_set
10000000 loops, best of 3: 84.5 ns per loop
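In the session above, the list timing grows roughly tenfold when the list is ten times longer, while the set timing stays nearly flat. The same comparison can be reproduced outside IPython with the standard timeit module. This is a rough sketch: the helper time_membership and the loop count are illustrative choices, and the absolute numbers will vary by machine and Python version.

```python
import timeit


def time_membership(container, number=200):
    """Return the best-of-3 time in seconds for one `not in` test."""
    timings = timeit.repeat(
        lambda: "potato" not in container,
        number=number,
        repeat=3,
    )
    return min(timings) / number


# list(range(...)) gives a real list on both Python 2 and Python 3.
short_list, long_list = list(range(1000)), list(range(10000))
small_set, big_set = set(short_list), set(long_list)

results = {
    "short_list": time_membership(short_list),
    "long_list": time_membership(long_list),
    "small_set": time_membership(small_set),
    "big_set": time_membership(big_set),
}

for name in ("short_list", "long_list", "small_set", "big_set"):
    # Report in microseconds per membership test.
    print("%-10s %10.3f us per test" % (name, results[name] * 1e6))
```

The lambda adds a small constant call overhead to every measurement, so the set figures will look slower than in the IPython output, but the scaling pattern should be the same.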