A common kind of filesystem script walks a directory hierarchy rooted at a given path, opens each file, and does something with its contents. For example, let's write a program that prints a list of all Python modules (files with a .py extension) in a tree that contain shebang lines.
Here's the script using good old os.path:
import sys
import os

def os_shebangs(pathname):
    for dirpath, dirnames, filenames in os.walk(pathname):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if (fullpath.endswith(".py") and
                    file(fullpath, "rb").readline().startswith("#!")):
                yield fullpath

def os_show_shebangs(pathname):
    for path in os_shebangs(pathname):
        sys.stdout.write("%s: %s\n" % (
            path,
            file(path, "rb").readline()[2:].strip()))

if __name__ == '__main__':
    os_show_shebangs(sys.argv[1])
Pretty normal-looking Python code; not too much wrong with it. At 20 lines and 596 characters long, it's not too complex.
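For readers on Python 3, where the "file" builtin no longer exists, a rough equivalent might look like the sketch below. It is an adaptation rather than the original script: it assumes files are read as bytes (so the shebang check compares against b"#!"), it decodes the interpreter line as UTF-8 for printing, and it skips the command-line entry point when no argument is given.

```python
import os
import sys


def os_shebangs(pathname):
    # Walk the tree and yield every .py file whose first line is a shebang.
    for dirpath, dirnames, filenames in os.walk(pathname):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if not fullpath.endswith(".py"):
                continue
            with open(fullpath, "rb") as f:
                if f.readline().startswith(b"#!"):
                    yield fullpath


def os_show_shebangs(pathname):
    # Print "path: interpreter" for each file found above.
    for path in os_shebangs(pathname):
        with open(path, "rb") as f:
            # Assumption: decode as UTF-8, replacing undecodable bytes.
            interpreter = f.readline()[2:].strip().decode("utf-8", "replace")
        sys.stdout.write("%s: %s\n" % (path, interpreter))


if __name__ == "__main__" and len(sys.argv) > 1:
    os_show_shebangs(sys.argv[1])
```

The with-blocks close each file promptly, which the original one-liners leave to the garbage collector.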
Now let's have a look at a similarly idiomatic version using FilePath:
import sys
from twisted.python.filepath import FilePath

def shebangs(path):
    for p in path.walk():
        if (p.basename().endswith(".py") and
                p.open().readline().startswith("#!")):
            yield p

def showShebangs(pathobj):
    for path in shebangs(pathobj):
        sys.stdout.write("%s: %s\n" % (
            path.path,
            path.open().readline()[2:].strip()))

if __name__ == '__main__':
    showShebangs(FilePath(sys.argv[1]))
At 18 lines and 471 characters, it's almost exactly 20% smaller than the version that uses os.path. However, a small space savings is hardly the most interesting property of this code. The advantages over the version that uses os.path:
- It's easier to test. You can use a fake FilePath object rather than needing to replace the whole "os" module and the "file" builtin.
- It's easier to read. You need fewer names; rather than os, os.path, and builtins, the code talks mainly to one object.
- It's easier to write. How many of you honestly remembered that "dirpath, dirnames, filenames" is the order of the tuples yielded from os.walk?
- It's easier to secure. If you wanted to allow untrusted users to supply input to the os.path version, you would need to be very, very careful. What about "/"? What about ".."? With FilePath, you simply pass the input to the 'child' method, and...
>>> from twisted.python.filepath import FilePath
>>> fp = FilePath(".")
>>> x = fp.child("okay")
>>> y = fp.child("..")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "twisted/python/filepath.py", line 308, in child
    raise InsecurePath("%r is not a child of %s" % (newpath, self.path))
twisted.python.filepath.InsecurePath: '/home' is not a child of /home/glyph
>>> z = fp.child("hello/world")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "twisted/python/filepath.py", line 305, in child
    raise InsecurePath("%r contains one or more directory separators" % (path,))
twisted.python.filepath.InsecurePath: 'hello/world' contains one or more directory separators
- It's easier to extend. As of revision 22464 of Twisted (i.e. the next release) you can replace twisted.python.filepath.FilePath with twisted.python.zippath.ZipArchive, and this exact same code can operate on zip files.
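To make the testing point in the list above concrete, here is a minimal sketch of what such a fake might look like. FakePath is a hypothetical stand-in written for this example, not part of Twisted; it implements only the handful of things the script uses (walk, basename, open, and the path attribute). The sketch assumes Python 3, so the shebang check compares bytes.

```python
import io


class FakePath:
    """Hypothetical in-memory stand-in for the few FilePath methods used here."""

    def __init__(self, path, content=b"", children=()):
        self.path = path
        self._content = content
        self._children = list(children)

    def basename(self):
        # Last segment of the (always "/"-separated) fake path.
        return self.path.rsplit("/", 1)[-1]

    def open(self):
        # Return a fresh file-like object over the stored bytes.
        return io.BytesIO(self._content)

    def walk(self):
        # Yield this node first, then every descendant, depth first.
        yield self
        for child in self._children:
            yield from child.walk()


def shebangs(path):
    # Same logic as the FilePath version, comparing bytes instead of str.
    for p in path.walk():
        if (p.basename().endswith(".py") and
                p.open().readline().startswith(b"#!")):
            yield p


# A tiny fake tree: one script with a shebang, one module without,
# and one non-Python file that happens to start with "#!".
tree = FakePath("/proj", children=[
    FakePath("/proj/run.py", b"#!/usr/bin/env python\nprint('hi')\n"),
    FakePath("/proj/lib.py", b"import os\n"),
    FakePath("/proj/notes.txt", b"#!/bin/sh\n"),
])

found = [p.path for p in shebangs(tree)]
assert found == ["/proj/run.py"]
```

Nothing here touches the disk or patches any module; the test simply hands shebangs() an object with the right shape.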
4 comments:
I've been thinking about sharing the goodness that is FilePath for a few months -- now I don't have to :-) Using it has saved me sooo much time...
Submit a patch! :) This sounds straightforward / well specified enough I'll even commit to a review... (However, something that lets you actually modify the iteration would be better than something that took static strings.)
As far as getting into the stdlib - agitate on python-dev. I'll help you do any necessary coding if you can do the legwork to get everyone to agree that it's desirable (as opposed to one of the 30 "OO" filesystem wrappers that people have written for the stdlib, or nothing at all). I don't have the energy for that.
I don't have the energy to agitate on python-dev either nor do I have the required diplomacy skills ;) Anyhow, here's the patch:
http://twistedmatrix.com/trac/attachment/ticket/3044/filepath.py.diff#preview
This kind of thing makes me sad.
There are lots of people who could benefit from twisted.python.filepath, and there are comparable packages which twisted could use in order to gain the benefit without the cost of maintaining the package (at least one of which is being considered for inclusion in the Python Standard Library), but it isn't going to happen -- non-Twisted-requiring projects aren't going to benefit from twisted.python.filepath, and Twisted isn't going to benefit from those other packages, because Twisted doesn't use good packaging technology so that it can use other people's code and other people can use its code in an easy, manageable way.
Frankly, suggesting that people could copy a few source files is the kiss of death for the prospect of that code being re-used by other people.
Twisted is falling behind because of this. Please ponder the postscript to this page:
http://www.kieranholland.com/code/documentation/nevow-stan/
This guy says, as I interpret: Nevow is technically better, but Django is the future because it makes it easy for people to re-use components in isolation. The same could be said of many of the Twisted and Divmod offerings.