Sourceforge Update

Authenticate downloaded binaries from sourceforge a little more.

When I wrote my previous post about Sourceforge, things were looking pretty grim for the site; I (rightly, I think) slammed them for some pretty atrocious security practices.

I invited the SourceForge ops team to get in touch about it, and, to their credit, they did. Even better, they didn't ask for me to take down the article, or post some excuse; they said that they knew there were problems and they were working on a long-term plan to address them.

This week I received an update from said ops, saying:

We have converted many of our mirrors over to HTTPS and are actively working on the rest + gathering new ones. The converted ones happen to be our larger mirrors and are prioritized.

We have added support for HTTPS on the project web. New projects will automatically start using it. Old projects can switch over at their convenience as some of them may need to adjust it to properly work. More info here:

https://sourceforge.net/blog/introducing-https-for-project-websites/

Coincidentally, right after I received this email, I installed a macOS update, which means I needed to go back to Sourceforge to grab an update to my boot manager. This time, I didn't have to do any weird tricks to authenticate my download: the HTTPS project page took me to an HTTPS download page, which redirected me to an HTTPS mirror. Success!

(It sounds like there might still be some non-HTTPS mirrors in rotation right now, but I haven't seen one yet in my testing; for now, keep an eye out for that, just in case.)

If you host a project on Sourceforge, please go push that big "Switch to HTTPS" button. And thanks very much to the ops team at Sourceforge for taking these problems seriously and doing the hard work of upgrading their users' security.

Don’t Trust Sourceforge, Ever

Authenticate downloaded binaries from sourceforge. A little.

Update: please see my more recent post about updates in the interim.

If you use a computer and you use the Internet, chances are you’ll eventually find some software that, for whatever reason, is still hosted on Sourceforge. In case you’re not familiar with it, Sourceforge is a publicly-available malware vector that also sometimes contains useful open source binary downloads, especially for Windows.


In addition to injecting malware into their downloads (a practice they claim, hopefully truthfully, to have stopped), Sourceforge also presents an initial download page over HTTPS, then redirects the user to HTTP for the download itself, snatching defeat from the jaws of victory. This is fantastically irresponsible, especially for a site offering un-sandboxed binaries for download, especially in the era of Let’s Encrypt where getting a TLS certificate takes approximately thirty seconds and exactly zero dollars.

So: if you can possibly find your downloads anywhere else, go there.


But, rarely, you will find yourself at the mercy of whatever responsible stewards1 are still operating Sourceforge if you want to get access to some useful software. As it happens, there is a loophole that will let you authenticate the binaries that you download from them so you won’t be left vulnerable to an evil barista: their “file release system”, the thing you use to upload your projects, will allow you to download other projects as well.

To use it, first, make yourself a sourceforge account. You may need to create a dummy project as well. Sourceforge maintains an HTTPS-accessible list of key fingerprints for all the SSH servers that they operate, so you can verify the public key below.

Then you’ll need to connect to their upload server over SFTP, and go to the path /home/frs/project/<the project’s name>/.../ to get the file.

I have written a little Python script2 that automates the translation of a Sourceforge file-browser download URL, one that you can get if you right-click on a download in the “files” section of a project’s website, and runs the relevant scp command to retrieve the file for you. This isn’t on PyPI or anything, and I’m not putting any effort into polishing it further; the best possible outcome of this blog post is that it immediately stops being necessary.


  1. Are you one of those people? I would prefer to be lauding your legacy of decades of valuable contributions to the open source community instead of ridiculing your dangerous incompetence, but repeated bug reports and support emails have gone unanswered. Please get in touch so we can discuss this. 

  2. Code:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    #!/usr/bin/env python2
    
    import sys
    import os
    
    sfuri = sys.argv[1]
    
    # for example,
    # http://sourceforge.net/projects/refind/files/0.9.2/refind-bin-0.9.2.zip/download
    
    import re
    matched = re.match(
        r"https://sourceforge.net/projects/(.*)/files/(.*)/download",
        sfuri
    )
    
    if not matched:
        sys.stderr.write("Not a SourceForge download link.\n")
        sys.exit(1)
    
    project, path = matched.groups()
    
    sftppath = "/home/frs/project/{project}/{path}".format(project=project, path=path)
    
    def knows_about_web_sf_net():
        with open(
                os.path.expanduser("~/.ssh/known_hosts"), "rb"
        ) as read_known_hosts:
            data = read_known_hosts.read().split("\n")
            for line in data:
                if 'web.sourceforge.net' in line.split()[0]:
                    return True
        return False
    
    sfkey = """
    web.sourceforge.net ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2uifHZbNexw6cXbyg1JnzDitL5VhYs0E65Hk/tLAPmcmm5GuiGeUoI/B0eUSNFsbqzwgwrttjnzKMKiGLN5CWVmlN1IXGGAfLYsQwK6wAu7kYFzkqP4jcwc5Jr9UPRpJdYIK733tSEmzab4qc5Oq8izKQKIaxXNe7FgmL15HjSpatFt9w/ot/CHS78FUAr3j3RwekHCm/jhPeqhlMAgC+jUgNJbFt3DlhDaRMa0NYamVzmX8D47rtmBbEDU3ld6AezWBPUR5Lh7ODOwlfVI58NAf/aYNlmvl2TZiauBCTa7OPYSyXJnIPbQXg6YQlDknNCr0K769EjeIlAfY87Z4tw==
    """
    
    if not knows_about_web_sf_net():
        with open(
                os.path.expanduser("~/.ssh/known_hosts"), "ab"
        ) as append_known_hosts:
            append_known_hosts.write(sfkey)
    cmd = "scp web.sourceforge.net:{sftppath} .".format(sftppath=sftppath)
    print(cmd)
    os.system(cmd)
    

Far too many things can stop the BLOB

Local mutable filesystem usage is a scalability problem.

It occurs to me that the lack of a standard, well-supported, memory-efficient interface for BLOBs in multiple programming languages is one of the primary driving factors of poor scalability characteristics of open source SaaS applications.

Applications like Gitlab, Redmine, Trac, Wordpress, and so on, all need to store potentially large files (“attachments”). Frequently, they elect to store these attachments (at least by default) in a dedicated filesystem directory. This leads to a number of tricky concurrency issues, as the filesystem has different (and divorced) concurrency semantics from the backend database, and resides only on the individual API nodes, rather than in the shared namespace of the attached database.

Some databases do support writing to BLOBs like files. Postgres, SQLite, and Oracle do, although it seems MySQL lags behind in this area (although I’d love to be corrected on this front). But many higher-level API bindings for these databases don’t expose support for BLOBs in an efficient way.

Directly using the filesystem, as opposed to a backing service, breaks the “expected” scaling behavior of the front-end portion of a web application. Using an object store, like Cloud Files or S3, is a good option to achieve high scalability for public-facing applications, but that creates additional deployment complexity.

So, as both a plea to others and a note to myself: if you’re writing a database-backed application that needs to store some data, please consider making “store it in the database as BLOBs” an option. And if your particular database client library doesn’t support it, consider filing a bug.

Ungineering

Don’t use the word “engineering” to refer to the process of creating software.

I am not an engineer.

I am a computer programmer. I am a software developer. I am a software author. I am a coder.

I program computers. I develop software. I write software. I code.

I’d prefer that you not refer to me as an engineer, but this is not an essay about how I’m going to heap scorn upon you if you do so. Sometimes, I myself slip and use the word “engineering” to refer to this activity that I perform. Sometimes I use the word “engineer” to refer to myself or my peers. It is, sadly, fairly conventional to refer to us as “engineers”, and avoiding this term in a context where it’s what everyone else uses is a constant challenge.

Nevertheless, I do not “engineer” software. Neither do you, because nobody has ever known enough about the process of creating software to “engineer” it.

According to dictionary.com, “engineering” is:

“the art or science of making practical application of the knowledge of pure sciences, as physics or chemistry, as in the construction of engines, bridges, buildings, mines, ships, and chemical plants.”

When writing software, we typically do not apply “knowledge of pure sciences”. Very little science is germane to the practical creation of software, and the places where it is relevant (firmware for hard disks, for example, or analytics for physical sensors) are highly rarified. The one thing that we might sometimes use called “science”, i.e. computer science, is a subdiscipline of mathematics, and not a science at all. Even computer science, though, is hardly ever brought to bear - if you’re a working programmer, what was the last project where you had to submit formal algorithmic analysis for any component of your system?

Wikipedia has a heaping helping of criticism of the terminology behind software engineering, but rather than focusing on that, let's see where Wikipedia tells us software engineering comes from in the first place:

The discipline of software engineering was created to address poor quality of software, get projects exceeding time and budget under control, and ensure that software is built systematically, rigorously, measurably, on time, on budget, and within specification. Engineering already addresses all these issues, hence the same principles used in engineering can be applied to software.

Most software projects fail; as of 2009, 44% are late, over budget, or out of specification, and an additional 24% are cancelled entirely. Only a third of projects succeed according to those criteria of being under budget, within specification, and complete.

What would that look like if another engineering discipline had that sort of hit rate? Consider civil engineering. Would you want to live in a city where almost a quarter of all the buildings were simply abandoned half-constructed, or fell down during construction? Where almost half of the buildings were missing floors, had rents in the millions of dollars, or both?

My point is not that the software industry is awful. It certainly can be, at times, but it’s not nearly as grim as the metaphor of civil engineering might suggest. Consider this: despite the statistics above, is using a computer today really like wandering through a crumbling city where a collapsing building might kill you at any moment? No! The social and economic costs of these “failures” is far lower than most process consultants would have you believe. In fact, the cause of many such “failures” is a clumsy, ham-fisted attempt to apply engineering-style budgetary and schedule constraints to a process that looks nothing whatsoever like engineering. I have to use scare quotes around “failure” because many of these projects classified as failed have actually delivered significant value. For example, if the initial specification for a project is overambitious due to lack of information about the difficulty of the tasks involved, for example – an extremely common problem at the beginning of a software project – that would still be a failure according to the metric of “within specification”, but it’s a problem with the specification and not the software.

Certain missteps notwithstanding, most of the progress in software development process improvement in the last couple of decades has been in acknowledging that it can’t really be planned very far in advance. Software vendors now have to constantly present works in progress to their customers, because the longer they go without doing that there is an increasing risk that the software will not meet the somewhat arbitrary goals for being “finished”, and may never be presented to customers at all.

The idea that we should not call ourselves “engineers” is not a new one. It is a minority view, but I’m in good company in that minority. Edsger W. Dijkstra points out that software presents what he calls “radical novelty” - it is too different from all the other types of things that have come before to try to construct it by analogy to those things.

One of the ways in which writing software is different from engineering is the matter of raw materials. Skyscrapers and bridges are made of steel and concrete, but software is made out of feelings. Physical construction projects can be made predictable because the part where creative people are creating the designs - the part of that process most analagous to software - is a small fraction of the time required to create the artifact itself.

Therefore, in order to create software you have to have an “engineering” process that puts its focus primarily upon the psychological issue of making your raw materials - the brains inside the human beings you have acquired for the purpose of software manufacturing - happy, so that they may be efficiently utilized. This is not a common feature of other engineering disciplines.

The process of managing the author’s feelings is a lot more like what an editor does when “constructing” a novel than what a foreperson does when constructing a bridge. In my mind, that is what we should be studying, and modeling, when trying to construct large and complex software systems.

Consequently, not only am I not an engineer, I do not aspire to be an engineer, either. I do not think that it is worthwhile to aspire to the standards of another entirely disparate profession.

This doesn’t mean we shouldn’t measure things, or have quality standards, or try to agree on best practices. We should, by all means, have these things, but we authors of software should construct them in ways that make sense for the specific details of the software development process.

While we are on the subject of things that we are not, I’m also not a maker. I don’t make things. We don’t talk about “building” novels, or “constructing” music, nor should we talk about “building” and “assembling” software. I like software specifically because of all the ways in which it is not like “making” stuff. Making stuff is messy, and hard, and involves making lots of mistakes.

I love how software is ethereal, and mistakes are cheap and reversible, and I don’t have any desire to make it more physical and permanent. When I hear other developers use this language to talk about software, it makes me think that they envy something about physical stuff, and wish that they were doing some kind of construction or factory-design project instead of making an application.

The way we use language affects the way we think. When we use terms like “engineer” and “builder” to describe ourselves as creators, developers, maintainers, and writers of software, we are defining our role by analogy and in reference to other, dissimilar fields.

Right now, I think I prefer the term “developer”, since the verb develop captures both the incremental creation and ongoing maintenance of software, which is so much a part of any long-term work in the field. The only disadvantage of this term seems to be that people occasionally think I do something with apartment buildings, so I am careful to always put the word “software” first.

If you work on software, whichever particular phrasing you prefer, pick one that really calls to mind what software means to you, and don’t get stuck in a tedious metaphor about building bridges or cars or factories or whatever.

To paraphrase a wise man:

I am developer, and so can you.