The Tale of HTTPDirFS
HTTPDirFS is a FUSE filesystem which allows you to mount a HTTP directory listing. It has a very interesting beginning.
The story starts with a conversation with the admin of the-eye.eu, which is a website containing a lot of questionable content in terms of copyright. Below is the chat log copied from Discord.
fangfufu 07/15/2018
could you enable webdav please?
lol i know i asked this before and got denied
httpfs2 doesn't work
i want to browse your collection
i don't want to have to download everything
or install the server side script for this? https://github.com/cyrus-and/httpfs

-Archivist 07/15/2018
nope
why do you have to be the one in millions that doesn't want to view the site like a normal person

fangfufu 07/15/2018
is there any ways to "stream" the website?
well because i don't want to download the whole website
it would be nice if i can mount it locally

-Archivist 07/15/2018
it's a website, its provided as is, we already do enough extra shit

fangfufu 07/15/2018
ok nvm then, oh well :frowning:
I thought it would be funny to actually write a piece of software that allows me to mount a HTTP directory listing locally, and throw that in the Archivist's face. The project turned out to be fairly difficult, mainly because libfuse is multithreaded, and I got about 40% in the concurrency coursework in Principles of Programming Languages, back when I was an undergraduate at York. I am reliably told by a computer science postdoc that nobody likes dealing with race conditions.
Obviously I wrote in the README of the project that I dedicated the project to the Archivist, and people on Reddit found it funny.
It is kind of crazy how far this project has come - this software is now available on Debian. It is interesting enough to attract a Debian Developer who packaged and uploaded it.
Finally, researchers in Germany have decided to incorporate HTTPDirFS in their research software framework, for importing data. Their publication record so far suggests that the project is primarily used for biomedical research.
I really don't know what to feel or what to say about this one - this project was originally designed to annoy someone on the Internet. It was not meant to be useful or helpful. It feels really strange that some researchers on the Internet are taking it seriously. Because I am in the UEA Triathlon Club, I have a lot of friends who study medicine, and I do enjoy being around them. But I find it highly weird that HTTPDirFS has somehow wound up helping out with biomedical research - when will people who are somehow related to medicine leave me alone? (Only joking of course!) My dad does biomedical research, so I suppose it feels great to indirectly contribute to the field.
So overall, I am not sure if this project has been a success or failure, in the sense of whether it fulfilled its original purpose. I am not sure if the Archivist is annoyed. However I believe I have provided ample entertainment for the Redditors in his own subreddit.
What is certain is that I am really proud of this project - it feels great that people on the Internet take your toy project seriously, especially when it wasn't meant to be serious at all. Using badly learnt knowledge from my undergraduate days in real life brought me great satisfaction. Thank you for teaching me about concurrency, Professor Alan Burns.
Email to Professor Alan Burns
Race conditions might still be in my code, because my code is crappy, and my knowledge is shoddy. I have emailed my undergraduate professor - hopefully he will give me some help. At the very least, I hope he finds my story funny – I know that if someone sent me an email like this, I would love it.
Dear Alan,

I don't know if you remember me. I was the 2nd year undergraduate course rep back in 2012. I am currently a PhD student at the University of East Anglia. I am working on Computer Vision.

Thank you for teaching me concurrent programming in POPL back then. I am afraid I didn't do so great in your coursework - I think I scraped a 40%. However, the things you taught me have proven to be incredibly useful and valuable, because my hobby project depends on them.

I basically wrote a filesystem (https://github.com/fangfufu/httpdirfs), and it is mildly popular. Somebody in Canada decided to package my software and upload it to the Debian repository. Debian is one of the largest Linux distributions. Without the knowledge I gained from your module, I might not have been able to figure out what I was facing.

During the process of writing the cache system for my filesystem, I have hunted down numerous race conditions. It was kind of funny and bizarre when my code ran fine when I forced it to run with one thread, but it failed mysteriously when I decided to run it with multiple threads. I needed multithreading for performance reasons.

I have since identified the critical sections of my code, and guarded them using pthread mutexes. During my own testing, I haven't encountered deadlocks. Unfortunately, my Canadian friend has reported a symptom that sounds like a deadlock. The most annoying thing is that it is not very reproducible.

I am pretty sure that if I persevere, I will eventually figure something out. I am doing a PhD after all. But I just thought my situation is kind of funny, and you might appreciate the irony of my story. I have never thought that my hobby project would depend on knowledge from one of my worst performing modules back in my undergraduate days!

So my question is, is there a way to check for deadlocks by static code analysis? I am sorry, but I do not remember if there was a way to check if my solution to the dining philosophers coursework would cause deadlock, other than running it. It would be nice if you could point me to some reading materials.

Best wishes,
Fufu
Professor Alan Burns's reply
Hi,

Thanks for your 'story', indeed ironic but interesting - I am pleased to have had a possible input.

Deadlock detection is VERY hard - testing will not identify the subtle situations that can lead to this failure.

Two approaches - one use a resource usage protocol that prevents deadlocks (they exists for single processor systems but are not as common for true parallelism) - two, use model checking on a model of your software to 'prove' deadlock free in ALL circumstances. The latter is a powerful but not too easy to apply, especially to code that already exists.

Good luck
Alan
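To make the first suggestion in the reply a bit more concrete: one of the simplest resource usage protocols is a fixed global lock order, where every thread that needs two mutexes always takes them in the same order, so a circular wait cannot form. The sketch below is only an illustration of that idea and is not code from HTTPDirFS. The `Account` type and the `lock_pair`, `transfer` and `run_demo` functions are hypothetical names made up for this example, and ordering by memory address is just one possible choice of global order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared resource, each guarded by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} Account;

/* Take both locks in a fixed global order (lowest address first).
 * Because every thread uses the same order, two threads can never
 * each hold one lock while waiting for the other. */
static void lock_pair(Account *a, Account *b)
{
    assert(a != b); /* locking the same mutex twice would self-deadlock */
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(Account *a, Account *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Critical section touching two resources at once. */
static void transfer(Account *from, Account *to, long amount)
{
    if (from == to)
        return;
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(from, to);
}

typedef struct {
    Account *from;
    Account *to;
    int rounds;
} Job;

static void *worker(void *arg)
{
    Job *job = arg;
    for (int i = 0; i < job->rounds; i++)
        transfer(job->from, job->to, 1);
    return NULL;
}

/* Two threads transfer in opposite directions, which is the classic
 * recipe for deadlock if each simply locked "from" and then "to".
 * Returns the combined balance once both threads have finished. */
long run_demo(int rounds)
{
    Account x, y;
    pthread_mutex_init(&x.lock, NULL);
    pthread_mutex_init(&y.lock, NULL);
    x.balance = 1000;
    y.balance = 1000;

    Job j1 = { &x, &y, rounds };
    Job j2 = { &y, &x, rounds };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &j1);
    pthread_create(&t2, NULL, worker, &j2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    long total = x.balance + y.balance;
    pthread_mutex_destroy(&x.lock);
    pthread_mutex_destroy(&y.lock);
    return total;
}
```

With GCC or Clang this would typically be compiled with `-pthread`. A protocol like this only helps if every code path that takes more than one lock follows it, which is part of why it is hard to retrofit onto code that already exists.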
HTTPDirFS was accepted into the Debian repository by the then DPL himself!
Chris Lamb was the Debian Project Leader in March 2019. In his blogpost, he mentioned httpdirfs-fuse. This is really cool!!! It feels like I unknowingly got an autograph from a Hollywood star!!!