thufie's blog

thufie

Somebody at some point will have to expend energy to deal with the slope of the terrain. We are raised looking at flat symbolic maps to help us navigate, with no terrain relief. The slope of the terrain is more real than the map ever was, and while we navigate the confines of the map we constantly expend energy on this hidden cost of the slope.

We never ask ourselves why our legs are tired, or why we were so easily persuaded to work so hard and for so long to afford a vehicle, which eliminates our perception of the cost of navigating the slope and externalizes it to a resource conflict at the end of a “supply chain”. We are given low-variety decision-making in low-frequency and easily controlled bursts as some kind of say over this organizational machinery which works day in and day out selling you the solution to what they refuse to map for you.

Suppose someone then shows you the topographic map, showing the slopes of the terrain hidden from you. You might think, “that's interesting, but how does it help me get around?” Certainly at a personal level, the map of the terrain's slope is not all that actionable. Perhaps you have the luck to be situated at the bottom of a hill relative to your workplace or grocery. Having the map of the slope did not help you gain any advantage from this situation, but maybe you recognize that the grocery truck, or your public transit, saves you work by hauling heavy things up the slope for you so that you can coast home with the weight of a heavy load working to your mechanical advantage. Nothing you did individually impacted that very fortunate labor-freeing state of affairs… Back to looking at your phone perhaps, the topographic map wasn't all that useful after all, need to pull up the GPS route for your work commute and get on your way...

However, suppose you were the person ultimately responsible for placing groceries throughout the city. It would be awfully inefficient to ask everyone about where the groceries ought to be, after all. You sit at your desk, and you look at the same flat symbolic map you've had your whole life. Sure, perhaps it has been updated with new features, you can see the location of the first supermarket location you approved breaking ground a year or two after you first ticked the box on your map. It is the same however, in the sense that it shows only the same types of things the map has ever shown. The mapmaker doing the job of mapping does not often ask themselves the question of what details of the terrain are most important to be mapped, do they? There is a mapping tradition, dating far back in time, with very slow and officially authorized symbolic development. If you are good at your job you might be wise enough to make use of as many of these maps as possible, recognizing that the map is not the territory, and you will never know what crucial detail might be important in determining where to place the next grocery. You have a lot on your plate, personally observing the conditions of every place you would need to survey for advantages to grocery placement for the community is something that is very time consuming! You don't have any time for that. So back to these maps, these very historied maps, these maps without slopes.

What is it that differentiates the person using the map to get their groceries, and the person using the map to place the groceries? What is their relationship? These people don't talk or know each other, so insofar as one impacts the other we can safely assume that is a one-way relation. Which happens first? The person placing the grocery, of course, that is how time works after all, causation and everything. Someone places the grocery using the map so that another person finds it using the map. A very storied, very ideological map.

I'm riding my bike to the grocery today and my legs are very tired. My legs feel this ideological oversight, making me tired. But what can I do, the relation of the grocery to my terrain was decided, the connection between the grocery planner and me was cast in the cement two blocks down from me, and my legs burn from it every day. Maybe there is a world where my legs burning could have impacted where the cement was cast, rather than this tenuous ideological connection the mapmaker has made between the grocery planner and me. Perhaps there could be a map of where my legs burn that is given to the grocery planner? But what if the map symbols its ideology chooses to employ don't capture the information that is truly important to saving my effort? And even then, wouldn't the placement of the grocery be changing that map, impacting my behavior, directing me differently than the way the map captured the effort of my legs beforehand?

Some may tell you that the issue here is that the map is centralized, rather than decentralized, and others still might tell you things like “well, if person X made the decision rather than person Y, everything would have sorted out fairly for everyone”. Reflect for a moment and ask yourself what would have happened if the grocery planner was not only elected fairly, but truly was the best fit for the role. Votes don't tell them how tired your legs are, do they? That does not seem like it would give even the ideal grocery planner any better informational feedback for making the correct decision. When you fill out a piece of paper describing the names of planners rather than the tiredness of your legs, that doesn’t seem like the right type of input. Suppose the job of grocery planning were decentralized, and we all got together virtually somehow, with some way of filtering out all the informational noise that would introduce to the process of (checks notes) placing a grocery store. At the end of the day we would be a bunch of independent grocery planners looking at maps, and possibly not even the same ones! However, none of these scenarios would seem to explain why my legs are so tired from going to the grocery store.

Perhaps we got it all wrong somewhere with this at a fundamental level, to keep failing in the same way, and dishing up solutions that only appear to introduce new organizational problems or rearrange the deck chairs on my sore, sore legs. Perhaps the planning should be play, perhaps the map should be the territory, and perhaps only the people with the relevant tired legs should be the ones deciding where to break ground through a different kind of process. After all, if somebody at some point will have to expend energy to deal with the slope of the terrain, maybe the people who can feel it should have a try at eliminating that feeling through a self-actualizing process. I don't want to propose a structure, if anything the problem is that we are proposing structures but not feelings, resulting in structures mapped by a self-preserving organizational ideology of said structure, and increasingly inhospitable terrain.

Now what if I told you some very smart and very evil people can simulate how your legs feel with hoards of sensory data collected off of you, but rather than addressing the cause of your legs being tired they used this information to price the car you need to get to work at exactly the price point you are willing to desperately dish out for a car to keep yourself from walking/cycling the trip. This, in terms of feedback, is multiple steps above anything I proposed before, and remarkably effective if not evil. There was never a grocery planner, just a powerful car salesman in control of an algorithm. How would you feel about that? Some would have you direct your pain at the algorithm. After all, were it not for the computer, the car salesman’s power over how tired your legs are would not be so all-encompassing. Some would have you direct your pain at the car salesman. His computer is a powerful and flexible tool, fully re-programmable. Maybe we have been outmatched organizationally by a technological instrument for detecting how tired our legs are. Outclassed in the same way bows were outmoded by revolvers and machine guns. What if this algorithm was ours? The algorithm is kind of like the maps used by the grocery planners in the past, but its inputs and outputs are refreshed much faster, allowing for more feedback. It inherits some problems that are ideological, but because it isn’t a mechanical and inflexible machine, but a dynamic and re-programmable one, it could be refitted for another purpose. Perhaps you could choose to dream that the ratchet of technology was a one-way ticket to freedom instead of persistent controls of old powers and modes of thought. Maybe you could dream to design a dominant and powerful freedom no longer relegated to the fringes, cybernetically. These revolvers and machine guns of the new age of organization can be reprogrammed into plowshares, should we try.

(This post is just advertising how to use my new pet project)

Getting the list of peers ready

This depends on their instance api being accessible (and running Mastodon), but if it is, run this command replacing $YOURINSTANCE with your instance's FQDN (and install gron):

gron https://$YOURINSTANCE/api/v1/instance/peers | sed 's|.*"\(.*\)".*|\1|' - | sed 's/^/https\:\/\//; s/$/\//' - | tail -n +2 > peers.txt
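If gron is not available, roughly the same list can be built with tr and sed alone. This is only a sketch: it assumes the peers endpoint returns a flat JSON array of domain strings and that no domain contains a comma or a quote. The function name peers_to_urls is made up for this example.

```shell
#!/bin/sh
# Turn a JSON array of domain strings (read from stdin) into one
# "https://domain/" URL per line, using only tr and sed.
peers_to_urls() {
    # Split the array on commas, then keep whatever sits between the quotes.
    tr ',' '\n' | sed -n 's|.*"\([^"]*\)".*|https://\1/|p'
}

# Example (needs network access; $YOURINSTANCE is your instance's FQDN):
#   curl -s "https://$YOURINSTANCE/api/v1/instance/peers" | peers_to_urls > peers.txt
```

Lines without a quoted string (such as an empty array) produce no output, so no tail step is needed to drop a header line.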

Running Cloudfinder

Then we'll use my new tool, cloudfinder, to detect which instances in the list we've created are using CloudFlare. To do that, you will need to install stack, then git clone the repository and run this in the new directory:

stack install
cloudfinder -f peers.txt > results.txt

Then just analyze the results with, for example:

grep -c " - CloudFlare" results.txt

which would tell you how many instances have CloudFlare!

This guide will help you set up a network namespace which connects over a wireguard connection and gives you a shell in that namespace where you can run anything you need. It is great for when you want to run something through a VPN, but don't want to send all of your internet traffic through the VPN. Keep in mind that this is not a container: whatever you run in your network namespace has access to your filesystem just like anything else would and needs to be run with appropriate permissions. I am assuming you already installed wireguard for this tutorial.

Simple setup:

  1. Add a line to your wireguard config (usually /etc/wireguard/wg0.conf) that says AllowedIPs = 0.0.0.0/0, ::/0 under the [Peer] section if it is not already there. If you are using this for a VPN service, they probably provided you with a wireguard config file to use here.
  2. Run the networked namespace creation script available here as root. Enter the name of your primary network interface at the prompt (you can list your system's network interfaces with ip a on most Linux machines, use one with a LAN IP address).
  3. Start a shell in the namespace using the recommended command from the output of the above script.
  4. Run sudo wg-quick up wg0 in the namespace where wg0 is the name of your wireguard config file you are using.
  5. Your namespace is now connected to the internet over a wireguard connection while your main system is not! In addition, your main system and your namespace can reach each other over the veth interface that was created, at the IPs 10.200.1.1 (main system) and 10.200.1.2 (namespace) respectively. Try pinging between them!
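
For readers curious what a namespace creation script of this kind typically has to do, here is a rough sketch. It is not the linked script: the interface names veth0/veth1, the /24 prefix, the NAT rule, and the helper names run and setup_netns are all assumptions made for illustration. Only the namespace name ns1 and the two addresses come from this guide. It defaults to a dry run that just prints the commands.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 and run as root to actually apply the changes.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_netns NAMESPACE UPLINK_INTERFACE
setup_netns() {
    ns="$1"
    uplink="$2"
    run ip netns add "$ns"                                  # create the namespace
    run ip link add veth0 type veth peer name veth1         # virtual cable with two ends
    run ip link set veth1 netns "$ns"                       # move one end inside
    run ip addr add 10.200.1.1/24 dev veth0                 # main system side
    run ip link set veth0 up
    run ip netns exec "$ns" ip addr add 10.200.1.2/24 dev veth1   # namespace side
    run ip netns exec "$ns" ip link set veth1 up
    run ip netns exec "$ns" ip link set lo up
    run ip netns exec "$ns" ip route add default via 10.200.1.1
    run sysctl -w net.ipv4.ip_forward=1                     # let the host forward packets
    run iptables -t nat -A POSTROUTING -s 10.200.1.0/24 -o "$uplink" -j MASQUERADE
}

# Example: setup_netns ns1 eth0
```

In dry-run mode the function only prints the plan, which makes it easy to compare against whatever the real script does on your system before running anything as root.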

Make a Wireguard Network Namespace systemd service unit for startup and shutdown

  • Download these two scripts for startup and shutdown.
  • Edit the top of both of the scripts and change the values at the top to match your current network interface with internet access and the name of your wireguard config file in /etc/wireguard/. If you are using a VPN provider they should have provided a file to place there or made you run a script which added configs there to choose from.
  • Make a service file called wg-namespace.service in /etc/systemd/system/.
  • Copy the startup and shutdown scripts to /usr/bin and mark each of them as executable using chmod a+x scriptname.sh.
  • (OPTIONAL, avoid if using a VPN config) Run mkdir -p /etc/netns/ns1/ and then echo 'nameserver 51.15.98.97' > /etc/netns/ns1/resolv.conf to use a custom DNS server for your wireguard namespace rather than whatever DNS service the rest of your system uses. Replace 51.15.98.97 with a DNS server of your choosing, but I would point you to the ones here.
  • Edit wg-namespace.service and add the following:
   [Unit]
   Description=Start and stop wireguard namespace
   
   [Service]
   Type=oneshot
   ExecStart=/usr/bin/wgns-start
   ExecStop=/usr/bin/wgns-stop
   RemainAfterExit=yes
   
   [Install]
   WantedBy=multi-user.target
  • To start the service and enable it to run on startup run systemctl enable --now wg-namespace.service.

Now you have a namespace setup like above which preserves itself across reboots. To run commands in that namespace which has all traffic going through wireguard, run ip netns exec ns1 YOUR-COMMAND-HERE. You can run /bin/bash -i to get an interactive bash shell there, too. Keep in mind that some wireguard VPN configs have finicky DNS resolution setups and so if it doesn't resolve domain names but can ping outside IP addresses try restarting the service.

An Example Application: VPN seedbox

Now that all of this is set up I'm going to outline a use-case that some people might find familiar. Say you are running a seedbox for fully gratis GNU/Linux distro torrents, in a country where that might be frowned upon, on a system which already hosts content directly on the public internet. Once the ns1 namespace is set up you can make a new user called torrentrunner to isolate their configuration files and avoid issues. Then install deluge and deluge-web, a torrent daemon with a webui for controlling it. Once that is done you can enable the service for the webui so it runs in the default namespace with sudo systemctl enable deluge-web. Before running the daemon though, you should probably run ip netns exec ns1 links, press “g”, and browse to a site which confirms you are on a VPN (what is my ip address or whatever). If it does not, restart the namespace systemd service unit and try again, because sometimes it can be finicky. Once you are ready, run the daemon and kill it once with the new user to ensure the config files exist: sudo -u torrentrunner deluged and then sudo -u torrentrunner killall deluged. Find the deluge core config file in /home/torrentrunner/.config/deluge/ and set allow remote connections to true. Then you are ready! Start the deluge daemon with the new user in the wireguard namespace by running ip netns exec ns1 sudo -u torrentrunner deluged. Connect to the webui (defaults to port 8112), go to “Connection Manager > Add”, enter 10.200.1.2 for the host, and leave the port as the default. 10.200.1.2 is the IP address of the wireguard namespace running the daemon; it only shows up on your machine and won't be directly on the LAN. Connect to it, and you are all set.

Other use cases could be running a browser or any other networked application through a VPN from the terminal on a desktop Linux machine with ip netns exec ns1 $Application, but keep in mind that VPNs don't offer privacy no matter what they say, and these applications won't be running in a container so they can still fingerprint you. You could also set up a testing client-server setup locally for development purposes with connections going between 10.200.1.1 (regular namespace) and 10.200.1.2 (wireguard namespace).

CloudFlare is a CDN-like service whose perils I have talked about before, including when they got included as the default DNS over HTTPS resolver in Firefox. They act as an intermediary for sites and intercept connecting web traffic, and most importantly they decrypt it before it reaches the actual destination (CloudFlare gets access to all supposedly secure data). They also provide other “services” which conveniently place them in a position where they can intercept private data. So in the context of a supposedly fairly decentralized network like the Fediverse, how much do we really need to worry about “secure” connections eventually passing through CDNs like CloudFlare rather than their actual destinations and revealing private data to these sketchy mega-corporations? This post is a preliminary attempt to find out.

Process of identifying the extent of the problem

I do not currently know a way of getting an easy list of a ton of instances in simple plaintext, so here I will be using all of the instances of the people I follow, which comes out to a total of 271 instances. Mastodon allows exporting a CSV file in the format user@instance.tld,true where true is whether to display their boosts (never knew that was an option). There are multiple users on the same instances, and I also don't want anything but the instance in my file so I can get a list of instances. Here is where my nerdiness shows (if it somehow didn't hit you already). I used two (stock) emacs commands to trim the file down to a deduplicated list of instance URLs, so here they are for reference:

M-x replace-regexp <RET> .*?@\(.*\),true <RET> \1 <RET>
M-x delete-duplicate-lines <RET>
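
For anyone who would rather not open emacs, the same trimming can be sketched in the shell. This assumes the export has exactly one header row and one user@instance.tld,true (or ,false) entry per line. The function name csv_to_instances and the file name following.csv are placeholders for this example.

```shell
#!/bin/sh
# Read a follow export on stdin (header row, then user@instance.tld,true lines)
# and print each instance once, in order of first appearance.
csv_to_instances() {
    # Drop the header, keep only what sits between the "@" and the comma,
    # then let awk print a line only the first time it is seen.
    sed -e '1d' -e 's/^[^@]*@\([^,]*\),.*$/\1/' | awk '!seen[$0]++'
}

# Example: csv_to_instances < following.csv > instances.txt
```

Leaving off the awk stage gives the per-account list instead of the deduplicated one.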

Then after deleting the top line which had the column names the file was ready. I would need a bash script that could take in a bunch of URLs as arguments and check whether they used (were used by) CloudFlare. Well, this is basically just checking for empty results from grep, but results from what? That took a second, but after a little bit I found out that CloudFlare announces itself in the HTTP response headers as Server: cloudflare, so I crafted a curl command that outputs the headers, and simply grepped for “cloudflare” in order to check each instance. To start it using the list from the file, I just ran ./cfinder.sh $(cat instances.txt) (the $(command) syntax executes that command in place so its output can be used as part of a larger command). At least they nicely announce themselves instead of forcing a check against their known IP ranges to identify them.

Here is the bash script which takes any number of domains as its arguments and determines whether they use CloudFlare. It checks for CloudFlare's signature in the headers using curl and grep, then outputs a result and even keeps a total count for you at the end.

The heart of the script is mostly just the below line of code.


curl -sSL -D - -o /dev/null "$i" | grep cloudflare

cfinder.sh script. This script is also very slow and not for use on large datasets. It is just good enough for quick checks.
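
For a sense of how a loop around that curl line could look, here is a minimal sketch. It is not the actual cfinder.sh: the function names, the output wording, and the stricter match on the Server header line are all choices made for this example.

```shell
#!/bin/sh
# Succeeds if the HTTP headers on stdin contain a "Server: cloudflare" line.
# Output is discarded instead of using grep -q so that curl can finish writing.
is_cloudflare() {
    grep -iE '^server:[[:space:]]*cloudflare' > /dev/null
}

# check_all URL... : print a verdict per URL and a total at the end.
check_all() {
    hits=0
    total=0
    for url in "$@"; do
        total=$((total + 1))
        # -D - dumps the response headers to stdout, the body is thrown away.
        if curl -sSL -D - -o /dev/null "$url" | is_cloudflare; then
            hits=$((hits + 1))
            echo "$url - CloudFlare"
        else
            echo "$url - no CloudFlare header seen"
        fi
    done
    echo "$hits/$total use CloudFlare"
}

# Example (needs network access): check_all $(cat instances.txt)
```

Note that an instance which fails to respond is reported the same way as one without the header, so a real run should be read with the curl errors on stderr in mind.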

I added some coloring for terminals that support that as well. The output from my first run was actually quite green overall (only 14/271 positive for CloudFlare disease, ~5%)! But maybe it would be more helpful to weigh the results by user instead, so I did the same thing except I skipped the deduplication step on the input in emacs. This time it took much longer to run (if I had the patience I'd code a better solution to this, but eh, whatever). Of the accounts on my follow list now, 82 out of 1339 are hosted on CloudFlare (~6%). A mere 1% increase would be heartening, if it weren't my follow list, because generally I tend not to follow people who would be on CloudFlare instances to begin with and a few times it has made me reconsider following people or accepting their follow requests.

How do we interpret this? (skip this if you don't like math)

We can analyze the data to look for patterns in terms of how many nodes out of the total need to be compromised for 50% to 100% of a network's connections to be compromised. This is graph theory, so if you aren't familiar then maybe watch a quick numberphile video.

The Fediverse can be modeled as a complete graph (undirected), meaning every node connects to every other node, for simplicity of the math and also since at the moment federating openly is the default behavior. When we ask ourselves how many connections (edges) there are in a graph like this we can use what are called triangle numbers, which should be familiar to computer scientists at least even if not by name. The nth triangle number is n(n+1)/2, but mathematicians have their own special notation which is on the Wikipedia page. You can think of the basic formula as behaving like a factorial, except instead of the terms getting multiplied n! = 1*2*3*...*n the terms are added, n? = 1+2+3+...+n (Knuth suggested the n? notation, but nobody seems to use it). One catch: a node does not connect to itself, so a complete graph with n nodes has (n-1)? = n(n-1)/2 edges, one triangle number down from n?.

There are two types of compromised connections we need to consider: the connections between two compromised nodes and the connections between a compromised node and an uncompromised node. Both are just as bad, so we should add them together to get the number of compromised connections out of the total connections. We get the total connections with the above formula, (n-1)?, where n is the number of nodes (I'm using Knuth's notation or else this will look overwhelming). To get the connections between the compromised nodes and the uncompromised ones, we can denote the number of compromised nodes as c and the number of uncompromised nodes as u and simply multiply them together, c*u. Lastly, to get the connections between the compromised nodes we will use (c-1)? because the compromised nodes are their own complete subgraph of the larger graph. All together, this comes out as ((c-1)?+cu)/(n-1)?, but we can also describe it as ((c-1)?+cu)/(c+u-1)? (removing n) and (c(c-1)/2+cu)/((c+u)(c+u-1)/2) if you hate yourself. The output of these functions is between 0 and 1, with 0 being fully secure and 1 being fully compromised.

To give you an idea of how bad this is, the output of this function for my two runs (unweighted and weighted) comes out around 0.10 and 0.12, roughly double the 5% and 6% from the raw fractions before, since every compromised node also taints its connections to all of the uncompromised ones. But now that we have a function, we can plot it! With a function plot we can judge how much the problem of compromised connections scales as the number of compromised nodes grows. But what would be a good size for our Fediverse as the total number of nodes for our function (we can vary the compromised and set the uncompromised accordingly)? Fediverse.network at the time of writing has 3,128 Mastodon and Pleroma instances listed, so let's go with that.

Before I move on, though, there is an assumption I have to admit I've hidden in the math. If I write a function like this with a constant number of instances, where the number of compromised nodes determines the number of uncompromised ones, the assumption is that going forward pre-existing instances will either switch to using CloudFlare or be replaced by CloudFlare instances rather than the Fediverse getting only new compromised instances. Keep in the back of your mind that because of this the function I am making is a worst-case scenario I am using to demonstrate the very worst number of compromised connections possible for a given number of compromised nodes. So, to conclude, I'm finally reducing the function to one variable so I can plot it. Behold this absolute monstrosity:

f(c) = ((c(c-1))/2+c(3128-c))/(((c+(3128-c))((c+(3128-c))-1))/2)

Now that's what I call a function.

Unfortunately, since this is a function going from x=0 to x=3128 and y=0 to y=1, the proper viewing window would almost be a horizontal line, and since I don't really know how to scale graph plots in KAlgebra (if somebody can plot this in a tool with more flexibility let me know) I will just give you some points. When there are 782 nodes compromised (25% of total), about 44% of connections are compromised, about 75% of connections are compromised at 50% of the nodes, and at 90%, about 99% of connections are compromised. In other words, this is not linear. It may have already been clear to some of you who know how to determine the shape of a function from a formula's structure, but the fraction works out to roughly 1-(u/n)^2, so it climbs fastest while the compromised share is still small: half of all connections are compromised once only about 29% of the instances use CloudFlare. A minority of instances can compromise a majority of connections, so if we are concerned about our security we have to keep that minority small. In reality, there will probably only be new CloudFlare instances which do not replace pre-existing instances alongside more non-CloudFlare instances for each one of those. Okay, I'm done nerding for now, I promise.
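
A closed form like this is easy to get wrong by an off-by-one, so it can help to cross-check by counting directly: in a complete graph without self-connections, a connection is compromised unless both of its endpoints are uncompromised. The sketch below does that count with awk. The function name compromised_fraction is made up for this example and it assumes at least two nodes.

```shell
#!/bin/sh
# compromised_fraction C N
# Share of connections touching at least one compromised node in a complete
# graph of N nodes (no self-connections), C of which are compromised.
compromised_fraction() {
    awk -v c="$1" -v n="$2" 'BEGIN {
        u = n - c                    # uncompromised nodes
        total = n * (n - 1) / 2      # every pair of distinct nodes
        clean = u * (u - 1) / 2      # pairs where both ends are uncompromised
        printf "%.4f\n", (total - clean) / total
    }'
}

# Example: compromised_fraction 782 3128
```

With four nodes and one of them compromised, three of the six connections touch the compromised node, so the function prints 0.5000.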

Closing Thoughts

So it seems like instance to instance connections, overall, are not that susceptible to CloudFlare. But before you consider your private content safe, consider that the Fediverse is not a platform where messages simply go from sender to receiver, they are usually broadcast, as in the case of follower-private posts. With this in mind, the above math and conclusion only apply to private messages, but not much else. Follower posts and other content relying on TLS between instances, which propagate on the back of following lists, are much more likely to end up with CloudFlare (obviously including public/unlisted posts as well). If you post a follower-only post, and have a follower on a CloudFlare instance like mastodon.cloud, then CloudFlare will get your message's content. Until Fediverse software actually enforces the privacy of posts somehow, or people stop federating with instances using CloudFlare, this is a risk. I would urge developers of Fediverse software to take one of two courses for the time being (preferably both, actually):

  1. Let admins of large instances which unfortunately rely on CDNs configure private posts with any kind of sensitive information to go through a separate connection, rather than encouraging users to go elsewhere.
  2. Actually get around to solving the problem of securing activitypub posts to begin with, so we do not have to be stuck with hoping that TLS connections secure them for us.

Also, CloudFlare is not the only large CDN being used on the Fediverse. Others have found that while the Fediverse is very logically decentralized, it is not very physically decentralized. I have not looked into the security arrangement of other hosters, but my fixation with CloudFlare is primarily because I think it is largely unnecessary, and from a security/privacy standpoint it is simply a proxy which decrypts HTTPS connections before they reach their proper destination without users knowing that CloudFlare has access to that data, even if the server itself is self-hosted or on another VPS service. That is not all, and I'd once again refer back to my original and thorough post about CloudFlare, but I think that should be enough to understand my concerns.

As long as CloudFlare has access to the unencrypted connections that our supposedly “private” messages are sent over for anyone sending from or receiving on an affected instance, we have to worry about yet another third party to trust besides the instance administrators at both ends. And unlike administrators, there is no social trust, it is just another corporation with shady governmental research origins. It has a good PR department though, but ultimately brands are just trying to optimize for maximum profit and you should not take anything they say at their word.

So recently on the Fediverse this post has gotten quite a lot of attention.

Firefox admits they will eventually be sending all of your DNS to Cloudflare. Cloudflare will monetize your internet browsing, no matter how much their PR people say they are. If you want to disable that, go to “about:config”, and set “network.trr.mode” to 5. The values are: 0 – default off, 1 – race, 2 TRR first, 3 TRR only, 4 shadow, 5 off by choice – @phessler@bsd.network

Reactions in the thread are quite varied, with some people outright denying that this is even a problem. I want to start from the beginning to unpack what exactly is going on here along with Mozilla's intentions, the technology involved, and the implications for Firefox users (and probably also users of other browsers eventually).

DoH – a.k.a “DNS over HTTPS”

DNS over HTTPS is a technology that has been around for a while now and is starting to get formalized and implemented. The general idea, as you might get from the name, is simply to encrypt your DNS requests in the same way that HTTPS encrypts HTTP traffic, protecting users from MITM attacks. There have been several competing technologies to solve the problem of investing trust in your local network when it comes to resolving domain names and this is probably the most mainstream one. There are almost half a dozen alternatives to DoH which attempt to solve this problem while also decentralizing domain name ownership, a few of which I will list later. What you need to know about DoH in a nutshell is that the trust is being transferred from your local network to the remote server which decrypts and replies to your requests, instead of either asking your router to resolve the IP (which will then ask some DNS server) or manually configuring your machine to connect to a particular DNS server. Both of these methods are usually unencrypted and the first relies on trusting the LAN.

The Mozilla Solution

Mozilla has now fully implemented a feature which at some point in the future will by default use DoH to connect to one of a set of pre-configured “Trusted Recursive Resolver”s (TRRs). That's where the “network.trr.mode” comes from. The TRRs are a set of servers, or groups of servers, that “will be required to conform to a specific set of policies intended to protect user privacy” (source). By centralizing trust in these TRRs the idea is that they can be strictly held to this “specific set of policies” (there hasn't been a write-up of the policies yet) and thus everybody else will be better off whenever they connect to sketchy WiFi hotspots. DNS requests being intercepted and modified is a well-known issue, so it makes sense that Mozilla would like some kind of response to it.

I Thought This Was About CloudFlare Though?

Yes. Not directly, but in implementation. From the beginning of the testing of DoH using TRRs in Firefox, CloudFlare has been the standard TRR setting the example (they even already have a policy here even if it is not specifically for DoH). If you have a stock(ish) Firefox install right now network.trr.uri is likely already set to https://mozilla.cloudflare-dns.com/dns-query (though who knows by the time you are reading this). So all the idealist talk about securing web access for users aside, in practice right now if network.trr.mode is set to anything other than 5 or (at least while I am posting this) 0, your DNS requests could be being sent to CloudFlare to be decrypted and resolved on their end. Mozilla plans to have the default value, 0, enable this feature at some point in the future, so by the time you are reading this that might already be the case. That's essentially what this fuss is all about.

There are two ways of looking at this. You could trust in CloudFlare to respect the spirit of their policy, not act maliciously given any loopholes in their policy, and keep the temporary logs of your requests safe from other organizations just like you do with ordinary DNS servers (or more likely, don't). Alternatively, you could say “I Don't Trust CloudFlare” and invest your trust elsewhere or even just admit that the status-quo is probably better than sending all or even some portion of your requests to CloudFlare. It is really up to you either way, but I think I've already made it clear where I stand before.

Can I Just Not Have This?

Yes, and the advice from the original post will disable this, but I have some advice...

Just use a fork of Firefox that disables or removes things like this by default, so you're covered in the future. Below is a little list of ones I'd personally recommend in order of how much you could trust them:

  1. GNU Icecat – The Free Software Foundation's fork which also removes anti-features like DRM, but which may be too “extreme” for some users (except on Android). On Linux and Android.
  2. PaleMoon – An earlier fork of Firefox with its own rendering engine which is actively maintained. Available on Linux and Winblows.
  3. Fennec F-Droid – Android version of the latest Firefox which is mostly if not entirely deblobbed and has sane profile defaults.

Keeping track of actively maintained forks that are relatively trustworthy is hard these days; there used to be more. If you have an active Firefox fork you recommend which fits the bill, I'd be glad to add it to this list.

Wait... So What Could I Be Doing Instead? There's still an issue here!

Yes and there are different approaches which you may even want to use in tandem to resolve this if you really care. Two categories come to mind:

Trusted DNS Providers and DNSSEC

In the words of the Wikipedia entry for DNSSEC: “[DNSSEC provides]... origin authentication of DNS data, authenticated denial of existence, and data integrity, but not availability or confidentiality.”

In other words, your requests can be snooped on or blocked, but not tampered with (forged), which was the main problem DoH aimed to fix in addition to snooping. Though you could definitely argue that the current DoH implementation only centralizes snooping. If you take this approach you still have to find a trustworthy DNS provider which supports DNSSEC, but at least it won't be CloudFlare. Configuring your system's DNS varies by what you are running, or you can set only your browser to use a certain DNS server if you want. One thing to note is that on some networks, messing with your DNS could break interoperability with “quirky” (being generous here) systems which rely on trusting their DNS or tampering with your connection to authenticate through a portal instead of using a standard form of network authentication. People have strong opinions about which servers to use, so I will avoid making any recommendations here for now at least, but you can look up servers which don't keep logs and respect your values.

Alternative Domain Name Resolution Methods (for privacy nuts and tech hipsters :P)

If you just want to browse the web with absolute confidentiality, without fear of being blocked, and without your connection being tampered with, there isn't really any alternative to using the Tor Browser or plugging a tor daemon into your Firefox (only experienced users who know what they are doing should do this).

If you want to say “FUCK ICANN” and give up on the modern web entirely there are Tor hidden sites accessible via the Tor Browser, IPFS, Beaker Browser (dat://), and even blockchain-based projects like namecoin (wow blockchain not being used for something cancerous!). However I doubt your grandma could use any of these right now and find them useful.

Hopefully sometime not too long from now, there will be TRRs for DoH which we can actually trust as well as an accessible way of running our own as is the case with DNS servers at the moment. But for now at least I'd avoid it.

If you clicked on this post because you don't like CloudFlare, you might be interested in my longer more in-depth post on CloudFlare in general, available here.

Whether on matrix, the fediverse, or wherever else you know me, you've probably seen me ranting about CloudFlare for one reason or another and advocating for its abandonment by server administrators. CloudFlare has issues on quite a few fronts, which, depending on your ideals, may only amount to one or two. This post is an attempt to enumerate all of the different, often unrelated, issues with CloudFlare in a single place for reference purposes. In fact I plan on keeping this updated as CloudFlare pulls off crazier and crazier bullshit and continues to become the norm among administrators.

Other similar/related posts:

  1. CloudFlare, We Have A Problem (joepie91)
  2. The Trouble with CloudFlare (Tor blog)
  3. CloudFlare's Captcha Deanonymizes Tor Users (cryptome)
  4. CloudFlare and RIAA Agree on Tailored Site Blocking Process (torrentfreak)
  5. Why CloudFlare is Probably a Honeypot (cyberpunk.is)
  6. The Great CloudWall (notabug.org)
  7. iSucker: Big Brother Internet Culture (exiliedonline.com)

The immediate problem with CloudFlare, with a fix for lazy admins.

So here is where I'll cut to the chase for all you CloudFlare sellouts who aren't interested in the future of the internet or the threats that CloudFlare poses to it, but instead worry about “User Experience” and want more people to be able to access your website by offloading the basic work onto somebody else (yeah, I'm mad at you, stop being lazy).

The Problem:

CloudFlare's “protection” has a massive issue which blocks an entire demographic of users from accessing your site because it will consistently have “false-positives” about threats, and this demographic is Tor users (Who uses tor?). Basically, any tor user is systematically blocked from viewing not only your websites behind CloudFlare, but also any resources like media hosted behind CloudFlare. For more details from the tor project about how this makes the average user's experience on the modern internet completely unusable because of admin incompetence (looking at you, admin) you can read more here. Something to note is that CloudFlare has a bunch of highly skilled PR people who train their support employees and marketing department to avoid the word “block” and instead say newspeak like “challenged” or “flagged due to threat score”, which are all the same thing in practice.

The Lazy Workaround:

CloudFlare allows its useds (the administrators being used to dragnet its users' data) to make an exception to the prohibitive blocking which harasses tor users. CloudFlare treats Tor exit-relays like a “country” under its UI (Tor (T1)), and to allow them to visit, go to the IP Firewall > Access Rules panel and select the Whitelist option for Tor.

cloudflare_firewall_UI

Notice how I hosted this media on my site which doesn't have CloudFlare to make it accessible rather than linking to CloudFlare.

And then if that's all you are here for, that's it. I would invite you to read on though.

The fundamental issue with CloudFlare and similar services.

A couple of years ago the web started becoming incredibly bloated with redundant technologies, adverts, trackers, and such, all of which went beyond displaying the content the user asked for using open, standardized technologies. As a result, CDNs (Content Delivery Networks), which had previously been relegated to hosting media content like videos and images which they can cache and serve faster for larger audiences, began hosting javascript frameworks and applets embedded in webpages along with various forms of tracking and social media connections. Once this started happening, web design stopped being about how to make your content compatible with as many browsers as possible using the features from web standards (HTML+CSS), and became more about unnecessarily mimicking much of that functionality with javascript frameworks (which take quadruple the time to load and create a bottleneck for the user's browsing experience). It was inevitable that a lazy solution would come presenting itself as the solution to a lazy problem (relevant talk and relevant article). Nowadays CloudFlare is far more: it's the ultimate laziness bundle for web administrators who won't properly configure a server to use caching, retain a valid TLS certificate, perform load balancing, and more. It's the reverse proxy for people too lazy to know what that even means, plus a bunch of shiny bits on top. It is common to hear of CloudFlare being used for the purpose of “DDOS protection”. Sure, making all your traffic go through foreign servers does have that as a side effect, but it seems like most hosting services these days offer that regardless, and self-hosters probably only need to set up some basic DDOS mitigation on a private server. But enough of reviewing what CloudFlare already is with my extremely sarcastic tone; what's the problem here with CloudFlare in particular?

Aside from the web becoming a bloated mess and needing all this stuff in the first place, one way or another, CloudFlare represents a model web-service which negates all the privacy and security benefits of independent hosting. User connections to sites configured with CloudFlare are decrypted not at the site itself, but at CloudFlare's servers, allowing them to snoop like teenagers fiddling around with Wireshark in 2004 before HTTPS was being used by most websites. Even worse, traffic passed between two servers each configured to use CloudFlare is owned by CloudFlare at both ends. This comes with extreme privacy and security implications which are at least partially explored here, but have otherwise not received any attention whatsoever. As services like CloudFlare become more and more “Comprehensive”, and more and more security responsibilities are passed off to them by administrators, the purpose for these privacy and security features to begin with is being negated. I'm not the type who is interested in doing a full security analysis, but there is definitely one that deserves to be done concerning services like CloudFlare and I think I have made clear the fundamental issue at the very least. I urge administrators to take back the responsibilities of their jobs and quit handing off their duties to companies like CloudFlare or else we are in for serious trouble in the future.

CloudFlare as a threat to federation.

Whether you are talking about the fediverse (Mastodon/Pleroma/Misskey) or any other federated network, the motivations behind such projects as of late can be clearly outlined:

  • Decentralization of Power (Not beholden to any single administration and its policies)
  • Privacy (Anti-Mass surveillance)
  • Interoperability (Anyone can run a node following the specification and expect it to work with the rest of the network)

These have been, and remain, the appeals of federated networks for social networking.

However, the increasing and alarming trend of administrators in these federated networks to use CloudFlare threatens all three of these.

Decentralization of Power

CloudFlare threatens decentralization of power by being in a position to deny service to nodes in the federated network by its own policies. Any portion of the network running on CloudFlare is not subject to the policies of a diverse selection of hosting services, but a singular entity's conditions and terms of service which are subject to change at any point in time. Using CloudFlare is re-centralizing power.

Privacy

I will not get into too many specifics, but above in the “Fundamental issue” section I outlined the general security issue presented by having major nodes on CloudFlare's network. The threat described above applies only to HTTPS and secure web connections, but the degree of this issue can vary from more mild to extremely concerning based on how nodes in a federated network communicate with each other.

Interoperability

One of the benefits of federated networks is that nodes can provide entry points from different networks altogether which, once connected, allow users who prefer to browse, message, or interact on one service to keep in touch with users anywhere else. In principle, any server which implements the specifications of a federated network and begins federating with peers on other networks can participate. CloudFlare's policy of blocking Tor connections, and presumably other anonymization overlay networks in the future, threatens this key accessibility feature. Nodes running on CloudFlare are cut off from nodes running via Tor (as hidden services, for example) by default. I would also suspect that CloudFlare may have the potential to limit interoperability between networks and undercut this accessibility property in other ways which have yet to be seen.

Any one of these on its own should be enough for a keen observer to have concerns about how CloudFlare usage may affect the future of federated networks, but all three of these have been the case for quite some time now. I think it is time to ring the alarm bells.

CloudFlare's expansion into the decentralized web and beyond.

CloudFlare's business model, surprise surprise, keeps finding new ways to coincidentally end up as an intermediary for internet traffic. Here I will outline the new and innovative ways in which CloudFlare is commercializing alternative and traditional networks while simultaneously deanonymizing users and re-centering trust towards their proprietary infrastructure:

There are many admins out there with no regard for the future of the internet as it becomes hyper-centralized, and as a few mega-corporations accumulate absolute power. An example would be /r/selfhosted and all the people behind sites which rely on ad-revenue.

And so what if you don't too?

Fuck you and Fuck CloudFlare.

The writefreely software depends on a javascript plugin called highlight.js to support syntax-highlighting for code-blocks in your blog posts. This guide will outline how to effectively integrate this feature into your blog and customize it to best suit your blog's style. Syntax highlighting isn't possible for inline code, but you can still make it look fairly nice as well if you reference my custom CSS. Preemptive disclaimer: I'm not a web-dev so I don't actually know the proper terminology for some javascript and CSS stuff so let me know on the fediverse if I made any mistakes in that regard.

Here's the general outline:

  1. Use the correct markdown spec to specify your coding language
  2. Avoid using hashtags in the code block (octothorpes if you are a nerd) since they break with the code block plugin
  3. Add a fix to your blog's custom CSS to disable line-wrapping and add a horizontal scrollbar to the code block
  4. (Optional) Set up a custom code block color theme to match your blog

Making a code block which specifies the coding language

To specify the coding language in your code-block so the plugin can do fancy syntax-highlighting you need to specify the name of the coding language right after you begin the code block like in this picture: css_codeblock

This code block is specifying that the code inside of it is CSS so the plugin can correctly highlight it. If you do not know what the name for a language might be you can browse the languages supported by the highlight.js plugin here.

By default the plugin on write-freely instances only actually has support for a small subset of these languages downloaded. You can tell which languages are supported by your instance by viewing the source of your blog page when it has a code block on it and searching for “var langs” which has a list of all the languages your particular instance supports next to it. If your language is not in this list, but it is in the list above, consider contacting your admin and asking them to download a version of highlight.js with your language supported.
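If you would rather not squint at the page source, the same check can be roughed out in Python. This sketch assumes the list appears in the page as a JavaScript array literal of quoted names right after “var langs” (something like var langs = ["css", "bash"]); the exact markup may differ between writefreely versions, and the function name supported_langs is just something I made up for the example.

```python
import re


def supported_langs(page_source):
    """Pull the language names out of a 'var langs = [...]' array in HTML source.

    ASSUMPTION: the array is a flat list of quoted strings. Returns an empty
    list when no such array is found.
    """
    match = re.search(r"var\s+langs\s*=\s*\[(.*?)\]", page_source, re.DOTALL)
    if match is None:
        return []
    # Accept either single- or double-quoted entries inside the brackets.
    return re.findall(r"""["']([^"']+)["']""", match.group(1))


# Hypothetical snippet of page source, only to show the shape of the input.
sample = '<script>var langs = ["css", "python", "bash"];</script>'
print("python" in supported_langs(sample))
```

Save your blog page's source to a file, read it into a string, and pass it in to see whether your language made the cut.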

Avoid using hashtags in the code block

“How can we move forward here? This is probably less trivial than it looks.” – mrvdb on github

Unfortunately writefreely has a known bug that causes hashtags inside of codeblocks to be parsed incorrectly. Until this is fixed, any code blocks containing one will be horribly mangled by an ugly html tag that appears in place of the hashtag. If you are using a language which has syntactically meaningful hashtags (not just for comments, which can be done with the blog post itself), it may be worth substituting hashtags with this character “⋕” and just letting your reader know since if they copy-paste it and get syntax errors they will be very confused.
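If you have a lot of code to paste, the substitution can be done mechanically before you publish. Here is a rough Python sketch that swaps hashtags for “⋕” only inside fenced code blocks and leaves the rest of the post (including real hashtags) alone. The function name is made up, and it assumes your code blocks are delimited by lines starting with three backticks; it will also touch a fence's own language label if that happens to contain a hashtag, so check the result.

```python
FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly


def swap_hashes_in_code_blocks(markdown, substitute="\u22d5"):
    """Replace '#' with a look-alike character inside fenced code blocks only."""
    out = []
    in_block = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith(FENCE):
            # A fence line toggles whether we are inside a code block.
            in_block = not in_block
            out.append(line)
        elif in_block:
            out.append(line.replace("#", substitute))
        else:
            out.append(line)
    return "\n".join(out)


post = "\n".join(["#intro", FENCE + "css", "#main { color: red; }", FENCE])
print(swap_hashes_in_code_blocks(post))
```

Remember that readers who copy the result still need to swap the character back, so the note to your reader is not optional.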

Fixing your custom-CSS with a scrollbar

By default, writefreely's highlight.js theme has line-breaks enabled which can lead to ugly messes like this: linewrapped_code To avoid this, you can add this CSS to your blog's custom CSS:


/* custom line-wrapping fix for highlight.js */
.hljs {
    white-space: pre !important;
    overflow-x: auto !important;
}

Once this has been added, line-wrapping will no longer occur, and a scrollbar will be added to the bottom of the codeblock view which allows readers to scroll horizontally and reach any code which was previously out of sight.

Setup a custom code block color scheme

The highlight.js project has a large variety of CSS themes to choose from here and installing one of them is almost as simple as copying and pasting. I say almost as simple, because you will have to do a bit of slightly-tedious correction work that anyone should be able to figure out even if you don't know CSS syntax.

Since your writefreely instance has a preset syntax-highlighting color scheme set for highlight.js by default, your custom theme CSS has to override the defaults, which in practice just means adding !important after every single property. The code block with the fix for line-wrapping does this above, but for reference here is my blog's custom CSS with the a11y-dark theme at the bottom of the file using !important on every CSS property. I recommend editing a custom CSS file in your own editor rather than just pasting it straight into the custom-css box on your blog's settings page, since the box on your settings page does not let you copy text out of it with your clipboard (at least in my experience).
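The slightly-tedious part can be automated if you prefer. Below is a rough Python sketch that appends !important to every declaration in a theme file. It is deliberately naive: the function name is invented, it only handles declarations that end in a semicolon, and it does not understand comments or values containing semicolons (data URLs, for instance), so skim the output before pasting it into your blog settings.

```python
import re


def force_important(css):
    """Append !important to each 'property: value;' declaration lacking it.

    ASSUMPTION: every declaration ends with ';' and no value contains
    ';', '{' or '}'. Selectors such as a:hover are left untouched because
    they are followed by '{' rather than ';'.
    """
    def fix(match):
        declaration = match.group(1).rstrip()
        if declaration.endswith("!important"):
            return declaration + ";"
        return declaration + " !important;"

    return re.sub(r"([\w-]+\s*:[^;{}]+);", fix, css)


theme = ".hljs-comment {\n    color: #d4d0ab;\n}"
print(force_important(theme))
```

Running a downloaded theme through this and appending the result to your custom CSS file should get you most of the way; anything the sketch misses can be fixed by hand.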

With all that done, you should have fairly-functional and aesthetically appealing code-blocks for all of your blog posts.