The 8 Fallacies of Distributed Computing for PHP Developers


The first seven fallacies of distributed computing were coined by Peter Deutsch in 1994; the eighth was later added by James Gosling (the father of Java). These fallacies relate directly to us as PHP developers, because we build distributed applications each and every day. We build mashups, applications that interact with SOAP and REST services, authenticate users via the Facebook, Google, or Twitter APIs, retrieve information from remote databases and caching services, and so on. Make no mistake, we’re building distributed computing applications. Given that, it’s important that we understand the eight fallacies and how they affect what we build.

1. The Network is Reliable

It’s fair to say that this is obviously untrue. Though network latency has decreased and bandwidth has increased markedly year after year since 1995, to say that the network is reliable is false. Let’s say we’ve set up a simple application that doesn’t use too many services – a basic PHP application that uses MySQL as its backend. There’s arguably not much that could go wrong. However, let’s say that later we decide to go with a hosted MySQL provider, such as Xeround, for our database needs. Despite good scalability and high availability, what if something goes wrong at their end? What if their infrastructure suffers a DDoS attack or has downtime because of an internal issue? We hear quite a lot about 99.999% uptime, but even that is never 100%. With the proliferation of services and the generally high availability of bandwidth today, it can be easy to forget that nothing’s ever perfect. How do you account for the failure of a service within your application?
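
One place to start is to stop assuming that the first attempt at anything over the network will succeed. Here’s a minimal sketch of retrying a database connection with a backoff; the DSN, credentials, and retry counts are placeholders, not a prescription.

```php
<?php
// A minimal sketch: retry a PDO connection with exponential backoff
// instead of assuming the network will deliver on the first attempt.
// The DSN and credentials below are placeholders.
function connectWithRetry(string $dsn, string $user, string $password, int $maxAttempts = 3): PDO
{
    $attempt = 0;
    while (true) {
        try {
            return new PDO($dsn, $user, $password, [
                PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
                PDO::ATTR_TIMEOUT => 5, // seconds; don't hang forever on a dead link
            ]);
        } catch (PDOException $e) {
            if (++$attempt >= $maxAttempts) {
                throw $e; // surface the failure so the application can degrade gracefully
            }
            sleep(2 ** ($attempt - 1)); // back off: 1s, 2s, 4s, ...
        }
    }
}

$db = connectWithRetry('mysql:host=db.example.com;dbname=app', 'app_user', 'secret');
```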

2. Latency is Zero

Though latency may be low, indeed lower than it was some years ago, it’s never zero. To quote Arnon Rotem-Gal-Oz in his Fallacies of Distributed Computing Explained post:
At roughly 300,000 kilometers per second (3.6 * 10E12 teraangstrom per fortnight), it will always take at least 30 milliseconds to send a ping from Europe to the US and back, even if the processing would be done in real time.
Is this a bad thing? Well, yes and no. Depending on how we structure our application and the resources available to us, we can largely mitigate the issue of latency. Instead of deploying our application in a single datacenter, we can host it with a service such as Amazon Web Services and use S3 to locate data in several regions around the world, bringing it closer to our end users and reducing latency over the network. But even though we can reduce latency, we can’t remove it. We can employ a series of methods and architectures to reduce its impact, but no matter what we do, it will always be present. Did you consider this when you designed your application?
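
Another mitigation is to pay the latency cost as rarely as possible by caching remote responses locally. The sketch below assumes the APCu extension is available; the URL, timeouts, and TTL are placeholders to tune for your own application.

```php
<?php
// A minimal sketch: cache a slow remote call locally so the latency cost
// is paid once per TTL rather than on every page load.
// Assumes ext-apcu is enabled; the URL is a placeholder.
function fetchWithCache(string $url, int $ttl = 300): string
{
    $key = 'remote:' . md5($url);
    $cached = apcu_fetch($key, $success);
    if ($success) {
        return $cached; // served locally, no network round trip
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 2, // fail fast if latency blows out
        CURLOPT_TIMEOUT        => 5,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException('Remote call failed: ' . $url);
    }

    apcu_store($key, $body, $ttl);
    return $body;
}
```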

3. Bandwidth is Infinite

Can bandwidth really be infinite? If so, at what price? When we consider that the web is increasingly going mobile, everything old is new again. Now, I’m not suggesting that we’re starting over from the speed of dial-up, and the newer 4G networks are notably faster than the earlier 2G and 3G networks. But still, even their peak data rates are currently lower than those of a standard broadband connection. Also, with the increasing uptake of mobile broadband, the number of potential users seeking to use our service (we all want to be popular and have at least some of the success of Facebook) is growing at a phenomenal rate. Consider these statistics from mobithinking:
  • There are 5.9 billion mobile subscribers.
  • There are 1.2 billion mobile web users with 3G coverage.
  • Mobile devices account for 8.49 percent of global hits.
Given that, it’s fair to say that even though bandwidth rates and worldwide penetration are increasing, the growth in the number of users serves to balance it out. Going further, with the flexibility that mobile broadband provides, ad hoc consumption of services naturally follows. Are you prepared for the sheer volume of potential load on your service? Can you handle the spikes that this kind of availability can deliver?
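
One simple defence is to cap the size of any single response, so that neither a slow mobile link nor a traffic spike forces you to ship an unbounded payload. A minimal sketch, assuming an existing PDO connection in $pdo and placeholder table and column names:

```php
<?php
// A minimal sketch: never assume the pipe can carry the whole result set.
// Cap the payload with a page size and compress the response.
// $pdo is assumed to be an existing PDO connection.
$page    = max(1, (int) ($_GET['page'] ?? 1));
$perPage = 50; // hard upper bound on payload size
$offset  = ($page - 1) * $perPage;

$stmt = $pdo->prepare('SELECT id, title FROM articles ORDER BY id LIMIT :limit OFFSET :offset');
$stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
$stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
$stmt->execute();

// Let PHP gzip the response if the client supports it.
ob_start('ob_gzhandler');
header('Content-Type: application/json');
echo json_encode([
    'page'    => $page,
    'results' => $stmt->fetchAll(PDO::FETCH_ASSOC),
]);
```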

4. The Network is Secure

I think it’s fair to say, without going into too much detail, that this is, and will always be, false. If you have any doubts, maybe you should talk to a LinkedIn or eHarmony member. When we design and deploy our applications, how much emphasis do we place on security, both in where the application’s hosted, such as Rackspace, PagodaBox, or cloudControl, and in the design of the application itself? According to SecurityAffairs, Prolexic reported:
  • 3,000% quarter-on-quarter increase in malicious packet traffic targeting the financial services sector.
  • 19.1TB of data and 14 billion packets of malicious traffic against the financial services sector during Q4 2011, increasing during 2012.
  • 65TB of data and 1.1 trillion packets that were identified and mitigated in 2012, 80 times greater than in 2011.
Given that the network is not secure, we need to be certain that we’re following good security practices as a matter of course. Given the plethora of good advice from sources such as Chris Shiflett’s blog, Essential PHP Security, the PHP Security Consortium, and others, it’s hard not to know how and why to bake security into the core of our applications. What are your security practices? Do you assess the vendors that you deploy with?
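
A few of the basics can be baked in with very little code. The following is a minimal sketch, not a complete security policy; the URL and table names are placeholders, and $pdo is assumed to be an existing PDO connection.

```php
<?php
// A minimal sketch of not trusting the network: verify TLS certificates
// on outbound calls, and never build SQL by string concatenation.

// 1. Outbound: insist on a verified TLS connection.
$ch = curl_init('https://api.example.com/user');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER => true, // reject invalid or self-signed certs
    CURLOPT_SSL_VERIFYHOST => 2,    // ensure the cert matches the hostname
]);
$response = curl_exec($ch);
curl_close($ch);

// 2. Inbound: treat all input as hostile; use bound parameters.
$stmt = $pdo->prepare('SELECT id FROM users WHERE email = :email');
$stmt->execute([':email' => $_POST['email'] ?? '']);

// 3. At rest: store password hashes, never plain text.
$hash = password_hash($_POST['password'] ?? '', PASSWORD_DEFAULT);
```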

5. Topology doesn’t Change

Doesn’t it? Really? Does it not change, or do we just not know about it? When we host our applications with others, we simply don’t know. If the provider reconfigures, upgrades, or otherwise adjusts their data center, for whatever reason, the topology changes. And given the earlier reference to the growth in smartphone usage, from both a user and a provider perspective the topology can change almost daily. If the topology changes and an external service that the application relies on can no longer be reached – resulting in, say, no database access – then clearly this is an issue. But if things change internally at our provider and the application continues to function, then it may not be a problem. Granted, it’s easy to code an application that’s small and hosted in a simple configuration. But applications change, and those that gain in popularity even more so. Do you consider changes in topology in your design? How do you account for, or handle, failures in the application design and the deployment design?
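
One simple habit helps here: keep service locations out of the code, so that a topology change becomes a configuration change rather than a redeployment. A minimal sketch, where the file name, keys, and fallback values are all placeholders, and the Memcached extension is assumed:

```php
<?php
// A minimal sketch: keep service locations in configuration so a topology
// change is a config edit, not a code change.
$config = parse_ini_file(__DIR__ . '/services.ini', true) ?: [];

$dbHost    = $config['database']['host'] ?? 'localhost';
$cacheHost = $config['cache']['host']    ?? '127.0.0.1';
$cachePort = (int) ($config['cache']['port'] ?? 11211);

$pdo = new PDO(
    sprintf('mysql:host=%s;dbname=%s', $dbHost, $config['database']['name'] ?? 'app'),
    $config['database']['user'] ?? 'app',
    $config['database']['pass'] ?? ''
);

$cache = new Memcached();
$cache->addServer($cacheHost, $cachePort);
```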

6. There is One Administrator

“But I have my application hosted with a single service provider. They provide the OS, database, and web server support”, you say. Okay, but is there really one administrator? And if there were really only one administrator, would you really trust the provider with your application? I’d hate to think what could go wrong if they were sick or went on vacation. Normally there will be at least a few administrators, though each may not have the same level of training and astuteness, both technically and more broadly. There should be policies in place, such as network intrusion detection and other security policies, but there’s no guarantee they’ll all be followed with the same thoroughness and diligence. Given the plethora of hosting providers available today and how quickly DNS records can be updated, we have enough choice and flexibility that if one provider isn’t meeting our needs and expectations, we can move to another. Have you considered how this affects you? What if you’re not in a position where you can easily change vendors? What if you have a high degree of vendor lock-in, or it would be costly to move? What if your application’s architecture is not flexible enough? What can you do to mitigate such risks?
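
One mitigation for vendor lock-in is to hide each vendor behind an interface of your own, so that switching providers means writing one new adapter rather than rewriting the application. The interface and class names below are illustrative, not a real library:

```php
<?php
// A minimal sketch: application code depends on an abstraction, not on
// any one provider's API.
interface FileStorage
{
    public function put(string $path, string $contents): void;
    public function get(string $path): string;
}

final class LocalDiskStorage implements FileStorage
{
    public function __construct(private string $baseDir) {}

    public function put(string $path, string $contents): void
    {
        file_put_contents($this->baseDir . '/' . $path, $contents);
    }

    public function get(string $path): string
    {
        $contents = file_get_contents($this->baseDir . '/' . $path);
        if ($contents === false) {
            throw new RuntimeException('Unable to read ' . $path);
        }
        return $contents;
    }
}

// An S3Storage, RackspaceStorage, etc. would implement the same interface.
// The rest of the application only ever sees the abstraction:
function saveReport(FileStorage $storage, string $report): void
{
    $storage->put('reports/' . date('Y-m-d') . '.txt', $report);
}
```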

7. Transport Cost is Zero

As with all of the statements thus far, this one is also highly unlikely to hold. If the servers that support our application are in the same rack in the same data center, then the transport cost can be greatly reduced – but only in terms of time. What about the monetary cost? Yes, we can scale up and down elastically as demand requires, and we can store our application’s data across geo-located data centers so that it’s as physically close to our end users as possible, but at what cost? What’s the architectural composition of your application or service? Is transport cost approaching zero in respect of either money or time? If you reduce one, does it increase the other?
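
Since every round trip carries a fixed cost, one cheap win is to batch lookups rather than paying that cost N times. A minimal sketch assuming the Memcached extension, with placeholder host and keys:

```php
<?php
// A minimal sketch: the per-request transport cost is fixed, so pay it
// once instead of once per key.
$cache = new Memcached();
$cache->addServer('cache.example.com', 11211);

$keys = ['user:1', 'user:2', 'user:3'];

// Costly: one network round trip per key.
$results = [];
foreach ($keys as $key) {
    $results[$key] = $cache->get($key);
}

// Cheaper: all keys fetched in a single round trip.
$results = $cache->getMulti($keys);
```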

8. The Network is Homogeneous

Unlike the other fallacies, I think this is one that we as PHP developers inherently understand. We host our applications on Windows, Linux, Solaris, BSD, and Mac OS X servers. We use MySQL, SQL Server, SQLite, PostgreSQL, MongoDB, Hadoop, and Oracle for storing data. We consume external services via XML or JSON, requiring different interfaces. As a multi-operating-system and multi-service community, arguably right from the early days, we’ve never expected a homogeneous network. But the question still needs to be asked: are you flexible in your approach? Can you work with multiple databases and data sources? Do you use relevant design patterns, such as abstract factory, to consume data from a variety of sources and types through a transparent code interface? Or does your code break if you need to do something as simple as swapping from XML to JSON?
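
The abstract factory pattern in full is more than we need here, but the underlying idea – decode every feed through one interface so the wire format can change without rewriting the consumer – fits in a few lines. The class names below are illustrative:

```php
<?php
// A minimal sketch: swapping the wire format from XML to JSON becomes a
// one-line change, not a rewrite.
interface FeedDecoder
{
    public function decode(string $payload): array;
}

final class JsonDecoder implements FeedDecoder
{
    public function decode(string $payload): array
    {
        return json_decode($payload, true, 512, JSON_THROW_ON_ERROR);
    }
}

final class XmlDecoder implements FeedDecoder
{
    public function decode(string $payload): array
    {
        $xml = simplexml_load_string($payload);
        return json_decode(json_encode($xml), true);
    }
}

// The consuming code never cares which format the remote service speaks.
function importFeed(FeedDecoder $decoder, string $payload): array
{
    return $decoder->decode($payload);
}
```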

In Conclusion

I think that, for us as PHP developers, the eight fallacies of distributed computing are as relevant as they ever were. Given the plethora of information and resources available, we’re in a great position to understand them and mitigate the risks that stem from believing in them. What do you think? Do you account for them when developing your applications and services? How do you think the eight fallacies affect your application?



Frequently Asked Questions (FAQs) about Fallacies of Distributed Computing for PHP Developers

What are the common misconceptions about distributed computing?

Distributed computing is a complex field and there are several misconceptions about it. The most common ones include the belief that the network is reliable, latency is zero, bandwidth is infinite, the network is secure, topology doesn’t change, there is one administrator, transport cost is zero, and the network is homogeneous. These misconceptions, also known as fallacies, can lead to significant problems in the design and implementation of distributed systems.

How does latency affect distributed computing?

Latency is the delay before a transfer of data begins following an instruction for its transfer. In distributed computing, latency can significantly affect the performance of the system. If the latency is high, it can cause delays in data transfer, leading to slower system performance. Therefore, it’s crucial to consider latency when designing and implementing distributed systems.

Why is network reliability a fallacy in distributed computing?

The belief that the network is reliable is a fallacy because networks can and do fail. This can be due to various reasons such as hardware failures, software bugs, and network congestion. When the network fails, it can lead to data loss or delays in data transfer. Therefore, it’s important to design distributed systems with the assumption that network failures can occur.

How does the fallacy of infinite bandwidth affect distributed computing?

The fallacy of infinite bandwidth assumes that there will always be enough network capacity to handle data transfer. However, in reality, network capacity is limited and can be affected by various factors such as network congestion and hardware limitations. If the system is designed with the assumption of infinite bandwidth, it can lead to performance issues when the network capacity is not sufficient.

Why is network security a fallacy in distributed computing?

The assumption that the network is secure is a fallacy because no network is completely secure. There are always potential threats and vulnerabilities that can lead to security breaches. Therefore, it’s important to implement robust security measures and regularly update them to protect the network and the data it carries.

How does the fallacy of a single administrator affect distributed computing?

The fallacy of a single administrator assumes that there is only one person or entity controlling the network. However, in most distributed systems, there are multiple administrators. This can lead to issues with coordination and control, and can also increase the risk of security breaches.

Why is the assumption of zero transport cost a fallacy in distributed computing?

The assumption of zero transport cost is a fallacy because there are always costs associated with data transfer in a network. These costs can be monetary, such as the cost of network infrastructure, or they can be in terms of resources, such as bandwidth and processing power. Therefore, it’s important to consider these costs when designing and implementing distributed systems.

How does the fallacy of a homogeneous network affect distributed computing?

The fallacy of a homogeneous network assumes that all network nodes are the same and can handle the same tasks. However, in reality, networks are often heterogeneous, with different nodes having different capabilities. This can lead to issues with load balancing and can affect the performance of the system.

How can these fallacies be avoided in distributed computing?

These fallacies can be avoided by understanding and acknowledging them during the design and implementation of distributed systems. It’s important to consider factors such as network reliability, latency, bandwidth, security, topology, administration, transport cost, and network heterogeneity. By taking these factors into account, you can design a more robust and efficient distributed system.

What are some resources for learning more about distributed computing?

There are many resources available for learning more about distributed computing. Some recommended ones include textbooks on distributed systems, online courses on platforms like Coursera and Udemy, and research papers on distributed computing. Additionally, online forums and communities can also be a great source of information and help.

Matthew Setter

Matthew Setter is a software developer, specialising in reliable, tested, and secure PHP code. He’s also the author of Mezzio Essentials (https://mezzioessentials.com) a comprehensive introduction to developing applications with PHP's Mezzio Framework.
