In a typical grid computing setup, each computer is granted access to the resources of all other computers in the network. This creates an interconnected environment that powers collaborative work on complex projects. Consider a scientist who logs into a computer and uses an entire network of systems to analyze protein data. A business executive connects to the company network from a PDA to forecast stock trends. An Army officer orchestrates resources across multiple military networks to plan a strategy. These scenarios are all united by a single concept: grid computing.
Grid computing is essentially a network where resources like processing power, memory, and storage are shared across all connected computers. Users authorized to access the system can leverage these community resources for specific tasks. A grid can range from a simple collection of identical computers running on the same operating system to an intricate web of interlinked systems, utilizing diverse platforms.
Grid computing is an evolution of distributed computing, where resources across a network are shared. In an ideal grid setup, every resource is interconnected, transforming a simple computer network into a formidable supercomputer. With the right interface, accessing such a system feels just like using a local machine's resources, providing immense processing and storage capacity to all authorized users.
While the concept of grid computing isn't new, it remains a work in progress. Computer scientists, engineers, and programmers are continually refining standards and protocols. Currently, many grid systems depend on proprietary software and tools. Once a unified set of standards and protocols is agreed upon, the adoption of grid computing will become more streamlined and efficient for organizations.
Curious about what exactly a grid computing system entails? Keep reading to discover more.
Grid Computing Overview
Grid computing systems function on the principle of resource pooling. Imagine you're going on a camping trip with a few friends. You have a large tent and offer to share it. One friend brings food, and another offers to drive everyone in his SUV. Once at the campsite, you all contribute your skills and knowledge to ensure the trip goes smoothly. If you had gone alone, it would have taken much longer to gather everything and you would have faced more challenges during the trip.
A grid computing system follows the same concept: distributing the workload across multiple computers to finish tasks faster and more efficiently. Before diving deeper, let's first understand the resources of a computer:
- Central Processing Unit (CPU): A CPU is a microprocessor responsible for executing mathematical calculations and managing the flow of data to various memory locations. A computer can have multiple CPUs working together.
- Memory: In general terms, a computer's memory serves as temporary storage, holding relevant data for quick access by the microprocessor. Without memory, the processor would need to retrieve data from slower storage devices like a hard disk drive.
- Storage: In grid computing, storage refers to permanent data storage solutions, such as hard disk drives or databases.
A computer typically operates within the limits of its own resources. There's a maximum speed at which it can perform operations and a finite amount of data it can store. While many computers are upgradeable, meaning you can add more processing power or storage, these enhancements only provide incremental improvements in performance.
Grid computing systems connect multiple computer resources, allowing a user to tap into the combined power of all the computers in the network. For the individual, this setup makes it seem as though their own computer has become a supercomputer, thanks to the shared computational capabilities.
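The pooling idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: threads on one machine stand in for separate computers on the grid, and the names `split_into_chunks`, `sum_of_squares`, and `run_on_grid` are hypothetical, not part of any real grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Stand-in for the real work a grid node would do on its share."""
    return sum(x * x for x in chunk)


def run_on_grid(items, n_nodes=4):
    """Hand one chunk to each simulated node, then combine the partial results."""
    chunks = split_into_chunks(items, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Calling `run_on_grid(list(range(1000)))` gives the same answer as computing the sum on one machine. The point of the sketch is the shape of the work (split, process separately, combine), not any speedup from the threads themselves.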
While reading about grid computing, you may encounter unfamiliar terms and concepts. What do these terms mean? Keep reading to learn more.
Grid computing is still an evolving field, with connections to other advanced computing models, some of which are subtypes of grid computing. Shared computing refers to a network of computers working together to handle a specific task by sharing processing power. Another related model is utility computing, a service model akin to software-as-a-service (SaaS), in which a company offers services like data storage or additional processing power on a metered basis. Cloud computing allows applications and storage to be accessed on the Web, instead of residing on a local computer.
Grid Computing Lexicon
If you're new to grid computing, understanding the terminology can be overwhelming. Here's a brief guide to some key terms you'll come across while exploring grid computing:
- Cluster: A collection of networked computers that share common resources.
- Extensible Markup Language (XML): A language for describing data that both computers and humans can interpret. Control nodes rely on XML-based languages like the Web Services Description Language (WSDL) to manage data and applications.
- Hubs: A network point where multiple devices connect and communicate.
- Integrated Development Environment (IDE): The suite of tools and resources developers use to create software for a given platform. Testing environments are often referred to as sandboxes.
- Interoperability: The capability of software to function seamlessly across various platforms. For instance, both PCs and Mac computers can interact if their software supports interoperability despite having different operating systems and architectures.
- Open standards: Publicly accessible technical standards. Unlike proprietary standards, which are owned by specific organizations, open standards are available for anyone to use and adopt. Open standards simplify integration between applications compared to proprietary standards.
- Parallel processing: The simultaneous use of multiple CPUs to solve a single problem. This concept is closely related to shared computing, where unused resources on a network are harnessed for specific tasks.
- Platform: A base upon which software applications are built. It could refer to an operating system, a hardware architecture, a programming language, or even a website.
- Server farm: A collection of servers designed to handle tasks too complex for a single server.
- Server virtualization: A method where a single physical server is divided into several virtual servers, each running its own independent operating system. For example, a server could simultaneously host a Linux virtual server and a Windows virtual server, thanks to virtualization technologies. This is often used to reduce hardware expenses in grid computing systems.
- Service: In grid computing, a service refers to a software system that enables communication between computers across a network.
- Simple Object Access Protocol (SOAP): A protocol for exchanging XML-based messages across networks. Originally developed at Microsoft and later standardized by the W3C, it allows different systems to communicate.
- State: In IT, a state refers to persistent data that remains after being used within an application. For example, when you add items to an Amazon.com shopping cart, the system retains your selections even as you navigate other parts of the site. Stateful services make it possible for applications to operate over several stages while maintaining the same underlying data.
- Transience: The ability to activate or deactivate a service across a network without disrupting other operations.
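To make the State entry above more concrete, here is a minimal Python sketch of a stateful service, assuming a hypothetical `CartService` class that keeps each session's data in memory. A real service would persist this data somewhere more durable.

```python
class CartService:
    """Toy stateful service: remembers each session's items between calls."""

    def __init__(self):
        # Maps a session ID to the list of items added so far.
        self._carts = {}

    def add_item(self, session_id, item):
        """Record an item against the given session."""
        self._carts.setdefault(session_id, []).append(item)

    def view_cart(self, session_id):
        """Return a copy of the session's items; empty if the session is unknown."""
        return list(self._carts.get(session_id, []))
```

Because the service holds the cart between calls, a user can add an item, do something else, and still find the same items later. A stateless service would need the caller to resend the whole cart with every request.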
Curious about how grid computing links various computer resources together? Stay tuned for the next section to find out.
Sharing Resources
Numerous companies and organizations are collaborating to establish a universal set of rules known as protocols that will simplify the creation of grid computing environments. Although grid computing systems already exist, the challenge lies in the lack of a unified approach. This means that two separate systems might not work together, as each one uses its own specific protocols and tools.
In general, a grid computing system typically requires:
- At least one computer, usually a server, to manage all administrative tasks for the system. This type of computer is often called a control node. Other servers, both physical and virtual, offer particular services within the system.
- A network of computers equipped with specialized grid computing software. These computers serve both as the users' interface to the system and as the resources the system draws on for different tasks. A grid can consist of computers with identical specifications running the same operating system (a homogeneous system) or a mix of different computers running different operating systems (a heterogeneous system). The network itself can range from a hard-wired system, where every computer is physically cabled into the network, to an open system where computers link up over the Internet.
- A suite of software referred to as middleware. Middleware enables different computers to execute processes or applications across the network. It acts as the backbone of the grid computing system: without middleware, communication between the computers wouldn't be possible. As with most software, there isn't a single standard format for middleware.
If middleware is the engine that drives the grid computing system, the control node functions as the manager. The control node's role is to prioritize and schedule tasks across the network, determining which resources each task can access. It also monitors the system to prevent overloads, ensuring that each user connected to the network doesn't experience a decline in performance. A grid computing system should be able to tap into unused computer resources without negatively affecting other operations.
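The control node's job of prioritizing tasks and avoiding overloads can be sketched as follows. This is a simplified illustration, not how any particular grid product works: the `Task` and `ControlNodeScheduler` names, the "capacity units", and the rule of picking the node with the most spare capacity are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int  # lower number = more urgent
    cost: int      # capacity units the task needs


class ControlNodeScheduler:
    """Toy control node: assigns tasks by priority without overloading any node."""

    def __init__(self, spare_capacity):
        # spare_capacity maps a node name to its unused capacity units.
        self.spare = dict(spare_capacity)
        self._queue = []
        self._seq = 0  # tie-breaker so equal priorities keep submission order

    def submit(self, task):
        heapq.heappush(self._queue, (task.priority, self._seq, task))
        self._seq += 1

    def schedule(self):
        """Assign queued tasks in priority order; tasks that don't fit stay queued."""
        assignments, deferred = {}, []
        while self._queue:
            entry = heapq.heappop(self._queue)
            task = entry[2]
            # Pick the node with the most unused capacity.
            node = max(self.spare, key=self.spare.get)
            if self.spare[node] >= task.cost:
                self.spare[node] -= task.cost
                assignments[task.name] = node
            else:
                deferred.append(entry)
        for entry in deferred:
            heapq.heappush(self._queue, entry)
        return assignments

    def queued(self):
        """Names of tasks still waiting, most urgent first."""
        return [entry[2].name for entry in sorted(self._queue)]
```

In this sketch a task that would exceed every node's spare capacity simply waits in the queue, which is one simple way to keep the grid from degrading performance for the people using those machines.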
The potential for grid computing applications is vast, but this potential can only be realized once standardized protocols and tools are in place. Without a common format, third-party developers — those independent programmers wanting to build applications on the grid computing platform — often find it difficult to create applications that are compatible across different systems. While it's possible to develop different versions of the same application for various systems, it's time-consuming, and many developers prefer not to repeat the work. Having a standardized set of protocols would allow developers to focus on a single format when creating applications.
Organizations working on standards and tools for grid computing include:
- The Open Grid Forum (OGF), which developed a crucial set of standards called the Open Grid Services Architecture (OGSA).
- IBM
- Microsoft
- The Organization for the Advancement of Structured Information Standards (OASIS), a nonprofit organization.
- The Globus Alliance, an international group of computer scientists.
What are the major criticisms and concerns that people have regarding grid computing? Keep reading to explore the details.
Concerns about Grid Computing
When you connect two or more computers, several important questions arise: How do you ensure the privacy of personal information? How can you protect the system from malicious hackers? How do you control who can access and utilize the system's resources? How do you prevent a user from monopolizing all the system's resources?
The quick answer is middleware. A grid computing system, by itself, cannot address these concerns. The emerging protocols for grid systems are intended to simplify the process for developers in creating applications and facilitating communication between computers.
The most common method used by computer engineers to safeguard data is encryption. Encryption is the process of encoding information in such a way that only those with the correct key can decode and access it. Interestingly, a hacker could potentially build a grid computing system with the goal of breaking encrypted data. Since encryption relies on complex algorithms, it would typically take a single computer many years to crack a code (which often involves finding the two large prime factors of a very large number). However, with the power of a sufficiently large grid computing system, a hacker could potentially reduce the time required to break the encryption.
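The way a grid could divide up a factoring search can be illustrated with a toy Python example. It uses plain trial division split into ranges, with threads standing in for grid nodes. The function names `search_range` and `factor_on_grid` are hypothetical, and the approach only works for very small numbers: real encryption keys are far too large for trial division, even on a large grid.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def search_range(n, start, stop):
    """Return the first divisor of n in [start, stop), or None if there is none."""
    for d in range(start, stop):
        if n % d == 0:
            return d
    return None


def factor_on_grid(n, n_nodes=4):
    """Split the candidate divisors 2..sqrt(n) into ranges, one batch per simulated node."""
    limit = isqrt(n) + 1
    step = max(1, (limit - 2) // n_nodes + 1)
    ranges = [(s, min(s + step, limit)) for s in range(2, limit, step)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # Results come back in range order, so the smallest divisor is found first.
        for d in pool.map(lambda r: search_range(n, *r), ranges):
            if d is not None:
                return d, n // d
    return None  # no divisor found: n is prime (or smaller than 4)
```

For example, `factor_on_grid(10403)` recovers the two prime factors 101 and 103. Each simulated node only checks its own slice of the candidates, which is the sense in which more machines shorten the search.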
Protecting a system from hackers is particularly difficult when the system depends on open standards. Every computer in a grid computing network must have specific software to enable communication and interaction with the system as a whole—computers cannot perform this task independently. If the software is proprietary, it might make it harder (though not impossible) for a hacker to infiltrate the system.
In most grid computing systems, only authorized users are granted full access to the network's capabilities. Without this limitation, the control node would become overwhelmed with processing requests, leading to a standstill (a condition sometimes loosely described as deadlock, although strictly speaking deadlock means tasks are blocked waiting on one another). Additionally, limiting access is essential for security reasons. Consequently, most systems implement authorization and authentication protocols, restricting access to a select group of users. Other users can still interact with their individual machines, but they cannot tap into the resources of the entire network.
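The difference between authentication (proving who you are) and authorization (being allowed to use the grid) can be sketched in Python. Everything here is an assumption made for illustration: the `USERS` table, the fixed demo salt, and the `grid_access` flag are hypothetical, and a production system would use per-user random salts and an established identity service.

```python
import hashlib
import hmac

SALT = b"demo-salt"  # fixed salt for the demo only; real systems use a random salt per user


def hash_password(password, salt=SALT):
    """Derive a password hash with PBKDF2 from the standard library."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user table: stored password hash plus a grid-access flag.
USERS = {
    "alice": {"pw": hash_password("correct horse"), "grid_access": True},
    "bob": {"pw": hash_password("hunter2"), "grid_access": False},
}


def authenticate(user, password):
    """Authentication: does the password match the stored hash for this user?"""
    record = USERS.get(user)
    if record is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(record["pw"], hash_password(password))


def can_use_grid(user, password):
    """Authorization: an authenticated user also needs the grid_access flag."""
    return authenticate(user, password) and USERS[user]["grid_access"]
```

In this sketch, bob can prove his identity but still cannot use the grid's pooled resources, which mirrors the users who can work on their own machines without tapping into the network as a whole.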
The middleware and control node of a grid computing system play a key role in maintaining the system's efficiency. Together, they regulate how much access each computer has to the network's resources and vice versa. While it's critical not to allow any single computer to dominate the network, it’s equally important to prevent network applications from monopolizing the resources of a single computer. If the system deprives users of computing resources, it becomes inefficient.
How are people currently utilizing grid computing systems? Continue reading to discover more.
Applications of Grid Computing
While there are numerous grid computing systems, most of them only partially align with the full definition of a true grid computing system. Many of the systems in use today are tied to academic and research organizations. These networks make use of unused computing power. The most accurate term for these systems would be shared computing networks.
One of the first grid computing systems to gain widespread recognition was the Search for Extraterrestrial Intelligence (SETI) project. SETI's goal is to analyze data from radio telescopes in search of signs of intelligent alien communication. The volume of data collected is simply too vast for a single computer to process effectively. To address this, SETI created SETI@home, a program that connects computers to form a distributed virtual supercomputer.
A similar initiative is the Folding@home project, run by the Pande Group, a nonprofit organization at Stanford University's chemistry department. The focus of the Pande Group's research is proteins, specifically their folding patterns and how these structures influence their functions. Scientists suspect that protein misfolding could play a role in diseases like Parkinson's and Alzheimer's. By studying these folding processes, the Pande Group hopes to uncover new treatments or even cures for these conditions.
There are many other active grid computing projects, each serving different purposes. However, most of these projects are temporary, dissolving once their goals are achieved. Occasionally, a new project related to the original will take its place after completion.
While each project has its own distinct characteristics, the general process of participation remains consistent. A user wishing to join simply downloads the project's application from its official Web site. After installation, the app connects to the project's control node, which sends a data chunk to the user's computer for analysis. The software uses unused CPU power to analyze the data. If the user launches a resource-heavy application, the project software pauses temporarily. Once CPU usage returns to normal, the analysis resumes.
After completing the analysis, the user's computer sends the results back to the control node, which forwards the data to the appropriate database. The control node then sends another data chunk to the user's machine, and the process continues. If the project garners enough participants, it can achieve significant objectives in a much shorter time frame.
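The participation cycle described above (receive a chunk, analyze it when the CPU is free, send back the result, repeat) can be sketched as a small Python simulation. The `ProjectServer` class, the `analyze` stand-in, and the `cpu_busy` callback are hypothetical names invented for this example. They do not reflect the actual SETI@home or Folding@home clients.

```python
from collections import deque


class ProjectServer:
    """Toy control node: hands out data chunks and collects results."""

    def __init__(self, data, chunk_size):
        self.pending = deque(
            data[i:i + chunk_size] for i in range(0, len(data), chunk_size)
        )
        self.results = []

    def next_chunk(self):
        """Return the next chunk of work, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def submit(self, result):
        self.results.append(result)


def analyze(chunk):
    """Stand-in for the project's real analysis of one data chunk."""
    return max(chunk)


def volunteer_loop(server, cpu_busy):
    """Process chunks until none remain, pausing whenever the user needs the CPU.

    cpu_busy is a callable that returns True while a resource-heavy
    application is running. Returns how many times the loop paused.
    """
    pauses = 0
    chunk = server.next_chunk()
    while chunk is not None:
        if cpu_busy():
            pauses += 1  # a real client would sleep here before checking again
            continue
        server.submit(analyze(chunk))
        chunk = server.next_chunk()
    return pauses
```

A real client would also handle network failures and lost work units, which this sketch leaves out.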
As grid computing systems evolve and grow more advanced, we'll likely see a surge in the creation of dynamic networks by organizations and corporations. It's possible that one day companies could interconnect, allowing them to tackle computational challenges that now seem insurmountable, all within a matter of hours. Only time will tell.
The Genome Comparison Project, a groundbreaking study comparing the protein sequences of over 3,500 organisms, kicked off on December 20, 2006. By July 21, 2007, the project achieved all its objectives, thanks to a grid computing system.
