The Backbone of Crypto World: Understanding Nodes
Before we get to the main topic of this article, "What is a Node, and How Do You Install One?", we will first explain Blockchain technology and its architecture in simple terms, down to the details. After that, we will cover topics such as how to set up a node from scratch and what precautions to take.
Blockchain
The foundations of Blockchain were laid in 2008, and it gained recognition with Bitcoin, which launched in 2009. The technology is commonly described as a distributed ledger. More fully, Blockchain is a distributed, shared, cryptographically secured, irreversible, and tamper-resistant record of information: a system in which the participants themselves verify and store the transactions made between users.
Distributed Ledger:
The data on a public distributed ledger is generally stored unencrypted, so anyone who wishes can read it. Could it be encrypted? Of course, but then it would be useless to anyone without the key, and it would create other problems, since participants cannot check data they cannot read. In a multi-party system, every piece of data added must follow an agreed, valid standard. This protects the integrity of the system and makes it acceptable to everyone who uses it. This is where the mechanism we call consensus comes into play.
In Blockchain, transactions are stored in structures called blocks, and these blocks are linked one after another to form a chain. Blocks are created according to fixed rules. Each new block is distributed to every copy of the ledger, so every participant holds a copy of it. Each new block also contains a reference to the previous block (its hash), and this reference is what links the blocks into a chain.
When a transaction occurs, it is broadcast over the network. Every node, i.e., every participant running the network software, checks the transaction against the rules and, once it is included in a block, records it. After a block has been verified and accepted, the information in it can no longer be altered or deleted in practice.
Node:
In this simplified picture, a node represents each user in the system. Every user who joins holds a copy of the system's ledger, i.e., its database. Nodes keep their copies in sync with one another over a peer-to-peer protocol, which removes the need for a third-party intermediary and keeps Blockchain decentralized.
A blockchain system is built on a few fundamental properties:
- Distributed: This is considered the fundamental feature of Blockchain. Data is not stored in a single location; it is replicated across the network and can be stored by all users.
- Transparent: Records are visible to every node, and past data can be verified at any time.
- Independent: There is no central authority; thanks to the consensus mechanism, every node in the system can transfer data securely.
- Immutable: Data added to Blockchain cannot be updated or deleted and is stored permanently.
- Identity Privacy: Nodes, i.e., individuals, can transfer data without revealing their real-world identities, because transactions are tied to addresses rather than names. Knowing the recipient's Blockchain address is enough to send them something.
Blockchain Record Structure
In Blockchain technology, data is always recorded in a specific sequential order. To understand this better, let's consider a fictional but simple illustration.
Suppose a teacher runs an experiment to explain the structure of Blockchain to the class, using a roll call. The teacher enters the class, hands out a blank sheet of paper, and instructs the students, starting from the first, to write their name, surname, signature, and the time. (Everything happens on the same day, so the date does not change; think of the time as recorded down to the second.)
The Blockchain record system resembles this example. The names written on the paper represent the data. Each line, added according to the rule set when the first person wrote their name, represents a structure called a block. Each block has its own unique signature. The time written on each line corresponds to the timestamp in Blockchain: date and time information is added to each block as soon as it is created. Data blocks with their own signatures are thus lined up one after another to form a Blockchain. The first record in this structure is called the Genesis block, since it is the initial block.
Blockchain Sequential Structure
To understand the sequential structure, let's continue with the roll call example. The teacher introduces a new rule to prevent falsification, later insertions, and people signing for one another: each person must check the line written before theirs and add that person's signature to their own line.
In this new structure, every line except the first carries both its writer's signature and the signature of the person before them. If anyone changes a line, the signatures no longer match, so a careful check of the attendance sheet (our chain) immediately shows where the sequence was broken.
The first block is named Genesis, the starting block, because there is no block before it; it carries only its own digital signature. Every subsequent block carries both its own signature and the unique signature of the previous block. In practice, this "signature" is a cryptographic hash of the block's contents. This is how a sequential record structure is achieved in the digital world.
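The "signature of the previous block" described above can be sketched in a few lines of Python, where it is played by a cryptographic hash. This is a minimal illustration, assuming SHA-256 over a JSON encoding of each block; the field names (`data`, `timestamp`, `prev_hash`) and the all-zero previous hash for the Genesis block are our own choices, not those of any particular blockchain.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """SHA-256 over the block's fields, serialized deterministically (sorted keys)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def make_block(data: str, prev_hash: str) -> dict:
    # One "line on the roll-call sheet": data, a timestamp, and the previous line's hash.
    return {"data": data, "timestamp": time.time(), "prev_hash": prev_hash}


def build_chain(names: list[str]) -> list[dict]:
    chain = [make_block(names[0], prev_hash="0" * 64)]  # Genesis: nothing before it
    for name in names[1:]:
        chain.append(make_block(name, prev_hash=block_hash(chain[-1])))
    return chain


def is_valid(chain: list[dict]) -> bool:
    # Every block after Genesis must carry the current hash of the block before it.
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


chain = build_chain(["Alice", "Bob", "Carol"])
print(is_valid(chain))        # True
chain[1]["data"] = "Mallory"  # tamper with a middle entry
print(is_valid(chain))        # False: Carol's block no longer matches Bob's
```

Changing any earlier line breaks the link stored in the line after it, just as a forged name on the attendance sheet would no longer match the next person's signature.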
Blockchain Distributed Structure
To explain the distributed structure, the teacher does the following: he gives everyone a blank sheet of paper and asks each person to write down everyone's name in the same order and to get each listed person's signature. In addition, as in the sequential structure, everyone verifies the entries and signs the line of the person who comes after them. In other words, everyone holds not a mere photocopy but a copy they have checked and approved themselves.
We now have a structure with a fixed record and order, and copies of the resulting chain are distributed to everyone. In this case, any manipulation or fraud on the sequence by one person will be noticed: everyone compares their records, the majority keeps trusting the version it agrees on, and the person attempting fraud is exposed.
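The comparison step in the classroom can be illustrated with a small sketch. It is a simplification of the analogy, not how real networks decide: Bitcoin nodes, for instance, do not hold a vote but follow the valid chain with the most accumulated work. The function name `majority_copy` is hypothetical, and ties are not handled.

```python
from collections import Counter


def majority_copy(copies: dict[str, tuple[str, ...]]) -> tuple[tuple[str, ...], list[str]]:
    """Return the version held by most participants and the participants who disagree."""
    counts = Counter(copies.values())
    agreed, _ = counts.most_common(1)[0]
    dissenters = sorted(name for name, copy in copies.items() if copy != agreed)
    return agreed, dissenters


honest = ("Alice", "Bob", "Carol")
copies = {
    "Alice": honest,
    "Bob": honest,
    "Carol": honest,
    "Mallory": ("Alice", "Mallory", "Carol"),  # a forged copy
}
agreed, flagged = majority_copy(copies)
print(agreed)   # ('Alice', 'Bob', 'Carol')
print(flagged)  # ['Mallory']
```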
Blockchain technology precisely offers us the same structure. Data is recorded not only by a single center or a group of centers but by everyone included in the system. Here, it is not necessary for the parties to know each other; what ensures trust is not the relationships between individuals but the rules initially defined for the system and the distribution of the record chain produced within these rules to everyone.
What is Node?
All the nodes holding copies of the Blockchain records communicate with one another and confirm that the system is intact. If a block is removed or altered in one copy, that copy's chain breaks, and the rest of the network rejects the copy with the broken link. The remaining nodes continue with the unbroken chain that the majority agrees on and keep using the system.
Now stretch your imagination and apply this illustration to every class in every school, with everyone performing the same process. It sounds strange for a single student to hold all of that data; in the real world, no student could carry such a pile of paper. So imagine instead that each student has a very large library. This is exactly the kind of idea that Blockchain turns into reality, and it is why Blockchain is about much more than coins; in fact, coins are perhaps the weakest link in this system.
Understanding this fundamental architecture makes nodes much easier to grasp. Once you understand this structure, the other parts will be easier still, although it helps to know a few basic concepts.
In the previous parts of this article, we gave a general definition of Blockchain architecture. We did so to make the basic logic of nodes easier to understand and to allow comparisons. In this part, we explain in full what nodes are.
In networking generally, a node is any point where a message is created, received, or transmitted.
So what is a node in Blockchain? It is a system with fundamental tasks such as maintaining the integrity of the blockchain; we will go into these in more detail later in the article. In short, a node is a computer or other hardware device that runs the blockchain software and holds a copy of the blockchain.
In Turkish, these nodes are called "Düğüm". Nodes follow and record all transactions on the blockchain, and some of them are also responsible for producing blocks when required. Anyone who wants to can set up a new node fairly easily; keeping it running is the harder part. That is why many people rent a virtual server from a service provider instead of running the node on their own computer, which may not have the necessary hardware. This keeps the node online continuously and avoids overloading a personal computer. There are many other methods and many hardware options, but let's stick to this example to understand the logic.
Centralized and Decentralized Concepts:
In centralized structures such as banks, you do not hold your own data; whatever is recorded in the bank's database is what you see. You have no direct control over your transactions: you do not carry out transfers yourself, you only request them, and the centralized system performs them.
The key issue here is that the bank is centralized, and whether you can trust its database. Many questions arise: does it have the infrastructure to withstand a hacking attack, and how could it verify deleted data after an attack? When we ask the same questions about the decentralized structure of Blockchain, there are logical answers to all of them. The point is not that banks are bad, but that you cannot secure these assets yourself; you hand them to others for safekeeping. If security is the issue, compare the two and decide which is safer. Notice that most bank advertisements make claims about trust. Everyone looks for a trustworthy bank precisely because trust is lacking, and no one wants to lose their money.
Node:
Now let's return to the bank example to find its weak point and see how the bank could be made secure. The solution is actually straightforward: the trust problem exists because of the security of the database. If the bank made its database accessible to everyone, it would become the most reliable bank of all. There is a catch, though: does exposing the bank's data endanger people's security? Yes, it does; but if the data consists only of numbers, and no one except the owner knows whom they belong to, the problem largely disappears.
Actually, the logic of Bitcoin, in general, is based on a decentralized system where everyone can be a database. In this system where everyone can be a database, the individuals or devices that keep the records of these databases are called NODE.
Decentralization of Nodes:
We have touched on how decentralization works for nodes. But is there a measure of decentralization? In other words, what makes one system of nodes more decentralized than another? The decentralization of a system is directly related to how many nodes, i.e., copies of the database, exist: the more copies there are, the more decentralized the system is.
In the case of a banking system, there is only one database, while in systems like Bitcoin, anyone can become a database. Here, two criteria determine the decentralization of nodes.
First, the number of nodes and how easy they are to set up are directly related to the decentralization of a blockchain. If nodes for a blockchain are hard to set up, or become hard to keep running over time, that blockchain's decentralization suffers. (Ethereum has faced this problem in recent times.)
Second, setting up a node should not require anyone's permission; this depends on whether the blockchain is permissionless. We touched on this in our previous blockchain articles.
Node Types:
We will follow a widely accepted classification. You may find variations of it elsewhere, but this is the general picture. We will also briefly mention other types.
Archive Nodes:
Nodes that store all the data of a network, more precisely the entire data on the chain from A to Z, are classified as archive nodes. In short, an archive node is a copy of the chain's entire database, from the Genesis block to the latest block, including every historical state. Block explorers such as Etherscan or BscScan are a good example of services that rely on them.
We use Etherscan to view transactions on the chain, but we don't strictly need it: we can set up an archive node ourselves and access the same information. Services like Etherscan obtain these transaction details by running archive nodes and presenting the data to us. Not everyone sets up an archive node, because the storage requirements can be very large; in Ethereum, for example, an archive node can take more than 11 TB depending on the client software, so most people avoid running one unless they need it.
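As a rough illustration, the sketch below builds an Ethereum JSON-RPC `eth_getBalance` request that asks for an account's balance at an old block. A full node typically answers such a request with an error for blocks whose state it has pruned, while an archive node can answer it. The endpoint `http://localhost:8545` (the usual default HTTP port of a local client) and the zero address are placeholders, and nothing is sent unless you call `send` yourself.

```python
import json
import urllib.request


def get_balance_request(address: str, block_number: int, request_id: int = 1) -> dict:
    """Build an eth_getBalance call for the state at a given (possibly old) block."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        # Block numbers are passed as hex "quantities", e.g. 1000000 -> "0xf4240".
        "params": [address, hex(block_number)],
        "id": request_id,
    }


def send(request: dict, url: str = "http://localhost:8545") -> dict:
    """POST a JSON-RPC request to a node. The default URL is an assumed local node."""
    req = urllib.request.Request(
        url,
        data=json.dumps(request).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = get_balance_request("0x0000000000000000000000000000000000000000", 1_000_000)
print(json.dumps(payload))
# send(payload) would only succeed against a node that still holds the state
# of block 1,000,000, which in practice usually means an archive node.
```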
Full Nodes:
Contrary to what the name suggests, a full node does not store all the data. As noted above, archive nodes store everything: the chain's history and every state it has ever been in. A full node still downloads and verifies every block, but it keeps only the recent state of the system, i.e., the current balances and contract data, and discards older historical states. Of course, full nodes are further divided into many variants.
Continuing with Ethereum as an example, an archive node can exceed 11 TB, while a full node takes about 1 TB. Both grow as new transactions are recorded every day, which is why a full node does not keep all the data. In other words, a full node trims the historical information it no longer needs and keeps what is necessary for the system to keep running. This mechanism, called pruning, is crucial for full nodes to function correctly even though they do not have all the data: it removes the non-essential information that archive nodes keep, while important data such as the genesis block is preserved.
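The difference between an archive node and a pruned full node can be sketched with a toy state store. This is a simplified model under our own assumptions: it tracks only balance snapshots per block (not the blocks themselves), performs no balance checks, and the `keep_last` retention window is a hypothetical parameter rather than any client's actual setting.

```python
from typing import Dict, List, Optional, Tuple


class StateStore:
    """Toy node that keeps a snapshot of account balances after every block.

    keep_last=None behaves like an archive node (every snapshot is kept);
    an integer >= 1 behaves like a pruned full node (only recent snapshots survive).
    """

    def __init__(self, genesis: Dict[str, int], keep_last: Optional[int] = None):
        self.keep_last = keep_last
        self.head = 0
        self.snapshots: Dict[int, Dict[str, int]] = {0: dict(genesis)}

    def apply_block(self, transfers: List[Tuple[str, str, int]]) -> None:
        state = dict(self.snapshots[self.head])  # start from the latest state
        for sender, receiver, amount in transfers:  # no balance checks in this toy
            state[sender] = state.get(sender, 0) - amount
            state[receiver] = state.get(receiver, 0) + amount
        self.head += 1
        self.snapshots[self.head] = state
        if self.keep_last is not None:
            # Pruning: forget every snapshot older than the retention window.
            for old in [n for n in self.snapshots if n <= self.head - self.keep_last]:
                del self.snapshots[old]

    def balance_at(self, account: str, block: int) -> int:
        if block not in self.snapshots:
            raise KeyError(f"state for block {block} has been pruned")
        return self.snapshots[block].get(account, 0)


blocks = [[("alice", "bob", 3)], [("bob", "carol", 1)], [("alice", "carol", 2)]]
archive = StateStore({"alice": 10})
full = StateStore({"alice": 10}, keep_last=2)
for b in blocks:
    archive.apply_block(b)
    full.apply_block(b)

print(archive.balance_at("alice", 0), full.balance_at("alice", 3))  # 10 5
print(len(archive.snapshots), len(full.snapshots))                  # 4 2
```

Both nodes agree on the current state, but only the archive node can still answer questions about block 0.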
Validator/Miner (Consensus) Nodes:
Validator nodes are usually full nodes rather than archive nodes: they do not need the entire historical state, only the components required for their job. Their job is to verify and produce blocks. In Bitcoin this role is played by miners; in Ethereum, since its move to Proof of Stake, it is played by validators.
On the Bitcoin network, for example, the competition to add a new block to the blockchain is called mining. Miners compete to find a value (a nonce) that makes the hash of the new block's header fall below a target set by the network. The only way to find it is trial and error, and the first miner to succeed adds the new block to the blockchain.
To mine on this network, you need to download and verify the entire blockchain history, which is why the storage requirements of Bitcoin miners are so large. In Ethereum, miners have been replaced by validators; validators still need to download the necessary components, but not the full historical state that an archive node keeps.
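A toy version of this trial-and-error search is sketched below. It is a simplification, assuming a single SHA-256 over a text header and a difficulty expressed as a number of leading zero hex digits; Bitcoin actually applies SHA-256 twice to an 80-byte binary header and compares the result with a numeric target. The header string is made up.

```python
import hashlib


def mine(header: str, difficulty: int) -> tuple[int, str]:
    """Try nonces until SHA-256(header + nonce) starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify(header: str, nonce: int, difficulty: int) -> bool:
    # Checking a solution takes a single hash, while finding it took many attempts.
    digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


header = "prev=abc123;txs=Alice->Bob:5"
nonce, digest = mine(header, difficulty=4)
print(nonce, digest)
print(verify(header, nonce, 4))  # True
```

Each extra zero digit makes the search about 16 times longer on average, while verification stays a single hash; this asymmetry is what lets every node cheaply check a miner's work.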
Relay Nodes:
In a network, relay nodes transmit data. As the name implies, they pass information between different nodes. They are especially useful in networks made of several separate parts, where they keep the network connected by carrying information between those parts.
In a blockchain network, a relay node plays a significant role in ensuring that information is quickly transmitted to other nodes in the network. In this way, relay nodes contribute to the efficiency and speed of the blockchain network.
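The relaying idea can be sketched as a simple flood over a hypothetical peer graph: each node forwards a new message to its peers once, and we record how many hops it took to reach everyone. Real networks are more involved (Bitcoin nodes, for example, first announce and then request data), so treat this only as an illustration of why a well-connected relay matters.

```python
from collections import deque


def propagate(peers: dict[str, list[str]], origin: str) -> dict[str, int]:
    """Flood a message from `origin`; return how many hops each reached node needed."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in peers.get(node, []):
            if peer not in hops:  # each node relays a given message only once
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops


# Hypothetical topology: "R" is a relay that bridges two otherwise separate groups.
network = {
    "A": ["B", "R"],
    "B": ["A"],
    "R": ["A", "X"],
    "X": ["R", "Y"],
    "Y": ["X"],
}
print(propagate(network, "B"))  # {'B': 0, 'A': 1, 'R': 2, 'X': 3, 'Y': 4}
```

Removing `R` from the graph leaves `X` and `Y` unreachable, which is the role relay nodes play in keeping a split network connected.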
Light Nodes:
Light nodes, also known as thin clients, do not store the entire blockchain; typically they keep only the block headers. They are designed to be lightweight and use far less storage than full nodes, and they rely on other nodes in the network to supply the information they need, which they can check against the headers they hold.
While light nodes offer advantages such as reduced storage requirements and faster synchronization with the network, they come with the trade-off of having less control and security compared to full nodes. Light nodes are suitable for users who prioritize efficiency and do not require the complete history of the blockchain.
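How can a light node trust data it did not download? In Bitcoin-style simplified payment verification, a full node supplies a Merkle proof: the few hashes needed to link a transaction to the Merkle root stored in a block header. The sketch below is simplified, assuming a single SHA-256 (Bitcoin uses double SHA-256) and the rule of duplicating the last hash on odd-sized levels; Ethereum uses a different structure, Merkle Patricia tries. The transaction strings are made up.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool says whether the sibling sits on the right."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    # What a light node does: hash upward using only the proof, then compare with the header's root.
    node = h(leaf)
    for sibling, sibling_on_right in proof:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root


txs = [b"tx-a", b"tx-b", b"tx-c"]
root = merkle_root(txs)            # in a real chain, this lives in the block header
proof = merkle_proof(txs, 2)       # supplied by a full node
print(verify_inclusion(b"tx-c", proof, root))  # True
print(verify_inclusion(b"tx-x", proof, root))  # False
```

The proof grows only with the logarithm of the number of transactions, which is why light nodes can stay small.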
In conclusion, nodes play a crucial role in the functioning of blockchain networks. They maintain the integrity of the blockchain, validate transactions, and contribute to the decentralization of the network. The different types of nodes, such as archive nodes, full nodes, validator/miner nodes, relay nodes, and light nodes, serve various purposes within the blockchain ecosystem. Understanding the role of nodes is essential for anyone interested in blockchain technology and its applications. In the upcoming parts of our article, we will delve into more specific details about nodes, including their role in consensus mechanisms, security considerations, and the challenges they face in different blockchain networks.
51% Attack:
A 51% attack is a scenario in which an individual or group controlling more than half of a blockchain's hash rate can rewrite recent history: reorder, exclude, or reverse transactions. The aim of such an attack is to block the transactions and blocks of other miners and take control of block production. If the attackers reverse their own earlier transactions, the result is what we call double spending. I won't go into double spending in full here; in brief, the attack is possible because the system runs on consensus, i.e., it follows the majority's decision. If you control more than half of the system, as with a majority in parliament, you hold decision-making power. However, on large networks the probability of this happening is quite low.
Double Spending:
Double spending is the act of spending the same digital asset more than once. In a blockchain, all transactions are verified. In networks like Bitcoin, which use a Proof-of-Work consensus model, transactions enter the blockchain after being validated by miners. If someone tries to spend the same coins twice, full nodes flag the second transaction as invalid. This is the safeguard against double spending.
If an individual or group gains control of more than half of the Bitcoin network's hash rate, they could carry out double spending. The likelihood of a 51% attack on Bitcoin is minimal because of the enormous computational power it would require, but it remains possible.
Suppose miners control 51% of the Bitcoin network. They have two options: halt block creation, or carry out double spending in the blocks they create. In the first scenario, the network stops processing transactions, but the rules cannot be changed.
In the second scenario, the attackers can rewrite recent history to reverse their own transactions, but any block that breaks the consensus rules (for example, one that spends the same coins twice within a chain or creates coins out of nothing) is rejected by every full node. The rules of the blockchain do not change; only recent data can be affected, and many preventive measures, such as waiting for more confirmations, can be used to counter this.
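A simplified version of this check, in the style of Bitcoin's unspent-output (UTXO) model, is sketched below. It only tracks which outputs are still unspent; real nodes also verify signatures, amounts, and much more. The `txid:index` identifiers and the transaction format are illustrative.

```python
def validate_block(utxos: set[str], block: list[dict]) -> bool:
    """Accept a block only if every input it spends exists and is spent at most once.

    utxos: identifiers of currently unspent outputs, e.g. "tx0:0" (illustrative format).
    Each transaction is {"inputs": [...], "outputs": [...]}.
    """
    available = set(utxos)  # work on a copy so the caller's set is untouched
    for tx in block:
        for outpoint in tx["inputs"]:
            if outpoint not in available:
                return False  # already spent or never existed: double-spend attempt
            available.remove(outpoint)
        available.update(tx["outputs"])  # new outputs can be spent later in the block
    return True


utxos = {"tx0:0"}
honest = [{"inputs": ["tx0:0"], "outputs": ["tx1:0"]}]
double = [
    {"inputs": ["tx0:0"], "outputs": ["tx1:0"]},
    {"inputs": ["tx0:0"], "outputs": ["tx2:0"]},  # spends tx0:0 a second time
]
print(validate_block(utxos, honest))  # True
print(validate_block(utxos, double))  # False
```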
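Why "minimal" rather than "impossible"? Section 11 of the Bitcoin whitepaper gives a formula for the probability that an attacker with a share q of the hash power ever catches up with an honest chain that is z blocks ahead. The sketch below transcribes that calculation; the function name is our own.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever overtakes an honest
    chain that is z blocks ahead (calculation from the Bitcoin whitepaper, section 11)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    lam = z * q / p  # expected attacker progress while the honest chain mines z blocks
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


for q in (0.1, 0.3, 0.5):
    print(q, [round(attacker_success_probability(q, z), 4) for z in (1, 6)])
```

With q = 0.1 and z = 1 this gives about 0.2046, and the probability falls quickly as z grows; once q reaches one half, it is 1. This is why waiting for several confirmations protects against a minority attacker but not against a 51% attacker.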
Non-consensus full nodes are therefore key to blockchain security, acting as a defense mechanism during attacks. I liken this structure to our DNA, which exists in every cell at once even though each cell has a different function. In a PoS system, if consensus nodes attempt an attack, non-consensus full nodes reject the invalid blocks, and the protocol can penalize (slash) the validators responsible. Ethereum's transition from PoW to PoS, for instance, exposed a weakness in its decentralization: many nodes run on centralized platforms such as Amazon Web Services. Vitalik Buterin announced that any node or group violating the system's rules would be removed. This approach protects the system and forces potential attackers to weigh their risks. Non-consensus full nodes perform this defensive function.
Looking at Ethereum's security model, it has historically relied on making attacks costly, so that an attack causes losses for the attacker as well. Vitalik's strategy for keeping the system secure involves minimizing risk through measures such as the gas cost increases introduced after the DAO fork, which made certain operations more expensive so that attacking the network became costly. This is one reason gas fees remain high, and the focus is on alternative security measures that would allow these fees to come down.
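One misbehavior that Ethereum's protocol punishes by slashing is equivocation: a validator signing two different blocks for the same slot. The sketch below only shows how such evidence can be spotted in a list of signed messages; the tuple format is hypothetical, signatures are not checked, and in Ethereum the evidence is submitted on-chain and the penalty is applied by the protocol itself.

```python
from collections import defaultdict


def find_equivocations(signed_blocks: list[tuple[str, int, str]]) -> list[str]:
    """Return validators that signed two different blocks for the same slot.

    signed_blocks: (validator_id, slot, block_hash) tuples (illustrative format).
    """
    seen: dict[tuple[str, int], set[str]] = defaultdict(set)
    for validator, slot, block_hash in signed_blocks:
        seen[(validator, slot)].add(block_hash)
    return sorted({v for (v, _), hashes in seen.items() if len(hashes) > 1})


messages = [
    ("val-1", 100, "0xaaa"),
    ("val-2", 100, "0xbbb"),
    ("val-2", 100, "0xccc"),  # same slot, different block: equivocation
    ("val-1", 101, "0xddd"),
]
print(find_equivocations(messages))  # ['val-2']
```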
Why should we set up nodes, and how does it benefit us?
We've discussed the advantages of consensus nodes such as validator and miner nodes: they earn staking or mining rewards but generally require substantial capital. Bitcoin mining demands specialized equipment, and running an Ethereum validator requires a minimum of 32 ETH. You can mine or stake solo, but joining a mining pool makes rewards far more regular: a pool gathers the power of many participants around a single node, so the pool as a whole finds blocks much more often, and the rewards from those blocks are shared among the participants as smaller but steadier payouts.
A common question: when I earn staking rewards on Ethereum, am I running a node? Not exactly. You are not producing blocks yourself; instead, you delegate your ETH to others who run validator nodes, and they share part of their rewards with you. Additionally, running nodes for certain projects during their test phases can earn you rewards, for example in the form of airdrops.
Running a full node yourself maximizes your security. Consider how wallets work: a wallet that includes its own full node is considered secure because both the node and the wallet are under your control. Wallets like MetaMask, however, do not require a full node; they rely on a node provider that MetaMask uses by default to read the chain and broadcast transactions.
By default, MetaMask connects to Infura's nodes. Your private keys stay in your wallet, but Infura sees your requests, such as the addresses you query and the transactions you send, along with your IP address. This raises concerns about both decentralization and privacy: if MetaMask or Infura were breached, this data could be exposed. Running your own node and connecting your wallet to it is therefore a more secure and reliable option.