A Study on Machine Learning-Based Radio Resource Management Techniques for Cognitive Radio Networks
- Spectrum scarcity is one of the essential issues in fifth-generation (5G) and beyond communication systems. Moreover, over the last few decades, the dramatic growth in the number of mobile applications has led to surging demand for radio resources. To tackle the spectrum inefficiency issue, dynamic spectrum access techniques such as cognitive radio (CR), ambient backscatter communication, and non-orthogonal multiple access (NOMA) have been studied. In cognitive radio networks (CRNs), cognitive users (CUs) are able to utilize the licensed spectrum bands of the primary users (PUs) when either the interference caused by the cognitive users is acceptable or the PUs are inactive. Ambient backscatter communication, in turn, is an emerging technique for green communication whose key idea is to transmit data from a transmitter to its corresponding receiver by backscattering the signals of an ambient radio frequency (RF) source. In addition, NOMA allows multiple users to use the same frequency and time resources for their data transmissions. Integrating these techniques can further improve spectrum efficiency in wireless communication systems.
Along with the rapid development of mobile devices, energy management has also become a crucial issue: most smart mobile devices must operate for long periods while running energy-hungry applications, yet their battery capacity remains limited. In recent works, wireless communications powered by externally harvested energy have emerged as a promising technique to solve this energy-constrained problem. Radio frequency (RF) energy harvesting in a CRN is one potential solution, where the wireless devices harvest energy from ambient RF signals. In addition, the wireless devices can also recharge their batteries by harvesting ambient energy from perpetual non-RF sources (e.g., solar, wind…).
Dynamic resource allocation algorithms for energy harvesting CRNs are now being carefully investigated because resource management has a crucial effect on long-term system performance. Motivated by the survey above, this dissertation focuses on the following remaining issues for CRNs:
Firstly, we investigate physical-layer jamming attacks against cooperative communications networks, where a jammer tries to block the data communication between the source and the destination. An energy-constrained relay can assist the source by forwarding the data to the destination even when the jammer blocks the direct link. Because the battery capacity of the relay is limited, the relay is equipped with a non-RF energy harvester that helps to prolong its operation. We propose a scheme based on a partially observable Markov decision process (POMDP) to find the optimal action for the source so as to maximize the achievable throughput of the cooperative communications network. Under this scheme, the source dynamically selects the appropriate action mode for its transmission in order to obtain maximum throughput under the jamming attack. Simulation results verify that the proposed scheme is superior to the myopic scheme, in which only the current throughput is taken into account when making decisions.
Secondly, wireless energy harvesting enables wireless-powered communications to accommodate data services in a self-sustainable manner over a long operational time. Along with energy harvesting, an ambient backscatter technique helps a secondary transmitter reflect existing RF signals to communicate with a secondary receiver when the primary channel (PC) is utilized. However, secondary system performance is significantly affected by factors such as the availability of the primary channel, imperfect spectrum sensing, and energy constraints. Therefore, we propose a novel approach for wireless-powered CRNs to improve the transmission performance of secondary systems. To reduce the dependence of the secondary system on RF sources, we provide a new paradigm that integrates ambient backscattering with both RF and non-RF wireless-powered communications to facilitate secondary communications. Based on the sensing result in a time slot, the secondary transmitter can dynamically select the operational action, 1) backscattering, 2) harvesting, or 3) transmitting, to maximize the long-term achievable data transmission rate at the secondary receiver. The optimal action for cognitive radio networks with wireless-powered ambient backscatter is selected by the POMDP, which maximizes the expected transmission rate calculated over a number of subsequent time slots. The proposed scheme aims to improve the long-term transmission rate of CRNs with wireless-powered ambient backscatter in comparison with conventional schemes, where an action is taken only to maximize the immediate reward in each single time slot.
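As a rough illustration of how a POMDP-based scheme can cope with imperfect spectrum sensing, the sketch below maintains a belief that the primary channel is idle and refines it with each sensing result. It is not the dissertation's model: the two-state Markov channel, the detection and false-alarm probabilities, and the names `update_belief` and `predict_belief` are all assumptions made for this example.

```python
def update_belief(belief_idle, sensed_idle, p_detect, p_false_alarm):
    """Bayes posterior that the primary channel is idle after one sensing result.

    p_detect: assumed probability that a busy channel is sensed as busy.
    p_false_alarm: assumed probability that an idle channel is sensed as busy.
    """
    if sensed_idle:
        like_idle = 1.0 - p_false_alarm  # idle channel correctly sensed as idle
        like_busy = 1.0 - p_detect       # busy channel missed (sensed as idle)
    else:
        like_idle = p_false_alarm        # idle channel wrongly sensed as busy
        like_busy = p_detect             # busy channel correctly sensed as busy
    numerator = belief_idle * like_idle
    denominator = numerator + (1.0 - belief_idle) * like_busy
    return numerator / denominator if denominator > 0 else belief_idle


def predict_belief(belief_idle, p_idle_to_idle, p_busy_to_idle):
    """Propagate the belief to the next slot through an assumed two-state Markov chain."""
    return belief_idle * p_idle_to_idle + (1.0 - belief_idle) * p_busy_to_idle


# Hypothetical numbers: uninformed prior, fairly reliable sensor.
posterior = update_belief(0.5, True, p_detect=0.9, p_false_alarm=0.1)
next_belief = predict_belief(posterior, p_idle_to_idle=0.8, p_busy_to_idle=0.2)
```

A planner would then choose among backscattering, harvesting, and transmitting based on this belief and the remaining energy, rather than on the raw sensing result alone.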
Thirdly, we consider an uplink NOMA cognitive system, where the secondary users (SUs) can jointly transmit data to the cognitive base station (CBS) over the same spectrum resources. Thereafter, successive interference cancellation (SIC) is applied at the CBS to retrieve the signals transmitted by the SUs. In addition, the energy-constrained problem in wireless networks is taken into account. Therefore, we assume that the SUs are powered by a wireless energy harvester to prolong their operation, while the CBS is equipped with a traditional electrical supply. Herein, we propose an actor-critic reinforcement learning approach to maximize the long-term throughput of the cognitive network. In particular, by interacting with and learning directly from the environment over several time slots, the CBS can optimally assign the amount of transmission energy for each SU according to the remaining energy of the SUs and the availability of the primary channel. The simulation results verify that the proposed scheme outperforms conventional approaches (such as myopic NOMA and OMA, in which the system reward is always maximized in the current time slot only) in terms of overall throughput and energy efficiency.
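To make the role of SIC in uplink NOMA concrete, the following sketch computes per-user achievable rates (in bit/s/Hz) when the receiver decodes the strongest received signal first and subtracts it before decoding the next. It is an illustrative textbook-style calculation under assumed perfect cancellation, not the dissertation's system model; the function name `sic_uplink_rates` and the example numbers are hypothetical.

```python
import math


def sic_uplink_rates(tx_powers, channel_gains, noise_power):
    """Per-user rates for uplink NOMA with ideal SIC, strongest signal decoded first."""
    received = [p * g for p, g in zip(tx_powers, channel_gains)]
    # Decoding order: descending received power (an assumption of this sketch).
    order = sorted(range(len(received)), key=lambda i: received[i], reverse=True)
    rates = [0.0] * len(received)
    interference = sum(received)
    for i in order:
        # Signals that are not yet decoded still act as interference for user i.
        interference -= received[i]
        rates[i] = math.log2(1.0 + received[i] / (interference + noise_power))
    return rates


# Hypothetical two-user example with unit transmit powers and unit noise power.
rates = sic_uplink_rates([1.0, 1.0], [4.0, 1.0], noise_power=1.0)
```

With ideal SIC the individual rates add up to `log2(1 + total received power / noise)`, which is why the power assigned to each user matters for how the sum rate is shared.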
Then, a hybrid NOMA/OMA scheme is considered for uplink wireless transmission systems where multiple cognitive users (CUs) can simultaneously transmit their data to a cognitive base station (CBS). We adopt a user-pairing algorithm in which the CUs are grouped into multiple pairs, and each pair is assigned to an orthogonal sub-channel such that the users in a pair apply NOMA to transmit data to the CBS without causing interference to other pairs. Subsequently, the signals transmitted by the CUs of each NOMA pair can be independently retrieved by using successive interference cancellation (SIC). The CUs are assumed to harvest solar energy to maintain their operation. Moreover, joint power and bandwidth allocation is performed at the CBS to optimize energy and spectrum efficiency in order to obtain the maximum long-term data rate for the system. To this end, we propose a deep actor-critic reinforcement learning (DACRL) algorithm that models the policy function and the value function for the actor and the critic of the agent (i.e., the CBS), respectively. The actor learns about the system dynamics by interacting with the environment, while the critic evaluates the action taken, so that the CBS can optimally assign power and bandwidth to the CUs once the training phase finishes. Numerical results validate the superior performance of the proposed scheme compared with other conventional schemes.
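The division of labor between actor and critic can be sketched with a single one-step update. This is a deliberately simplified tabular stand-in with a softmax policy, not the deep (neural network) version proposed in the dissertation; the state and action indices, learning rates, and the name `actor_critic_step` are assumptions chosen for illustration.

```python
import math


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(x - m) for x in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, values, s, a, reward, s_next,
                      gamma=0.9, lr_actor=0.1, lr_critic=0.5):
    """One-step tabular actor-critic update; modifies theta and values in place."""
    # Critic: the temporal-difference error evaluates the action just taken.
    td_error = reward + gamma * values[s_next] - values[s]
    values[s] += lr_critic * td_error
    # Actor: policy-gradient step on the softmax preferences of state s.
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += lr_actor * td_error * grad_log_pi
    return td_error


# Hypothetical toy problem: 2 states, 2 actions, everything initialized to zero.
theta = [[0.0, 0.0], [0.0, 0.0]]
values = [0.0, 0.0]
td = actor_critic_step(theta, values, s=0, a=1, reward=1.0, s_next=1)
```

In a deep variant, the tables `theta` and `values` would be replaced by neural networks, and the action would encode a power and bandwidth assignment rather than a small discrete index.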
Next, we consider an uplink solar-powered cognitive radio network (CRN) where multiple secondary users (SUs) transmit data to a secondary base station (SBS) by sharing a licensed channel of a primary system. A deep Q-learning (DQL) algorithm, which combines non-orthogonal multiple access (NOMA) and time division multiple access (TDMA) techniques, is proposed to maximize the long-term throughput of the system. With our scheme, the agent (i.e., the SBS) can obtain the optimal decision by interacting with the environment to learn about the system dynamics. Simulation results validate the superior performance of the proposed scheme compared with traditional schemes.
Finally, we conclude this dissertation by summarizing its main contributions and pointing to new opportunities for deep reinforcement learning and its applications in future wireless networks.
- 황 티 흐엉 지앙