Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation

Rui Wang1   Jian Chen2   Gang Yu2   Li Sun3   Changqian Yu1   Changxin Gao1*   Nong Sang1
1Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China   2Tencent, Shanghai, China   3East China Normal University, Shanghai, China


Abstract

Image manipulation with StyleGAN has attracted increasing attention in recent years. Recent works have achieved tremendous success in analyzing several semantic latent spaces to edit the attributes of generated images. However, due to the limited semantic and spatial manipulation precision of these latent spaces, existing methods fall short on fine-grained StyleGAN image manipulation, i.e., local attribute translation. To address this issue, we discover attribute-specific control units, which consist of multiple channels of feature maps and modulation styles. Specifically, we manipulate the modulation style channels and feature maps in control units collaboratively, rather than individually, to obtain semantically and spatially disentangled controls. Furthermore, we propose a simple yet effective method to detect the attribute-specific control units. To manipulate these control units, we move the modulation style along a specific sparse direction vector and replace the filter-wise styles used to compute the feature maps. We evaluate the proposed method on various face attribute manipulation tasks. Extensive qualitative and quantitative results demonstrate that it performs favorably against state-of-the-art methods. Manipulation results on real images further show the effectiveness of our method.


Attribute-Specific Control Units

Feature maps in the StyleGAN2 generator activate consistently on the same semantic regions across different generated images:
Figure: The first 9 channels of the input features of the 11th convolutional layer.

We divide each channel of the intermediate features into region-specific groups, based on the spatial location of its most strongly activated region, using a simple yet effective gradient-based strategy.
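To make this grouping step concrete, below is a minimal PyTorch sketch. It is a simplified variant that assigns each channel by its peak activation position rather than by gradients, and it assumes a precomputed semantic segmentation mask of the same spatial size as the feature maps; the function and argument names are hypothetical.

import torch

def group_channels_by_region(features, seg, num_regions):
    """Assign each feature-map channel to the semantic region that
    contains its most strongly activated spatial position.

    features:    (C, H, W) intermediate feature maps of one layer
    seg:         (H, W) integer mask with values in [0, num_regions),
                 assumed to be resized to the feature-map resolution
    num_regions: number of semantic regions in the mask
    Returns a dict mapping region id -> list of channel indices.
    """
    C, H, W = features.shape
    groups = {r: [] for r in range(num_regions)}
    flat = features.reshape(C, -1)
    peak_pos = flat.argmax(dim=1)               # (C,) peak location per channel
    peak_region = seg.reshape(-1)[peak_pos]     # (C,) region id at each peak
    for c in range(C):
        groups[int(peak_region[c])].append(c)
    return groups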

We can then modify the modulation styles of the convolutional layer by moving them along a sparse direction vector. More specifically, we use a portion of the difference between the latent codes of positive and negative samples as the editing direction vector.
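As an illustration, here is a minimal sketch of building such a sparse direction in PyTorch, assuming the modulation styles of the positive and negative samples for one layer have already been collected; the number of kept entries k and the step size alpha are illustrative parameters, not the values used in the paper.

import torch

def sparse_direction(styles_pos, styles_neg, k):
    """Build a sparse editing direction from the modulation styles of
    positive and negative samples.

    styles_pos, styles_neg: (N, C) modulation styles of one layer
    k: number of style channels kept in the direction
    Returns a (C,) vector that is zero outside the top-k entries.
    """
    diff = styles_pos.mean(dim=0) - styles_neg.mean(dim=0)
    keep = diff.abs().topk(k).indices            # largest-magnitude channels
    direction = torch.zeros_like(diff)
    direction[keep] = diff[keep]
    return direction

# Editing then amounts to moving the style along the direction:
# s_edited = s + alpha * sparse_direction(styles_pos, styles_neg, k)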

Figure: The edited result of the sparse direction vector.

However, the results manipulated by these sparse direction vectors alone still suffer from insufficient change or entanglement:

Figure: Problems of editing with sparse direction vectors alone.

The editing results are strongly correlated with the spatial distribution of the feature maps. We should therefore manipulate the modulation styles and feature maps collaboratively, rather than individually, to obtain fine-grained control.

The attribute of a specific semantic region is controlled by a few channels of the intermediate feature maps and their corresponding modulation styles, which we refer to as a control unit.
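In code, a control unit can be represented simply by the layer index and the indices of the feature-map and style channels it covers; the sketch below is only a convenient data structure, not the authors' implementation.

from dataclasses import dataclass
from typing import List

@dataclass
class ControlUnit:
    """Channels of one convolutional layer that jointly control
    the attribute of a specific semantic region."""
    layer: int                    # index l of the convolutional layer
    feature_channels: List[int]   # channels of F^l to be replaced
    style_channels: List[int]     # channels of S^l to be moved along Delta S^l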



Pipeline

Visualization of a typical attribute manipulation pipeline:

Figure: Overview of the framework. The gray feature maps indicate the channels that have changed as a result of our modification.

Our modification consists of optimized styles $\hat{S}^{l-1}$ and a direction vector $\Delta S^l$. A few channels of $F^l$ are replaced by $F^{l}_{U_a}$, computed with $\hat{S}^{l-1}$, while the other channels of $F^l$ remain untouched. The original modulation style $S^l$ and $\Delta S^l$ are summed to form the new modulation style.
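A minimal sketch of this step, assuming the feature maps $F^{l}_{U_a}$ have already been recomputed with the optimized styles $\hat{S}^{l-1}$; the function name and tensor shapes are illustrative.

import torch

def manipulate_control_unit(F_l, S_l, delta_S_l, F_l_unit, unit_channels):
    """Combine feature-map replacement and modulation-style editing.

    F_l:           (C, H, W) original feature maps of layer l
    S_l:           (C,) original modulation styles of layer l
    delta_S_l:     (C,) sparse direction vector for layer l
    F_l_unit:      (C, H, W) feature maps recomputed with the optimized
                   styles of the previous layer
    unit_channels: channel indices belonging to the control unit U_a
    """
    F_new = F_l.clone()
    F_new[unit_channels] = F_l_unit[unit_channels]   # replace only unit channels
    S_new = S_l + delta_S_l                          # move styles along the direction
    return F_new, S_new

# Illustrative shapes only: 512 channels at 16x16 resolution.
F_new, S_new = manipulate_control_unit(
    torch.randn(512, 16, 16), torch.randn(512), torch.randn(512),
    torch.randn(512, 16, 16), [3, 41, 87])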



Results

Figure: Control units for various attributes.

Figure: Comparison of the manipulation of control units with and without replacement.

Figure: Manipulating various attributes of different semantic regions.



Citation

@inproceedings{10.1145/3474085.3475274,
    author = {Wang, Rui and Chen, Jian and Yu, Gang and Sun, Li and Yu, Changqian and Gao, Changxin and Sang, Nong},
    title = {Attribute-Specific Control Units in StyleGAN for Fine-Grained Image Manipulation},
    year = {2021},
    isbn = {9781450386517},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3474085.3475274},
    doi = {10.1145/3474085.3475274},
    booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
    pages = {926--934},
    numpages = {9},
    keywords = {generative adversarial networks (GANs), control unit, image manipulation},
    location = {Virtual Event, China},
    series = {MM '21}
}


Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61876210) and the Science and Technology Commission of Shanghai Municipality (No. 19511120800).


Contact

Rui Wang