KU ScholarWorks

    Invernet: An Adversarial Attack Framework to Infer Downstream Context Distribution through Word Embedding Inversion

    Hayet_ku_0099M_18336_DATA_1.pdf (3.939Mb)
    Issue Date
    2022-05-31
    Author
    Hayet, Ishrak
    Publisher
    University of Kansas
    Format
    77 pages
    Type
    Thesis
    Degree Level
    M.S.
    Discipline
    Electrical Engineering & Computer Science
    Rights
    Copyright held by the author.
    Abstract
    Word embedding has become a popular form of data representation used to train deep neural networks in many natural language processing tasks, such as machine translation, named entity recognition, and information retrieval. Through embedding, each word is represented as a dense vector that captures its semantic relationships with other words, which helps machine learning models achieve state-of-the-art performance. Because learning word embeddings from scratch is data- and computation-intensive, an affordable alternative is to borrow an existing general embedding trained on large-scale text corpora by a third party (i.e., pre-training) and further specialize it by training on a downstream domain-specific dataset (i.e., fine-tuning). However, a privacy issue can arise during this process: adversarial parties who hold the pre-training datasets may be able to infer key information, such as the context distribution of the downstream datasets, by analyzing the fine-tuned embeddings. In this study, we propose an effective way to infer the context distribution (i.e., the word co-occurrences in downstream corpora that reveal domain-specific information) in order to demonstrate these privacy concerns. Specifically, we propose a focused selection method along with a novel model inversion architecture, “Invernet”, to invert word embeddings into the word-to-word context information of the fine-tuned dataset. We consider popular word embedding algorithms, including the word2vec models CBOW and SkipGram as well as GloVe, under various unsupervised settings. We conduct an extensive experimental study, from both quantitative and qualitative perspectives, on two real-world news datasets: Antonio Gulli’s News Dataset from the Hugging Face repository and a New York Times dataset. Results show that “Invernet” achieves an average F1 score of 0.70 and an average AUC score of 0.79 in an attack scenario. A concerning pattern in our experiments reveals that embedding models generally considered superior across tasks tend to be more vulnerable to model inversion. Our results suggest that a significant amount of context distribution information from the downstream dataset can leak if an attacker gains access to the pretrained and fine-tuned word embeddings. As a result, attacks using “Invernet” can jeopardize the privacy of users whose data might have been used to fine-tune the word embedding model.
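To make the terms in the abstract concrete, here is a minimal sketch of what a "context distribution" and its F1 evaluation could look like. It is not the Invernet architecture or the thesis's evaluation code: the toy corpus, the window size, the guessed pairs, and the helper names `cooccurrence_counts` and `f1_score` are all assumptions introduced for illustration. The sketch collects the word pairs that co-occur within a small window of a downstream corpus, then scores a hypothetical attacker's guessed pairs against them.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count unordered pairs of distinct words at most `window` tokens apart."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look ahead only, so each unordered pair is counted once per position.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                if word != tokens[j]:
                    counts[frozenset((word, tokens[j]))] += 1
    return counts


def f1_score(true_pairs, predicted_pairs):
    """F1 between the true set of co-occurring pairs and a predicted set."""
    true_positives = len(true_pairs & predicted_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_pairs)
    recall = true_positives / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical downstream (fine-tuning) corpus, already tokenized.
corpus = [
    ["stocks", "fell", "sharply", "today"],
    ["stocks", "rose", "today"],
]

# Ground-truth context information: which word pairs are direct neighbours.
true_pairs = set(cooccurrence_counts(corpus, window=1))

# Hypothetical attacker output: pairs inferred to co-occur (made up here).
guessed_pairs = {
    frozenset(("stocks", "fell")),
    frozenset(("stocks", "rose")),
    frozenset(("fell", "today")),
}

score = f1_score(true_pairs, guessed_pairs)
print(f"true pairs: {len(true_pairs)}, F1 of guess: {score:.2f}")
```

In this toy setting two of the three guessed pairs are correct out of five true pairs, so precision is 2/3 and recall is 2/5. An attack as described in the abstract would produce the guessed pairs from the pretrained and fine-tuned embeddings instead of by hand.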
    URI
    https://hdl.handle.net/1808/34107
    Collections
    • Theses [3772]

    Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.


    Contact KU ScholarWorks
    785-864-8983
    KU Libraries
    1425 Jayhawk Blvd
    Lawrence, KS 66045