Recent Submissions
Generalized Quantum Convolution for Multidimensional Data
(MDPI, 2023-10-31) Jeng, Mingyoung; Nobel, Alvir; Jha, Vinayak; Levy, David; Kneidel, Dylan; Chaudhary, Manu; Islam, Ishraq; Rahman, Muhammad Momin; El-Araby, Esam
The convolution operation plays a vital role in a wide range of critical algorithms across various domains, such as digital image processing, convolutional neural networks, and quantum machine learning. In existing implementations, particularly in quantum neural networks, convolution operations are usually approximated by the application of filters with data strides that are equal to the filter window sizes. One challenge with these implementations is preserving the spatial and temporal localities of the input features, specifically for data with higher dimensions. In addition, the deep circuits required to perform quantum convolution with a unity stride, especially for multidimensional data, increase the risk of violating decoherence constraints. In this work, we propose depth-optimized circuits for performing generalized multidimensional quantum convolution operations with unity stride targeting applications that process data with high dimensions, such as hyperspectral imagery and remote sensing. We experimentally evaluate and demonstrate the applicability of the proposed techniques by using real-world, high-resolution, multidimensional image data on a state-of-the-art quantum simulator from IBM Quantum.

Using pre-surgical suspicion to guide insula implantation strategy
(Elsevier, 2023-07-14) Cameron, Nathaniel; Fry, Lane; Kabangu, Jean-Luc; Schatmeyer, Bryan A.; Miller, Christopher; Ulloa, Carol M.; Uysal, Utku; Cheng, Jennifer J.; Kinsman, Michael J.; Rouse, Adam G.; Landazuri, Patrick
Rationale: Insular epilepsy can be a challenging diagnosis due to overlapping semiology and scalp EEG findings with frontal, temporal, and parietal lobe epilepsies.
Stereotactic electroencephalography (sEEG) provides an opportunity to better localize seizure onset. The possibility of improved localization is balanced by implantation risk in this vascularly rich anatomic region. We review both safety and pre-implantation factors involved in insular electrode placement across four years at an academic medical center. Methods: Presurgical data, operative reports, and invasive EEG summaries were retrospectively reviewed for patients undergoing invasive epilepsy monitoring on the insula from 2016 through 2019. EEG reports were reviewed to record the presence of insula ictal and interictal involvement. We recorded which presurgical findings suggested insular involvement (insula lesion on MRI, insula changes on PET/SPECT/scalp EEG, characteristic semiology, or history of failed anterior temporal lobectomy). The likelihood of pre-sEEG insular onset was categorized as low suspicion if no presurgical findings were present (“rule out”), moderate suspicion if one finding was present, and high suspicion if two or more findings were present. Results: 76 patients received 189 insular electrodes as part of their implantation strategy for 79 surgical cases. Seven patients (8.9%) had insular ictal onset. One clinically significant complication (left hemiparesis) occurred in a patient with moderate suspicion for insular onset. There were 38 low suspicion cases, 36 moderate suspicion cases, and 5 high suspicion cases for pre-sEEG insula ictal onset. Two low suspicion (5.3%), three moderate suspicion (8.6%), and two high suspicion (40%) cases had insular ictal onset. Conclusions: The insula can safely receive sEEG. Having two or more presurgical factors indicating insular onset is a strong, albeit incomplete, predictor of insular seizure onset.
Using pre-implantation clinical findings can offer clinicians predictive value for targeting the insula during invasive EEG monitoring.

Aphid cluster recognition and detection in the wild using deep learning models
(Nature Research, 2023-08-17) Zhang, Tianxiao; Li, Kaidong; Chen, Xiangyu; Zhong, Cuncong; Luo, Bo; Grijalva, Ivan; McCornack, Brian; Flippo, Daniel; Sharda, Ajay; Wang, Guanghui
Aphid infestation poses a significant threat to crop production, rural communities, and global food security. While chemical pest control is crucial for maximizing yields, applying chemicals across entire fields is both environmentally unsustainable and costly. Hence, precise localization and management of aphids are essential for targeted pesticide application. The paper primarily focuses on using deep learning models for detecting aphid clusters. We propose a novel approach for estimating infection levels by detecting aphid clusters. To facilitate this research, we have captured a large-scale dataset from sorghum fields, manually selected 5447 images containing aphids, and annotated each individual aphid cluster within these images. To facilitate the use of machine learning models, we further process the images by cropping them into patches, resulting in a labeled dataset comprising 151,380 image patches. Then, we implemented and compared the performance of four state-of-the-art object detection models (VFNet, GFLV2, PAA, and ATSS) on the aphid dataset. Extensive experimental results show that all models yield stable, similar performance in terms of average precision and recall. We then propose to merge close neighboring clusters and remove tiny clusters caused by cropping, and the performance is further boosted by around 17%. The study demonstrates the feasibility of automatically detecting and managing insects using machine learning models.
The labeled dataset will be made openly available to the research community.

Editorial: Non-coding RNAs: insights and state-of-the-art in gastrointestinal sciences
(Frontiers Media, 2023-07-05) Fu, Ting; Xu, Zhenjiang Zech; Zhong, Cuncong

Gender, Smoking History, and Age Prediction from Laryngeal Images
(MDPI, 2023-05-29) Zhang, Tianxiao; Bur, Andrés M.; Kraft, Shannon; Kavookjian, Hannah; Renslo, Bryan; Chen, Xiangyu; Luo, Bo; Wang, Guanghui
Flexible laryngoscopy is commonly performed by otolaryngologists to detect laryngeal diseases and to recognize potentially malignant lesions. Recently, researchers have introduced machine learning techniques to facilitate automated diagnosis using laryngeal images and achieved promising results. The diagnostic performance can be improved when patients’ demographic information is incorporated into models. However, the manual entry of patient data is time-consuming for clinicians. In this study, we made the first endeavor to employ deep learning models to predict patient demographic information to improve the detector model’s performance. The overall accuracy for gender, smoking history, and age was 85.5%, 65.2%, and 75.9%, respectively. We also created a new laryngoscopic image set for the machine learning study and benchmarked the performance of eight classical deep learning models based on CNNs and Transformers. The results can be integrated into current learning models to improve their performance by incorporating the patient’s demographic information.

Optical trapping of sub-millimeter sized particles and microorganisms
(Nature Research, 2023-05-27) Lialys, Laurynas; Lialys, Justinas; Salandrino, Alessandro; Ackley, Brian D.; Fardad, Shima
While optical tweezers (OT) are mostly used for confining smaller size particles, the counter-propagating (CP) dual-beam traps have been a versatile method for confining both small and larger size particles including biological specimens.
However, CP traps are complex, sensitive systems, requiring tedious alignment to achieve perfect symmetry with rather low trapping stiffness values compared to OT. Moreover, due to their relatively weak forces, CP traps are limited in the size of particles they can confine, which is about 100 μm. In this paper, a new class of counter-propagating optical tweezers with a broken symmetry is discussed and experimentally demonstrated to trap and manipulate larger than 100 μm particles inside liquid media. Our technique exploits a single Gaussian beam folding back on itself in an asymmetrical fashion forming a CP trap capable of confining small and significantly larger particles (up to 250 μm in diameter) based on optical forces only. Such optical trapping of large-size specimens to the best of our knowledge has not been demonstrated before. The broken symmetry of the trap combined with the retro-reflection of the beam has not only significantly simplified the alignment of the system, but also made it robust to slight misalignments and enhanced the trapping stiffness as shown later. Moreover, our proposed trapping method is quite versatile as it allows for trapping and translating of a wide variety of particle sizes and shapes, ranging from one micron up to a few hundred microns including microorganisms, using very low laser powers and numerical aperture optics. This, in turn, permits the integration of a wide range of spectroscopy techniques for imaging and studying the optically trapped specimens. As an example, we will demonstrate how this novel technique enables simultaneous 3D trapping and light-sheet microscopy of C.
elegans worms with up to 450 µm length.

Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data
(Oxford University Press, 2023-03-11) Thippabhotla, Sirisha; Liu, Ben; Podgorny, Adam; Yooseph, Shibu; Yang, Youngik; Zhang, Jun; Zhong, Cuncong
Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation.
iMPP is freely available from https://github.com/Sirisha-t/iMPP.

Prosumer Nanogrids: A Cybersecurity Assessment
(IEEE, 2020-07-15) Dafalla, Yousif; Liu, Bo; Hahn, Dalton A.; Wu, Hongyu; Ahmadi, Reza; Bardas, Alexandru G.
Nanogrids are customer deployments that can generate and inject electricity into the power grid. These deployments are based on behind-the-meter renewable energy resources and are labeled as “prosumer setups”, allowing customers to not only consume electricity, but also produce it. A residential nanogrid is comprised of a physical layer that is a household-scale electric power system, and a cyber layer that is used by manufacturers and/or grid operators to remotely monitor and control the nanogrid. With the increased penetration of renewable energy resources, nanogrids are at the forefront of a paradigm shift in the operational landscape and their correct operation is vital to the electric power grid. In this paper, we perform a cybersecurity assessment of a state-of-the-art residential nanogrid deployment. For this purpose, we deployed a real-world experimental nanogrid setup that is based on photovoltaic (PV) generation. We analyzed the security and the resiliency of this system at both the cyber and physical layers. While we noticed improvements in the cybersecurity measures employed in the current nanogrid compared to previous generations, there are still major concerns. Our experiments show that these concerns range from exploiting well-known protocols, such as Secure Shell (SSH) and Domain Name Service (DNS), to the leakage of confidential information, and major shortcomings in the software updating mechanism.
While the compromise of multiple nanogrids can have a negative effect on the entire power grid, we focus our analysis on individual households and have determined through Simulink-based simulations the economic loss of a compromised deployment.

Colonoscopy polyp detection and classification: Dataset creation and comparative evaluations
(Public Library of Science, 2021-08-17) Li, Kaidong; Fathan, Mohammad I.; Patel, Krushi; Zhang, Tianxiao; Zhong, Cuncong; Bansal, Ajay; Rastogi, Amit; Wang, Jean S.; Wang, Guanghui
Colorectal cancer (CRC) is one of the most common types of cancer with a high mortality rate. Colonoscopy is the preferred procedure for CRC screening and has proven to be effective in reducing CRC mortality. Thus, a reliable computer-aided polyp detection and classification system can significantly increase the effectiveness of colonoscopy. In this paper, we create an endoscopic dataset collected from various sources and annotate the ground truth of polyp location and classification results with the help of experienced gastroenterologists. The dataset can serve as a benchmark platform to train and evaluate the machine learning models for polyp classification. We have also compared the performance of eight state-of-the-art deep learning-based object detection models. The results demonstrate that deep CNN models are promising in CRC screening. This work can serve as a baseline for future research in polyp detection and classification.

pyCatalstReader: Extracting Text and Tokenization of Technical Catalysis Science Papers
(2021-12-08) Castro, Giordanno
Catalysts are an essential and ubiquitous component of our modern life, from empowering our agriculture to reducing toxic emissions. There is a constant need for more and better catalysts. The catalysis research literature is immense, growing, and scattered.
Natural Language Processing (NLP), a sub-field of Machine Learning (ML), offers a potential solution to automatically make full use of all this valuable information and speed innovation. Even though NLP has made much progress in the analysis of everyday text, its application in more technical text has not been as successful. Specifically, there is even a dearth of tools that can appropriately extract text from the PDF files of research articles, which are the most common format used in the catalyst field. Therefore, this project aims to define a tool that can extract text from PDF files of catalysis science articles, which is a prerequisite to applying NLP and ML tools. We also explore the first stage of the NLP pipeline, tokenization, by objectively comparing different tokenizers for catalysis science articles.

Taming WOLF: Building a More Functional and User-Friendly Framework
(2019-06-12) Sader, Casey
Machine learning is all about automation. Many tools have been created to help data scientists automate repeated tasks and train models. These tools require varying levels of user experience to be used effectively. The “machine learning WOrk fLow management Framework” (WOLF) aims to automate the machine-learning pipeline. One of its key uses is to discover which machine-learning model and hyper-parameters are the best configuration for a dataset. In this project, features were explored that could be added to make WOLF behave as a full pipeline in order to be helpful for novice and experienced data scientists alike. One feature to make WOLF more accessible is a website version that can be accessed from anywhere and make using WOLF much more intuitive. To keep WOLF aligned with the most recent trends and models, the ability to train a neural network using the TensorFlow framework and Keras library was added. This project also introduced the ability to pickle and save trained models.
Designing the option for using the models to make predictions within the WOLF framework on another collection of data is a fundamental side-effect of saving the models. Understanding how the model makes predictions is a beneficial component of machine learning. This project aids in that understanding by calculating and reporting the relative importance of the dataset features for the given model. Incorporating all these additions to WOLF makes it a more functional and user-friendly framework for machine learning tasks.

Introduction to Communication Systems: An Interactive Approach Using the Wolfram Language
(University of Kansas Libraries, 2021-07-20) Frost, Victor S.
This ebook provides a unique pedagogical approach to teaching the fundamentals of communication systems using interactive graphics and in-line questions. The material opens with describing the transformation of bits into digital baseband waveforms. Double-sideband suppressed carrier modulation and quadrature modulation then provide the foundation for the discussions of Binary Phase Shift Keying (BPSK), Quadrature Phase Shift Keying (QPSK), M-ary Quadrature Amplitude Modulation (M-QAM), M-ary Phase Shift Keying (MPSK), and the basic theory of Orthogonal Frequency Division Multiplexing (OFDM). Traditional analog modulation systems are also described. Systems trade-offs, including link budgets, are emphasized. Interactive graphics allow the students to engage with and visualize communication systems concepts. Interactivity and in-line review questions enable students to rapidly examine system tradeoffs and design alternatives.
The topics covered build upon each other, culminating with an introduction to the implementation of OFDM transmitters and receivers, the ubiquitous technology used in WiFi, HDFM, 4G and 5G communication systems.

Machine Learning for Aerospace Applications using the Blackbird Dataset
(2021-07-09) McNamee, Patrick
There is currently much interest in using machine learning (ML) models for vision-based object detection and navigation tasks in autonomous vehicles. For unmanned aerial vehicles (UAVs), and particularly small multi-rotor vehicles such as quadcopters, these models are trained on either unpublished data or within simulated environments, which leads to two issues: the inability to reliably reproduce results, and behavioral discrepancies on physical deployments resulting from unmodeled dynamics in the simulation environment. To overcome these issues, this project uses the Blackbird Dataset to explore integration of ML models for UAVs. The Blackbird Dataset is overviewed to illustrate features and issues before investigating possible ML applications. Unsupervised learning models are used to determine flight-test partitions for training supervised deep neural network (DNN) models for nonlinear dynamic inversion. The DNN models are used to determine appropriate model choices over several network parameters including network layer depth, activation functions, epochs for training, and neural network regularization.

Practical and Secure Outsourcing Algorithms of Matrix Operations Based on a Novel Matrix Encryption Method
(Institute of Electrical and Electronics Engineers, 2019-04-26) Zhang, Shengxia; Tian, Chengliang; Zhang, Hanlin; Yu, Jia; Li, Fengjun
With the recent growth and commercialization of cloud computing, outsourcing computation has become one of the most important cloud services, which allows the resource-constrained clients to efficiently perform large-scale computation in a pay-per-use manner.
Meanwhile, outsourcing large-scale computing problems and computationally intensive applications to the cloud has become prevalent in the science and engineering computing community. As important fundamental operations, large-scale matrix multiplication computation (MMC), matrix inversion computation (MIC), and matrix determinant computation (MDC) have been frequently used. In this paper, we present three new algorithms to enable secure, verifiable, and efficient outsourcing of MMC, MIC, and MDC operations to a cloud that may be potentially malicious. The main idea behind our algorithms is a novel matrix encryption/decryption method utilizing consecutive and sparse unimodular matrix transformations. Compared to previous works, this versatile technique can be applied to many matrix operations while achieving a good balance between security and efficiency. First, the proposed algorithms provide robust confidentiality by concealing the local information of the entries in the input matrices. Besides, they also protect the statistical information of the original matrix. Moreover, these algorithms are highly efficient. Our theoretical analysis indicates that the proposed algorithms reduce the time overhead on the client side from O(n^2.3728639) to O(n^2). Finally, the extensive experimental evaluations demonstrate the practical efficiency and effectiveness of our algorithms.

Partial type constructors: Or, making ad hoc datatypes less ad hoc
(Association for Computing Machinery (ACM), 2020-01) Jones, Mark P.; Morris, J. Garrett; Eisenberg, Richard A.
Functional programming languages assume that type constructors are total. Yet functional programmers know better: counterexamples range from container types that make limiting assumptions about their contents (e.g., requiring computable equality or ordering functions) to type families with defining equations only over certain choices of arguments.
We present a language design and formal theory of partial type constructors, capturing the domains of type constructors using qualified types. Our design is both simple and expressive: we support partial datatypes as first-class citizens (including as instances of parametric abstractions, such as the Haskell Functor and Monad classes), and show a simple type elaboration algorithm that avoids placing undue annotation burden on programmers. We show that our type system rejects ill-defined types and can be compiled to a semantic model based on System F. Finally, we have conducted an experimental analysis of a body of Haskell code, using a proof-of-concept implementation of our system; while there are cases where our system requires additional annotations, these cases are rarely encountered in practical Haskell code.

Exceptional asynchronous session types: Session types without tiers
(Association for Computing Machinery (ACM), 2019-01) Fowler, Simon; Lindley, Sam; Morris, J. Garrett; Decova, Sára
Session types statically guarantee that communication complies with a protocol. However, most accounts of session typing do not account for failure, which means they are of limited use in real applications, especially distributed applications, where failure is pervasive. We present the first formal integration of asynchronous session types with exception handling in a functional programming language. We define a core calculus which satisfies preservation and progress properties, is deadlock free, confluent, and terminating. We provide the first implementation of session types with exception handling for a fully-fledged functional programming language, by extending the Links web programming language; our implementation draws on existing work on effect handlers.
We illustrate our approach through a running example of two-factor authentication, and a larger example of a session-based chat application where communication occurs over session-typed channels and disconnections are handled gracefully.

Abstracting extensible data types: Or, rows by any other name
(Association for Computing Machinery (ACM), 2019-01) Morris, J. Garrett; McKinna, James
We present a novel typed language for extensible data types, generalizing and abstracting existing systems of row types and row polymorphism. Extensible data types are a powerful addition to traditional functional programming languages, capturing ideas from OOP-like record extension and polymorphism to modular compositional interpreters. We introduce row theories, a monoidal generalization of row types, giving a general account of record concatenation and projection (dually, variant injection and branching). We realize them via qualified types, abstracting the interpretation of records and variants over different row theories. Our approach naturally types terms untypable in other systems of extensible data types, while maintaining strong metatheoretic properties, such as coherence and principal types. Evidence for type qualifiers has computational content, determining the implementation of record and variant operations; we demonstrate this in giving a modular translation from our calculus, instantiated with various row theories, to polymorphic λ-calculus.

Identification of the Bacterial Biosynthetic Gene Clusters of the Oral Microbiome Illuminates the Unexplored Social Language of Bacteria during Health and Disease
(American Society for Microbiology, 2019-04-16) Aleti, Gajender; Baker, Jonathon L.; Tang, Xiaoyu; Alvarez, Ruth; Dinis, Márcia; Tran, Nini C.; Melnik, Alexey V.; Zhong, Cuncong; Ernst, Madeleine; Dorrestein, Pieter C.; Edlund, Anna
Small molecules are the primary communication media of the microbial world.
Recent bioinformatic studies, exploring the biosynthetic gene clusters (BGCs) which produce many small molecules, have highlighted the incredible biochemical potential of the signaling molecules encoded by the human microbiome. Thus far, most research efforts have focused on understanding the social language of the gut microbiome, leaving crucial signaling molecules produced by oral bacteria and their connection to health versus disease in need of investigation. In this study, a total of 4,915 BGCs were identified across 461 genomes representing a broad taxonomic diversity of oral bacteria. Sequence similarity networking provided a putative product class for more than 100 unclassified novel BGCs. The newly identified BGCs were cross-referenced against 254 metagenomes and metatranscriptomes derived from individuals either with good oral health or with dental caries or periodontitis. This analysis revealed 2,473 BGCs, which were differentially represented across the oral microbiomes associated with health versus disease. Coabundance network analysis identified numerous inverse correlations between BGCs and specific oral taxa. These correlations were present in healthy individuals but greatly reduced in individuals with dental caries, which may suggest a defect in colonization resistance. Finally, corroborating mass spectrometry identified several compounds with homology to products of the predicted BGC classes. 
Together, these findings greatly expand the number of known biosynthetic pathways present in the oral microbiome and provide an atlas for experimental characterization of these abundant, yet poorly understood, molecules and socio-chemical relationships, which impact the development of caries and periodontitis, two of the world’s most common chronic diseases.

Accurate and Efficient Mapping of the Cross-Linked microRNA-mRNA Duplex Reads
(Cell Press, 2019-05-28) Zhong, Cuncong; Zhang, Shaojie
MicroRNA (miRNA) trans-regulates the stability of many mRNAs and controls their expression levels. Reconstruction of the miRNA-mRNA interactome is key to the understanding of the miRNA regulatory network and related biological processes. However, existing miRNA target prediction methods are limited to canonical miRNA-mRNA interactions and have high false prediction rates. Other experimental methods are low throughput and cannot be used to probe genome-wide interactions. To address this challenge, the Cross-linking Ligation and Sequencing of Hybrids (CLASH) technology was developed for high-throughput probing of transcriptome-wide microRNA-mRNA interactions in vivo. The mapping of duplex reads, chimeras of two ultra-short RNA strands, poses computational challenges to current mapping and alignment methods. To address this issue, we developed CLAN (CrossLinked reads ANalysis toolkit).
CLAN generated a comparable mapping of singular reads to other tools, and significantly outperformed them in mapping simulated and real CLASH duplex reads, offering a potential application to other next-generation sequencing-based duplex-read-generating technologies.

Dimension Reduction Using Quantum Wavelet Transform on a High-Performance Reconfigurable Computer
(Hindawi, 2019-11-11) Mahmud, Naveed; El-Araby, Esam
The high resolution of multidimensional space-time measurements and enormity of data readout counts in applications such as particle tracking in high-energy physics (HEP) are nowadays becoming a major challenge. In this work, we propose combining dimension reduction techniques with quantum information processing for application in domains that generate large volumes of data such as HEP. More specifically, we propose using quantum wavelet transform (QWT) to reduce the dimensionality of high spatial resolution data. The quantum wavelet transform takes advantage of the principles of quantum mechanics to achieve reductions in computation time while processing exponentially larger amounts of information. We develop simpler and optimized emulation architectures than what has been previously reported, to perform quantum wavelet transform on high-resolution data. We also implement the inverse quantum wavelet transform (IQWT) to accurately reconstruct the data without any losses. The algorithms are prototyped on an FPGA-based quantum emulator that supports double-precision floating-point computations. Experimental work has been performed using high-resolution image data on a state-of-the-art multinode high-performance reconfigurable computer. The experimental results show that the proposed concepts represent a feasible approach to reducing dimensionality of high spatial resolution data generated by applications such as particle tracking in high-energy physics.
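
Illustrative note on the last entry above: the idea of reducing dimensionality with a wavelet transform and then reconstructing the data with the inverse transform can be sketched classically. The snippet below is a minimal sketch only, assuming a single-level orthonormal Haar wavelet on a one-dimensional signal; it is not the quantum circuit or the FPGA emulator described in the abstract, and the function names `haar_step` and `inverse_haar_step` are hypothetical.

```python
import math

# Assumption: classical, single-level, orthonormal Haar transform.
# This only illustrates the forward/inverse wavelet idea; it does not
# reproduce the quantum wavelet transform from the abstract.

def haar_step(signal):
    """Split an even-length signal into approximation and detail halves."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Scaled pairwise sums form the half-length, reduced representation.
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    # Scaled pairwise differences keep what the sums discard.
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    return approx, detail

def inverse_haar_step(approx, detail):
    """Rebuild the original signal from both halves."""
    scale = 1.0 / math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) * scale)
        signal.append((a - d) * scale)
    return signal

samples = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
approx, detail = haar_step(samples)
restored = inverse_haar_step(approx, detail)
print(len(samples), len(approx))  # the approximation has half the samples
print(all(abs(r - s) < 1e-9 for r, s in zip(restored, samples)))
```

Keeping only `approx` halves the data size at the cost of the detail information; keeping both halves allows reconstruction up to floating-point rounding.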