2015 Symposium Abstracts - Computer Science

CSE-01  Automatic In Vivo Cell Detection In MRI

Authors: Muhammad Afridi; Xiaoming Liu; Erik Shapiro; Arun Ross

Abstract: According to the Department of Health, about 18 people die in the U.S. every day while waiting for an organ transplant, and this situation is worsening every year. Scientists in regenerative medicine have therefore proposed cell-transplant-based therapies as a promising alternative to organ transplant. However, their success in humans has not been fully proven and requires a comprehensive in vivo analysis of the transplanted cells in MRI. Unfortunately, to date, such analysis is conducted manually and is extremely tedious, especially in the clinical arena. Previous research has attempted to automate cell analysis for microscopic images, which does not address the challenges of real in vivo cell detection in MRI. Therefore, this paper proposes a novel computer vision-based learning approach that creates superpixel-based 3D models for candidate cell spots in MRI, extracts a novel set of superfern features, and utilizes a partition-based Bayesian classifier ensemble to distinguish cell spots from non-spots. Unlike traditional ferns, which utilize pixel-based differences, superferns exploit superpixel averages in computing difference-based features despite the absence of any order in the superpixel arrangement. To evaluate the proposed approach, we develop the first labeled database of its kind, with more than 16,000 labels on five in vivo and four in vitro MRI scans. Experimental results show the superiority of our approach over the two most relevant baselines. To the best of our knowledge, this is the first study to utilize a learning-based methodology for in vivo cell detection in MRI.


CSE-02  On The Measurement Of Route Based Traffic Shaping At Scale

Authors: Faraz Ahmed; M. Zubair Shafiq; Amir Khakpour; Alex X. Liu

Abstract: Internet traffic consists of a wide variety of applications delivered across heterogeneous IP networks. Internet service providers (ISPs) manipulate network traffic for quality of service (QoS) management and bandwidth regulation, a practice called traffic shaping. Content providers and content delivery networks (CDNs) are interested in knowing whether their traffic is adversely affected by traffic shaping. We design and implement RTmon, a tool that improves network transparency by enabling content providers and CDNs to detect whether their traffic is subject to route-based traffic shaping, which occurs when traffic between the same origin-destination pair is treated differently across different paths. RTmon is implemented as client-side JavaScript, which simultaneously downloads a pixel tag from a multi-homed server, at a nearby IXP, via different transit providers. A significant difference in object download time indicates potential traffic shaping on one of the paths. RTmon was deployed by a large commercial CDN across thousands of websites over the duration of one year. Our measurement experience reveals that route-based traffic shaping is more prevalent in wireless networks than in wired networks, and more prevalent in rural areas than in urban areas. Traffic shaping is more likely to occur at access ISPs or their interconnections. Moreover, traffic shaping in regions with three or more competing ISPs is 20% less than in regions with only one or two competing ISPs.
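
The path-comparison idea above can be sketched in a few lines. This is an illustrative toy, not RTmon's actual code: the median-ratio test, the threshold, and the function names are our assumptions.

```python
# Sketch of RTmon's core detection idea (illustrative only): compare paired
# download times of the same pixel tag fetched via two transit providers and
# flag potential route-based shaping when one path is consistently slower.
# The median-ratio rule and the 1.5x threshold are assumptions, not RTmon's.

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def shaping_suspected(times_path_a, times_path_b, ratio_threshold=1.5):
    """True if the median download time on one path exceeds the other's
    by more than ratio_threshold, suggesting differential treatment."""
    ma, mb = median(times_path_a), median(times_path_b)
    slower, faster = max(ma, mb), min(ma, mb)
    return slower / faster > ratio_threshold

# Example: path B is consistently about twice as slow as path A.
path_a = [0.10, 0.11, 0.09, 0.10, 0.12]   # download times in seconds
path_b = [0.21, 0.20, 0.22, 0.19, 0.23]
print(shaping_suspected(path_a, path_b))  # True for these measurements
```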


CSE-03  Blueear: A Low-Cost Eavesdropping System For Sniffing Bluetooth Traffic

Authors: Wahhab Albazrqaoe; Guoliang Xing

Abstract: As the de facto wireless technology for personal area networking, Bluetooth is becoming increasingly popular due to the proliferation of wearable devices. At the physical layer, Bluetooth employs adaptive, pseudorandom frequency hopping over 79 channels. The hopping sequence is a secret shared by the communicating parties and is frequently modified based on interference conditions. This design provides Bluetooth with a basic mechanism to improve data confidentiality. Although specialized wideband packet sniffers exist for eavesdropping on Bluetooth traffic, they are power hungry and extremely expensive. In this work, we describe the design and implementation of BlueEar, a low-cost eavesdropping system for sniffing Bluetooth traffic. BlueEar employs three techniques to achieve this goal. First, by analyzing packet statistics passively collected on a single channel, BlueEar reverses the basic hopping sequence of a given target. Second, based on spectrum sensing, BlueEar predicts the adaptive hopping behavior of the target in the presence of interference. Third, BlueEar employs smart channel jamming to control the frequency hopping of the target, which significantly improves the accuracy of packet capture. BlueEar is implemented and extensively evaluated on Ubertooth, an inexpensive wireless development platform. We also discuss the design of a standard-compliant countermeasure against BlueEar-based eavesdropping. As an efficient tool for packet sniffing, BlueEar can also be used for analyzing, troubleshooting, and debugging Bluetooth-based protocols and applications.


CSE-05  Demystifying And Exploiting Bit Fate In 802.11 Network

Authors: Alireza Ameli; Jun Huang; Abdol-Hossein Esfahanian; Guoliang Xing

Abstract: Wireless communication today suffers from high transmission error rates, particularly at high data rates. Channel conditions were long believed to be chiefly responsible for these errors. However, recent research has confirmed that there are patterns in bit error frequency that are not caused by channel conditions. These patterns can be exploited to improve WLAN throughput. In particular, they can be used in forward error correction and channel coding, as well as in applications such as video streaming. In this paper, we first validate the existence of such patterns in different environments and across different devices. We also demonstrate that the error patterns depend only on the transmission rate. After hypothesizing their cause, we give a statistical model governing such patterns. Finally, we apply our findings to a video streaming scenario and demonstrate a noticeable improvement in throughput.


CSE-06  Crowd Powered Latent Fingerprint Identification: Fusing AFIS With Examiner Markups

Authors: Sunpreet Arora; Kai Cao; Anil Jain; Gregoire Michaud

Abstract: Automatic matching of poor-quality latent fingerprints to rolled/slap fingerprints using an Automated Fingerprint Identification System (AFIS) is still far from satisfactory. Therefore, it is common practice to have a latent examiner mark features on a latent to improve the hit rate of the AFIS. We propose a synergistic crowd-powered latent identification framework where multiple latent examiners and the AFIS work in conjunction with each other to boost the identification accuracy of the AFIS. Given a latent, the candidate list output by the AFIS is used to determine the likelihood that a hit was found at rank 1. A latent for which this likelihood is low is crowdsourced to a pool of latent examiners for feature markup. The manual markups are then input to the AFIS to increase the likelihood of making a hit in the reference database. Experimental results show that the fusion of an AFIS with examiner markups improves the rank-1 identification accuracy of the AFIS by 7.75% (using six markups) on the 500 ppi NIST SD27 database, by 11.37% (using two markups) on the 1000 ppi ELFT-EFS public challenge database, and by 2.5% (using a single markup) on the 1000 ppi RS&A database, against 250,000 rolled prints in the reference database.
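
The triage step above — deciding from the AFIS candidate list whether to crowdsource a latent — can be sketched as follows. The abstract does not specify the likelihood model; using the normalized margin between the rank-1 and rank-2 scores as a proxy is our assumption for illustration.

```python
# Illustrative sketch of the triage step in the crowd-powered framework:
# estimate how likely the AFIS rank-1 candidate is a true hit and, if that
# likelihood is low, route the latent to examiners for feature markup.
# The margin-based proxy and the 0.2 threshold are assumptions.

def rank1_hit_likelihood(candidate_scores):
    """candidate_scores: AFIS match scores sorted in descending order.
    Returns the relative margin of the top candidate over the runner-up."""
    if len(candidate_scores) < 2 or candidate_scores[0] == 0:
        return 0.0
    return (candidate_scores[0] - candidate_scores[1]) / candidate_scores[0]

def needs_markup(candidate_scores, threshold=0.2):
    """Crowdsource the latent when the rank-1 hit likelihood is low."""
    return rank1_hit_likelihood(candidate_scores) < threshold

confident = [950, 400, 390, 385]   # rank-1 far ahead: keep automatic result
ambiguous = [510, 500, 495, 490]   # rank-1 barely ahead: ask for markup
print(needs_markup(confident), needs_markup(ambiguous))  # False True
```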


CSE-07  A Longitudinal Study Of Automatic Face Recognition

Authors: Lacey Best-Rowden; Anil K. Jain

Abstract: With the deployment of automatic face recognition systems for many large-scale applications, it is crucial that we gain a thorough understanding of how facial aging affects the recognition performance, particularly across a large population. Because aging of the human face is a complex process involving genetic and environmental factors, some faces “age well” while the appearance of others changes drastically over time. This heterogeneity (inter-subject variability) suggests the need for a subject-specific analysis of the effect of facial aging on the performance of face recognition systems. We conduct such an analysis using a longitudinal (aging) database of 147,784 operational mug shots of 18,007 repeat criminal offenders, where each subject has at least five face images acquired over a minimum of five years. By fitting multilevel statistical models to genuine similarity scores from two commercial-off-the-shelf (COTS) matchers, we quantify (i) the population average rate of change in genuine scores with respect to elapsed time between two face image acquisitions, and (ii) how closely the subject-specific rates of change follow the population average. Longitudinal analysis of the scores from the more accurate COTS matcher shows that despite a decreasing average rate of change in genuine scores over time, the average subject can still be correctly verified at a false accept rate (FAR) of 0.01% up to 16 years elapsed time (the maximum in our database). We also investigate (i) the effects of several other covariates (gender, race, face quality), and (ii) the probability of true acceptance over time.

This work was supported in part by Center for Identification Technology Research (CITeR)
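
The longitudinal idea — a population-average rate of change in genuine scores, around which subject-specific rates vary — can be illustrated with a toy simulation. The real study fits multilevel (mixed-effects) models; simple per-subject least-squares slopes are used here only to convey the concept, and all numbers below are synthetic.

```python
# Toy illustration of the longitudinal analysis: each subject's genuine
# scores decline with elapsed time at a subject-specific rate; fitting a
# slope per subject and averaging approximates the population rate of
# change. Synthetic data only; the paper uses multilevel models.
import random

random.seed(7)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Simulate 50 subjects: population slope of -0.02 score units per year,
# a subject-specific deviation, plus measurement noise per acquisition.
pop_slope = -0.02
subject_slopes = []
for _ in range(50):
    true_slope = pop_slope + random.gauss(0, 0.005)
    years = [0, 2, 5, 8, 12, 16]           # elapsed times of acquisitions
    scores = [0.9 + true_slope * t + random.gauss(0, 0.01) for t in years]
    subject_slopes.append(ols_slope(years, scores))

avg_slope = sum(subject_slopes) / len(subject_slopes)
print(round(avg_slope, 3))   # recovers a value near the population rate
```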


CSE-08  Homelog: A Smartphone System For Unobtrusive Family Routine Monitoring

Authors: Chongguang Bi; Tian Hao; Guoliang Xing; Jina Huh; Wei Peng; Mengyan Ma

Abstract: Research has shown that family routine plays a critical role in establishing good relationships among family members and maintaining their physical and mental health. In particular, regularly eating dinner as a family and limiting screen-viewing time significantly reduce the prevalence of obesity. Fine-grained activity logging and analysis can enable a family to track their daily routine and modify their lifestyles for improved wellness. This paper presents HomeLog -- a practical system to log family routine using off-the-shelf smartphones. HomeLog automatically detects and logs details of several important family routine activities, including family dining, TV viewing, and conversation, in an unobtrusive manner. By providing a detailed family routine history, HomeLog empowers family members to actively engage in preventing child obesity and to make positive changes to improve family wellness. Based on acoustic data collected from real families, we carefully choose robust yet lightweight acoustic features for various family activities. HomeLog keeps track of the ambient noise characteristics and adapts its learning algorithms in response to the dynamics of the environment. Our extensive experiments involving 5 families with children (4,620 minutes of recording in total) show that HomeLog can detect family routine activities with average precision of 88.8% and recall of 90.2% across different families and home environments.


CSE-09  Can Sex Be Deduced From An Iris Image?

Authors: Denton Bobeldyk; Arun Ross

Abstract: The human iris is the colored annular portion of the eye that regulates the amount of light entering the eye. In the context of biometrics, the rich textural pattern observed in near infrared iris images has been used for recognizing individuals. In this work we explore the possibility of deducing the sex of an individual, i.e., male or female, from the iris image. First, we conduct a literature review to determine if sex-related factors can impact the morphology and the texture of the iris. Next, we devise automated schemes based on state-of-the-art computer vision and pattern recognition techniques to classify an iris image as “male” or “female”. This poster will present our current research and the progress made in predicting the genetic sex from an iris image.


CSE-10  An Efficient Structural Diversity Technique For Genetic Programming

Authors: Armand Burks; William Punch

Abstract: Genetic diversity plays an important role in avoiding premature convergence, which is a phenomenon that stifles the search effectiveness of evolutionary algorithms. Much research effort has focused on maintaining genetic diversity. However, techniques that avoid premature convergence often do so by sacrificing efficiency, requiring more fitness evaluations to discover high quality solutions. We introduce a simple and efficient genetic diversity technique that is capable of avoiding premature convergence while maintaining a high level of search quality in tree-based genetic programming. Our method finds solutions to a set of benchmark problems in significantly fewer fitness evaluations than the algorithms that we compared against.

This work was supported in part by National Science Foundation under Cooperative Agreement No. DBI-0939454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
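
One simple way to realize the structural-diversity idea described above is to admit only offspring whose tree shape has not been seen before. This sketch illustrates that general idea, not the authors' exact method; the representation (label, children-list tuples) is our assumption.

```python
# Minimal sketch of enforcing structural diversity in tree-based GP:
# keep a set of structure signatures seen so far and admit only offspring
# whose tree *shape* is new, avoiding convergence to one structure without
# spending extra fitness evaluations. Illustrative, not the paper's method.

def structure_signature(tree):
    """Signature of tree shape only: nested tuples of child arities.
    A tree is (label, [children...]); labels are deliberately ignored."""
    _, children = tree
    return tuple(structure_signature(c) for c in children)

class DiversityFilter:
    def __init__(self):
        self.seen = set()

    def admit(self, tree):
        sig = structure_signature(tree)
        if sig in self.seen:
            return False        # structurally duplicate: reject
        self.seen.add(sig)
        return True

f = DiversityFilter()
t1 = ("add", [("x", []), ("y", [])])
t2 = ("mul", [("a", []), ("b", [])])   # same shape as t1, new labels
t3 = ("neg", [("x", [])])              # genuinely new shape
print(f.admit(t1), f.admit(t2), f.admit(t3))  # True False True
```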


CSE-11  Towards Bilingual Language Acquisition: A Developmental Network Approach

Authors: Juan L. Castro-Garcia; Isaac E. Dorgbefu; Juyang Weng

Abstract: Based on only a finite set of words and symbols, a natural language consists of unbounded combinations of sentences that express concepts, objects, events, structures (e.g., as in mathematics), and emotions. How can one form unbounded combinations when exposed to only a small set of symbols in a lifetime? Scientists study how a child is able to distinguish the speech patterns of complex languages like Russian, Chinese, or both, and how children learn to produce such combinations while being exposed to only a small set of words. Current probabilistic methods like Hidden Markov Models (HMMs) and Partially Observable Markov Decision Processes (POMDPs) are based on finite-state machines. However, finite-state machines are symbolic models that are static and do not emerge automatically from experience. We show how a developmental network serves as a general-purpose model capable of learning any natural language by sensing patterns of receptors and actions from the real world. It dynamically and incrementally updates an emergent finite-state machine. When such a finite-state machine works with the real world, the entire machinery becomes a Turing Machine, whose capability is well known. Because it does not require a static model, we demonstrate how such a developmental network is able to learn two or more languages interactively in a bilingual environment.


CSE-12  Efficient Kernel Clustering Of Data Sets With Large Number Of Clusters

Authors: Radha Chitta; Anil K. Jain

Abstract: Recent advances in data generation, collection, and storage technologies have resulted in a digital data explosion. Analysis of this massive amount of data necessitates techniques that can cater to its large volume, high dimensionality, and heterogeneity. Clustering is one of the principal tools for efficiently organizing massive amounts of data and enabling convenient access. Several algorithms have been developed in the literature to efficiently cluster large high-dimensional data sets. These algorithms fall into two groups: linear and kernel-based. Linear clustering algorithms assume that the data is linearly separable in the input space and use Euclidean distance to define inter-point similarity. Kernel-based clustering techniques use nonlinear distance measures, enabling them to capture the true structure of real-world data sets and perform better than linear clustering algorithms. Though many of these algorithms are efficient with respect to the volume and dimensionality of the data, they take a long time to cluster data sets containing a large number of clusters. Such data sets are often encountered in applications such as document clustering, image categorization, and social network analysis. We propose an approximate clustering algorithm that uses random sampling and approximate nearest neighbor computation to reduce the running-time complexity of kernel-based clustering. The proposed algorithm has complexity linear in the number of data points and logarithmic in the number of clusters, and can cluster data sets containing millions of data points into thousands of clusters in only a few hours.
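
The random-sampling idea can be sketched as follows: instead of the full n x n kernel matrix, compute kernel similarities only against a small sample of "landmark" points and cluster in that reduced representation. This is an illustration of the general sampling strategy under our own assumptions (RBF kernel, farthest-point initialization, plain Lloyd iterations), not the paper's algorithm.

```python
# Sketch of sampling-based approximate kernel clustering: map each point to
# its kernel similarities with m randomly sampled landmarks (m << n), then
# run ordinary k-means in that m-dimensional space. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def approx_kernel_kmeans(X, k, n_landmarks=20, n_iter=25):
    landmarks = X[rng.choice(len(X), size=n_landmarks, replace=False)]
    F = rbf_kernel(X, landmarks)            # n x m landmark feature map
    # Farthest-point initialization of the k centers.
    centers = [F[0]]
    while len(centers) < k:
        d = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):                 # standard Lloyd iterations
        d = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = F[labels == j].mean(axis=0)
    return labels

# Two well-separated Gaussian blobs should land in different clusters.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(4.0, 0.3, (50, 2))])
labels = approx_kernel_kmeans(X, k=2)
print(sorted(set(labels[:50].tolist())), sorted(set(labels[50:].tolist())))
```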


CSE-13  Human Recognition: Combining Biometrics With Demographics Using Generalized Additive Models

Authors: Yaohui Ding; Mark Culp; Arun Ross

Abstract: The automatic estimation of demographic attributes (such as age, gender, and ethnicity) from conventional biometric traits (fingerprint, face, iris, etc.) has gained considerable attention recently. While demographic attributes lack the distinctiveness to provide reliable verification or identification by themselves, they can be combined with biometric traits to improve human recognition accuracy. Current approaches to integrating demographics have three major drawbacks: they i) require different operating thresholds for different demographic attribute values, ii) are tightly bound to specific biometric matching algorithms, and/or iii) suffer from low tolerance to incorrect demographic labels. We address these drawbacks by formulating the fusion problem as a general optimization function with respect to the recognition accuracy. The optimization can be solved based upon the Generalized Additive Model (GAM) with several constraints. We show the advantages of the proposed GAM scheme by conducting experiments on the MORPH and WVU multimodal databases. These experiments show that the proposed scheme increases the overall recognition accuracy and is more robust to the mislabeling problem.

This work was supported in part by National Science Foundation


CSE-14  Embodied Collaborative Referring Expression Generation In Situated Human-Robot Interaction

Authors: Rui Fang; Malcolm Doering; Joyce Y. Chai

Abstract: To facilitate referential communication between humans and robots and mediate their differences in representing the shared environment, we are exploring embodied collaborative models for referring expression generation (REG). Instead of a single minimum description to describe a target object, episodes of expressions are generated based on human feedback during human-robot interaction. We particularly investigate the role of embodiment such as robot gesture behaviors (i.e., pointing to an object) and human's gaze feedback (i.e., looking at a particular object) in the collaborative process. This paper examines different strategies of incorporating embodiment and collaboration in REG and discusses their possibilities and challenges in enabling human-robot referential communication.

This work was supported in part by IIS-1208390 from the National Science Foundation and N00014-11-1-0410 from the Office of Naval Research


CSE-15  Spatial Reward Heterogeneity Promotes Useful Diversity In Evolutionary Computation

Authors: Emily Dolson; Charles Ofria

Abstract: A central problem in evolutionary computation is ensuring that adequate diversity is maintained within a population of candidate solutions. If a population is not sufficiently diverse, evolution will stagnate. Many strategies for maintaining diversity suffer from a tendency to preserve solutions that are not useful to solving the problem at hand purely because they are unique. In biological evolution, spatial resource heterogeneity is believed to be a driver of biodiversity. This concept can be mapped to evolutionary computation by rewarding different sub-problems in different spatial locations. Here, we investigate the impact of spatial resource heterogeneity on phenotypic diversity and the evolution of a complex logic task. We studied a range of environments composed of patches in which eight different simpler logic tasks were rewarded. For each, we explored the relationship between spatial heterogeneity, phenotypic diversity, and the probability of the complex task evolving. Spatial entropy and phenotypic diversity were strongly correlated, a relationship that was consistent over various spatial configurations. Diversity also improved evolutionary potential, but had a much smaller impact than other components of environmental composition. The most important of these components proved to be the average number of sub-problems rewarded in cells across the environment, likely owing to the importance of building blocks for the evolution of complex features. These results suggest that spatial reward heterogeneity can be used to preferentially maintain diversity that is likely to be useful to solving a specific computational problem.

This work was supported in part by National Science Foundation Graduate Research Fellowship; BEACON Center for Evolution in Action (an NSF Science and Technology Center)
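
Two of the environment statistics discussed above can be made concrete: Shannon entropy over the composition of reward patches, and the mean number of rewarded sub-problems per cell (the component the results identify as most important). The grid encoding below is our own formulation of these standard definitions, not code from the study.

```python
# Sketch of two environment statistics from the study: Shannon entropy of
# patch-type frequencies, and the average number of rewarded sub-problems
# per cell. Each grid cell holds the frozenset of logic tasks it rewards.
import math
from collections import Counter

def spatial_entropy(grid):
    """Shannon entropy (bits) of patch-type frequencies in the grid."""
    counts = Counter(cell for row in grid for cell in row)
    n = sum(counts.values())
    return 0.0 - sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_tasks_per_cell(grid):
    cells = [cell for row in grid for cell in row]
    return sum(len(cell) for cell in cells) / len(cells)

homogeneous = [[frozenset({"NAND"})] * 4 for _ in range(4)]
patchy = [[frozenset({"NAND"}), frozenset({"AND"}),
           frozenset({"OR"}), frozenset({"NAND", "AND"})]] * 4

print(spatial_entropy(homogeneous), spatial_entropy(patchy))  # 0.0 2.0
print(mean_tasks_per_cell(patchy))                            # 1.25
```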


CSE-16  Association Mapping In The Presence Of Complex Sample Structure

Authors: Hussein Hejase; Kevin Liu

Abstract: A fundamental goal of association mapping is to identify the underlying genetic causes of diseases, which aids in developing prevention techniques and therapeutic strategies. Association mapping examines the relationship between genetic features and class labels (e.g., boolean disease status) to pinpoint statistical associations that uncover the underlying genetic architecture of a trait of interest. Recently, association mapping has been applied to admixed populations (e.g., African-Americans) to study the genetic architecture behind complex traits; such populations introduce a complex sample structure due to the non-tree-like evolutionary history resulting from admixture. This complex sample structure can confound the association analysis and generate spurious results if not accounted for. In this work, we conduct a performance study showing that current state-of-the-art association mapping methods fall short (a ~40% decrease in prediction accuracy) in the presence of a complex sample structure resulting from gene flow due to genetic admixture. Additionally, we provide a new computational pipeline that accounts for this complexity by bringing an evolutionary and genetic perspective to association mapping. The pipeline reconstructs local genealogical histories by identifying the breakpoints separating the different gene trees across the genome, and then applies the association mapping method between breakpoints to account for the different local genealogical histories across the genome. We show a ~10% increase in prediction accuracy and an ~8% decrease in false positives using the new pipeline compared to the standard state-of-the-art association mapping method.


CSE-17  Towards Online Auction With Dynamic Bidders And Supplies

Authors: Chowdhury Hyder; Thomas Jeitschko; Li Xiao

Abstract: In spectrum trading, secondary users bid for the spectrum made available by primary users. Auction theory has been widely used to improve spectrum allocation in such spectrum trading scenarios. In contrast to reality, however, most research assumes a static user population, a static spectrum supply, or both. In this work, we investigate a realistic dynamic auction environment where secondary users with diverse delay bounds arrive dynamically and spectrum becomes available at random. We propose a price-ranking-based online auction mechanism that discourages bidders from lying about their bids and delay bounds. We prove that the proposed auction mechanism is truthful and individually rational, and we demonstrate its effectiveness through extensive simulations.


CSE-18  Attribute Preserved Face De-Identification

Authors: Amin Jourabloo; Xi Yin; Xiaoming Liu

Abstract: In this paper, we recognize the need to de-identify a face image while preserving a large set of facial attributes, a problem that has not been explicitly studied in prior work. We verify the underlying assumption that different visual features are used for identification and attribute classification. As a result, the proposed approach jointly models face de-identification and attribute preservation in a unified optimization framework. Specifically, a face image is represented by the shape and appearance parameters of an AAM. Motivated by k-Same, we select the k images that share the most similar attributes with those of a test image. Instead of using the average of the k images, as adopted by k-Same methods, we formulate an objective function and use gradient descent to learn the optimal weights for fusing the k images. Experimental results on 200 images show that our proposed approach performs substantially better than the baseline method, achieving a lower face recognition rate while preserving more facial attributes.
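
The weighted-fusion step can be sketched on toy vectors. The objective below (push the fused vector away from the original identity while staying near an attribute target) and the clip-and-renormalize projection are our assumptions for illustration; the paper's actual objective operates on AAM shape and appearance parameters and is not reproduced here.

```python
# Hedged sketch of learning convex fusion weights by gradient descent, in
# the spirit of the k-Same-motivated approach: start from the plain k-image
# average and shift weight away from images resembling the original
# identity, while staying close to a shared-attribute target. All vectors
# and the loss form are toy assumptions.
import numpy as np

def fuse_weights(imgs, identity, attr_target, lam=0.5, lr=0.1, iters=100):
    """imgs: k x d matrix of the k most attribute-similar images."""
    k = len(imgs)
    w = np.full(k, 1.0 / k)                   # start from the k-Same average
    def loss(w):
        fused = w @ imgs
        return fused @ identity + lam * ((fused - attr_target) ** 2).sum()
    for _ in range(iters):
        fused = w @ imgs
        grad = imgs @ identity + 2 * lam * imgs @ (fused - attr_target)
        w = np.clip(w - lr * grad, 0.0, None)  # projected gradient step
        w = w / w.sum()                        # keep weights convex
    return w, loss(w)

identity = np.array([1.0, 0.0, 0.0, 0.0])      # original identity direction
imgs = np.array([[0.9, 0.1, 0.0, 0.0],
                 [0.2, 0.8, 0.0, 0.0],
                 [0.1, 0.0, 0.9, 0.0]])
attr_target = imgs.mean(axis=0)                # shared-attribute proxy

initial_loss = (imgs.mean(axis=0) @ identity
                + 0.5 * ((imgs.mean(axis=0) - attr_target) ** 2).sum())
w, final_loss = fuse_weights(imgs, identity, attr_target)
print(w.round(2), final_loss < initial_loss)
```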


CSE-19  A Wireless Sensor Network Within An Aquatic Environment

Authors: Tam Le; Matt Mutka

Abstract: Wireless sensor networks have been widely used in many environmental monitoring applications. For aquatic environments, however, deployment is quite expensive, since the sensors must be anchored to prevent them from floating away and losing communication. We propose an inexpensive and flexible approach to environmental monitoring in aquatic environments. In our approach, a special mobile sensor robot acts as a mobile base station and travels the water area to collect data from sensors as well as from locations that cannot be covered by sensors. The sensors in the water have a jumping capability that extends their communication range in comparison to sensors that merely float upon the water. In addition, by leveraging the jumping capability, the sensors can collaborate with one another to exchange data and communicate with the robot, so that the robot can compute an efficient path to travel. The problems we are studying are: 1) given a set of points to visit, how to find the robot’s optimal path, with the support of the sensors, to cover the remaining points; and 2) how to design an efficient jumping strategy and communication protocol between sensors.


CSE-20  Model-Reduced Variational Fluid Simulation

Authors: Beibei Liu; Gemma Mason; Julian Hodgson; Yiying Tong; Mathieu Desbrun

Abstract: We present a model-reduced variational Eulerian integrator for incompressible fluids, which combines the efficiency gains of dimension reduction, the qualitative robustness to coarse spatial and temporal resolutions of geometric integrators, and the simplicity of sub-grid-accurate boundary conditions on regular grids to deal with arbitrarily-shaped domains. In the development of this variational method, we show how to use the link between a Lie group and its algebra to move from functional maps to associated vector fields, with the help of a non-holonomic constraint. Scalar- and vector-valued eigenfunctions of the Laplacian operator are used as bases for model reduction. We present examples (including buoyancy, magnetohydrodynamics, and turbulence models) in 2D, 3D, and curved domains. The resulting method is versatile, energy-preserving, and computationally efficient.

This work was supported in part by NSF IIS-0953096 and CMMI-1250261


CSE-21  Efficient Broadcast On Fragmented Spectrum In Cognitive Radio Networks

Authors: Pei Huang; Chin-Jung Liu; Xi Yang; Li Xiao

Abstract: To improve spectrum utilization, cognitive radio (CR) is introduced to detect and exploit available spectrum resources autonomously. The flexible spectrum use imposes special challenges on broadcast because different CR devices may detect different available spectrum fragments at different locations. The sender and the receivers have to agree on spectrum fragments that will be used for broadcast. There may not exist a common spectrum fragment that is available to all receivers. Most existing work assumes that a device works only in a single channel and thus the sender has to broadcast multiple times in different channels to reach all receivers. The broadcast problem is studied as a channel rendezvous and minimum latency scheduling problem. Recent spectrum-agile designs have enabled a device to utilize partially occupied spectrum. We thus view a wideband channel as an aggregation of multiple narrow channels that can be evaluated independently. A Spectrum Fragment Agile Broadcast (SFAB) scheme is introduced in this paper to support efficient broadcast on fragmented spectrum. It aims at achieving spectrum agreement between the transmitter and the receivers efficiently and maximizing the channel width used for broadcast regardless of the spectrum availability differences at receivers. We validate the effectiveness of SFAB through implementation on the GNU Radio / USRP platform and use ns-2 simulations to evaluate the performance in large deployments.
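
The spectrum-agreement goal can be illustrated with a simplified model: view the wideband channel as narrow sub-channels, intersect the availability bitmaps of the sender and all receivers, and pick the widest contiguous fragment they share. This is our simplification for illustration; SFAB itself also handles the case where no single fragment reaches every receiver, which is omitted here.

```python
# Simplified sketch of the spectrum-agreement goal in SFAB: find the widest
# contiguous run of narrow sub-channels that is free at the sender and at
# every receiver. Illustrative formulation, not the SFAB protocol itself.

def widest_common_fragment(availability_maps):
    """availability_maps: equal-length lists of 0/1 flags, one per device
    (sender first). Returns (start, width) of the widest contiguous run of
    sub-channels free at every device, or (None, 0) if none exists."""
    common = [all(m[i] for m in availability_maps)
              for i in range(len(availability_maps[0]))]
    best_start, best_width, start = None, 0, None
    for i, free in enumerate(common + [False]):   # sentinel ends last run
        if free and start is None:
            start = i
        elif not free and start is not None:
            if i - start > best_width:
                best_start, best_width = start, i - start
            start = None
    return best_start, best_width

sender    = [1, 1, 1, 0, 1, 1, 1, 1]
receiver1 = [1, 1, 0, 0, 1, 1, 1, 0]
receiver2 = [0, 1, 1, 1, 1, 1, 1, 1]
print(widest_common_fragment([sender, receiver1, receiver2]))  # (4, 3)
```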


CSE-22  Hierarchical Learning For Automated Categorization Of Mobile Apps

Authors: Xi Liu; Han Song; Mario Baldi; Pang-Ning Tan

Abstract: Recent years have witnessed the proliferation of smartphones, tablets, and other mobile devices along with the widespread use of their associated mobile apps, which are either pre-installed on the devices or can be downloaded from online app stores such as Apple iTunes and Google Play. As the number of new mobile apps developed continues to grow at a rapid pace, automatic classification of the apps has become an increasingly important problem to facilitate searching and recommendation. This paper presents a flexible, discriminative learning framework to automatically classify new apps based on the textual information provided by the app developers. The framework employs a semi-supervised matrix factorization approach to learn the dependencies between the textual features and the app categories. Preliminary results suggest that the accuracy of the proposed framework is comparable to L1-regularized logistic regression. However, the proposed framework has an advantage in that it can be trained to learn a taxonomy structure of the classes, with or without side information provided by existing class hierarchies such as the Google AdTree.


CSE-23  Improving Performance Of Online Services In Data Centers

Authors: Ali Munir; Ghufran Baig; Syed Irteza; Ihsan Qazi; Alex Liu; Fahad Dogar

Abstract: Data centers are now used as critical infrastructure for high-revenue online services such as web search, social networking, advertisement systems, and recommendation systems. In provisioning such large-scale online applications, data centers face extreme challenges in providing the desired user experience. These data center applications have very demanding latency requirements, and even a small fraction of a second can make a quantifiable difference in user experience, thus impacting revenue. For example, Google observed a 20% traffic reduction from an extra 500 ms of latency (introduced inadvertently), and Amazon found that every additional 100 ms of latency costs them a 1% loss in business revenue. With this challenge in view, many transport layer protocols such as DCTCP, L2DCT, PDQ, and pFabric have been proposed. Among these, pFabric and PDQ provide near-optimal performance but require major changes to switch software and hardware, and thus are not deployable. We propose PASE, a practical data center transport protocol that targets improvement in application throughput. PASE is deployment friendly: it does not require any changes to the network fabric; yet its performance is comparable to, or better than, state-of-the-art protocols that require changes to network elements (e.g., PDQ, pFabric). Using simulations and real testbed experiments, we show that, for typical data center traffic patterns and over a wide range of traffic loads, PASE improves application performance by up to 40%-60% over L2DCT, up to 70% over DCTCP in various scenarios, and up to 30% over pFabric in some scenarios.


CSE-24  Efficient Instantiation Of Conceptual Data Models Using Model Transformations

Authors: Matthew Nizol; Laura K. Dillon; R.E.K. Stirewalt

Abstract: Conceptual data models are critical to the specification of complex enterprise software. Instantiating these models with test data that satisfy the model’s constraints facilitates validation: domain experts can review test data to check for invalid or missing constraints, and developers can use generated test data to exercise application logic. Recent research has investigated the generation of test data from conceptual models expressed in the ORM language. ORM provides a rich set of constraints to capture the business rules of a domain; however, instantiating an ORM model with unrestricted constraints is NP-hard. Smaragdakis et al. identified a subset of ORM called ORM− whose models can be instantiated in polynomial time. Unfortunately, ORM− does not include many constraints commonly used in practice. Our research adds support for some of these constraints by using model transformations in conjunction with an augmented version of the ORM− instantiation algorithm. With our extensions, modellers can use a richer set of constraints without sacrificing efficient instantiation, thereby producing models that more accurately capture domain semantics while supporting validation.

This work was supported in part by a National Science Foundation Graduate Research Fellowship


CSE-25  An Efficient Approach For Clustering Face Images

Authors: Charles Otto; Brendan Klare; Anil Jain

Abstract: Investigations that require the exploitation of large volumes of face imagery are increasingly common in current forensic scenarios (e.g., the Boston Marathon bombing), but effective solutions for triaging such imagery (i.e., into low importance, moderate importance, and critical interest) are not available in the literature. General issues for investigators in these scenarios are a lack of systems that can scale to volumes of over 100K images, and a lack of established methods for clustering the face images into the unknown number of persons of interest they contain. As such, we explore best practices for clustering large sets of face images (up to 1 million) into large numbers of clusters (approximately 200 thousand) as a method of reducing the volume of data to be investigated by forensic analysts. Our analysis involves a performance comparison of several clustering algorithms in terms of accuracy in grouping face images by identity, run-time, and efficiency in representing large face datasets as compact clusters. For two different face datasets, a mugshot database (PCSO) and the well-known unconstrained dataset LFW, we find the rank-order clustering method to be effective in clustering accuracy and relatively efficient in terms of run-time.


CSE-26  Eyebrows As A Tool For Recognizing Individuals

Authors: Raghunandan Pasula; Arun Ross

Abstract: Eyebrows have long been associated with perceiving the mood of a person and serve as an indicator of beauty. For example, salons offer services specifically for fine-tuning eyebrow contours and accentuating their texture. Eyebrows come in a variety of shapes, sizes, and textures; in this work, these features are utilized to recognize an individual. Traditionally, given a face image, a matcher extracts face-specific features, such as the locations of the eyes, nose, and mouth, to establish a match. In some cases, such as when people wear scarves, the full face is not visible and only the eye (ocular) region is available. Methods exist in the literature for matching these ocular regions using textural descriptors. In such cases, eyebrow shape can also be used, and its features can be extracted to assist matching. Eyebrow match scores can further be used in conjunction with ocular match scores to improve recognition performance. Experiments on the proprietary FaceIris dataset reveal that, at a false match rate of 0.1 percent, the true recognition rate can be improved from 87 percent using only ocular matching to 91 percent using both ocular and eyebrow match scores. A generic eyebrow segmentation method is also proposed to automatically detect the eyebrows.
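The combination of ocular and eyebrow match scores described above can be illustrated with a standard score-level fusion recipe (min-max normalization followed by a weighted sum). This is a generic sketch, not necessarily the authors' exact scheme; the weights are illustrative.

```python
# Score-level fusion sketch: min-max normalization + weighted sum rule.
# Weights are hypothetical illustrative values, not from the paper.

def min_max_normalize(scores):
    """Map raw match scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(ocular_scores, eyebrow_scores, w_ocular=0.7, w_eyebrow=0.3):
    """Weighted-sum fusion of two normalized score lists (higher = better match)."""
    oc = min_max_normalize(ocular_scores)
    eb = min_max_normalize(eyebrow_scores)
    return [w_ocular * o + w_eyebrow * e for o, e in zip(oc, eb)]

# Three candidate comparisons; the first is the strongest in both modalities.
fused = fuse_scores([0.9, 0.2, 0.5], [0.8, 0.1, 0.6])
```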

This work was supported in part by Draper Labs


CSE-27  Live Face Video Vs. Spoof Face Video: Use Of Moire Patterns To Detect Replay Video Attacks

Authors: Keyurkumar Patel; Hu Han; Anil K. Jain; Greg Ott

Abstract: With the wide deployment of face recognition systems in applications from border control to mobile device unlocking, combating face spoofing attacks requires increased attention; such attacks can be easily launched via printed photos, video replays, and 3D masks. We address the problem of face spoof detection against replay attacks based on an analysis of aliasing in spoof face videos. The application domain of interest is mobile phone unlocking. We analyze the moiré pattern aliasing that commonly appears during the recapture of video or photo replays on a screen, in different channels (R, G, B, and grayscale) and regions (the whole frame, the detected face, and the facial component between the nose and chin). Multi-scale LBP and DSIFT features are used to represent the characteristics of moiré patterns that differentiate a replayed spoof face from a live face (face present). Experimental results on the Idiap Replay-Attack and CASIA databases, as well as a database collected in our laboratory (RAFS), which is based on the MSU-FSD database, show that the proposed approach is very effective in face spoof detection for both cross-database and intra-database testing scenarios.
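The LBP texture features mentioned above can be sketched at a single scale: each pixel is encoded by thresholding its 8 neighbors against the center, and an image region is summarized by a 256-bin code histogram. The paper uses multi-scale LBP over several channels and regions; this minimal version only illustrates the core encoding.

```python
import numpy as np

def lbp_codes(img):
    """Return the 8-bit local binary pattern code for every interior pixel."""
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    # Neighbor offsets in clockwise order starting from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx]
        codes |= ((neighbor >= center).astype(np.uint8) << bit)
    return codes

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes, used as a texture feature."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256)
    return hist / hist.sum()
```

On a perfectly flat region every neighbor ties with the center, so all codes are 255; textured regions such as moiré fringes spread mass across many bins.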

This work was supported in part by Center for Identification Technology Research (CITeR)


CSE-28  Cooperation Among Smartphones To Improve Indoor Position Information

Authors: Chen Qiu; Matt Mutka

Abstract: Indoor positioning aims to locate people inside a building and provide location-based services (LBS). Accurate indoor location information remains a challenge without extensive fingerprinting approaches or sophisticated infrastructure within buildings. Nevertheless, modern smartphones are equipped with sensors and radios that can detect movement and be used to predict location. Dead reckoning applications on a smartphone may attempt to track a person's movement or locate a person within an indoor environment; however, smartphone positioning applications continue to be inaccurate. We propose a new approach, CRISP - CoopeRating to Improve Smartphone Positioning, which assumes that dead reckoning approaches are inaccurate but leverages the interaction of multiple smartphones. Each smartphone computes its own position and then shares it with other nearby smartphones. The signal strengths of multiple radios (Bluetooth, WiFi, Zigbee) on the smartphones are used to estimate distances between the devices. While an individual smartphone may provide only (possibly inaccurate) positioning information, accuracy may improve when several smartphones cooperate and share position information through multiple iterations. Via indoor experimentation and simulation, we evaluate our approach and find it promising as an inexpensive means to improve position information: with CRISP, positioning errors remain within 1 meter. This improvement can also benefit a number of applications, including exercise profiling.
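The signal-strength-to-distance step above is commonly handled with the log-distance path-loss model, sketched below. The reference power and path-loss exponent are illustrative assumptions, not parameters from the paper; in practice they are calibrated per radio and environment.

```python
import math

def rssi_to_distance(rssi_dbm, rssi_at_1m=-45.0, path_loss_exponent=2.5):
    """Estimate distance (meters) from RSSI using the log-distance model:
    RSSI(d) = RSSI(1m) - 10 * n * log10(d). Parameters here are illustrative."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))

def distance_to_rssi(d, rssi_at_1m=-45.0, path_loss_exponent=2.5):
    """Inverse model, useful for sanity-checking a calibration."""
    return rssi_at_1m - 10.0 * path_loss_exponent * math.log10(d)
```

A reading equal to the 1-meter reference power maps to 1 meter, and weaker readings map to larger distances, which is what lets cooperating phones bound their mutual separation.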


CSE-29  Cybersickness Prioritization And Modeling

Authors: Lisa Rebenitsch; Charles Owen

Abstract: Motion sickness due to visual stimuli, or cybersickness, is a topic of public concern. Three-dimensional movies have generated reports of "movie theater sickness," and visor-type displays such as the Oculus Rift are now available to the public. Research in the field has identified over forty potential factors affecting cybersickness. The literature on some factors is unclear due to selective reporting, varying equipment configurations, different measurement methods, and limited studies. Literature regarding individual susceptibility to cybersickness is particularly limited. Therefore, several experiments were performed examining the display and stereoscopic rendering. Displays were selected as the focus since all virtual reality environments require a screen; stereoscopic rendering was added due to renewed interest in it. All experiments also included a survey to search for possible individual susceptibility factors. Factors with sufficient support were fitted to single-term models and later used in full predictive models. The final goal of the research was to build a predictive model using the models created from the literature, the results from the experiments, and the individual factors identified from the surveys. Statistical methods new to cybersickness research, called zero-inflated models, were examined as a better way to compare factors in cybersickness. Zero-inflated models have the benefit of building the "non-susceptible" or "well" group, which can comprise upwards of 50% of participants, directly into the statistical test.
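The zero-inflated idea above can be made concrete with a textbook zero-inflated Poisson (ZIP): a point mass at zero for the "non-susceptible" group (probability pi) mixed with an ordinary Poisson for the susceptible group. This is a generic formulation for illustration, not the authors' fitted model.

```python
import math

def zip_pmf(k, pi, lam):
    """P(X = k) under a zero-inflated Poisson with inflation pi and rate lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the "well" group and the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

def zip_mean(pi, lam):
    """Expected value of the ZIP distribution: (1 - pi) * lam."""
    return (1.0 - pi) * lam
```

With pi = 0.5, half the probability mass at zero comes from the non-susceptible group alone, which is exactly the structure a plain Poisson test cannot represent.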


CSE-30  Edge Impact On Distances In Graphs

Authors: Dennis Ross; Abdol-Hossein Esfahanian

Abstract: The edge impact of an edge e on a simple graph G, where e is not in E(G), is determined by measuring the changes in the all-pairs shortest-path distances between G+e and G. Here, change is usually the reduction in the total distance, i.e., the sum of the lengths of the shortest paths between all pairs of vertices, after edge addition. Given a non-complete simple graph G, we rank a subset of missing edges, M(G), to find an ordered set I(G) = {e_1, . . ., e_m}, where edge e_k reduces at least as much total distance in G+e_k as e_p does in G+e_p for k < p. We present a framework for the edge impact problem, determine the edge impact in several graph classes, and describe open conjectures. In addition, we discuss applications to link prediction algorithms.
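The impact measure defined above can be sketched directly with breadth-first search on an unweighted adjacency-list graph: compute the total all-pairs distance before and after adding the missing edge and take the difference. This is a straightforward illustration of the definition, not the paper's ranking framework.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def total_distance(adj):
    """Sum of d(u, v) over all ordered pairs of vertices."""
    return sum(sum(bfs_distances(adj, s).values()) for s in adj)

def edge_impact(adj, e):
    """Reduction in total distance from adding the missing edge e = (u, v)."""
    u, v = e
    before = total_distance(adj)
    augmented = {x: list(nbrs) for x, nbrs in adj.items()}
    augmented[u].append(v)
    augmented[v].append(u)
    return before - total_distance(augmented)

# Path 0-1-2-3: adding the chord (0, 3) shortens d(0, 3) from 3 to 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

On the path example, the ordered-pair total drops from 20 to 16, so the impact of (0, 3) is 4.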


CSE-31  Unconstrained 3D Face Reconstruction

Authors: Joseph Roth; Yiying Tong; Xiaoming Liu

Abstract: This paper presents an algorithm for unconstrained 3D face reconstruction. The input to our algorithm is an unconstrained collection of face images captured under diverse poses, expressions, and illuminations. The output is a true 3D face surface model represented as a watertight triangulated surface with albedo or texture information. 3D face reconstruction from a collection of unconstrained 2D images is a long-standing computer vision problem. Motivated by the success of the state-of-the-art method, we developed a novel photometric stereo-based method with two distinct novelties. First, working with a true 3D model allows us to enjoy the benefit of using images from all possible poses, including profiles. Second, by leveraging emerging face alignment techniques, a combination of landmark constraints and photometric stereo-based normals drives our surface reconstruction. Given large photo collections and a ground-truth 3D surface, we demonstrate the effectiveness and strength of our algorithm both qualitatively and quantitatively.
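The photometric stereo building block behind the approach can be sketched as classical Lambertian normal recovery: given per-pixel intensities under several known lighting directions, solve I = L n by least squares. This is the textbook step only; the paper's full pipeline additionally handles unconstrained photos and landmark constraints.

```python
import numpy as np

def recover_normal(lights, intensities):
    """Solve L @ n = I for one pixel; returns (unit normal, albedo)."""
    L = np.asarray(lights, dtype=float)          # (k, 3) lighting directions
    I = np.asarray(intensities, dtype=float)     # (k,) observed intensities
    n_scaled, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(n_scaled)
    return n_scaled / albedo, albedo

# Synthetic check: a known normal with albedo 0.9, lit from three directions.
true_n = np.array([0.0, 0.0, 1.0])
lights = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]])
obs = lights @ (0.9 * true_n)
```

With three or more non-coplanar lights the system is well determined, and the norm of the recovered vector separates albedo from the unit surface normal.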


CSE-32  Context-Aware Implicit Authentication For Mobile Devices

Authors: Vaibhav Sharma; Richard Enbody

Abstract: Smartphones are used for all kinds of personal computing. Current authentication methods use explicit authentication, such as the user entering a secret PIN. However, not only is explicit authentication not used by everyone, but there are also times when a device changes hands after authentication. For such cases, Implicit Authentication (IA) schemes have been proposed. IA techniques make the underlying assumption that there is sufficient behavioral consistency in usage to uniquely identify a user. Inputs to IA techniques consist of recordings from sensors on the device, such as touch input, the accelerometer, and proximity to a cell tower or WiFi network. Some apps have access to more sensitive information than others; for example, a user will be more concerned about a guest or attacker accessing a banking app than a gaming app. Also, a user's touch patterns depend on the running application and the user's goal within it. This observation motivates the development of improved, app-centric IA techniques. We propose leveraging not only known sensor-based features but also features derived from a user's behavior while using the app. For example, the time a user takes to log in to an app, or the time spent on a particular activity within an app, will probably be similar for the same user but vary across users, and hence may be a useful feature for app-centric IA. We present an evaluation of such an app-centric IA technique.


CSE-33  Correlating Names To Faces: A Preliminary Study

Authors: Thomas Swearingen; Cunjian Chen; Arun Ross

Abstract: Biometric systems use anatomical and behavioral traits such as face, fingerprints, iris and gait, to recognize an individual. Demographic data such as gender and ethnicity can also reveal essential information about an individual. In this work, we investigate the problem of combining non-biometric, demographic data with biometric information and names in order to render better decisions about the identity of individuals. First, we study the correlations between names and faces based on the extracted demographic attributes such as gender and ethnicity. Second, missing demographic attributes are deduced based on a label propagation scheme. Third, we attempt to associate names with faces using demographic data. To address these tasks, we assemble a celebrity database containing face images of subjects along with various demographic information. Experimental results on the celebrity dataset, as well as another previously available dataset, demonstrate that combining demographic data with biometric data can be beneficial in a variety of applications.
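The label propagation step for deducing missing demographic attributes can be sketched with a standard iterative scheme: known labels are clamped while unknown nodes repeatedly take the row-normalized weighted average of their neighbors' label scores on a similarity graph. The scheme and graph below are generic illustrations; the paper's propagation method may differ in detail.

```python
import numpy as np

def propagate_labels(W, labels, iters=100):
    """W: (n, n) symmetric similarity matrix; labels: list with an int class
    for known nodes and None for unknown ones. Returns a predicted class per node."""
    n = len(labels)
    classes = sorted({l for l in labels if l is not None})
    F = np.zeros((n, len(classes)))
    known = [i for i, l in enumerate(labels) if l is not None]
    for i in known:
        F[i, classes.index(labels[i])] = 1.0
    row_sums = np.maximum(W.sum(axis=1), 1e-12)
    P = W / row_sums[:, None]          # row-normalized transition matrix
    for _ in range(iters):
        F = P @ F                       # diffuse scores to neighbors
        for i in known:                 # clamp known labels each round
            F[i] = 0.0
            F[i, classes.index(labels[i])] = 1.0
    return [classes[j] for j in F.argmax(axis=1)]
```

On a graph with two tight communities and one labeled seed per community, the unlabeled members inherit their community's label.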


CSE-34  Detecting Hashtag Hijacking From Twitter

Authors: Courtland VanDam; Pang-Ning Tan

Abstract: On social media such as Twitter, users generate labels, called hashtags, to provide a topic or context for their posts. A hashtag is any text that follows a # symbol and ends with a space, another # symbol, or the end of the post. Hashtags are typically used to categorize a tweet, to monitor ongoing popular conversations, and to facilitate accurate retrieval using Twitter search. Hashtag hijacking occurs when a group of users starts using a popular hashtag to promote a topic that is substantially different from its original context. It usually takes one of three forms: attention-seeking trolling, PR campaigns gone wrong, and politically motivated hijacks. Most prior research on hashtag hijacking has focused on manual monitoring of popular hashtags, such as those used by prominent politicians or created specifically for political events. In this study, we propose an online multimodal non-negative matrix factorization algorithm to automatically detect hashtag hijacking based on both the content of the tweets and the users who posted them. Preliminary results are presented showing examples of hijacked hashtags detected by the proposed algorithm.
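The non-negative matrix factorization at the core of the approach can be illustrated with the classic Lee-Seung multiplicative updates: a (tweets x terms) count matrix V is factored into per-tweet topic loadings W and per-topic term profiles H. This vanilla batch version is for illustration only; the proposed algorithm is online and multimodal.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (non-negative) into W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(iters):
        # Lee-Seung updates monotonically reduce the Frobenius error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy corpus: two tweets about one topic, two about another.
V = np.array([[3., 1., 0.],
              [2., 1., 0.],
              [0., 0., 4.],
              [0., 1., 5.]])
W, H = nmf(V, rank=2)
```

A tweet whose loadings drift away from a hashtag's dominant topic profile is the kind of signal a hijacking detector can exploit.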


CSE-35  Crowdsourcing Of Network Data

Authors: Prakash Mandayam Comar; Ding Wang; Pang-Ning Tan

Abstract: A key requirement for supervised learning is the availability of sufficient amount of labeled data to build an accurate predictive model. This paper examines the use of crowdsourcing technology to acquire additional labeled data for classification problems in network analysis. Unfortunately, creating human intelligence tasks (HITs) to enable crowdsourcing is cumbersome for network data and may even be prohibitive for privacy reasons. To overcome this problem, we present a novel framework called surrogate learning that enables the mapping of network data into an equivalent representation (i.e., images) so that the labeling task can be performed even by non-domain experts. We demonstrate the efficacy of our approach in acquiring additional labeled data for node classification and link prediction tasks via crowdsourcing. The code and data sets used for evaluation are available at http://www.cse.msu.edu/~ptan/surrogate.html


CSE-36  Joint Multi-Leaf Segmentation, Alignment, And Tracking From Fluorescence Plant Videos

Authors: Xi Yin; Xiaoming Liu; Jin Chen; David M. Kramer

Abstract: This paper proposes a novel framework for fluorescence plant video processing. Biologists are interested in the leaf level photosynthetic analysis within a plant. A prerequisite for such analysis is to segment all leaves, estimate their structures and track them over time. We treat this as a joint multi-leaf segmentation, alignment, and tracking problem. First, leaf segmentation and alignment are applied on the last frame of a plant video to find a number of well-aligned leaf candidates. Second, leaf tracking is applied on the remaining frames with leaf candidate transformation from the previous frame. We form two optimization problems with shared terms in their objective functions for leaf alignment and tracking respectively. Gradient descent is used to solve the proposed optimization problems. A quantitative evaluation framework is formulated to evaluate the performance of our algorithm with three metrics. Two models are learned to predict the alignment accuracy and detect tracking failure respectively. We also study the limitation of our proposed alignment and tracking framework. Experimental results show the effectiveness, efficiency, and robustness of the proposed method.

This work was supported in part by Chemical Sciences, Geosciences, and Biosciences, Office of Basic Energy Sciences, Office of Science, US Department of Energy (award number DE-FG02-91ER20021, to support fluorescence measurements and software development)


CSE-37  Development Of A Regionalization System Using Contiguity-Based Constrained Spectral Clustering

Authors: Shuai Yuan; Kendra Spence Cheruvelil; Patricia Soranno; Pang-Ning Tan

Abstract: A regionalization system delineates the geographical landscape into smaller, spatially homogeneous units or patches with similar characteristics, thus providing a spatial framework for a wide range of applications, such as land allocation, landscape planning, habitat design, and environmental management. This project investigates a quantitative approach for developing a regionalization system from terrestrial ecology data using constrained clustering algorithms. We demonstrate the limitations of existing algorithms and present a novel contiguity-based spectral clustering method to overcome such limitations.
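The contiguity idea above can be sketched as follows: keep affinity only between spatially adjacent units, so spectral clusters are forced to form connected patches. The bipartition below splits on the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue); the paper's full method is richer than this two-way illustration, and the bandwidth parameter is an assumption.

```python
import numpy as np

def contiguous_bipartition(features, adjacency, sigma=1.0):
    """features: (n, d) attribute vectors; adjacency: (n, n) 0/1 spatial
    contiguity matrix. Returns a 0/1 cluster label per landscape unit."""
    diffs = features[:, None, :] - features[None, :, :]
    affinity = np.exp(-np.sum(diffs ** 2, axis=2) / (2 * sigma ** 2))
    W = affinity * adjacency          # zero out non-contiguous pairs
    L = np.diag(W.sum(axis=1)) - W    # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = vecs[:, 1]              # Fiedler vector
    return (fiedler > 0).astype(int)
```

On a chain of six units whose attributes jump between the third and fourth unit, the cut lands exactly at that weak link, yielding two contiguous patches.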


CSE-38  Using Genetic Programming To Synthesize Distributed Fault-Tolerant Programs

Authors: Ling Zhu; Sandeep Kulkarni

Abstract: In most applications of genetic programming (GP), objective functions are obtained by a terminating calculation. However, a terminating calculation cannot accurately evaluate distributed fault-tolerant programs. A key distinction in synthesizing distributed fault-tolerant programs is that they are inherently non-deterministic, potentially have infinite computations, and execute in an unpredictable environment. In this study, we apply a model checking technique based on Binary Decision Diagrams (BDDs) to GP, evaluating distributed programs by computing the reachable states of a given program and determining whether it satisfies its specification. We present a scenario-based multi-objective approach in which each program is evaluated under different scenarios representing various environments. The computations of the programs are considered under two different semantics: interleaving and maximum parallelism. Finally, we illustrate our approach with a Byzantine agreement problem, a token ring problem, and a consensus protocol using failure detector S. For the first time, this work automatically synthesizes the consensus protocol with S. The results show the proposed method enhances the effectiveness of GP in all studied cases when using maximum-parallelism semantics.


CSE-39  Continuous Authentication of Mobile User: Fusion of Face Image and Inertial Measurement Unit Data

Authors: David Crouse; Hu Han; Deepak Chandra; Brandon Barbello; Anil Jain

Abstract: Mobile devices can carry large amounts of personal data, but are often left unsecured. PIN locks are inconvenient to use and thus have seen low adoption (33% of users). While biometrics are beginning to be used for mobile device authentication, they are used only for initial unlock. Mobile devices secured with only login authentication are still vulnerable to data theft when in an unlocked state. We introduce a face-based continuous authentication system that operates in an unobtrusive manner, including a methodology for fusing mobile device (unconstrained) face capture with gyroscope, accelerometer, and magnetometer data to correct for camera orientation and, by extension, the orientation of the face image. Experiments demonstrate both improvement of face recognition accuracy from face orientation correction and the efficacy of the prototype continuous authentication system.
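The orientation-correction step above can be sketched with the standard tilt formulas that recover device roll and pitch from a gravity reading (accelerometer), from which a captured face image can be rotated upright. Gyroscope and magnetometer fusion would refine this; the simple formulas below are textbook approximations, not the paper's exact method, and the axis conventions are assumptions.

```python
import math

def tilt_from_accel(ax, ay, az):
    """Roll and pitch (radians) from a gravity reading in device axes
    (x right, y up the screen, z out of the screen; an assumed convention)."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch

def upright_rotation_deg(ax, ay):
    """In-plane rotation (degrees) to apply to the camera image so the face
    appears upright, from the gravity component in the screen plane."""
    return math.degrees(math.atan2(ax, ay))
```

Holding the device in portrait with gravity along the screen's y axis requires no image rotation, while gravity along x implies a 90-degree correction.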