| Academics | Papers | Data | Resume |
My refereed publications can be downloaded from my personal Auton Lab page. BibTeX entries are provided automatically, along with author and subject links. Many of the papers below duplicate those at that site. Those with "My Page" links may include slides or notes from various talks, and are summarized on their own web page.
Most of the datasets we use in our work are publicly available. Many of these can be found on my datasets page.
My logistic regression and dynamic AD-tree software are publicly available, in binary and source forms. Some of it is nicely packaged and has documentation! Check out my logistic regression page for GPL'd logistic regression software, and the Auton Lab's software page for the nicely packaged stuff. For anything you don't find there, please email me.
All papers and talks are copyright Paul Komarek, or possibly other groups such as the IEEE. Please respect these copyrights. For IEEE publications, reposting is not allowed without IEEE permission.
Making Logistic Regression A Core Data Mining Tool With TR-IRLS.
This is a short (4 page) version of my similarly-titled CMU
Robotics tech report (CMU-RI-TR-05-27). Because it is
published at an IEEE conference, I no longer hold copyright
on this work, and the IEEE asks that you not repost it.
You are free to download it, of course!
This paper is the easiest, fastest way to learn about Truncated
Regularized Iteratively Re-weighted Least Squares (TR-IRLS),
my algorithm for fast, parameter-free logistic regression.
Note that TR-IRLS applies to any generalized linear model
(i.e. GLM, GLiM). This paper includes a new, very brief
comparison to SAS' proc logistic and some
results on a text dataset. IEEE ICDM 2005, pages 685-688.
ps ps.gz pdf Auton page
Logistic regression for fast, accurate, and parameter free data mining. These slides were presented at Google, Inc, on 28 July 2005. The first half of these slides briefly discusses why we like LR for data mining, and how we accelerate parameter fitting. The second half of these slides discusses several interesting and unusual applications of LR, most of which have software or papers available from the Auton lab website. My page
Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity. This paper summarizes and extends work from my thesis. If you are interested in an accurate, fast logistic regression algorithm that doesn't seem to need tuning, I recommend reading this paper before reading my thesis (further below). It contains all of the important information from my thesis, and a few new details. CMU-RI-TR-05-27. Auton page CMU RI tech report page
High-Dimensional Probabilistic Classification for Drug Discovery. The principal author is Alex Gray. Discriminative probabilistic classifiers have been used successfully on large life-sciences datasets, but high dimensionalities have prohibited the use of nonparametric class probability estimation. This paper explores a method (SLAMDUNK) which addresses this. COMPSTAT 2004. Auton page
Autonomous Fast Classifiers for Pharmaceutical Data Sets. This was an invited talk about my LR work. Midwest Biopharmaceutical Statistics Workshop (MBSW) 2004. My page
Logistic Regression for Data Mining and High-Dimensional Classification. This is my doctoral thesis. It covers everything we knew about fast logistic regression at the time of writing. I recommend you read the more recent logistic regression papers before reading this one. CMU RI Tech Report CMU-RI-TR-04-34. My page Auton page CMU RI tech report page
A Comparison of Statistical and Machine Learning Algorithms on the Task of Link Completion. This paper explores various methods of detecting group membership from arbitrary link data. The learning task is equivalent to collaborative filtering. My logistic regression fitting method is used in a special multiclass arrangement, and this is compared to several more natural algorithms for link analysis. This is a collaborative work, and the authors include Jeremy Kubica and Anna Goldenberg. KDD Workshop on Link Analysis for Detecting Complex Behavior, 2003. Auton page
Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs. This was my first paper covering our logistic regression implementation efforts. It is superseded by later papers. AISTATS 2003, pages 197-204. ps ps.gz pdf Auton page
A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets. This paper discusses a modification to AD-trees to allow incremental and lazy growth. We discuss our implementation of these Dynamic AD-trees and present results for datasets with scores of high-arity attributes and millions of rows. ICML 2000, pages 495-502. My page Auton page
| Created by Paul Komarek, komarek.paul@gmail.com |