PyData NYC 2012

October 26–27, 2012

Speaker Bios

Keynote Presenters

Dave Himrod
Director of Optimization and Analytics, AppNexus
Rapid Iteration with Python: Scaling AppNexus

As Director of Optimization and Analytics, Dave Himrod manages a team of analysts, quants, and engineers devoted to crafting world-class algorithms. When Dave joined in 2009, he managed AppNexus' first account, eBay. While building AppNexus' original optimization algorithm, Dave was heavily involved in building out the data pipeline and defining the data model still in use today. He has since grown his team to more than 20 people and focuses his time on building a world-class scalable optimization system. He and his team continue to improve the tools for optimized pricing and budgeting for the more than 27 billion ad impressions their platform sees per day. Dave has a Bachelor's Degree in Computer Science from the University of Pennsylvania.

Steve Kannan
Engineering Manager, Optimization and Analytics, AppNexus
Rapid Iteration with Python: Scaling AppNexus

As Engineering Manager for Optimization and Analytics, Steve manages software development for AppNexus's best-in-class systems for ad transaction optimization. Since joining AppNexus in 2010, Steve has led the design of distributed systems for scalable computation and data processing and set the technical standards for a team of engineers while iterating on the optimization feature set. Previously, Steve was a software developer at Google working on Google Places for Business and Local Search Quality. Steve has a Master of Engineering in Electrical Engineering and Computer Science and a Bachelor's Degree in Computer Science from MIT.

Van Lindberg
Remaking the PSF: Data, Developers, and the Next Ten Years of Python

Van is a lawyer at Haynes and Boone, where he spends most of his time helping clients with patent defense and open source questions. For a lawyer, though, he spends an inordinate amount of time working at a Python prompt, trying to automate all the tedious parts of his job and advancing his hobby of computational linguistics.

In the rest of his time, Van works as chairman of the Python Software Foundation where he speaks and writes on open source issues. His first book on open source software and intellectual property law was published by O'Reilly and he is working on a second book about the economics of open source.


Francesc Alted
Software Architect, Continuum Analytics

Mr. Alted has a B.S. in Physics from the University of Valencia, Spain, and three M.S. degrees (in Electronics and Computer Science, Theoretical Physics, and Mathematical Methods) from the University of Valencia and the University Jaume I. Francesc is also the author of the popular PyTables package, which is a nascent database for NumPy.

Stephen Diehl

Stephen Diehl is an independent Python developer based out of Boston, Massachusetts. He is the author of several data and networking libraries and a longtime NumPy user. His recent work has been related to building next-generation NumPy infrastructure to support out-of-core and distributed workflows.

Didier Deshommes
Engineering Lead,
Wikipedia Indexing And Analysis

Didier spent the first 18 years of his life in Port-au-Prince, Haiti, and moved to North Carolina to attend college. He obtained two bachelor's degrees from North Carolina State University, one in Applied Mathematics and one in Computer Science. He also has a master's in Mathematics. As an advocate of open source technologies, Didier has contributed to a number of projects including the SAGE project (a computer algebra system), python-solr (a Python binding for the Solr open source search engine), and sfpy (a Python Superfeedr client). At, he has worked primarily on systems architecture, large-scale web crawling, natural language processing systems, search technology, and HTTP/JSON APIs.

Brian Eoff
Scikit Random Forest

Gilad Lotan
NetworkX and Gephi

Gilad Lotan is the VP of Research and Development at SocialFlow, a New York City company that uses science and real-time data to help businesses earn greater attention and engagement on Twitter and Facebook. Previously, Gilad built social data visualization tools at Microsoft's FUSE Labs. Past work includes 'Retweet Revolution', visualizing the flow of information during the 2009 #IranElection riots, and a study investigating the relationship between mainstream media and social media channels during the Tunisian and Egyptian revolutions. Gilad’s work has been presented at PDF, TED, and SXSW; published at IJOC, ICWSM, and HICSS; and covered by the New York Times, the Guardian, Fast Company, and the Atlantic Wire.

Brian Granger
Cal Poly State University and the IPython Project
IPython Notebook; IPython Parallel

Brian Granger is an Assistant Professor of Physics at Cal Poly State University in San Luis Obispo, CA. He has a background in theoretical atomic, molecular and optical physics. His current research interests include quantum computing, parallel and distributed computing and interactive computing environments. He is a core developer of the IPython project, the creator of PyZMQ, a contributor to SymPy and has a Ph.D. in theoretical physics from the University of Colorado.

Josh Hemann
Getting past the hype: how to connect Data Science to business value

Josh Hemann is a Senior Consultant at FICO in the Marketing Strategy and Analytics group. He works with some of the largest retailers in the world on using Big Data to drive multimillion-dollar business decisions. Prior to his role at FICO he was the Group Manager of Marketing Analytics at Sports Authority, where he brought Python into an analytics environment previously dominated by SAS. Before jumping to the dark side of marketing (it's where the money is), he worked at Rogue Wave Software, where he was a core member of the team that developed the PyIMSL Python wrappers to the IMSL C Numerical Libraries. Josh has an MS in Applied Mathematics from the University of Colorado, where he has used (and continues to use) Python for peer-reviewed academic work in air pollution modeling.

Andreas Klöckner, Ph.D.
Instructor, Courant Institute, New York University
GPU and Parallel Processing

Andreas Klöckner received a Diplom degree in Technomathematik (applied mathematics) from Universität Karlsruhe in 2005, and Sc.M. and Ph.D. degrees in applied mathematics from Brown University in 2006 and 2010. He is currently an Instructor at the Courant Institute of Mathematical Sciences at NYU. His research interests include efficient solvers for hyperbolic and elliptic PDEs as well as tools for computation on modern, massively parallel computer architectures.

Wes McKinney
Pandas; R, Hadoop and Python; Statsmodels & Patsy

Wes is the creator and lead developer of the pandas library and the author of the O'Reilly book, Python for Data Analysis. He has served as an expert Python consultant to many financial firms and is actively engaged in industry conferences as a speaker. Prior to co-founding Lambda Foundry, Wes worked at AQR Capital Management researching global macro and credit trading strategies. He holds a degree in Mathematics from MIT, with additional graduate studies in Statistical Science at Duke University.

Zain Memon
GIS in Python Using Shapely

Zain Memon was part of Movity, which became the geo team at Trulia, and has made data visualizations like Trulia's crime maps, WeePlaces, and tendermaps. He uses Python libraries like Shapely, GeoDjango, and TileStache to create beautiful, useful maps.

Andrew Montalenti
Chief Technology Officer,
NLTK and Text Processing; Web Crawling and Metadata Extraction

Andrew is the Chief Technology Officer at and a technologist with nearly a decade of experience in software engineering. He earned a bachelor's degree in Computer Science (with honors and departmental distinction) from NYU. After graduating, he acted as a technical lead on a small software team within Morgan Stanley. Prior to founding, he built large-scale web applications and systems through Aleph Point, Inc., the software engineering consultancy he owned and operated. His team of expert consultants served large corporate clients and mid-size technology companies. At - the company he co-founded in 2009 - he currently leads the product design and system engineering teams. A dedicated Pythonista, JavaScript hacker, and open source advocate, Andrew is also a published technical author and editor.

Chris Mueller
President and CTO, Lab7 Systems
MapReduce with Disco; Python and JS Web Visualization

Chris Mueller is the founder and CTO of Lab7 Systems, Inc., an Austin, Texas-based startup creating solutions for high-throughput genome sequencing. A longtime Python user and advocate, Chris has spent a good portion of the last decade developing Python-based software for data-intensive life science applications, including combinatorial drug design, cell system modeling, and genome sequencing. Chris's professional experience also includes stints developing commercial scientific software and visualization tools and large-scale Web applications. Chris has a B.S. from the University of Notre Dame and an MS/Ph.D. from Indiana University, all in Computer Science.

Travis Oliphant, Ph.D.
CEO and Co-Founder, Continuum Analytics
Introduction to NumPy; Introduction to SciPy

Dr. Oliphant has a Ph.D. in Biomedical Engineering from the Mayo Clinic, and M.S. and B.S. degrees in Electrical Engineering (and Math) from Brigham Young University. Travis has worked extensively with Python for numerical and scientific programming since 1997, and was the primary developer of the NumPy package and the author of the definitive Guide to NumPy. He is also the primary founding author of the SciPy package. During his academic career, he has worked in the fields of satellite remote sensing, Magnetic Resonance Imaging (MRI), Ultrasound, elastography, and general inverse problems. He was an Assistant Professor of Electrical and Computer Engineering at Brigham Young University from 2001 to 2007 where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. In addition, he directed the BYU Biomedical Imaging Lab, and performed research on scanning impedance imaging. He has done consulting work since 1997 in laser scattering off of semiconductors, sparse matrix calculations for search engines, and mesh transformations for fluid dynamics. Dr. Oliphant co-founded Continuum Analytics, Inc. in 2012 and currently serves as its CEO.

Jason Pell
DNA Sequence Filtering and Analysis with khmer

Jason Pell is a Computer Science Ph.D. student at Michigan State University, advised by Dr. C. Titus Brown, an Assistant Professor and PSF member. He is broadly interested in large-scale data analysis, especially with the latest DNA sequence datasets generated by next-generation sequencing technology. His Ph.D. research is centered around characterizing the scalability of data structures for DNA sequence filtering and assembly as well as the development of novel algorithms for scalable sequence error correction.

Davin Potts
Chief Science Officer, Stipple, Inc.
Distributed Image Data Exploration with IPython and Disco

Davin is responsible for driving the science behind Stipple's computer vision and market creation technologies. His work includes discovering algorithms, developing prototype and production code, and identifying new product opportunities through data mining. Prior to joining Stipple, Davin co-founded, GmbH, a 4-year-old commercial startup in Zurich, Switzerland, focused on offerings around the open source KNIME data mining and visualization platform.

V. James Powell
Generators & Coroutines for Stream Data Processing

James is a professional Python programmer based in NYC. He has recently become very active in the NYC Python community, and has spoken on advanced Python topics at PyGotham, PyTexas, and the monthly NYC Python Meetup's lightning talks. His interests are in large-scale software development, for which he considers Python one of the best-engineered languages for industrial application development.

Skipper Seabold
Statsmodels & Patsy

Skipper is a Ph.D. student studying economics at American University in Washington, D.C. He specializes in applied econometrics, information theory and entropy econometrics, and topics in growth theory. He has been working on statsmodels, a package for estimating statistical models in Python, throughout his studies.

Michael Selik
IPython for Teaching and Collaboration

Michael Selik is an econometrics and machine learning consultant based in New York City. He has worked for Dow 30 corporations and venture-backed startups delivering sophisticated analysis and technology project management services. Recent projects include hyperlocal demographics inference, customer segmentation, and market share forecasting. He received an M.S. in Economics, a B.S. in Computer Science, and a B.S. in International Affairs from the Georgia Institute of Technology.

Chang She
Timeseries with pandas

Chang is a former quant researcher-trader turned developer of data science platforms and tools. Currently a co-founder at Lambda Foundry providing data science solutions with a financial bent, he is also a core developer of the open source pandas library for data analysis. His previous employers include AQR Capital Management and Barclays Capital. Chang graduated from MIT with degrees in computer science and political science.

Hugo Shi, Ph.D.
Software Engineer, Continuum Analytics
Introduction to SciPy

Dr. Shi has a Ph.D. in Electrical Engineering from the University of Michigan, where he studied statistical medical image reconstruction, and a BS from UC Berkeley. Prior to joining Continuum, Hugo was consulting at an investment bank optimizing parallel computing and data distribution problems, and he consulted on quantitative investment strategies for a hedge fund focused on machine learning and big data. Hugo has also developed quantitative strategies and real-time risk management tools for Chicago Trading Company, an options market-making firm. Before leaving the West Coast, Hugo worked on embedded systems for ad-hoc multi-hop wireless sensor networks.

Andy R. Terrel, Ph.D.
Researcher, Texas Advanced Computing Center
MPI; GPU and Parallel Processing

Dr. Andy R. Terrel is a High Performance Computing researcher at the Texas Advanced Computing Center. In this role, Andy helps users utilize supercomputers with Python and studies methods for speeding up computational fluid dynamics. He graduated from the University of Chicago with a Ph.D. in Computer Science in 2010 and has been programming in Python for the last decade. He is a contributor to numerous open source projects including the FEniCS Project and SymPy.

Joseph Turian, Ph.D.
MetaOptimize LLC
ML, NLP, + Data Science in Python: Patterns, Recipes, and Best Practices

Joseph Turian, Ph.D., heads MetaOptimize LLC, which consults on machine learning, natural language processing, and predictive analytics. He has over a decade of expertise on these topics. Dr. Turian also runs the MetaOptimize Q&A site, where Machine Learning and Natural Language Processing experts share their knowledge. He specializes in large data sets.

Joseph Turian received his Ph.D. in computer science (with a focus on Machine Learning and Natural Language Processing) from New York University in 2007. He received his AB from Harvard University in 2001.

Stefan Urbanek
Python for Business Intelligence

Stefan is a data analyst, information architect, and knowledge design consultant at Continuum Analytics. Before joining the company, he worked for several years in the mobile telecommunications industry on data warehouse, customer intelligence, and analytical CRM projects. He has created several open data and open government applications and architectures. Stefan's work focuses heavily on data quality management, data governance, and data provenance. He is also the author of the Cubes - Lightweight Python OLAP framework and blogs about Data Brewery and Cubes at

Thomas Wiecki
Researcher, Quantopian, Inc.
Simulated Algorithmic Trading with Zipline: Backtesting, Statistics, and Optimization; GPU and Parallel Processing

Thomas Wiecki is a third-year Ph.D. student at Brown University. His research domain is computational cognitive neuroscience. He also works as a researcher for Quantopian Inc. His interests include statistical modeling, Bayesian data analysis, and scientific and high-performance Python programming. Thomas is the author of several open source Python packages including HDDM, a scientific tool used to study decision making, and mpi4py_map, which adds worker-pool and queuing capabilities to mpi4py.

Stéfan van der Walt
Scikits-image tutorial
Introduction to NumPy

Stéfan is a lecturer in applied mathematics at Stellenbosch University and a post-doctoral researcher in neuroscience at the University of California, Berkeley. A long time contributor to NumPy and SciPy, he now leads the development of scikits-image, the image processing toolbox for Python. Stéfan is a strong advocate for the use of open source software in science and education. In his spare time, he enjoys hiking, running and photography in the great outdoors.

Jake VanderPlas, Ph.D.
Researcher, University of Washington Survey Science Group
Scikit-learn Tutorial; Matplotlib Tutorial

Jake VanderPlas is a post-doctoral researcher in the Astronomy department at the University of Washington. His research involves applying recent advances in machine learning to large astronomical datasets, in order to learn about the Universe at the largest scales.

He is co-author of the upcoming book "Statistics, Data Mining, and Machine Learning in Astronomy", to be published by Princeton Press in 2013, and has presented many technical talks and papers in this subject area. In the Python open-source world, Jake is a core maintainer of SciPy, a regular contributor to scikit-learn, and the creator of astroML, a Python package for machine learning in astronomy and astrophysics, to be released this October. He occasionally shares his thoughts on his Python blog at

K. Young
R, Hadoop and Python; Hadoop, Python Integration and Pig

K. Young is the CEO and co-founder of Mortar Data. Mortar's mission is to make big data pipelines easy to use. Mortar's PaaS makes working with Hadoop (Pig) and Python seamless, including NumPy and SciPy. Prior to founding Mortar Data, K built software that reaches one in ten public school students in the U.S. He holds a BA in Computer Science from Rice University.

PyData NYC 2012 presented by Continuum Analytics

© 2012 Continuum Analytics