Wednesday, December 26, 2012

Back in MN... for a bit...

Back in Minnesota for a month. Priorities: socialize and see people I only see in MN; write some things about quantum cohomology; play with data science. Heading for New York state in January...


Some notes about data science primarily for myself, and possibly for others:
  • Spent three hours today getting scipy, numpy, and scikit-learn to work together. These are all nice Python packages for data analysis and not hard to install individually. However, I had trouble getting my different versions of python to see them. I learned about python's sys.path (the list of directories it searches when it looks for modules), but in the end learned that easy_install-ing scikit-learn in the appropriate directory was the way to get python to see it (I'm using a MacBook Pro). What did not work: modifying sys.path, trying to combine the MacPorts and Enthought python distributions, messing with .bashrc or .profile files.
  • Be careful about names. There is no such thing as py27-sklearn, no matter what anyone tells you. Even worse, using MacPorts I needed to look for scikits, not scikit. Fortunately MacPorts has a good search function.
  • Once scikit-learn was working, I downloaded some sample data about molecules from Kaggle.com and made my first submission, following the tutorial provided. Lesson: I can make things happen with no understanding whatsoever. Don't be that guy. So I will read about random forests today to figure out what I actually did.
  • A few days before Christmas I installed R. That was super-easy.
  • I think the first intellectual order of business is learning to "look at" data somehow so that I develop a sense of what to do with it next.
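For the "which python sees which package" problem above, a small check like the following might save some time. It is only a sketch, and it assumes a current Python 3 (importlib.util.find_spec is not available in Python 2.7); the helper names where_is and report are made up for illustration. It prints the interpreter that is actually running, the directories on sys.path, and where each package would be imported from, if anywhere.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the location a module would be imported from, or None if not visible.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised for some dotted names or oddly initialised modules.
        return None
    if spec is None:
        return None
    return spec.origin


def report(module_names):
    """Map each module name to its location (None if this interpreter cannot see it)."""
    return {name: where_is(name) for name in module_names}


if __name__ == "__main__":
    # Which python is this, really? Different installs have different paths.
    print("interpreter:", sys.executable)
    for entry in sys.path:
        print("  search path:", entry)
    # Note that scikit-learn is imported under the name "sklearn".
    for name, location in report(["numpy", "scipy", "sklearn"]).items():
        print(name, "->", location or "not visible to this interpreter")
```

Running this with each python on the machine (the system one, the MacPorts one, the Enthought one) should show right away which interpreter can see which package, without touching sys.path by hand.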
Data science is my winter break diversion this year, so we'll see what happens! Also happy to get back to my kettlebells, which I did not take to CA because they weigh so much. And now that I did all that "fun stuff" with python and scikit-learn, I need to sit down and work through some examples of toric degeneration of Grassmannians. Fun stuff :)
