Some notes about data science primarily for myself, and possibly for others:
- Spent three hours today getting scipy, numpy, and scikit-learn to work together. These are all nice libraries for data analysis, and none is hard to install on its own. However, I had trouble getting my different versions of python to see the packages. I learned about python's sys.path (the list of directories python searches when it imports a module), but in the end the fix was easy_install-ing scikit-learn into the directory belonging to the python I was actually running (I'm using a Macbook Pro). What did not work: modifying sys.path, trying to combine the MacPorts and Enthought python distributions, and messing with .bashrc or .profile files.
- Be careful about names. There is no such thing as py27-sklearn, no matter what anyone tells you. Even worse, using MacPorts I needed to look for scikits, not scikit. Fortunately MacPorts has a good search function.
- Once scikit-learn was working, I downloaded some sample data about molecules from Kaggle.com and made my first submission, following the tutorial provided. Lesson: I can make things happen with no understanding whatsoever. Don't be that guy. So I will read about random forests today to figure out what I actually did.
- A few days before Christmas I installed R. That was super-easy.
- I think the first intellectual order of business is learning to "look at" data somehow, so that I develop a sense of what to do with it next.
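For the sys.path trouble in the first note, here is a small diagnostic sketch that would have saved me some time. It prints which python is actually running, the directories on its sys.path, and where (if anywhere) that python finds numpy, scipy, and sklearn. The helper names `locate` and `report` are my own invention for illustration, and the sketch assumes Python 3.4 or later, since it relies on `importlib.util.find_spec`.

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file path this interpreter would import, or None if unseen."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised for some malformed or half-installed packages.
        return None
    if spec is None:
        return None
    return spec.origin


def report(module_names=("numpy", "scipy", "sklearn")):
    """Map each module name to where this interpreter finds it."""
    return {name: locate(name) for name in module_names}


if __name__ == "__main__":
    # Which python binary is running this script?
    print("interpreter:", sys.executable)
    # The directories searched on import, in order.
    for entry in sys.path:
        print("  sys.path entry:", entry)
    # Where each package comes from, if it is visible at all.
    for name, origin in report().items():
        print(name, "->", origin or "not visible to this interpreter")
```

Running the same script with each installed python (MacPorts, Enthought, the system one) should show which interpreter sees which packages, without touching sys.path or any shell startup files.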