Three things you can do with spaCy without knowing anything about machine learning.
Want to apply some machine learning magic to text but not quite ready to rip out your math/statistics textbooks? Or don't think you have enough data to really do anything interesting?
I remember first encountering a lot of these ideas in the NLTK book. It's a great resource, but for anyone just getting into natural language processing, it can be a bit much. There are many ways to train models to work with text, many techniques even outside of machine learning you can use to understand it, lots of corpus names and even more ways of organizing everything. NLTK is a potpourri of tools for working with text, which is great if you're really looking to experiment, but I like a bit more of a roadmap when entering into completely new territory.
One of the things I love most about spaCy is that it's opinionated. It gives you some of the most important techniques and tools for dealing with text and organizes them really well. It does let you go beyond the decisions it makes for you, but if your main interest is just building more intelligent systems, you may never feel the need to rethink how it sets things up.
Let's get what we need installed first. I'll assume that you've got a Python 3 environment where you want to install spaCy and its dependencies. If not, Anaconda is a great way to get Python for all sorts of data science work, including this. It's always a good idea to create a separate environment for experimenting, whether it's a conda env or a regular ol' Python venv. Once that's in order, you'll need just a few commands:
pip install spacy
python -m spacy download en_core_web_sm
# Only if you want to do the 2nd "Word math!" section
python -m spacy download en_core_web_md
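If you'd like a quick sanity check before moving on, something like the sketch below should tell you whether the installs took. It assumes the models ended up as regular importable Python packages named after the download commands above, and missing_packages is just a helper name I made up for illustration, not anything spaCy provides.

```python
import importlib.util

# Package names taken from the install commands above.
# en_core_web_md is only needed for the "Word math!" section.
REQUIRED = ["spacy", "en_core_web_sm"]
OPTIONAL = ["en_core_web_md"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_packages(names)
        status = "all installed" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```

If anything shows up as missing, re-run the matching command above inside the same environment you're running Python from.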
Alright then! Let's go! Here are three ways you can do some stuff that would probably require quite a few more if statements without spaCy, all without needing to understand machine learning enough to do your own training.
RegEx++
If you're trying to validate or extract any sort of information programmatically from raw text, regular expressions can save you from a lot of tedium. Want to check that a user entered a phone number like 555-555-5555?
import re
def check_phone_number(str):
    if re.match(r'^\d{3}-\d{3}-\d{4}