I haven't done anything like this on the Windows side, but as a general word of advice, spend a good deal of time up-front researching built-in and third-party libraries. I've done things like this with other languages and spent more time than I would have liked toiling over some nitty-gritty details that could have been handled by an existing library in just a few lines of code.
For example, searching for "C# HTML parser" led me to this:
http://htmlagilitypack.codeplex.com/
I'd break down all the complex tasks you're planning on performing into a series of basic tasks and spend time researching how others have tackled similar problems. Do whatever you can to avoid reinventing the wheel.
Also, consider using IronPython to integrate Python into your project. Python has many excellent libraries for these types of projects, and it may be easiest to go this route instead of trying to recreate a library that doesn't have a C# equivalent.
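To give a rough idea of how little code a library-based approach can take, here is a minimal sketch that pulls every link out of an HTML string using the html.parser module from Python's standard library. The names LinkCollector and extract_links are made up for illustration, and this assumes a Python 3 runtime; whether the same module is available under IronPython depends on the IronPython version you use, so treat that part as something to verify.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values of all anchor tags, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="http://example.com/">Example</a> and <a href="/about">About</a></p>'
    print(extract_links(sample))
```

Doing the same thing by hand with string searches or regular expressions tends to break on mixed-case tags, odd quoting, and malformed markup, which is exactly the kind of nitty-gritty detail a parser library already handles.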
Sorry I couldn't offer more specific advice. Good luck, and let us know how the project shapes up.