Worksheet: python

Monday, May 31, 2010

A Buzz API Python Application for Social Networks Applied to My Students' Buzz Networks

Click the image to see the results of a project I've been working on for the last week. My goal was to start to get a handle on how well students in my Web Marketing Practicum class were getting on in Buzz. What I produced was a basic tabular report designed to show how well the students were connecting with each other and with external parties.

Here's how to read the columns:

Participant: Just the class participant whose network is being examined. I'm included in this as I'm part of the network.
Mutually Following: The people the participant is both following and being followed by. For many social network analysts, this column is what constitutes the social network. Other participants in the class are color coded orange and in a bold weight font. Those outside the class are blue and in a regular weight font. A quick perusal of this column reveals that, with the exception of myself, the majority of each of the other participants' networks is composed of other participants.
Not Following Participant Back: Often, this column is not important. It may represent people who the participant is following purely for information. However, it can start to be an issue of poor perception management if no one the participant follows is following them back.
Participant's Other Followers: These are people who follow the participant but are not followed by the participant. They may represent an opportunity for the participant.
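
The three columns above boil down to set operations on two lists per participant: who they follow and who follows them. Here is a minimal sketch of that idea, not the code behind the report; the function names and the sample data are made up for illustration.

```python
def classify_network(following, followers):
    """Split one participant's contacts into the three report columns."""
    following = set(following)
    followers = set(followers)
    return {
        # Followed by the participant AND following them back.
        "mutually_following": following & followers,
        # Followed by the participant but not following back.
        "not_following_back": following - followers,
        # Following the participant but not followed by them.
        "other_followers": followers - following,
    }


def split_by_class(people, class_members):
    """Separate class participants from outsiders (the orange/blue distinction)."""
    people = set(people)
    class_members = set(class_members)
    return sorted(people & class_members), sorted(people - class_members)


# Made-up example data, not taken from the real report.
report = classify_network(
    following=["ann", "bob", "news_feed"],
    followers=["ann", "bob", "carol"],
)
insiders, outsiders = split_by_class(report["mutually_following"], ["ann", "dave"])
```

Running the same two functions once per participant would yield one row of the table each time.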

I'm sure this analysis seems super simple. What's the upshot?

One issue with Buzz as it stands in May 2010 is that it does not have an easy way to perceive your network. Who's in? Who's out? Who are potential people to connect with? This analysis begins to provide an answer to all of those questions.
Right now, I'm clearly the most connected node on this network by any measure. Students may be able to feed off of me. Also, some of them have started to grow their networks, and as they do so, they can feed off of each other.

My plan is to do a separate post on the Python code itself sometime in the next 10 days, and I'll include a cleaned up version of the code with that. Suffice it to say that I did not use the Python client libraries for the Buzz API. Rather, I just used the RESTful API. The main reason was that the REST documentation was chock full of examples for how to get the data I wanted. I did wind up writing a simple Python abstraction layer for it.

Next Steps

To be honest, the next step may be seeing whether I can duplicate this effort with the twitter API. All reports indicate that my students may be having an easier time there. Tracking is certainly easier. I just created a list for my students.

However, even with twitter, figuring out who is in your active network is hard. As simple as this exercise is, it begins to accomplish that task.

Tuesday, May 25, 2010

A plan of attack on the buzz API using Python + REST

So, as of last Wednesday, the Buzz API is out. Right now, things are still early stage. You can do things, but there are not a lot of refined examples to help you along, and the client libraries for popular web programming languages like Python are not yet complete.

By contrast, examples for the REST endpoints are more fully developed, and I've decided to just use those. Directly using the REST endpoints in python is a little more difficult because it requires you to do things like marshal your requests into a non-python data format such as xml or json and decode all responses from said format. You also potentially have to handle things like authentication using the oauth protocol. Fortunately for the intrepid, there are a few resources that can shed some light on how to proceed. I'll list them here:

First is the documentation for the buzz REST API. It gives both an overview of the API's philosophy and concrete examples to illustrate it. Often, you can just execute the examples in the address bar of the chrome web browser to get an idea of the kind of information you'll get back.
Second is oacurl. This is a java command line utility that's pretty easy to install on most unix-like systems (I'm on mac os x). It's an easy way to make requests of the API and see the raw response it sends back. When required, oacurl allows you to easily execute actions that require authentication via oauth. The cookbook is invaluable. Executing those examples will give you more than one aha! moment.
Third is the buzz bingo example written in Python and presented last week at Google I/O. The beauty of this example is that it follows the REST strategy I'm laying out for myself here. In other words, it can act as a blueprint in Python for some of the features and hurdles I'll need to overcome when using REST.
Fourth are the python packages httplib2 and simplejson. The main advantage is that these are in widespread use. Further, I myself have used them, so getting reacquainted should not be too bad. httplib2 facilitates connecting over http to REST endpoints. simplejson is a python package for encoding and decoding json.
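
The decoding half of that workflow can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard library's json module (simplejson offers the same loads call), and the payload shape and the field names "entries" and "id" are invented for the example rather than taken from the Buzz API's actual response format. Fetching is left out so the sketch runs offline.

```python
import json  # simplejson exposes the same loads() call


def extract_ids(raw_body, list_key="entries", id_key="id"):
    """Decode a JSON response body and return the ids it lists.

    list_key and id_key are hypothetical field names, chosen for illustration.
    """
    # HTTP libraries typically hand back the body as bytes.
    if isinstance(raw_body, bytes):
        raw_body = raw_body.decode("utf-8")
    payload = json.loads(raw_body)
    # Skip entries that lack the id field instead of raising.
    return [entry[id_key] for entry in payload.get(list_key, []) if id_key in entry]


# Made-up response body standing in for what an endpoint might return.
sample_body = b'{"entries": [{"id": "ann"}, {"id": "bob"}, {"name": "no id"}]}'
ids = extract_ids(sample_body)
```

With httplib2, the body would come from httplib2.Http().request(url, "GET"), which returns a (response, content) pair; content is what would be passed to extract_ids.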

My plan of attack going forward

My guess is that the developers of the buzz API expect the modal programmer using it to be coding some sort of web app which will ask the user's permission to access data. I plan to get there at some point, but first I'm going to see if I can pull public data for a few users and see if I can interrelate it. That will allow me to cut out a lot of complexity, like hosting the app on the web or needing to manage authentication.

Tuesday, March 16, 2010

Adventures in grading student blogs using gdata python

This post is about how Google's gdata python API helped convince me to use blogger for my student class blog. The screenshot, simple as it is, shows a blog evaluation framework available on no other platform (see the page updated daily here). I created it in about a week, starting from scratch, with pretty much only intermediate python development skills. Read on for a narrative about my high level strategy in getting it all to work and where I think I'm going with it.

Introduction

I've been incorporating blogging in my courses since 2004. A typical blogging pattern is for students to generate three to five posts per week, with their blogging activity accounting for 30% of their grade. There have been three persistent issues with student blogging:

Hosting support
Student tracking
Analyzing student posts

Motivation for starting with the blogger API

I've used all of the major platforms with a preference toward sixapart's offerings, mainly because I knew them, and movabletype seemed to offer significant customization features. However, about a year ago, I decided I simply had to stop hosting the blogs on my own server. If nothing else, combatting spam and bad guys was getting beyond me. Since then, I've been itinerant across a couple of hosting services.

In my classes, we currently use a group blog (in the end, it's just easier) with between 50 and 70 participants. There is no hosted blogging service that offers a convenient way to track the activities of that many group blog participants (blogger is particularly egregious in this regard but also the easiest for adding participants). Plus, my needs were unique. I wanted to know how many posts students had written and whether these posts met certain criteria.

I had always heard about the blogger api and was becoming increasingly impressed with the gdata python api in general through my brushes with it in various youtube projects. So, last December, as the holiday break was nearing I decided I should take a run at seeing whether the gdata python api would allow me to overcome the limitations of the blogger platform and perhaps go beyond anything I could do on any platform.

Initial Results

My timeframe was one week while my wife and kids were away visiting her father. I downloaded the python gdata V2 archive and started in. I would describe myself as a novice/intermediate python programmer.

Within the week, I had basically achieved what you see here:

http://biggerbuybutton.com/university/results.html

that I'm using to track this blog:

http://winter2010.biggerbuybutton.com/

I'll admit the current approach is rather basic, but it fulfills the role of tracking where students are and indicating posts they may want to work to improve. As shown in this screenshot providing the detail for my own posts, edit links are provided in cases where posts are not up to snuff, and the posts that need work are clearly highlighted with little messages as to what's missing.
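
The tracking logic described above, counting posts per student and attaching messages to posts that fall short, might look something like the sketch below. It is not the code behind the results page. The two criteria (a minimum word count and at least one label), the MIN_WORDS threshold, and the dictionary fields are all hypothetical stand-ins, since the real criteria are not listed here, and the posts are plain dictionaries rather than gdata entries.

```python
from collections import defaultdict

MIN_WORDS = 150  # hypothetical threshold, chosen only for illustration


def check_post(post):
    """Return a list of messages describing what a post is missing."""
    problems = []
    if len(post["body"].split()) < MIN_WORDS:
        problems.append("too short")
    if not post["labels"]:
        problems.append("no labels")
    return problems


def summarize(posts):
    """Group posts by author, recording a count and the posts needing work."""
    summary = defaultdict(lambda: {"count": 0, "needs_work": []})
    for post in posts:
        entry = summary[post["author"]]
        entry["count"] += 1
        problems = check_post(post)
        if problems:
            entry["needs_work"].append((post["title"], problems))
    return dict(summary)


# Made-up posts; a real run would build these from the blog's feed.
posts = [
    {"author": "ann", "title": "Week 1", "body": "word " * 200, "labels": ["seo"]},
    {"author": "ann", "title": "Week 2", "body": "short post", "labels": []},
]
result = summarize(posts)
```

Each author's needs_work list is what would drive the highlighted rows and the little messages about what's missing.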

Future Plans

What I find most intellectually engaging, and what I spent spring break working on, is finding useful ways to summarize the content of student posts both individually and across the group:

Using semantic analysis services from the likes of opencalais, zemanta, and evri (to give just a sample), what are an individual student's most common themes?
What are the themes shared across students?
What is the evolution of themes over time?
Who is connecting the most with their peers and over what topics? (post content is supposed to be 80% class related, and true personal posting is vanishingly infrequent).
Who is discovering the most new external resources?
What are the biggest search keywords and how do they relate to themes raised by the group?

From an operational perspective:

How can I share this code?
When should I share it?
What is the best approach for dealing with unreliable third party services, and how should I bulletproof against them?
I need to think of adding unit tests as this grows beyond a quick hacking experiment.

This has been long. Thanks for reading this far if you actually made it. I'm interested to hear your thoughts in comments.