Wiki Revisions History

I recently wanted some sample data for my NoSQL trials and decided to look for wiki metadata, more specifically the history of page revisions. I soon realized that many people have already built applications on this data and that an extensive API is available to get it.

I could build the data set by running through the API and fetching the revisions of every page. However, scraping isn't a good idea, and MediaWiki limits the results for that very reason.

For reference, the API is documented at http://www.mediawiki.org/wiki/API

For example, to find the revisions of a page named “Geography_of_Afghanistan”, we could use the following call…

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Geography_of_Afghanistan&rvprop=ids|timestamp|user&rvlimit=5000

The following call also returns the revision comments…

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Geography_of_Afghanistan&rvprop=ids|timestamp|user|comment&rvlimit=5000

Notice that although we set rvlimit to 5000, the results are limited to 500, and the response also includes the following message…

“rvlimit may not be over 500 (set to 5000) for users” 
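If you do script against the API, it helps to build the query URL in code and cap rvlimit yourself. The Java sketch below does that for the two calls above. It assumes that 500 is the per-request cap for regular users, as the warning suggests. RevisionQuery and its members are hypothetical names, not part of any MediaWiki client library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the revisions query shown above for any page title.
final class RevisionQuery {

    static final String API_URL = "http://en.wikipedia.org/w/api.php";
    // Assumption: 500 is the per-request cap for regular users, per the warning above.
    static final int MAX_RVLIMIT_FOR_USERS = 500;

    static String revisionsUrl(String title, int requestedLimit, boolean withComments) {
        // Clamp rvlimit ourselves instead of relying on the server to do it.
        int limit = Math.min(requestedLimit, MAX_RVLIMIT_FOR_USERS);
        String props = withComments ? "ids|timestamp|user|comment" : "ids|timestamp|user";
        return API_URL
                + "?action=query&prop=revisions"
                + "&titles=" + encode(title)
                + "&rvprop=" + encode(props)
                + "&rvlimit=" + limit;
    }

    // URL-encodes a query parameter; the pipe separators become %7C.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(revisionsUrl("Geography_of_Afghanistan", 5000, true));
    }
}
```

Running main prints the comments query from above. In that output, rvlimit is lowered to 500 and the pipes are percent-encoded as %7C, which the API accepts like literal pipes.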

To get the complete data set, MediaWiki provides data dumps that can be downloaded. The dumps are available at http://dumps.wikimedia.org/enwiki/20110317/

What we need is the meta history dump. Once I downloaded it, I realized that the schema for this data set had not been published. The latest XSD supplied by MediaWiki is at http://www.mediawiki.org/xml/export-0.4.xsd, but the downloaded dumps need export-0.5.xsd.

To work around this, I downloaded Trang, a tool that can generate an XSD from sample XML. Here is a good write-up to get an idea of how it works.

The export-0.5.xsd that Trang generated is included below. I hope it helps others until MediaWiki publishes the official XSD.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" targetNamespace="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:export-0.5="http://www.mediawiki.org/xml/export-0.5/">
  <xs:element name="mediawiki">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="export-0.5:siteinfo"/>
        <xs:element maxOccurs="unbounded" ref="export-0.5:page"/>
      </xs:sequence>
      <xs:attribute name="version" use="required" type="xs:decimal"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="siteinfo">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="export-0.5:sitename"/>
        <xs:element ref="export-0.5:base"/>
        <xs:element ref="export-0.5:generator"/>
        <xs:element ref="export-0.5:case"/>
        <xs:element ref="export-0.5:namespaces"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="sitename" type="xs:NCName"/>
  <xs:element name="base" type="xs:anyURI"/>
  <xs:element name="generator" type="xs:string"/>
  <xs:element name="case" type="xs:NCName"/>
  <xs:element name="namespaces">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="export-0.5:namespace"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="namespace">
    <xs:complexType mixed="true">
      <xs:attribute name="case" use="required" type="xs:NCName"/>
      <xs:attribute name="key" use="required" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="page">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="export-0.5:title"/>
        <xs:element ref="export-0.5:id"/>
        <xs:element minOccurs="0" ref="export-0.5:redirect"/>
        <xs:element minOccurs="0" ref="export-0.5:restrictions"/>
        <xs:element maxOccurs="unbounded" ref="export-0.5:revision"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="title" type="xs:string"/>
  <xs:element name="redirect">
    <xs:complexType/>
  </xs:element>
  <xs:element name="restrictions" type="xs:string"/>
  <xs:element name="revision">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="export-0.5:id"/>
        <xs:element ref="export-0.5:timestamp"/>
        <xs:element ref="export-0.5:contributor"/>
        <xs:element minOccurs="0" ref="export-0.5:minor"/>
        <xs:element minOccurs="0" ref="export-0.5:comment"/>
        <xs:element ref="export-0.5:text"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="timestamp" type="xs:NMTOKEN"/>
  <xs:element name="contributor">
    <xs:complexType>
      <xs:choice minOccurs="0">
        <xs:element ref="export-0.5:ip"/>
        <xs:sequence>
          <xs:element ref="export-0.5:username"/>
          <xs:element ref="export-0.5:id"/>
        </xs:sequence>
      </xs:choice>
      <xs:attribute name="deleted" type="xs:NCName"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="ip" type="xs:string"/>
  <xs:element name="username" type="xs:string"/>
  <xs:element name="minor">
    <xs:complexType/>
  </xs:element>
  <xs:element name="comment">
    <xs:complexType mixed="true">
      <xs:attribute name="deleted" type="xs:NCName"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="text">
    <xs:complexType>
      <xs:attribute name="bytes"/>
      <xs:attribute name="deleted" type="xs:NCName"/>
      <xs:attribute name="id" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="id" type="xs:integer"/>
</xs:schema>
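To check a downloaded dump against this schema, the JDK's javax.xml.validation API is enough. The sketch below is a minimal illustration rather than part of the original workflow, and DumpValidator is a hypothetical name. It assumes the dump has already been decompressed, because the files on the dump page are compressed. The validator reads the input as a stream, so memory use should stay modest, but a full meta-history dump will still take a long time.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper for checking a dump against the generated export-0.5.xsd.
final class DumpValidator {

    /**
     * Validates an XML document against an XSD.
     * Returns null when the document is valid, otherwise the first error message
     * (problems in the schema itself are reported the same way).
     */
    static String validate(Source schemaSource, Source xmlSource) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory.newSchema(schemaSource).newValidator();
            // With no ErrorHandler set, the validator throws on the first error.
            validator.validate(xmlSource);
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java DumpValidator export-0.5.xsd dump.xml
        String error = validate(new StreamSource(new File(args[0])),
                                new StreamSource(new File(args[1])));
        System.out.println(error == null ? "valid" : "invalid: " + error);
    }
}
```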

Other Notes from migration to Spring 3

April 4, 2011

Read my previous post to see how I got here…

Here are some other things I tripped over when moving the project to Spring 3.0:

If your project throws the following exception…

java.lang.ClassNotFoundException: org.springframework.web.struts.ContextLoaderPlugIn

… you are probably using Struts 1.1 with Spring's Struts integration but are not including Spring's Struts module.
Include this in your Maven dependencies and you should be good to go.

<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-struts</artifactId>
  <version>3.0.0.RELEASE</version>
</dependency>

Struts 1.x support was removed in Spring 3.0 but was later reintroduced in deprecated form. Read more about it here.

If your project throws the following exception…

java.lang.ClassNotFoundException: org.springframework.web.context.ContextLoaderServlet

… your web.xml still declares ContextLoaderServlet. According to this entry, that servlet was aimed at Servlet 2.3 (or older) containers, and it is no longer part of Spring 3.0.

Use ContextLoaderListener instead of ContextLoaderServlet.

Remove the ContextLoaderServlet entry from your web.xml and add the following listener…

<listener>
  <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
</listener>

Maven dependency tree

April 4, 2011

I was recently moving our project from Spring 2.5.6 to Spring 3.0 when I ran into a ClassNotFoundException. The jar containing the class was present in the WAR, so everything pointed to a jar conflict.
I searched in Eclipse but could not find a reference to any other version of Spring. The following command made my life easy…

mvn dependency:tree -Dverbose

Details can be found here.

It printed the dependencies in a nice tree format, and the culprit was found: a Spring 2.0.7 jar pulled in by a module that I had not imported into Eclipse.

[INFO] |  |  +- com.ostermiller:ostermillerutils:jar:20041102:compile
[INFO] |  |  +- (log4j:log4j:jar:1.2.13:compile - omitted for duplicate)
[INFO] |  |  +- (commons-collections:commons-collections:jar:3.2:compile - omitted for duplicate)
[INFO] |  |  +- org.springframework:spring:jar:2.0.7:compile
[INFO] |  |  |  \- (commons-logging:commons-logging:jar:1.1:compile - omitted for conflict with 1.0.4)
[INFO] |  |  +- (org.hibernate:hibernate:jar:3.2.1.ga:compile - omitted for conflict with 3.2.6.ga)
[INFO] |  |  \- net.sourceforge.jtds:jtds:jar:1.2:runtime
[INFO] |  +- (log4j:log4j:jar:1.2.13:compile - omitted for duplicate)
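Scanning a long tree by eye is tedious. The Java sketch below reads saved dependency:tree output and groups the org.springframework entries by version, so a stray 2.0.7 next to the 3.0 artifacts stands out. It is a hypothetical helper, not part of the Maven plugin, and it assumes the groupId:artifactId:jar:version:scope line format shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for spotting multiple versions of one groupId in dependency:tree output.
final class DependencyTreeScanner {

    // Matches "groupId:artifactId:jar:version:scope" anywhere in a tree line.
    // Lines with classifiers or non-jar packaging are not handled by this sketch.
    private static final Pattern JAR_COORDINATES =
            Pattern.compile("([\\w.\\-]+):([\\w.\\-]+):jar:([\\w.\\-]+):(\\w+)");

    /**
     * Maps every version of the given groupId found in the tree output to the
     * artifactIds using it. More than one key means more than one version is in play.
     */
    static Map<String, Set<String>> versionsOf(Iterable<String> treeLines, String groupId) {
        Map<String, Set<String>> artifactsByVersion = new TreeMap<>();
        for (String line : treeLines) {
            Matcher m = JAR_COORDINATES.matcher(line);
            if (m.find() && m.group(1).equals(groupId)) {
                artifactsByVersion
                        .computeIfAbsent(m.group(3), v -> new TreeSet<>())
                        .add(m.group(2));
            }
        }
        return artifactsByVersion;
    }

    public static void main(String[] args) throws IOException {
        // Usage: mvn dependency:tree -Dverbose > tree.txt, then pass tree.txt as args[0]
        Map<String, Set<String>> versions =
                versionsOf(Files.readAllLines(Paths.get(args[0])), "org.springframework");
        if (versions.size() > 1) {
            System.out.println("Conflicting Spring versions: " + versions);
        }
    }
}
```

Grouping by groupId rather than by artifactId matters in this case. Spring 2.0.x shipped as a single spring jar, while Spring 3.0 is split into modules such as spring-core. The conflicting jars therefore have different artifactIds.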

Hope this helps someone who runs into a similar problem.


Categories: Eclipse, Java, Maven, Spring

Dynamic Forms with Map-backed ActionForms in Struts 1

September 14, 2010

A project that I work on uses Struts 1, and I needed a page that could hold any number of text fields, with more fields added at the click of a button. The idea was to have pairs of name/value text boxes whose content would be saved in a properties table in the database when the form was submitted. After some blind Google searches I came across "Map-backed" and "List-backed" properties in Struts. This write-up summarizes how to use Map-backed properties to achieve that goal. It does not show how to save the values in the database, however; it just logs the values received on the server.

As a first step, we create a blank Struts application with Maven. Look at this article to create a blank application.
The next step is to create the page that holds our dynamic form. Here is the page (dynamicTextEntry.jsp)…

<%@ taglib uri="http://struts.apache.org/tags-bean" prefix="bean" %>
<%@ taglib uri="http://struts.apache.org/tags-html" prefix="html" %>
<%@ taglib uri="http://struts.apache.org/tags-logic" prefix="logic" %>

<html:html>
<head>
<script type="text/javascript">
	function add(){
		var element1 = document.createElement("input");
		var element2 = document.createElement("input");
		var elementCount = parseInt(document.getElementById("count").value, 10);

		element1.setAttribute("type","text");
		element1.setAttribute("value","Enter Name");
		element1.setAttribute("name","value(name"+ elementCount +")");

		element2.setAttribute("type","text");
		element2.setAttribute("value","Enter Value");
		element2.setAttribute("name","value(value"+ elementCount +")");

		var spanBody = document.getElementById("textBoxes");
		spanBody.appendChild(element1);
		spanBody.appendChild(element2);
		var breakElement = document.createElement('br');
		spanBody.appendChild(breakElement);
		
		document.getElementById("count").value = elementCount + 1;
	}
</script>
<title>Dynamic Entry Form</title>
<html:base/>
</head>
<body bgcolor="white">

<html:form action="setDynamicTextEntry.do" >
<html:hidden property="count" styleId="count" value="0"/>
<input type="button" value="Add Field (+)" onclick="add()"/>
<br/>
<span id="textBoxes"></span>

<html:submit property="submit"> Submit</html:submit>

</html:form>
</body>
</html:html>

We create two actions in struts-config.xml as follows (the action class is shown later)…

<action
    path="/dynamicInputs"
    forward="/pages/dynamicTextEntry.jsp"/>

<action
    path="/setDynamicTextEntry"
    type="com.wordpress.codesilo.controller.SetDynamicTextEntryAction"
    name="dyamicTextEntryForm"
    scope="request"
    validate="true"
    input="/pages/dynamicTextEntry.jsp">
    <forward name="success" path="/pages/success.jsp"/>
</action>

We also add the form bean to struts-config.xml…

<form-bean name="dyamicTextEntryForm" type="com.wordpress.codesilo.model.DyamicTextEntryForm"></form-bean>

The action class is as follows…

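package com.wordpress.codesilo.controller;

import java.util.Map;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

import com.wordpress.codesilo.model.DyamicTextEntryForm;

public class SetDynamicTextEntryAction extends Action {

	@Override
	public ActionForward execute(ActionMapping mapping, ActionForm form,
			HttpServletRequest request, HttpServletResponse response)
			throws Exception {

		DyamicTextEntryForm dynform = (DyamicTextEntryForm) form;
		// Struts has already called setValue(key, value) once per "value(...)" field,
		// so the map holds name0, value0, name1, value1, ...
		Map<?, ?> dynformValues = dynform.getValues();
		int count = dynformValues.size() / 2;

		System.out.println("Map Size: " + dynformValues.size());
		for (int i = 0; i < count; i++) {
			String name = (String) dynformValues.get("name" + i);
			String value = (String) dynformValues.get("value" + i);
			System.out.println("Name:" + name + " Value:" + value);
		}
		return mapping.findForward("success");
	}
}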
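The action derives the number of pairs from dynformValues.size()/2, which assumes that every name has a matching value. The hypothetical helper sketched below is a slightly more defensive variant. It walks the indices up to the form's count property, which is filled in from the hidden count field in the JSP, and it skips incomplete or blank pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for SetDynamicTextEntryAction; not part of Struts.
final class NameValuePairs {

    /**
     * Pairs "name<i>" with "value<i>" for every index below count, in index order.
     * Pairs with a missing part or a blank name are skipped.
     */
    static Map<String, String> pair(Map<?, ?> formValues, int count) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            Object name = formValues.get("name" + i);
            Object value = formValues.get("value" + i);
            if (name == null || value == null || name.toString().trim().isEmpty()) {
                continue;
            }
            // A repeated name keeps the value of its last occurrence.
            pairs.put(name.toString().trim(), value.toString());
        }
        return pairs;
    }
}
```

In the action, this could be called as NameValuePairs.pair(dynform.getValues(), dynform.getCount()).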

The form is as follows…

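package com.wordpress.codesilo.model;

import java.util.HashMap;
import java.util.Map;

import org.apache.struts.action.ActionForm;

public class DyamicTextEntryForm extends ActionForm {

	// Backing store for the mapped property "value": a request parameter
	// named value(name0) ends up here under the key "name0".
	private final Map<String, Object> values = new HashMap<String, Object>();
	// Populated from the hidden "count" field on the page.
	private int count;

	public int getCount() {
		return count;
	}

	public void setCount(int count) {
		this.count = count;
	}

	public Map<String, Object> getValues() {
		return values;
	}

	// Mapped-property setter that Struts calls for each value(key) field.
	public void setValue(String key, Object value) {
		values.put(key, value);
	}

	public Object getValue(String key) {
		return values.get(key);
	}
}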

Add a success page to forward to once the form is submitted, and we are ready to try out the code.
Every time you click the Add Field (+) button, a new pair of text boxes is added.
When we add or change text in the boxes and click Submit, the entered values are printed to the console.


Explanation:
The JavaScript in the JSP adds text boxes to the existing DOM in pairs. Once rendered, the added elements look like the following:

<input type="text" value="Enter Name" name="value(name0)">
<input type="text" value="Enter Value" name="value(value0)">

Every time the button is clicked, the count is incremented and new text boxes are added. So, on the second click the following elements are added…

<input type="text" value="Enter Name" name="value(name1)">
<input type="text" value="Enter Value" name="value(value1)">

Once the form is submitted, Struts calls setValue(String key, Object value) on the form for each of these text boxes. The key is the name inside the parentheses (name0, value0, name1, value1, etc.), and the value is the text from the box.
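The naming convention is what makes this work. The text before the parentheses names the mapped property, and the text inside them is the map key. Struts delegates the real parsing to Commons BeanUtils. The sketch below only illustrates the convention and is not the library's code; MappedPropertyName and its methods are hypothetical.

```java
import java.util.AbstractMap;
import java.util.Map;

// Illustration only: Commons BeanUtils does the real work inside Struts.
final class MappedPropertyName {

    /**
     * Splits a parameter name such as "value(name0)" into the mapped property
     * ("value") and the key ("name0"). Returns null for plain names like "count".
     */
    static Map.Entry<String, String> parse(String parameter) {
        int open = parameter.indexOf('(');
        int close = parameter.length() - 1;
        if (open <= 0 || parameter.charAt(close) != ')') {
            return null;
        }
        return new AbstractMap.SimpleImmutableEntry<>(
                parameter.substring(0, open), parameter.substring(open + 1, close));
    }

    /** JavaBeans-style setter name for a property, e.g. "value" becomes "setValue". */
    static String setterFor(String property) {
        return "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }
}
```

For value(name0), parse yields the property value and the key name0. setterFor("value") gives setValue, which matches the setValue(String key, Object value) method on DyamicTextEntryForm.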

The rest is self-explanatory. 🙂

References:
http://struts.apache.org/1.x/userGuide/building_controller.html
http://www.manning-sandbox.com/message.jspa?messageID=26953

Ruby IDE : Redcar

July 30, 2010

It has been some time since I started using Ruby and RoR, and I have always felt a void when it comes to a good IDE (I use Windows and Ubuntu). I have tried RadRails, Komodo and the e editor, to name a few. Last year I tried Redcar and liked it more than all the others. It had problems (and it was not supported on Windows), but there was a lot of active development going on and it looked promising. For the next few months I barely got back to Ruby and never looked for an IDE. Last night I happened to search for it again and take a look at Redcar. I was amazed at the changes: it had almost everything I needed, and there had been development to support it on Windows as well. The installation was as simple as running two commands…

gem install redcar

redcar install

I created a desktop shortcut that calls the command through a cmd script, and I used one of Redcar's PNG images (<ruby-home>\lib\ruby\gems\1.8\gems\redcar-0.3.8.4\plugins\application\icons) as the icon. That is all the work I needed to do. Best of all, it is released under the GPL. (Nothing better.)

I decided to write this post to help Redcar reach a wider audience. Apple users might call it a rip-off or a clone of TextMate, but I think Daniel Lucraft has done an excellent job.

Categories: IDE, Ruby

AJAX using jQuery

July 12, 2010

We follow the same steps as in the previous blog entry, except that this time we use a different JavaScript framework. The only thing that changes is the JavaScript code.

The JavaScript function now changes to…


$.ajax({
  url: "../upperCaseSubmit.do",
  type: "GET",
  data: $('#textForm').serialize(),
  cache: false,
  success: function (response) {
    $('#result').html(response).fadeIn('fast').fadeOut(2000);
  },
  error: function () {
    alert("Error in processing the request");
  }
});

Note: I have included some animation (fadeIn and fadeOut) that is not in the code I created with Prototype.
Ajax.Responders in Prototype can be replaced by jQuery's Ajax events.

Basic AJAX using Prototype

July 12, 2010

We start by creating a basic Maven project with Struts. The idea is to accept text in an HTML text box, make a round trip to the server, capitalize the text (upper-case it) and send it back as the response to the client. Our action class, UpperCaseAction, has the following code in its execute method. The UpperCaseForm has just one String attribute, text.

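// Inside UpperCaseAction.execute(); needs "import java.io.PrintWriter;"
UpperCaseForm upperCaseForm = (UpperCaseForm) form;
String text = upperCaseForm.getText();
response.setContentType("text/plain;charset=UTF-8");
PrintWriter pw = response.getWriter();
// Write the upper-cased text directly into the response body.
pw.write(text == null ? "" : text.toUpperCase());
pw.close();
// Returning null tells Struts the response is complete, so nothing is forwarded.
return null;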


We create a JSP with a simple text box and a button. The onclick event on the button fires a JavaScript function that makes the Ajax call. The following code uses prototype.js to make that call…

new Ajax.Request('../upperCaseSubmit.do', {
  method: 'get',
  parameters: $('textForm').serialize(true),
  onSuccess: function(response) {
    $('result').innerHTML = response.responseText;
  },
  onFailure: function() {
    alert('Error in processing request ...');
  }
});


The code is self-explanatory. It calls the upperCaseSubmit action using ‘GET’. One parameter is sent in the request: the text field. If the call succeeds, the onSuccess function runs and replaces the innerHTML of a div with the upper-cased text we received back.
If the call fails, an alert message is displayed. serialize(true) collects the form fields into a hash of name/value pairs, which Ajax.Request encodes as the request parameters.
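On the wire, the serialized field ends up as an ordinary query string on the GET request. As a rough model of what the action receives, the Java sketch below produces the application/x-www-form-urlencoded form of a field map. FormEncoding is a hypothetical name, and the sketch assumes the field is named text to match UpperCaseForm. Browsers and Prototype may encode a space as %20 rather than +, but the servlet container decodes both to a space, so request.getParameter("text") sees the same value either way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical illustration of what a serialized form looks like on the wire.
final class FormEncoding {

    /** Encodes fields as application/x-www-form-urlencoded "name=value" pairs joined by '&'. */
    static String encode(Map<String, String> fields) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encodeComponent(field.getKey()))
                 .append('=')
                 .append(encodeComponent(field.getValue()));
        }
        return query.toString();
    }

    // URLEncoder writes a space as '+' and reserved characters such as '&' as %XX.
    private static String encodeComponent(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

For a text field containing "hello world", encode returns text=hello+world. The request would then be roughly ../upperCaseSubmit.do?text=hello+world.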

Prototype has some listener functionality that we can use to know when an AJAX call has been placed.
Using the following code, we can show a loading image while at least one AJAX call is pending.

Ajax.Responders.register({
  onCreate: function() {
    $('loading').show();
  },
  onComplete: function() {
    if (0 == Ajax.activeRequestCount) {
      $('loading').hide();
    }
  }
});


We can easily create a loading GIF to use in the image tag at http://ajaxload.info/