Archive for June, 2008|Monthly archive page

BwanaNet and CLUVI (possible improvements and comparison)

When we were asked to do a final project for this subject, we started to think about doing something related to translation corpora. When we were told that translation could not be done, we decided to compare two different corpora: CLUVI and BwanaNet. This blog already has an article about CLUVI, so here I will introduce BwanaNet to the readers, then analyse how both can be improved, and finish with a brief comparison of the two.

What is BwanaNet?

BwanaNet is an interface developed at the IULA that allows users to query the Technical Corpus (CT) of the Institut over the Internet. The CT is indexed using the Corpus Workbench, a set of tools developed at the Institut für Maschinelle Sprachverarbeitung of the University of Stuttgart.

With BwanaNet, people can consult the CT-IULA documents. These are the steps to follow:

1. Select the language of the documents.
2. Select whether you want to run a monolingual or a multilingual query.
3. Select the documents.
4. Define the kind of query.
5. Define the query.
6. View the results.

One of the good points of this site is that you can choose the domain and the subdomains, in order to run a more specific, narrower search.

The problem with this corpus, and one of the main differences between it and CLUVI, is that once you search for what you have chosen, the results are quite different.

How can they be improved?

Looking at the two sites, our first impression was that the BwanaNet front page does not give clear instructions for searching. The main page has a brief description of what the corpus is based on and the history of its creation, and it offers a link to the IULA page that describes the corpus project. At first sight, the CLUVI site gives a clearer presentation of what it offers and a description of all the kinds of translations that the user can try.

BwanaNet can be visited in three languages: English, Spanish and Catalan. CLUVI offers only two: Galician and English. Both corpora offer very few interface languages, so only a limited number of people are able to use them. Both of them offer a description of the project and of the searches you can run. CLUVI has a better and clearer description of the information and of how to use it.

CLUVI offers a clear, listed description of all the different languages and translations that can be used, while the BwanaNet site does not describe the possibilities it offers in much depth.

Both CLUVI and BwanaNet offer good results: they cover different languages and different kinds of texts (original, translated, legal, literary, etc.). While CLUVI is easier to use, BwanaNet is more specific.

A brief comparison:

At the end of our research, what we find is that the two sites are quite different from each other. Let’s see some common aspects:

- They both offer multilingual searching.

- They both offer searches in very specific areas and contexts.

Now let’s see some differences:

- BwanaNet requires more time, more specific knowledge and more precise searching.

- CLUVI offers more information about the context and the place the information is taken from.

- BwanaNet is much more specific than CLUVI. It offers narrower searches and more options than CLUVI.

 

Bibliography:

Our Group’s Project
Bwana Net
CLUVI

 

Corpora – different English corpora we can find on the net

First of all, we have to understand what a corpus is. I am not going to repeat exactly what it is, because a long time ago I wrote a post about corpora and what they are. What I am going to do in this article is introduce the readers to some corpus pages that can be found on the net, from the British ones to the American ones. This will be a short walk among them, explaining the necessary concepts of each of them. Let’s see, then:

From the United Kingdom:

BRITISH NATIONAL CORPUS

The website presents the corpus as: “The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written”. There are two parts: the written one, which covers 90%, and the spoken one, which covers 10% of the whole corpus. While the written part includes extracts from regional and national newspapers, magazines and so on, the spoken one consists of orthographic transcriptions of unscripted informal conversations. The BNC is a monolingual, synchronic, general, sample corpus.

ASK OXFORD

The Ask Oxford corpus was created by the makers of the Oxford English Dictionary at Oxford University Press. As they themselves explain on their site: “The Oxford English Corpus is central to the process and to Oxford’s £35 million research programme – the largest language research programme in the world”. It also contains two billion words from literary resources, and the documents carry all kinds of information: genre, author, publication, and so on. Quite a complete corpus, we may say.

COLLINS

Collins WordbanksOnline is an online service for accessing language data based on the Collins corpora of modern written and spoken text. There are a lot of things you can do: access examples of vocabulary and grammar; check concordances; prepare handouts, worksheets and linguistic analyses; and check idioms and problems of every kind (grammar, vocabulary, false friends, etc.), synonyms, and variants between British English and American English.
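For readers who have never seen a concordance, here is a minimal sketch in Python of the idea: every occurrence of a word is listed together with a few words of context on each side. This is only an illustration and not how Collins WordbanksOnline or any other corpus site works internally; the function name `concordance`, the `width` parameter and the sample sentence are all made up for the example.

```python
import re

def concordance(text, keyword, width=3):
    """List each occurrence of `keyword` with up to `width` words of context per side."""
    # Very rough tokenisation: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Hypothetical sample text, not taken from any real corpus.
sample = "The corpus is large. A corpus can be spoken or written."
for line in concordance(sample, "corpus"):
    print(line)
```

A real corpus tool works on millions of words and usually lets you search by lemma or part of speech as well, but the output has the same shape: the search word in the middle, with its context on each side.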

From the USA:

AMERICAN NATIONAL CORPUS

The site describes itself as: “The American National Corpus (ANC) project is creating a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward”. Of course, as we can imagine, it is devoted to American English, since we know that British and American English are quite different, not only because of pronunciation or lexicon (among other things) but also because of the culture surrounding the language. This corpus is not as complete as the British one, because it is only in its early stages, but surely it will develop as much as the British one.

Well, there are many more corpora that we could analyse here, but I just wanted to show some of the most important ones, as they may be the ones we use in the future, since we are supposed to be English philologists as soon as we finish our studies.

Bibliography:

BNC (accessed June 22, 2008)
Ask Oxford (accessed June 22, 2008)
Collins (accessed June 22, 2008)
American National Corpus (accessed June 22, 2008)

 

del.icio.us

During the course we have been using a page called del.icio.us. It was presented to us as a place where we could save all the links to the sites we were interested in. You just had to log in (after creating an account) and start posting the links to save them. But, more precisely, what is del.icio.us?

Del.icio.us is a social bookmarking web service for storing, sharing, and discovering web bookmarks. The site was founded by Joshua Schachter in late 2003 and acquired by Yahoo! in 2005. It has more than three million users and 100 million bookmarked URLs.

After you post a link, the site asks you for some information about it: the description of the site (its name) and, more importantly, tags, a system also known as folksonomy. Each link has its own tags so that, afterwards, you can save them in specific bundles and find the links quickly. Del.icio.us also has a hotlist on the home page, as well as “popular” and “recent” pages, which help to make the website a conveyor of internet memes and trends.

The use of this site is absolutely free, and it has grown hugely. If you want to save your links, you just have to post them on del.icio.us and there you have them. Apart from that, the site also lets you see other people’s links and related links that may interest you when you post one. That is, after saving a site you can see other sites that are similar to the one you have just posted. This works because of the folksonomy tags, which, as you can see, are useful and quick.
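To make the idea of tags more concrete, here is a small sketch in Python of how related links could be found through shared tags. It is only a guess at the general idea and not the way del.icio.us itself is implemented; the example URLs, the tags and the function names `build_tag_index` and `related_links` are all invented for the illustration.

```python
from collections import defaultdict

# Hypothetical bookmarks: each URL is saved with a set of tags.
bookmarks = {
    "http://example.org/bnc": {"corpus", "english", "british"},
    "http://example.org/anc": {"corpus", "english", "american"},
    "http://example.org/recipes": {"cooking"},
}

def build_tag_index(bookmarks):
    """Map each tag to the set of URLs that carry it."""
    index = defaultdict(set)
    for url, tags in bookmarks.items():
        for tag in tags:
            index[tag].add(url)
    return index

def related_links(url, bookmarks):
    """Return other URLs sharing at least one tag with `url`, most shared tags first."""
    index = build_tag_index(bookmarks)
    shared = defaultdict(int)
    for tag in bookmarks[url]:
        for other in index[tag]:
            if other != url:
                shared[other] += 1
    # Sort by number of shared tags (descending), then alphabetically.
    return sorted(shared, key=lambda u: (-shared[u], u))

print(related_links("http://example.org/bnc", bookmarks))
```

In this toy example the two corpus pages share the tags "corpus" and "english", so each one shows up as related to the other, while the recipes page does not.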

The problem of having too many bookmarks saved on your computer is solved: del.icio.us is there for us, offering a service that is quite useful, both because you can save your links and because you can find many interesting pages, as I have myself. We have been using it for our lessons, but I’ll continue using it afterwards because it has so many good points and all the bookmarks I am interested in are there, saved and ready to be seen.
 

Bibliography:

http://es.wikipedia.org/wiki/Del.icio.us 

(Last retrieved, 22nd June 2008, at 11:51)
