PAFIS-2002-SFIS

SFIS 2002 Workshop 2: Data mining

Introduction

The purpose of this workshop is for you to learn about data mining applications in different businesses. The deliverable is a hypertext project paper, or essay, written according to the instructions given here.

Please choose one of the following subjects, according to personal preference, and write a hypertext project paper on it. Include links to relevant resources on the web, illustrate with diagrams as appropriate, etc. You should aim at a paper of around 2,000-3,000 words (equivalent to about 6-10 pages in standard seminar format).

As you can see, the subjects are, in effect, different industries or contexts where data mining can be, and has been, applied. Your paper should deal with the opportunities for data mining - what questions could be asked, and answered, in this industry or context, given the type of data that can be collected and made available? Which tools, algorithms, and methods might be more useful, and which ones less? What has, in fact, been done? What could be done in the future? To answer these questions you will have to turn to sources outside the course literature.

Your primary sources would be the Hanken library databases - the full text databases are listed at the end of the page - and the web; a Google search will quickly turn up the main portals and collections. Of course, the Han & Kamber textbook, and other textbooks, might help, too. In case you run into insurmountable trouble, or just get stuck in a less dramatic manner, please feel free to drop in on Oana or Anders to get advice.

Please remember, whatever subject you choose, to provide annotated links to useful sources of further information on the web! But also, please note carefully, right at the outset, that you are expected to write your project paper yourselves – individually, or in groups of two. Copying and pasting material into your pages, except in the case of limited and carefully marked and attributed quotes according to good scientific practice, is absolutely not allowed. Also, generally speaking, copyright rules will almost always prevent you from making copies of interesting papers available from your own web site, so please do not do this.

Deliverable

The deliverable for this workshop is a hypertext project paper, as specified above.

Deadline

The main deadline for this workshop is in six weeks, i.e., Wednesday, April 2nd, 2003. Please try to make it! The workshop submissions will be graded according to the state they are in as of that date. However, for people with serious timetable troubles, there is a second opportunity. The workshops will be graded again on May 30th, 2003, but according to a stricter scale. As a rough guide, you can expect that the same submission would, in May, get only 50-70% of the points that it would have got at the first deadline. In effect, you will earn more points with less effort by making the first deadline. If that is not possible, your workshop need not be a total loss, because there is that second chance.

Grading

Your submission will be graded on a scale of 0 to 10 points, according to how well it is done. Graphics, layout etc. will not really count, provided that your paper is easily readable. The number and quality of sources, the quality of the analysis and exposition, and the degree of original (but valid!) thinking really will count.

You are free - and strongly encouraged - to complete and submit the deliverable in groups of two people. Just make sure that you both link to your submission from your respective home pages, even though you need only prepare one set of files.

We remind you, as we like to keep doing, about the ground rules regarding cooperation and collaboration: It is perfectly all right to share ideas, discuss, and plan together, have a look at how other people are doing things, and so on and so forth. All of these are, in fact, very good ideas indeed. But it is not all right to copy and paste - we do expect that every submission is prepared by the one or two people submitting it. Different groups must not submit different versions of the same file(s) - you are supposed to write them separately. If you have any questions whatsoever about this policy, please ask the faculty directly. For the record: people who copy other people's efforts and submit them as their own will not receive any points, but instead a public warning.

Questions?

If you need help, please try to cooperate – ask your colleagues, help your colleagues! The easiest way to ask questions is probably by e-mail: oana.g.velcu@shh.fi. Good luck!

http://www.pafis.shh.fi/

info@pafis.shh.fi