For a while now, I've been trying to understand what role item descriptions found in the XML feed of your blog should play. Should they contain the full body of your post? A part of it? If so, how long should this part be?
An obvious (though possibly oversimplistic) idea is that the description should be sufficient to let an average reader decide whether to read the full post. So, if I'm using an RSS aggregator but value my time enormously, this is what I might want.
A tougher question is, should the description be sufficient so as to let an average classifier successfully classify the entry? I have this feeling that the answer should be "yes" to this question also. It is all about automatic processing. And today, you can pretty much forget about having a bot clicking through to the "actual" page where the entry resides: parsing some of these pages is near impossible. Try Wired, for instance...
Anyone care to comment?
By the way, John Ko has the slides from our JavaOne presentation posted on his blog. The demo code should be there shortly as well.
Another interesting sentiment that I recently heard is that programming Web sites in Java is hard because it involves threading, and threads are hard. I would like to know more about the roots of this argument, but here are a few obvious things to consider:
(a) If you program in JSPs or Servlets as you are expected to, you have no reason to even worry about threading (so the fact that JSPs and Servlets are multithreaded adds no complexity to the application). Recall that the code inside a JSP gets mapped to a method body when the page is compiled into Java code. By default, this means your variables are local to that method, with no static class variables. You can, of course, have static blocks in JSPs, but you need to be an experienced JSP developer to know how to do that (which hopefully ensures that you know what you're doing). With Servlets, it's a bit easier to create static class variables, so you need to pay more attention. But it's not like you have to manage the entire life cycle of the threads; the web container takes care of it for you. So, I don't really see how servlet or JSP applications get more complex because of threading.
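To make point (a) concrete, here is a minimal sketch in plain Java, with no servlet API involved. The class `RequestHandlerSketch` and its `handle` method are hypothetical stand-ins for a servlet and its service method, and the fixed thread pool stands in for the web container. Everything declared inside `handle` is local to the calling thread; only the one field shared across requests needs any care.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a servlet; not the real javax.servlet API.
class RequestHandlerSketch {
    // Shared by every request thread, so it must be thread-safe.
    // A plain int incremented with ++ here would be a data race.
    private final AtomicInteger hits = new AtomicInteger();

    // Stand-in for a servlet's service method.
    String handle(String user) {
        // Local variable: each calling thread gets its own copy on its own stack.
        StringBuilder page = new StringBuilder();
        page.append("Hello, ").append(user);
        hits.incrementAndGet();
        return page.toString();
    }

    int hits() {
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RequestHandlerSketch handler = new RequestHandlerSketch();
        // The pool plays the role of the container's request threads.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            final String user = "user" + i;
            pool.execute(() -> handler.handle(user));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("hits counted: " + handler.hits());
    }
}
```

The handler code itself never creates, starts, or joins a thread; the only threading decision is what to do about state that outlives a single request.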
(b) Struts is a bit more complex, true, but you don't really have to use Struts, especially if your focus is on performance. I am not saying that Struts is a bad idea, of course. I am just saying that the framework carries additional complexity with it, and it therefore expects a higher level of skill from its user.
(c) There's this related argument that Java supposedly scales through threading, that it pushes you to use one gigantic JVM and so on. Again, I don't fully understand the argument here. The size of the thread pool that you use in your web container (Tomcat, Resin, whatever) is a configurable parameter. So from the thread perspective, your appserver JVM has just as many threads as you want it to have. If, for whatever reason, having just one JVM serving requests out of the physical box is not optimal (which is usually caused by poor quality coding, by the way), it is very easy to vertically clone appservers. This gives you multiple logical appservers (or JVMs) on the same physical box, is supported by practically all appservers, and does not cause problems in most cases. Someone mentioned inter-JVM communication. Vertical cloning typically does not require inter-JVM communication because of session affinity, which again is widely supported today. You may wish to have this communication when you are maintaining a large object cache (which you probably should be doing if you are Friendster), but you'd also want it anyway with horizontal cloning.
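As a rough illustration of the "configurable parameter" point in (c), the sketch below uses a `java.util.concurrent` fixed thread pool as a stand-in for a web container's request pool. Real containers set this in their own configuration files; the class `PoolSizeSketch` and the method `distinctWorkers` are hypothetical names used only for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the pool stands in for a container's request thread pool.
class PoolSizeSketch {
    // Runs `requests` trivial tasks on a pool of `poolSize` threads and
    // returns how many distinct worker threads actually ran them.
    static int distinctWorkers(int poolSize, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workers = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> workers.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return workers.size();
    }

    public static void main(String[] args) {
        // However many requests arrive, the thread count never exceeds the setting.
        System.out.println("workers used: " + distinctWorkers(4, 100));
    }
}
```

The number of threads in the JVM is bounded by the number you pass in, not by the number of incoming requests.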
Anyway, I would appreciate it if someone could elaborate on what they meant with regard to these threading issues. Clearly, Java threads are a lot easier to code to than, say, pthreads. And of course, having a thread pool is faster than forking a new process each time. If you want to beat Java threads, you need to code with pools of pthreads, and I suspect this is what you will have to do sooner or later in order to support a PHP tier for a site like Friendster. Isn't this what Yahoo! had to do for their middle tier?
So, according to this article, Friendster is now powered by PHP technology. There appears to be a lively discussion on the subject, which has taken on somewhat religious overtones...
Jeff Moore has a better overview in his article, which includes some links to the use of PHP at Yahoo! Interesting stuff.
Fairly recently, I was the lead architect for IBM's Intranet portal, which has about 200,000 registered users, and about 100,000 dynamic hits per day. We did that site with JSPs; it can easily run on a single quad 1.5 GHz box. It actually can handle up to about 20 hits/s/server and could be further tuned to well over 100.
While it is possible that we are dealing here with some kind of nosebleed-territory traffic that JSPs can't handle, I don't think it is likely. Of course, one could ultimately resort to C, but we've been down that route and we know where it leads and what its advantages and disadvantages are.
I think the question is one of architecting for performance. You can do that with JSPs and Java, and you can do it with PHP and/or C++ or C. When you have a complex dynamic site that is likely to be adding features, you have to factor the additional considerations of extensibility, maintainability, and flexibility into your architecture. Further, a three-tiered system with smart object caches can almost always outperform a two-tiered one, where business logic is pushed into the database. I think eventually the discussion boils down to whether Java is the most convenient and powerful language to design and write your system in. And if it's not, then many of us are in big trouble.
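For readers wondering what a middle-tier object cache might look like at its very simplest, here is a small least-recently-used cache built on `java.util.LinkedHashMap`. This is only a sketch of one common technique, not the design used on any site mentioned here; the class name `LruCacheSketch` is hypothetical. It is not thread-safe as written, so in a multithreaded appserver it would need to be wrapped (for example with `Collections.synchronizedMap`) or otherwise guarded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded LRU object cache; not thread-safe on its own.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("a", "profile A");
        cache.put("b", "profile B");
        cache.get("a");              // touch "a" so "b" becomes the eldest entry
        cache.put("c", "profile C"); // exceeds capacity, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Objects served from a cache like this never touch the database, which is where a three-tiered design gets most of its headroom.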
Probably the single most surprising thing about the recent SuperNova conference for me was the number of Mac laptops there. Not everyone had them, but it seemed that almost all the "cool" people did: Ross Mayfield, Judith Meskill, Valdis Krebs... So I thought, well, my T40 is all black and it doesn't have an apple that glows in the dark on its cover, but it has 2 gigs of memory... From the purely virtual reality standpoint, my Thinkpad is adequate. See, if all I care about is what my screen looks like and what this laptop allows me to do, like how fast it is, etc., then the Thinkpad works just fine.
I suppose the Apple laptops tell us, well, you can't actually discard the "real" reality, you geeks, so it does matter what your laptop looks like even when it's not turned on. It's about augmented reality, not the virtual one.
Yet, there's still a gap between, say, personal jewelry and an iBook. Diamonds are forever. In contrast, you'll own the same iBook for maybe a couple of years. Maybe a company like Movado would be able to figure this out. Maybe we should start embedding our laptops in some kind of jewelry casings, so you could just take any laptop and fit it inside?