2009-06-25

Roadrunner supercomputer remains on top

The list of the top 500 supercomputers was updated and released this week at www.top500.org.

Roadrunner remains on top and Cray's Jaguar retained the number two spot.

Both of these systems are powered by AMD Opteron processors; Roadrunner pairs them with IBM Cell processors.


Roadrunner reaches 1.105 petaflops and Jaguar 1.059 petaflops on the Linpack benchmark used to rank the list; one petaflop is 10^15 floating-point operations per second.

Some key statistics include the following:

  • Processor Architecture. Intel and AMD processors power 442 of the listed systems, 399 Intel plus 43 AMD, or 88%.
  • Operating System. Linux runs on 443 systems, or 89%; Unix on 4% and MS Windows on 1%.
  • Vendors. For the first time Hewlett-Packard (HP) surpassed IBM in the number of installed systems: HP has 212, or 42%, and IBM 188, or 38%.

2008-09-02

Does the Internet need another browser?

Do we need another browser?

Yes: the Internet needs a well-defined standard and a reference implementation that everyone adopts and contributes to.

The browser, possibly the most used software of all time, needs a standard that covers the many and varied functions it has taken on since the early HTML-rendering days.

Today Google is introducing more than a software product: a proposed common platform, reference design and implementation on which to collectively build and evolve a modern browser.

Key properties of Chrome include the following:

  • WebKit. It uses a well-known and well-tested web rendering engine, WebKit, used by Safari, Nokia, Google's Android and several others.
  • New JavaScript engine, V8. This is a development in its own right: a new JavaScript engine developed by Google's group in Denmark. Reportedly, benchmarks show it to be the fastest JavaScript engine so far; with it comes a browser redesign that enables multi-process and multi-threaded operation for the browser's tabs.
  • Task isolation. Tabs and plug-ins run in their own processes. One tends to have tens of tabs open; in a single-process browser, one crashing tab generally brings down the entire application, whereas here only the affected tab's process fails. Improved security is also a welcome byproduct of this design.
  • Offline mode built in. Gears is built in, enabling offline operation for applications that use this capability; Gmail, Docs, etc. are likely to be the first applications to use it.
  • Open source. Firefox, Safari, Opera and, yes, IE can incorporate some or all of the technology proposed by Google; everyone wins: a standard is needed and one is available for test and development.

Whether or not other browsers adopt Chrome's technology, Google has the cloud and now the cloud's client: a formidable software combination through which to deliver services independent of platform and form factor, as Chrome shows up on MS Windows, Macs, Linux, Android and other wired and, increasingly, wireless devices.

One can easily imagine an upcoming Chrome netbook, named cbook, gbook, Android-book or any other name likely starting with g, running, you guessed it, Chrome as the graphical user interface and Android as the OS.

References:
  • Google's blog entry re subject is found here.
  • Chrome's comic doc-book.
  • Nicholas Carr has written a blog-entry defining it as the Cloud's Chrome lining.

2008-07-19

parse-cmd: A simple command-line Java parser

Wikipedia defines parsing as follows:

"In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a sequence of tokens to determine grammatical structure with respect to a given (more or less) formal grammar.

A parser is thus one of the components in an interpreter or compiler, where it captures the implied hierarchy of the input text and transforms it into a form suitable for further processing (often some kind of parse tree, abstract syntax tree or other hierarchical structure) and normally checks for syntax errors at the same time."

When writing a Java console application, or an application responding to an input stream, without a formal parser generator such as ANTLR (ANother Tool for Language Recognition), one ends up writing custom code every time; a generic approach to defining and parsing a few application-specific commands is needed.


A quick search on Google turns up several tools for handling command-line options, but I generally found these solutions complex. The need for a generic and simple parser was the beginning of parse-cmd.

Should you need a simple, less than formal, one-class Java command-line parser, take a look at parse-cmd.

An equivalent command-line parser implemented in Scala is available here.
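
The kind of small, hand-rolled parsing described above can be sketched in a few lines of Scala. This is a minimal sketch under stated assumptions, not the parse-cmd API: the object name SimpleCmdParser, the "--name value" argument syntax and the `required` set of option names are all hypothetical choices made for illustration.

```scala
// Hypothetical sketch, not the parse-cmd API: parses "--name value" pairs
// into a Map and checks that the required option names are present.
object SimpleCmdParser {

  def parse(args: Seq[String], required: Set[String]): Either[String, Map[String, String]] = {
    // Walk the argument list two tokens at a time.
    @annotation.tailrec
    def loop(rest: List[String], acc: Map[String, String]): Either[String, Map[String, String]] =
      rest match {
        case Nil => Right(acc)
        case name :: value :: tail if name.startsWith("--") && !value.startsWith("--") =>
          loop(tail, acc + (name.drop(2) -> value))
        case token :: _ =>
          Left(s"expected '--name value' at: $token")
      }

    // After parsing, report any required option that was not supplied.
    loop(args.toList, Map.empty).flatMap { opts =>
      val missing = required -- opts.keySet
      if (missing.isEmpty) Right(opts)
      else Left("missing required option(s): " + missing.toSeq.sorted.mkString(", "))
    }
  }

  def main(args: Array[String]): Unit =
    parse(args.toSeq, required = Set("name")) match {
      case Right(opts) => println(s"options: $opts")
      case Left(error) => println(s"usage error: $error")
    }
}
```

For example, `scala SimpleCmdParser --name Bob --level 3` prints both options, while omitting `--name` prints a usage error. Returning an Either instead of throwing leaves error reporting to the caller; that is one design choice among several.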

2008-07-03

Roadrunner: a Linux-based petaflop computer

On June 18 the updated list of the top 500 supercomputers was published.

This time the list includes the first petaflop computer, built by IBM for the US government. A petaflop is 10^15 floating-point operations per second, or one thousand teraflops.

The introduction of Roadrunner is one more example of the drive away from proprietary technologies towards use of thousands of commodity components clustered and managed by Linux.

This architecture for supercomputer solutions is strikingly similar to that used by Google, Amazon, Yahoo et al. to power their services.

Note that IBM did not use its own processor technology, Power. Instead, it used a combination of AMD's Opteron and the Cell processor, developed by Sony, Toshiba and IBM for Sony's PlayStation 3.

Comparing the updated list to that of June 2000 brings out these points:

  • Use of commodity components. Roadrunner uses the processor in Sony's PlayStation 3 together with AMD Opteron chips. Today 86% of the top 500 supercomputers use conventional Intel/AMD technologies, with close to the remaining 14% using IBM's Power. This contrasts with the close to 100% non-Intel, mostly RISC-based, hardware of 2000, when only 4 computers used Intel processors (3 IA-32 and 1 i860). IBM's Power went from the top processor family in 2000, with 143 computers, down to 68 in the current top-500 list.
  • Linux. Roadrunner and 92% of the top 500 supercomputers use Linux as their OS, compared to Unix at 90% back in 2000. Now Unix is represented exclusively by AIX, with 23 computers or 4.60%.

+------------- Top-500 Supercomputers ----------+
+------------- Operating Systems ---------------+
Date       OS             Count  Share %

2008.June  Linux            460  92.00 %
           AIX               23   4.60 %
           Mac OS             2   0.40 %
           Windows            2   0.40 %
           Other             13   2.60 %

2000.June  Unix             453  90.60 %
           Linux             28   5.60 %
           BSD Based         17   3.40 %
           N/A                2   0.40 %

+------------- Processor Family ----------------+
Date       Processor      Count  Share %

2008.June  Intel EM64T      356  71.20 %
           AMD x86_64       107  21.40 %
           Power             68  13.60 %
           AMD x86_64        55  11.00 %
           Intel IA-64       16   3.20 %
           Intel IA-32        3   0.60 %
           Cray               1   0.20 %
           NEC                1   0.20 %
           Intel+AMD        430  86.00 %

2000.June  Power            143  28.60 %
           Sparc            122  24.40 %
           MIPS              62  12.40 %
           Alpha             56  11.20 %
           PA-RISC           53  10.60 %
           NEC               25   5.00 %
           Fujitsu           19   3.80 %
           Hitachi SR8000    10   2.00 %
           Cray               6   1.20 %
           Intel IA-32        3   0.60 %
           Intel i860         1   0.20 %
+-----------------------------------------------+
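
Each entry in the Share % column is the count divided by the 500 listed systems. The short sketch below recomputes that column and the Intel+AMD subtotal for June 2008; Top500Shares and its methods are hypothetical names. It uses the rows that add up to exactly 500 systems and to the 430 subtotal, which means leaving out the "AMD x86_64 107 21.40 %" entry: with it included, the counts would exceed 500, so that row appears to be a transcription error.

```scala
// Recomputes the "Share %" column: share = count / 500 systems * 100.
object Top500Shares {

  val TotalSystems = 500

  // June 2008 processor-family counts from the table,
  // leaving out the second "AMD x86_64" row (107).
  val june2008: Seq[(String, Int)] = Seq(
    "Intel EM64T" -> 356,
    "Power"       -> 68,
    "AMD x86_64"  -> 55,
    "Intel IA-64" -> 16,
    "Intel IA-32" -> 3,
    "Cray"        -> 1,
    "NEC"         -> 1
  )

  def share(count: Int): Double = count * 100.0 / TotalSystems

  // Intel + AMD subtotal: families whose names start with "Intel" or "AMD".
  def intelPlusAmd(rows: Seq[(String, Int)]): Int =
    rows.collect { case (name, n) if name.startsWith("Intel") || name.startsWith("AMD") => n }.sum

  def main(args: Array[String]): Unit = {
    for ((name, n) <- june2008) println(f"$name%-12s $n%4d ${share(n)}%6.2f %%")
    val subtotal = intelPlusAmd(june2008)
    println(f"Intel+AMD    $subtotal%4d ${share(subtotal)}%6.2f %%")
  }
}
```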

2008-04-25

Scala: functional programming for Java

Programming is a long and complex effort for projects of any size; consequently, we are constantly looking for a language that improves productivity: one that delivers quality results with less time and effort.

I have no statistics on the distribution of programming-language usage. My thesis is that Java leads among the languages in use today, across the Internet, Web Services, finance, telecoms, health, communications, aviation, military, automobile, oil, factory automation, appliances and consumer products, etc.

The topic of alternate languages to Java often brings lively discussions of pros and cons among languages such as Ruby, Groovy, Python, Erlang, Scala, F#, C#, Lisp, PHP, C++, C and JavaScript among several others.


After some reading and limited experimentation with these alternate languages, I came to the conclusion, a thesis without proof, that Java is here to stay in the lead. However, Scala profiles as a language that leverages and extends the Java experience and the JVM, and may contribute to even greater use of the Java platform through Scala.

Scala is a relatively new language; some basic points about Scala include the following:

  • Created. Created by Martin Odersky at L'École Polytechnique Fédérale de Lausanne in 2001.
  • Platform. Runs on the Java Virtual Machine, JVM, and directly uses and produces Java classes, Java byte-code. An implementation for MS .NET is available but appears less developed.
  • Features. The main distinguishing factor of Scala is its support for both object and functional programming models.
  • Object-oriented language. All entities are typed objects. Types are defined by class and trait constructs that offer flexibility in defining class hierarchies.
  • Functional-oriented language. Scala's object orientation mixes with a well defined pattern-based approach to declare what is to be done rather than how; functional versus imperative approach to define algorithms. Erlang is the functional language often mentioned when describing Scala's functional features.

The following points summarize a few days of experimenting with Scala.

Pros
  • Java. Scala is a natural language progression for Java programmers. Existing libraries can be reused and classes developed with Scala can be incorporated and used in Java.
  • JVM. Scala uses the well tested and cross platform Java Virtual Machine, JVM.
  • Object-oriented. Classes and Traits are designed in a manner that allows flexibility in defining class structures.
  • Typed language. Scala infers types, permitting concise expressions typical of dynamically typed languages, while types are enforced at compile time. Dynamically typed languages do not require variables to be declared as a specific type; any variable can contain any value or object. They offer great flexibility and concise statements at the expense of potential errors that go undetected until run time. Scala offers the expression conciseness of dynamically typed languages while the compiler infers and strictly enforces type conformity.
  • Functional-oriented. Scala has an algebraic foundation that, as one becomes familiar with it, helps in understanding the less intuitive aspects of the language. Pattern matching and support for higher-order functions, methods accepting functions as parameters, are among the powerful functional features. Thread management under the Actors library appears to the end user, the programmer, as a native language feature.
Cons
  • Scala is new. Although introduced in 2001, Scala has just recently received wider attention. Programmers, students and the rest of us will need much more time to learn, understand and use Scala.
  • Acceptance. Time will tell how Sun, Google, IBM, industry, the Internet and the IT communities accept Scala. A good sign of acceptance would be Google deploying a Scala API for the recently introduced Google App Engine, which currently supports only Python.
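
Several of the points above (type inference, val versus var, higher-order functions, pattern matching) can be seen together in one small object. This is an illustrative sketch written for current Scala (2.13 or 3) rather than the release discussed here; FeaturesDemo and its methods are hypothetical names.

```scala
// Illustrative only: small snippets for the features listed above.
object FeaturesDemo {

  // Type inference: no annotations, yet each type is fixed at compile time.
  val count = 3                 // inferred as Int
  val xs = List(1, 2, 3, 4, 5)  // inferred as List[Int]
  // val label: String = count  // would not compile: type mismatch

  // val versus var: `acc` is a var and may be reassigned; a val may not.
  def sumWithVar(values: List[Int]): Int = {
    var acc = 0
    for (v <- values) acc += v
    acc
  }

  // Higher-order function: takes another function, f, as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Pattern matching on the shape of a list.
  def describe(list: List[Int]): String = list match {
    case Nil         => "empty"
    case x :: Nil    => s"one element: $x"
    case x :: y :: _ => s"starts with $x and $y"
  }

  def main(args: Array[String]): Unit = {
    println(sumWithVar(xs))                 // 15
    println(applyTwice(_ * 2, count))       // 12
    println(xs.map(_ * 10).filter(_ > 20))  // List(30, 40, 50)
    println(describe(xs))                   // starts with 1 and 2
  }
}
```
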
Included below is a quick-sort method as shown in Scala by Example.
  • Application. Defines SortTest as a console application.
  • Method sort. Defines method sort that takes a list of integers, xs, and returns a list of integers (Scala calls the declared return type the result type): def sort(xs: List[Int]): List[Int] =
  • Check input list for length 1 or empty. The method results in the input list, xs, unchanged if its length is 1 or less. A return statement can be used, but it is usually unneeded in Scala: the value of the last expression evaluated is the method's result.
  • Define pivot. Defines a val object assigned the value of the element at position xs.length / 2 of the input list, xs. Scala differentiates between 'val' and 'var' objects: a var can be reassigned; a val cannot. Also note that the compiler infers the type of pivot as integer and enforces it. As written, the statement looks as if it came from a dynamically typed language, since no type is stated. Note how pivot is defined, val pivot = xs(xs.length / 2); the inferred statement is val pivot: Int = xs(xs.length / 2), meaning a val object of type integer, ': Int', assigned the integer value at xs(xs.length / 2).
  • Sort recursively. Defines three sub-lists, the elements less than, equal to and greater than pivot, sorts the first and last recursively, and concatenates the three lists using Scala's less than intuitive concatenation operator ':::'. The List object has a method, filter(), that takes a function expression, as in xs.filter(x => x < pivot), applies that function, x < pivot, to every element of the list, and results in (returns) the elements for which it is true.
object SortTest extends App {

  // Quick-sort: split around a pivot, sort each side recursively
  def sort(xs: List[Int]): List[Int] = {
    if (xs.length <= 1) xs
    else {
      val pivot = xs(xs.length / 2)
      sort(xs.filter(x => x < pivot)) :::
        xs.filter(x => x == pivot) :::
        sort(xs.filter(x => x > pivot))
    }
  }

  // now, define a List, Y, sort it and print
  val Y = List(3, 4, 0, 7, 9, -8, 8, -3, 1)
  Console.println(sort(Y))
}

Compile and run as follows:
scalac SortTest.scala
scala SortTest
List(-8, -3, 0, 1, 3, 4, 7, 8, 9)
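
For comparison, the same algorithm can be written with pattern matching and List.partition, which splits the list in one pass instead of three filter calls. This variant is a sketch, not taken from Scala by Example; PatternSort is a hypothetical name, and using the head of the list as pivot is an assumption that keeps the code short but performs poorly on already sorted input.

```scala
// Variant quick-sort using pattern matching and List.partition.
// Assumption: the head of the list is the pivot (simple, but slow on sorted input).
object PatternSort {

  def sort(xs: List[Int]): List[Int] = xs match {
    case Nil => Nil
    case pivot :: rest =>
      // One pass splits the rest into elements below the pivot and all others.
      val (less, notLess) = rest.partition(_ < pivot)
      sort(less) ::: pivot :: sort(notLess)
  }

  def main(args: Array[String]): Unit =
    println(sort(List(3, 4, 0, 7, 9, -8, 8, -3, 1)))  // List(-8, -3, 0, 1, 3, 4, 7, 8, 9)
}
```
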
A second example serves also to show the power and conciseness of Scala. This example is found at the Scala site under the subject: A Tour of Scala: Automatic Type-Dependent Closure Construction.

Here we define a method, myWhileLoop, that takes two parameters, cond, a by-name parameter that results in a Boolean, and body, a by-name parameter that results in Unit; the method itself results in Unit. Unit, like void in Java, signals that no useful value is returned; its single value is written (). Here body stands for a block of code, zero or more statements, run for its side effects. Because both parameters are passed by name, they are evaluated each time they are used rather than once before the call. Once defined, the method is used as if myWhileLoop were native to the language, or at least it looks so.

The combination => does not refer here to the equal and greater-than characters; Scala uses it, read as one symbol called a right arrow, to denote a very useful type, the function type, as in Int => Boolean. It is possibly borrowed from math, where an arrow denotes a function: leads to, becomes, transforms one expression into another. Placed before a parameter's type, as in cond: => Boolean, the same arrow marks a by-name parameter, which behaves like a function of no arguments that is evaluated on each use.
object TargetTest1 extends App {
  // cond and body are by-name parameters: re-evaluated on each use
  def myWhileLoop(cond: => Boolean)(body: => Unit): Unit = {
    if (cond) {
      body
      myWhileLoop(cond)(body)
    }
  }

  // use 'myWhileLoop' as if defined in the language
  var i = 0
  myWhileLoop(i < 5) {
    println(i); i += 1
  }
}

Compile and run as follows:
scalac TargetTest1.scala
scala TargetTest1
0
1
2
3
4
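
In myWhileLoop, the `=>` before Boolean and Unit makes cond and body by-name parameters: the argument expression is passed unevaluated and evaluated every time it is referenced, which is why the loop sees i change. A side-effecting argument makes the difference from ordinary by-value parameters visible. The sketch below is illustrative; ByNameDemo, Counter, twiceByValue and twiceByName are hypothetical names.

```scala
// Contrasts by-value and by-name parameters. Counter is a hypothetical helper.
object ByNameDemo {

  // Each call to next() returns 1, 2, 3, ...
  final class Counter {
    private var n = 0
    def next(): Int = { n += 1; n }
  }

  // By-value: the argument is evaluated once, before the call.
  def twiceByValue(x: Int): Int = x + x

  // By-name (note the `=> Int`): the argument expression is
  // evaluated again every time `x` is referenced in the body.
  def twiceByName(x: => Int): Int = x + x

  def main(args: Array[String]): Unit = {
    val c1 = new Counter
    println(twiceByValue(c1.next()))  // 2: next() ran once, 1 + 1
    val c2 = new Counter
    println(twiceByName(c2.next()))   // 3: next() ran twice, 1 + 2
  }
}
```
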
Here is a third example extracted from the Scala site, A Tour of Scala: Mixin Class Composition. It shows the power of Scala to define class hierarchies via class, trait, abstract and with expressions.
abstract class AbsIterator {
  type T
  def hasNext: Boolean
  def next: T
}

// Adds foreach to any AbsIterator it is mixed into
trait RichIterator extends AbsIterator {
  def foreach(f: T => Unit): Unit = { while (hasNext) f(next) }
}

class StringIterator(s: String) extends AbsIterator {
  type T = Char
  private var i = 0
  def hasNext = i < s.length()
  def next = { val ch = s charAt i; i += 1; ch }
}

object StringIteratorTest {
  def main(args: Array[String]): Unit = {
    if (args.length > 0) {
      // Compose StringIterator with the RichIterator mixin
      class Iter extends StringIterator(args(0))
        with RichIterator
      val iter = new Iter
      iter.foreach(println)
    }
  }
}

Compile and run as follows:
fsc StringIteratorTest.scala
scala StringIteratorTest abc123
a
b
c
1
2
3
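
The benefit of mixin composition is that a trait such as RichIterator is written once against the abstract AbsIterator and can then be mixed into any concrete iterator. The sketch below illustrates this with a second, hypothetical iterator class, RangeIterator, and a second, hypothetical trait, CollectingIterator, each mixed in at instantiation time with `new ... with ...`. The definitions are wrapped in a hypothetical object, MixinSketch, so they do not clash with the example above.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical variation on the mixin example: a second iterator class and a
// second trait, both composed with the same abstract base.
object MixinSketch {

  abstract class AbsIterator {
    type T
    def hasNext: Boolean
    def next(): T
  }

  trait RichIterator extends AbsIterator {
    def foreach(f: T => Unit): Unit = { while (hasNext) f(next()) }
  }

  // Drains the remaining elements into a List.
  trait CollectingIterator extends AbsIterator {
    def collectAll(): List[T] = {
      val buf = ListBuffer.empty[T]
      while (hasNext) buf += next()
      buf.toList
    }
  }

  // Iterates over the integers from `from` (inclusive) to `until` (exclusive).
  class RangeIterator(from: Int, until: Int) extends AbsIterator {
    type T = Int
    private var i = from
    def hasNext: Boolean = i < until
    def next(): Int = { val v = i; i += 1; v }
  }

  def main(args: Array[String]): Unit = {
    // Same class, different traits mixed in at instantiation time.
    val printing = new RangeIterator(1, 4) with RichIterator
    printing.foreach(println)
    val collecting = new RangeIterator(0, 3) with CollectingIterator
    println(collecting.collectAll())  // List(0, 1, 2)
  }
}
```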

2007-12-31

iMac + Fusion: a great developer's workstation

I had avoided Apple. Apple products are the ultimate proprietary systems: hardware, OS and applications are exclusive to the Apple club, in direct contrast to the open source direction I prefer.

It was a Java-based project requiring an interface to a C module via the Java Native Interface, JNI, that forced me to buy an Intel-based iMac; I had no choice: the project demanded application development and testing not only on Intel-based but also on PowerPC-based Macs.

It was a pleasant experience.

The minimalist design approach of Mac products grows on you. Even when it is not in use, the iMac is a pleasure to look at; from ergonomic and mechanical-design perspectives it is an elegant, simple, functional unit. The aluminum keyboard is a good example of excellent ergonomics and minimalist, Apple-like design.

But it was VMware's Fusion, the virtualization application for the Mac, that made the iMac my Java and Internet development station of choice, replacing a Vista-based laptop.

The combination of Fusion and Leopard's Spaces results in full-screen operating system instances available intuitively via control + arrow keys. It all works well, including instant sleep and restart, effective time synchronization and reliable network persistence: instant on and off when needed. This is quite a contrast to the experience with Vista + VMware Workstation, where sleep and restart did not work, or did not work well.

I now have an iMac configured with 4 GB of RAM, supporting:

  1. Apple's Leopard as the host OS. I was unfamiliar with this OS, but the console and Unix commands work well and I felt at home with its operation. Java 6 is not officially supported on Leopard yet; it is installed in all the guests. I do not use any of the Leopard tools such as iPhoto and iMovie; if I need to edit pictures or movies, I know I have the right environment and will try these tools then.
  2. MS Windows 2000 Pro as the main desktop development environment. Java 6, GCC, NetBeans 6, JEdit and Ant all work well in possibly the best OS from Microsoft. I did not have licenses for Vista or XP, but I did have an unused Windows 2000 license; it operates as a simple, functional and most responsive MS OS, uncluttered by the obstructive security alerts and other confetti added to XP and later to Vista that get in the way of productivity.
  3. Ubuntu 7.10 64-bit as the main server and development environment. It hosts Java 6, Apache, PHP, Tomcat, MySQL, hsqldb, PostgreSQL, etc. I could do this work on Leopard, but I am familiar with Ubuntu, and it runs just as responsively and well, possibly better, under Fusion than in a dedicated environment.
  4. Solaris Express Developer Edition 64-bit for general development of the C module and the Java application on a Solaris platform. OpenSolaris is a work in progress whose objective is to make Solaris easier to use by adopting Linux-like operation and presentation; it profiles as a good development and server platform.

Each OS has 1 GB of RAM. All are instantly available via control + arrow keys, mapped to four Leopard Spaces that each run one OS in full-screen mode.

Each OS has a Subversion client installed to access a Subversion server hosted on a dedicated Ubuntu 64-bit environment on the Internet. In this manner, the Subversion server fills the data and version-control functions and allows check-out and check-in of projects under work from any of the OSes listed above or from a workstation at a client or project office. It neatly separates the tools, e.g. operating systems, from the data, e.g. projects and personal files.

This approach allows me to use one or more environments as development workstations at home or at project offices while relying on Subversion as the data and project repository. Effectively, each OS holds only the data for the project under work, which is deleted once the project is checked in.

All is not perfect, however, although the issues are mostly a matter of getting used to a new environment rather than functional problems.

Plus
  • Ergonomics. The iMac is an all-in-one desktop computer, simple, elegant, functional.
  • VMware's Fusion. This virtualization tool possibly benefited from the experience of VMware Workstation for MS Windows, combined with a fresh, clean design and port to Leopard, resulting in a great implementation for hosting different OSes. Sleep and restart work flawlessly, including time synchronization with the host at restart. Matching the host's screen resolution works well and contributes to the simple, effective transition among OSes via control + arrow keys.
  • Connectivity. Network settings and overall connectivity, wired and wireless, work well, including network printing from guests to the printer defined on the host OS, Leopard.
Minus
  • Keyboard. PC users will be searching for several keys unavailable on the minimalist wireless keyboard that came with the iMac. The delete key is effectively a backspace key; end, home, page-up and page-down keys are unavailable. There are key combinations to simulate these; however, it takes time to find them and more time to get used to using them productively. Also, some keys behave one way under the host OS, Leopard, and differently under Fusion guests. I am now used to these keyboard differences and able to operate productively; it is the cleanest-looking keyboard, it is functional, and after some time with it I am able to use it productively in all four operating systems.
  • Mouse. The wireless mouse is the classic one-button Mac mouse; I do miss the right-mouse click, which is simulated by control + click.

In summary, I would not have selected this environment had it not been for the project requirement to develop and test on an Apple platform. The discovery of Leopard + Fusion made the case for me to switch to it as my primary workstation, where I can use any of the four top desktop operating systems as needed while all of them work with the same data and project repository under Subversion.

The result is a proprietary system, iMac + Leopard + Fusion, used as the base platform to host Microsoft and open source OSes, and associated development tools, resulting in a productive, multi-OS self-contained workstation.

I suspect I'll be looking at Apple's MacBook to replicate a mobile equivalent of the configuration described above. The rumored announcement by Apple, on January 15th, of an ultra portable MacBook may be the fully functional multi-OS compact laptop I'd like to use for presentations and for work while away from the office.

N.B.1 MacBook, MacBook Pro and their successors, hosting Ubuntu via Fusion, profile as compelling alternatives for anyone considering an MS Windows laptop (Win2k, XP or Vista) and/or a Linux mobile unit such as a Dell Ubuntu laptop.

N.B.2 There are at least three products for running other OSes on the Mac: Boot Camp, which dual-boots, and Parallels and Fusion, which virtualize. This reference offers a benchmark and comparison among these products.

2007-11-13

Android: a story about the phone, Java, Linux and other wars

Google has released the API for Android, and several first impressions of its impact on and contribution to the mobile-phone platform are now available.

There is good and not-so-good news.

Various blogs describe well the positive points of Android such as defining a common and relatively 'open' platform for volume development of related applications and services, bringing down the barriers of entry, offering competition on a common infrastructure. A good article describing Android is found here.

These are all good points, timely introduction and needed development to standardize what otherwise is a set of exclusionary and stove-piped technologies vertically aligned across the usual suspects, aka communications cartels.

I first installed the SDK on my Ubuntu x86_64 environment. It did not work: the initial distribution is 32-bit only, and the site should point this out on the download page.

Under 32-bit Vista, it worked well, both in Eclipse and standalone via the Ant script generated by the included application-generation utility, activityCreator.py.

As I work with the SDK, I like what I see, namely:

  • Java. The SDK is Java-based including an Eclipse plug-in.
What concerns me:
  • Java. It is the same language, but supports only a subset of the JDK. The run-time VM is a custom one, Dalvik, perhaps needed to gain performance on target platforms. Java faces fragmentation, and Android is a good example of it. Google's GWT uses a similar architecture and also supports a subset of Java's JDK, a different subset than Android's. There is fragmentation of Java even within Google. Perhaps this is the cost of rapid development, but certainly they can do better than this. A good article describing Android's Java gambit is found here.

Sun's Jonathan Schwartz mentioned in a blog entry his invitation to Linus Torvalds for dinner at his place. It now appears that Jonathan may need to include Sergey and Larry in such an invitation, have a good supply of java to digest the vast amount of material accumulated since the initial invite, leave egos at the door, and collectively endorse, support and join the OpenJDK community.

A fragmented Java is not in the interest of developers or of the software industry. Unix offers a good example of the perils of fragmentation; it is a good idea to avoid the same fate for Java.

2007-11-06

Android: an open platform for mobile communications

The announcement by Google of an open platform for mobile communications marks a milestone in the evolution of the Internet.

The conventional cell phone is a wireless unit with some data processing capabilities, a phone plus a PDA, as exemplified by Palm's Treo, Apple's iPhone and RIM's BlackBerry.

In contrast, Google defines a Linux-based computing platform that can make phone calls.

The difference is significant and offers a wide range of options and possible technology combinations, by defining a device stack for mobile communications that includes a Linux-based OS, a defined API and associated development tools.

Not much is known yet, since the API will only be available on November 12th. What is known can be summarized as follows:

  • Platform. An open source software communications platform named Android. It is defined as a software-based API that allows mobile applications and services to be developed and deployed on any hardware technology.
  • Alliance. The Open Handset Alliance, OHA, which includes 34 registered participants, among them T-Mobile, Sprint, China Mobile, Telefonica of Spain, Samsung, Motorola, LG, Intel and Texas Instruments.

Not surprisingly, names such as Microsoft, Verizon, AT&T and Apple are absent from the alliance.

This move by Google also offers an alternative to proprietary development by proposing instead a common infrastructure for all to participate in, use, enhance and compete on, by delivering value above a common foundation.

The enthusiasm of seeing an alliance for an open source communications platform is tempered by the fact that the API and associated development tools are not yet available. Also unclear is which language, or languages, will be supported for development.

However, first impressions are that Google has once more outfoxed the usual suspects and proposed an approach for the evolution of mobile communications in a manner and culture earlier responsible for the volume adoption of the Web, Apache, Linux and Firefox.

References
  • Google. Here is an entry in Google's blog re subject announcement.
  • BBC. Q&A article re proposed mobile platform at BBC.
  • API. Android's software development kit, SDK, is available here.
  • Nokia. Nokia does not rule out participating in the OHA alliance.

2007-09-29

Opportunity for Apple to sell Leopard as a standalone OS

Leopard, the next version of Apple's operating system, is to be released in October.

The timing of this introduction highlights the opportunity for Apple to offer a standalone Leopard, i.e. the OS unbundled from Apple's hardware.

Two reasons contribute to an excellent timing for Apple to offer an unbundled operating system.

  • Vista's troubles. Vista has failed to deliver on the expectations created by Microsoft, to the point that end users prefer its predecessor, XP.
  • Increased use of virtualization. Virtualization has reached volume adoption and is now a needed function for software development and for other applications on the desktop and on the server. It is conceivable that a year from now consumers may buy computers able to run more than one operating system. In fact, such computers may have no operating system other than a virtualization function built into the hardware, BIOS or flash configuration.

These developments, the problems with Vista and the availability, acceptance and use of virtualization, result in a great opportunity for Apple.

Selling Leopard as a standalone operating system, for installation on a computer or under a virtual environment such as VMware, Virtuozzo, Parallels, Linux-KVM, XenSource and Sun's xVM, would benefit both consumers and Apple.
  • Consumers. Consumers have an alternative to Microsoft even if a standalone Leopard license is supported by Apple for operation on a virtual environment only.
  • Apple. Apple benefits by expanding adoption of its operating system beyond the present niche and enabling agile 64-bit Leopard to run on dual and quad Intel and AMD platforms directly or through virtualization.

Supporting vast combinations of hardware configurations is indeed a problem for any OS. However, offering an unbundled Leopard for operation under a virtual environment is an attractive option.

Licensing Leopard for operation under a virtual environment simplifies support significantly, given that video, disk, network and other devices are virtualized and defined by each virtualization tool; this avoids supporting a vast number of devices and associated drivers, and contributes to stability.

We now need to develop on more than one OS. It is not a nice-to-have function; it is needed in several areas, including cross-platform development.

I trust Apple will see the benefit of licensing Leopard standalone so I can install it on my mobile development environment: MS Vista with VMware, and Ubuntu 64-bit as a guest. I was skeptical, but VMware Workstation on 32-bit Vista supports 64-bit guests; it performs well and it is stable.

2007-09-01

Of open source, open minds and open culture

I came across the blog referenced below, which identifies ten principles the author draws from experience in software work.

It took more than software to evolve towards the Internet, Web, GPS, associated applications and services, the 'open' communications world of today and the results and culture of 'open source'.

Open source is about people, ideas, creativity, tools and technology, working with others through ongoing peer review to achieve results thought impossible in context of time, resources and conventional methods, processes, organizations and management.


The result is communities of dedicated people, members of projects, tasks, applications, and recently corporations, formal and mostly informal, working together productively to define and resolve problems, deliver results, effectively and in a responsive manner.


Often it is not about cost. It is rather about the cost of not doing it, of not delivering results in a timely manner, of missing a window of opportunity, of failing to offer solutions, of failing to offer value, of becoming irrelevant.

Open source brings to mind the potential application of the same culture, communication and tools to defining and addressing challenges in fields other than computing, where intellectual property, patents and copyrights represent serious obstacles to development.

Here is a definition of Open Source, from opensource.org, that may be applied to fields other than information technology:

Open source is a development method that harnesses the power of distributed peer review and transparency of process.

Areas that come to mind to apply open source methodology and culture include:

  • Education
  • Health
  • The environment
  • Energy
  • Government

I list below the ten principles from the blog in question.
  • Adoption precedes monetization.
  • Lots of customers is a greater barrier to entry than lots of intellectual property (IP).
  • A business' brand is its greatest asset in driving sales. Not its IP.
  • Lower barriers to evaluating and using your product.
  • Sell customer value, not vendor value.
  • Product use should breed re-use and further sharing.
  • A collective product best serves a collective market.
  • Invest in service and your product, not sales.
  • Transparency breeds trust, and trust breeds revenue.
  • People make a business.

2007-08-28

GridGain: Grid computing for Java

Grid computing platforms are used more often than we think.

Most supercomputer implementations, as well as the infrastructure of Google, Amazon, Yahoo, Microsoft, eBay, etc., are effectively networked computing nodes operating under software control that schedules, dispatches and manages defined workloads.

These are forms of networked, or grid, computing, often based on proprietary software.


Among open source grid computing solutions, I came across GridGain, a Java-based library that is simple to install, deploy and use.

In minutes I was able to define a few computing nodes and run the examples provided with the distribution.

Should you be looking at a grid computing implementation, have a look at GridGain; salient features include the following:

  • Implemented as an open source library under the LGPL license
  • Java based
  • Simplicity. Simplicity is a property reflected in the design and implementation of the Java library. GridGain is easy to understand and to use. The simplicity theme is evident in the concept, design and documentation. A quote attributed to Antoine de Saint-Exupéry is included on the product page: "A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."
  • Documentation. Documentation is clear and the product site includes associated information and examples; see product blog here.
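The schedule, dispatch and manage cycle mentioned earlier commonly follows a split, execute and reduce pattern: a task is split into jobs, each job runs on a node, and the partial results are combined. Below is a minimal local sketch of that pattern under stated assumptions; it is not GridGain's API. Threads from Python's standard `concurrent.futures` module stand in for grid nodes, and the function names (`split`, `job`, `run_on_grid`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_jobs):
    """Split the work items into n_jobs interleaved chunks, one per node."""
    return [items[i::n_jobs] for i in range(n_jobs)]


def job(chunk):
    """The unit of work each node runs; here, a sum of squares."""
    return sum(x * x for x in chunk)


def reduce_results(partials):
    """Combine the partial results returned by the nodes."""
    return sum(partials)


def run_on_grid(items, n_nodes=4):
    """Split the task, dispatch each job to a worker 'node', then reduce."""
    chunks = split(items, n_nodes)
    # Each thread plays the role of a remote grid node in this sketch.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(job, chunks))
    return reduce_results(partials)


if __name__ == "__main__":
    print(run_on_grid(list(range(1, 101))))  # prints 338350
```

In a real grid the chunks travel over the network to other machines, and the framework typically takes care of node discovery, load balancing and failover; the split and reduce steps keep the same shape.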
GridGain is also an example of a business model based on an open-source product. GridGain offers support, professional services, education, analysis and development for companies interested in using GridGain.

An emerging list of companies operating on a business model based on open-source products includes:
  • GridGain. GridGain is both the company and product offering grid computing solutions.
  • MySQL. MySQL AB develops, markets and offers services for MySQL.
  • Interface21. Interface21 offers consulting services and training for the Spring framework.
  • Canonical. Canonical offers consulting and support services for Ubuntu, the Linux distribution.
  • Alfresco. Alfresco is a company offering solutions for Document Management, Collaboration, Records Management, Knowledge Management, Web Content Management and Imaging using its open source product, Alfresco.
  • Openbravo. Openbravo is both the company and the open source product focusing on enterprise management systems, materials requirement planning, sales and customer relationship management, often called ERP, MRP and CRM.
  • Liferay. Liferay offers Portal, Journal and Collaboration Suite as open source products.
  • Sun. Sun is a good example of a classic IT corporation redefining its business model largely through the use of an open source strategy, open culture and associated solutions. Examples of Sun's open source direction include OpenSolaris, OpenJFX, OpenJDK, the ZFS File System, DTrace, NetBeans and GlassFish.

2007-07-21

700 MHz spectrum: call for an open wireless network

The announcement by the US FCC chairman, Kevin Martin, calling for an open broadband network represents an opportunity, within the US and abroad, to show support for a policy that benefits all.

Inevitably, several interests, well represented in Washington, are lobbying to preserve a fragmented, exclusionary, proprietary and closed model as the spectrum is reallocated from the frequencies currently used by analogue television channels 52 to 69.

It is refreshing to read a call by the FCC advocating a policy contrary to the existing closed model and challenging industry to offer an open, competitive and universal one.

The existing closed model is a franchise for the benefit of a few selected companies, characterized by fragmented, exclusionary and proprietary services where units, such as cell-phones, and applications work only within each offering, country-club like.

In contrast, an open model sees it as a needed common, competitive service that benefits the community, for a function that is not a luxury. It is a needed service in today's interconnected world, with the added property that devices, phone-sets and applications should plug-and-play as required across competing offerings.


Support from industry, interest groups and ordinary citizens for open access, where any device, any application and any network provider can offer phone service and a gateway to the Internet, among other functions, over this spectrum, will ensure competition and the universal availability of a common service, a grandchild of the lessons and experience of the Internet.

It is timely to have Google's support for the service, in the form of a 4.6 billion dollar bid and a challenge to the industry to comply with the characteristics of an open model. From Google's blog on the subject, we have the following:

In the U.S., wireless spectrum for mobile phones and data is controlled by a small group of companies, leaving consumers with very few service providers from which to choose. With that in mind, last week, as the federal government prepares for what is arguably its most significant auction of wireless spectrum in history, we urged the Federal Communications Commission (FCC) to adopt rules to make sure that regardless of who wins the spectrum at auction, consumers' interests are the top priority. Specifically, we encouraged the FCC to require the adoption of four types of "open" platforms as part of the auction:
  1. Open applications: consumers should be able to download and utilize any software applications, content, or services they desire;
  2. Open devices: consumers should be able to utilize their handheld communications device with whatever wireless network they prefer;
  3. Open services: third parties (resellers) should be able to acquire wireless services from a 700 MHz licensee on a wholesale basis, based on reasonably nondiscriminatory commercial terms;
  4. Open networks: third parties (like Internet service providers) should be able to interconnect at any technically feasible point in a 700 MHz licensee's wireless network.
It would be equally interesting to see IBM, Microsoft, AT&T, Verizon Wireless, Sprint, et al. support the US FCC's call for an open common service.

Even more important is that this call become an international one, from Brazil to Venezuela, Spain, France, Germany, Russia, India, China, Australia, Canada, etc., supported by the respective Governments, industry and citizen groups, requesting a service that is available everywhere and as common and universal as the Internet and GPS.

The availability of such a common connectivity spectrum will be of most benefit to inner-city, displaced, rural and remote communities. Content and services can be targeted to these communities: education and certification, health services, employment opportunities, and monitoring for police, forests, water quality and levels, roads, and weather and environmental sensors. This should be of special interest to countries with vast remote areas such as Canada, Russia, China and Brazil.

For Canada this service should be of special interest to the Federal Government, to bring participation and presence to the north along with services to over 600 remote and native communities across the country. Contact Industry Canada, the CRTC and your local Provincial and Federal government representatives to solicit their views and support for an open common service as proposed in the US by the FCC; see the related FCC Upper 700 MHz Band auction page here.

Updates
  • AT&T endorses FCC's call for an open wireless service.
  • Google bets on mobile market: CNet.
  • FCC approves some open wireless requirements: CNet.
  • FCC approves revised 700 MHz band plan and service rules: FCC.
  • Band plan chart: FCC.
  • Kevin Martin's statement: FCC.
  • Why it matters; good comments re subject.
  • The battle of models, open versus closed, is best exemplified by the running discord between Google and Verizon; have a look here and here.

2007-07-17

iGoogle: a custom selection of Gadgets

Browser Bookmarks become indispensable Web-references; they are more than just nice-to-have information and knowledge pointers.

However, when using different Browsers on the same or different computers, likely running different Operating Systems, the aggregate working set of Bookmarks is unavailable and eventually becomes fragmented, unused and lost.

I found iGoogle a useful tool to define, catalogue and quickly access Bookmarks from any Browser and from any computer at home, at project offices, at client locations and while on the road.

iGoogle lets you customize your classic Google search page by selecting a set of predefined small utility panels, dubbed Gadgets.

One feature I like is the ability to switch instantly, back and forth, between the classic Google search page and iGoogle via a link in the top right corner labelled 'Classic Home' and 'iGoogle'. There are times when I prefer the simple, unadorned but useful classic search page, switching as needed to the custom portal view of iGoogle.

You can configure several Gadgets in your iGoogle page. The ones I use include:

  • Bookmarks
  • Calendar
  • Time and Date
  • Weather
  • Gmail
  • Wikipedia
  • News
There are Gadgets for every interest, age and taste. There are jokes and cartoons of the day, games, stock portfolio, finance, sports, etc.

Gadgets also work on the Desktop. I do not use these, since what I am looking for is a set of tools that I can access from anywhere, independent of the Operating System and Browser combination.

Beyond configuring Gadgets for personal use under iGoogle, Gadgets can be developed for use on web applications and pages for custom use or published for general availability.

I had a quick look at the XML-based API and found it simple; a couple of applications came to mind, which I'll explore shortly.
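To illustrate what an XML-based gadget definition looks like, here is a minimal sketch that generates one. It assumes the classic Google Gadgets layout of a `Module` root containing a `ModulePrefs` element (with a `title` attribute) and a `Content` element of type `html`; check the exact schema against Google's Gadgets API reference. The function name `build_gadget_spec` is hypothetical, and only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET


def build_gadget_spec(title, html):
    """Return a minimal gadget spec: Module > ModulePrefs + Content(type=html)."""
    module = ET.Element("Module")
    ET.SubElement(module, "ModulePrefs", title=title)
    content = ET.SubElement(module, "Content", type="html")
    # ElementTree escapes the markup (&lt;b&gt;...) instead of using CDATA;
    # an XML parser reads it back as the same HTML text.
    content.text = html
    return ET.tostring(module, encoding="unicode")


if __name__ == "__main__":
    print(build_gadget_spec("Hello Gadget", "<b>Hello, world!</b>"))
```

Hand-written gadget files usually wrap the HTML in a CDATA section; escaping it, as above, is equivalent once the XML is parsed.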

References
  • Google. Information in general for Google Gadgets is found here, and the API for development of custom Gadgets is here. For iGoogle find general information here and also at Wikipedia. And there is an API developers guide.
  • Microsoft. Microsoft also has a Gadgets-based technology. I have not worked with this API, referenced at microsoftgadgets.com.

2007-07-07

Ubuntu: the power of free software

There are hundreds of Linux distributions, of which Red Hat and SuSE are likely the most popular in corporate deployment; and there is Ubuntu.

Why is Ubuntu, a relatively recent entrant into the crowded Linux distribution space, successful enough to be selected by Sun and by Dell as an alternate certified and supported Operating System (OS) for their respective products, and also selected by the French Government to power desktops for Parliament and servers for the Ministry of Agriculture?

Possible elements contributing to Ubuntu's success include the following.

  • Mark Shuttleworth. Mark Shuttleworth articulated the free software commitment for Ubuntu and provided the funding for software development, directly and through Canonical, a UK-based company focused on the promotion and support of free software projects.
  • Unique culture. Shuttleworth managed to attract talented personnel as well as define and promote a unique culture behind Ubuntu: the product, the process for continued innovation and the related support services.
  • Quality. The principle of least surprise is evident in Ubuntu. The user interface and selected packaged applications 'just work'. The OS is well integrated and tailored for both desktop and server use.
  • Free software only. Products bundled with Ubuntu are those that are free of charge and free to distribute.
  • Multilingual support. This is Ubuntu's statement on language support: "Ubuntu aims to be usable by as many people as possible, which is why we include the very best localization and accessibility infrastructure that the free software community has to offer."
  • Commercial technical support. Via Canonical and associated partners, paid support and technical assistance are available. Canonical's global support site is based in Montreal, Canada.
  • Partnership program. Through Canonical and its associated certification program, there are Ubuntu support organizations across the world. Also, several Ubuntu-based value-added distributions, such as Kubuntu among several others, benefit from this relationship and partnership. There is always a need to customize a Linux distribution for a company, for a Government, for an application. Ubuntu facilitates such work; everyone wins, and the continuity and presence of Ubuntu's core distribution are maintained.
  • Collaboration with the open-source community. Tight collaboration and the use of common tools permit open-source participants to work together productively with Ubuntu and partners. This approach helps individual participants identify and schedule priority work, bug fixes and new development, efforts that contribute to improving the quality of Ubuntu.
  • Free of patent agreements with Microsoft. While Novell's SuSE and a few lesser-known Linux distributions have signed patent/liability 'agreements' with Microsoft, Shuttleworth has stated the policy of keeping Ubuntu free of any such 'deals', which effectively compromise rather than help keep software free for use and free for distribution.

2007-07-03

OpenID: single sign-on for the Web

OpenID is a distributed, decentralized identity management and authentication service that offers a simple way to sign on to several sites using a single ID.

This is an old problem for which centralized solutions are available. What is interesting about OpenID is that it is distributed, it is simple in concept and in implementation, and it is an open-source project.

  • Offers single sign-on. OpenID offers a way to sign on to different sites without creating a separate userName and password for each. The participating sites must support OpenID for the authentication service to work.
  • Uses a URL as identifier. It uses a Uniform Resource Locator, a URL, such as charlieBrown.peanuts.com, for identification. A user registers one or more URLs with a site offering OpenID identity management services. Each URL is claimed, that is owned, by a registered user.
The OpenID specification allows any organization, individual user, company, government department, service provider, etc., to offer the registration and authentication service.

How does it work? When prompted for a userName and password, sites supporting OpenID offer it as an alternate way to sign on. The user enters the OpenID URL in place of the userName and password. The site redirects the authentication request to the site managing the OpenID identity; that site validates the identity and in turn redirects back to the calling site, indicating authentication success or failure.
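The redirect round trip just described can be sketched in a few lines. This is a minimal sketch, assuming OpenID 2.0 query parameters (`openid.mode=checkid_setup` on the way out, `id_res` or `cancel` on the way back) and assuming the provider endpoint is already known; real relying sites first discover it from the user's URL. The endpoint and site URLs and the helper names are hypothetical, and the essential step of verifying the provider's signature on the response is deliberately omitted.

```python
from urllib.parse import urlencode, urlparse, parse_qs

OPENID_NS = "http://specs.openid.net/auth/2.0"


def build_auth_redirect(op_endpoint, claimed_id, return_to, realm):
    """URL the relying site sends the browser to (OpenID 2.0 checkid_setup).

    Assumes op_endpoint has no query string of its own.
    """
    params = {
        "openid.ns": OPENID_NS,
        "openid.mode": "checkid_setup",
        "openid.claimed_id": claimed_id,
        "openid.identity": claimed_id,
        "openid.return_to": return_to,
        "openid.realm": realm,
    }
    return op_endpoint + "?" + urlencode(params)


def read_auth_response(callback_url):
    """Interpret the provider's redirect back to return_to.

    openid.mode is "id_res" on success and "cancel" if the user declined.
    A real relying site must also verify the signature on the response
    (for example with a check_authentication request); omitted here.
    """
    query = parse_qs(urlparse(callback_url).query)
    return query.get("openid.mode", [""])[0] == "id_res"


if __name__ == "__main__":
    url = build_auth_redirect(
        "https://openid.example.org/server",      # hypothetical provider
        "http://charliebrown.peanuts.com/",        # the user's OpenID URL
        "https://site.example.com/openid/return",  # hypothetical relying site
        "https://site.example.com/",
    )
    print(url)
    print(read_auth_response("https://site.example.com/openid/return?openid.mode=id_res"))
```

Keeping everything in the query string is what lets any site act as relying party or provider using nothing more than HTTP redirects.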

By using two- and three-factor authentication, OpenID can be used for transactions where Strong Authentication is needed.

Why is it needed? Each user must otherwise manage separate online identities, using the same or separate userNames and passwords. OpenID addresses this proliferation of userNames and passwords.

What is needed for OpenID to succeed? There have been several attempts to address this problem. For OpenID to be successful it needs universal adoption, enhancement and support as an open standard by industry in general. What is needed is for Google, Amazon, Yahoo, Microsoft, phone companies, banks, credit cards, retailers, et al to adopt it, and offer it as providers and consumers of the service.

For more information go to openid.net and wikipedia.org.

References

2007-06-28

Supercomputers: interconnected servers running Linux

The list of the top-500 Supercomputers was released today at top500.org.

The list shows a number of interesting statistics. I am including some comments on changes observed from 2000 to 2007.

  • Operating Systems. The stats show a significant change in 7 years. Unix went from 90.60 to 12.00 percent while Linux went from 5.60 to 77.80 percent.
  • Processor Family. A significant change is also apparent: RISC-based technologies, namely IBM's Power, Sun's SPARC, MIPS, Alpha and PA-RISC, lost ground to the combination of Intel and AMD processor technology. The combined Intel and AMD offerings went from 0.80 to 78.80 percent, that is from 4 to 394 systems.
These changes indicate that, in contrast to earlier vector-based technologies, nearly 80% of today's top-500 supercomputers are built from hundreds to thousands of garden-variety, conventional Intel and AMD microprocessors, i.e. scalar processors, running Linux and interconnected, clustered, by several network technologies.

The use of thousands of clustered Intel and AMD based servers for supercomputers may explain why tier-one vendors continue to devote much R&D and marketing to this segment. It may be regarded as a niche, but reportedly it represents 19% of the server market today and is growing at 9% per year.

The industry seems to have found that there is much in common between the technology components needed for supercomputers and those needed by Google, Amazon, Yahoo, YouTube, Microsoft and others to power the growth in network-based services.

A key component that often attracts less attention than real estate, power, storage, servers, OS and application software is the technology used to interconnect, to cluster, servers.

Core and leaf Switches and their accompanying cables represent a huge expense and a limiting factor in floor space, weight and computing capacity.

Sun announced Constellation, a system offering building blocks around a connectivity technology that promises to simplify configuring supercomputers (should I say web services?) scaling from teraflops to petaflops.

The heart of Sun's Constellation is Magnum, a high-density 3456-port InfiniBand switch that helps simplify the configuration and logistics of interconnecting large numbers of servers.
+------------- Top-500 Supercomputers ----------+
+------------- Number of Processors ------------+
Date       Processors  Count  Share %

2007.June  1               1   0.20 %
           33-64           3   0.60 %
           65-128          5   1.00 %
           129-256         2   0.40 %
           257-512        81  16.20 %
           513-1024      126  25.20 %
           1025-2048     176  35.20 %
           2049-4096      53  10.60 %
           4k-8k          33   6.60 %
           8k-16k         14   2.80 %
           16k-32k         3   0.60 %
           32k-64k         2   0.40 %
           64k-128k        1   0.20 %

+------------- Interconnection Technology ------+
Date       Technology         Count  Share %

2007.June  Gigabit Ethernet     206  41.20 %
           Infiniband           128  25.60 %
           Myrinet               46   9.20 %
           SP Switch             36   7.20 %
           Proprietary           35   7.00 %
           NUMAlink              15   3.00 %
           Quadrics              11   2.20 %
           Crossbar              10   2.00 %
           Cray Interconnect      9   1.80 %
           Mixed                  4   0.80 %

+------------- Operating Systems ---------------+
Date       OS         Count  Share %

2007.June  Linux        389  77.80 %
           Unix          60  12.00 %
           Mixed         42   8.40 %
           BSD Based      4   0.80 %
           Mac OS         3   0.60 %
           Windows        2   0.40 %

2000.June  Unix         453  90.60 %
           Linux         28   5.60 %
           BSD Based     17   3.40 %
           N/A            2   0.40 %

+------------- Processor Family ----------------+
Date       Processor       Count  Share %

2007.June  Intel EM64T       231  46.20 %
           AMD x86_64        107  21.40 %
           Power              85  17.00 %
           Intel IA-32        28   5.60 %
           Intel IA-64        28   5.60 %
           PA-RISC            10   2.00 %
           NEC                 4   0.80 %
           Sparc               3   0.60 %
           Alpha               2   0.40 %
           Cray                2   0.40 %
           Intel + AMD       394  78.80 %

2000.June  Power             143  28.60 %
           Sparc             122  24.40 %
           MIPS               62  12.40 %
           Alpha              56  11.20 %
           PA-RISC            53  10.60 %
           NEC                25   5.00 %
           Fujitsu            19   3.80 %
           Hitachi SR8000     10   2.00 %
           Cray                6   1.20 %
           Intel IA-32         3   0.60 %
           Intel i860          1   0.20 %
+-----------------------------------------------+
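Each share in these tables is simply count / 500 x 100. As a quick check of the percentages quoted in the text, a short script can recompute the shares and total the Intel and AMD processor families for both years. The counts are copied from the Processor Family table; the names `PROCESSORS_2007`, `share` and `intel_amd` are hypothetical.

```python
# Counts copied from the June 2007 and June 2000 Processor Family tables.
PROCESSORS_2007 = {
    "Intel EM64T": 231, "AMD x86_64": 107, "Power": 85, "Intel IA-32": 28,
    "Intel IA-64": 28, "PA-RISC": 10, "NEC": 4, "Sparc": 3, "Alpha": 2, "Cray": 2,
}
PROCESSORS_2000 = {
    "Power": 143, "Sparc": 122, "MIPS": 62, "Alpha": 56, "PA-RISC": 53,
    "NEC": 25, "Fujitsu": 19, "Hitachi SR8000": 10, "Cray": 6,
    "Intel IA-32": 3, "Intel i860": 1,
}
LIST_SIZE = 500  # the Top-500 list always has 500 systems


def share(count, total=LIST_SIZE):
    """Percentage of the list, rounded to two decimals as in the tables."""
    return round(100.0 * count / total, 2)


def intel_amd(counts):
    """Sum the processor families whose name starts with Intel or AMD."""
    return sum(n for name, n in counts.items() if name.startswith(("Intel", "AMD")))


if __name__ == "__main__":
    for year, counts in (("2000", PROCESSORS_2000), ("2007", PROCESSORS_2007)):
        assert sum(counts.values()) == LIST_SIZE  # each table covers all 500 systems
        n = intel_amd(counts)
        print(year, n, share(n))
```

Run as-is, it reports 4 Intel and AMD systems (0.8%) for June 2000 and 394 (78.8%) for June 2007.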

An open source content management solution

Alfresco is an open source Enterprise Content Management (ECM) alternative to closed products such as Documentum, IBM DB2 Content Management, FileNet, OpenText, Interwoven, Vignette and Microsoft's SharePoint, among others.

Alfresco was developed exclusively with Open Source components such as Spring, Hibernate and Lucene. It represents a standards-based alternative to expensive, closed, commercial ECM products. Referenced standards include JSR-168, JSR-170 and JSR-283.

Alfresco represents a good example of a business based on open source components and open development culture, Bazaar model, resulting in a content management solution that can be tailored to small and large organizations. Alfresco is licensed under GPL.

The list of customers using it for ECM, collaboration, workflow, document, web, records and image management is impressive for a relatively new product.

It is worth mentioning that Alfresco produced a fully functioning, scalable, open ECM alternative to conventional products in less than a year of development effort, using the open source components listed below. It represents a very good example of a collaborative, open and successful business developed by reusing existing components.

Alfresco's selection of open source components is a valuable reference, a list of chosen components among a large selection, as well as an excellent example of application development using existing well tested components free of proprietary licenses, royalties, patents and other demons.

With the exception of Spring, OpenOffice, Hibernate and Lucene, most components are small, cleverly crafted libraries, prepackaged unique functions, chosen by Alfresco to deliver the resulting functional integration in place of custom code.

  • Spring. Spring is an application framework for Java.
  • Open Office. OpenOffice.org is a multiplatform and multilingual office suite.
  • Hibernate. Hibernate is a high performance object/relational persistence and query Java library.
  • Lucene. Apache Lucene is a text search engine library written in Java.
  • MyFaces. Apache MyFaces is an implementation of JavaServer Faces, a web application framework.
  • FreeMarker. FreeMarker is a template engine to generate text output based on templates.
  • Rhino. Rhino is an implementation of JavaScript written in Java typically embedded into Java applications to provide scripting to end users.
  • EHCache. Ehcache is a Java-based distributed cache for general purpose caching.
  • ACEGI. Acegi Security provides applications with authentication, authorization and access control.
  • Log4j. Log4j provides logging for Java applications.
  • jBPM. Workflow and business processes library for Java applications.
  • Axis. Apache Axis is an XML based Web service framework.
  • POI. Apache POI project consists of APIs for manipulating various Microsoft file formats using Java.
  • XFire. XFire facilitates the use of Web Services, via SOAP, from a Java application.
  • Quartz. Quartz is a job scheduling system for Java applications.
  • PDFBox. PDFBox is an open source Java PDF library for working with PDF documents.
  • TinyMCE. TinyMCE is a web based Javascript HTML WYSIWYG editor.
  • Jaxen. Jaxen is a Java library to search and extract information from XML documents, an XPath engine.
  • JCR RMI. Apache Jackrabbit JCR-RMI is a Remote Method Invocation (RMI) layer for the Content Repository for Java (JCR) API; Apache Jackrabbit implements the JCR specification, JSR-170.
Other open source content management systems include:

2007-06-26

Service Oriented Architecture - SOA

Service-Oriented Architecture - SOA - is a key technology for developing network-based services. I was pleased to find a reference to a talk by Patrick Steger at the International Conference on Java Technology.

Just reading the Abstract tells me that the subject is detailed, well structured and accompanied by a working code sample. I'll post the video reference if available; please post it as a comment should you find it; thanks.

On the subject of network-based services, here is a reference to the Economist's article: A battle at the checkout.

Abstract - Standards for an interoperable, secure and flexible SOA

"SOA (Service-Oriented Architecture) is becoming the central strategy for more and more companies and therefore getting business critical. An enduring SOA has to provide very high grades of security and availability combined with good interoperability and usability to protect both, the valuable assets and the often tremendous investments of the company.

Based on WSIT (Web Services Interoperability Technologies, SUN Microsystems) and WCF (Windows communication foundation, Microsoft) an interoperable, secure and flexible SOA is feasible today. This talk will provide you with the theoretical background of the standards you need to know when aiming for that target.

During the talk we will create a simple yet secure and interoperable SOA system centred on the well known Calculator service.

The SOA is built on a step by step basis and introduces the following major security relevant WS-Standards:

  • XML Encryption
  • XML Signature
  • WS-Security
  • WS-MetadataExchange
  • WS Secure Exchange
  • WS-Trust
  • WS-SecureConversation
  • WS-SecurityPolicy
  • Security Assertion Markup Language (SAML)
  • eXtensible Access Control Markup Language (XACML)
For each Standard you will learn its purpose, status and relationship to the other standards.

The final SOA system supports a scenario where a client application requests metadata from a Calculator Service and uses that metadata to obtain the SecurityPolicy of that service. In addition the location of the Authentication Service issuing the required SAML Token to access the Calculator Service is retrieved from the metadata.

The client then authenticates with the central Authentication Service and receives a SAML Token in return. Using the SAML Token the client calls the Calculator Service's add operation.

The Calculator Service validates the SAML Token and asks the central Authorization Service to check the authorization of the client to use the add operation with the given parameters."
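The message flow in the abstract (authenticate, receive a token, call add, validate the token, consult a central authorization service) can be sketched in-process. This is a minimal sketch under stated assumptions, not WSIT or WCF code: an HMAC-signed JSON token stands in for the SAML Token, plain method calls stand in for SOAP messages secured with WS-Security, the metadata and SecurityPolicy lookup is skipped, and every class name and the shared key are hypothetical.

```python
import hashlib
import hmac
import json
import time

# Hypothetical key shared by the Authentication and Calculator services.
# Real SAML tokens are XML signed with XML Signature, usually with an
# asymmetric key pair; HMAC over JSON is only a stand-in here.
SECRET = b"demo-shared-secret"


def _sign(body):
    return hmac.new(SECRET, body.encode("utf-8"), hashlib.sha256).hexdigest()


class AuthenticationService:
    """Checks credentials and issues a signed, short-lived token."""

    def __init__(self, users):
        self.users = users  # username -> password (illustration only)

    def issue_token(self, user, password, ttl=300):
        if self.users.get(user) != password:
            raise PermissionError("authentication failed")
        body = json.dumps({"sub": user, "exp": time.time() + ttl}, sort_keys=True)
        return {"body": body, "sig": _sign(body)}


class AuthorizationService:
    """Central policy decision: may this subject call this operation?"""

    def __init__(self, policy):
        self.policy = policy  # subject -> set of allowed operation names

    def is_allowed(self, subject, operation, args):
        # A richer, XACML-style policy could also inspect args; this one does not.
        return operation in self.policy.get(subject, set())


class CalculatorService:
    def __init__(self, authz):
        self.authz = authz

    def _validate(self, token):
        if not hmac.compare_digest(_sign(token["body"]), token["sig"]):
            raise PermissionError("invalid token signature")
        claims = json.loads(token["body"])
        if claims["exp"] < time.time():
            raise PermissionError("token expired")
        return claims["sub"]

    def add(self, token, a, b):
        subject = self._validate(token)  # 1. is the token genuine and current?
        if not self.authz.is_allowed(subject, "add", (a, b)):  # 2. is it permitted?
            raise PermissionError(f"{subject} may not call add")
        return a + b


if __name__ == "__main__":
    authn = AuthenticationService({"alice": "s3cret"})
    calc = CalculatorService(AuthorizationService({"alice": {"add"}}))
    token = authn.issue_token("alice", "s3cret")
    print(calc.add(token, 2, 3))  # prints 5
```

The design point the sketch preserves is the separation of concerns: the Calculator Service never sees the password; it only checks that the token was issued by a trusted party and delegates the permission decision to the central Authorization Service.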

2007-06-19

Of Open-source, the Web and Hybrid cars

What do Open-source, the Web and Hybrid cars have in common?

They are cool.

Young and not so young, students, professors, geeks, retirees, movie-stars, are using, talking, reading and writing about Open-source software, the Web and Hybrid cars, among other topics. They are participating in, and relating to, day-to-day aspects and events that affect their lives, their families, their neighbors, their surroundings, the environment.

Open-source stood against the country-club and gated-community approach to software development; against an exclusionary way to develop function and value earlier thought available only to a selected few.

Open-source started with the premise that software is priceless and that all should benefit from it; like water, forests, rivers, oceans, fisheries, fauna, flora, atmosphere, etc. These are primordial resources, they are priceless, they must be managed and preserved in such a way that we can truthfully say we left them better than we found them. Same with software source code; you can use it and you can enhance it as long as such enhancements are known and available to all to see, to adopt, to improve, to distribute, to preserve for use by future developers.

The Web also fostered the idea of equal participation, not only to consume but to produce information, a privilege earlier franchised to a selected few. The Web was to be MSN and AOL, available only via paid membership. The Internet and later the Web, thanks to Sir Tim Berners-Lee, changed that and, as with Open-source, created a culture, a much more universal and open model, a way to look at primary resources, in this case applied to data, information and knowledge.

The use and evolution of Hybrid automobiles, and of alternate sources of energy in general, also seem to be largely influenced by concerned individuals, groups and communities. The technology has existed for decades with little if any leadership by governments and industry. The success of Hybrid cars owes much to the ongoing effort to go beyond manufacturers' designs, beyond the 'I have a Hybrid car' statement, and to enhance the efficiency of these vehicles by adopting better batteries, plug-in connections to the grid, solar energy, modified driving habits, etc., reportedly resulting in twice the efficiency of the manufacturer's design.

It was not Toyota, Honda, Ford, GM et al. that made plug-ins for hybrids available. It was consumers who demanded them, and relatively small companies offered better batteries and adapters for connection to the grid and to solar panels, resulting in 17-to-29 percent better fuel efficiency. These are not industry or government initiatives; these are concerned individuals, asking, trying, experimenting, using, measuring, enhancing. Does it sound like Open-source? It does. Also, it helps when movie-stars and politicians get involved. It is cool.

Have a look at the links below for selected references re subject.

Open-source generally implies the adoption of an 'open culture', or model, for software development only. It could very well apply to other domains. Have a look at this article in the Economist proposing the use of the Open-source culture/model in Health Care.
Can goodwill, aggregated over the internet, produce good medicine?
The current approach to drug discovery works up to a point, but it is far from perfect. It is costly to develop medicines and get regulatory approval. The patent system can foreclose new uses or enhancements by outside researchers. And there has to be a consumer willing (or able) to pay for the resulting drugs, in order to justify the cost of drug development.

Pharmaceutical companies have little incentive to develop treatments for diseases that particularly afflict the poor, for example, since the people who need such treatments most may not be able to afford them.

2007-06-18

Server form factors

Pedestal, Rack-Mount and Blades are the common configuration form-factors for server units.

In the quest to maximize computing capacity and minimize floor space, energy consumption and cooling requirements, Rack-Mount units are generally preferred to Blades. Blades may offer better packaging, but they exhibit one distinct shortcoming:

  • Shared computing resources. A Switch is included to share Network and I/O among the packaged Blade computing units. When hosting large applications and/or offering Virtual-Hosting, this resource sharing is a problem, which is why sites generally prefer Rack-Mount units.
The recently announced Sun Blade 6000 offers the packaging advantages of Blade format while configuring ten completely independent servers - no sharing of computing resources among participating server units.

The salient points of this design include the following:
  • More memory and more I/O. Up to double the memory and I/O capacity of competing Blades and Rack-Mount configurations is claimed by Sun
  • Reduced energy requirements. Shared power and cooling saves energy when compared to equivalent Rack-Mount servers
  • Processor choice. AMD Opteron, Intel Xeon and SPARC processor-based server modules, as well as support for Linux, Solaris, and MS Windows operating systems
  • Standard I/O. PCI Express (PCIe), including support for 1 and 10 Gigabit Ethernet technology.
A good test is to see if Google, reportedly using close to half a million Rack-Mount units, switches to Blades instead for its many new Datacenters.

The Web: is it the future Datacenter?

Yahoo, Amazon and Google, among others, are effectively forging an emerging computing model as an alternative to setting up and operating a company's own conventional computing facilities.

This emerging model offers vast amounts of computing resources to small and not so small companies to host their applications and services using a common 'computing cloud', as described in this Economist article on the partnership between Google and Salesforce.com.

Examples of this 'computing in the cloud' include services such as:

  • network.com. Sun's $1/CPU-hr, pay-per-use computing service, offers a catalogue of registered applications as well as the ability to develop, test and operate custom applications across the Internet.
  • Amazon. Amazon's Simple Storage Service, S3, offers unlimited storage via a programmable interface priced at $0.15 per GB-Month of storage used. Elastic Compute Cloud (EC2) offers virtual on-demand Linux images that live in S3 for booting and stopping; root access is provided.
  • salesforce.com. "Planning and implementing customer relationship management (CRM) solutions can be a significant undertaking. Salesforce.com's Successforce helps you succeed by unlocking the power of our business solutions and providing you with the greatest value from your investment."
  • Google. Several Google services and APIs as referenced here.
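To make pay-per-use pricing concrete, here is a minimal back-of-the-envelope sketch in Python using only the two prices quoted above: $1 per CPU-hour for network.com and $0.15 per GB-month for S3 storage. The workload figures are hypothetical, and a real bill would include other charges (data transfer, for example) that are not modeled here.

```python
# Back-of-the-envelope cloud cost estimate using the per-unit prices quoted in the post.
# The workload in the example below is hypothetical.

CPU_HOUR_PRICE = 1.00           # network.com: $1 per CPU-hour
STORAGE_GB_MONTH_PRICE = 0.15   # Amazon S3: $0.15 per GB-month of storage used


def compute_cost(cpus: int, hours: float) -> float:
    """Cost of running `cpus` processors for `hours` hours each."""
    return cpus * hours * CPU_HOUR_PRICE


def storage_cost(gigabytes: float, months: float) -> float:
    """Cost of keeping `gigabytes` of data stored for `months` months."""
    return gigabytes * months * STORAGE_GB_MONTH_PRICE


if __name__ == "__main__":
    # Hypothetical job: 100 CPUs for 8 hours, plus 500 GB kept for 12 months.
    cpu = compute_cost(cpus=100, hours=8)
    disk = storage_cost(gigabytes=500, months=12)
    print(f"compute: ${cpu:,.2f}")          # compute: $800.00
    print(f"storage: ${disk:,.2f}")         # storage: $900.00
    print(f"total:   ${cpu + disk:,.2f}")   # total:   $1,700.00
```

No servers are bought up front: the cost scales with what is actually used, which is the point the two groups listed below would find attractive.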
This trend does not mean that Web-based services will replace Datacenters. What will likely happen is that 'computing in the cloud' becomes increasingly attractive to at least two groups of applications/services:
  1. New companies/services. Avoiding the large capital expenditure, logistics and operating cost of one or more computing facilities will be very attractive for new businesses and services.
  2. New applications within large corporations. Often in-house IT organizations are unable to respond rapidly to new applications/services and corporate end-users may look to external providers for solutions.
References

2007-06-07

Sun's Zettabyte File System - ZFS

A weak point of Unix and Linux operating systems is the File System.

Defining and managing disks is difficult: configuring and managing external storage requires expertise and is a complex, error-prone task. More serious still are File System corruption, the inability to rebuild, and excessive recovery times, problems present even with newer Journal/Log-based File Systems.


Sun's Zettabyte File System, ZFS, is a fresh approach to how data is structured and stored on a disk or set of disks, one that addresses both ease of use and File System integrity. The salient features of ZFS include:

  • Integrity. The system does not overwrite data in place; it writes new data first and only then frees the data it replaces (copy-on-write). It also includes several built-in checks, such as checksums, to detect and prevent data corruption.
  • Capacity. Wikipedia has a good reference on the capacity metrics of ZFS: "ZFS is a 128-bit file system, so it can store 18 billion billion (18.4 x 10^18) times more data than current 64-bit systems." The limits of ZFS are such that they are unlikely to be encountered in practice. The capacity limit of one ZFS storage pool, or zpool, is 2^128 = 3.4 x 10^38 bytes, and a system can have up to 2^64 = 1.8 x 10^19 zpools.
  • Snapshot. Read-only, point-in-time copies of the File System can be taken at any time with minimal overhead and kept available concurrently as required, giving on-line access to previous file versions and simplifying Backups. This is a very useful feature, and it appears to use the same or a similar approach as NetApp's highly successful storage units with the WAFL File System.
  • Quotas. The system supports quotas at a File System level.
  • Management. ZFS uses the concept of pooled storage; simply plug in additional drives, without worrying about storage parameters such as volumes or partitions. This approach significantly reduces the labour required to define, expand as needed and manage storage.
  • Architecture. ZFS implementations exist for SPARC and for Intel/AMD x86.
  • Implementations. Initially it was available as part of Solaris, but now a Linux implementation is available and most recently it is rumored to be included in the upcoming Mac OS X 10.5, Leopard.
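The Integrity and Snapshot features can be pictured with a toy model. The Python sketch below is not how ZFS is implemented; it is a hypothetical in-memory block store (the class and method names are made up for illustration), assuming a simplified design in which every block is stored with a SHA-256 checksum, updates are copy-on-write (new block written first, then the pointer switched, then the old block freed unless a snapshot still references it), and a snapshot is just a saved copy of the pointer table.

```python
import hashlib
from typing import Dict, Optional, Tuple


class ToyCowStore:
    """Hypothetical copy-on-write block store with checksums (illustration only, not ZFS)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, Tuple[bytes, str]] = {}   # block id -> (data, checksum)
        self.live: Dict[str, int] = {}                   # file name -> block id
        self.snapshots: Dict[str, Dict[str, int]] = {}   # snapshot -> frozen pointers
        self.next_id = 0

    def _write_block(self, data: bytes) -> int:
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = (data, hashlib.sha256(data).hexdigest())
        return block_id

    def _referenced(self, block_id: int) -> bool:
        tables = [self.live, *self.snapshots.values()]
        return any(block_id in table.values() for table in tables)

    def write(self, name: str, data: bytes) -> None:
        new_id = self._write_block(data)   # 1) write new data to a fresh block
        old_id = self.live.get(name)
        self.live[name] = new_id           # 2) switch the pointer
        # 3) free the old block only if no snapshot still references it
        if old_id is not None and not self._referenced(old_id):
            del self.blocks[old_id]

    def read(self, name: str, snap: Optional[str] = None) -> bytes:
        table = self.live if snap is None else self.snapshots[snap]
        data, checksum = self.blocks[table[name]]
        # verify the checksum recorded at write time before returning the data
        if hashlib.sha256(data).hexdigest() != checksum:
            raise OSError(f"checksum mismatch reading {name!r}")
        return data

    def snapshot(self, snap: str) -> None:
        # a snapshot copies only the small pointer table, not the data blocks
        self.snapshots[snap] = dict(self.live)


if __name__ == "__main__":
    store = ToyCowStore()
    store.write("report.txt", b"version 1")
    store.snapshot("monday")
    store.write("report.txt", b"version 2")
    print(store.read("report.txt"))            # b'version 2'
    print(store.read("report.txt", "monday"))  # b'version 1'
```

Because old data is never overwritten, a crash between steps 1 and 2 leaves the previous version intact, and a snapshot costs only a copy of the pointers, which is why it can be taken at any time with little overhead.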
Here is a link to Jeff Bonwick's blog. Jeff is the project lead for ZFS at Sun.

Should you be interested in more detail, read it here and see it here.
Quantities of bytes

Name        Symbol   Standard (SI)       Historical (binary)
kilobyte    kB       1000^1 = 10^3       1024^1
megabyte    MB       1000^2 = 10^6       1024^2
gigabyte    GB       1000^3 = 10^9       1024^3
terabyte    TB       1000^4 = 10^12      1024^4
petabyte    PB       1000^5 = 10^15      1024^5
exabyte     EB       1000^6 = 10^18      1024^6
zettabyte   ZB       1000^7 = 10^21      1024^7
yottabyte   YB       1000^8 = 10^24      1024^8
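The two columns drift apart as the prefixes grow. The short Python sketch below (an illustration, not taken from the linked sources) prints the historical-to-standard ratio for each prefix in the table and expresses the zpool limit quoted above, 2^128 bytes, in decimal zettabytes.

```python
# SI ("standard", powers of 1000) versus historical (powers of 1024) byte prefixes.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def standard_bytes(power: int) -> int:
    """Standard (SI) value of the prefix: 1000 ** power bytes."""
    return 1000 ** power


def historical_bytes(power: int) -> int:
    """Historical (binary) value of the prefix: 1024 ** power bytes."""
    return 1024 ** power


if __name__ == "__main__":
    for power, prefix in enumerate(PREFIXES, start=1):
        ratio = historical_bytes(power) / standard_bytes(power)
        # the ratio grows from 1.0240 (kilo) to about 1.2089 (yotta)
        print(f"{prefix + 'byte':<10} historical/standard = {ratio:.4f}")

    # ZFS zpool limit of 2^128 bytes expressed in decimal zettabytes (10^21 bytes).
    zpool_limit = 2 ** 128
    print(f"{zpool_limit:.3e} bytes = {zpool_limit / standard_bytes(7):.3e} ZB")
    # prints: 3.403e+38 bytes = 3.403e+17 ZB
```

At the yottabyte level the historical unit is already about 21% larger than the standard one, so it matters which convention a capacity figure uses.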

2007-06-05

Google's infrastructure: scalability++

Google has introduced a vast array of products, with search remaining the main focus.

It is interesting to learn how Google does it and what internal IT services lie behind it.


Here is a list of Google products and of programming APIs.

A good place to start is Google's mission statement:


"To organize the world's information and make it universally accessible and useful."

Wow; this is a formidable challenge, and it offers good insight into the underlying data volumes, storage, computational complexity, and network and computing topology across the world needed to design, build and deliver Google's work.

I found this video on the subject of much interest. It describes essential building blocks that offer a glimpse of the technology behind Google's services; they include:

  • Computers. Google uses hundreds of thousands of conventional computers, plain vanilla Intel/AMD x86 units, running a custom Linux OS.
  • GFS - Google File System. GFS is a distributed file system that serves as the basic unit of storage for higher-level abstractions such as BigTable.
  • BigTable. BigTable is a storage abstraction for managing structured data, designed to scale to petabytes (10^15 bytes) of data.
  • MapReduce. MapReduce: Simplified Data Processing on Large Clusters. A programming model in which a task over a large data set is expressed as a map function, which turns input records into intermediate key/value pairs, and a reduce function, which combines all values that share the same key; the system runs these functions in parallel across a Cluster.
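BigTable is easier to picture with its data model in mind: Google's BigTable paper describes it as a sparse, sorted map indexed by row key, column key and timestamp, with each value stored as uninterpreted bytes. The in-memory Python sketch below models only that data model; distribution across machines and persistence on GFS are left out, and the class and method names are hypothetical. Row keys are reversed host names such as com.cnn.www, as in the paper's example, so that pages from the same domain sort next to each other.

```python
from collections import defaultdict
from typing import DefaultDict, Iterator, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory model of the BigTable data model:
    (row key, column key, timestamp) -> value (uninterpreted bytes)."""

    def __init__(self) -> None:
        # row key -> column key ("family:qualifier") -> [(timestamp, value)], newest first
        self.rows: DefaultDict[str, DefaultDict[str, List[Tuple[int, bytes]]]] = (
            defaultdict(lambda: defaultdict(list))
        )

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self.rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # keep newest version first

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Newest value written at or before `timestamp` (newest overall if None)."""
        if row not in self.rows:
            return None
        for ts, value in self.rows[row].get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row: str, end_row: str) -> Iterator[str]:
        """Row keys in sorted order within [start_row, end_row)."""
        for row in sorted(self.rows):
            if start_row <= row < end_row:
                yield row


if __name__ == "__main__":
    t = ToyTable()
    t.put("com.cnn.www", "contents:", 3, b"<html>v3")
    t.put("com.cnn.www", "contents:", 5, b"<html>v5")
    t.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
    print(t.get("com.cnn.www", "contents:"))     # b'<html>v5'
    print(t.get("com.cnn.www", "contents:", 4))  # b'<html>v3'
```

Keeping several timestamped versions per cell and keeping rows sorted are what let a real deployment serve both "latest page" lookups and range scans over related rows.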
This reference discusses "An Economic Case for Chip Multiprocessing."

I found this blog, which offers more detail about Google's infrastructure.
And this one, which I found recently, has much more information on the subject.

Also this one has a summary report from Google's Scalability Conference:
At Google they do a lot of processing of very large amounts of data. In the old days, developers would have to write their own code to partition the large data sets, checkpoint code and save intermediate results, handle failover in case of server crashes, and so on as well as actually writing the business logic for the actual data processing they wanted to do which could have been something straightforward like counting the occurrence of words in various Web pages or grouping documents by content checksums. The decision was made to reduce the duplication of effort and complexity of performing data processing tasks by building a platform technology that everyone at Google could use which handled all the generic tasks of working on very large data sets. So MapReduce was born.
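Counting word occurrences, mentioned in the quote, is the classic MapReduce example. The single-process Python sketch below follows the structure of the model: a map function emitting key/value pairs, a shuffle that groups values by key, and a reduce function applied per key. It illustrates the programming model only; the function names are hypothetical, and none of the partitioning, checkpointing or failover the real system handles is included.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one document.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    # Reduce: combine all values emitted for the same key.
    return sum(counts)


def run_mapreduce(documents: Dict[str, str]) -> Dict[str, int]:
    # Shuffle: group intermediate values by key. On a real cluster this step
    # moves data between machines; here it is just a dictionary of lists.
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    docs = {"page1": "the web is big", "page2": "the web is the platform"}
    print(run_mapreduce(docs)["the"])  # 3
```

The developer writes only `map_fn` and `reduce_fn`; everything else is generic, which is exactly the duplication of effort the platform was built to remove.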

2007-06-01

Google Gears

Google released Gears, a browser extension that enables people to access Web applications while working off-line.

This is a significant development in defining software and associated APIs and in charting the evolution of the Web-platform. You can read about Gears at Google's code.google.com/apis/gears; a good summary is also available at CNet.

Salient points include:

  • API. A defined API for the Web-platform; it defines access to 1) a local Web-Server (yes, an embedded Web server) to cache and serve HTML, JavaScript, images, etc.; 2) a local Database, SQLite, to store and search data locally; and 3) WorkerPool, which allows asynchronous operation so applications stay responsive, e.g. while doing parallel work.
  • Open-source. BSD open-source license.
  • The Google culture. A geek-friendly culture that fosters creativity among Google employees and outside developers. The first Google application to use Gears, Google Reader, was done as part of the company's program in which employees can work on their own projects for 20 percent of their work week.
  • Participation of other companies. A number of companies working on Ajax-based applications are working with Google to define and use Gears; these include Adobe, Opera, Mozilla and the Dojo Foundation, creators of the Dojo Toolkit.
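Gears itself is used from JavaScript in the browser, so the following is only an analogy, not the Gears API. It is a Python sketch, using the standard library's sqlite3 module (SQLite being the same database engine the Database bullet names), of the off-line pattern described above: keep data in a local SQLite database, search it while disconnected, and queue changes to send to the server once the connection returns. The table names, functions and the `send` callback are hypothetical.

```python
import sqlite3
from typing import Callable, List, Tuple


def open_local_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    # Outbox of changes made while off-line, to be sent to the server later.
    conn.execute("CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, body TEXT)")
    return conn


def save_note(conn: sqlite3.Connection, body: str) -> None:
    with conn:  # commit the local copy and the queued change together
        conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        conn.execute("INSERT INTO outbox (body) VALUES (?)", (body,))


def search_notes(conn: sqlite3.Connection, term: str) -> List[str]:
    rows = conn.execute(
        "SELECT body FROM notes WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [body for (body,) in rows]


def sync(conn: sqlite3.Connection, send: Callable[[str], bool]) -> int:
    """Send queued changes in order; drop each one the (hypothetical) server accepts."""
    pending: List[Tuple[int, str]] = conn.execute(
        "SELECT id, body FROM outbox ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, body in pending:
        if not send(body):
            break  # still off-line: keep the remaining changes queued
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        sent += 1
    return sent


if __name__ == "__main__":
    conn = open_local_store()
    save_note(conn, "Gears works off-line")
    save_note(conn, "Buy a Foleo?")
    print(search_notes(conn, "off-line"))  # ['Gears works off-line']
    print(sync(conn, send=lambda body: True))  # 2
```

Searching never touches the network, so the application keeps working while disconnected; only `sync` needs a connection, and a failed send simply leaves the change queued for the next attempt.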
The volume of work, products, services and APIs that Google is releasing is shaping the direction of the Web-platform; in this case, it defines how a Web Browser and Web-Applications can work in a disconnected, standalone manner. Products such as Palm's Foleo may indeed use this technology, and Adobe and others will likely use it to offer their applications in both connected and disconnected modes.

Google's Gears blog is located here.

I found this blog with some very interesting thoughts on the impact of a universally accepted method, a standard, for content synchronization across the Web.
The Web platform's promise is access to content anytime, anywhere and on anything—as long as the user has Internet access. Google Gears could bring some of that information off-line, further extending that promise. Universal synchronization would be game-changing, however; it would be a paradigm shift for digital devices, desktop software and the Web.

2007-05-31

Linux Fedora 7

Red Hat introduced Fedora 7, which offers an open-source build capability for creating custom distributions, thus attracting new products and developers to Red Hat to develop custom Linux configurations.

It offers a number of key features as described in Red Hat's Fedora site. The most prominent enhancements include:

  • Open build process. Custom Linux configurations can use the same build process and tools used by Red Hat; these tools and processes are open-sourced.
  • Architectures. Supports Intel/AMD x86 (32- and 64-bit) and PowerPC (ppc).
  • Virtualization. Virtualization is supported via KVM (Kernel-based Virtual Machine) and QEMU, as well as Xen.
  • Virtualization manager. The Fedora graphical virtualization manager can be used to manage Virtual Machine instances.
  • Kernel. Fedora 7 is built on the 2.6.21 Linux kernel.
Fedora, Ubuntu and other Linux distributions share a great feature that differentiates them from Novell's SuSE and Xandros: they are free of patent agreements with Microsoft.

Palm's Foleo

Palm's Foleo is a small Linux-based laptop introduced as a companion to a cell-phone.

I would buy one, and would expect wide acceptance and use in fields such as education and health, if it came with Ubuntu, Ethernet in addition to WiFi, and support for C, Perl, PHP and Java. Many generic and custom applications would find this unit and form factor useful.

As is, designed and marketed as an add-on gadget for a cell-phone, Foleo may not gain wide acceptance; why use a laptop that depends on a cell-phone to operate?

The upside of Foleo may be that Palm evolves it to offer standalone operation, i.e. without requiring a cell-phone, with Ubuntu or an Ubuntu-like Linux 'presentation', possibly using tools such as Google's Gears; the result would be a powerful compact unit, whether connected to the Internet or a corporate network, or used as a standalone computing device.