This is an article I have wanted to write for a while...
My first impression of the iPhone UI is that advanced human-computer interaction theories and principles guide the overall design. The final implementation of its various gadgets doesn't look like the creation of individual developers; it looks like a rare solution to a set of very complicated criteria. What I mean is that it is new and strange, not because someone just wanted something new and fancy, but because it is a hard-won solution that satisfies multiple design principles and human factors at once.
If that impression is right, an effective competing UI system would probably need to consider almost all the background criteria and factors that constrain and motivate the iPhone/iPad UI, and then propose another solution that also satisfies all of them but looks significantly different. That is a path to a strong UI that may win immediate customer approval and popularity, as the iPhone did. The iPhone's immediate success lies in the difficulty of the UI problem it solved.
This explains why there is only one iPhone -- it may be an outstanding "local maximum" in the huge design landscape. Android's UI so far is quite primitive, though it includes useful features that OEMs can extend. There is a very good chance that the openness of the implementation will generate competitive and sleek phones -- the more attempts are made, the more probable it is that something great and stunning will come out. The Linux GUI took more than a decade to reach, in Ubuntu, a level similar to Windows and Mac OS. With the Linux success to borrow from, and given the relative simplicity of the system, Android has already reached the same level as Windows Mobile and RIM. Its only competitor is the iPhone.
A likely approach to defeating the iPhone is indeed to treat Android's openness as an advantage in a "numbers game" -- a game that Apple's perfectionist, one-of-a-kind approach will not play. When many phone makers are trying new ideas, something cool and satisfying may come out even if the background secrets of the iPhone are not rediscovered and religiously followed.
In my impression, some of the hidden secrets or driving factors of the iPhone UI design are listed below (a more complete list would require more formal study and research):
Goals of the design (it's just mathematics):
1. Minimize a new user's (or all potential users' total) learning time (expanded in 3 below).
2. Maximize the user's experience of productivity and pleasure.
a. This includes minimizing the total time the user's working memory is overloaded. Given that people's working memory span is 7 plus/minus 2 items, any fixed menu the user has to choose from should contain no more than 7 items. A menu with more than 7 items should be broken into groups of 7 or so. If long lists are needed, minimize the time the user spends interacting with them (scroll them out of the way ASAP). If the lists are long by nature, provide a search box that takes partial or (even better) incremental input.
b. Besides reducing the load on the user's working memory, the design should also reduce the energy spent on (selective) attention and movement. This includes making the most frequently used options big (easy to see without squinting, since squinting takes energy) and positioning them near the thumb's default positions (reducing movement distance and muscle fatigue). The bigger something is, the less squinting energy is spent -- the outcome of this principle is the unprecedentedly large combo boxes (a half-screen pop-out) instead of an inline pain.
c. Both a and b are achieved by cutting out non-essential operations, or by creating bigger umbrella operations that shorten a long procedure and hide further details (thus minimizing interaction time and, again, energy spent). The less energy spent, the happier the user is -- this is the outcome of evolution. The umbrella operation idea is similar to the advent of Wizards in Windows applications, which keep people from getting lost in big menu trees or being reluctant to try all the menu-dialog complexes. But it's a further simplification. A good example is the WiFi selector -- you either choose one of the listed networks, or just enter the network name (the SSID) if it is not broadcast. You don't need to know what an SSID is (this is a headache). You don't have to worry about WEP vs. WPA2; it will try them for you.
3. Last but not least is to maximize the relevance of prior knowledge and experience. In real life you have never seen things disappear immediately after you choose them, so you don't see that on the iPhone either. Though opening and closing views of objects (windows) is now common, thanks to the MVC model in Mac OS and Windows, we shouldn't take it for granted: it does take extra mental effort and energy to understand the computer's model of its internal objects (even though we do so unconsciously). Why shouldn't the UI use our model of the world -- things don't disappear, but they can be moved away, or we can come closer to or move farther from them (scaling their appearance)? This is a key principle driving the iPhone UI design -- our existing model of the world already provides us with a lot of "rules of thumb" for operating on things. If you show me something that looks like something I know and like, I know what to do with it. If you make use of what I want to do with it, you can use the idea to create a gadget that doesn't take any time to learn. So one way of making a collection of learning-free gadgets is to exploit things we like and are used to: buttons, light switches, pieces of paper, books, films, "radio buttons" (bulleted selections), etc. The most important is the piece-of-paper model (attached to a rubber band, because paper alone is too breakable and unpleasant) -- it is used to motivate almost everything that is big: long lists, main app menus, etc. We may think it's weird to aggressively model the physics of rubber bands or springs, but that physics is what we naturally bump into when we don't keep track of the sizes or boundaries of big things -- the thing simply will not move when you stretch it too far, and you don't have to spend any energy watching its size to avoid breaking it. Such a design is a very good combination of touch-and-feel, minimal taxation of memory and attention, and the intuitive physics that everyone has.
With this, we don't need to learn scrollbars or the difference between scrolling by one line and scrolling by one page, and we don't even need to know where we are in the list -- it doesn't matter until we hit a boundary, and then the physics tells us.
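The rubber-band behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Apple's implementation: the function names `rubber_band` and `_damp`, the `stiffness` value of 0.55, and the particular damping curve are all assumptions chosen for the example. Inside the list's bounds the content follows the finger exactly; past a boundary it still moves, but less and less, and never by more than one view length.

```python
def _damp(overshoot, limit, stiffness):
    """Compress an overshoot distance so it approaches, but never reaches, `limit`."""
    return limit * (1.0 - 1.0 / (overshoot * stiffness / limit + 1.0))


def rubber_band(offset, content_length, view_length, stiffness=0.55):
    """Map the offset requested by the finger to the offset actually shown.

    offset         -- where the drag would put the content with no resistance
    content_length -- total length of the list
    view_length    -- length of the visible window (must be > 0)
    stiffness      -- hypothetical tuning constant; larger means a softer band
    """
    max_offset = max(content_length - view_length, 0.0)

    # Inside the bounds: the "paper" follows the finger one-to-one.
    if 0.0 <= offset <= max_offset:
        return float(offset)

    # Dragged past the top: resist increasingly as the overshoot grows.
    if offset < 0.0:
        return -_damp(-offset, view_length, stiffness)

    # Dragged past the bottom: same resistance, mirrored.
    return max_offset + _damp(offset - max_offset, view_length, stiffness)


if __name__ == "__main__":
    for requested in (-200, -50, 0, 300, 600, 650, 2000):
        shown = rubber_band(requested, content_length=1000, view_length=400)
        print(f"requested {requested:6} -> shown {shown:8.1f}")
```

A real scroll view would also animate the content back to the nearest boundary when the finger lifts; that step is left out here to keep the sketch short.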
The above are some very simple principles (all of them human-centered) that I strongly believe motivate the designs: minimizing working memory and attention requirements, minimizing finger and eye movements, minimizing distractions, and maximizing the use of effortless long-term world knowledge and well-established habits of response to familiar things.
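The working-memory guideline in item 2a is concrete enough to sketch in code. The Python below is a minimal illustration under my own assumptions: the names `chunk_menu`, `incremental_filter` and `MAX_ITEMS` are made up for the example, and substring matching is just one simple way to accept partial input.

```python
MAX_ITEMS = 7  # upper end of a comfortable menu, per the "7 plus/minus 2" rule of thumb


def chunk_menu(items, size=MAX_ITEMS):
    """Break a long fixed menu into groups of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def incremental_filter(items, typed):
    """Return the entries that contain what the user has typed so far.

    Called again after every keystroke, so the list shrinks as the user types.
    Matching ignores case; an empty string matches everything.
    """
    needle = typed.casefold()
    return [item for item in items if needle in item.casefold()]


if __name__ == "__main__":
    settings = [f"Option {n}" for n in range(1, 17)]
    for group in chunk_menu(settings):
        print(group)

    contacts = ["Alice", "Bob", "Alicia", "Carol", "Malik"]
    for typed in ("a", "al", "ali", "alic"):
        print(repr(typed), "->", incremental_filter(contacts, typed))
```

Grouping keeps each visible menu within working-memory range, while the incremental filter handles lists that are long by nature without asking the user to scan them.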
The only exception to the principle of making things easier is the power-down operation: you have to press the power key for a while, then carefully slide a handle held by an imaginary spring to the end position... That makes it harder to power down. Everything else is easy for you -- so enjoy.