WEEK 13

WEB SERVER HARDWARE & SOFTWARE


Types of Web Sites

*Web site planning is the first step
– Determine site goals
– Estimate visitors, types of files
– Assess existing information technology staff

Five Web site categories:
1. Development sites: evaluate Web designs
2. Intranets: house internal information
3. Extranets: allow outside party access
4. Transaction-processing sites: commerce sites
5. Content-delivery sites: deliver news, histories, summaries, digital content


Web Clients & Web Servers

*Client/server architectures– Client requests services from server

*Client computer– Uses Web browser software (Web client software)

*Server computer– More memory and larger, faster disk drives

*Platform neutral Web software
– Various computers communicate easily, effectively
– Critical ingredient for rapid spread, widespread Web acceptance


Dynamic Content

*Server performance is affected by the mix and type of Web pages delivered to clients

*Dynamic page– content of the Web page sent to the client is shaped by a program

*Static page
– Unchanging page retrieved from disk
– Sometimes stored in Web server’s active memory

Static versus dynamic page delivery:
– Static page requires less computing power
– Servers delivering mostly static pages perform better
*Dynamic content
– Nonstatic information constructed in response to Web client’s request
– Example: order inquiry with unique customer number
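The difference between static and dynamic delivery can be sketched in Python. This is a minimal illustration, not any real server's code: the page table, the ORDERS data, the "/order-status" path, and the handle_request function are all hypothetical stand-ins. A static page is a simple lookup; the order-inquiry page is built fresh for the customer number in each request.

```python
import html
from typing import Optional

# Static pages: stored once and returned unchanged on every request (hypothetical content)
STATIC_PAGES = {
    "/about.html": "<html><body><h1>About Us</h1></body></html>",
}

# Hypothetical order records standing in for a database lookup
ORDERS = {
    "C1001": ["Blue widget", "Red gadget"],
}


def handle_request(path: str, customer_number: Optional[str] = None) -> str:
    """Return HTML for a request: a stored static page or a page built on the fly."""
    if path in STATIC_PAGES:
        # Static delivery: a simple lookup, little computing power needed
        return STATIC_PAGES[path]
    if path == "/order-status" and customer_number is not None:
        # Dynamic delivery: content is constructed for this customer's request
        items = ORDERS.get(customer_number, [])
        rows = "".join(f"<li>{html.escape(item)}</li>" for item in items)
        if not rows:
            rows = "<li>No orders found</li>"
        return (f"<html><body><h1>Orders for {html.escape(customer_number)}</h1>"
                f"<ul>{rows}</ul></body></html>")
    return "<html><body><h1>404 Not Found</h1></body></html>"


print(handle_request("/order-status", "C1001"))
```

The extra work in the dynamic branch (lookup, string building, escaping) runs on every request, which is why servers delivering mostly static pages tend to perform better.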

*Server-side scripting
– Used by first Web sites providing dynamic pages
– Also called “server-side includes” or “server-side technologies”

*Web server programs create Web pages before sending pages back to client
*Server-side technologies are slow
*Large online business Web sites need faster alternatives

*Dynamic page-generation technologies
– Popular tools to generate dynamic Web pages and make them interactive
a. AJAX (asynchronous JavaScript and XML): creates interactive Web sites that look like applications
b. Ruby on Rails: creates dynamic Web pages with application-like interfaces
c. Python scripting language

Dynamic page-generation technologies examples:

• Microsoft Active Server Pages (ASP)
• Sun Microsystems JavaServer Pages (JSP)
• Open-source Hypertext Preprocessor (PHP), developed by The PHP Group (commonly run with the Apache Web server)
• Adobe ColdFusion

*Dynamic Web page creation (server-side scripts mix with HTML tagged text)

*Java servlets (server-side programs created using Java programming language (Sun))

The future of dynamic Web page generation
*Criticisms of previous approaches
– Do not solve problem of dynamic page generation
– Shift dynamic page creation from HTML coders to ASP (JSP, PHP) programmers

*Apache Cocoon project initiative
– Query XML formatted data and generate output in multiple formats
– HTML output: useful for dynamic Web page creation
– May apply style sheet to data: tailored response
– Portable Document Format (PDF) file, Wireless Markup Language (WML) file
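The Cocoon idea of querying XML-formatted data once and styling it into several output formats can be sketched as follows. This is a Python stand-in using the standard library's xml.etree, not Cocoon's own (Java, XSLT-based) API; the catalog XML and the query_items, render_html, and render_text functions are hypothetical. The point is that the data-extraction step never changes when a new output style is added.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-formatted data, as a Cocoon-style pipeline might query it
CATALOG_XML = """
<catalog>
  <item sku="A1"><name>Blue widget</name><price>9.99</price></item>
  <item sku="B2"><name>Red gadget</name><price>24.50</price></item>
</catalog>
"""


def query_items(xml_text):
    """Logic step: extract (name, price) pairs from the XML data."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("name"), float(item.findtext("price")))
            for item in root.findall("item")]


def render_html(items):
    """Style step 1: HTML output for a Web browser."""
    rows = "".join(f"<tr><td>{n}</td><td>{p:.2f}</td></tr>" for n, p in items)
    return f"<table>{rows}</table>"


def render_text(items):
    """Style step 2: plain-text output for a different kind of client."""
    return "\n".join(f"{n}: ${p:.2f}" for n, p in items)


items = query_items(CATALOG_XML)
print(render_html(items))
print(render_text(items))
```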

*Latest Cocoon version
– Divides work into four areas of concern
– Limits area interactions to five specific contracts
– Breaks direct connection between logic and style
– Should make future dynamic Web page design easier

*Other initiatives
– Microsoft: Microsoft.NET Framework
– Oracle: includes explicit PHP support (and other scripting languages) in its database products


Web Client / Server Communication

*Web browser requests files from Web server
*Transportation medium: the Internet
*Request formatted by browser using HTTP
*Request sent to server computer
*Server receives request
   – Retrieves file containing requested Web page
   – Formats using HTTP
   – Sends back to client over the Internet
*Client Web browser
   – Browser displays information if it is an HTML page
   – Graphics can be slow to appear
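The request/response exchange described above can be sketched as plain text handling. This simplified Python example only formats and parses messages (no network calls); the host name, path, and sample response are hypothetical, and real browsers send many more headers. It shows the browser's GET request and how the status line, headers, and body of the server's reply are separated.

```python
def format_request(host: str, path: str) -> str:
    """Format a simplified HTTP/1.1 GET request, as a browser might send it."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Accept: text/html\r\n"
            "\r\n")  # blank line ends the request headers


def parse_response(raw: str):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])  # e.g. "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# A hypothetical response the server might send back for an HTML page
SAMPLE_RESPONSE = ("HTTP/1.1 200 OK\r\n"
                   "Content-Type: text/html\r\n"
                   "Content-Length: 28\r\n"
                   "\r\n"
                   "<html><body>Hi</body></html>")

print(format_request("www.example.com", "/index.html"))
status, headers, body = parse_response(SAMPLE_RESPONSE)
print(status, headers["content-type"])  # browser renders body because it is text/html
```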


Three-Tier & N-Tier Client / Server Architectures

*Three-tier architecture
– Extends two-tier architecture
– Allows additional processing before server responds to client’s request

*N-tier architectures
– Higher-order architectures; more than three tiers

*Third tier supplies information to Web server (databases and related software applications)

Four, five (or more) tiers include:
– Software applications (like three-tier systems)
– Databases, database management programs

Example: catalog-style Web site
– Search, update, display functions
*Track customer purchases stored in shopping carts, look up sales tax rates, keep track of customer preferences, query inventory databases, keep company catalog current
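A catalog-style site's tiers can be sketched in Python with an in-memory SQLite database. This is an illustrative layout under assumed names: the products table, the 7 percent tax rate, and the make_database, lookup_product, and product_page functions are hypothetical. The browser is the first tier (not shown), product_page plays the Web server (second tier), and the database plus lookup_product form the third tier that supplies information.

```python
import sqlite3


# Third tier: database plus the software application that queries it
def make_database() -> sqlite3.Connection:
    """Create an in-memory product table (hypothetical catalog data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products "
               "(sku TEXT PRIMARY KEY, name TEXT, price REAL, stock INTEGER)")
    db.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
        ("A1", "Blue widget", 9.99, 12),
        ("B2", "Red gadget", 24.50, 0),
    ])
    return db


def lookup_product(db, sku, tax_rate):
    """Application logic: query inventory and apply a sales tax rate."""
    row = db.execute("SELECT name, price, stock FROM products WHERE sku = ?",
                     (sku,)).fetchone()
    if row is None:
        return None
    name, price, stock = row
    return {"name": name,
            "total": round(price * (1 + tax_rate), 2),
            "in_stock": stock > 0}


# Second tier: the Web server turns third-tier results into HTML for the client (first tier)
def product_page(db, sku, tax_rate=0.07):
    info = lookup_product(db, sku, tax_rate)
    if info is None:
        return "<p>Product not found</p>"
    status = "In stock" if info["in_stock"] else "Out of stock"
    return f"<h1>{info['name']}</h1><p>${info['total']:.2f} incl. tax ({status})</p>"


db = make_database()
print(product_page(db, "A1"))
```

In an n-tier design, lookup_product could itself move to a separate application server, adding a tier between the Web server and the database.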


Operating Systems for Web Servers

*Operating system tasks
– Running programs, allocating computer resources, providing input and output services
– More responsibilities (large systems)
– Tracking multiple users, ensuring no interference

*Microsoft Windows Server products
– Considered simple to learn and use
– Raise security concerns

*Linux-, UNIX-based products
– Popular
– Considered secure as Web servers

*Linux (open-source operating system)
– Fast, efficient, installs easily
– Open-source software
– Developed by community of programmers
– Software available for download (free)
– Others use it, improve it, submit improved versions

*Companies selling Web server computers include Linux in default configurations
*Companies may buy Linux through commercial distributors

*Sun Microsystems
– Sells Web server hardware
– Solaris: UNIX-based operating system


Web Server Software

Commonly used Web server programs:
– Apache HTTP Server
– Microsoft Internet Information Server (IIS)
– Sun Java System Web Server (JSWS)

*Netcraft (network consulting company) Web survey– measures relative popularity of Web server software

Web server performance differences:
– Workload
– Operating system
– Web pages served

*Choose right server for each business need

*Apache HTTP Server
– 1995: developed from Rob McCool's NCSA HTTPd server code
– Original core system with lots of patches
– Known as “a patchy” server
– Ongoing group software development effort
– Dominated Web since 1996
– Free, performs efficiently
– Included in IBM WebSphere application server package
– Zeus based on Apache open-source code
– Most widely installed Web server software package
– Runs on many operating systems, hardware

*Microsoft Internet Information Server
– Bundled with Microsoft Windows Server operating systems
– Runs only on Windows Server operating systems (by design)
– Used on many corporate intranets
– Popular with companies that have adopted Microsoft products as standards
– ASP, ActiveX Data Objects, SQL database queries
– Microsoft FrontPage Web site development tool, reporting tools
– HTML pages, ActiveX components, scripts can be combined to produce dynamic Web pages

* Sun Java System Web Server (Sun ONE, iPlanet, Netscape)
– Descendant of the original NCSA Web server program
– Former names: Sun ONE, Netscape Enterprise Server, iPlanet Enterprise Server
– AOL-Sun Microsystems partnership called iPlanet
– Agreement expired March 2002
– iPlanet became part of Sun
– Not free: reasonable licensing fee
– Runs on many operating systems
– Used by one percent of all Web servers
– Used by some of the busiest and best-known Internet sites: BMW, Dilbert, E*TRADE, Excite, Lycos, Schwab
– Used by more than 30 percent of all public Web sites
– Used by more than half of top 100 enterprise Web sites
– Supports dynamic application development
– Provides connectivity to database products


Electronic Mail (E-Mail)

Important electronic commerce technologies:

*Web– interactions between Web servers and clients

*E-mail
– Gather information, execute transactions, perform other electronic commerce-related tasks
– 1970s origin: ARPANET
– Most popular form of business communication
– Far surpasses telephone, conventional mail, and fax (in volume)


E-Mail Benefits

Reasons many people are attracted to the Internet:
1. Conveys messages in seconds
     – Simple ASCII text, character formatting
2. Useful e-mail feature
    – Attachments: often the most important part of a message

E-mail uses:
– Confirm receipt of customer orders
– Confirm shipment of items ordered
– Send information about a purchase to buyer
– Announce specials and sales
– Keep in touch with customers


E-Mail Drawbacks

Time spent answering e-mail:
– Managers: five minutes per e-mail
– Average person: two hours a day
– Creating resentment

*Computer virus (through attachments)
– Program attaches itself to another program
– Causes damage when host program activated
– Cost of e-mail convenience
– Virus protection software, dealing with security threats

*Spam (unsolicited commercial e-mail), the most frustrating and expensive e-mail problem


Spam

*Magnitude of spam problem: in one 24-hour period in 2008, 220 billion spam e-mail messages were sent

*Researchers believe that more than 98 percent of all e-mail messages will be spam before effective technical solutions are implemented

*Spam's share of all e-mail is leveling off (approaching 100 percent), but the absolute number of spam messages could continue to grow rapidly

*AOL has taken an active role in limiting spam through legal channels

*Antispam efforts, including e-mail server computer software, limit spam's annoyance and cost


Solutions to the Spam Problem

Some solutions require:
– Passing of new laws
– Technical changes in Internet mail-handling systems

Other approaches:
– Implemented with existing laws and current technologies (requires cooperation from large numbers of organizations and businesses)

Individual user antispam tactics:
– Focus: limit spammers' access to (use of) e-mail addresses
– Use complex e-mail address: xq7yy23@mycompany
– Control e-mail address exposure: software robots harvest addresses from discussion boards, chat rooms, other online sources
– Use multiple e-mail addresses: switch to another if spammers start using one


Solutions to the Spam Problem: Content Filtering

Basic content filtering
– Requires software: Identifies content elements in incoming e-mail message
– Content-filtering techniques differ in which content elements they examine for spam indications and how strictly they apply message classification rules
– Basic content filters examine e-mail headers
– Where the filtering software runs:
    a. Client-level filtering: individual users’ computers
    b. Server-level filtering: mail server computers

*Black list spam filter– looks for From addresses of known spammers in incoming messages

*White list spam filter– looks for From addresses of known good senders in incoming messages

*High false-positive rate: legitimate messages rejected that should not have been
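Black list and white list filtering can be sketched in a few lines of Python. The addresses and the classify function are hypothetical; only email.utils.parseaddr comes from the standard library. The white list branch shows where false positives come from: any legitimate sender not yet on the list is rejected.

```python
from email.utils import parseaddr

# Hypothetical lists; a real filter would load these from user or server settings
BLACK_LIST = {"deals@spam-example.com"}
WHITE_LIST = {"boss@mycompany.example", "orders@supplier.example"}


def classify(from_header, mode="black"):
    """Return 'reject' or 'accept' for a message based on its From address."""
    _, address = parseaddr(from_header)  # "Name <addr>" -> ("Name", "addr")
    address = address.lower()
    if mode == "black":
        # Black list: reject only known spammers; everything else passes
        return "reject" if address in BLACK_LIST else "accept"
    # White list: accept only known good senders; unknown legitimate
    # senders are rejected too (false positives)
    return "accept" if address in WHITE_LIST else "reject"


print(classify("Deals <deals@spam-example.com>"))           # black list mode
print(classify("New Customer <new@client.example>", "white"))
```

Both modes trust the From address, which spammers can forge, so these filters are usually combined with other techniques.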

*Challenge-response content filtering
– Compares all incoming messages to a white list
*If sender not on white list, automated e-mail response sent (challenge)
*Challenge asks sender to reply to e-mail (response)
*Reply must contain response to a challenge presented in the e-mail
– Designed so human can respond easily
– Drawbacks
*Victim bombarded with challenges when the perpetrator puts the victim's e-mail address in the From field of spam
*Doubles amount of useless e-mail messages sent
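The challenge-response flow can be sketched as a small Python class. The class name, the random six-character code used as the challenge, and the message format are hypothetical choices for illustration. Mail from white-listed senders is delivered; other mail is held until the sender replies with the code, at which point the sender is white-listed and the held messages are released.

```python
import secrets


class ChallengeResponseFilter:
    """Sketch of challenge-response filtering (all names are hypothetical)."""

    def __init__(self, white_list):
        self.white_list = {addr.lower() for addr in white_list}
        self.pending = {}  # sender -> {"code": expected answer, "held": [messages]}

    def receive(self, sender, message):
        """Deliver white-listed mail; hold anything else and issue a challenge."""
        sender = sender.lower()
        if sender in self.white_list:
            return "delivered", None
        entry = self.pending.setdefault(
            sender, {"code": secrets.token_hex(3), "held": []})
        entry["held"].append(message)
        # Easy for a human to answer, harder for bulk-mailing software
        challenge = f"To deliver your message, reply with the code {entry['code']}"
        return "held", challenge

    def respond(self, sender, reply_text):
        """If the reply contains the code, white-list the sender and release held mail."""
        sender = sender.lower()
        entry = self.pending.get(sender)
        if entry is None or entry["code"] not in reply_text:
            return []
        del self.pending[sender]
        self.white_list.add(sender)
        return entry["held"]
```

Note that receive sends a challenge to whatever address appears as the sender; if a spammer forged that address, the challenge lands on an innocent victim, which is the drawback listed above.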

*Advanced content filtering
– Uses indicators (words, word pairs, certain HTML codes, information about where a word occurs)
– Looks for spam indicators (entire e-mail message)
– Approach based on branch of applied mathematics (Bayesian statistics)

Problems:
– Spammers stop including defined indicators
– Creating effective content filters is challenging
– Filtering “sex” may delete valid e-mail with “Essex”
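The “sex”/“Essex” problem comes from matching characters rather than whole words. A minimal Python sketch with a hypothetical one-word block list contrasts a naive substring test with a regular-expression test that requires word boundaries.

```python
import re

BLOCKED_WORDS = ["sex"]  # hypothetical keyword list


def naive_match(text):
    """Substring test: flags any message containing the characters 'sex'."""
    lower = text.lower()
    return any(word in lower for word in BLOCKED_WORDS)


def word_match(text):
    """Whole-word test using regex word boundaries on both sides."""
    return any(re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
               for word in BLOCKED_WORDS)


print(naive_match("Meeting at our Essex office"))  # flagged by mistake
print(word_match("Meeting at our Essex office"))   # not flagged
```

Word boundaries fix this false positive but not the other problem listed above: spammers can simply misspell the word and slip past any fixed list.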

*Bayesian revision statistical technique– additional knowledge used to revise earlier probability estimates
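As a worked illustration of Bayesian revision applied to spam, Bayes' theorem revises the probability that a message is spam after a word w is seen in it ("ham" here means not spam). The numbers below are hypothetical, chosen only to show the arithmetic.

```latex
P(\text{spam} \mid w) = \frac{P(w \mid \text{spam})\,P(\text{spam})}
                             {P(w \mid \text{spam})\,P(\text{spam}) + P(w \mid \text{ham})\,P(\text{ham})}

% Hypothetical values: P(spam) = P(ham) = 0.5, P(w | spam) = 0.20, P(w | ham) = 0.01
P(\text{spam} \mid w) = \frac{0.20 \times 0.5}{0.20 \times 0.5 + 0.01 \times 0.5}
                      = \frac{0.10}{0.105} \approx 0.95
```

The earlier estimate (0.5) is revised to about 0.95 once the additional knowledge (the word appeared) is taken into account.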

*Naïve Bayesian filter
– Software begins by not classifying messages
– User reviews messages
– User indicates to software whether each message is spam or not spam
– Software gradually learns which message elements indicate spam
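The training loop above can be sketched as a small naive Bayesian filter in Python. This is an illustrative implementation, not POPFile's code: the class and method names are hypothetical, tokens are just lowercase words, and add-one (Laplace) smoothing is an assumed design choice. Untrained, it returns 0.5 for every message (it does not classify); each labeled message from the user shifts the word statistics.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Trainable naive Bayesian spam filter sketch (illustrative, not POPFile's code)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        """Message elements used as evidence: here, just lowercase words."""
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """User feedback step: label must be 'spam' or 'ham' (not spam)."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        """Estimate P(spam | words), treating words as independent (the 'naive' part)."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior probability of the label, add-one smoothed
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            label_total = sum(self.word_counts[label].values())
            for word in self.tokens(text):
                # Laplace smoothing (+1 slot for unseen words) avoids log(0)
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + vocab_size + 1))
            log_scores[label] = score
        # Convert the two log scores into a probability; clamp to avoid overflow
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


nb = NaiveBayesFilter()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("project report attached for review", "ham")
print(round(nb.spam_probability("buy cheap pills"), 3))
```

Working in log space keeps the product of many small word probabilities from underflowing on long messages.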

POPFile success
*Initially caught 30 percent of spam messages
*After two weeks: caught more than 90 percent
*Eventually: caught more than 99 percent
*False positives: low rate

*POPFile magnet feature: implement white and black list filtering

*Naïve Bayesian filters’ effectiveness (very effective client-level filters; major drawback: users must update filters regularly)


Legal Solutions to the Spam Problem

*January 2004: U.S. CAN-SPAM law went into effect
*Spammers slowed down activities immediately

Seeing no threat of broad federal prosecution:
– Spam rates increased
– Spam estimate: over 80 percent of all e-mail messages

*CAN-SPAM
– Regulates all e-mail messages advertising or promoting a commercial product or service
– Includes messages promoting Web site content
– Prohibits misleading address header information, even in messages facilitating an agreed-upon transaction or updating a customer in an existing business relationship
– Successful prosecution: fines ($11,000) and imprisonment


Technical Solutions to the Spam Problem

– Internet design not intended for today’s needs
*E-mail: incidental afterthought
*No mechanisms ensuring e-mail sender identity

– Internet’s polite set of rules
*Send and wait for acknowledgement (fast)

– Slowing down acknowledgment messages
*Originating computer will slow (must continue to scan for acknowledgment)
*Will not send more messages (to that address) until acknowledgment received
*Requires defending company to develop way to identify computers sending spam

*IBM software: access to large database tracking such computers

*Other vendors: software identifying multiple e-mail messages from a single source in rapid succession
*Once identified: software delays sending message acknowledgment
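Detecting many messages from one source in rapid succession, then delaying the acknowledgment, can be sketched with a sliding time window in Python. The thresholds (messages per window, window length, delay) and the class name are hypothetical; real products use their own detection rules. Time is passed in explicitly so the logic is easy to follow.

```python
from collections import defaultdict, deque


class RapidSenderDetector:
    """Flag sources sending many messages in a short window (hypothetical thresholds)."""

    def __init__(self, max_messages=20, window_seconds=60.0, delay_seconds=30.0):
        self.max_messages = max_messages
        self.window = window_seconds
        self.delay = delay_seconds
        self.history = defaultdict(deque)  # source IP -> timestamps of recent messages

    def ack_delay(self, source_ip, now):
        """Return how many seconds to wait before acknowledging this message."""
        times = self.history[source_ip]
        times.append(now)
        # Drop timestamps that have fallen out of the sliding window
        while times and now - times[0] > self.window:
            times.popleft()
        # Suspected spam source: hold back the acknowledgment to slow it down
        return self.delay if len(times) > self.max_messages else 0.0


d = RapidSenderDetector(max_messages=3, window_seconds=10, delay_seconds=30)
print([d.ack_delay("203.0.113.5", t) for t in range(5)])
```

Because the sending computer waits for each acknowledgment before sending more to that address, every delayed acknowledgment directly reduces the spammer's sending rate.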

*Teergrubing: launching a return attack
– Sending e-mail messages back to computer originating suspected spam

Teergrubing objectives:
– Ensure computer sending spam is trapped
– Drag down ability to send spam
– Concern: counterattack might violate laws

*Ultimate spam solution: new e-mail protocols providing absolute verification of e-mail message source


Web Server Hardware

*Hosting electronic commerce operations
– Use wide variety of computer brands, types, sizes
– Some small companies run Web sites on desktop PCs
– Most Web sites operated on computers (designed for site hosting)


Server Computers

*Use more capable hardware elements (usually more expensive than workstation PCs)
*Price range of Web server computer (between $3000 and $200,000)
*Companies selling Web server hardware provide Web site configuration tools

– Housing Web server computers
*Freestanding cases
*Installed in equipment racks

*Blade servers: servers-on-a-card
– Small: 300 installed in single 6-foot rack

*Fundamental Web server job
– Process and respond to Web client requests (sent using HTTP)

*Virtual server (virtual host)
– Maintains more than one server on one machine
– Different groups have separate domain names
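A virtual server typically tells its hosted sites apart by the domain name the browser sends in the HTTP Host header. This Python sketch uses hypothetical domain names and page content and only parses request text (no sockets); it is an illustration of the idea, not any particular server's configuration.

```python
# Hypothetical sites sharing one physical server, keyed by domain name
VIRTUAL_HOSTS = {
    "www.shop-a.example": {"/": "<h1>Shop A home</h1>"},
    "www.shop-b.example": {"/": "<h1>Shop B home</h1>"},
}


def serve(request_text: str) -> str:
    """Pick the virtual host from the HTTP Host header, then look up the path."""
    lines = request_text.split("\r\n")
    _method, path, _version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        if not line:  # blank line marks the end of the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    host = headers.get("host", "").split(":")[0].lower()  # drop any ":port"
    site = VIRTUAL_HOSTS.get(host)
    if site is None or path not in site:
        return "404 Not Found"
    return site[path]


print(serve("GET / HTTP/1.1\r\nHost: www.shop-a.example\r\n\r\n"))
```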


Web Server Performance Evaluation

*Benchmarking– testing to compare hardware and software performance

Elements affecting overall server performance:
– Hardware, operating system software, server software, connection speed, user capacity,       type of Web pages being delivered
– Connection speed (T3 faster than T1)
– Number of users server can handle
    a. Important
    b. Hard to measure

*Throughput– number of HTTP requests a hardware and software combination can process in a unit of time

*Response time– time that server requires to process one request
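The two benchmark measures can be computed from a test run's timings. In this Python sketch the measurements and the summarize function are hypothetical; throughput is requests completed per second of the run, and response time is averaged over individual requests.

```python
def summarize(requests, elapsed_seconds):
    """Compute throughput and average response time from a test run.

    requests: list of (start_time, end_time) pairs in seconds, one per HTTP request
    elapsed_seconds: length of the whole test run
    """
    throughput = len(requests) / elapsed_seconds               # requests per second
    response_times = [end - start for start, end in requests]
    avg_response = sum(response_times) / len(response_times)    # seconds per request
    return throughput, avg_response


# Hypothetical measurements: four requests completed during a 2-second run
run = [(0.0, 0.25), (0.5, 1.0), (1.0, 1.125), (1.5, 2.0)]
print(summarize(run, 2.0))
```

The two numbers are related but distinct: a server handling many requests at once can show high throughput even while each individual request takes a long time.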

Choosing Web server hardware configurations:
– Run tests on various combinations, consider scalability, compare standard benchmarks
– Use independent testing labs: Mindcraft

*Run benchmarks regularly
*Provide site visitors with best service possible


Web Server Hardware Architectures

*Electronic commerce Web sites use a tiered architecture to divide the work of serving Web pages and may use more than one computer within each tier

*Server farms– large collections of servers that are lined up row after row

*Centralized architecture
– Uses a few large and fast computers
– Requires expensive computers
– More sensitive to technical problems
– Requires adequate backup plans

*Distributed architecture (decentralized architecture)
– Uses a large number of less powerful computers
– Spreads risk over large number of servers
– Servers are less expensive
– Requires additional hubs or switches to connect servers to each other and to the Internet
– Adds the cost of load balancing

*Load-balancing systems ($5000 – $50,000)
– Network hardware that monitors server workloads and assigns incoming Web traffic to the server with the most available capacity

Simple load-balancing system:
– Traffic enters through site’s router
– Encounters load-balancing switch
– Directs traffic to best Web server
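The switch's choice of the "best" server can be sketched as picking the server with the most available capacity. The Server fields, capacity figures, and assign function are hypothetical; real load balancers also use other policies (for example round-robin or response-time based).

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int    # max concurrent requests (hypothetical figure)
    active: int = 0  # requests currently being handled

    @property
    def available(self) -> int:
        return self.capacity - self.active


def assign(servers):
    """Load-balancing switch sketch: send the request to the server with most free capacity."""
    best = max(servers, key=lambda s: s.available)
    if best.available <= 0:
        raise RuntimeError("all servers are at capacity")
    best.active += 1  # the new request now occupies one slot on that server
    return best


farm = [Server("web1", 100, 90), Server("web2", 100, 40), Server("web3", 50, 5)]
print(assign(farm).name)
```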

More complex load-balancing systems:
– Incoming Web traffic enters from two or more routers
– Directed to groups of dedicated Web servers


 
