
Industry: Agricultural Processing
Location: Decatur, Illinois, USA
Headquartered in Decatur, Illinois, Archer Daniels Midland (ADM) is a world leader in agricultural processing. The company is the world’s largest processor of soybeans, corn, wheat, and cocoa. ADM is also a leader in soy meal and oil, ethanol, high fructose corn syrup, and flour. In addition, ADM is building a position in value-added products such as food additives and nutraceuticals, including vitamin E and sterols. ADM has over 23,000 employees, 368 processing plants, and net sales of $20.1 billion for the fiscal year ending June 30, 2001.
The Challenge
Rob Goings, System Programmer for ADM, said that TCP/IP traffic involving the
company’s IBM 9672 R65 enterprise system running OS/390® Version 2.6 had dramatically
increased in recent years. The vast majority of the company’s users now access the
mainframe from PCs that provide the flexibility to talk to other platforms such as
NT servers and e-business systems. These systems are all connected using the TCP/IP
protocol running on Ethernet networks. Beyond that, the mainframe is increasingly
connected to printers that run on the LAN. FTP is used extensively to transfer
files to and from the mainframe and to handle heavy email loads as well. During
working hours, there are typically more than 3,000 TCP/IP connections running on the
mainframe.
In the past, the technical services group was limited to utilities that were supplied with the operating system, such as NetStat™, to monitor TCP/IP activity. One limitation of these tools is that they monitor only S/390 TCP/IP stacks, not LAN stacks, so several tools usually had to be used together for troubleshooting, which made it difficult to identify the source of a problem. Another difficulty is that users find the command line interface of these tools tedious to use, and results are presented in a non-intuitive fashion.
The Solution
In an effort to get a better handle on TCP/IP services, Jim Langen, Manager of ADM’s
IBM Technical Services Group, evaluated three competing tools designed to provide
visibility to both S/390 and local area network TCP/IP stacks. He selected
ASG-TMON™ for TCP/IP.
“We liked TMON’s user interface, which displays the number and severity of exceptions on a single screen for all S/390- and LAN-based TCP/IP stacks while delivering built-in drill-down capabilities to quickly access details for any stack,” Langen said. “I also like the trace tool, which simplifies application identification, network performance, and integrity issues by providing context-sensitive individual packet or socket trace information. Finally, the fact that the interface is the same as the other ASG-TMON products for CICS, DB2®, etc. makes it easy to move from one program to another.”
TMON for TCP/IP gives a true picture of the S/390 impact on network performance by providing visibility into S/390 subsystems, applications, and resources that impact the TCP/IP stack and overall network performance including FTP, Telnet, CSM, and ICMP.
Entity monitors provide in-depth monitoring of supported SNMP-enabled entities, including S/390 TCP/IP stacks, Cisco routers, and even non-S/390 TCP/IP stacks. Application exception monitoring assesses the business impact of network performance by defining exception conditions and mapping them to business application service levels.
“I was impressed with how easy the product is to install,” Goings said. “I had it up and running in less than two days. All that is required is uploading the install file from a tape and filling in a few blanks in a REXX exec that comes up on the screen. The script rolls through the installation library and modifies all install jobs with the parameters that were entered. If an error is made entering a parameter, then all you have to do is rerun the installation program.”
Goings configured the program to ping, every 5 minutes, all of the servers that the mainframe normally talks to. The program triggers an alarm if a response from any of those servers does not arrive within 500 milliseconds.
“This gives me a general view of the overall health of the network and alerts me to problems long before I would be likely to hear about them from a user,” Goings said. With the tools that he previously used, he would have no way of knowing that a connection to a server was down unless he manually pinged that specific machine.
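The periodic check described above can be approximated outside the product with a short script. The following Python sketch is only an illustration and is not how TMON for TCP/IP is implemented. It assumes a TCP connect attempt as a stand-in for an ICMP ping (a true ping normally needs raw-socket privileges), and the names `tcp_probe`, `find_unresponsive`, and `monitor` are hypothetical. Only the 5-minute interval and the 500-millisecond threshold come from the text.

```python
import socket
import time
from typing import Callable, Iterable, List, Optional, Tuple

TIMEOUT_S = 0.5      # 500 ms response threshold, from the text
INTERVAL_S = 5 * 60  # 5 minute polling interval, from the text

Target = Tuple[str, int]  # (host, port); the port is an assumption of this sketch
Probe = Callable[[Target, float], Optional[float]]


def tcp_probe(target: Target, timeout_s: float) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection(target, timeout=timeout_s):
            return time.monotonic() - start
    except OSError:  # covers refused connections, unreachable hosts, timeouts
        return None


def find_unresponsive(targets: Iterable[Target],
                      timeout_s: float = TIMEOUT_S,
                      probe: Probe = tcp_probe) -> List[Target]:
    """Return the targets that did not answer within the threshold."""
    late = []
    for target in targets:
        rtt = probe(target, timeout_s)
        if rtt is None or rtt > timeout_s:
            late.append(target)
    return late


def monitor(targets: Iterable[Target],
            alarm: Callable[[str], None] = print,
            cycles: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep,
            probe: Probe = tcp_probe) -> None:
    """Check every target each interval and raise an alarm for slow ones.

    cycles=None runs forever; a number limits the loop (useful for testing).
    """
    targets = list(targets)
    done = 0
    while cycles is None or done < cycles:
        for host, port in find_unresponsive(targets, probe=probe):
            alarm(f"ALARM: no response from {host}:{port} "
                  f"within {TIMEOUT_S * 1000:.0f} ms")
        done += 1
        if cycles is None or done < cycles:
            sleep(INTERVAL_S)
```

Calling `monitor([("192.0.2.10", 23)])` would check one hypothetical server every five minutes; the address and port here are placeholders, not values from ADM's network.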
Measurable Results
Soon after he installed the program, Goings was scanning the open TCP/IP connections
and discovered that a much larger number of connections were open than he expected.
Further investigation showed that a minor programming logic error that had been propagated
across a large number of programs was leaving TCP/IP connections open after transactions
were completed. This caused a drain on systems resources and, if it had continued, could
have used up all of the available ports, resulting in outages. Goings closed out the
unneeded connections and provided the programming staff with diagnostic information
that made it easy to fix the problem.
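The kind of logic error described above, where a program finishes a transaction but never closes its connection, can be illustrated in a few lines. The sketch below is in Python purely for illustration; the case study does not say what language ADM's programs were written in or what the actual fix looked like. The function name `run_transaction` is hypothetical. The point is that the socket is released on every exit path, including errors, so finished transactions cannot accumulate open connections.

```python
import socket


def run_transaction(conn: socket.socket, payload: bytes) -> bytes:
    """Send one request, read one reply, and always close the connection.

    The `with` block closes the socket when it exits, whether the
    transaction succeeds or raises, so the connection is never left open.
    """
    with conn:
        conn.sendall(payload)
        return conn.recv(4096)
```

A version that omits the `with` block (or an explicit `close()` in a `finally` clause) would behave identically from the user's point of view, which is why this class of bug tends to go unnoticed until someone inspects the open connection count.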
Goings also used the tool to investigate a situation where an entire CICS region was locking up, hanging the sessions of anyone using that region. By closely monitoring that region with TMON for TCP/IP, he discovered a problem with a CICS sockets program that was abending and hanging all connections within the region. He used TMON for TCP/IP to drill down in that region and within a minute or two had identified the offending connection from the 3,000 that were open at that time. He then quickly dropped the connection using the online section of the interface. To solve this type of problem in the past, it would have been necessary to use batch NetStat to find the problem, which could take hours.
Whenever a user is unable to log in to the system, they call the technical services group. Often, the problem is that their Telnet session never properly terminated, so the system will not accept another session from that particular machine. When that happens, Goings pulls up the monitoring tool and moves to the window that shows all current Telnet sessions. If there are relatively few sessions running, he can simply scroll down, find the one from the user’s machine, and end it. More often, there are many Telnet sessions, so he applies a filter that limits the sessions shown on the screen to the specific IP address tied to the user’s terminal. He can then kill the connection within the monitoring software. Using NetStat, it would take at least ten minutes to solve this common problem, while with TMON for TCP/IP, Goings can now solve it in a minute or two.
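The filtering step in this workflow amounts to narrowing a long session list to the entries for one client address. A minimal Python sketch of that idea follows. The `TelnetSession` record and its fields are assumptions made for the example and do not reflect the product's actual data model or screens.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TelnetSession:
    """Hypothetical session record; field names are illustrative only."""
    session_id: int
    client_ip: str
    state: str


def sessions_for_ip(sessions: Iterable[TelnetSession],
                    client_ip: str) -> List[TelnetSession]:
    """Return only the sessions that originate from the given IP address."""
    return [s for s in sessions if s.client_ip == client_ip]
```

With thousands of sessions open, filtering on the caller's address typically leaves a single stale entry to terminate, which is what makes the one-to-two-minute resolution plausible.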
“The bottom line is that we can solve most any enterprise networking problem considerably faster than we could in the past,” Goings concluded.