StatSoftIndia

STATISTICA Base

STATISTICA Base offers a comprehensive set of essential statistics in a user-friendly package with flexible output management and Web enablement features; it also includes all STATISTICA graphics tools and a comprehensive Visual Basic development environment.

Descriptive Statistics, Breakdowns, and Exploratory Data Analysis
Correlations
Interactive Probability Calculator
T-Tests (and other tests of group differences)
Frequency Tables, Crosstabulation Tables, Stub-and-Banner Tables, Multiple Response Analysis
Multiple Regression Methods
Nonparametric Statistics
Distribution Fitting
Enhanced graphics technology
Powerful query tools
Flexible data management
ANOVA [supports 4 between factors and 1 within (repeated measure) factor]

STATISTICA Advanced/AXA

Includes the functionality of all of the following:

STATISTICA Base

STATISTICA Multivariate Exploratory Techniques offers a broad selection of exploratory techniques, from cluster analysis to advanced classification tree methods, with a comprehensive array of interactive visualization tools for exploring relationships and patterns; built-in complete Visual Basic scripting.

Cluster Analysis Techniques
Factor Analysis and Principal Components
Canonical Correlation Analysis
Reliability/Item Analysis
Classification Trees
Correspondence Analysis
Multidimensional Scaling
Discriminant Analysis
General Discriminant Analysis Models
STATISTICA Visual Basic Language, and more.

STATISTICA Advanced Linear/Nonlinear Models contains a wide array of the most advanced linear and nonlinear modeling tools on the market, supports continuous and categorical predictors, interactions, hierarchical models; automatic model selection facilities; also, includes variance components, time series, and many other methods; all analyses include extensive, interactive graphical support and built-in complete Visual Basic scripting.

Distribution and Simulation
Variance Components and Mixed Model ANOVA/ANCOVA
Survival/Failure Time Analysis
General Nonlinear Estimation (and Logit/Probit)
Log-Linear Analysis
Time Series Analysis, Forecasting
Structural Equation Modeling/Path Analysis (SEPATH)
General Linear Models (GLM)
General Regression Models (GRM)
Generalized Linear/Nonlinear Models (GLZ)
Partial Least Squares (PLS)
STATISTICA Visual Basic Language, and more.

STATISTICA Power Analysis and Interval Estimation is an extremely precise and user-friendly research tool for analyzing all aspects of statistical power and sample size calculation.

Power Calculations
Sample Size Calculations
Interval Estimation
Probability Distribution Calculators, and more.

STATISTICA Quality Control (QC)

Includes the functionality of all of the following:

STATISTICA Base

STATISTICA Quality Control Charts offers versatile presentation-quality charts with a selection of automation options, customizable features, and user-interface shortcuts to simplify routine work.

Quality Control Charts
Interactive Quality Control Charts including:
Real-time updating of charts, automatic alarm notification, shop floor mode, assigning causes and actions, analytic brushing, and dynamic project management
Multivariate Quality Control Charts including: Hotelling T-Square Charts, Multiple Stream (Group), Multivariate Exponentially Moving Average (MEWMA) charts, Multivariate Cumulative Sum (MCUSUM) Charts, Generalized Variance Charts
STATISTICA Visual Basic Language, and more.

STATISTICA Process Analysis is a comprehensive package for process capability, Gage R&R, and other quality control/improvement applications.

Process Capability Analysis
Weibull Analysis
Gage Repeatability & Reproducibility
Sampling Plans
Variance Components, and more.

STATISTICA Design of Experiments features the largest selection of DOE, visualization and other analytic techniques including powerful desirability profilers and extensive residual statistics.

Fractional Factorial Designs
Mixture Designs
Latin Squares
Search for Optimal 2**k-p Designs
Residual Analysis and Transformations
Optimization of Single or Multiple Response Variables
Central Composite Designs
Taguchi Designs
Desirability Profiler
Minimum Aberration and Maximum Unconfounding 2**k-p Fractional Factorial Designs with Blocks
Constrained Surfaces
D- and A-optimal Designs, and more.

STATISTICA Power Analysis and Interval Estimation is an extremely precise and user-friendly research tool for analyzing all aspects of statistical power and sample size calculation.

Power Calculations
Sample Size Calculations
Interval Estimation
Probability Distribution Calculators, and more.

STATISTICA Automated Neural Networks

STATISTICA Automated Neural Networks contains a comprehensive array of statistics, charting options, network architectures, and training algorithms; C and PMML (Predictive Model Markup Language) code generators. The C code generator is an add-on.

Fully integrated with the STATISTICA system.

A selection of the most popular network architectures including Multilayer Perceptrons, Radial Basis Function networks, and Self Organizing Feature Maps.
State-of-the-art training algorithms including:
Conjugate Gradient Descent, Levenberg-Marquardt, BFGS, Kohonen training
Forming ensembles of networks for better prediction performance
Automatic Network Search, a tool for automating neural network architecture and complexity selection
Best Network Retention, and more.
Support for a variety of statistical analyses and predictive model building, including regression, classification, time series regression, time series classification, and cluster analysis for dimensionality reduction and visualization.
Fully supports deployment of multiple models

STATISTICA Data Miner

Includes the functionality of all of the following:

STATISTICA Advanced

STATISTICA Automated Neural Networks

STATISTICA Data Miner contains the most comprehensive selection of data mining solutions on the market, with an icon-based, extremely easy-to-use user interface. It features a selection of completely integrated, and automated, ready to deploy "as is" (but also easily customizable) specific data mining solutions for a wide variety of business applications. The product is offered optionally with deployment and on-site training services. The data mining solutions are driven by powerful procedures from five modules, which can also be used interactively and/or used to build, test, and deploy new solutions.

General Slicer/Dicer Explorer
General Classifier
General Modeler/Multivariate Explorer
General Forecaster
General Neural Networks Explorer, and more.

Specialized Data Mining Modules

A large portion of analytic functionality used by STATISTICA Data Miner is driven by the computational engines of modules that are included in various other STATISTICA products:

Neural Networks techniques (the largest selection of architectures available, automatic problem solver tools, advanced feature selection techniques).
All STATISTICA Graphics Tools and interactive exploration/visualization tools; Descriptive statistics, breakdowns, and exploratory data analysis; Frequency Tables, Crosstabulation Tables, Stub-and-Banner Tables, Multiple Response Analysis; Nonparametric Statistics; Distribution Fitting; Power Analysis Techniques.
General Linear Models (GLM); General Regression Models (GRM); Generalized Linear Models (GLZ); General Partial Least Squares Models (PLS); Variance Components and Mixed Model ANOVA/ANCOVA; Survival/Failure Time Analysis; General Nonlinear Estimation with Logit and Probit Regression; Log-Linear Analysis of Frequency Tables; Time Series Analysis/Forecasting; Structural Equation Modeling/Path Analysis (SEPATH).
Cluster Analysis Techniques; Factor Analysis; Principal Components & Classification Analysis; Canonical Correlation Analysis; Reliability/Item Analysis; Classification Trees; Correspondence Analysis; Multidimensional Scaling; Discriminant Analysis; General Discriminant Analysis Models (GDA).
Optional modules for Quality Control Charts techniques, Process Analysis, and Experimental Design (DOE) procedures.

However, several modules include selections of highly specialized data mining and data mining modeling techniques that are offered only as part of STATISTICA Data Miner. These modules include:

Feature Selection and Variable Filtering (for very large data sets)
Association Rules
Interactive Drill-Down Explorer
Generalized EM & k-Means Cluster Analysis
Generalized Additive Models (GAM)
General Classification and Regression Trees (GTrees)
General CHAID (Chi-square Automatic Interaction Detection) Models
Interactive Classification and Regression Trees
Boosted Trees
Random Forest
Support Vector Machines (SVM)
K-Nearest Neighbors
Multivariate Adaptive Regression Splines (MARSplines)
Goodness of Fit Computations
Rapid Deployment of Predictive Models

STATISTICA Text Miner

STATISTICA Text Miner is an optional extension of STATISTICA Data Miner. The program features a large selection of text retrieval, pre-processing, and analytic and interpretive mining procedures for unstructured text data (including Web pages), with numerous options for converting text into numeric information (for mapping, clustering, predictive data mining, etc.) and language-specific stemming algorithms. Because of STATISTICA's flexible data import options, the methods available in STATISTICA Text Miner can also be useful for processing other unstructured input (e.g., image files imported as data matrices).

Accessing Documents

The program contains numerous options for accessing text documents in different formats, including .txt (text), .pdf (Adobe), .ps (PostScript), .html, .xml (Web-formats), and most Microsoft Office formats (e.g., .doc, .rtf).

Flexible user interface options (and automation functions) are provided for selecting large numbers of files via wild-cards (e.g., to select all documents in a particular subdirectory structure).
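As a rough illustration of wildcard-based selection across a subdirectory structure, the Python sketch below collects every file matching a pattern under a root folder. It is not STATISTICA Text Miner's own interface; the function name `select_documents` and the sample folder layout are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


def select_documents(root, pattern):
    """Return every file under root (searched recursively) whose name matches pattern."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


# Build a small, throwaway directory tree to demonstrate the selection.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "reports" / "2009").mkdir(parents=True)
    (base / "notes.txt").write_text("top-level note")
    (base / "reports" / "summary.txt").write_text("summary")
    (base / "reports" / "2009" / "audit.txt").write_text("audit")
    (base / "reports" / "2009" / "chart.pdf").write_text("not a text file")

    # Paths are reported relative to the root, with forward slashes.
    found = [p.relative_to(base).as_posix() for p in select_documents(base, "*.txt")]
```

Here `found` holds the three `.txt` files from all levels of the tree, while the `.pdf` file is skipped because it does not match the wildcard.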

The program supports full "Web-crawling" capabilities, so that documents can be extracted from the Web, starting at a particular root Web page (URL). All documents linked to that particular page will be included, as well as the documents linked to those sub-documents, and so on, up to a user-specified level or depth.
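The depth-limited crawl described above can be sketched as a breadth-first traversal. This is only an illustration in Python, not the product's implementation: the `links` dictionary is a hypothetical in-memory stand-in for the hyperlinks that a real crawler would obtain by downloading and parsing each page.

```python
from collections import deque


def crawl(root, links, max_depth):
    """Visit pages breadth-first, following links up to max_depth hops from root.

    links maps each page to the pages it links to (a stand-in for real fetching).
    Returns a dict mapping every visited page to its depth below the root.
    """
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # do not follow links beyond the user-specified depth
        for target in links.get(page, []):
            if target not in depth_of:  # skip pages that were already reached
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of


# Hypothetical link structure: root -> a, b; a -> c; c -> d; b -> root (a cycle).
links = {"root": ["a", "b"], "a": ["c"], "c": ["d"], "b": ["root"]}
visited = crawl("root", links, max_depth=2)
```

With a depth of 2, the page `d` (three hops from the root) is left out, and the link from `b` back to the root does not cause the root to be visited twice.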

File names and URLs can also be stored in text variables in STATISTICA data files. In this manner, the program can not only process actual text stored in text variables but also properly interpret references to text documents or URLs. Thus, numeric and textual information (large documents) can be stored on a per-case (observation) basis, and meaningful analyses can be performed on data files where each observation carries both numeric and (voluminous) unstructured textual information (e.g., patients' age, height, and weight, along with physicians' narrative descriptions of symptoms).

Options are provided to flexibly import such lists of filenames or URLs into the columns of a STATISTICA spreadsheet.

Processing Documents

Documents can be preprocessed prior to (in practice, concurrently with) the indexing of all documents. Exclusion rules and stub-lists can be applied to remove common but uninformative words such as "a", "the", "to", and "is". A stemming algorithm is then applied so that English words like "traveled" and "traveling" both count as instances of "travel".
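To make the exclusion and stemming steps concrete, here is a minimal Python sketch. It assumes a deliberately naive suffix-stripping rule rather than the language-specific stemming algorithms the product provides; the names `STOP_WORDS`, `SUFFIXES`, `stem`, and `preprocess` are hypothetical.

```python
import re

# Words to exclude; this short list comes from the examples in the text.
STOP_WORDS = {"a", "the", "to", "is"}

# Toy suffix list (an assumption for illustration, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three leading letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase the text, split it into words, drop excluded words, and stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The agent traveled to Rome and is traveling")
```

In this example both "traveled" and "traveling" are reduced to "travel", and the excluded words are removed before any counting takes place. A production stemmer handles many cases this toy rule gets wrong (irregular forms, doubled consonants, and so on).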

Next, the program will index the "stubbed-and-stemmed" documents, to create a frequency count of all words and for all documents. This "raw-data" (count) information is the basis for all subsequent numerical analyses.
Before creating a STATISTICA Data File containing the counts (etc.) to summarize the documents, various additional filters may be applied. For example, the counts for particular (most frequent) words per document can be:

normalized based on the length of each document
transformed (e.g., log-transformed)
optionally "compressed" by, for example, applying various feature extraction algorithms such as SVD (singular value decomposition, specifically optimized to operate on large sparse matrices)

The resulting data file with numeric information (e.g., SVD dimensions, raw counts, relative counts, most-frequent-word counts, and so on) is then ready for further analyses.
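The counting and filtering steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's procedure: the function names are hypothetical, normalization is taken to mean dividing by the document's total word count, and the log transform is assumed to be log(1 + count).

```python
import math
from collections import Counter


def word_counts(documents):
    """Count how often each word occurs in each already preprocessed document."""
    return [Counter(tokens) for tokens in documents]


def relative_counts(counts):
    """Normalize raw counts by document length (total number of counted words)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def log_counts(counts):
    """Dampen large counts with log(1 + count); the exact transform is an assumption."""
    return {word: math.log(1 + count) for word, count in counts.items()}


# Two tiny documents that are assumed to be stop-word filtered and stemmed already.
documents = [["travel", "rome", "travel"], ["rome"]]
raw = word_counts(documents)
relative = [relative_counts(c) for c in raw]
logged = [log_counts(c) for c in raw]
```

Each of `raw`, `relative`, and `logged` holds one dictionary per document, which corresponds to one row of the resulting data file with one column per word.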

Various options are provided for writing the information extracted from text into the input data file, or directly into external databases (see also the description of STATISTICA In-Place Database Processing technology).

Analyzing Documents

All statistical analysis methods can be applied to the numeric summaries representing the texts. Simple summary statistics may extract the most common words used in the documents.

By mapping the documents into the SVD dimensions (e.g., via PCA), dimensional maps of documents can be created, to evaluate the similarity of documents, etc.
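As a small illustration of mapping documents onto an SVD dimension, the Python sketch below finds only the leading dimension of a document-by-word count matrix using power iteration. It is a teaching sketch, not the sparse-matrix-optimized SVD the product uses; the function name and sample matrix are hypothetical, and the uniform starting vector assumes a non-zero matrix of non-negative counts.

```python
import math


def leading_svd_dimension(matrix, iterations=100):
    """Approximate the largest singular value, its word weights, and document scores.

    matrix has one row per document and one column per word.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # uniform starting direction
    for _ in range(iterations):
        # u = A v, then w = A^T u; repeating this converges to the leading direction.
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Document scores are the projections of each row onto the word weights.
    scores = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(s * s for s in scores))
    return sigma, v, scores


# Documents 1 and 2 share the first two words; document 3 uses only the third word.
counts = [[2, 2, 0], [1, 1, 0], [0, 0, 1]]
sigma, weights, scores = leading_svd_dimension(counts)
```

In this example the first two documents receive non-zero scores on the leading dimension in a 2:1 ratio, while the third document scores essentially zero, which reflects that it shares no words with the others.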

By mapping documents into dimensions based on original (transformed) word counts, simultaneous maps of documents and words can be created. This reflects the "meaning" of documents.

Clustering techniques (such as EM or k-Means) can be applied to identify clusters of similar documents.
Predictive data mining techniques can be used to relate the numerical summaries of documents to other indicators of interest, e.g., fraudulent intent, medical diagnosis, and so on.
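The clustering step mentioned above can be illustrated with a bare-bones k-means routine in Python. This is a sketch of the general technique, not the Generalized EM & k-Means module: the function name, the two-dimensional document coordinates, and the hand-picked starting centroids are all hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""

    def nearest(point, centers):
        distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
        return distances.index(min(distances))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [nearest(point, centroids) for point in points]
    return labels, centroids


# Hypothetical document coordinates (for example, scores on two SVD dimensions).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The routine groups the first two documents together and the last two together. In practice the starting centroids are usually chosen at random or by a seeding heuristic, and the result can depend on that choice.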

Key analytic components requiring extensive data processing are implemented via multi-threaded computing technology, to extract optimum performance from advanced multiple-processor server hardware.

WebSTATISTICA - Web-based Analytic Software

WebSTATISTICA is offered as a complete solution that includes the analytic functionality of the respective selected STATISTICA product or any combination of STATISTICA products.

One of the clearest advantages offered by the WebSTATISTICA technology is that it makes the power of any of the STATISTICA family of products conveniently available from any workstation equipped with an industry-standard Web browser. Thus, WebSTATISTICA adds a new dimension and an endless array of new possibilities and applications to the entire line of STATISTICA Data Analysis, Data Mining, Quality Control, and Six Sigma software.

WebSTATISTICA supports multiprocessor environments and works with load balanced environments, making WebSTATISTICA suitable for internal cloud computing environments.

Two Common Categories of Web-based Analytics

1) Custom Web-based applications

WebSTATISTICA supports one or more customized Web-based analytic applications to suit an organization's specific needs. Users log in and see a highly targeted user interface customized for the needs of the particular application. Users have single-click access to the desired set of queries, analysis results, and reports, all displayed within their Web browser.

2) Interactive Statistical Application Deployed Enterprise-wide (across a Wide Area Network)

The full power of STATISTICA analytics is available via the server-based, Wide Area Network (WAN) architecture, which offers the advantages of no client software to install, central configuration and ongoing management, increased scalability and performance, and a highly interactive user experience.

For example, the most recent data and reports (e.g., updated via queries to the specific parts of the corporate data warehouse) - with options to interactively drill down into the results and interactively obtain additional, specific insights about the business - can now be made available to authorized employees wherever they are and regardless of the type of computers to which they have access. Wherever there is the Internet (which means virtually everywhere), there is now also access to the query, reporting, and analytic tools of the most comprehensive data analysis system available.

Enterprise-wide Collaborative Web-based Products

WebSTATISTICA Server acts as the core of an enterprise-wide network system, allowing participants to work collaboratively and quickly share results (reports) as well as scripts of analyses or queries. Administrators can use user or group permissions to manage the access of specific groups of users to specific data or reports. The accessibility of its tools via the Internet makes WebSTATISTICA Server well suited to collaborative projects among employees working at different locations or branches of a corporation (even on different continents), or employees who are telecommuting or traveling.

WebSTATISTICA Knowledge Portal is a powerful, Web-based, knowledge-sharing tool that allows your colleagues, employees, and/or customers (with appropriate permissions) to log in and quickly and efficiently get access to the information they need by reviewing predefined reports.

WebSTATISTICA Interactive Knowledge Portal offers portal visitors all the functionality of the Knowledge Portal plus additional options. These options include allowing the user to define and request new reports, run queries and custom analyses, drill down and up, slice/dice data, and gain insight from all resources that are made available to them by the portal designers or administrators.

STATISTICA Enterprise Web Viewer provides the ability to view analyses and reports that were generated within STATISTICA Enterprise or STATISTICA Enterprise / QC. This allows companies to protect their data and reports with the STATISTICA Enterprise security model.