Spark

=Data Analysis with Spark=
 * ==Spark Core==
 * ==Spark SQL==
 * ==Spark Streaming==
 * ==MLlib==
 * ==GraphX==

=Programming with RDDs=
 * 1) ==RDD Basics==
 * 2) ==Creating RDDs==
 * 3) Transformations
 * 4) Actions
 * 5) Lazy Evaluation
 * 6) ==Passing Functions to Spark==
 * 7) Python
 * 8) Scala
 * 9) Java
 * 10) ==Common Transformations and Actions==
 * 11) Basic RDDs
 * 12) Converting Between RDD Types
 * 13) ==Persistence (Caching)==
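Items 3–5 above rest on one core distinction: transformations (`map`, `filter`) are lazy and only describe a computation, while actions (`collect`, `count`) force it to run. A minimal pure-Python sketch of that evaluation model — not Spark's actual implementation; the class and method names are illustrative:

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations record a plan, actions run it."""

    def __init__(self, data, ops=()):
        self._data = data      # source data
        self._ops = list(ops)  # recorded, not-yet-applied transformations

    def map(self, fn):
        # Transformation: returns a new dataset, computes nothing yet.
        return LazyDataset(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        # Also a transformation: lazy, just extends the plan.
        return LazyDataset(self._data, self._ops + [("filter", pred)])

    def collect(self):
        # Action: only now is the recorded plan actually executed.
        out = list(self._data)
        for kind, fn in self._ops:
            out = [fn(x) for x in out] if kind == "map" else [x for x in out if fn(x)]
        return out

    def count(self):
        return len(self.collect())


nums = LazyDataset(range(10))
evens_squared = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)  # nothing runs yet
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
```

Because nothing executes until `collect`, Spark can inspect the whole chain of transformations and plan the work (e.g. pipeline the filter and map in one pass) before touching the data.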

=Key/Value Pairs=
 * 1) ==Motivation==
 * 2) ==Creating Pair RDDs==
 * 3) ==Transformations on Pair RDDs==
 * 4) Aggregations
 * 5) Grouping Data
 * 6) Joins
 * 7) Sorting Data
 * 8) ==Actions Available on Pair RDDs==
 * 9) ==Data Partitioning (Advanced)==
 * 10) Determining an RDD's Partitioner
 * 11) Operations That Benefit from Partitioning
 * 12) Operations That Affect Partitioning
 * 13) ==Custom Partitioners==
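The aggregations in item 4 revolve around `reduceByKey`: combine all values that share a key with a user-supplied function. Spark's real implementation is distributed and shuffle-aware, but its per-key semantics can be sketched in plain Python — the `reduce_by_key` helper below is illustrative, not a Spark API:

```python
def reduce_by_key(pairs, fn):
    """Plain-Python sketch of reduceByKey semantics: fold values sharing a key."""
    out = {}
    for key, value in pairs:
        out[key] = fn(out[key], value) if key in out else value
    return sorted(out.items())

# The classic word count over (word, 1) pairs:
words = ["spark", "rdd", "spark", "pair", "rdd", "spark"]
counts = reduce_by_key([(w, 1) for w in words], lambda a, b: a + b)
print(counts)  # [('pair', 1), ('rdd', 2), ('spark', 3)]
```

The combining function must be associative, which is what lets Spark pre-combine values on each partition before shuffling — the reason `reduceByKey` is preferred over `groupByKey` followed by a reduce.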

=Loading and Saving Data=
 * 1) ==Motivation==
 * 2) ==File Formats==
 * 3) Text Files
 * 4) JSON
 * 5) Comma-Separated Values and Tab-Separated Values
 * 6) SequenceFiles
 * 7) Object Files
 * 8) Hadoop Input and Output Formats
 * 9) File Compression
 * 10) ==File Systems==
 * 11) Local/Regular
 * 12) Amazon S3
 * 13) HDFS
 * 14) ==Structured Data with Spark SQL==
 * 15) Apache Hive
 * 16) JSON
 * 17) ==Databases==
 * 18) Java Database Connectivity
 * 19) Cassandra
 * 20) HBase
 * 21) Elasticsearch
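For formats like CSV (item 5), Spark typically loads the data as text and parses each record inside a `map`. A sketch of that per-record parsing step using only Python's standard `csv` module — the `parse_line` helper is illustrative, not a Spark function:

```python
import csv
import io

def parse_line(line):
    """Parse one CSV record: the kind of function you'd pass to a map over a text RDD."""
    return next(csv.reader(io.StringIO(line)))

lines = ['panda,2,"likes, bamboo"', 'tiger,1,stripes']  # stand-in for lines of a text file
records = [parse_line(l) for l in lines]
print(records)  # [['panda', '2', 'likes, bamboo'], ['tiger', '1', 'stripes']]
```

Using a real CSV parser per record (rather than `line.split(",")`) matters because fields may contain quoted, embedded commas, as in the first record above.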

=Advanced Spark Programming=
 * 1) ==Introduction==
 * 2) ==Accumulators==
 * 3) Accumulators and Fault Tolerance
 * 4) Custom Accumulators
 * 5) ==Broadcast Variables==
 * 6) Optimizing Broadcasts
 * 7) ==Working on a Per-Partition Basis==
 * 8) ==Piping to External Programs==
 * 9) ==Numeric RDD Operations==
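Working on a per-partition basis (item 7) lets you pay a setup cost — opening a database connection, building a parser — once per partition instead of once per element. The pattern in miniature, with plain Python lists standing in for an RDD's partitions:

```python
setup_calls = 0

def process_partition(partition):
    """mapPartitions-style worker: one setup per partition, then stream its elements."""
    global setup_calls
    setup_calls += 1                 # stand-in for opening a connection, once per partition
    return [x * x for x in partition]

partitions = [[1, 2, 3], [4, 5]]     # stand-in for an RDD split into two partitions
squares = [y for part in partitions for y in process_partition(part)]
print(squares, setup_calls)          # [1, 4, 9, 16, 25] 2
```

Five elements are processed, but setup runs only twice — once per partition — which is the whole point of `mapPartitions` over a plain `map`.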

=Running on a Cluster=

=Debugging Spark=
 * 1) ==Configuring Spark with SparkConf==
 * 2) ==Components of Execution: Jobs, Tasks, and Stages==
 * 3) ==Finding Information==
 * 4) Spark Web UI
 * 5) Driver and Executor Logs
 * 6) ==Key Performance Considerations==
 * 7) Level of Parallelism
 * 8) Serialization Format
 * 9) Memory Management
 * 10) Hardware Provisioning

=Spark SQL=
 * 1) ==Linking with Spark SQL==
 * 2) ==Using Spark SQL in Applications==
 * 3) Initializing Spark SQL
 * 4) Basic Query Example
 * 5) SchemaRDDs
 * 6) Caching
 * 7) ==Loading and Saving Data==
 * 8) Apache Hive
 * 9) Parquet
 * 10) JSON
 * 11) From RDDs
 * 12) ==JDBC/ODBC Server==
 * 13) Working with Beeline
 * 14) Long-Lived Tables and Queries
 * 15) ==User-Defined Functions==
 * 16) Spark SQL UDFs
 * 17) Hive UDFs
 * 18) ==Spark SQL Performance==
 * 19) Performance Tuning Options
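User-defined functions (items 15–16) follow the same register-then-call-from-SQL pattern across SQL engines. As a standard-library stand-in — `sqlite3` here, not Spark SQL, which has its own registration call — the shape looks like:

```python
import sqlite3

def str_len(s):
    """An ordinary Python function to expose to SQL as a UDF."""
    return len(s)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (text TEXT)")
conn.execute("INSERT INTO tweets VALUES ('hello spark'), ('hi')")
conn.create_function("strLen", 1, str_len)   # register the Python function as a SQL UDF
rows = conn.execute("SELECT text, strLen(text) FROM tweets ORDER BY text").fetchall()
print(rows)  # [('hello spark', 11), ('hi', 2)]
```

The idea carries over directly: write the logic as a normal function in the host language, register it under a SQL name, then call it from any query.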

=Spark Streaming=
 * 1) ==A Simple Example==
 * 2) ==Architecture and Abstraction==
 * 3) ==Transformations==
 * 4) Stateless Transformations
 * 5) Stateful Transformations
 * 6) ==Output Operations==
 * 7) ==Input Sources==
 * 8) Core Sources
 * 9) Additional Sources
 * 10) Multiple Sources and Cluster Sizing
 * 11) 24/7 Operation
 * 12) Checkpointing
 * 13) Driver Fault Tolerance
 * 14) Worker Fault Tolerance
 * 15) Receiver Fault Tolerance
 * 16) Processing Guarantees
 * 17) ==Streaming UI==
 * 18) ==Performance Considerations==
 * 19) Batch and Window Sizes
 * 20) Level of Parallelism
 * 21) Garbage Collection and Memory Usage
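Stateless transformations (item 4) look at one micro-batch at a time; stateful ones (item 5), such as a running count, carry state forward across batches. The distinction in miniature, with Python lists standing in for micro-batches — a semantic sketch, not the Spark Streaming API:

```python
def running_word_count(batches):
    """Stateful sketch: carry counts across micro-batches, snapshot state per batch."""
    state = {}
    snapshots = []
    for batch in batches:              # each inner list is one micro-batch of words
        for word in batch:
            state[word] = state.get(word, 0) + 1
        snapshots.append(dict(state))  # the per-batch view a stateful op would expose
    return snapshots

batches = [["error", "warn"], ["error"], ["info", "error"]]
print(running_word_count(batches)[-1])  # {'error': 3, 'warn': 1, 'info': 1}
```

Because the state must survive batch boundaries (and failures), stateful operations are exactly the ones that require the checkpointing covered in item 12.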

=Machine Learning with MLlib=
 * 1) ==Overview==
 * 2) ==System Requirements==
 * 3) ==Machine Learning Basics==
 * 4) Spam Classification
 * 5) ==Data Types==
 * 6) Working with Vectors
 * 7) ==Algorithms==
 * 8) Feature Extraction
 * 9) Statistics
 * 10) Classification and Regression
 * 11) Clustering
 * 12) Collaborative Filtering and Recommendation
 * 13) Dimensionality Reduction
 * 14) Model Evaluation
 * 15) ==Tips and Performance Considerations==
 * 16) Preparing Features
 * 17) Configuring Algorithms
 * 18) Caching RDDs to Reuse
 * 19) Recognizing Sparsity
 * 20) Level of Parallelism
 * 21) ==Pipeline API==
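Feature extraction (item 8) often starts from term-frequency vectors; MLlib's `HashingTF` maps each term to a vector index by hashing it modulo a fixed vector size, so no dictionary of terms is needed. The core idea of that hashing trick in plain Python — a sketch using `crc32` as the hash, not MLlib's actual hash function:

```python
import zlib

def hashing_tf(words, num_features=16):
    """Hashing trick: bucket each term into a fixed-size count vector."""
    vec = [0] * num_features
    for w in words:
        vec[zlib.crc32(w.encode()) % num_features] += 1  # term -> bucket index
    return vec

doc = "spark spark streaming mllib".split()
vec = hashing_tf(doc)
print(sum(vec))  # 4 terms counted in total
```

The trade-off is collisions: distinct terms can share a bucket, which is why `num_features` is usually set much larger than in this toy example.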
