Bioinformatics

25 July 2022

question
Who coined the term bioinformatics, when, and for the study of what?
answer
Coined by Paulien Hogeweg and Ben Hesper in 1970 for the study of informatic processes in biotic systems
question
What is the field of science called in which biology, computer science and information technology merge into a single discipline?
answer
Bioinformatics
question
What is used in bioinformatics to store, retrieve, and assist in understanding biological information?
answer
Computer databases
question
What are some ways in which bioinformatics can be applied? (9)
answer
1. medicine - understand life processes in health / disease states
2. genetic disease databases
3. genetic maps - to identify genes for heritable traits and unravel patterns of genome organization
4. comparisons of genome sequences - comparing different organisms can help to clarify evolutionary relationships
5. forensics
6. designing primers for DNA amplification (PCR)
7. provides a complete list of candidate genes for drug discovery
8. pharma companies - search for better/new drugs
9. agriculture - develop disease and drought resistant plants and develop higher yield crops
question
Which application of bioinformatics is helping to revolutionize medicine by collecting clinical information on patients with particular genetic diseases for use in investigations?
answer
Genetic disease (SNP) databases
question
What is bioinformatics used for in forensics?
answer
To store information (DNA profiles) of convicted offenders, making their identification easier in subsequent crimes - CODIS (Combined DNA Index System)
question
For DNA, bioinformatics is used for ... (5)
answer
1. simple sequence analysis 2. gene finding 3. regulatory regions 4. whole genome annotations 5. comparative genomics
question
For RNA, bioinformatics is used for ... (6)
answer
1. splice variants 2. tissue specific expression 3. structure 4. single gene analysis (cloning techniques) 5. experimental data w/ thousands of genes simultaneously 6. DNA chips, micro-arrays and expression array analysis
question
For proteins, bioinformatics is used for ... (5)
answer
1. Homology (protein families) 2. conserved domains/regions 3. structure determination (molecular modeling) - 2D, 3D, quaternary 4. protein function 5. analysis often involves 2D gels and mass spectrometry
question
What are the 3 major nucleotide sequence databases in bioinformatics?
answer
1. GenBank - National Center for Biotechnology Information (NCBI); the NIH genetic sequence database, part of the International Nucleotide Sequence Database Collaboration 2. EMBL - European Molecular Biology Laboratory; the European equivalent of the US GenBank 3. DNA Data Bank of Japan (DDBJ) - at Japan's National Institute of Genetics; 3rd in the trio of major nucleotide sequence databases
question
What are the 4 major protein sequence databases?
answer
1. UniProt - Universal Protein Resource 2. PIR - Protein Information Resource database 3. Swiss-Prot 4. ExPASy - Expert Protein Analysis System
question
What is the sequence, either amino acid or nucleotide, that the user chooses for a BLAST search and types or pastes into the search window?
answer
Query sequence
question
A BLAST search requires a minimum query sequence length of ....?
answer
15 nucleotides or amino acids
question
A query sequence can either be _____, _____ or ____
answer
FASTA, bare sequence, or identifier (accession number or gene info ID)
question
What is the presentation of 2 compared sequences that shows the regions of greatest statistical similarity?
answer
Alignment
question
What does BLAST stand for, for what is it widely used and what is its main function?
answer
Basic Local Alignment Search Tool - widely used software in bioinformatics research - main function: to compare a sequence of interest, the query sequence, to sequences in a large database
question
What is a measure of the QUALITY of the alignment between the query sequence and the search results?
answer
Score value **the higher the score the better the alignment**
question
What is the number of different alignments with scores equivalent to or better than a given alignment score that are expected to occur in a database search by chance?
answer
E-value (expectation value) **the lower the e-value the better the match**
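The relationship between score and E-value can be sketched with the Karlin-Altschul formula that underlies BLAST statistics, E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The lambda, K, and length values below are illustrative assumptions, not values from any particular BLAST run, and the function names are made up for this sketch.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw score S to a bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score:
    E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Assumed, illustrative parameters (not taken from a real search).
LAMBDA, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 250, 50_000_000

for s in (40, 80, 160):
    print(s, round(bit_score(s, LAMBDA, K), 1), e_value(s, QUERY_LEN, DB_LEN, LAMBDA, K))
```

Running it shows the E-value shrinking exponentially as the score grows, which is why a higher score and a lower E-value both point to a better match.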
question
What is the method of obtaining biological information from unprocessed sequence data?
answer
genome annotation
question
What is the ultimate goal of a genome annotation and what are the two types?
answer
GOAL - to create a labeled genome, where biological information is linked to sequence. TYPES - structural and functional annotation
question
What type of annotation consists of identifying genomic elements (genes and other important sequences)?
answer
Structural annotation
question
What are found using structural annotations?
answer
ORFs and their localization, gene structure, coding regions, location of regulatory motifs
question
What are found using functional annotations?
answer
biochemical function, biological function, involved regulation and interactions, expression
question
What is the basic level of annotation?
answer
Using BLAST for finding similarities and then annotating genomes based on that
question
What is the type of annotation that consists of attaching biological function to genomic elements?
answer
functional annotation
question
The circular display of a genome can show ...?
answer
A list of all the genes identified and the proteins and enzymes that are encoded by them - the activities of this set of proteins (REGULATION) account for the cell's structure and its behavior in its environment
question
How can the circular display be used to find the DNA data?
answer
It can be explored by clicking and sliding - info about every gene and possible genes that are available - arrows represent genes that can be selected
question
What are some functions/characteristics of gene prediction software? (4)
answer
1. To identify genes within a long DNA sequence 2. a DNA sequence that codes for amino acids should not contain any stop codons 3. each DNA strand can be read in 3 reading frames, and there are 2 DNA strands 4. the computer must therefore analyze a given DNA sequence in 6 different reading frames (3 x 2 = 6)
question
What is the coding region of the DNA sequence called?
answer
open reading frame
question
An example of a tool that can be used as gene prediction software is...?
answer
Open Reading Frame Finder (ORF Finder) at NCBI, and GENSCAN
question
The diagnostic features of a gene includes the presence of ...? (10)
answer
1. Open Reading Frame (ORF)
2. Start codon (Met - ATG)
3. Stop codon (TGA, TAG, TAA)
4. Terminator sequence (Prok)
5. Shine-Dalgarno sequence (Prok)
6. TATA box (Euk)
7. Kozak sequence (Euk)
8. Poly A addition signal (Euk)
9. Intron/exon boundaries
10. CpG islands
question
What are open reading frame (ORF) searches effective for and not effective for?
answer
Effective for the analysis of bacterial genomic DNA sequences; NOT effective for analyzing the DNA of eukaryotes
question
Why are ORF searches not effective for analyzing DNA of euks?
answer
Because of the intron/exon structure - more sophisticated gene identification software is required for euks
question
What are ORF-based gene finders also referred to as and why is this?
answer
Ab initio - they attempt to predict genes based only on knowledge and understanding of gene structure
question
What is the criteria for judging a good ORF?
answer
1. Should begin with a start codon (met) 2. must be of reasonable size (longer the better) 3. Must end with an in-frame stop codon
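The three criteria above, applied to all six reading frames, can be sketched as a minimal ORF scan. This is an illustration only, not how NCBI's ORF finder is implemented; the function name `find_orfs` and the `min_codons` threshold are assumptions made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_orfs(dna, min_codons=30):
    """Return (frame_label, orf_sequence) pairs for ORFs that begin with ATG,
    end with an in-frame stop codon, and span at least min_codons codons
    (start and stop codons included)."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement strand
    orfs = []
    for strand, seq in (("+", forward), ("-", reverse)):
        for offset in range(3):                    # 3 frames per strand
            start = None
            for i in range(offset, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i                      # first start codon opens the ORF
                elif codon in STOPS and start is not None:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((f"{strand}{offset + 1}", seq[start:i + 3]))
                    start = None                   # stop codon closes the ORF
    return orfs

# Toy sequence with one short ORF in forward frame 3.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=5))
```

Raising `min_codons` filters out short ORFs, which are the ones most likely to arise by chance.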
question
Why are longer ORFs better and why should it end in a stop codon?
answer
Long ORFs are unlikely to occur by chance and thus signify potential genes - short AA sequences are prob not ORFs - many stop codons close together suggest an ORF is not present
question
What is employed to determine if a newly sequenced gene is similar to that already known and stored in a database?
answer
Computer analysis - Sequence alignment software
question
What are the 2 most popular sequence alignment programs?
answer
BLAST - Basic Local Alignment Search Tool; FASTA - FAST-All
question
What program searches a nucleic acid or amino acid database to find matching or similar sequences to that being tested?
answer
BLAST
question
What does the BLAST approach do 1st, 2nd and 3rd?
answer
1 - look for similar segments (high-scoring segment pairs, HSPs) between query sequence and database sequence 2 - evaluate statistical significance of any matches that were found 3 - report only those matches that satisfy a user-selectable threshold of significance
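A toy version of the first step, finding high-scoring segment pairs, can be sketched as a seed-and-extend search: exact word matches are extended in both directions without gaps. This is a simplified illustration under stated assumptions; real BLAST uses scoring matrices, neighborhood words, gapped extension, and proper statistics. The names `extend` and `find_hsps`, the scoring values, and the `min_score` cutoff (standing in for the significance threshold) are all hypothetical.

```python
def extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Extend along a diagonal without gaps; return (best gain, residues added)."""
    score = best = best_len = 0
    k = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:   # give up once the score drops too far
            break
        qi += step
        sj += step
    return best, best_len

def find_hsps(query, subject, word_size=4, match=1, mismatch=-1,
              x_drop=2, min_score=6):
    """Return sorted (query_start, subject_start, length, score) tuples."""
    # Index every word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    hsps = set()
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            left, left_len = extend(query, subject, i - 1, j - 1, -1,
                                    match, mismatch, x_drop)
            right, right_len = extend(query, subject, i + word_size,
                                      j + word_size, 1, match, mismatch, x_drop)
            score = word_size * match + left + right
            if score >= min_score:    # crude stand-in for a significance cutoff
                hsps.add((i - left_len, j - left_len,
                          word_size + left_len + right_len, score))
    return sorted(hsps)

print(find_hsps("TTGACCGTAAGG", "AAGACCGTAACC"))
```

Several seeds on the same diagonal extend to the same segment, so collecting the results in a set reports each HSP once.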
question
What is the emphasis of sequence alignment software?
answer
To find regions of sequence similarity
question
What can the regions of sequence similarity yield clues about?
answer
The structure and function of this novel sequence and its evolutionary history and homology with other sequences in the database
question
Regions of similarity found in sequence alignment software can be .... or ....
answer
local - where the region of similarity is confined to one part of sequences that may otherwise be unrelated, or global - where similarity is assessed across the entire length of the sequences
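The difference between the two kinds of similarity can be illustrated with a small dynamic-programming sketch: a global score (Needleman-Wunsch style) forces the sequences to align end to end, while a local score (Smith-Waterman style) keeps only the best-matching region. The scoring values and the function name `alignment_score` are assumptions chosen for the example.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-2):
    """Return the best local score (local=True) or the global score (local=False)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)   # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]

a, b = "TTTTACGTACGT", "ACGTACGTGGGG"
print("local:", alignment_score(a, b, local=True))
print("global:", alignment_score(a, b, local=False))
```

Here the two sequences share an identical internal stretch, so the local score is high even though the flanking regions drag the global score down.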
question
What is used to search a NUCLEOTIDE database using nucleotide query?
answer
blastn
question
What is used to search PROTEIN database using a protein query?
answer
blastp
question
What is used to search a protein database using a translated nucleotide query?
answer
blastx - compares the 6-frame translations of a DNA query to a protein database
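The six-frame translation that a translated search relies on can be sketched in a few lines: three frames on the given strand plus three on its reverse complement, each translated with the standard genetic code. This is a minimal illustration, not BLAST's own code; the function names are made up for the example, and unknown codons are shown as X by assumption.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons; '*' marks a stop, 'X' an unrecognized codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    return {f"{strand}{offset + 1}": translate(seq[offset:])
            for strand, seq in (("+", forward), ("-", reverse))
            for offset in range(3)}

for frame, protein in six_frame_translations("ATGAAATAG").items():
    print(frame, protein)
```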
question
What is used to search translated nucleotide databases using a protein query?
answer
tblastn
question
What is used to search translated nucleotide database using a translated nucleotide query?
answer
tblastx - compares the 6 frame translations of DNA query to 6-frame translations of a DNA database (each sequence is comparable to BLASTP searches!)
question
What compares a DNA query to DNA database, or a protein query to protein database?
answer
FASTA
question
What compares a translated DNA query to a protein database?
answer
FASTX
question
What compares a protein query to a translated DNA database?
answer
TFASTA
question
What are 3 forms of BLAST query input?
answer
FASTA, bare sequence, identifiers
question
Which input format begins with a single-line description followed by lines of sequence data, with the description line distinguished from the sequence data by a > symbol, and what are its rules?
answer
FASTA - it is recommended that all lines of text be shorter than 80 characters in length - blank lines are not allowed in the middle - sequences are expected to be represented in the standard IUB/IUPAC amino acid & nucleic acid codes - a single hyphen or dash can be used to represent a gap of indeterminate length
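As an illustration of the format described above, here is a minimal sketch of a FASTA reader. The function name `parse_fasta` and the sample record are made up for the example. The sketch is deliberately lenient: it skips blank lines rather than rejecting them, which is an assumption of this sketch and not part of the format rules.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # lenient: ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], [] # description line, without the '>'
        elif header is not None:
            chunks.append(line)           # sequence data may span many lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Hypothetical record used only for demonstration.
example = """>seq1 example description
ATGAAATTTGGG
CCCTAA
>seq2
MKF-G"""
print(parse_fasta(example))
```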
question
What is the format that may be just lines of sequence data without the FASTA definition line and can also be a sequence interspersed w/ numbers and/or spaces?
answer
Bare sequence - ex - GenBank / GenPept flatfile report - BLANK LINES NOT ALLOWED
question
What input format is a simple accession, accession.version, or GI, and could also be a bar-separated NCBI sequence identifier?
answer
Identifiers
question
The NCBI sequence identifiers have a specific ... and may consist of only 1 ...?
answer
specific syntax; may consist of only one token (WORD) - spaces between letters will cause it to be treated as a bare sequence - spaces before/after the identifier are not allowed
question
What is the analysis of DNA sequences of homologous genes that provides clues to the evolutionary relationships between organisms?
answer
Sequence Homology
question
Two closely related species will have DNA that is ...
answer
more similar to each other than if they are more distantly related
question
Sequence analyses can be used to construct
answer
family trees of organisms
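The idea that closer relatives have more similar DNA can be sketched with a simple distance measure: the proportion of differing positions (p-distance) between pre-aligned sequences. The species labels and sequences below are invented for illustration, and real tree building uses more elaborate models and clustering methods.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def closest_pair(aligned):
    """Return the two labels whose sequences have the smallest p-distance."""
    return min(combinations(aligned, 2),
               key=lambda pair: p_distance(aligned[pair[0]], aligned[pair[1]]))

# Hypothetical aligned sequences, for demonstration only.
aligned = {
    "species_A": "ACGTACGT",
    "species_B": "ACGTACGA",
    "species_C": "TCGAACCA",
}
for x, y in combinations(aligned, 2):
    print(x, y, p_distance(aligned[x], aligned[y]))
print("closest:", closest_pair(aligned))
```

The pair with the smallest distance would be joined first when a tree is built from such a distance table.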
question
Why is it more difficult to establish relationships among bacteria based on their DNA sequences?
answer
Bacteria from different species can exchange DNA sequences (horizontal transfer)
question
Multiple alignment software such as .... and .... are used by scientists to study the phylogenetic relationships between species
answer
CLUSTAL W and COBALT (constraint-based multiple alignment tool)
question
What shows the inferred evolutionary relationships among various biological species or other entities?
answer
Phylogenetic tree (or evolutionary tree)
question
What is the phylogenetic tree based on and what does it imply?
answer
Based upon similarities and differences in the species' physical and/or genetic characteristics - implies that the taxa joined together in the tree (nodes) have descended from a common ancestor
question
Within the tree the ancestor is the ... and the organisms that have arisen from it are ...
answer
ancestor - tree trunk; organisms that have arisen from the ancestor - at the tips of tree branches *closely related groups are located on branches close to one another
question
What national resource for molec bio creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical info?
answer
NCBI *all for the better understanding of molec processes affecting human health and disease
question
What is the powerful database search engine that integrates all the proven methods of database searching?
answer
Mascot search engine from Matrix Science
question
The Mascot search engine from Matrix Science uses mass spectrometry data to identify proteins from sequence databases and uses what methods?
answer
Peptide mass fingerprint, sequence query, MS/MS ion search - against a nucleic or amino acid database in FASTA format
question
What are some challenges in bioinformatics?
answer
Biological redundancy and multiplicity:
- different sequences w/ similar structures
- organisms with similar genes
- multiple functions of single genes
- grouping of genes in pathways
Sequence redundancy in genomes
Significance of relationships & similarities
Signal vs. noise (you want signal, not noise)
Lack of data
question
What are the hands-on sessions that familiarize students with details and the use of the most commonly used online tools & resources?
answer
bioinformatics laboratory *practices query, retrieval and analysis of sequences*